Sahala, Aleksi; Baragli, Beatrice (ORCID: https://orcid.org/0000-0002-3549-2012); Lentini, Giulia and Tushingham, Poppy (2024): Towards a word similarity gold standard for Akkadian: creation and model optimization. In: it - Information Technology, Vol. 66, No. 1: pp. 4-14

Full text not available on 'Open Access LMU'.

Abstract

We present a word similarity gold standard for Akkadian, a language documented in ancient Mesopotamian sources from the 24th century BCE until the first century CE. The gold standard comprises 300 word pairs ranked for paradigmatic similarity by five Assyriologists working independently. We use the gold standard to tune PMI + SVD and fastText models to improve their performance. We also present a hyper-parametrized PMI + SVD model for building count-based word embeddings that aims to deal with the data sparsity and repetition issues encountered in Akkadian texts. Our model combines Dirichlet smoothing with context distribution smoothing, and uses context similarity weighting to down-sample distortion caused by formulaic litanies and partially or fully duplicated passages.
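
To make the PMI + SVD pipeline mentioned in the abstract more concrete, the following is a minimal sketch of a generic count-based embedding with additive (Dirichlet-style) smoothing and context distribution smoothing. It is not the authors' implementation: the function names, the window size, the values `dirichlet_k=0.1` and `cds=0.75`, the clipping to positive PMI, and the toy corpus are all assumptions made for illustration, and the context similarity weighting described in the abstract is left out entirely.

```python
import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word-context co-occurrences within a fixed window."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1.0
    return vocab, counts


def ppmi_svd(counts, dim=2, dirichlet_k=0.1, cds=0.75):
    """Build dense word vectors from a co-occurrence matrix.

    dirichlet_k: pseudo-count added to every cell (assumed value).
    cds: exponent applied to context counts (assumed value).
    """
    # Additive smoothing: every word-context pair gets a small pseudo-count,
    # which keeps the logarithm below finite as long as dirichlet_k > 0.
    smoothed = counts + dirichlet_k
    total = smoothed.sum()
    p_wc = smoothed / total
    p_w = smoothed.sum(axis=1, keepdims=True) / total
    # Context distribution smoothing: raise context counts to a power < 1
    # so that rare contexts receive less extreme PMI scores.
    ctx = smoothed.sum(axis=0, keepdims=True) ** cds
    p_c = ctx / ctx.sum()
    pmi = np.log(p_wc / (p_w * p_c))
    # Clipping negative values (positive PMI) is an assumption of this sketch.
    ppmi = np.maximum(pmi, 0.0)
    # Truncated SVD: keep the top `dim` singular directions.
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


if __name__ == "__main__":
    # Made-up toy corpus, not Akkadian data.
    corpus = [
        ["king", "built", "temple"],
        ["lord", "built", "temple"],
        ["king", "built", "palace"],
    ]
    vocab, counts = cooccurrence_counts(corpus, window=1)
    vectors = ppmi_svd(counts, dim=2)
    i, j = vocab.index("king"), vocab.index("lord")
    print(cosine(vectors[i], vectors[j]))
```

In a tuning setup like the one the abstract describes, parameters such as `dirichlet_k`, `cds`, `window`, and `dim` would be varied and each configuration scored by how well the model's similarity ranking agrees with the human-ranked word pairs, for example with a rank correlation.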
