Abstract
Morphological analysis is an important component of natural language processing systems such as spelling correction tools, parsers, machine translation systems, and dictionary tools. In this paper, we present TRMOR, a morphological analyzer for Turkish built with the SFST tool (Stuttgart Finite-State Transducer). TRMOR can be freely used for academic research (see http://www.cis.uni-muenchen.de/schmid/tools/SFST/). It covers a large part of Turkish morphology, including inflection, derivation, and some compounding, and combines morphotactic and morphophonological rules with a stem lexicon. We describe the morphological structure of Turkish, explain the phonological and morphological rules implemented in TRMOR, evaluate the system, and test it on special cases. To our knowledge, no gold-standard segmentation of Turkish is available for noncommercial use, so we built our own evaluation set: 1000 words were randomly selected from Wikipedia word lists, and gold-standard morphological analyses were prepared for them. On these 1000 words, TRMOR achieves a precision of 94.12%.
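As an illustration of the kind of morphophonological rule mentioned above, the following is a minimal sketch of Turkish two-way vowel harmony for the plural suffix (-lar after a back vowel, -ler after a front vowel). It is not TRMOR's implementation, which is written as SFST transducer rules; the function names `last_vowel` and `pluralize` are hypothetical. It assumes lowercase input and ignores lexical exceptions (for example, the loanword "saat" takes "saatler"), which a real analyzer handles with lexicon marks.

```python
from typing import Optional

# Hypothetical sketch, not TRMOR's SFST rules: two-way (front/back)
# vowel harmony choosing between the plural allomorphs -lar and -ler.
BACK_VOWELS = set("aıou")   # includes dotless ı (U+0131)
FRONT_VOWELS = set("eiöü")


def last_vowel(stem: str) -> Optional[str]:
    """Return the last vowel of a lowercase stem, or None if it has none."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def pluralize(stem: str) -> str:
    """Attach the plural suffix, harmonizing with the stem's last vowel.

    Assumes lowercase input; lexical exceptions such as "saat" are not handled.
    """
    v = last_vowel(stem)
    if v is None:
        raise ValueError(f"no vowel in stem: {stem!r}")
    return stem + ("lar" if v in BACK_VOWELS else "ler")


if __name__ == "__main__":
    for word in ["ev", "kitap", "göz", "okul"]:
        print(word, "->", pluralize(word))
```

Input is assumed to be lowercase already because Python's default `str.lower()` maps "I" to "i" rather than to the Turkish dotless "ı".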
Document type: | Journal article |
---|---|
Interfaculty institutions: | Centrum für Informations- und Sprachverarbeitung (CIS) |
Subject areas: | 400 Language > 400 Language |
ISSN: | 1300-0632 |
Language: | English |
Document ID: | 84255 |
Date published on Open Access LMU: | 15 Dec 2021, 15:10 |
Last modified: | 15 Dec 2021, 15:10 |