Abstract
A number of methods have been proposed to automatically extract collocations, i.e., conventionalized lexical combinations, from text corpora. However, the attempts to evaluate and compare them with a specific application in mind lag behind. This paper compares three end-to-end resources for collocation learning, all of which used the same corpus but different methods. Adopting a gold-standard evaluation method, the results show that the method of dependency parsing outperforms regex-over-pos in collocation identification. The lexical association measures (AMs) used for collocation ranking perform about the same overall but differently for individual collocation types. Further analysis has also revealed that there are considerable differences between other commonly used AMs.
Dokumententyp: | Zeitschriftenartikel |
---|---|
Fakultät: | Sprach- und Literaturwissenschaften > Department 2 |
Themengebiete: | 400 Sprache > 400 Sprache |
Sprache: | Englisch |
Dokumenten ID: | 82082 |
Datum der Veröffentlichung auf Open Access LMU: | 15. Dez. 2021, 15:00 |
Letzte Änderungen: | 15. Dez. 2021, 15:00 |