Abstract
State-of-the-art neural sequence-to-sequence (seq2seq) models often do not perform well for small training sets. We address paradigm completion, the morphological task of generating all missing forms of a paradigm given a partial one. We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Because we assume that only a few paradigms are available for training, neural seq2seq models, while able to capture relationships between paradigm cells, are tied to the idiosyncrasies of the training set. Paradigm transduction mitigates this problem by exploiting the input subset of inflected forms at test time. (ii) Source selection with high precision (SHIP): Multi-source models which learn to automatically select one or multiple sources to predict a target inflection do not perform well in the minimal-resource setting. SHIP is an alternative that identifies a reliable source when training data is limited. On a 52-language benchmark dataset, we outperform the previous state of the art by up to 9.71% absolute accuracy.
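The abstract does not spell out SHIP's selection criterion. As an illustration only, the Python sketch below assumes a source cell counts as "reliable" if predictions made from it match the gold target forms on the few available training paradigms; `predict_fn` and `ship_select_source` are hypothetical names standing in for the trained seq2seq model and the selection routine, not the paper's exact algorithm.

```python
def ship_select_source(train_paradigms, predict_fn, target_cell, available_cells):
    """Return the source cell whose predictions for `target_cell` were most
    often correct on the training paradigms (a high-precision source).

    train_paradigms : list of dicts mapping a paradigm cell (e.g. a tag
        string like 'V;PST') to its inflected form
    predict_fn      : stand-in for the trained seq2seq inflection model;
        predict_fn(source_form, source_cell, target_cell) -> predicted form
    """
    def cell_accuracy(cell):
        hits, total = 0, 0
        for paradigm in train_paradigms:
            # only paradigms containing both cells can vote
            if cell in paradigm and target_cell in paradigm:
                predicted = predict_fn(paradigm[cell], cell, target_cell)
                hits += int(predicted == paradigm[target_cell])
                total += 1
        return hits / total if total else 0.0

    # pick the single most trustworthy source instead of learning a
    # multi-source selector, which the abstract says fails with little data
    return max(available_cells, key=cell_accuracy)
```

Under this reading, the transduction idea would then apply the same model at test time with the forms already present in the partial test paradigm as sources, rather than relying on training forms alone.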
Document type: | Conference contribution (paper)
---|---
EU Funded Grant Agreement Number: | 740516
EU projects: | Horizon 2020 > ERC Grants > ERC Advanced Grant > ERC Grant 740516: NonSequeToR - Non-sequence models for tokenization replacement
Cross-faculty institutions: | Centrum für Informations- und Sprachverarbeitung (CIS)
Subject areas: | 000 Computer science, information & general works > 000 Computer science, knowledge & systems; 000 Computer science, information & general works > 004 Data processing & computer science; 400 Language > 400 Language; 400 Language > 410 Linguistics
URN: | urn:nbn:de:bvb:19-epub-61867-9
Place: | Stroudsburg, PA
Language: | English
Document ID: | 61867
Date published on Open Access LMU: | 13 May 2019, 14:01
Last modified: | 4 Nov 2020, 13:39