Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

Poerner, Nina; Sabet, Masoud Jalili; Roth, Benjamin and Schütze, Hinrich (October 2018): Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective. [PDF, 409kB]

DOI: 10.5282/ubm/epub.61865

Abstract

Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on corpora of 25 to 500 sentences, our method outperforms fast-align. We also show that our fine-tuning objective consistently improves on a CLWE-only baseline.

Document type:	Paper
EU Funded Grant Agreement Number:	740516
EU projects:	Horizon 2020 > ERC Grants > ERC Advanced Grant > ERC Grant 740516: NonSequeToR - Non-sequence models for tokenization replacement
Cross-faculty institutions:	Centrum für Informations- und Sprachverarbeitung (CIS)
Subjects:	000 Computer science, information and general works > 000 Computer science, knowledge and systems; 000 Computer science, information and general works > 004 Computer science; 400 Language > 400 Language; 400 Language > 410 Linguistics
URN:	urn:nbn:de:bvb:19-epub-61865-8
Language:	English
Document ID:	61865
Date published on Open Access LMU:	13 May 2019 13:40
Last modified:	04 Nov 2020 13:39