Abstract
Unseen words, also called out-of-vocabulary words (OOVs), are difficult for machine translation. In neural machine translation (NMT), byte-pair encoding can be used to represent OOVs, but they are still often translated incorrectly. We improve the translation of OOVs in NMT using easy-to-obtain monolingual data. We look for OOVs in the text to be translated and translate them using simple-to-construct bilingual word embeddings (BWEs). In our MT experiments we take the five best candidates, a choice motivated by intrinsic mining experiments. Using all five of the proposed target-language words as queries, we mine target-language sentences. We then back-translate these sentences, forcing each of the five proposed target-language OOV translation candidates to be translated back as the original source-language OOV. We show that fine-tuning our system on this synthetic data dramatically improves the translation of OOVs. In our experiments we use a system trained on Europarl and mine sentences containing medical terms from monolingual data.
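The abstract describes a pipeline of candidate lookup in a bilingual embedding space followed by sentence mining. The sketch below illustrates how these two steps could look in practice; it is not the paper's implementation. The file formats (fastText-style text vectors, one sentence per line of monolingual data), the function names, the whitespace tokenization, and the per-candidate sentence cap are all assumptions made for illustration.

```python
import numpy as np

def load_embeddings(path):
    """Load word vectors from a text file with one word and its vector per line.
    (fastText-style format is an assumption; any BWE export with this layout works.)"""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 10:  # skip a possible "count dim" header line
                continue
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    mat = np.vstack(vecs)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # unit norm -> dot product = cosine
    return words, mat

def top_k_translations(oov, src_words, src_vecs, tgt_words, tgt_vecs, k=5):
    """Return the k nearest target-language words to a source-language OOV
    in the shared bilingual embedding space (cosine similarity)."""
    sims = tgt_vecs @ src_vecs[src_words.index(oov)]
    return [tgt_words[i] for i in np.argsort(-sims)[:k]]

def mine_sentences(candidates, monolingual_path, max_per_candidate=100):
    """Collect target-language sentences that contain any of the candidate words."""
    mined = {c: [] for c in candidates}
    with open(monolingual_path, encoding="utf-8") as f:
        for sentence in f:
            tokens = set(sentence.lower().split())
            for c in candidates:
                if c.lower() in tokens and len(mined[c]) < max_per_candidate:
                    mined[c].append(sentence.strip())
    return mined

# Hypothetical usage: "some_oov" and the file names are placeholders.
# src_words, src_vecs = load_embeddings("bwe.src.vec")
# tgt_words, tgt_vecs = load_embeddings("bwe.tgt.vec")
# candidates = top_k_translations("some_oov", src_words, src_vecs, tgt_words, tgt_vecs, k=5)
# mined = mine_sentences(candidates, "target_monolingual.txt")
```

Following the abstract, each mined sentence would then be back-translated, with the candidate word forced to translate back as the original source-language OOV, and the resulting synthetic parallel data used to fine-tune the NMT system.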
Document type: | Journal article
---|---
Cross-faculty institutions: | Centrum für Informations- und Sprachverarbeitung (CIS)
Subject areas: | 400 Language > 400 Language
Language: | English
Document ID: | 84253
Date deposited on Open Access LMU: | 15 Dec 2021, 15:10
Last modified: | 15 Dec 2021, 15:10