On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss

Liu, Yihong; Chronopoulou, Alexandra; Schütze, Hinrich and Fraser, Alexander (July 2023): On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss. 20th International Conference on Spoken Language Translation (IWSLT 2023), Toronto, Canada, July 13-14, 2023. In: Salesky, Elizabeth; Federico, Marcello and Carpuat, Marine (eds.): Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). Stroudsburg, PA: Association for Computational Linguistics (ACL). pp. 491-502 [PDF, 704 kB]

Creative Commons: Attribution 4.0 (CC BY)

DOI: 10.18653/v1/2023.iwslt-1.48

Abstract

Although unsupervised neural machine translation (UNMT) has achieved success in many language pairs, the copying problem is common among distant language pairs, especially when low-resource languages are involved. The copying problem means that the model directly copies parts of the input sentence into its translation. We find this issue is closely related to an unexpected copying behavior during online back-translation (BT). In this work, we propose a simple but effective training schedule that incorporates a language discriminator loss. The loss imposes constraints on the intermediate translation so that the translation is in the desired language. We conduct extensive experiments on different language pairs, including similar and distant pairs and high- and low-resource languages. We find that our method alleviates the copying problem and thus improves translation performance on low-resource languages.
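The idea of a language discriminator loss on the intermediate translation can be illustrated with a small, hedged sketch. This is not the authors' implementation. It assumes a hypothetical setup in which a linear classifier reads mean-pooled hidden states of the intermediate (back-translated) sentence and predicts which language the sentence is in. Cross-entropy against the desired output language then penalizes intermediate translations that look like the source language, for example because they were copied. The function names, the mean pooling, and the weighting factor `lam` are all illustrative assumptions. The sketch only computes loss values in plain Python, without gradients.

```python
import math
from typing import List, Sequence

def mean_pool(hidden: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token hidden vectors into one sentence vector (assumed pooling)."""
    n, dim = len(hidden), len(hidden[0])
    return [sum(h[d] for h in hidden) / n for d in range(dim)]

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over language logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def language_discriminator_loss(hidden, weights, bias, target_lang: int) -> float:
    """Hypothetical discriminator loss: cross-entropy of a linear language
    classifier (one weight row per language) on pooled hidden states,
    measured against the language the intermediate translation SHOULD be in."""
    pooled = mean_pool(hidden)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return -log_softmax(logits)[target_lang]

def combined_loss(bt_loss: float, disc_loss: float, lam: float) -> float:
    """Back-translation loss plus a weighted discriminator term (lam is an assumption)."""
    return bt_loss + lam * disc_loss

if __name__ == "__main__":
    # Toy states that the classifier associates with language 0.
    hidden = [[1.0, 0.0], [1.0, 0.0]]
    W, b = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]
    print(language_discriminator_loss(hidden, W, b, 0))  # small: desired language
    print(language_discriminator_loss(hidden, W, b, 1))  # large: wrong language
```

With these toy numbers, the loss is low when the desired language matches what the classifier detects and higher when it does not. That gap is the signal that would push a model away from copying the source language. In a real system, the loss would be computed inside an autodiff framework so that it can update the model.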

Document type:	Conference paper
EU Funded Grant Agreement Number:	740516
EU projects:	Horizon 2020 > ERC Grants > ERC Advanced Grant > ERC Grant 740516: NonSequeToR - Non-sequence models for tokenization replacement
Interfaculty institutions:	Centrum für Informations- und Sprachverarbeitung (CIS)
Subjects:	400 Language > 400 Language; 400 Language > 410 Linguistics
URN:	urn:nbn:de:bvb:19-epub-107442-4
Place:	Stroudsburg, PA
Note:	ISBN 978-1-959429-84-5
Language:	English
Document ID:	107442
Date published on Open Access LMU:	20 Oct 2023 08:13
Last modified:	20 Oct 2023 08:13