Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages


Nie, Ercong; Liang, Sheng; Schmid, Helmut and Schütze, Hinrich (July 2023): Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages. 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada, July 2023. In: Findings of the Association for Computational Linguistics: ACL 2023, Stroudsburg, PA: Association for Computational Linguistics. pp. 8320-8340 [PDF, 1MB]


Creative Commons: Attribution 4.0 (CC-BY)

DOI: 10.18653/v1/2023.findings-acl.528

External full text: https://aclanthology.org/2023.findings-acl.528

Abstract

Multilingual Pretrained Language Models (MPLMs) perform strongly in cross-lingual transfer. We propose Prompts Augmented by Retrieval Crosslingually (PARC) to improve zero-shot performance on low-resource languages (LRLs) by augmenting the context with prompts consisting of semantically similar sentences retrieved from a high-resource language (HRL). PARC improves zero-shot performance on three downstream tasks (sentiment classification, topic categorization, natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in unlabeled (+5.1%) and labeled settings (+16.3%). PARC also outperforms finetuning by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
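
The retrieval-and-augmentation idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the character-trigram similarity is only a stand-in for a real multilingual sentence encoder, and the pool format, the function names, the prompt template, and the `[MASK]` slot are all assumptions made for illustration. For the unlabeled setting the sketch simply omits the label, since the abstract does not say how that case is handled.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Character-trigram counts: a toy stand-in for a multilingual sentence encoder."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, hrl_pool, k=1):
    """Return the k high-resource-language entries most similar to the query."""
    q = trigram_vector(query)
    ranked = sorted(
        hrl_pool,
        key=lambda item: cosine(q, trigram_vector(item["text"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, retrieved, labeled=True):
    """Prepend retrieved sentences (hypothetical template) to the low-resource input."""
    parts = []
    for item in retrieved:
        if labeled:
            parts.append(f'{item["text"]} In summary, it was {item["label"]}.')
        else:
            parts.append(item["text"])  # assumption: no label available, so none is shown
    parts.append(f"{query} In summary, it was [MASK].")
    return " ".join(parts)

# Hypothetical English pool and a toy Afrikaans query, chosen only because the
# trigram stand-in needs shared surface characters to work at all.
hrl_pool = [
    {"text": "The film was wonderful", "label": "great"},
    {"text": "The food was terrible", "label": "bad"},
]
query = "Die film was wonderlik"
prompt = build_prompt(query, retrieve(query, hrl_pool, k=1))
print(prompt)
```

In a realistic setup, the prompt would be passed to a multilingual masked language model and the prediction at the mask position mapped back to a class label; that step is left out here because it depends on the chosen model.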

Document type:	Conference contribution (Paper)
EU Funded Grant Agreement Number:	740516
EU projects:	Horizon 2020 > ERC Grants > ERC Advanced Grant > ERC Grant 740516: NonSequeToR - Non-sequence models for tokenization replacement
Cross-faculty institutions:	Centrum für Informations- und Sprachverarbeitung (CIS)
Subjects:	400 Language > 400 Language; 400 Language > 410 Linguistics
URN:	urn:nbn:de:bvb:19-epub-107445-1
Place:	Stroudsburg, PA
Language:	English
Document ID:	107445
Date published on Open Access LMU:	20 Oct 2023 08:36
Last modified:	24 Oct 2023 13:12
