Abstract
Recent research has made impressive progress in large-scale multimodal pre-training. As model sizes grow rapidly, efficient and flexible alternatives to fine-tuning are needed. In this paper, we propose using prompt vectors to align the modalities. Our method achieves performance comparable to several other multimodal fusion methods in low-resource settings. We further show that our method is modular and parameter-efficient for tasks involving two or more data modalities.
Document type: | Conference contribution (Paper) |
---|---|
EU Funded Grant Agreement Number: | 740516 |
EU projects: | Horizon 2020 > ERC Grants > ERC Advanced Grant > ERC Grant 740516: NonSequeToR - Non-sequence models for tokenization replacement |
Cross-faculty institutions: | Centrum für Informations- und Sprachverarbeitung (CIS) |
Subject areas: | 000 Computer science, information science, general works > 000 Computer science, knowledge, systems; 400 Language > 400 Language; 400 Language > 410 Linguistics |
URN: | urn:nbn:de:bvb:19-epub-92202-4 |
Place: | Stroudsburg, PA |
Language: | English |
Document ID: | 92202 |
Date of publication on Open Access LMU: | 27 May 2022, 10:08 |
Last modified: | 27 May 2022, 10:08 |