The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative

Weissweiler, Leonie; Hofmann, Valentin; Köksal, Abdullatif and Schütze, Hinrich (December 2022): The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative. EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022. In: Che, Wanxiang and Shutova, Ekaterina (eds.): Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Stroudsburg, PA: Association for Computational Linguistics (ACL). pp. 10859-10882 [PDF, 635kB]


Creative Commons: Attribution 4.0 (CC-BY)

DOI: 10.18653/v1/2022.emnlp-main.746

Abstract

Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models’ behaviour in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.

Document type:	Conference contribution (Paper)
EU Funded Grant Agreement Number:	740516
EU projects:	Horizon 2020 > ERC Grants > ERC Advanced Grant > ERC Grant 740516: NonSequeToR - Non-sequence models for tokenization replacement
Cross-faculty institutions:	Centrum für Informations- und Sprachverarbeitung (CIS)
Subjects:	400 Language > 400 Language; 400 Language > 410 Linguistics
URN:	urn:nbn:de:bvb:19-epub-107436-1
Place:	Stroudsburg, PA
Note:	ISBN 978-1-959429-41-8
Language:	English
Document ID:	107436
Date published on Open Access LMU:	20 Oct 2023 06:55
Last modified:	20 Oct 2023 07:14
