Pre-trained language models evaluating themselves

Koch, Philipp; Aßenmacher, Matthias and Heumann, Christian (26 May 2022): Pre-trained language models evaluating themselves - A comparative study. Proceedings of the Third Workshop on Insights from Negative Results in NLP, Dublin, 26 May 2022. Dublin, Ireland: Association for Computational Linguistics. pp. 180-187 [PDF, 242kB]

DOI: http://dx.doi.org/10.18653/v1/2022.insights-1.25

External full text: https://aclanthology.org/2022.insights-1.25

Abstract

Evaluating generated text received new attention with the introduction of model-based metrics in recent years. These new metrics have a higher correlation with human judgments and seemingly overcome many issues of previous n-gram based metrics from the symbolic age. In this work, we examine the recently introduced metrics BERTScore, BLEURT, NUBIA, MoverScore, and Mark-Evaluate (Petersen). We investigate their sensitivity to different types of semantic deterioration (part of speech drop and negation), word order perturbations, word drop, and the common problem of repetition. No metric showed appropriate behaviour for negation, and further none of them was overall sensitive to the other issues mentioned above.
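To make the kinds of perturbation named in the abstract concrete, the following is a minimal sketch of word drop, word order shuffling, and repetition, together with a way to measure how much a metric's score falls under each. It is not the authors' code: every function name and parameter here is hypothetical, negation and part-of-speech drop are left out, and `token_overlap` is a toy stand-in rather than any of the metrics studied in the paper.

```python
import random


def drop_words(sentence: str, rate: float, seed: int = 0) -> str:
    """Remove roughly `rate` of the tokens, keeping at least one if possible."""
    rng = random.Random(seed)
    tokens = sentence.split()
    kept = [t for t in tokens if rng.random() >= rate]
    return " ".join(kept or tokens[:1])


def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Return the same tokens in a pseudo-random order."""
    rng = random.Random(seed)
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def repeat_tail(sentence: str, n_tokens: int = 2, times: int = 2) -> str:
    """Append the last `n_tokens` tokens `times` more times."""
    tokens = sentence.split()
    return " ".join(tokens + tokens[-n_tokens:] * times)


def token_overlap(candidate: str, reference: str) -> float:
    """Toy metric (assumption): Jaccard overlap of the two token sets."""
    c, r = set(candidate.split()), set(reference.split())
    return len(c & r) / len(c | r) if (c | r) else 1.0


def sensitivity(metric, reference: str, perturbed: str) -> float:
    """Score lost when the perturbed text replaces an exact copy of the reference."""
    return metric(reference, reference) - metric(perturbed, reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    for name, text in [
        ("word drop", drop_words(ref, rate=0.3)),
        ("word order", shuffle_words(ref)),
        ("repetition", repeat_tail(ref)),
    ]:
        print(f"{name}: {text!r} -> {sensitivity(token_overlap, ref, text):.2f}")
```

A sensitivity of zero means the metric did not react to the perturbation at all. The toy set-overlap metric, for example, cannot register word order changes or repetition by construction; a model-based metric could be passed in as `metric` in the same way, provided it is wrapped as a function of `(candidate, reference)` returning a float.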

Document type:	Conference contribution (Poster)
Faculty:	Mathematics, Computer Science and Statistics > Statistics; Mathematics, Computer Science and Statistics > Statistics > Chairs/Working Groups > Methods for Missing Data, Model Selection and Model Averaging
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-92533-3
Place:	Dublin, Ireland
Document ID:	92533
Date published on Open Access LMU:	01 Jul 2022 09:41
Last modified:	01 Jul 2022 09:41
