Abstract
Collections of text documents such as product reviews and microblogs often evolve over time. In practice, however, classifiers trained on them are updated infrequently, leading to gradual performance degradation. While approaches for automatic drift detection have been proposed, they were often designed for low-dimensional sensor data, and it is unclear how well they perform for state-of-the-art text classifiers based on high-dimensional document embeddings. In this paper, we empirically compare drift detectors on document embeddings using two benchmark datasets with varying amounts of drift. Our results show that multivariate drift detectors based on the Kernel Two-Sample Test and the Least-Squares Density Difference outperform univariate drift detectors based on the Kolmogorov-Smirnov Test. Moreover, our experiments show that current drift detectors perform better on lower-dimensional embeddings.
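To illustrate the two detector families contrasted in the abstract, the following is a minimal sketch, not the authors' implementation: a univariate detector that applies the Kolmogorov-Smirnov test per embedding dimension with a Bonferroni correction, and a multivariate detector based on the kernel two-sample test (MMD) with a permutation threshold. The Least-Squares Density Difference detector is omitted, and the RBF bandwidth, permutation count, and synthetic embeddings are illustrative assumptions.

```python
# Minimal sketch of univariate (per-dimension KS) vs. multivariate (MMD)
# drift detection on document embeddings. The embeddings below are random
# stand-ins; all parameter choices are illustrative, not from the paper.
import numpy as np
from scipy.stats import ks_2samp


def ks_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag drift if any dimension's KS test rejects at a Bonferroni-corrected level."""
    d = reference.shape[1]
    p_values = [ks_2samp(reference[:, i], current[:, i]).pvalue for i in range(d)]
    return min(p_values) < alpha / d


def _mmd_rbf(x: np.ndarray, y: np.ndarray, gamma: float | None = None) -> float:
    """Squared Maximum Mean Discrepancy with an RBF kernel (biased estimate)."""
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # simple default; a median heuristic is also common

    def k(a, b):
        sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * sq)

    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()


def mmd_drift(reference: np.ndarray, current: np.ndarray,
              alpha: float = 0.05, n_perm: int = 200, seed: int = 0) -> bool:
    """Flag drift if the observed MMD exceeds the (1 - alpha) permutation quantile."""
    rng = np.random.default_rng(seed)
    observed = _mmd_rbf(reference, current)
    pooled = np.vstack([reference, current])
    n = len(reference)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(_mmd_rbf(pooled[idx[:n]], pooled[idx[n:]]))
    return observed > np.quantile(null, 1 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    ref = rng.normal(size=(200, 64))             # reference-window "embeddings"
    cur = rng.normal(0.3, 1.0, size=(200, 64))   # shifted current window
    print("KS drift detected: ", ks_drift(ref, cur))
    print("MMD drift detected:", mmd_drift(ref, cur))
```

The multivariate test compares the two windows jointly in embedding space, whereas the per-dimension KS tests treat each embedding dimension independently; this is the structural difference between the two detector families discussed above.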
Document type: | Conference contribution (Paper)
---|---
Form of publication: | Publisher's Version
Faculty: | Mathematics, Informatics and Statistics > Informatics > Artificial Intelligence and Machine Learning
Subject areas: | 000 Computer science, information & general works > 000 Computer science, knowledge, systems
ISSN: | 0302-9743
Place: | Cham
Language: | English
Document ID: | 92511
Date published on Open Access LMU: | 18 Jul 2022, 12:39
Last modified: | 18 Jul 2022, 12:39