CC-Top: Constrained Clustering for Dynamic Topic Discovery

Goschenhofer, Jann; Ragupathy, Pranav; Heumann, Christian; Bischl, Bernd and Aßenmacher, Matthias (December 2022): CC-Top: Constrained Clustering for Dynamic Topic Discovery. Proceedings of The First Workshop on Ever Evolving NLP (EvoNLP), Abu Dhabi, UAE, 7 to 11 December 2022. pp. 26-34

DOI: 10.5282/ubm/epub.95618

Abstract

Research on multi-class classification of short texts mainly focuses on supervised (transfer) learning approaches, which require a finite set of pre-defined classes that remains constant over time. This work explores deep constrained clustering (CC) as an alternative to supervised learning in a setting with a dynamically changing number of classes, a task we introduce as dynamic topic discovery (DTD). We do so by using pairwise similarity constraints instead of instance-level class labels; these constraints allow for a flexible number of classes while yielding performance competitive with supervised approaches. First, we substantiate this through a series of experiments showing that CC algorithms achieve predictive performance similar to state-of-the-art supervised learning algorithms while requiring less annotation effort. Second, we demonstrate the overclustering capabilities of deep CC for detecting topics in short-text data sets when the ground-truth number of classes is unknown during model training. Third, we showcase how these capabilities can be leveraged in the DTD setting as a step towards dynamic learning over time. Finally, we release our codebase to foster further research in this area.
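The core idea of supervising a model with pairwise similarity constraints instead of instance-level labels can be illustrated with a minimal sketch. It assumes one common formulation from the deep CC literature, not necessarily the loss used in CC-Top: a network outputs a softmax distribution over K clusters for each text, the probability that two texts share a cluster is the inner product of their distributions, and this probability is trained with binary cross-entropy against a must-link (1) or cannot-link (0) constraint. The function names `pairwise_constraints`, `softmax` and `pairwise_bce` are hypothetical. Because the loss only looks at pairs, K can be chosen larger than the true number of classes (overclustering).

```python
import math
from itertools import combinations


def pairwise_constraints(labels):
    """Turn instance-level labels into pairwise similarity constraints.

    Returns a dict mapping index pairs (i, j) with i < j to 1 (same class,
    "must-link") or 0 (different class, "cannot-link").
    """
    return {(i, j): int(labels[i] == labels[j])
            for i, j in combinations(range(len(labels)), 2)}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_bce(probs, constraints, eps=1e-7):
    """Mean binary cross-entropy between each constraint and the probability
    that the two instances fall into the same cluster (inner product of
    their cluster distributions). Assumed formulation, for illustration only.
    """
    loss = 0.0
    for (i, j), same in constraints.items():
        p_same = sum(a * b for a, b in zip(probs[i], probs[j]))
        p_same = min(max(p_same, eps), 1 - eps)  # avoid log(0)
        loss -= same * math.log(p_same) + (1 - same) * math.log(1 - p_same)
    return loss / len(constraints)


# Overclustering: K = 4 output clusters, although the labels contain only 2 topics.
labels = ["sports", "sports", "politics"]
constraints = pairwise_constraints(labels)  # {(0, 1): 1, (0, 2): 0, (1, 2): 0}
good = [softmax([5, 0, 0, 0]), softmax([5, 0, 0, 0]), softmax([0, 0, 5, 0])]
bad = [softmax([5, 0, 0, 0])] * 3  # all texts collapsed into one cluster
print(pairwise_bce(good, constraints))  # small loss
print(pairwise_bce(bad, constraints))   # large loss
```

Note that the number of pairs grows quadratically with the number of texts, so a practical implementation would likely sample pairs within mini-batches; this is an assumption, not a detail stated in the source.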

Document type:	Conference contribution (talk)
Faculty:	Mathematics, Computer Science and Statistics > Statistics; Mathematics, Computer Science and Statistics > Statistics > Chairs/Working Groups > Computational Statistics; Mathematics, Computer Science and Statistics > Statistics > Chairs/Working Groups > Methods for Missing Data, Model Selection and Model Averaging
Subject areas:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-95618-2
Document ID:	95618
Date published on Open Access LMU:	05 Apr 2023 07:35
Last modified:	05 Apr 2023 07:35

Dokument bearbeiten