Reproducible Extraction of Cross-lingual Topics (rectr)

Chan, Chung-Hong; Zeng, Jing; Wessler, Hartmut; Jungblut, Marc; Welbers, Kasper; Bajjalieh, Joseph W.; Atteveldt, Wouter van and Althaus, Scott L. (2020): Reproducible Extraction of Cross-lingual Topics (rectr). In: Communication Methods and Measures, Vol. 14, No. 4: pp. 285-305

Full text not available from 'Open Access LMU'.

DOI: 10.1080/19312458.2020.1812555

Abstract

With global media content databases and online content being available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source aligned word embeddings to capture the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with those extracted by methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied to tracking news topics across time and languages.
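
As a rough illustration of the idea behind aligned word embeddings mentioned in the abstract, the following Python sketch compares documents in two languages inside one shared vector space. It is not the rectr implementation (which is an R package): the tiny vectors in `ALIGNED` are made-up stand-ins for real pre-aligned embeddings, and the names `doc_vector` and `cosine` are hypothetical helpers introduced only for this example. The normalization of residual language differences described in the abstract is not shown.

```python
import math

# ASSUMPTION: toy 3-dimensional vectors standing in for pre-aligned word
# embeddings. In an aligned space, translations of the same word lie close
# to each other even though they come from different languages.
ALIGNED = {
    "en": {
        "election": [0.90, 0.10, 0.00],
        "vote": [0.80, 0.20, 0.10],
        "football": [0.10, 0.90, 0.10],
        "goal": [0.00, 0.80, 0.20],
    },
    "de": {
        "wahl": [0.88, 0.12, 0.02],
        "stimme": [0.79, 0.18, 0.12],
        "fussball": [0.12, 0.90, 0.08],
        "tor": [0.02, 0.82, 0.18],
    },
}


def doc_vector(tokens, lang):
    """Average the aligned vectors of all known tokens in a document."""
    vectors = [ALIGNED[lang][t] for t in tokens if t in ALIGNED[lang]]
    if not vectors:
        raise ValueError("no token of the document has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


en_politics = doc_vector(["election", "vote"], "en")
de_politics = doc_vector(["wahl", "stimme"], "de")
de_sport = doc_vector(["fussball", "tor"], "de")

# The English politics document should sit closer to the German politics
# document than to the German sports document, without any translation step.
print(cosine(en_politics, de_politics) > cosine(en_politics, de_sport))
```

Because both languages share one space, document vectors like these could in principle be passed to a single clustering or topic model, which is the general intuition the abstract appeals to.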

Document type:	Journal article
Faculty:	Social Sciences > Institut für Kommunikationswissenschaft und Medienforschung (IfKW)
Subjects:	000 Computer science, information and general works > 070 News media, journalism and publishing
ISSN:	1931-2458
Language:	English
Document ID:	88766
Date published on Open Access LMU:	25 Jan 2022 09:28
Last modified:	25 Jan 2022 09:28