Mining the Web for New Words: Semi-Automatic Neologism Identification with the NeoCrawler

Kerremans, Daphne and Prokic, Jelena (2018): Mining the Web for New Words: Semi-Automatic Neologism Identification with the NeoCrawler. In: Anglia - Zeitschrift für Englische Philologie, Vol. 136, No. 2: pp. 239-268

Full text not available on 'Open Access LMU'.

DOI: 10.1515/ang-2018-0032

Abstract

Lexical innovation is omnipresent and constantly at work. Studies aiming to understand the process of lexical innovation and the subsequent diffusion of neologisms therefore benefit from systematic methods of neologism identification. In the past, retrieval procedures largely consisted of manual participant observation and close reading. Recently, attempts have been made to design automated identification procedures, assisted by state-of-the-art natural language processing techniques and tools. Beginning with a discussion of the most commonly used neologism detection methods and applications in linguistics, the present paper describes a semi-automatic approach to identifying new words on the web, the NeoCrawler's Discoverer, which has been developed as part of a project on the incipient diffusion of lexical innovations. The Discoverer processes large batches of online English text daily and automatically identifies unknown grapheme sequences as potential neologism candidates by means of a dictionary matching procedure, in which the individual tokens are matched against a very large dictionary. These potential neologisms are subsequently presented to the user for manual evaluation of their neologism status. Finally, accepted candidates are added to the NeoCrawler's database for continuous close monitoring of their development in the online speech community. We argue that dictionary matching offers an efficient method for semi-automatically extracting potential instances of lexical innovation, with high precision and high recall compared to previous approaches.
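
The dictionary matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the NeoCrawler's actual implementation: the function name `find_candidates`, the tokenization pattern, the `min_count` threshold, and the tiny word list are all assumptions made for illustration. The sketch tokenizes a batch of text, looks each token up in a set of known words, and returns the unknown grapheme sequences with their frequencies, ready for manual evaluation.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic sequences, allowing
# internal hyphens and apostrophes (e.g. "semi-automatic", "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def find_candidates(text, known_words, min_count=1):
    """Return {token: frequency} for tokens absent from known_words.

    known_words stands in for the "very large dictionary" mentioned in
    the abstract; here it is simply a set of lowercase strings.
    min_count is an assumed frequency threshold, not taken from the paper.
    """
    tokens = TOKEN_RE.findall(text.lower())
    unknown = Counter(t for t in tokens if t not in known_words)
    return {word: n for word, n in unknown.items() if n >= min_count}


# Toy example with an invented word; a real dictionary would be far larger.
known = {"the", "blog", "coined", "and", "it", "spread", "fast"}
batch = "The blog coined blorptastic, and blorptastic spread fast."
candidates = find_candidates(batch, known)
```

A set lookup keeps each token check at constant average cost, which matters when the word list is very large. The returned candidates would still need the manual evaluation step, since typos, proper nouns, and foreign words are also unknown to the dictionary.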

Document type:	Journal article
Faculty:	Languages and Literatures > Department 3
Subjects:	400 Language > 400 Language
ISSN:	0340-5222
Language:	English
Document ID:	66221
Date of publication on Open Access LMU:	19 Jul 2019 12:19
Last modified:	04 Nov 2020 13:47
