Reichel, Uwe D. and Bucar Shigemori, Lia Saki (2008): Automatic correction of part-of-speech corpora. In: Speech and language technology, Vol. 11: pp. 167-174

Abstract

In this study a simple method for the automatic correction of part-of-speech corpora is presented. It works as follows: first, two or more already available part-of-speech taggers are applied to the data. Then a sample of the positions where their outputs differ is taken to train a classifier that predicts, for each difference, which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree and a Bayesian classifier. Their performance ranged from 59.1 % to 67.3 %. Training on the automatically corrected data finally led to significant improvements in tagger performance.
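The correction loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function and class names are hypothetical. The feature set (the word plus each tagger's proposed tag) is also an assumption. Of the three classifiers named, only instance-based learning is shown, as a plain 1-nearest-neighbour learner with an overlap distance. When the classifier predicts that no tagger is right, the sketch keeps the first tagger's tag. The final step, retraining a tagger on the corrected corpus, is not shown.

```python
from typing import List, Optional, Sequence, Tuple

# One disagreement instance (assumed features): the word, then each tagger's proposed tag.
Features = Tuple[str, ...]


def disagreements(words: Sequence[str],
                  outputs: Sequence[Sequence[str]]) -> List[Tuple[int, Features]]:
    """Return (position, features) for every token on which the taggers differ."""
    found = []
    for i, word in enumerate(words):
        tags = tuple(out[i] for out in outputs)
        if len(set(tags)) > 1:
            found.append((i, (word,) + tags))
    return found


def label_for(gold_tag: str, tags: Sequence[str]) -> Optional[int]:
    """Index of the first tagger whose tag matches the gold tag, or None if no tagger is right."""
    for k, tag in enumerate(tags):
        if tag == gold_tag:
            return k
    return None


def overlap_distance(a: Features, b: Features) -> int:
    """Count the feature slots in which two instances differ (overlap metric)."""
    return sum(x != y for x, y in zip(a, b))


class NearestNeighbour:
    """1-nearest-neighbour classifier: a minimal instance-based learner."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Features, Optional[int]]] = []

    def fit(self, instances: Sequence[Features], labels: Sequence[Optional[int]]) -> None:
        self.memory = list(zip(instances, labels))

    def predict(self, x: Features) -> Optional[int]:
        # min() returns the first of several equally close stored instances.
        _, label = min(self.memory, key=lambda m: overlap_distance(m[0], x))
        return label


def correct(words: Sequence[str], outputs: Sequence[Sequence[str]],
            clf: NearestNeighbour, fallback: int = 0) -> List[str]:
    """Start from one tagger's output; overwrite each disagreement with the predicted tagger's tag."""
    corrected = list(outputs[fallback])
    for i, feats in disagreements(words, outputs):
        winner = clf.predict(feats)
        if winner is not None:  # None: no tagger judged correct, so keep the fallback tag
            corrected[i] = outputs[winner][i]
    return corrected


# Hypothetical training sample: two taggers' outputs plus a hand-checked reference.
train_words = ["the", "can", "rusts"]
train_outputs = [["DT", "MD", "VBZ"],   # tagger A
                 ["DT", "NN", "NNS"]]   # tagger B
gold = ["DT", "NN", "VBZ"]

diffs = disagreements(train_words, train_outputs)
clf = NearestNeighbour()
clf.fit([f for _, f in diffs], [label_for(gold[i], f[1:]) for i, f in diffs])

print(correct(["a", "tin", "rusts"], train_outputs, clf))  # → ['DT', 'NN', 'VBZ']
```

Only the disagreement positions are passed to the classifier, which mirrors the abstract's idea of training on differing outputs. Positions where all taggers agree are taken over unchanged. A C4.5 tree or a Bayesian classifier could replace `NearestNeighbour`, provided it exposes the same `fit`/`predict` shape.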