
Abstract
This study presents a simple method for the automatic correction of part-of-speech corpora, which works as follows: Initially, two or more already available part-of-speech taggers are applied to the data. Then a sample of differing outputs is taken to train a classifier to predict, for each difference, which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree and a Bayesian classifier. Their performance ranged from 59.1 % to 67.3 %. Training on the automatically corrected data finally led to significant improvements in tagger performance.
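
The procedure described in the abstract can be illustrated with a minimal sketch for two taggers. It is not the authors' implementation: the function names, the feature set (only the pair of disagreeing tags) and the majority-vote lookup are assumptions made for illustration, and the lookup merely stands in for the instance-based, C4.5 and Bayesian classifiers named above. Keeping tagger A's tag when the prediction is "NONE" is likewise an assumed fallback.

```python
from collections import Counter, defaultdict


def disagreement_instances(tags_a, tags_b, gold):
    """Yield ((tag_a, tag_b), label) for each position where the taggers differ.

    The label says which tagger matched the hand-checked tag:
    "A", "B", or "NONE" if neither did.
    """
    for a, b, g in zip(tags_a, tags_b, gold):
        if a == b:
            continue  # only differing outputs are sampled
        label = "A" if g == a else "B" if g == b else "NONE"
        yield (a, b), label


def train(instances):
    """Count labels per tag pair, plus overall counts used as a backoff."""
    per_pair = defaultdict(Counter)
    overall = Counter()
    for features, label in instances:
        per_pair[features][label] += 1
        overall[label] += 1
    return per_pair, overall


def predict(model, features):
    """Return the most frequent label for this tag pair (hypothetical stand-in classifier)."""
    per_pair, overall = model
    counts = per_pair.get(features) or overall
    return counts.most_common(1)[0][0]


def correct(tags_a, tags_b, model):
    """Merge two tag sequences, resolving each disagreement with the classifier."""
    merged = []
    for a, b in zip(tags_a, tags_b):
        if a == b:
            merged.append(a)
        else:
            choice = predict(model, (a, b))
            # Assumption: fall back to tagger A when neither is predicted correct.
            merged.append(b if choice == "B" else a)
    return merged


# Tiny hand-checked sample (invented tags, for illustration only).
sample_a = ["DT", "NN", "VB", "NN"]
sample_b = ["DT", "VB", "VB", "JJ"]
sample_gold = ["DT", "NN", "VB", "JJ"]

model = train(disagreement_instances(sample_a, sample_b, sample_gold))
corrected = correct(["DT", "NN", "NN"], ["DT", "VB", "JJ"], model)
```

The corrected sequence could then serve as training data for a tagger, which is the final step the abstract describes.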
Item Type: | Journal article |
---|---|
Faculties: | Languages and Literatures > Department 2 > Speech Science |
Subjects: | 400 Language > 400 Language |
URN: | urn:nbn:de:bvb:19-epub-13565-7 |
Language: | English |
Item ID: | 13565 |
Date Deposited: | 13. Jul 2012, 08:10 |
Last Modified: | 04. Nov 2020, 12:54 |