Abstract
The mapping of a raw phonetic transcription to an orthographic word sequence is carried out in three steps. First, a syllable segmentation of the transcription is bootstrapped by unsupervised subtractive learning. Then, the syllables are grouped into word entities, guided by non-linguistic distributional properties. Finally, the phonetic word segmentations are mapped onto entries of a canonical pronunciation dictionary by means of a co-occurrence-based aligner. Syllable segmentation accuracies range between 89 and 96%, and word segmentation accuracies between 92 and 98%. The transcription-to-word conversion performance amounts to 77%.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Form of publication: | Postprint |
Keywords: | segmentation, bootstrap, distributional learning, subtractive learning, alignment |
Faculties: | Languages and Literatures > Department 2 > Speech Science |
Subjects: | 400 Language > 410 Linguistics |
URN: | urn:nbn:de:bvb:19-epub-18040-8 |
Language: | English |
Item ID: | 18040 |
Date Deposited: | 27. Jan 2014, 12:50 |
Last Modified: | 04. Nov 2020, 12:59 |