Abstract
The mapping of a raw phonetic transcription to an orthographic word sequence is carried out in three steps. First, a syllable segmentation of the transcription is bootstrapped by unsupervised subtractive learning. Then, the syllables are grouped into word entities, guided by non-linguistic distributional properties. Finally, the phonetic word segments are mapped onto entries of a canonical pronunciation dictionary by means of a co-occurrence-based aligner. Accuracies between 89 and 96% are obtained for syllable segmentation, and between 92 and 98% for word segmentation. The transcription-to-word conversion performance amounts to 77%.
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Form of publication: | Postprint |
| Keywords: | segmentation, bootstrap, distributional learning, subtractive learning, alignment |
| Faculties: | Languages and Literatures > Department 2 > Speech Science |
| Subjects: | 400 Language > 410 Linguistics |
| URN: | urn:nbn:de:bvb:19-epub-18040-8 |
| Language: | English |
| Item ID: | 18040 |
| Date Deposited: | 27. Jan 2014 12:50 |
| Last Modified: | 04. Nov 2020 12:59 |

