Abstract
The mapping of a raw phonetic transcription to an orthographic word sequence is carried out in three steps. First, a syllable segmentation of the transcription is bootstrapped, based on unsupervised subtractive learning. Then, the syllables are grouped into word entities, guided by non-linguistic distributional properties. Finally, the phonetic word segments are mapped onto entries of a canonical pronunciation dictionary by means of a co-occurrence-based aligner. For syllable segmentation, accuracies between 89 and 96% are obtained, and for word segmentation, accuracies between 92 and 98%. The transcription-to-word conversion performance amounts to 77%.
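The second step, grouping syllables into words from distributional evidence alone, can be illustrated with a toy example. The sketch below is not the paper's algorithm: it swaps in a generic technique, forward transitional probabilities between adjacent syllables, with a boundary placed wherever the probability drops below a threshold. The function names, the threshold of 0.75, and the toy syllables are all assumptions made for illustration.

```python
from collections import Counter


def transition_probs(utterances):
    """Estimate P(right | left) for adjacent syllable pairs.

    `utterances` is a list of syllable lists (hypothetical toy input).
    """
    left_counts = Counter()
    pair_counts = Counter()
    for syllables in utterances:
        for left, right in zip(syllables, syllables[1:]):
            left_counts[left] += 1
            pair_counts[(left, right)] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def group_syllables(syllables, probs, threshold=0.75):
    """Group syllables into words; split where P(right | left) < threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if not syllables:
        return []
    words = [[syllables[0]]]
    for left, right in zip(syllables, syllables[1:]):
        if probs.get((left, right), 0.0) < threshold:
            words.append([right])      # weak transition: start a new word
        else:
            words[-1].append(right)    # strong transition: same word
    return words


# Toy corpus built from three made-up words: "ba na", "to ki", "mu".
corpus = [
    ["ba", "na", "to", "ki"],
    ["to", "ki", "ba", "na"],
    ["ba", "na", "mu"],
    ["mu", "to", "ki"],
    ["to", "ki", "mu", "ba", "na"],
]
probs = transition_probs(corpus)
segmented = group_syllables(["mu", "ba", "na", "to", "ki"], probs)
print(segmented)
```

In this toy corpus the word-internal transitions ("ba" to "na", "to" to "ki") always occur, while transitions across word boundaries are split between several continuations, so the threshold separates them cleanly. Real transcriptions are far noisier, which is why the accuracies reported above are below 100%.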
Document type: | Conference contribution (Paper) |
---|---|
Publication form: | Postprint |
Keywords: | segmentation, bootstrap, distributional learning, subtractive learning, alignment |
Faculty: | Languages and Literatures > Department 2 > Phonetics and Speech Processing |
Subjects: | 400 Language > 410 Linguistics |
URN: | urn:nbn:de:bvb:19-epub-18040-8 |
Language: | English |
Document ID: | 18040 |
Date published on Open Access LMU: | 27 Jan. 2014, 12:50 |
Last modified: | 04 Nov. 2020, 12:59 |