Abstract
The mapping of a raw phonetic transcription to an orthographic word sequence is carried out in three steps. First, a syllable segmentation of the transcription is bootstrapped by unsupervised subtractive learning. Second, the syllables are grouped into word entities, guided by non-linguistic distributional properties. Finally, the phonetic word segments are mapped onto entries of a canonical pronunciation dictionary by means of a co-occurrence-based aligner. Syllable segmentation reaches accuracies between 89% and 96%, and word segmentation between 92% and 98%. The transcription-to-word conversion performance amounts to 77%.
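
The second step, grouping syllables into words from distributional evidence alone, can be illustrated with a minimal sketch. It is not the procedure used in the paper, whose distributional criteria are not given in the abstract. It swaps in a common stand-in, forward transitional probabilities between adjacent syllables, and places a word boundary wherever that probability falls below a threshold. The function names, the toy corpus, and the threshold of 0.75 are all assumptions made for illustration.

```python
from collections import Counter


def transition_probs(utterances):
    """Estimate P(next syllable | current syllable) from adjacent pairs.

    `utterances` is a list of syllable lists. This is a hypothetical helper,
    not taken from the paper.
    """
    firsts = Counter()   # how often a syllable occurs with a successor
    pairs = Counter()    # how often each adjacent syllable pair occurs
    for syls in utterances:
        for a, b in zip(syls, syls[1:]):
            firsts[a] += 1
            pairs[(a, b)] += 1
    return {pair: n / firsts[pair[0]] for pair, n in pairs.items()}


def group_syllables(syls, probs, threshold=0.75):
    """Group syllables into words; a transition probability below
    `threshold` (an assumed value) is treated as a word boundary."""
    if not syls:
        return []
    words = [[syls[0]]]
    for a, b in zip(syls, syls[1:]):
        if probs.get((a, b), 0.0) < threshold:
            words.append([b])      # weak transition: start a new word
        else:
            words[-1].append(b)    # strong transition: extend current word
    return words


# Toy corpus of already syllabified utterances (invented data).
corpus = [
    ["ti", "ger", "ro", "bot"],
    ["ro", "bot", "ti", "ger"],
    ["ti", "ger", "pan", "da"],
    ["pan", "da", "ro", "bot"],
    ["ro", "bot", "pan", "da"],
    ["pan", "da", "ti", "ger"],
]
probs = transition_probs(corpus)
words = group_syllables(["pan", "da", "ti", "ger", "ro", "bot"], probs)
```

In this toy corpus, syllable pairs inside a word always co-occur while pairs spanning a word boundary vary, so the threshold separates the two cases cleanly. Real transcriptions are far noisier, and the threshold would need tuning on held-out data.
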
| Document type: | Conference contribution (Paper) |
|---|---|
| Publication form: | Postprint |
| Keywords: | segmentation, bootstrap, distributional learning, subtractive learning, alignment |
| Faculty: | Languages and Literatures > Department 2 > Phonetics and Speech Processing |
| Subjects: | 400 Language > 410 Linguistics |
| URN: | urn:nbn:de:bvb:19-epub-18040-8 |
| Language: | English |
| Document ID: | 18040 |
| Date of publication on Open Access LMU: | 27 Jan 2014 12:50 |
| Last modified: | 04 Nov 2020 12:59 |

