Abstract
The Levenshtein distance is an established metric for representing phonological distances between dialects. So far, it has usually been applied to manually transcribed word lists. In this study, we introduce several extensions of the Levenshtein distance that incorporate probabilistic edit costs as well as temporal alignment costs. We tested all variants for compliance with the axiom that within-dialect utterance pairs are phonologically more similar than across-dialect pairs. In contrast to earlier studies, we apply the metrics not to preselected, prototypical word lists but to real connected speech data that was automatically segmented and labeled. It turned out that the transcription edit distances alone already reflected the difference between within- and across-dialect comparisons well, and that adding a temporal component tends to weaken the performance of the metrics.
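The following Python sketch illustrates two ideas from the abstract: a Levenshtein distance whose substitution costs come from probabilities, and a check of the within- versus across-dialect axiom. It is not the authors' implementation. The symbol inventory, the substitution probabilities (`SUB_PROB`), the default probability, the constant insertion/deletion cost, and the dialect labels "A" and "B" are all hypothetical. Turning probabilities into costs via negative log probability is one common choice, assumed here. The temporal alignment costs are not modeled.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical substitution probabilities P(y | x) between SAMPA-like symbols.
# Real values would be estimated from data; these numbers are made up.
SUB_PROB: Dict[Tuple[str, str], float] = {
    ("a", "a:"): 0.30,
    ("e", "E"): 0.25,
    ("s", "S"): 0.10,
}
DEFAULT_SUB_PROB = 0.01  # assumed probability for unseen substitutions
INDEL_COST = 3.0         # assumed constant insertion/deletion cost


def sub_cost(x: str, y: str) -> float:
    """Substitution cost as negative log probability; identical symbols are free."""
    if x == y:
        return 0.0
    # Look the pair up in either order so the cost is symmetric.
    p = SUB_PROB.get((x, y), SUB_PROB.get((y, x), DEFAULT_SUB_PROB))
    return -math.log(p)


def weighted_levenshtein(a: Sequence[str], b: Sequence[str]) -> float:
    """Dynamic-programming edit distance over symbol sequences with weighted costs."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + INDEL_COST  # delete a[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + INDEL_COST  # insert b[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + INDEL_COST,                       # deletion
                d[i][j - 1] + INDEL_COST,                       # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution or match
            )
    return d[n][m]


def within_vs_across(utts: List[Tuple[str, Sequence[str]]]) -> Tuple[float, float]:
    """Mean distance of within-dialect pairs and of across-dialect pairs."""
    within: List[float] = []
    across: List[float] = []
    for (d1, u1), (d2, u2) in combinations(utts, 2):
        dist = weighted_levenshtein(u1, u2)
        (within if d1 == d2 else across).append(dist)
    return sum(within) / len(within), sum(across) / len(across)


if __name__ == "__main__":
    # Hypothetical labeled utterances: (dialect label, symbol sequence).
    utts = [
        ("A", ["h", "a", "u", "s"]),
        ("A", ["h", "a:", "u", "s"]),
        ("B", ["h", "u:", "s"]),
        ("B", ["h", "u:", "S"]),
    ]
    w, x = within_vs_across(utts)
    print(f"within={w:.3f} across={x:.3f} axiom holds: {w < x}")
```

A metric satisfies the axiom on a data set when the within-dialect mean is smaller than the across-dialect mean. A stricter test could compare every within-dialect pair against every across-dialect pair instead of comparing means.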
Field | Value |
---|---|
Item Type: | Conference or Workshop Item (Paper) |
Form of publication: | Postprint |
Keywords: | dialect, distance metrics, Levenshtein, temporal distance |
Faculties: | Languages and Literatures > Department 2 > Speech Science |
Subjects: | 400 Language > 410 Linguistics |
URN: | urn:nbn:de:bvb:19-epub-18050-3 |
Language: | German |
Item ID: | 18050 |
Date Deposited: | 27. Jan 2014, 12:49 |
Last Modified: | 04. Nov 2020, 12:59 |