Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate

www.lmu.de | UB | Blättern | Hilfe

Zur erweiterten Suche

English

Zur erweiterten Suche

Vasquez-Correa, J. C.; Arias-Vergara, T.; Schuster, M.; Orozco-Arroyave, J. R. und Noeth, E. (2020): Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate. In: Speech Communication, Bd. 122: S. 56-67

Volltext auf 'Open Access LMU' nicht verfügbar.

DOI: 10.1016/j.specom.2020.07.005

Abstract

Speech signals may contain different paralinguistic aspects such as the presence of pathologies that affect the proper communication capabilities of a speaker. Those speech disorders have different origin depending on the type of the disease. For instance, diseases with morphological origin such as cleft lip and palate that causes hypernasality, or with neurodegenerative origin such as Parkinson's disease that generates hypokinetic dysarthria on the patients. Automatic assessment of pathological speech allows to support the diagnosis and/or the evaluation of the disease severity. Conventional methods are based on the manually applied assessment of single features such as jitter, shimmer, or formant frequencies that may not completely model all of the phenomena that appear due to the disease. This paper introduces a novel strategy based on unsupervised representation learning for automatic detection of pathological speech. The proposed approach is based on the use of recurrent and convolutional autoencoders trained to extract informative features to characterize the presence of pathologies in speech. A novel feature set based on the reconstruction error of the autoencoders is also proposed. The performance of the introduced models is evaluated classifying pathological speech signals recorded from people suffering from Parkinson's disease, and children with cleft lip and palate. All participants from this study were Spanish native speakers. The proposed models are accurate to classify the speech signals of both kinds of diseases, with an accuracy of up to 97% for cleft lip and palate, and up to 84% for the case of Parkinson's disease. We also show that the reconstruction error from the autoencoders in different frequency regions contain information related to specific speech symptoms of both diseases.

Dokumententyp:	Zeitschriftenartikel
Fakultät:	Medizin
Themengebiete:	600 Technik, Medizin, angewandte Wissenschaften > 610 Medizin und Gesundheit
ISSN:	0167-6393
Sprache:	Englisch
Dokumenten ID:	85934
Datum der Veröffentlichung auf Open Access LMU:	25. Jan. 2022 09:16
Letzte Änderungen:	25. Jan. 2022 09:16

Dokument bearbeiten