Abstract
Spectrograms provide a visual representation of the time-frequency variations of a speech signal, and the color scale used to render them can act as a pre-processing normalization step. In this study, we investigated how suitable different color scales are for the reconstruction of spectrograms and for the bottleneck features extracted from Convolutional AutoEncoders (CAEs). We trained several CAEs while varying the number of channels, the spectrogram type (wideband or narrowband), and the color scale. Additionally, we tested the suitability of the proposed CAE architecture for predicting the severity of Parkinson’s Disease (PD) and the nasality level in children with Cleft Lip and Palate (CLP). The results showed that it is possible to estimate the neurological state in PD with Spearman’s correlations of up to 0.71 using the grayscale color scale, and the nasality level in CLP with F-scores of up to 0.58 using the raw spectrogram. Although the color scales improved performance in some cases, it is not clear which color scale is the most suitable for the selected applications, as we did not find significant differences between the results obtained with the different color scales.
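The difference between wideband and narrowband spectrograms, and the use of a color scale as a normalization step, can be illustrated with a minimal NumPy sketch. It is not the pipeline used in the study: the function names, the 5 ms and 30 ms window lengths, the 2 ms hop, the FFT size, and the 80 dB dynamic range are all assumptions chosen for illustration, and the CAE itself is not covered.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, window_ms, hop_ms=2.0, n_fft=1024):
    """Log-magnitude STFT in dB, shaped (freq_bins, frames).

    A short window (assumed ~5 ms) gives a wideband spectrogram with fine
    time resolution; a long window (assumed ~30 ms) gives a narrowband
    spectrogram with fine frequency resolution.
    """
    win_len = int(sample_rate * window_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # Zero-pad each frame to n_fft so both spectrogram types share one frequency grid.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return 20.0 * np.log10(np.abs(spectrum) + 1e-10).T


def to_grayscale(spec_db, dynamic_range_db=80.0):
    """Map a dB spectrogram to 8-bit grayscale over an assumed dynamic range."""
    top = spec_db.max()
    floor = top - dynamic_range_db
    # Clipping to a fixed range below the peak is what normalizes the input.
    scaled = (np.clip(spec_db, floor, top) - floor) / dynamic_range_db
    return np.round(scaled * 255).astype(np.uint8)
```

For a signal sampled at 16 kHz, `log_spectrogram(x, 16000, window_ms=5.0)` would give the wideband variant and `window_ms=30.0` the narrowband one; passing either result through `to_grayscale` yields a single-channel image that could be fed to an autoencoder.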
Document type: | Conference contribution (Paper) |
---|---|
Faculty: | Medicine > Klinikum der LMU München > Department of Otorhinolaryngology (Klinik und Poliklinik für Hals-, Nasen- und Ohrenheilkunde) |
Subjects: | 600 Technology, Medicine, Applied Sciences > 610 Medicine and Health |
ISSN: | 0302-9743 |
Place: | Cham, Switzerland |
Language: | English |
Document ID: | 110006 |
Date published on Open Access LMU: | 22 Mar. 2024, 11:27 |
Last modified: | 22 Mar. 2024, 11:27 |