- Dataset Preprocessing:
- Gather a diverse and comprehensive dataset encompassing a range of pathologies.
- Implement preprocessing techniques to extract relevant features from the voice recordings, such as spectrograms, which capture how spectral energy changes over time.
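A minimal sketch of the spectrogram step, assuming recordings are already loaded as mono NumPy arrays. The frame length (1024), hop (256), Hann window, and per-recording standardization are illustrative choices, not values taken from this project, and the 16 kHz test tone stands in for a real recording.

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=256, eps=1e-10):
    """Log-magnitude STFT; returns an array of shape (n_frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:  # zero-pad very short recordings to one full frame
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and taper each one with a Hann window.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0) in silent bins

def standardize(spec):
    """Zero-mean, unit-variance scaling so recordings at different levels are comparable."""
    return (spec - spec.mean()) / (spec.std() + 1e-8)

# Demo on a synthetic 220 Hz tone (hypothetical sample rate of 16 kHz).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
spec = standardize(log_spectrogram(tone))
print(spec.shape)  # (59, 513)
```

The resulting 2-D array can be fed to a CNN as a single-channel image. A mel-scaled spectrogram (for example via `librosa.feature.melspectrogram`) is a common alternative when a coarser, perceptually motivated frequency axis is preferred.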
- Neural Network Architecture Design:
- Explore and compare different neural network architectures, such as convolutional neural networks (CNNs), to identify the most effective model for classifying pathologies from voice recordings.
- Optimize hyperparameters, including learning rates, layer configurations, and activation functions, to improve the model's accuracy and generalization.
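The hyperparameter step can be sketched as an exhaustive grid search. The search space below is hypothetical, and `toy_evaluate` is a hypothetical stand-in for a function that would train the network with a given configuration and return its validation score; only the search loop itself is meant to carry over.

```python
import itertools
import math

# Hypothetical search space; the values are illustrative, not tuned for this dataset.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "n_conv_layers": [2, 3, 4],
    "activation": ["relu", "elu"],
}

def grid_search(space, evaluate):
    """Evaluate every combination in `space`; return (best_config, best_score).

    `evaluate` takes a config dict and returns a validation score (higher is better).
    """
    keys = list(space)
    best_config, best_score = None, -math.inf
    for values in itertools.product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_evaluate(config):
    """Hypothetical stand-in for training: best at lr=1e-3, 3 layers, ReLU."""
    score = -abs(math.log10(config["learning_rate"]) + 3)
    score -= abs(config["n_conv_layers"] - 3)
    return score + (0.1 if config["activation"] == "relu" else 0.0)

best, score = grid_search(SEARCH_SPACE, toy_evaluate)
print(best)  # {'learning_rate': 0.001, 'n_conv_layers': 3, 'activation': 'relu'}
```

The number of training runs is the product of the option counts (18 here), so cost grows quickly as the space expands. Random search over the same space is a common cheaper alternative. In either case, the score should come from validation data, never from a held-out test set.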
- Training and Validation:
- Train the neural network on the prepared dataset, using cross-validation to obtain a reliable estimate of model performance.
- Where applicable, apply transfer learning by fine-tuning models pre-trained on larger datasets.
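A minimal sketch of stratified k-fold splitting in plain Python, assuming one class label per recording. Stratification keeps each fold's class proportions close to those of the full dataset, which matters when classes are as unequal in size as those listed below. The labels reuse the sample counts given in the dataset summary; "healthy" is a hypothetical label for the control group.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, spreading every class evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    pos = 0  # running position, so total fold sizes differ by at most one
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[pos % k].append(idx)
            pos += 1
    return folds

COUNTS = {
    "Dysphonie": 101, "Funktionelle Dysphonie": 112,
    "Hyperfunktionelle Dysphonie": 212, "Laryngitis": 140,
    "Rekurrensparese": 213, "healthy": 657,
}
labels = [name for name, n in COUNTS.items() for _ in range(n)]
folds = stratified_folds(labels, k=5)

# Each fold serves once as the validation set; the remaining folds form the training set.
splits = []
for i, val_idx in enumerate(folds):
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    splits.append((train_idx, val_idx))
print([len(f) for f in folds])  # [287, 287, 287, 287, 287]
```

If a speaker contributes several recordings (for example several vowels), all of that speaker's recordings should be assigned to the same fold. Otherwise, the model can be validated on a voice it has already seen during training, which inflates the estimated performance.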
- Summary
The Saarbrücken Voice Database is a curated collection of recordings intended for the analysis and classification of voice pathologies [Barry, W.J., Pützer, M.: Saarbrücken Voice Database, Institute of Phonetics, Univ. of Saarland, http://www.stimmdatenbank.coli.uni-saarland.de/]. It includes the following pathology classes (database labels in parentheses):
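- Dysphonia (Dysphonie): 101 samples
- Functional dysphonia (Funktionelle Dysphonie): 112 samples
- Hyperfunctional dysphonia (Hyperfunktionelle Dysphonie): 212 samples
- Laryngitis (Laryngitis): 140 samples
- Recurrent laryngeal nerve paresis (Rekurrensparese): 213 samples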
In addition to the pathological recordings, the dataset includes 657 samples from healthy individuals.
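These counts add up to 778 pathological and 657 healthy recordings (1,435 in total). The largest class has about 6.5 times as many samples as the smallest. If the healthy recordings are treated as a sixth class (an assumption), one common way to counter this imbalance is to weight the loss by inverse class frequency, n_samples / (n_classes * n_c). This is the same formula scikit-learn uses for `class_weight="balanced"`. The sketch below keys the counts by the database's German labels; "healthy" is a hypothetical label for the control group.

```python
# Sample counts from the dataset summary; "healthy" is a hypothetical label.
COUNTS = {
    "Dysphonie": 101, "Funktionelle Dysphonie": 112,
    "Hyperfunktionelle Dysphonie": 212, "Laryngitis": 140,
    "Rekurrensparese": 213, "healthy": 657,
}

def balanced_class_weights(counts):
    """Inverse-frequency weights n_samples / (n_classes * n_c); rarer classes get larger weights."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

total = sum(COUNTS.values())              # 1435
pathological = total - COUNTS["healthy"]  # 778
weights = balanced_class_weights(COUNTS)
print(round(weights["Dysphonie"], 3))     # 2.368
print(round(weights["healthy"], 3))       # 0.364
```

With these weights, each class contributes equally to the total weighted loss, because the weight times the sample count is the same for every class.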
- Vowels: The dataset includes recordings of the sustained vowels /a/, /i/, and /u/, enabling a detailed examination of vowel-specific characteristics.
- Utterance: A set of recordings captures the sentence "Guten Morgen, wie geht es Ihnen?" ("Good morning, how are you?"), showing how different pathologies affect continuous speech.