
This is the project for the course named Neural Networks, PWR Winter Semester 2023. The scientific goal is to achieve high performance in disease classification from speech.

najdamikolaj00/Disease_From_Speech_Classification

Neural Networks course - PWR Winter Semester 2023

1. Introduction:

The human voice holds a vast amount of information beyond its role in communication. Through subtle nuances and patterns, valuable insights into an individual's health can be uncovered. Our project aims to utilize the power of neural networks to unlock the untapped diagnostic potential within vocal data.

2. Scientific objective:

The primary scientific objective of our project is to develop and optimize a neural network-based system for the accurate classification of various pathologies through the analysis of vocal patterns. This involves the following key components:
  1. Dataset Preprocessing:
    • Gather a diverse and comprehensive dataset encompassing a range of pathologies.
    • Implement preprocessing techniques to extract relevant features from the vocal data, considering the utility of spectrograms.
  2. Neural Network Architecture Design:
    • Explore and experiment with different neural network architectures, such as convolutional neural networks (CNNs), to identify the most effective model for pathology classification based on voice analysis.
    • Optimize hyperparameters, including learning rates, layer configurations, and activation functions, to enhance the model's accuracy and generalization capabilities.
  3. Training and Validation:
    • Train the neural network on the prepared dataset, employing rigorous cross-validation techniques to ensure robust model performance.
    • Implement transfer learning strategies, if applicable, to leverage pre-trained models.
  4. Summary
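
The cross-validation step mentioned above could be organised along the following lines. This is a minimal sketch, not the project's actual training code: the function name `stratified_kfold`, the fold count, and the toy labels are assumptions made for illustration. It deals the samples of each class round-robin into folds so that every validation fold keeps roughly the same class ratio as the full dataset.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class ratios roughly match the full set.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Toy labels standing in for real recordings (assumed data, not the dataset).
labels = ["healthy"] * 10 + ["Laryngitis"] * 5
splits = list(stratified_kfold(labels, k=5))
```

The sketch splits by recording. If one speaker contributes several recordings, the folds would need to be built per speaker instead, so that the same voice never appears in both the training and the validation set.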

3. Saarbrücken Dataset Description

The Saarbrücken dataset is a curated collection designed for the analysis and classification of various vocal pathologies [Barry, W.J., Pützer, M.: Saarbrücken Voice Database, Institute of Phonetics, Univ. of Saarland, http://www.stimmdatenbank.coli.uni-saarland.de/].

Pathological Categories and Distribution:

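  • Dysphonie (dysphonia): 101 samples
  • Funktionelle Dysphonie (functional dysphonia): 112 samples
  • Hyperfunktionelle Dysphonie (hyperfunctional dysphonia): 212 samples
  • Laryngitis: 140 samples
  • Rekurrensparese (recurrent laryngeal nerve paralysis): 213 samples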
For this study, we selected the five most common pathologies.

Healthy Samples:

In addition to the pathological recordings, the dataset includes 657 samples from healthy individuals.
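
The sample counts above are unbalanced: healthy recordings outnumber the smallest pathological class by more than six to one. One common way to compensate during training is to weight each class by its inverse frequency. The sketch below works through that arithmetic with the counts listed in this section. Treating the healthy recordings as a sixth class, and the helper name `inverse_frequency_weights`, are assumptions for illustration and not necessarily what the project does.

```python
# Sample counts as listed in this README.
counts = {
    "Dysphonie": 101,
    "Funktionelle Dysphonie": 112,
    "Hyperfunktionelle Dysphonie": 212,
    "Laryngitis": 140,
    "Rekurrensparese": 213,
    "healthy": 657,  # assumption: healthy speakers form their own class
}


def inverse_frequency_weights(class_counts):
    """Return weight = total / (n_classes * count) for every class.

    Hypothetical helper: rare classes get weights above 1, frequent ones below 1.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * n) for name, n in class_counts.items()}


weights = inverse_frequency_weights(counts)

# Print classes from the most to the least frequent.
for name, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{name:<30} {counts[name]:>4} samples  weight {weight:.2f}")
```

With this scheme every class contributes the same total weight, so a classifier cannot reach a low loss simply by predicting the majority class.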

Subdivision by Speech Elements:

  • Vowels: The dataset includes recordings of the sustained vowels /a/, /i/, and /e/, enabling a detailed examination of vowel-specific characteristics.
  • Utterance: A set of recordings captures the phrase "Guten Morgen, wie geht es Ihnen?" (Good morning, how are you?), offering insights into how different pathologies affect the pronunciation of common phrases.

4. Visual Representation

Visualizing the intricate patterns and relationships within vocal data is crucial for understanding the effectiveness of our neural network-based pathology classification system. The following image provides a snapshot of the spectrogram analysis.
[Figure: NSpec]
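
To make the idea of a spectrogram concrete, the following minimal sketch computes one with a short-time Fourier transform: the signal is cut into overlapping frames, each frame is tapered with a Hann window, and the magnitude of its discrete Fourier transform is stored in decibels. It uses only the Python standard library so that it runs without extra dependencies. The function names, the frame length, the hop size, and the synthetic test tone are all assumptions for illustration; this is not the preprocessing code used in the project, which would normally rely on an optimized audio library.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of dB magnitudes per frame (bins 0 .. frame_len // 2)."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1

    # Precompute the DFT basis once; the same basis is reused for every frame.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]

    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = [abs(sum(x * b for x, b in zip(frame, row))) for row in basis]
        # A small offset avoids log(0) for silent bins.
        frames.append([20 * math.log10(m + 1e-10) for m in magnitudes])
    return frames


# Synthetic stand-in for a sustained vowel: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index times frequency resolution
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

For the test tone, the strongest bin in each frame sits at 1000 Hz. A real voice recording would instead show a fundamental frequency with harmonics, and the resulting two-dimensional array is what a CNN would take as its input image.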
