Speech Recognition Project

Overview

This project implements a speech recognition system using Python and the Wav2Vec2 model from Hugging Face. The system is fine-tuned on the PolyAI MINDS-14 dataset and deployed with a real-time transcription interface using Gradio.

Features

Dataset: Uses the PolyAI MINDS-14 dataset, resampled to 16 kHz.
Model & Processor: Employs the Wav2Vec2ForCTC model and processor for audio-to-text transcription.
Training: Fine-tunes the model with custom training arguments and a data collator for handling varying input lengths.
Evaluation: Assesses model performance using the Word Error Rate (WER) metric.
Deployment: Provides a Gradio interface for real-time transcription.

Skills Used

Python
Machine Learning
Deep Learning
Natural Language Processing (NLP)
Audio Processing
PyTorch
Transformers (Hugging Face)
Gradio Interface Development
Model Evaluation (WER)
Dataset Management
GPU Acceleration (CUDA)

Installation

Clone the repository:
```
git clone https://github.com/yourusername/speech-recognition-project.git
```
Navigate to the project directory:
```
cd speech-recognition-project
```
Install the required packages:
```
pip install -r requirements.txt
```

Usage

Training the Model

Run the training script to fine-tune the model:
```
python train.py
```
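Training relies on a data collator because audio clips and their transcripts differ in length within a batch. The sketch below illustrates the padding idea in plain Python; it is not the project's collator. The function name `pad_batch` and the feature keys `input_values` and `labels` are assumptions chosen to mirror common Hugging Face conventions. Labels are padded with -100, the value the Transformers CTC loss treats as ignored.

```python
from typing import Dict, List

# Label positions set to -100 are masked out of the CTC loss in Transformers.
LABEL_PAD_ID = -100


def pad_batch(features: List[Dict[str, list]], input_pad_value: float = 0.0) -> Dict[str, list]:
    """Pad each example to the longest audio and longest label sequence in the batch."""
    max_input_len = max(len(f["input_values"]) for f in features)
    max_label_len = max(len(f["labels"]) for f in features)

    batch: Dict[str, list] = {"input_values": [], "attention_mask": [], "labels": []}
    for f in features:
        n_audio = len(f["input_values"])
        n_label = len(f["labels"])
        # Right-pad the raw audio samples with silence (0.0 by default).
        batch["input_values"].append(
            list(f["input_values"]) + [input_pad_value] * (max_input_len - n_audio)
        )
        # 1 marks real samples, 0 marks padding.
        batch["attention_mask"].append([1] * n_audio + [0] * (max_input_len - n_audio))
        # Right-pad label ids so padded positions do not contribute to the loss.
        batch["labels"].append(list(f["labels"]) + [LABEL_PAD_ID] * (max_label_len - n_label))
    return batch


if __name__ == "__main__":
    example = [
        {"input_values": [0.1, 0.2, 0.3], "labels": [5, 6]},
        {"input_values": [0.4], "labels": [7]},
    ]
    print(pad_batch(example))
```

A real collator would usually return PyTorch tensors rather than lists; the padding logic stays the same.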

Real-Time Transcription

Launch the Gradio interface:
```
python app.py
```
Use the microphone input to test real-time transcription.
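Behind the interface, a CTC model such as Wav2Vec2ForCTC emits one token id per audio frame, and those ids have to be collapsed into text. The sketch below shows greedy CTC decoding in plain Python to illustrate that step. The function name `ctc_greedy_decode`, the toy vocabulary, the blank id of 0, and the `|` word delimiter are assumptions for illustration; in practice the processor's own decoding normally handles this.

```python
from typing import Dict, List


def ctc_greedy_decode(
    frame_ids: List[int],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Collapse repeated frame predictions, drop blanks, and join tokens into text."""
    collapsed: List[int] = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame's token.
        if token_id != previous:
            collapsed.append(token_id)
        previous = token_id

    # Blanks separate genuine repeats (e.g. the two l's in "hello") and are then removed.
    chars = [id_to_token[i] for i in collapsed if i != blank_id]
    text = "".join(chars).replace(word_delimiter, " ")
    # Normalise whitespace left by leading, trailing, or repeated delimiters.
    return " ".join(text.split())


if __name__ == "__main__":
    # Hypothetical toy vocabulary, not the model's real one.
    vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o"}
    frames = [2, 2, 3, 4, 4, 0, 4, 5, 5, 1, 2, 3, 4, 0, 4, 5]
    print(ctc_greedy_decode(frames, vocab))  # hello hello
```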

Example

Transcription for a sample audio:

The input test audio is: "Sample transcription"
The output prediction is: "Sample prediction"
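A reference and prediction pair like the one above is what the Word Error Rate compares: the number of word substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the number of reference words. The following is a minimal, dependency-free sketch of that definition, not the evaluation code used in this project; the function name `word_error_rate` is an assumption.

```python
from typing import List


def word_error_rate(reference: str, prediction: str) -> float:
    """Word-level edit distance between the two strings, divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j predicted words.
    previous_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row

    return previous_row[-1] / len(ref)


if __name__ == "__main__":
    # The placeholder strings from the example above differ in one of two words.
    print(word_error_rate("Sample transcription", "Sample prediction"))  # 0.5
```

This sketch compares words exactly as written, so any lowercasing or punctuation stripping would need to happen before the call.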

Acknowledgements

Hugging Face for providing the Wav2Vec2 model and transformers library.
PolyAI for the MINDS-14 dataset.
Gradio for the interactive interface.
