AudioVision is a web app designed to support individuals with hearing impairments by converting audio files into text and visual representations. Using audio processing and AI, the application not only transcribes audio but also generates graphical sound wave visualizations and provides insightful analyses of the transcribed content.
You can access and test my app by clicking here.
- Audio Transcription: Upload audio files in various formats, such as .ogg and .wav, to receive an accurate text transcription of the spoken content.
- Waveform Visualization: View a graphical representation of sound waves, offering a visual way to understand the characteristics of the audio.
- Content Analysis: Leverage AI to interpret the transcription, generating summaries and contextual insights tailored for enhanced clarity and accessibility.
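
As a rough illustration of the upload step, the sketch below checks whether a file name has a supported extension before any processing happens. The helper name `is_supported_audio` and the allow-list are assumptions made for this example: the list above names .ogg and .wav only as examples, so the real app may accept more formats.

```python
from pathlib import Path

# Assumed allow-list: the README mentions .ogg and .wav only as examples.
SUPPORTED_EXTENSIONS = {".ogg", ".wav"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file name ends in a supported audio extension."""
    # Lower-case the suffix so "CLIP.WAV" and "clip.wav" are treated alike.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only inspects the name, not the file contents, so it is a first filter rather than real validation.
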
- Flask for the application backend
- Librosa and Matplotlib for waveform visualizations
- SpeechRecognition for audio transcription
- Google Generative AI for intelligent summaries and contextual analysis of audio content
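
To give an idea of how Librosa and Matplotlib can work together for the waveform view, here is a minimal sketch, not the app's actual code. It assumes Librosa is used to decode the file and Matplotlib to draw amplitude over time; the function names `load_audio` and `render_waveform`, the figure size, and the PNG output are all choices made for this example.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for a web server
import matplotlib.pyplot as plt
import numpy as np


def load_audio(path):
    """Load an audio file as mono float samples plus its sample rate."""
    import librosa  # imported here so render_waveform works without librosa

    # sr=None keeps the file's native sample rate instead of resampling.
    samples, sample_rate = librosa.load(path, sr=None, mono=True)
    return samples, sample_rate


def render_waveform(samples, sample_rate):
    """Draw amplitude over time and return the image as PNG bytes."""
    # Convert sample indices to seconds for the x-axis.
    times = np.arange(len(samples)) / sample_rate
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(times, samples, linewidth=0.5)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    ax.set_title("Waveform")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated requests do not leak memory
    return buffer.getvalue()
```

A caller would typically chain the two, for example `render_waveform(*load_audio("clip.wav"))`, and send the resulting bytes to the browser as an image.
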
- Upload an audio file on the main page.
- The app converts the audio to text, displays the transcription, and shows a representative waveform.
- Optionally, view an AI-driven analysis of the content for a broader understanding of the transcribed audio.
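
The upload-and-transcribe flow described in these steps could be wired up in Flask roughly as follows. This is a sketch under stated assumptions: the `/upload` route, the `audio` form field, and the `create_app` factory are hypothetical names, and the transcription step is passed in as a plain function so the example does not depend on any particular speech service.

```python
import os
import tempfile

from flask import Flask, jsonify, request


def create_app(transcribe):
    """Build the app; `transcribe` is a callable mapping a file path to text."""
    app = Flask(__name__)

    @app.route("/upload", methods=["POST"])  # hypothetical route name
    def upload():
        uploaded = request.files.get("audio")  # hypothetical form field name
        if uploaded is None or uploaded.filename == "":
            return jsonify(error="no audio file provided"), 400

        # Save to a temporary file so path-based audio libraries can read it.
        suffix = os.path.splitext(uploaded.filename)[1]
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            uploaded.save(tmp)
            path = tmp.name
        try:
            text = transcribe(path)
        finally:
            os.remove(path)  # always clean up the temporary file
        return jsonify(transcription=text)

    return app
```

In a real deployment, `transcribe` might wrap SpeechRecognition by reading the file with `AudioFile` and calling one of the recognizer's `recognize_*` methods. Note that `AudioFile` reads WAV, AIFF, and FLAC, so an .ogg upload would likely need converting first; how the app handles that is not stated here.
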
AudioVision was created to make auditory information more accessible through visual and textual formats, fostering inclusion and accessibility for individuals with hearing loss.