A full-stack ASR (Automatic Speech Recognition) system using Facebook’s Wav2Vec2, Elasticsearch, and React. Transcribe, index, and search audio data with demographic filtering and faceted search.

millieseow123/asr-htx

A searchable audio transcript interface using Wav2Vec2, Elasticsearch, Flask, and React.

URL: http://3.90.182.103/

Project Structure

  • asr/: ASR microservice with wav2vec2 model
  • deployment-design/: Architecture design (PDF)
  • elastic-backend/: Elasticsearch indexing setup
  • search-ui/: Frontend search interface

Run ASR API with Docker

  1. Navigate to the directory:
cd asr
  2. Build the image (from inside asr/, the build context is the current directory):
docker build -t asr-api .
  3. Run the container:
docker run -p 8001:8001 asr-api
  4. Test the API:
curl -F "file=@/path/to/sample.mp3" http://localhost:8001/asr
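
The same request can be made from Python. The following is a minimal sketch using only the standard library, not code from this repository: `build_multipart` and `transcribe` are hypothetical helper names, and the sketch assumes the endpoint replies with JSON (the response schema is not documented here).

```python
import io
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body, like `curl -F` does."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode()
    )
    body.write(f"Content-Type: {content_type}\r\n\r\n".encode())
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://localhost:8001/asr"):
    """POST an audio file to the ASR endpoint and return the parsed reply.

    Assumption: the service answers with a JSON document.
    """
    with open(path, "rb") as f:
        payload, header = build_multipart("file", os.path.basename(path), f.read())
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": header})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the container from step 3 running, `transcribe("/path/to/sample.mp3")` sends the file under the form field `file`, matching the curl command above.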

Run Elasticsearch Backend

  1. Navigate to the directory:
cd elastic-backend
  2. (Optional but recommended) Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
  3. Install Python dependencies:
pip install -r requirements.txt
  4. Start the Elasticsearch cluster (this runs in the foreground; use a second terminal for the next steps, or add -d to detach):
docker compose up

Open http://localhost:9200/_cat/nodes?v in your browser; you should see both the es01 and es02 nodes listed.

  5. Index the data:
python cv-index.py
  6. Start the backend API server:
python search_api.py
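
For orientation, the indexing step can be pictured as turning CSV rows into an Elasticsearch Bulk API request. The sketch below is an illustration under stated assumptions, not the contents of cv-index.py: the index name `cv-transcriptions`, the column names, and the helper `bulk_body` are all hypothetical.

```python
import csv
import io
import json


def bulk_body(rows, index):
    """Build an Elasticsearch Bulk API body (NDJSON) from dict rows."""
    lines = []
    for row in rows:
        # Drop empty values so missing metadata is simply absent from the document.
        doc = {k: v for k, v in row.items() if v not in ("", None)}
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical CSV contents; the real column names come from the source CSV.
sample_csv = (
    "filename,generated_text,age,gender,accent\n"
    "sample-000001.mp3,hello world,twenties,female,\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
body = bulk_body(rows, index="cv-transcriptions")
print(body)
```

A body like this would be sent as a POST to http://localhost:9200/_bulk with the header `Content-Type: application/x-ndjson`.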

Run Frontend (search-ui)

  1. Navigate to the frontend directory:
cd search-ui
  2. Install dependencies:
npm install
  3. Start the development server:
npm start

Open http://localhost:3000 in your browser to see the result.

Limitations

  • The ASR model used (wav2vec2-large-960h) may produce inaccurate transcriptions, especially for non-US accents or noisy audio. For this reason, search covers both the generated transcription and the original text of each clip.
  • Some metadata fields (e.g., age, gender, accent) may be missing or inconsistent in the source CSV file.
  • The search UI currently does not support fuzzy matching or partial phrase queries.
  • Facets are limited to a fixed number of values (e.g., only top 10 accent types are shown).
  • The backend and search functionality assume that the local Elasticsearch and ASR services are running on ports 9200 and 8001 respectively.
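
To make the search and facet limitations concrete, here is a rough sketch of the kind of Elasticsearch request body such a search could send. It is not taken from search_api.py: the helper `build_search_body` and the field names `generated_text` and `original_text` are hypothetical, and the facet fields are assumed to be mapped as `keyword`.

```python
import json


def build_search_body(query, filters=None,
                      facet_fields=("age", "gender", "accent"), facet_size=10):
    """Build a search body with exact-term filters and terms facets."""
    filters = filters or {}
    return {
        "query": {
            "bool": {
                "must": [{
                    "multi_match": {
                        "query": query,
                        # Hypothetical names for the two searched text fields.
                        "fields": ["generated_text", "original_text"],
                    }
                }],
                # One exact-match filter per selected facet value.
                "filter": [{"term": {f: v}} for f, v in filters.items()],
            }
        },
        # A terms aggregation returns only the top `size` buckets,
        # which is why each facet shows a fixed number of values.
        "aggs": {
            f: {"terms": {"field": f, "size": facet_size}} for f in facet_fields
        },
    }


body = build_search_body("hello", {"gender": "female"})
print(json.dumps(body, indent=2))
```

Raising `facet_size` would return more buckets per facet, at the cost of a larger response.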
