Audio Keyword Collector

This tool extracts spoken-word utterances from audio and preprocesses the clips for use with machine learning algorithms. It can collect utterances from YouTube videos or local WAV files.

It uses Mozilla DeepSpeech speech-to-text (STT) models to identify spoken words. DeepSpeech is used instead of other popular STT engines because it is open source, so any amount of audio can be analyzed at no cost.
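As a rough illustration of how keyword utterances can be located once the audio has been transcribed: DeepSpeech 0.9.x can return per-character tokens with start times (via `Model.sttWithMetadata`). The sketch below groups such tokens into words and returns padded time spans for a keyword. It is not taken from this repository; the `Token` class is a hypothetical stand-in for DeepSpeech's token metadata, and `words_with_times`, `find_keyword`, and the `pad` value are names and defaults chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a DeepSpeech token: one character and its start time."""
    text: str
    start_time: float  # seconds from the start of the audio


def words_with_times(tokens: Iterable[Token]) -> List[Tuple[str, float, float]]:
    """Group character tokens into (word, start, end) tuples, splitting on spaces."""
    words: List[Tuple[str, float, float]] = []
    chars: List[str] = []
    start = end = 0.0
    for tok in tokens:
        if tok.text == " ":
            # A space closes the current word, if there is one.
            if chars:
                words.append(("".join(chars), start, end))
                chars = []
            continue
        if not chars:
            start = tok.start_time
        chars.append(tok.text)
        # The end is approximated by the start time of the word's last character.
        end = tok.start_time
    if chars:
        words.append(("".join(chars), start, end))
    return words


def find_keyword(tokens: Iterable[Token], keyword: str, pad: float = 0.25) -> List[Tuple[float, float]]:
    """Return padded (start, end) spans in seconds for each occurrence of keyword."""
    target = keyword.lower()
    return [
        (max(0.0, start - pad), end + pad)
        for word, start, end in words_with_times(tokens)
        if word == target
    ]
```

Because the end of a word is only approximated from its last character's start time, the padding is what keeps the tail of the utterance inside the clip; a suitable value would need to be tuned by listening to exported clips.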

Features

  • Extracts and exports keyword utterances from long-form audio
  • Automatically resamples downloaded audio and converts it to mono (1 channel)
  • Allows for custom search queries to find relevant audio on YouTube
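To make the resampling and mono conversion concrete, here is a minimal standard-library sketch of both steps on integer PCM samples. It is only an illustration under stated assumptions: the function names are hypothetical, the resampler uses plain linear interpolation with no anti-aliasing filter, and the tool itself may well rely on ffmpeg or an audio library instead. The released DeepSpeech 0.9.3 English models expect 16 kHz mono audio, which is why these steps matter.

```python
from typing import List, Sequence


def to_mono(interleaved: Sequence[int], channels: int) -> List[int]:
    """Average interleaved PCM samples across channels to produce one channel."""
    frames = len(interleaved) // channels
    return [
        sum(interleaved[f * channels:(f + 1) * channels]) // channels
        for f in range(frames)
    ]


def resample_linear(samples: Sequence[int], src_rate: int, dst_rate: int) -> List[int]:
    """Resample by linear interpolation (illustrative only: no anti-aliasing filter)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out: List[int] = []
    for i in range(out_len):
        # Position of output sample i measured in input samples.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


# Example: stereo 44.1 kHz input reduced to mono 16 kHz.
stereo = [0, 0, 100, 200, 300, 400, 500, 600]
mono_16k = resample_linear(to_mono(stereo, channels=2), 44100, 16000)
```

For real recordings, a filtered resampler is preferable, since downsampling without a low-pass filter can introduce aliasing.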

Setup

Clone repository:

git clone https://github.com/c-jg/keyword-collector.git

Create virtual environment and install dependencies:

python -m venv venv3
source venv3/bin/activate
pip install -r requirements.txt

If you want to extract keywords from YouTube videos, create an environment variable named YT_API_KEY that contains your API key for the YouTube Data API.
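The sketch below shows one way a script could read YT_API_KEY and build a search request for the YouTube Data API v3 `search` endpoint. It is an assumption-laden illustration rather than this project's code: `get_api_key` and `build_search_url` are hypothetical helpers, and no network request is made.

```python
import os
from typing import Mapping
from urllib.parse import urlencode

# Search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the API key from YT_API_KEY and fail early with a clear message."""
    key = env.get("YT_API_KEY", "").strip()
    if not key:
        raise RuntimeError("YT_API_KEY is not set; export it before searching YouTube.")
    return key


def build_search_url(query: str, api_key: str, max_results: int = 10) -> str:
    """Build a video search URL for a custom query (hypothetical helper)."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

On Linux or macOS the variable can be set for the current shell session with `export YT_API_KEY="your-key"` before running the application.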

You may also need to install the following packages on Linux:

sudo apt install libsndfile1
sudo apt install ffmpeg

Download trained DeepSpeech models (~1.1 GB):

curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
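Since the downloads are large, it can help to confirm that both files are present before starting. The check below is a small sketch, assuming the files were saved under the names used in the curl commands above; `missing_model_files` is a hypothetical helper, and the directory the application actually reads the models from is an assumption here.

```python
from pathlib import Path
from typing import List

# File names as produced by the curl commands above.
MODEL_FILES = (
    "deepspeech-0.9.3-models.pbmm",
    "deepspeech-0.9.3-models.scorer",
)


def missing_model_files(directory: str = ".") -> List[str]:
    """Return the names of expected model files not found in directory."""
    base = Path(directory)
    return [name for name in MODEL_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```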

Run application:

python main.py