This project uses OpenAI's Whisper model to transcribe audio files from a directory and save the results as text files in another directory.
- Python 3.x
- `whisper` library (install via `pip install openai-whisper`)
- `ffmpeg` (required by Whisper; install via your package manager)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/speech-to-text.git
  cd speech-to-text
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- (Optional) Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On macOS/Linux
  venv\Scripts\activate     # On Windows
  ```
- Default directories and language:

  ```bash
  python3 src/transcribe_audio.py
  ```

  This will transcribe all `.ogg` and `.wav` files from the `voice_input` directory and save the results in the `text_output` directory. The default language is Russian (`ru`).
- Custom directories and language:

  ```bash
  python3 src/transcribe_audio.py --input_dir my_input_folder --output_dir my_output_folder --language en
  ```

  This will transcribe files from `my_input_folder` and save the results in `my_output_folder`. The language is set to English (`en`).
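The flags above suggest an `argparse` setup along these lines. This is a sketch, not the actual contents of `src/transcribe_audio.py`; the defaults are assumed from the documented behaviour (`voice_input`, `text_output`, Russian):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Defaults mirror the documented behaviour: voice_input -> text_output, language ru.
    parser = argparse.ArgumentParser(description="Transcribe audio files with Whisper.")
    parser.add_argument("--input_dir", default="voice_input",
                        help="Directory containing .ogg/.wav files")
    parser.add_argument("--output_dir", default="text_output",
                        help="Directory where .txt transcripts are written")
    parser.add_argument("--language", default="ru",
                        help="Language code passed to Whisper (e.g. ru, en)")
    return parser
```

Running the script with no flags then falls back to the defaults, matching the first example.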
- Default directories and language:

  ```bash
  make transcribe
  ```

  This will transcribe all `.ogg` and `.wav` files from the `voice_input` directory and save the results in the `text_output` directory. The default language is Russian (`ru`).
- Custom directories and language:

  ```bash
  make transcribe INPUT_DIR=my_input_folder OUTPUT_DIR=my_output_folder LANGUAGE=en
  ```

  This will transcribe files from `my_input_folder` and save the results in `my_output_folder`. The language is set to English (`en`).
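The `make transcribe` target presumably just forwards these variables to the Python script. A minimal sketch of such a target (variable names taken from the invocation above; recipe lines must be tab-indented):

```make
INPUT_DIR ?= voice_input
OUTPUT_DIR ?= text_output
LANGUAGE ?= ru

transcribe:
	python3 src/transcribe_audio.py --input_dir $(INPUT_DIR) --output_dir $(OUTPUT_DIR) --language $(LANGUAGE)
```

The `?=` assignments let command-line overrides like `LANGUAGE=en` take precedence over the defaults.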
- Ensure that the `voice_input` directory exists and contains valid audio files.
- The `text_output` directory will be created automatically if it doesn't exist.
- Supported languages include `ru` (Russian), `en` (English), and others. Refer to the Whisper documentation for a full list.