Code for Speech-To-Text Online APIs - Python Scripts

Bulk Audio Transcription using Speech Transcription Services

Initial Steps

Ensure you have Python 3
Clone this repo and cd into it.
Go into the folder for each service you want to use and follow its README file.

Supported Services

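The repo currently contains one folder per service: AWS-Transcribe, Azure-Cognitive-Service, Google-Speech2Text and RevAI-Temi-API.

Contributions for other online speech transcription services are welcome.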

Input format

Ensure you have a folder of audio files in WAV format (for example, wav_folder).

Example:

wav_folder
├───000561a49624c7c56625e6d8ccd230b15d3f129083b84c19846a9593.wav
├───0005680e7ac8826cff24f15022b67a2651acd691bf897bf3d3e44345.wav
├───000568340cb01e73daaa263d90765a5c213160a75201d642d899b4df.wav
├───00057062512f8dbc62d1691b97d0e6d997f350f41c908956fec02dbd.wav
├───00057091dd6ea751089e57358095034164067c180c4d1730254924ac.wav
├───000574e671847cbc40ef7fa325f39bfb6338a7f7781e09e773702b41.wav
...
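Before sending a large folder to a paid service, it can help to confirm that every file is a readable WAV. The following is a minimal sketch using only the Python standard library; the function names list_wav_files and describe_wav are illustrative and are not part of this repo.

```python
import wave
from pathlib import Path


def list_wav_files(wav_folder):
    """Return the .wav files directly inside wav_folder, sorted by name."""
    return sorted(
        p for p in Path(wav_folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def describe_wav(path):
    """Read basic header fields with the standard-library wave module.

    Raises wave.Error if the file is not a valid PCM WAV file.
    """
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "duration_sec": wav.getnframes() / wav.getframerate(),
        }


if __name__ == "__main__":
    # "wav_folder" is the example folder name used above.
    for wav_path in list_wav_files("wav_folder"):
        print(wav_path.name, describe_wav(wav_path))
```

Each service has its own requirements for sample rate and channel count, so check the README in the service folder for the values it expects.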

Output format

Each script writes one text file per audio file into the specified output folder, using the same base file name with a .txt extension:

Example:

output_txt_folder
├───000561a49624c7c56625e6d8ccd230b15d3f129083b84c19846a9593.txt
├───0005680e7ac8826cff24f15022b67a2651acd691bf897bf3d3e44345.txt
├───000568340cb01e73daaa263d90765a5c213160a75201d642d899b4df.txt
├───00057062512f8dbc62d1691b97d0e6d997f350f41c908956fec02dbd.txt
├───00057091dd6ea751089e57358095034164067c180c4d1730254924ac.txt
├───000574e671847cbc40ef7fa325f39bfb6338a7f7781e09e773702b41.txt
...
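The mapping from input to output files can be sketched as a small batch loop. This is an illustration of the folder convention only, not the code used by any script in this repo: transcribe_fn is a hypothetical callable that takes the path of a WAV file and returns its transcript as a string, and skipping files that already have a transcript is an assumed design choice for resuming interrupted runs.

```python
from pathlib import Path


def transcribe_folder(wav_folder, output_txt_folder, transcribe_fn):
    """Write one .txt file per .wav file, keeping the base file name.

    transcribe_fn is a hypothetical callable: Path -> transcript string.
    Returns the list of text files written during this call.
    """
    out_dir = Path(output_txt_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wav_path in sorted(Path(wav_folder).glob("*.wav")):
        txt_path = out_dir / (wav_path.stem + ".txt")
        if txt_path.exists():
            # Assumed behaviour: do not pay for the same file twice.
            continue
        txt_path.write_text(transcribe_fn(wav_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

A call such as transcribe_folder("wav_folder", "output_txt_folder", my_service_call) would then produce the layout shown above, where my_service_call stands for whatever function wraps the chosen service.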

Measuring quality

If you have ground truth in the same format as the output folder described above, you can calculate the Word Error Rate (WER) as follows:

Prerequisite: pip install jiwer==2.2.0
Set the ground truth and prediction folders in the last line of calc_wer.py
Run python calc_wer.py
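To make the metric concrete, here is a dependency-free sketch of a corpus-level WER over two folders in the format described above. It is not the repo's calc_wer.py, which relies on jiwer according to the prerequisite. Lowercasing the text and counting a missing prediction file as all words deleted are assumptions made for this illustration, and the function names are hypothetical.

```python
from pathlib import Path


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def folder_wer(ground_truth_folder, prediction_folder):
    """Total word errors divided by total reference words."""
    errors = 0
    total = 0
    for ref_path in sorted(Path(ground_truth_folder).glob("*.txt")):
        hyp_path = Path(prediction_folder) / ref_path.name
        ref = ref_path.read_text(encoding="utf-8").lower().split()
        # Assumption: a missing prediction counts as an empty transcript.
        hyp = []
        if hyp_path.exists():
            hyp = hyp_path.read_text(encoding="utf-8").lower().split()
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0
```

Summing errors over the whole corpus before dividing weights long utterances more than short ones; averaging per-file WER instead would give a different number.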

DL Models

If you want to compare the online transcription output with the output of your own deep learning models, it's easy!

This repo follows the format of the LibriSpeech dataset.
Dump your model's output in the same format as the output folder described above, then use the calc_wer.py script to compare the quality.

As an example, the following DL model is supported:

ESPnet

Contributions for other DL models are welcome.

Pull requests and issues for bugs, fixes or new features are warmly welcomed. :-)

Alternatives

You can also check the following Python library for more services:

SpeechRecognition


narVidhai/Speech-Transcription-Benchmarking
