This repository is under active development
Yet Another PyKaldi
This is a simple Python wrapper around parts of kaldi, intended to be easy to install and set up with or without a ROS environment.
The wrappers are generated with pybind11.
The target audience is developers who work with Docker and/or ROS and would like to use kaldi-asr as the speech recognition system in their applications on GNU/Linux operating systems (preferably Ubuntu >= 18.04).
- Python 2.7 or 3.6
- numpy
- pybind11
- setuptools
- pkgconfig
- pyaudio
- future
- kaldi-asr or tue-robotics fork of kaldi-asr
tue-get install python-yapykaldi
This does not install the test scripts and data directory at the moment.
- Install dependencies

      sudo apt-get install build-essential portaudio19-dev
      pip install setuptools numpy pybind11 pkgconfig pyaudio
- [Recommended] Install kaldi-asr from the tue-robotics fork. This fork has some modifications to the cmake generate script and comes with installation scripts that ensure the pkgconfig file is generated correctly and is available to the bash environment.

      git clone https://github.com/tue-robotics/kaldi.git
      cd kaldi
      ./install.bash --tue
      echo "source ~/kaldi/setup.bash" >> ~/.bashrc
- [Alternative] Install kaldi-asr from the upstream kaldi repository using CMake with -DBUILD_SHARED_LIBS=ON. Create a pkgconfig file in dist/lib/pkgconfig/ (relative to repository root) and add the following to ~/.bashrc

      if [[ :$PKG_CONFIG_PATH: != *:$KALDI_ROOT/dist/lib/pkgconfig:* ]]
      then
          export PKG_CONFIG_PATH=$KALDI_ROOT/dist/lib/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}
      fi
- Install yapykaldi

      git clone https://github.com/ar13pit/yapykaldi
      cd yapykaldi
      pip install .
- [Optional] Download nnet3 models to run examples

      cd yapykaldi/data
      wget https://github.com/tue-robotics/yapykaldi/releases/download/v0.1.0/kaldi-generic-en-tdnn_fl-r20190609.tar.xz
      tar xf kaldi-generic-en-tdnn_fl-r20190609.tar.xz
      mv kaldi-generic-en-tdnn_fl-r20190609 kaldi-generic-en-tdnn_fl-latest
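The PKG_CONFIG_PATH guard from the [Alternative] step can be hard to read in bash. The following is a minimal Python sketch of the same logic: prepend the Kaldi pkgconfig directory only if it is not already listed. The function name `prepend_pkg_config_dir` and the `/opt/kaldi` fallback are illustrative assumptions and are not part of yapykaldi or kaldi.

```python
import os


def prepend_pkg_config_dir(current, kaldi_root):
    """Return `current` with <kaldi_root>/dist/lib/pkgconfig prepended once.

    Hypothetical helper that mirrors the bash guard: if the directory is
    already listed, the value is returned unchanged.
    """
    pc_dir = os.path.join(kaldi_root, "dist", "lib", "pkgconfig")
    entries = [p for p in current.split(os.pathsep) if p]
    if pc_dir in entries:
        return current
    # Join only non-empty entries so an empty variable yields no stray separator
    return os.pathsep.join([pc_dir] + entries)


if __name__ == "__main__":
    # "/opt/kaldi" is an assumed fallback used only for this illustration
    kaldi_root = os.environ.get("KALDI_ROOT", "/opt/kaldi")
    print(prepend_pkg_config_dir(os.environ.get("PKG_CONFIG_PATH", ""), kaldi_root))
```

Running the script prints the value that PKG_CONFIG_PATH would have after the guard; it does not modify the environment of the calling shell.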
To run the examples, first complete the optional model download step of the installation from source.
- Test kaldi nnet3 model using test_nnet3.py
- Test simple audio recording using test_audio.py
- Test continuous live speech recognition using test_live.py
yapykaldi can be used for both online and offline speech recognition with open grammar.
The idea behind the online speech recognition workflow is that a microphone stream (created using pyaudio) is connected to a yapykaldi OnlineDecoder object using IPC. The microphone stream writes each stream chunk to a shared queue, which the OnlineDecoder object reads sequentially to generate a stream of recognized words. A signal handler listens for an interrupt signal, which tells the OnlineDecoder to stop and finalize the recognition and signals the microphone process to cleanly close the stream and write the heard data to a wav file. Refer to test_live.py for this workflow.
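The producer/consumer pattern behind the online workflow can be sketched with the standard library alone. This is not the yapykaldi API: `ByteCountingDecoder`, `fake_microphone`, `run_decoder` and `run_pipeline` are hypothetical stand-ins, threads replace the separate processes for brevity, silent byte chunks replace real pyaudio input, and Python 3 is assumed.

```python
import queue
import threading

CHUNK_BYTES = 2048  # illustrative chunk size, not a yapykaldi default
STOP = None         # sentinel telling the consumer that the stream has ended


class ByteCountingDecoder(object):
    """Hypothetical stand-in for a decoder; it only counts what it receives."""

    def __init__(self):
        self.chunks = 0
        self.total_bytes = 0
        self.finalized = False

    def decode_chunk(self, chunk):
        self.chunks += 1
        self.total_bytes += len(chunk)

    def finalize(self):
        self.finalized = True


def fake_microphone(out_queue, n_chunks, stop_event):
    """Producer: push silent chunks until done or asked to stop."""
    for _ in range(n_chunks):
        if stop_event.is_set():
            break
        out_queue.put(b"\x00" * CHUNK_BYTES)
    out_queue.put(STOP)  # always signal end of stream


def run_decoder(in_queue, decoder):
    """Consumer: read chunks in order until the sentinel arrives."""
    while True:
        chunk = in_queue.get()
        if chunk is STOP:
            break
        decoder.decode_chunk(chunk)
    decoder.finalize()


def run_pipeline(n_chunks=5):
    """Wire producer and consumer through one shared queue."""
    shared = queue.Queue()
    stop_event = threading.Event()
    decoder = ByteCountingDecoder()
    producer = threading.Thread(
        target=fake_microphone, args=(shared, n_chunks, stop_event))
    consumer = threading.Thread(target=run_decoder, args=(shared, decoder))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return decoder
```

In a live setup, a handler registered with `signal.signal(signal.SIGINT, ...)` would set `stop_event`, so the producer stops early and the sentinel still lets the consumer finalize cleanly.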
The offline speech recognition workflow can follow two approaches. The first is to read the entire wav file and run the recognition over the whole file at once. The second is to create a data stream from the wav file to emulate a microphone and recognize the data in chunks. Refer to test_nnet3.py for this workflow.
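The difference between the two offline approaches can be illustrated with Python's standard `wave` module. This sketch only covers the reading side and does not call yapykaldi; the helper names (`make_silent_wav`, `read_whole`, `read_chunks`) and the chunk size are assumptions made for illustration, and the wav data is generated in memory so no data files are needed.

```python
import io
import wave


def make_silent_wav(n_frames=16000, rate=16000):
    """Build a mono 16-bit WAV in memory (hypothetical test input)."""
    buf = io.BytesIO()
    writer = wave.open(buf, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(rate)
    writer.writeframes(b"\x00\x00" * n_frames)
    writer.close()
    buf.seek(0)
    return buf


def read_whole(wav_file):
    """Approach 1: load every frame in a single call."""
    reader = wave.open(wav_file, "rb")
    try:
        return reader.readframes(reader.getnframes())
    finally:
        reader.close()


def read_chunks(wav_file, frames_per_chunk=1024):
    """Approach 2: yield fixed-size chunks, emulating a microphone stream."""
    reader = wave.open(wav_file, "rb")
    try:
        while True:
            chunk = reader.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
    finally:
        reader.close()
```

Both helpers return the same audio bytes overall; the chunked generator hands them over piece by piece, which is the shape of input a streaming decoder would consume.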