wav2letter Inference API:
wav2letter.ipynb is a Google Colab notebook.
Run the notebook in Google Colab; it fetches all dependencies, compiles wav2letter, and runs the inference. The inference output is served over a Python web API.
While ArrayFire is installing, accept the license by entering Y in the output cell.
For inference, upload a WAV file to the wav2letterInference folder and rename it to numbersAudioMale.wav (or change the filename on line 16 of convertAudio.py).
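Before uploading, it can help to check the header of the WAV file. The sketch below uses only the Python standard library. The function names `describe_wav` and `looks_compatible` are hypothetical, and the default target format (16 kHz, mono, 16-bit PCM) is an assumption that is not stated in this README; adjust it to whatever your model expects.

```python
import wave


def describe_wav(path):
    """Return basic header properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": frame_rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / frame_rate,
        }


def looks_compatible(info, sample_rate_hz=16000, channels=1, sample_width_bytes=2):
    """Compare the header against an ASSUMED target format (not from the README)."""
    return (
        info["sample_rate_hz"] == sample_rate_hz
        and info["channels"] == channels
        and info["sample_width_bytes"] == sample_width_bytes
    )
```

For example, `looks_compatible(describe_wav("numbersAudioMale.wav"))` returns False if the file would need resampling or downmixing to match the assumed format.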
If Colab does not work, run the container on an Ubuntu 18 machine as described below.
Run the container
1) Start the container with port 8888 forwarded:
sudo docker run -p 8888:8888 --rm -itd --ipc=host --name w2l wav2letter/wav2letter:inference-latest
2) Open a shell inside the container:
sudo docker exec -it w2l bash
The wav2letter library/project is located at /root/wav2letter/ inside the container.
Running inference inside the container using Python code:
1) Download the model files from AWS into a folder named model (run the command below from inside that folder):
for f in acoustic_model.bin tds_streaming.arch decoder_options.json feature_extractor.bin language_model.bin lexicon.txt tokens.txt ; do wget http://dl.fbaipublicfiles.com/wav2letter/inference/examples/model/${f} ; done
2) Download this repository for the Python code:
git clone https://github.com/jkreddy123/wav2letterInference.git
3) Check inference from the shell by running:
/root/wav2letter/build> python ~/wav2letterInference/runmodel.py
NOTE: Change the paths to the binary, the model folder, and the WAV file in runmodel.py accordingly.
4) Check the inference results in a browser at http://0.0.0.0:8888/index after running:
python ~/wav2letterInference/convertAudio.py 8888
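The model download in step 1 can also be scripted in Python when wget is not available. This is a minimal sketch using only the standard library. The base URL and file names are copied from the wget command above; the helper names `model_urls` and `download_model` and the default `dest_dir="model"` are illustrative assumptions, not part of this repository.

```python
import os
import urllib.request

# Base URL and file list taken from the wget loop in step 1.
BASE_URL = "http://dl.fbaipublicfiles.com/wav2letter/inference/examples/model"
MODEL_FILES = [
    "acoustic_model.bin",
    "tds_streaming.arch",
    "decoder_options.json",
    "feature_extractor.bin",
    "language_model.bin",
    "lexicon.txt",
    "tokens.txt",
]


def model_urls(base_url=BASE_URL, files=MODEL_FILES):
    """Build the download URL for each model file."""
    return [f"{base_url}/{name}" for name in files]


def download_model(dest_dir="model", base_url=BASE_URL, files=MODEL_FILES):
    """Download each model file into dest_dir and return the local paths.

    Files that already exist are skipped, so delete a partially
    downloaded file before re-running.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in files:
        target = os.path.join(dest_dir, name)
        if not os.path.exists(target):
            urllib.request.urlretrieve(f"{base_url}/{name}", target)
        saved.append(target)
    return saved
```

Calling `download_model("model")` creates the folder if needed and fetches the seven files into it.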
Contact: jkreddy@colorssoftware.com
The output looks like this:
b'Completed features model file loading elapsed time=2407 microseconds\n'
b'\n'
b'Started acoustic model file loading ... \n'
b'Completed acoustic model file loading elapsed time=4732 milliseconds\n'
b'\n'
b'Started tokens file loading ... \n'
b'Completed tokens file loading elapsed time=1341 microseconds\n'
b'\n'
b'Tokens loaded - 9998 tokens\n'
b'Started decoder options file loading ... \n'
b'Completed decoder options file loading elapsed time=91 microseconds\n'
b'\n'
b'Started create decoder ... \n'
b'Completed create decoder elapsed time=1653 milliseconds\n'
b'\n'
b'Started converting audio input from stdin to text... ... \n'
b'#start (msec), end(msec), transcription\n'
b'0,1000,\n'
b'1000,2000,\n'
b'2000,3000,one \n'
b'3000,4000,two three \n'
b'4000,5000,four \n'
b'5000,6000,five six \n'
b'6000,7000,seven eight \n'
b'7000,8000,nine \n'
b'8000,9000,ten eleven \n'
b'9000,10000,twelve thirty \n'
b'10000,10334,forty fifty \n'
b'Completed converting audio input from stdin to text... elapsed time=2626 milliseconds\n'
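The transcription rows in the output above follow the `#start (msec), end(msec), transcription` header, so they can be turned into structured segments. The sketch below assumes the lines are available as Python bytes or str objects, as the b'...' formatting suggests; `parse_transcript` and `full_text` are hypothetical helper names, not functions from runmodel.py.

```python
def parse_transcript(lines):
    """Extract (start_ms, end_ms, text) tuples from the inference output lines."""
    segments = []
    in_table = False
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        text = text.rstrip("\n")
        if text.startswith("#start"):
            # Header row: the transcription rows follow it.
            in_table = True
            continue
        if not in_table:
            continue
        parts = text.split(",", 2)
        if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            # Skip status lines such as the final "Completed ..." message.
            continue
        segments.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return segments


def full_text(segments):
    """Join the non-empty segment texts into a single transcript string."""
    return " ".join(words for _, _, words in segments if words)
```

Segments with no recognized words, such as the first two one-second windows above, are kept with an empty text so the timing stays intact; `full_text` simply leaves them out.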