The aim of this project was to create an offline speech assistant solely based on FOSS software. It uses vosk-api for speech recognition and a rules in order to map the recognized speech to actions.
While working on this project I discovered Rhasspy and moved on to working on Rhasspy Bridge insted of following the approach of this project. Therefore it possibly won't be continued.
- install
vosk-api
according to their vosk installation steps - download and install speech models for
vosk
according to their docs on models - install
Pico TTS
usingsudo apt-get install libttspico-utils
- clone this repository
cd src
python3 main.py
Features can be seen in the file config.json. All is in German and could also be translated to English or another language and combined with another speech model from vosk
.
Basically the current features are:
- getting the current date/time
- telling the weather in Vienna
- telling a random joke
- control some smart home items using openHAB
- do calculations:
- possible calculations are: add, substract, divide, multiply, log, square, squareroot
- it's possible to use the result of the last calculation by the word
Ergebnis
- examples:
rechne 10 mal 10
, andrechne logarithmus ergebnis
afterwards
All of the features are also possible using Rhasspy
and Rhasspy Bridge
except doing calculations.
- Rhasspy is very flexible and can be freely configured using different systems for instance for TTS and STT
- Using regular expressions and real STT this project is more flexible in terms of being able to say things in different ways. E.g. it's possible to say
Schalte das Licht in der Küche ein
andLicht in der Küche ein
andBitte mach das Licht in der Küche aus
and all should work (if the transcription of the speech is correctly recognized). Trying to do the same thing in Rhasspy drastically reduced recognition performance, see Rhasspy: avoid too complex config. - If reduced to a simple and limited command set I think recognition of Rhasspy is better since it tries to map everything to the known commands. On the other hand the attempt of this project completely fails if the STT engine understands e.g.
Kirche
instead ofKüche
.