-
Hi, technically I think what you have to do is implement the WebSocket interface of the SEPIA STT server. It accepts audio buffer chunks and returns the transcribed text. This text can then be sent to the SEPIA Assist-Server endpoint for further actions. Basically the ESP32 would be a micro-client for SEPIA. I've been thinking about something similar, to build a kind of Fire-TV remote for SEPIA. Here is a bit of documentation about the STT Server API.
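To make the "audio buffer chunks" idea a bit more concrete, here is a minimal sketch of how a client could slice captured PCM audio into fixed-size chunks before handing each one to a WebSocket send function. Everything specific in it is an assumption for illustration: the 16 kHz / 16-bit mono format, the 100 ms chunk length, and the `send_binary` callback are hypothetical, and the actual message format expected by the SEPIA STT server has to be taken from its API documentation.

```python
from typing import Callable, Iterator

# Assumed audio format (not taken from the SEPIA docs): 16 kHz, 16-bit, mono.
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100  # hypothetical chunk length


def pcm_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each covering about chunk_ms of audio."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        # The last slice may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


def stream_audio(pcm: bytes, send_binary: Callable[[bytes], None]) -> int:
    """Pass every chunk to send_binary (a stand-in for a WebSocket
    binary send) and return the number of chunks sent."""
    sent = 0
    for chunk in pcm_chunks(pcm):
        send_binary(chunk)
        sent += 1
    return sent
```

On an ESP32 the same loop would run over the microphone buffer in C or MicroPython; the point is only that the client sends small binary chunks continuously instead of one large upload.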
-
The esp32-s3-box was a limited-run demonstrator, and I'm not sure it will be on offer again. WebSockets are great because they are bidirectional and it is extremely easy to tell the two packet types, binary and text payload, apart. I am not a fan of any KWS that doesn't let you use the softmax probability score of the last KW detection, as that is a hugely useful piece of metadata: in an array of KWS devices, the highest score tells you the best channel to use. It's very likely the esp32-s3 will follow a similar cost curve to the esp32 and previous products. It is currently new, and even at its current premium price the chip itself is only approx. $2.50. If you look, though, there is a new product, https://github.com/espressif/esp-box/blob/master/docs/hardware_overview/esp32_s3_box_lite/hardware_overview_for_lite.md, which has a 2-channel ADC but is still a 2-mic version. Hopefully they have changed the design and added a ring buffer so that mic capture and output latency are matched, as it no longer has the loopback reference channel (there isn't a 3rd channel to loop back to). So yeah, interesting things are happening, and the new ADC might even be cheaper than the $1+ one they had before. It's likely we could get I2S mics and need no ADC at all, but I'm still hoping the cloners will eventually do a simple ESP32-S3-AUDIO, as with economies of scale a network KWS could be as cheap as $5-10.
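As a rough illustration of why the softmax score is useful in an array of KWS devices, the sketch below turns each channel's raw classifier outputs into a keyword probability and picks the channel with the highest one. The function names, the two-class logit layout (keyword first, background second) and the example numbers are made up for this sketch and do not come from any particular KWS engine.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one channel's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_channel(per_channel_logits: Sequence[Sequence[float]],
                 keyword_index: int = 0) -> Tuple[int, float]:
    """Return (channel index, keyword probability) for the channel whose
    last detection had the highest softmax score for the keyword class."""
    scores = [softmax(logits)[keyword_index] for logits in per_channel_logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical logits from three devices that all heard the wake word.
channels = [[2.0, 1.0], [4.0, 0.5], [0.1, 0.3]]
index, score = best_channel(channels)
print(f"use channel {index} (keyword probability {score:.2f})")
```

A KWS that only reports "detected / not detected" gives the server nothing to compare, which is the objection raised above.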
-
PS: I don't know where to get it yet, but they are also doing an esp32-s3-box-lite-board, which is one hell of a product name. It will be really interesting to see how much it costs.
-
Hello,
I'm running a private openHAB instance (on an x86_64 server), which works great when it comes to automating the relatively few objects I have here.
While I'd love to control it by voice, I definitely don't want anyone to listen in on my home, which is why I'm looking at offline options, and SEPIA came up in a few discussions on the openHAB forum.
As it happens, I have an ESP32-Box arriving in the mail. It is said to be capable of offline wake word processing but seems to be limited to English when it comes to commands. While end users could get used to an English wake word, they need to be able to speak French for the command sentences.
I was thus wondering if there is a way to use SEPIA to do the heavy lifting based on audio samples sent by the ESP32-box once it has recognized its wake word.
I understand that there is a (fair) bit of programming to be done on the ESP32-Box to talk to SEPIA, but I'm not sure what I should be targeting in SEPIA.
I mean, I looked at the documentation but apart from general principles and very nice diagrams, I could not find anything along the lines of "audio samples are to be POST or PUT to this URL". Could you tell which page I have missed?
Thanks for your help.