Speech recognition is based on this architecture and examples from the same repository. The cell type in this model is FastGRNN. A more detailed view of the data flow through the network, with specific vector/matrix sizes:
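For reference, here is a minimal C sketch of a single FastGRNN step, following the update equations from the FastGRNN paper (the dimensions below are placeholders, not this model's actual sizes):

```c
#include <math.h>

/* Placeholder sizes -- the real model's dimensions come from training. */
#define IN_DIM  32
#define HID_DIM 64

/* One FastGRNN step. W and U are shared between the gate and the
 * candidate state; only the biases differ, which is what keeps the
 * cell so small. zeta and nu are trained scalars. */
static void fastgrnn_step(const float W[HID_DIM][IN_DIM],
                          const float U[HID_DIM][HID_DIM],
                          const float b_z[HID_DIM],
                          const float b_h[HID_DIM],
                          float zeta, float nu,
                          const float x[IN_DIM],
                          float h[HID_DIM] /* updated in place */)
{
    float pre[HID_DIM];

    /* Shared pre-activation: W x_t + U h_{t-1} */
    for (int i = 0; i < HID_DIM; i++) {
        float s = 0.0f;
        for (int j = 0; j < IN_DIM; j++)  s += W[i][j] * x[j];
        for (int j = 0; j < HID_DIM; j++) s += U[i][j] * h[j];
        pre[i] = s;
    }

    /* z_t  = sigmoid(pre + b_z)
     * h~_t = tanh(pre + b_h)
     * h_t  = (zeta*(1 - z_t) + nu) * h~_t + z_t * h_{t-1} */
    for (int i = 0; i < HID_DIM; i++) {
        float z  = 1.0f / (1.0f + expf(-(pre[i] + b_z[i])));
        float hc = tanhf(pre[i] + b_h[i]);
        h[i] = (zeta * (1.0f - z) + nu) * hc + z * h[i];
    }
}
```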
The inference is run nine times a second. The CPU utilization due to inference is only ~24%, i.e. roughly 27 ms of CPU time per run.
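As a rough illustration of how such a duty cycle can be scheduled, here is a sketch of a periodic task (assuming a FreeRTOS-based environment, which the menuconfig reference suggests; run_inference() and the task structure are illustrative, not the repository's actual code):

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

/* Stand-in for the actual model invocation. */
extern void run_inference(void);

static void inference_task(void *arg)
{
    TickType_t last_wake = xTaskGetTickCount();
    for (;;) {
        run_inference();
        /* Sleep until the next ~111 ms boundary (9 runs/s); the CPU
         * is free for other tasks in the meantime. */
        vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(1000 / 9));
    }
}
```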
The FastRNN cell is also supported (it can be changed via menuconfig).
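Conceptually, the menuconfig choice boils down to a compile-time switch along these lines (the CONFIG_* symbol names below are hypothetical, not the project's actual Kconfig options):

```c
/* Hypothetical Kconfig symbols -- the real menuconfig option names may differ. */
#ifdef CONFIG_KWS_CELL_FASTRNN
/* FastRNN: h_t = alpha * tanh(W x_t + U h_{t-1} + b) + beta * h_{t-1} */
#  define rnn_step fastrnn_step
#else
/* Default: the gated FastGRNN cell sketched above. */
#  define rnn_step fastgrnn_step
#endif
```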
A bigger, LSTM-based model with ~550ms inference time can be found here.
It is slightly more accurate, especially on the "up" label.
(Demo video: demo_3.mov)
A number of TinyML model-conversion frameworks were tested, but none gave satisfactory results. The main problem seems to be that graphs exported from PyTorch (or other training-oriented NN frameworks) carry a lot of additional information needed only for training, which obscures the essential structure needed for inference. Here is, for example, an ONNX graph exported directly from PyTorch:
and this is all the "manually-transpiled" code needed for inference (~170 lines of C) ...
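To give an idea of its overall shape, here is a hedged sketch of what such hand-written inference can look like end to end, reusing fastgrnn_step and the dimension macros from the sketch above (featurize(), the weight symbols, and all sizes are illustrative, not the repository's actual code):

```c
#include <stdint.h>

#define N_FRAMES 99   /* illustrative: one second of audio at a 10 ms hop */
#define N_LABELS 12   /* illustrative: keyword classes */

/* Trained weights, assumed baked into the firmware as constants. */
extern const float W[HID_DIM][IN_DIM], U[HID_DIM][HID_DIM];
extern const float b_z[HID_DIM], b_h[HID_DIM];
extern const float zeta, nu;
extern const float FC_W[N_LABELS][HID_DIM], FC_B[N_LABELS];

/* Placeholder feature extractor, e.g. one column of a log-mel spectrogram. */
extern void featurize(const int16_t *audio, int frame, float feat[IN_DIM]);

int classify(const int16_t *audio)
{
    float h[HID_DIM] = { 0 };  /* initial hidden state */
    float feat[IN_DIM];

    /* Run the RNN over all frames of the utterance. */
    for (int t = 0; t < N_FRAMES; t++) {
        featurize(audio, t, feat);
        fastgrnn_step(W, U, b_z, b_h, zeta, nu, feat, h);
    }

    /* Final dense layer on the last hidden state, then argmax. */
    int best = 0;
    float best_score = FC_B[0];
    for (int j = 0; j < HID_DIM; j++) best_score += FC_W[0][j] * h[j];
    for (int k = 1; k < N_LABELS; k++) {
        float s = FC_B[k];
        for (int j = 0; j < HID_DIM; j++) s += FC_W[k][j] * h[j];
        if (s > best_score) { best_score = s; best = k; }
    }
    return best;  /* index of the predicted label */
}
```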