Speech recognition is based on this architecture and examples from the same repository. The cell type in this model is FastGRNN. A more detailed view of the data flow through the network, with specific vector/matrix sizes:
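For reference, here is a minimal C sketch of a single FastGRNN step, following the update equations from the FastGRNN paper (the dimensions below are placeholders, not this model's actual sizes):

```c
#include <math.h>

/* Placeholder sizes -- the real model's dimensions come from training. */
#define IN_DIM  32
#define HID_DIM 64

/* One FastGRNN step. W and U are shared between the gate and the
 * candidate state; only the biases differ, which is what keeps the
 * cell so small. zeta and nu are trained scalars. */
static void fastgrnn_step(const float W[HID_DIM][IN_DIM],
                          const float U[HID_DIM][HID_DIM],
                          const float b_z[HID_DIM],
                          const float b_h[HID_DIM],
                          float zeta, float nu,
                          const float x[IN_DIM],
                          float h[HID_DIM] /* updated in place */)
{
    float pre[HID_DIM];

    /* Shared pre-activation: W x_t + U h_{t-1} */
    for (int i = 0; i < HID_DIM; i++) {
        float s = 0.0f;
        for (int j = 0; j < IN_DIM; j++)  s += W[i][j] * x[j];
        for (int j = 0; j < HID_DIM; j++) s += U[i][j] * h[j];
        pre[i] = s;
    }

    /* z_t  = sigmoid(pre + b_z)
     * h~_t = tanh(pre + b_h)
     * h_t  = (zeta*(1 - z_t) + nu) * h~_t + z_t * h_{t-1} */
    for (int i = 0; i < HID_DIM; i++) {
        float z  = 1.0f / (1.0f + expf(-(pre[i] + b_z[i])));
        float hc = tanhf(pre[i] + b_h[i]);
        h[i] = (zeta * (1.0f - z) + nu) * hc + z * h[i];
    }
}
```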
The inference is run nine times a second. The CPU utilization due to inference is only ~24%, i.e. roughly 27 ms of CPU time per run.
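As a rough illustration of how such a duty cycle can be scheduled, here is a sketch of a periodic task (assuming a FreeRTOS-based environment, which the menuconfig reference suggests; run_inference() and the task structure are illustrative, not the repository's actual code):

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

/* Stand-in for the actual model invocation. */
extern void run_inference(void);

static void inference_task(void *arg)
{
    TickType_t last_wake = xTaskGetTickCount();
    for (;;) {
        run_inference();
        /* Sleep until the next ~111 ms boundary (9 runs/s); the CPU
         * is free for other tasks in the meantime. */
        vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(1000 / 9));
    }
}
```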
The FastRNN cell is also supported (it can be changed via menuconfig).
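Conceptually, the menuconfig choice boils down to a compile-time switch along these lines (the CONFIG_* symbol names below are hypothetical, not the project's actual Kconfig options):

```c
/* Hypothetical Kconfig symbols -- the real menuconfig option names may differ. */
#ifdef CONFIG_KWS_CELL_FASTRNN
/* FastRNN: h_t = alpha * tanh(W x_t + U h_{t-1} + b) + beta * h_{t-1} */
#  define rnn_step fastrnn_step
#else
/* Default: the gated FastGRNN cell sketched above. */
#  define rnn_step fastgrnn_step
#endif
```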
A bigger, LSTM-based model with ~550ms inference time can be found here.
It is slightly more accurate, especially on the "up" label.
(Demo video: demo_3.mov)
A number of TinyML model-conversion frameworks were tested, but none gave satisfactory results. The main problem seems to be that graphs exported from PyTorch (or other training-oriented NN frameworks) carry a lot of additional information needed only for training, which obscures the essential structure needed for inference. Here is, for example, an ONNX graph exported directly from PyTorch:
and this is all the "manually-transpiled" code needed for inference (~170 lines of C) ...
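To give an idea of its overall shape, here is a hedged sketch of what such hand-written inference can look like end to end, reusing fastgrnn_step and the dimension macros from the sketch above (featurize(), the weight symbols, and all sizes are illustrative, not the repository's actual code):

```c
#include <stdint.h>

#define N_FRAMES 99   /* illustrative: one second of audio at a 10 ms hop */
#define N_LABELS 12   /* illustrative: keyword classes */

/* Trained weights, assumed baked into the firmware as constants. */
extern const float W[HID_DIM][IN_DIM], U[HID_DIM][HID_DIM];
extern const float b_z[HID_DIM], b_h[HID_DIM];
extern const float zeta, nu;
extern const float FC_W[N_LABELS][HID_DIM], FC_B[N_LABELS];

/* Placeholder feature extractor, e.g. one column of a log-mel spectrogram. */
extern void featurize(const int16_t *audio, int frame, float feat[IN_DIM]);

int classify(const int16_t *audio)
{
    float h[HID_DIM] = { 0 };  /* initial hidden state */
    float feat[IN_DIM];

    /* Run the RNN over all frames of the utterance. */
    for (int t = 0; t < N_FRAMES; t++) {
        featurize(audio, t, feat);
        fastgrnn_step(W, U, b_z, b_h, zeta, nu, feat, h);
    }

    /* Final dense layer on the last hidden state, then argmax. */
    int best = 0;
    float best_score = FC_B[0];
    for (int j = 0; j < HID_DIM; j++) best_score += FC_W[0][j] * h[j];
    for (int k = 1; k < N_LABELS; k++) {
        float s = FC_B[k];
        for (int j = 0; j < HID_DIM; j++) s += FC_W[k][j] * h[j];
        if (s > best_score) { best_score = s; best = k; }
    }
    return best;  /* index of the predicted label */
}
```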