
Implementation of LeNet-5 on STM32F446RE board with Arm Cortex-M4 core





Implementation of a Caffe LeNet-5 model on the STM32F446RE board with an Arm Cortex-M4 core.

0. Prerequisites

Hardware

  1. STM32 NUCLEO-F446RE board
  2. Desktop computer (GPU is optional)

Software

  1. Jupyter notebook -
  2. Python -
  3. Caffe -
  4. STM32CubeIDE -
  5. PuTTY -

Note: Make sure the software above has been installed before proceeding to the next step.

1. Dataset & Model Preparation

Model

  1. LeNet-5 model definition: Model/lenet_train_test.prototxt (for training & testing), Model/lenet_deploy.prototxt (for real classification on the desktop)
  2. Pre-trained LeNet-5 model: Model/lenet_iter_10000.caffemodel

Dataset

  1. MNIST dataset in LMDB format: Dataset/mnist_test_lmdb & Dataset/mnist_train_lmdb (for training & testing).
  2. MNIST dataset in JPG format: (for real classification; please create a Test_Dataset directory and place the images in it).

Full Training

(Optional) Train from scratch if you don't want to use the pre-trained LeNet-5 model:

<caffe> train -solver Model/lenet_solver.prototxt


(Optional) Fine-tune the pre-trained LeNet-5 model:

<caffe> train -solver Model/lenet_solver.prototxt -weights Model/lenet_iter_10000.caffemodel


  1. To enable the GPU for full training/fine-tuning, add the -gpu 0 argument.
  2. Remember to update the variables in the prototxt files if needed, e.g., the dataset (LMDB) paths.
  3. <caffe> is your Caffe executable; in my Windows setup it is C:\Caffe\caffe-master\Build\x64\Release\caffe.exe.
  4. For more information on data preparation and model training, you may refer to
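Both `caffe train` invocations above share the `-solver` argument; fine-tuning adds `-weights`, which expects trained parameters in a `.caffemodel` file (such as Model/lenet_iter_10000.caffemodel), not a `.prototxt`. The minimal sketch below uses a hypothetical helper, `caffe_train_args`, to assemble either command line from the flags listed above (`-solver`, `-weights`, `-gpu 0`). It rejects a weights path that is not a `.caffemodel` and only builds the argument list; it does not call Caffe.

```python
import shlex
from typing import List, Optional


def caffe_train_args(caffe_exe: str,
                     solver: str = "Model/lenet_solver.prototxt",
                     weights: Optional[str] = None,
                     gpu: Optional[int] = None) -> List[str]:
    """Argument list for `caffe train` (hypothetical helper).

    weights=None -> full training from scratch; otherwise fine-tune from it.
    gpu=None     -> CPU; gpu=0 -> adds the `-gpu 0` flag.
    """
    if weights is not None and not weights.endswith(".caffemodel"):
        # -weights takes trained parameters, not a solver/net .prototxt
        raise ValueError(f"expected a .caffemodel for -weights, got {weights!r}")
    args = [caffe_exe, "train", "-solver", solver]
    if weights is not None:
        args += ["-weights", weights]
    if gpu is not None:
        args += ["-gpu", str(gpu)]
    return args


if __name__ == "__main__":
    cmd = caffe_train_args("caffe", weights="Model/lenet_iter_10000.caffemodel", gpu=0)
    print(shlex.join(cmd))
    # caffe train -solver Model/lenet_solver.prototxt -weights Model/lenet_iter_10000.caffemodel -gpu 0
```

Passing the list to `subprocess.run(cmd, check=True)` would launch training; on Windows, `caffe_exe` would be the full path to caffe.exe.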

2. Inference via CPU/GPU

  1. Open Scripts/LeNet5_classification.ipynb in Jupyter Notebook.
  2. Follow and execute the instructions in the notebook.
  3. Remember to change the paths in the following variables: caffe_root, root, model_def, model_weights, labels_file.
  4. You can run inference on the CPU or GPU by calling caffe.set_mode_cpu() or caffe.set_mode_gpu().
  5. The notebook can classify a single image or a group of test images.
  6. Accuracy and inference speed are displayed in the notebook output.
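The accuracy and speed figures reported by the notebook boil down to counting correct predictions and timing each classification call. The sketch below shows that bookkeeping with a stand-in classifier. With pycaffe, `classify` would typically copy the preprocessed image into `net.blobs['data'].data`, call `net.forward()` and take the argmax of the output blob (named `prob` in Caffe's stock LeNet deploy definition); that wiring is an assumption and is left out so the block runs without Caffe. The function names are hypothetical.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score; the first index wins on ties."""
    return max(range(len(scores)), key=lambda i: scores[i])


def evaluate(classify: Callable[[Sequence[float]], int],
             samples: Iterable[Tuple[Sequence[float], int]]) -> Tuple[float, float]:
    """Return (accuracy, mean inference time in ms) over (image, label) pairs."""
    correct = total = 0
    elapsed = 0.0
    for image, label in samples:
        start = time.perf_counter()
        predicted = classify(image)          # only the classifier call is timed
        elapsed += time.perf_counter() - start
        correct += int(predicted == label)
        total += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return correct / total, 1000.0 * elapsed / total


if __name__ == "__main__":
    # Stand-in classifier: each "image" is already a score vector.
    toy = [([0.1, 0.9], 1), ([0.8, 0.2], 0), ([0.3, 0.7], 0)]
    acc, ms = evaluate(argmax, toy)
    print(f"accuracy = {acc:.3f}, mean time = {ms:.4f} ms")  # accuracy = 0.667; time varies
```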

3. Inference via STM32 NUCLEO-F446RE Board

Quantize the weights & biases

  1. The quantization script needs the Caffe model definition (.prototxt) used for training/testing, which must contain valid paths to the datasets (LMDB), and the trained model file (.caffemodel). It parses the network graph connectivity and quantizes the caffemodel to 8-bit weights/activations layer by layer, incrementally, with minimal loss of accuracy on the test dataset. It then dumps the network graph connectivity and quantization parameters into a pickle file.
  2. Run the command below to parse and quantize the network. This step takes a while on the CPU, because the network is quantized layer by layer while the accuracy is validated on the test dataset. To enable the GPU for the quantization sweeps, add the --gpu argument.
python --model ../Model/lenet_train_test.prototxt --weights ../Model/lenet_iter_10000.caffemodel --save lenet_quantize.pkl
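The README does not spell out the quantization rule. A common choice for CMSIS-NN's q7 kernels, and an assumption here, is a per-layer power-of-two (Qm.n) format: pick the integer bits m so that the largest magnitude fits, keep n = 7 - m fractional bits, then round and saturate to int8. The minimal sketch below (hypothetical function names) covers only that weight/bias step; the script's activation-range sweep with accuracy validation on the test set is not reproduced.

```python
import math
from typing import List, Sequence


def choose_frac_bits(values: Sequence[float], total_bits: int = 8) -> int:
    """Fractional bits n of a signed Qm.n format that covers max|v|.

    One bit is the sign and m = ceil(log2(max|v|)) bits hold the integer
    part, so n = (total_bits - 1) - m. For small ranges m is negative and
    n exceeds total_bits - 1, which keeps more precision.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))
    return (total_bits - 1) - int_bits


def quantize(values: Sequence[float], frac_bits: int, total_bits: int = 8) -> List[int]:
    """Scale by 2**frac_bits, round (Python's round: half to even), saturate."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(codes: Sequence[int], frac_bits: int) -> List[float]:
    """Map int8 codes back to real values for error checks."""
    return [c / 2.0 ** frac_bits for c in codes]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.1, -0.75]   # toy layer weights
    n = choose_frac_bits(weights)         # max|w| = 0.75 -> Q0.7, n = 7
    q = quantize(weights, n)
    print(n, q)                           # 7 [64, -32, 13, -96]
    print(dequantize(q, n))
```

Saturation handles the edge case where max|v| is an exact power of two (for example 1.0 in Q0.7 would be 128 and is clipped to 127).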

Convert the model into code

  1. The code generator takes the quantization parameters and network graph connectivity from the previous step and generates code consisting of NN function calls. Supported layers: convolution, inner product, pooling (max/average), and ReLU. It generates (a) weights.h, (b) parameter.h, which holds the quantization ranges, and (c) main.cpp, the network code.
  2. Run the command below to generate code for Arm Cortex-M CPUs.
python --model lenet_quantize.pkl --out_dir ../Code
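parameter.h carries the quantization ranges that the generated NN calls need. Assuming the legacy CMSIS-NN q7 kernels (such as `arm_convolve_HWC_q7_basic` and `arm_fully_connected_q7`), which take a bias left shift and an output right shift, the sketch below derives both shifts from the Q formats and reproduces the kernels' accumulate, round, shift and saturate step in plain Python. The macro names emitted by `shift_defines` are hypothetical; the generated parameter.h may name them differently.

```python
from typing import Sequence, Tuple


def layer_shifts(in_frac: int, wt_frac: int, bias_frac: int, out_frac: int) -> Tuple[int, int]:
    """(bias_shift, out_shift) for a q7 layer from the Q-format fractional bits.

    An int8 input with in_frac fractional bits times an int8 weight with
    wt_frac fractional bits gives a product with in_frac + wt_frac bits.
    """
    prod_frac = in_frac + wt_frac
    bias_shift = prod_frac - bias_frac  # left shift aligning the bias with the products
    out_shift = prod_frac - out_frac    # right shift back to the output format
    if bias_shift < 0 or out_shift < 0:
        raise ValueError("negative shift; pick different Q formats")
    return bias_shift, out_shift


def q7_dot(inputs: Sequence[int], weights: Sequence[int], bias: int,
           bias_shift: int, out_shift: int) -> int:
    """One output value: accumulate, round, shift, saturate to int8."""
    acc = (bias << bias_shift) + ((1 << out_shift) >> 1)  # bias + rounding term
    for x, w in zip(inputs, weights):
        acc += x * w
    return max(-128, min(127, acc >> out_shift))  # >> is an arithmetic shift in Python


def shift_defines(layer: str, bias_shift: int, out_shift: int) -> str:
    """Hypothetical parameter.h lines for one layer."""
    return (f"#define {layer}_BIAS_LSHIFT {bias_shift}\n"
            f"#define {layer}_OUT_RSHIFT {out_shift}\n")


if __name__ == "__main__":
    # Inputs, weights and bias in Q0.7; output in Q2.5.
    b_sh, o_sh = layer_shifts(in_frac=7, wt_frac=7, bias_frac=7, out_frac=5)
    # real: 0.5*0.5 + 0.25*(-0.5) + 0.125 = 0.25 -> 0.25 * 2**5 = 8
    print(b_sh, o_sh, q7_dot([64, 32], [64, -64], 16, b_sh, o_sh))  # 7 9 8
    print(shift_defines("CONV1", b_sh, o_sh), end="")
```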

Convert MNIST Test Images into array format

  1. The conversion script takes a group of MNIST images in JPG format and converts them into signed int8 arrays. The image arrays are split across several input_x.h files, each containing at most 80 images (due to the memory limitation of the NUCLEO-F446RE board).
  2. All input_x.h files are listed in an include_list.h file, in which you comment/uncomment the includes so that only one input_x.h is compiled and uploaded to the board.
python --image_dir ../Test_Dataset --out_dir ../Code
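A stdlib-only sketch of the conversion step described above, under two assumptions: pixels map to int8 as p >> 1 (pixels scaled by 1/256, as in Caffe's stock MNIST LeNet data layer, and stored in Q0.7), and the generated files use the array, macro and file names shown here, which are hypothetical. Decoding the JPGs (for example with Pillow's `Image.open(path).convert("L")`) is left out so the block runs without third-party packages.

```python
from typing import List, Sequence

MAX_IMAGES_PER_FILE = 80  # per-file cap stated in the README


def pixel_to_q7(pixel: int) -> int:
    """Map an 8-bit grayscale pixel (0..255) to an int8 value (0..127).

    Assumption: the network input is the pixel scaled by 1/256 and stored
    in Q0.7, so q = p / 256 * 128 = p >> 1.
    """
    if not 0 <= pixel <= 255:
        raise ValueError(f"pixel out of range: {pixel}")
    return pixel >> 1


def chunk(items: Sequence, size: int = MAX_IMAGES_PER_FILE) -> List[list]:
    """Split items into consecutive groups of at most `size`."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def input_header(images: Sequence[Sequence[int]], labels: Sequence[int]) -> str:
    """Render one hypothetical input_x.h: int8 image arrays plus labels."""
    if len(images) != len(labels) or not 0 < len(images) <= MAX_IMAGES_PER_FILE:
        raise ValueError("need 1..80 images, each with one label")
    size = len(images[0])
    if any(len(img) != size for img in images):
        raise ValueError("all images must have the same number of pixels")
    rows = ",\n".join(
        "  {" + ", ".join(str(pixel_to_q7(p)) for p in img) + "}" for img in images)
    return ("#include <stdint.h>\n"
            f"#define NUM_IMAGES {len(images)}\n"
            f"#define IMAGE_SIZE {size}\n"
            "static const int8_t input_images[NUM_IMAGES][IMAGE_SIZE] = {\n"
            f"{rows}\n}};\n"
            "static const int input_labels[NUM_IMAGES] = {"
            + ", ".join(str(label) for label in labels) + "};\n")


def include_list(num_files: int, active: int = 0) -> str:
    """include_list.h body with every input_x.h commented out except `active`."""
    return "".join(
        f'{"" if i == active else "// "}#include "input_{i}.h"\n'
        for i in range(num_files))


if __name__ == "__main__":
    # Toy 2-pixel "images"; real MNIST images have 28 * 28 = 784 pixels.
    print(input_header([[0, 255], [128, 2]], [7, 2]))
    print([len(g) for g in chunk(list(range(170)))])  # [80, 80, 10]
    print(include_list(3), end="")
```

Keeping exactly one `#include` active mirrors the README's comment/uncomment workflow, since every input_x.h here defines the same macro and array names.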

Build & Run the project via STM32CubeIDE

  1. Create a new project via STM32CubeIDE.
  2. In Board Selector, select NUCLEO-F446RE for your Commercial Part No.
  3. Download the CMSIS-NN & CMSIS-DSP packages from and add them to your project.
  4. Include both the DSP/Include and NN/Include directories via Project > Properties > C/C++ General > Paths and Symbols > Includes.
  5. Add the NN/Source directory via Project > Properties > C/C++ General > Paths and Symbols > Source Location.
  6. Open your project's .ioc file; under Pinout & Configuration, expand Timers, select TIM10, and check 'Activated' to enable the timer.
  7. Copy the content of the generated main.cpp into Core/Src/main.c, and move the generated weights.h, parameter.h, input_x.h, and include_list.h into the Core/Inc directory.
  8. 'Build' and 'Run' the project to upload the program to the NUCLEO-F446RE board.
  9. Check the memory utilization of the build (e.g., in the Build Analyzer view of STM32CubeIDE).
  10. To view the output messages, open a PuTTY terminal, select 'Serial', enter your serial line (e.g., COM3) and speed (e.g., 115200), and click 'Open'.
  11. Messages such as the classification result, inference cycles, and accuracy are displayed in the PuTTY terminal.
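A rough, assumption-laden sketch for interpreting the board-side numbers. It assumes the repository's prototxt keeps the layer sizes of Caffe's stock MNIST LeNet (20 and 50 5x5 convolution filters, a 500-unit and a 10-unit inner-product layer) and that all int8 weights, biases and test images are placed in the STM32F446RE's 512 KB flash. It also converts a printed count into milliseconds; whether that count runs at the core clock or at the TIM10 tick rate depends on the project's timer configuration, so `clock_hz` is a parameter, and 180 MHz (the part's maximum core clock) is only an example value.

```python
from math import prod
from typing import Dict, Tuple

# Assumed layer shapes (Caffe's stock MNIST LeNet); verify against Model/*.prototxt.
LENET_LAYERS: Dict[str, Tuple[int, ...]] = {
    "conv1": (20, 1, 5, 5),   # 20 filters, 1 input channel, 5x5 kernel
    "conv2": (50, 20, 5, 5),  # 50 filters over 20 channels
    "ip1": (500, 800),        # 800 = 50 channels * 4 * 4 after pooling
    "ip2": (10, 500),         # 10 digit classes
}
FLASH_BYTES = 512 * 1024      # STM32F446RE on-chip flash
IMAGE_BYTES = 28 * 28         # one MNIST image stored as int8


def param_bytes(layers: Dict[str, Tuple[int, ...]]) -> int:
    """Bytes for int8 weights plus one int8 bias per output channel/unit."""
    return sum(prod(shape) + shape[0] for shape in layers.values())


def cycles_to_ms(count: int, clock_hz: float) -> float:
    """Convert a counter value to milliseconds for a counter running at clock_hz."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return 1000.0 * count / clock_hz


if __name__ == "__main__":
    params = param_bytes(LENET_LAYERS)
    images = 80 * IMAGE_BYTES
    print(params, images, FLASH_BYTES - params - images)  # 431080 62720 30488
    print(cycles_to_ms(180_000, 180e6))                    # 1.0
```

Under these assumptions, roughly 30 KB of flash remains for application code, CMSIS-NN, the HAL and other constants, which gives a feel for why each input_x.h is capped at 80 images; activation buffers draw on the separate 128 KB SRAM budget.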


  1. The final STM32CubeIDE project for the LeNet-5 implementation has been compressed as
  2. You should be able to run the project directly on your board to classify the MNIST image arrays in input_x.h.

