NetsPresso tutorial with YOLO Fastest for Arm Cortex-M85 and Cortex-M55+Ethos-U55

PyNetsPresso provides an end-to-end process for training, compressing, converting, and benchmarking the YOLO-Fastest model on Arm Cortex-M85 and Cortex-M55 + Ethos-U55 targets. This repository is aimed at anyone who wants to deploy the YOLO-Fastest model on Arm processors with a single, streamlined workflow.

Order of the tutorial

0. Sign up
1. Install
2. Prepare dataset
3. Training
4. Compress model, convert to tflite, and benchmark with PyNetsPresso

0. Sign up

To get started with the NetsPresso Python package, you will need to sign up at NetsPresso.

1. Install

Clone the repository and install requirements.txt in a Python >= 3.7.0 environment, including PyTorch >= 1.11, < 2.0.

git clone https://github.com/Nota-NetsPresso/ModelZoo-YOLOFastest-for-ARM-U55-M85.git  # clone
cd ModelZoo-YOLOFastest-for-ARM-U55-M85
pip install -r requirements.txt  # install
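
The version constraints above can be checked before training starts. The following is a minimal sketch, not part of the repository: the helper names parse_version, torch_version_ok, and python_version_ok are hypothetical, and the bounds are simply the ones stated in this section.

```python
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    # "1.13.1+cu117" -> (1, 13, 1); local build suffixes such as "+cu117" are ignored.
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def torch_version_ok(text):
    """True if the version satisfies PyTorch >= 1.11, < 2.0 (bounds from this tutorial)."""
    return (1, 11) <= parse_version(text) < (2, 0)


def python_version_ok(info=sys.version_info):
    """True if the interpreter satisfies Python >= 3.7.0 (bound from this tutorial)."""
    return tuple(info[:3]) >= (3, 7, 0)


if __name__ == "__main__":
    print("Python OK:", python_version_ok())
    try:
        import torch  # only needed for the live check below
        print("PyTorch", torch.__version__, "OK:", torch_version_ok(torch.__version__))
    except ImportError:
        print("PyTorch is not installed")
```

Running the script inside the environment prints whether the interpreter and the installed PyTorch fall within the stated ranges.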

2. Prepare dataset

Download the STREETS dataset and annotations from link, unzip the archive, and move the vehicleannotaitons folder into the ../datasets/ directory, next to the cloned repository.

Your directory structure should look like this:

├── datasets
│    └── vehicleannotaitons
│         ├── images
│         └── annotations
│    
└── ModelZoo-YOLOFastest-for-ARM-U55-M85
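
The layout above can be verified with a few lines of Python before launching training. This is a sketch under the assumption that the script is run from inside the cloned repository, so the dataset root is ../datasets; the function name missing_dataset_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-folders shown in the directory tree of this tutorial.
EXPECTED_SUBDIRS = ("images", "annotations")


def missing_dataset_dirs(datasets_root, folder="vehicleannotaitons"):
    """Return the expected dataset directories that do not exist under datasets_root."""
    base = Path(datasets_root) / folder
    return [str(base / name) for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory is the cloned repository.
    missing = missing_dataset_dirs(Path("..") / "datasets")
    if missing:
        print("Missing directories:")
        for path in missing:
            print("  ", path)
    else:
        print("Dataset layout looks complete.")
```

The folder name keeps the spelling used by the dataset archive (vehicleannotaitons), so the check matches what is actually on disk.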


3. Training

To train from scratch, run train.py, which produces a '.pt' checkpoint file.

python train.py --data ./data/STREETS.yaml --epochs 300 --weights '' --cfg ./models/yolo-fastest.yaml  --batch-size 64

4. Compress model, convert to tflite, and benchmark with PyNetsPresso

auto_process.py provides an integrated process that covers torch.fx conversion, model compression, retraining of the fx model, ONNX export, TFLite conversion, device benchmarking, and mAP validation. You can run auto_process.py with a minimal set of training hyper-parameters and your NetsPresso account information.

You can choose either Renesas-RA8D1 (Arm Cortex-M85) or Ensemble-E7-DevKit-Gen2 (Arm Cortex-M55 + Ethos-U55) as the target device, and you can speed up inference by passing the --helium option.

python auto_process.py --data ./data/STREETS.yaml --name yolo_fastest --weight_path ./models/yolo_fastest_streets.pt --epochs 300 --batch-size 64 --np_email '' --np_password '' --target_device Renesas-RA8D1 --helium
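
Typing the account password directly on the command line leaves it in the shell history. One way around this is sketched below: a small wrapper that assembles the same command from environment variables. The variable names NP_EMAIL and NP_PASSWORD and the function build_auto_process_cmd are assumptions made for illustration; only the flags and the two device names come from this tutorial.

```python
import os
import shlex

# Device names as written in this tutorial.
TARGET_DEVICES = ("Renesas-RA8D1", "Ensemble-E7-DevKit-Gen2")


def build_auto_process_cmd(email, password, target_device="Renesas-RA8D1", helium=False,
                           data="./data/STREETS.yaml", name="yolo_fastest",
                           weight_path="./models/yolo_fastest_streets.pt",
                           epochs=300, batch_size=64):
    """Build the auto_process.py argument list using the flags shown above."""
    if target_device not in TARGET_DEVICES:
        raise ValueError(f"unknown target device: {target_device!r}")
    cmd = [
        "python", "auto_process.py",
        "--data", data,
        "--name", name,
        "--weight_path", weight_path,
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--np_email", email,
        "--np_password", password,
        "--target_device", target_device,
    ]
    if helium:
        cmd.append("--helium")
    return cmd


if __name__ == "__main__":
    # Hypothetical environment variables holding the NetsPresso credentials.
    cmd = build_auto_process_cmd(
        email=os.environ.get("NP_EMAIL", ""),
        password=os.environ.get("NP_PASSWORD", ""),
        helium=True,
    )
    # Print the command with the password value hidden.
    shown = list(cmd)
    shown[shown.index("--np_password") + 1] = "***"
    print(" ".join(shlex.quote(arg) for arg in shown))
```

To launch the process instead of printing it, pass the returned list to subprocess.run(cmd, check=True) from the repository root.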

Benchmark

| Model | Format | Precision | Size (pixels) | mAPval 50-95 | mAPval 50 | Speed Cortex-M85 (ms) | Speed Cortex-M85 with Helium (ms) | Speed Ethos-U55 (ms) | Params (M) |
|---|---|---|---|---|---|---|---|---|---|
| YOLO-Fastest | PyTorch | FP32 | 256 | 41.6 | 75.5 | - | - | - | 0.3 |
| YOLO-Fastest | TFLite | Full INT8 | 256 | 39.7 | 73.7 | 594 | 269 | 6.8 | 0.3 |
| Compressed YOLO-Fastest | TFLite | Full INT8 | 256 | 37.3 | 71.5 | 513 | 234 | 6.0 | 0.2 |

Table Notes
  • The checkpoint is trained for 300 epochs with default settings. The model uses the hyp.scratch-low.yaml hyper-parameters.
  • mAPval values are for single-model single-scale on the STREETS dataset.
    Reproduce with python val.py --weights './models/yolo_fastest_streets_256.pt' --data ./data/STREETS.yaml --img 256 for the PyTorch checkpoint file, and python val.py --weights './models/yolo_fastest_streets_full_int8_256.tflite' --data ./data/STREETS.yaml --img 256 --anchors-for-tflite-path ./models/yolo_fastest_streets_256_anchors.json for the full INT8 TFLite file.
  • Speed is the inference time for a single STREETS val image on Cortex-M85 (with and without Helium) and on Ethos-U55.
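
The relative gains implied by the benchmark numbers can be worked out directly. The sketch below copies the values from the table above and computes the Helium speedup on Cortex-M85 and the mAP 50-95 drop relative to the FP32 PyTorch model; the dictionary layout and the speedup helper are illustrative choices, not part of the repository.

```python
# Values copied from the benchmark table (mAP 50-95 and latencies in ms).
FP32_MAP = 41.6
BENCH = {
    "YOLO-Fastest": {"map": 39.7, "m85": 594.0, "m85_helium": 269.0, "u55": 6.8},
    "Compressed YOLO-Fastest": {"map": 37.3, "m85": 513.0, "m85_helium": 234.0, "u55": 6.0},
}


def speedup(baseline_ms, new_ms):
    """Ratio of baseline latency to new latency; values above 1 mean faster."""
    return baseline_ms / new_ms


for name, row in BENCH.items():
    helium_gain = speedup(row["m85"], row["m85_helium"])
    map_drop = FP32_MAP - row["map"]
    print(f"{name}: Helium speedup {helium_gain:.2f}x, mAP 50-95 drop vs FP32 {map_drop:.1f}")
```

On these figures, Helium gives roughly a 2.2x speedup on Cortex-M85 for both TFLite models, while compression lowers latency further at a cost of 2.4 mAP 50-95 points relative to the uncompressed INT8 model.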

Contact

Join our Discussion Forum to provide feedback or share your use cases. If you want to talk more with Nota, please contact us here.
You can also reach us by email (contact@nota.ai) or phone (+82 2-555-8659).