This project applies Computer Vision and Machine Learning (ML) at the very edge of embedded development. It uses a TTGO T-CAMERA board, based on the ESP32 MCU, to count pills with ML techniques such as Support Vector Machine (SVM) and Random Forest. Pills are just an example of what can be counted: the project has a modular structure, so other classes and categories can be added easily.
An HTML monitoring and configuration interface is provided. It lets you enable the counter, select the mode, reset the count, and set up an alarm. When the alarm is enabled and the configured counter value has been reached, the device sends a JSON message to the specified HTTP endpoint.
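The alarm behaviour described above can be sketched in a few lines. This is only an illustration in Python, not the firmware code: the function names, the JSON field names (`event`, `count`, `threshold`), and the "greater than or equal" reading of "reached" are all assumptions.

```python
import json
import urllib.request


def alarm_due(enabled: bool, count: int, threshold: int) -> bool:
    """True when the alarm is enabled and the counter has reached the threshold.

    Assumption: "reached" is read as count >= threshold.
    """
    return enabled and count >= threshold


def build_alarm_message(count: int, threshold: int) -> bytes:
    """Serialize the alarm as JSON. The field names are hypothetical."""
    payload = {"event": "alarm", "count": count, "threshold": threshold}
    return json.dumps(payload).encode("utf-8")


def send_alarm(url: str, body: bytes, timeout: float = 5.0) -> int:
    """POST the JSON body to the configured endpoint and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would check `alarm_due(...)` after each counter update and, when it returns `True`, pass the result of `build_alarm_message(...)` to `send_alarm(...)` together with the endpoint URL entered in the configuration interface.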
The device is intended to be installed over a production line or conveyor so that the camera sensor captures the whole belt width.
Several counting strategies were developed during the project:
- Grid Division. A brute-force approach that splits the image into sample-sized subframes and applies a classifier to each. It does not support cumulative counting because the pill identification strategies it would require are inefficient.
- One Detection Line and One Detection Line with 1 Frame Hysteresis. A more elegant, robust, and efficient way of counting pills. A detection line is placed across the full image width; it has the height of a subframe and the width of the image. Numerous overlapping subframes are placed along the detection line so that high pill detection precision is achieved.
- Two Detection Lines. This method places two detection lines over the image. The idea is to capture pills on the first line and then, on the second line, confirm them and pick up any that were missed. This method requires calibration because the production belt speed varies: the distance between the two lines is a function of the belt speed.
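To make the detection-line strategies above more concrete, here is a minimal Python sketch. It is not the firmware implementation. All names are hypothetical, and two points are assumptions: "1 frame hysteresis" is read as "a position must stay empty for more than one frame before it can count a new pill", and the line spacing for the two-line method is taken as the distance the belt travels between frames.

```python
from typing import List, Sequence, Tuple


def detection_line_origins(image_width: int, subframe: int, step: int,
                           line_y: int) -> List[Tuple[int, int]]:
    """Top-left corners of subframes along one detection line.

    Subframes overlap whenever step < subframe.
    """
    if step <= 0 or not 0 < subframe <= image_width:
        raise ValueError("invalid subframe geometry")
    last_x = image_width - subframe
    xs = list(range(0, last_x + 1, step))
    if xs[-1] != last_x:  # make sure the right edge of the belt is covered
        xs.append(last_x)
    return [(x, line_y) for x in xs]


def hit_runs(hits: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group adjacent positive subframes so one pill is not counted several times."""
    runs, start = [], None
    for i, hit in enumerate(hits):
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(hits) - 1))
    return runs


class LineCounter:
    """Counts pills crossing one detection line, with optional hysteresis."""

    def __init__(self, n_slots: int, hold_frames: int = 1) -> None:
        self.hold_frames = hold_frames
        # Frames since each slot last fired; start above the hold so slots are armed.
        self.idle = [hold_frames + 1] * n_slots
        self.total = 0

    def update(self, hits: Sequence[bool]) -> int:
        """Feed one frame of per-subframe classifier results; return the running total."""
        if len(hits) != len(self.idle):
            raise ValueError("one classifier result per subframe is required")
        for start, end in hit_runs(hits):
            # A run is a new pill only if none of its slots fired recently.
            if all(self.idle[i] > self.hold_frames for i in range(start, end + 1)):
                self.total += 1
        for i, hit in enumerate(hits):
            self.idle[i] = 0 if hit else min(self.idle[i] + 1, self.hold_frames + 1)
        return self.total


def line_spacing_px(belt_speed_px_per_s: float, fps: float, frames_apart: int = 1) -> float:
    """Distance the belt moves in frames_apart frames (assumed two-line spacing)."""
    return belt_speed_px_per_s * frames_apart / fps
```

With a hypothetical 320-pixel-wide image, 32-pixel subframes, and a 16-pixel step, `detection_line_origins(320, 32, 16, 100)` yields 19 half-overlapping subframes. A detection that flickers off for a single frame is counted once with `hold_frames=1` and twice with `hold_frames=0`, which is the effect the hysteresis variant is assumed to address.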
Once a subframe has been extracted, a classifier is applied to it to detect a pill. The following algorithms have been used for classification:
- SVM
- Random Forest (RF)
The ML model data, training code, and porting code are here. Refer to the Credits section for the projects used.
All initial goals of the project have been met, and the results are presented here as throughput and algorithm efficiency.
The device works in two modes: HTTP monitored and Unattended. The average throughput is 4.2 fps in HTTP monitored mode and 8.3 fps in Unattended mode. Throughput depends on the operations listed in the table below; the times include memory access.
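As a rough idea of what training and porting look like with the two projects named in the Credits section, the sketch below trains both classifier types with scikit-learn on synthetic data and exposes a helper that calls micromlgen's `port` function. The synthetic features, the sample count, and every hyperparameter are assumptions chosen for illustration; the project's actual dataset and settings are in its own training code.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def make_samples(n: int, seed: int = 0):
    """Synthetic 16-pixel grayscale subframes: bright = pill (1), dark = background (0)."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = 200 if label else 50
        pixels = [min(255, max(0, int(rng.gauss(mean, 10)))) for _ in range(16)]
        features.append(pixels)
        labels.append(label)
    return features, labels


X, y = make_samples(200)

# Hyperparameters are placeholders, not the values used by the project.
rf = RandomForestClassifier(n_estimators=10, max_depth=20, random_state=0).fit(X, y)
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)


def export_c(classifier) -> str:
    """Generate C code for a trained classifier using micromlgen."""
    from micromlgen import port  # imported lazily; only needed when exporting
    return port(classifier)
```

The string returned by `export_c(rf)` or `export_c(svm)` would be saved as a header file and included in the firmware build.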
Operation | Average Time, ms | Mode |
---|---|---|
JPG -> RGB888 | 97 | Monitored and Unattended |
Classification Processing | 3 (RF), 989 (SVM) | Monitored and Unattended |
RGB888 -> JPG | 105 | Monitored only |

Algorithm | Average Classification Time, ms |
---|---|
SVM | 52 |
RF (forest depth 20) | 0.16 |
RF (forest depth 30) | 0.2 |
RF (forest depth 40) | 0.3 |
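The figures in the two tables above can be combined into a simple per-frame time budget. The Python sketch below is a back-of-the-envelope check, not a measurement. It assumes the listed operations run sequentially and that the 3 ms per-frame RF figure corresponds to the depth-20 forest, which the tables do not state.

```python
# Values copied from the tables above (milliseconds).
DECODE_MS = 97.0    # JPG -> RGB888, both modes
ENCODE_MS = 105.0   # RGB888 -> JPG, monitored mode only
CLASSIFY_FRAME_MS = {"RF": 3.0, "SVM": 989.0}   # per frame
CLASSIFY_ONCE_MS = {"RF": 0.16, "SVM": 52.0}    # per classification (RF: depth 20, assumed)


def frame_time_ms(algorithm: str, monitored: bool) -> float:
    """Sum of the listed operations for one frame, assuming they run one after another."""
    total = DECODE_MS + CLASSIFY_FRAME_MS[algorithm]
    if monitored:
        total += ENCODE_MS
    return total


def fps_upper_bound(algorithm: str, monitored: bool) -> float:
    """Frame rate if the listed operations were the only per-frame work."""
    return 1000.0 / frame_time_ms(algorithm, monitored)


def implied_classifications(algorithm: str) -> float:
    """Per-frame classification time divided by the time of one classification."""
    return CLASSIFY_FRAME_MS[algorithm] / CLASSIFY_ONCE_MS[algorithm]


if __name__ == "__main__":
    for algorithm in ("RF", "SVM"):
        for monitored in (False, True):
            mode = "monitored" if monitored else "unattended"
            print(f"{algorithm} {mode}: {fps_upper_bound(algorithm, monitored):.2f} fps")
        print(f"{algorithm}: ~{implied_classifications(algorithm):.1f} classifications per frame")
```

Under these assumptions the RF bounds come out at 10.00 fps (unattended) and 4.88 fps (monitored). The measured averages of 8.3 and 4.2 fps are somewhat lower, which would be consistent with per-frame work not listed in the table. Both algorithms imply roughly 19 classifications per frame.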
- More classification algorithms
- More classification categories
- Section on ML model validation
- Use of color images
Development has been done using Visual Studio Code and the PlatformIO extension. To install, download VS Code, install PlatformIO, then compile and program your device.
Thanks to Espressif for the IDF. Important credit is due to two projects used for ML model training and porting: scikit-learn was used for training, and eloquentarduino's micromlgen library made the porting possible.
The source code for the project is licensed under the MIT license, which you can find in the LICENSE.txt file.