
Object detection with YOLOv5

sara-mohajerani edited this page Sep 18, 2022 · 3 revisions

YOLO (You Only Look Once) is a fast and accurate object detection algorithm that can also run in real time on video. Architecturally, YOLOv5 consists of four main parts: input, backbone, neck, and head (output). The input stage mainly handles data preprocessing, including data augmentation. The backbone network extracts feature maps of different sizes from the input image through repeated convolution and pooling. The neck reduces the amount of computation and speeds up inference on the one hand, and helps improve detection accuracy on the other. In the final detection step, the head predicts targets of different sizes on the feature maps. YOLOv5 comes in four model sizes: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The main difference between them lies in the number of feature extraction modules and convolution kernels at specific locations in the network. The network structure of YOLOv5 is shown below:

YOLO annotates an image by placing a bounding box around each detected object. Each annotation has the following format:

(object class ID) (X center) (Y center) (Box width) (Box height)

We can use a pre-trained YOLO model for automatic detection. However, since our desired classes do not exist in the pre-trained model, we need to train it on our own data set, which in turn requires annotations in the YOLO format described above. Many graphical annotation tools, such as LabelImg [2] or Roboflow [3], let you draw bounding boxes around objects manually and export the annotations in YOLO, JSON, or PASCAL VOC format.
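To make the YOLO label format concrete: in YOLO label files the four box values are expressed as fractions of the image width and height, so a box drawn in pixels has to be normalized before it is written out. The following is a minimal sketch of that conversion and its inverse. The function names `to_yolo_line` and `from_yolo_line` are hypothetical helpers written for this page, not part of YOLOv5 or any annotation tool.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to one YOLO label line (normalized values)."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Parse one YOLO label line back into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    x_center, width = float(x_center) * img_w, float(width) * img_w
    y_center, height = float(y_center) * img_h, float(height) * img_h
    return (
        int(class_id),
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )


# Example: a box from (104, 52) to (312, 260) in a 416x416 image, class ID 2.
label = to_yolo_line(2, 104, 52, 312, 260, 416, 416)
print(label)
print(from_yolo_line(label, 416, 416))
```

Each object in an image gets one such line, and all lines for an image go into a text file that shares the image's base name.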

To train YOLO, we used the drinking waste classification dataset on Kaggle, which has four classes (Aluminium, Glass, PET, HDPE) [4]. An advantage of this data set is that its annotations are already available in YOLO format on the website. The dataset still needed some preparation because a few images had no labels. We therefore compared the labels with the images and removed the images that had no matching label.
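The comparison of images and labels can be automated. The sketch below is one possible way to do it, not the exact script used in this study. It assumes that every image has a label file with the same base name and a `.txt` extension, and that images are `.jpg`, `.jpeg`, or `.png` files. The function names and the extension list are assumptions made for illustration.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(image_dir, label_dir):
    """Return image paths that have no label file with the same base name."""
    label_stems = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(
        p
        for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in label_stems
    )


def remove_unlabeled_images(image_dir, label_dir):
    """Delete images without a matching label and return the deleted paths."""
    removed = find_unlabeled_images(image_dir, label_dir)
    for path in removed:
        path.unlink()
    return removed
```

Calling `find_unlabeled_images` first and inspecting the result before deleting anything is the safer order. Both arguments may point to the same folder if images and labels are stored together.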

Then, following the instructions for training YOLOv5 on a custom dataset, we trained the model and downloaded and stored the best weights.
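As a rough illustration of what starting such a training run looks like, the sketch below assembles a command line for the `train.py` script in the YOLOv5 repository, using the image size, batch size, and epoch count reported in the Kaggle section of this page. The flag names are given as I understand them from the repository and should be checked against the version you cloned. The `yolov5s.pt` starting weights and the `dataset/data.yaml` path are assumptions, since this page does not state which ones were used.

```python
import shlex


def build_train_command(data_yaml, weights="yolov5s.pt",
                        img_size=416, batch_size=16, epochs=100):
    """Build the argument list for a YOLOv5 training run (does not execute it)."""
    return [
        "python", "train.py",
        "--imgsz", str(img_size),         # input resolution (416x416 on this page)
        "--batch-size", str(batch_size),  # batch size 16 on this page
        "--epochs", str(epochs),          # 100 epochs on this page
        "--data", data_yaml,              # dataset config; path is hypothetical
        "--weights", weights,             # starting checkpoint; an assumption
    ]


command = build_train_command("dataset/data.yaml")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from inside the cloned repository, or the printed line can be pasted into a Colab cell.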

YOLO on Kaggle data set

The Kaggle data set contains 4811 images in four classes: Aluminium, Glass, PET, and HDPE. We use this data set to fine-tune the YOLO weights on image categories related to those in our own dataset. The code is available at link [5]. After cloning the YOLOv5 GitHub repository, this code uses Roboflow to export our dataset in YOLOv5 format. Roboflow also makes it possible to annotate raw images. To use Roboflow, you need to:

  1. Make an account on the Roboflow website.
  2. Upload the images, their labels, and a text file containing the names and number of classes.
  3. Decide on the train, validation, and test split percentages.
  4. Resize or drop images.
  5. Apply augmentation.
  6. Finally, generate the YOLO version of the given data set.
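Roboflow performs the split in step 3 for you. For readers who want to see what such a split amounts to, here is a minimal sketch of a reproducible random split. The 70/20/10 fractions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration, not a description of Roboflow's internals.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items reproducibly and split them into train, validation, and test lists."""
    if not 0 < train_frac + val_frac <= 1:
        raise ValueError("train_frac + val_frac must be in (0, 1]")
    shuffled = sorted(items)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


image_names = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))
```

Splitting by image name means each label file follows its image into the same subset.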

In our study:

  • 70% of the images (3.3K images) were placed in the training set, and the remaining 30% were split into validation and test sets (951 images for validation and 471 for testing).
  • The images were resized to 416x416 pixels.
  • To augment the images, 90° clockwise and counterclockwise rotation, ±15% vertical and horizontal shearing, and 32% zoom-in were applied.
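When an image is rotated by 90°, its bounding boxes have to be rotated with it. Roboflow takes care of this automatically. The sketch below only illustrates the coordinate mapping for a normalized YOLO box, assuming the usual image convention of an origin at the top-left corner with the y axis pointing down. The function name `rotate_box_90` is a hypothetical helper.

```python
def rotate_box_90(box, clockwise=True):
    """Rotate a normalized YOLO box (x_center, y_center, width, height) by 90 degrees.

    Width and height swap. For a clockwise turn the old top edge becomes the
    new right edge, so the new x is 1 - old y and the new y is the old x.
    """
    x_center, y_center, width, height = box
    if clockwise:
        return (1.0 - y_center, x_center, height, width)
    return (y_center, 1.0 - x_center, height, width)


box = (0.25, 0.5, 0.1, 0.2)
rotated = rotate_box_90(box, clockwise=True)
restored = rotate_box_90(rotated, clockwise=False)
print(rotated, restored)
```

Rotating clockwise and then counterclockwise returns the original box, which is a quick sanity check for the mapping.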

In the end, Roboflow generates a URL key. In the Colab notebook, replace the example key in the code at link [5] with this key, and then start training the model. Here we trained our model with a batch size of 16 for 100 epochs; training took 2.518 hours. The performance of the network is illustrated below:

Roboflow also makes it easy to set up an active learning pipeline. Clicking the train model icon on the web page starts training, which takes some time to finish. Once training has finished, you can upload an unseen image, and Roboflow draws a bounding box on it and labels it automatically.

Links and references

[1] Li, Zhuang, et al. "A two-stage industrial defect detection framework based on improved-yolov5 and optimized-inception-resnetv2 models." Applied Sciences 12.2 (2022): 834.

[2] How to download and use LabelImg

[3] How to use Roboflow

[4] Kaggle dataset

[5] YOLO code