Train your own YOLO model in your own code in just 6 steps!
1⃣ Create a YOLO object (v1 to v4).
2⃣ Read files (labelme style, labelimg style).
3⃣ Get anchor boxes.
4⃣ Create the model (with DarkNet, ResNet backbones...).
5⃣ Compile the model.
6⃣ Train the model!
tf2_YOLO is my implementation of YOLOv1 to YOLOv4 in TensorFlow 2.x (tf.keras), written after studying the four YOLO papers:
YOLOv1: You Only Look Once: Unified, Real-Time Object Detection by Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (https://arxiv.org/abs/1506.02640).
YOLOv2 (YOLO9000): Better, Faster, Stronger by Joseph Redmon, Ali Farhadi (https://arxiv.org/abs/1612.08242).
YOLOv3: An Incremental Improvement by Joseph Redmon, Ali Farhadi (https://arxiv.org/abs/1804.02767).
YOLOv4: Optimal Speed and Accuracy of Object Detection by Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao (https://arxiv.org/abs/2004.10934).
This repo draws on many resources, including the darknet source code:
- https://github.com/qqwweee/keras-yolo3
- https://github.com/allanzelener/YAD2K
- https://github.com/penny4860/Yolo-digit-detector
- https://github.com/pjreddie/darknet
Most importantly, the repo is written in Python and TensorFlow, so you can easily read and modify any part of it.
Clone or download
Use the command below in a terminal to clone the repo:
git clone https://github.com/samson6460/tf2_YOLO.git
Or download all the files with the [Code > Download ZIP] button in the upper right corner.
Install the required packages:
pip install -r requirements.txt
Dataset source: https://github.com/datitran/raccoon_dataset
Dataset source: https://github.com/Shenggan/BCCD_Dataset.git
Each version of YOLO is independent, so you can copy just the parts you want into your own project directory.
Then follow the documentation below to train and evaluate your YOLO model.
YOLOv1
from tf2_YOLO import yolov1_5
yolo = yolov1_5.Yolo(input_shape, class_names)
YOLOv2
from tf2_YOLO import yolov2
yolo = yolov2.Yolo(input_shape, class_names)
YOLOv3
from tf2_YOLO import yolov3
yolo = yolov3.Yolo(input_shape, class_names)
YOLOv4
from tf2_YOLO import yolov4
yolo = yolov4.Yolo(input_shape, class_names)
- input_shape: A tuple of 3 integers, the shape of the input image (height, width, channels).
- class_names: A list, containing all label names.
Read as arrays (everything is loaded into RAM)
img, label = yolo.read_file_to_dataset(
img_path,
label_path)
or
Read as a tf.keras.utils.Sequence (data is loaded batch by batch)
seq = yolo.read_file_to_sequence(
img_path,
label_path,
batch_size)
- img_path: A string, file path of images.
- label_path: A string, file path of annotations.
- batch_size: An integer, size of the batches of data (default: 20).
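A `tf.keras.utils.Sequence` hands Keras one batch per `__getitem__` call and reports the number of batches per epoch through `__len__`. Assuming the last partial batch is kept, a folder of 45 images with the default `batch_size` of 20 gives 3 batches. The framework-free sketch below only illustrates that contract; `FileBatchSequence` is a hypothetical class, and it returns file names where `read_file_to_sequence` would return decoded images and labels.

```python
import math


class FileBatchSequence:
    """Hypothetical stand-in mimicking the tf.keras.utils.Sequence contract."""

    def __init__(self, file_names, batch_size=20):
        self.file_names = list(file_names)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller (assumption).
        return math.ceil(len(self.file_names) / self.batch_size)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        # A real Sequence would read and decode images/annotations here, lazily.
        return self.file_names[start:start + self.batch_size]


seq = FileBatchSequence([f"img_{i:03d}.jpg" for i in range(45)], batch_size=20)
print(len(seq))     # 3
print(len(seq[2]))  # 5
```

Because only one batch is materialized at a time, this style suits datasets that do not fit in RAM.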
Returns (YOLOv1, YOLOv2)
A tuple of 2 ndarrays, (img, label):
- shape of img: (batch_size, img_heights, img_widths, channels)
- shape of label: (batch_size, grid_heights, grid_widths, info)
Returns (YOLOv3, YOLOv4)
A tuple (img: ndarray, label_list: list), where label_list contains the labels of all FPN layers:
- shape of img: (batch_size, img_heights, img_widths, channels)
- shape of each label in label_list: (batch_size, grid_heights, grid_widths, info)
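For YOLOv3 and YOLOv4, each entry of `label_list` belongs to one FPN output. In the usual Darknet design these outputs have strides 32, 16 and 8, so a 416×416 input gives 13×13, 26×26 and 52×52 grids. The helper below is a sketch under that stride assumption; `fpn_grid_shapes` is a hypothetical name, and the order in which this repo stores the layers in `label_list` should be checked against `label[i].shape`.

```python
def fpn_grid_shapes(input_shape, strides=(32, 16, 8)):
    """Return (grid_height, grid_width) for each FPN output.

    Assumes the standard YOLOv3/YOLOv4 strides and an input_shape
    ordered as (height, width, channels).
    """
    height, width = input_shape[:2]
    shapes = []
    for stride in strides:
        if height % stride or width % stride:
            raise ValueError(
                f"input size {height}x{width} is not divisible by stride {stride}")
        shapes.append((height // stride, width // stride))
    return shapes


print(fpn_grid_shapes((416, 416, 3)))  # [(13, 13), (26, 26), (52, 52)]
```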
YOLOv1, YOLOv2
yolo.vis_img(img[0], label[0])
YOLOv3, YOLOv4
yolo.vis_img(img[0], label[2][0])
YOLOv2
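from utils.kmeans import kmeans, iou_dist
import numpy as np
# Collect (width, height) of every ground-truth box (objectness == 1)
all_boxes = label[label[..., 4] == 1][..., 2:4]
anchors = kmeans(
    all_boxes,
    n_cluster=5,
    dist_func=iou_dist,
    stop_dist=0.00001)
# Sort by area (largest first) so each (width, height) pair stays together
anchors = np.asarray(anchors)
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])[::-1]]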
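The YOLOv2 paper clusters ground-truth box sizes with k-means using the distance d(box, centroid) = 1 - IoU(box, centroid), with both boxes aligned at a shared corner so that only width and height matter. The pure-Python sketch below illustrates that idea. It is not the code in `utils.kmeans`; the initialization (the first k boxes) and the stopping rule (centroids stop changing) are simplifications chosen for clarity.

```python
def wh_iou(a, b):
    """IoU of two (w, h) boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_iou(boxes, k, max_iter=100):
    """Illustrative k-means on (w, h) pairs with distance 1 - IoU."""
    centroids = [tuple(b) for b in boxes[:k]]  # simple deterministic init
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid with the smallest 1 - IoU.
            idx = min(range(k), key=lambda i: 1 - wh_iou(box, centroids[i]))
            clusters[idx].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep an empty cluster's centroid unchanged
                new_centroids.append(centroids[i])
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sort by area, largest first, keeping each (w, h) pair intact.
    return sorted(centroids, key=lambda c: c[0] * c[1], reverse=True)


boxes = [(0.1, 0.1), (0.5, 0.6), (0.1, 0.4), (0.12, 0.1), (0.55, 0.5), (0.12, 0.42)]
print(kmeans_iou(boxes, k=3))
```

Using 1 - IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is the motivation given in the YOLOv2 paper.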
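YOLOv3, YOLOv4
from utils.kmeans import kmeans, iou_dist
import numpy as np
# Collect (width, height) of every ground-truth box in the last label layer
all_boxes = label[-1][label[-1][..., 4] == 1][..., 2:4]
anchors = kmeans(
    all_boxes,
    n_cluster=9,
    dist_func=iou_dist,
    stop_dist=0.00001)
# Sort by area (largest first) so each (width, height) pair stays together
anchors = np.asarray(anchors)
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])[::-1]]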
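In the original YOLOv3 configuration, the nine anchors are ordered by size and split into three groups of three: the largest group is used by the coarsest (stride-32) output and the smallest by the finest (stride-8) output. The sketch below shows that grouping; `group_anchors` is a hypothetical helper, and whether `yolo.create_model` splits the anchors the same way internally is an assumption to verify in the source. The example uses the Darknet YOLOv3 anchors in pixels for readability. Normalizing widths by the input width and heights by the input height scales every area by the same factor, so the grouping is unchanged for normalized anchors.

```python
def group_anchors(anchors, n_layers=3):
    """Split (w, h) anchors into per-layer groups, largest anchors first.

    Hypothetical helper; assumes the usual YOLOv3 convention of giving
    the largest anchors to the coarsest output.
    """
    if len(anchors) % n_layers:
        raise ValueError("number of anchors must be divisible by n_layers")
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1], reverse=True)
    size = len(ordered) // n_layers
    return [ordered[i * size:(i + 1) * size] for i in range(n_layers)]


darknet_anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]
for group in group_anchors(darknet_anchors):
    print(group)
```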
YOLOv1
model = yolo.create_model(bbox_num)
- bbox_num: An integer, the number of bounding boxes predicted per grid cell.
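In the YOLOv1 paper, each cell of an S×S grid predicts B bounding boxes with five values each (x, y, w, h, confidence) plus C class probabilities, so the output tensor has shape S×S×(B·5 + C). With the paper's settings (S = 7, B = 2, C = 20) that is 7×7×30. The snippet below only does this arithmetic, treating `bbox_num` as B; whether this repo's YOLOv1 head orders its channels exactly this way is an assumption.

```python
def yolov1_output_shape(grid_size, bbox_num, num_classes):
    """Output shape (S, S, B * 5 + C) as described in the YOLOv1 paper.

    The channel layout of this repo's head may differ (assumption).
    """
    return (grid_size, grid_size, bbox_num * 5 + num_classes)


shape = yolov1_output_shape(grid_size=7, bbox_num=2, num_classes=20)
print(shape)                           # (7, 7, 30)
print(shape[0] * shape[1] * shape[2])  # 1470
```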
YOLOv2, YOLOv3, YOLOv4
model = yolo.create_model(anchors)
- anchors: 2D array-like, prior anchor boxes (widths, heights); all values should be normalized to 0-1.
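Anchors are often published in pixels for a particular input size (for example, the Darknet YOLOv3 anchors such as 116×90 for a 416×416 input). To meet the 0-1 requirement, divide each width by the input width and each height by the input height. `normalize_anchors` below is a hypothetical helper that assumes `input_shape` is ordered (height, width, channels).

```python
def normalize_anchors(anchors_px, input_shape):
    """Convert pixel (w, h) anchors to values in 0-1.

    Hypothetical helper; assumes input_shape is (height, width, channels).
    """
    height, width = input_shape[:2]
    normalized = [(w / width, h / height) for w, h in anchors_px]
    for w, h in normalized:
        if not (0 < w <= 1 and 0 < h <= 1):
            raise ValueError(f"anchor ({w}, {h}) falls outside 0-1")
    return normalized


print(normalize_anchors([(208, 104), (416, 52)], (416, 416, 3)))
# [(0.5, 0.25), (1.0, 0.125)]
```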
YOLOv1, YOLOv2
from utils.tools import get_class_weight
from tensorflow.keras.optimizers import Adam
# Class weight computed from the objectness channel (index 4) of the labels
binary_weight = get_class_weight(
    label[..., 4:5],
    method='binary'
)
loss = yolo.loss(binary_weight)
metrics = yolo.metrics("obj+iou+recall0.5")
yolo.model.compile(optimizer=Adam(learning_rate=1e-4),
                   loss=loss,
                   metrics=metrics)
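Most grid cells contain no object, so the objectness target is heavily imbalanced, which is why the loss takes a class weight. The exact formula behind `get_class_weight(..., method='binary')` is not documented here. The sketch below shows one common choice, the "balanced" weighting n_samples / (n_classes × count) also used by scikit-learn, applied to a flat list of 0/1 objectness targets; `balanced_binary_weight` is a hypothetical name, not the repo's function.

```python
def balanced_binary_weight(targets):
    """Return (weight_for_0, weight_for_1) using n / (2 * count).

    Hypothetical; one common weighting, not necessarily get_class_weight's formula.
    """
    n = len(targets)
    positives = sum(1 for t in targets if t == 1)
    negatives = n - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative target")
    return n / (2 * negatives), n / (2 * positives)


# A 13 x 13 grid in which 4 cells contain an object
objectness = [1] * 4 + [0] * (13 * 13 - 4)
w0, w1 = balanced_binary_weight(objectness)
print(round(w0, 3), round(w1, 3))  # 0.512 21.125
```

With this weighting, each class contributes the same total weight to the loss, so the few object cells are not drowned out by the background.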
YOLOv3, YOLOv4
from utils.tools import get_class_weight
from tensorflow.keras.optimizers import Adam
# One objectness class weight per FPN output layer
binary_weight_list = [
    get_class_weight(label_i[..., 4:5], method='binary')
    for label_i in label
]
loss = yolo.loss(binary_weight_list)
metrics = yolo.metrics("obj+iou+recall0.5")
yolo.model.compile(optimizer=Adam(learning_rate=1e-4),
                   loss=loss,
                   metrics=metrics)
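The metric string `"obj+iou+recall0.5"` reads as including recall at an IoU threshold of 0.5. As a reminder of what that measures, the sketch below computes the fraction of ground-truth boxes that have at least one prediction with IoU ≥ 0.5. It assumes corner-format boxes (x1, y1, x2, y2) and is a simplified illustration (no one-to-one matching, no class check), not the repo's metric implementation.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Fraction of ground-truth boxes hit by any prediction with IoU >= threshold.

    Simplified: no one-to-one matching and no class check.
    """
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for gt in gt_boxes
        if any(box_iou(gt, p) >= iou_threshold for p in pred_boxes)
    )
    return hits / len(gt_boxes)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(recall_at(gt, pred))  # 0.5
```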
Train with arrays
yolo.model.fit(
    img,
    label,
    epochs=epochs)
Train with a tf.keras.utils.Sequence
yolo.model.fit(
    seq,
    epochs=epochs)
YOLOv1, YOLOv2
from utils.measurement import create_score_mat
prediction = yolo.model.predict(img)
# Visualize one image with its predicted boxes
yolo.vis_img(
    img[0], prediction[0],
    nms_mode=2)
# Compute the score matrix over the whole dataset and print it
score_mat = create_score_mat(
    label,
    prediction,
    class_names=yolo.class_names,
    nms_mode=2,
    version=1  # or version=2
)
print(score_mat)
YOLOv3, YOLOv4
from utils.measurement import create_score_mat
prediction = yolo.model.predict(img)
# Visualize one image with its predicted boxes from all three FPN outputs
yolo.vis_img(
    img[0],
    prediction[2][0],
    prediction[1][0],
    prediction[0][0],
    nms_mode=2)
# Compute the score matrix over the whole dataset and print it
score_mat = create_score_mat(
    label[-1],
    prediction[2],
    prediction[1],
    prediction[0],
    class_names=yolo.class_names,
    nms_mode=2,
    version=3)
print(score_mat)
- nms_mode: An integer, one of:
- 0: Do not use NMS.
- 1: Use NMS.
- 2: Use Soft-NMS.
- 3: Use DIoU-NMS.
- version: An integer specifying the decoding method for YOLOv1, 2, 3, or 4.
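The `nms_mode` options differ in how overlapping boxes are suppressed. Standard NMS keeps the highest-scoring box and discards every remaining box whose IoU with it exceeds a threshold. Soft-NMS instead decays the scores of overlapping boxes, here with the Gaussian rule score × exp(-IoU² / σ). DIoU-NMS replaces IoU with IoU - d²/c², where d is the distance between the box centers and c is the diagonal of the smallest box enclosing both. The pure-Python sketch below implements these three rules for single-class, corner-format boxes; the thresholds and σ = 0.5 are illustrative defaults, not values taken from this repo.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """IoU minus squared center distance over squared enclosing-box diagonal."""
    dx = (a[0] + a[2] - b[0] - b[2]) / 2
    dy = (a[1] + a[3] - b[1] - b[3]) / 2
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    diag2 = cw ** 2 + ch ** 2
    return iou(a, b) - ((dx ** 2 + dy ** 2) / diag2 if diag2 > 0 else 0.0)


def suppress(boxes, scores, mode="nms", iou_threshold=0.5, sigma=0.5,
             score_threshold=0.001):
    """Return indices of kept boxes, best first. mode: 'nms', 'soft' or 'diou'.

    Illustrative thresholds; not taken from this repo.
    """
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        survivors = []
        for i in remaining:
            if mode == "soft":
                # Gaussian Soft-NMS: decay the score instead of discarding the box.
                scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
                if scores[i] >= score_threshold:
                    survivors.append(i)
            else:
                overlap = diou if mode == "diou" else iou
                if overlap(boxes[best], boxes[i]) <= iou_threshold:
                    survivors.append(i)
        remaining = survivors
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(suppress(boxes, scores))               # [0, 2]
print(suppress(boxes, scores, mode="soft"))  # [0, 2, 1]
```

Soft-NMS keeps the second, heavily overlapping box with a reduced score rather than deleting it, which can help recall when objects genuinely overlap.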
YOLOv1, YOLOv2
from utils.measurement import PR_func
pr = PR_func(
label,
prediction,
class_names=yolo.class_names,
max_per_img=100,
version=1 # or version=2
)
pr.plot_pr_curve(smooth=False)
pr.get_map(mode="voc2012")
YOLOv3, YOLOv4
from utils.measurement import PR_func
pr = PR_func(
label[-1],
prediction[2],
prediction[1],
prediction[0],
class_names=yolo.class_names,
max_per_img=100,
version=3
)
pr.plot_pr_curve(smooth=False)
pr.get_map(mode="voc2012")
- max_per_img: An integer, the maximum number of detections kept per image.
- version: An integer specifying the decoding method for YOLOv1, 2, 3, or 4.
- smooth: A boolean; if True, plot the interpolated precision.
- mode: A string, one of "voc2007", "voc2012" (default), "area", "smootharea".
- "voc2007": average the precision at the 11 recall levels [0, 0.1, ..., 1].
- "voc2012": average the precision over the recall levels reached on the curve, e.g. [0, 0.14, 0.29, 0.43, 0.57, 0.71, 1].
- "area": compute the area under the precision-recall curve.
- "smootharea": compute the area under the interpolated precision-recall curve.
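The VOC-style AP variants differ mainly in where precision is sampled. The PASCAL VOC devkit first makes precision non-increasing by replacing each value with the maximum precision at any equal or higher recall (interpolated precision). The VOC2007 11-point AP then averages that interpolated precision at recall 0, 0.1, ..., 1, while the all-point AP used by later VOC devkits sums precision × recall step over every recall value, which equals the area under the interpolated curve. The sketch below follows that devkit logic; it is an illustration, and how `PR_func` maps onto these computations (ties, sorting, the exact recall points) is not confirmed here.

```python
def interpolated(precisions):
    """Make precision non-increasing from right to left (max precision at recall >= r)."""
    out = list(precisions)
    for i in range(len(out) - 2, -1, -1):
        out[i] = max(out[i], out[i + 1])
    return out


def ap_11_point(recalls, precisions):
    """VOC2007-style AP: mean interpolated precision at recall 0, 0.1, ..., 1.

    recalls must be non-decreasing, paired with precisions.
    """
    total = 0.0
    for t in [i / 10 for i in range(11)]:
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


def ap_all_point(recalls, precisions):
    """All-point interpolated AP: area under the interpolated PR curve."""
    r = [0.0] + list(recalls) + [1.0]
    p = interpolated([0.0] + list(precisions) + [0.0])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# 4 ground-truth objects; detections ranked TP, TP, FP, TP, FP
recalls = [0.25, 0.5, 0.5, 0.75, 0.75]
precisions = [1.0, 1.0, 2 / 3, 0.75, 0.6]
print(ap_all_point(recalls, precisions))  # 0.6875
print(ap_11_point(recalls, precisions))
```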