Computer Vision: Pascal VOC dataset

Instance Segmentation and detection of Pascal VOC objects

A Ascencio-Cabral

Models

Faster-RCNN-50-FPN -pretrained off-the-box pytorch network
Mask-RCNN-50-FPN - pretrained off-the-box pytorch network
Mask-RCNN-101-FPN - built with pretrained backbone on ImageNet dataset
Mask-RCNN-101-FPN with customised anchors sizes=(16, 32, 64, 128, 256, 512) - built with pretrained backbone on ImageNet dataset

Evaluation - Coco style metrics

Mean Average Precision (AP or mAP) at IoU [0.5, 0.05, 0.95], 0.75 and 0.50

1. Introduction

With the developments in deep learning, the applicability of computer vision has been widely spread in fields such as robotics, image search, recognition and autonomous driving. In this work Mask-RCNN-ResNet50-FPN, Mask-RCNN-ResNet101-FPN, Mask-RCNN-ResNet101-FPN with customised anchor sizes and Faster-ResNet50-FPN were used for the instance semantic segmentation and object detection on the PASCAL VOC 2012 dataset.

3. Methods

3.1 Environment

A python environment was setup and the experiments were built using Pytorch.

3.2 Datasets

For this project only the images, annotations and segmented class masks of the Pascal VOC 2012 kit dataset were used [1]. This dataset has 21 classes including the background. The dataset contained in total 2913 images with annotations and ground truth. The holdout method with proportion of 80:10:10 for training, validation and test of the models.

voc_classes: ' __background__ ','aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus','car','cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'

3.3 Training

The approach to training was transfer learning and fine tuning the last three layers of the models. The models were trained to minimize the loss with SGD and Adam optimizers, a weght decay of 0.0001 and learning rates of 0.001, 0.005 and 0.0001 with a step scheduler for 5 steps and gamma of 0.2. All models were trained on google colab.

3.4 Evaluation metrics

The performance of the model was measured after each epoch training on the validation subset and on the test subset after training. The coco style mean average precision was measure at IoU thresholds [0:50,0:05,0:95], 0.50 and 0.75 [2]. The mean average precision (mAP) was computed for all classes and per each class on the test subset [3].

4. Results - Coco style metrics

Tables 1-2 show the mean average precision of the best models for all classes and per class, respectively. Mask-RCNN-ResNet-50-FPN pretrained with the v2 of the parameters hit the highest mAP for segmentation and detection at all intersections over union (IoU) thresholds (Table 1). It can be observed that the custom size of the anchors had a positive effect on increasing the mAP for Mask-RCNN-101-FPN for both v1 and v2 of the weights (Table 1). This maybe has contributed to detecting occluded or difficult-to-detect objects in images. The Mask-RCNN models' performance was higher with the v2 (Table 1-2). Except for the aeroplane and boat, Mask-RCNN-ResNet-50 FPN all classes obtained the highest detection accuracies (Table 2)

Table 1. Best models performance on the test dataset. The mean average precision is a percentage for all Pascal VOC classes 2012. Mask-RCNN-ResNet101-FPN-CA has customised anchors sizes=(16, 32, 64, 128, 256, 512). The best results are shown in bold.

Network	Epochs	lr	Optimizer	mAP @IoU [0:50,0:05,0:95] detection	mAP @IoU=50 detection	mAP @IoU=75 detection	mAP @IoU [0:50,0:05,0:95] segmentation	mAP @IoU=50 segmentation	mAP @IoU=75 segmentation
Faster-RCNN-ResNet50-FPN	15	0.005	SGD	51.7	79.1	56.13	NA	NA	NA
Mask-RCNN-ResNet50-FPN	15	0.005	SGD	53.6	78.2	62.43	44.56	71.43	47.7
Mask-RCNN-ResNet101-FPN	20	0.0001	Adam	42.1	72.2	44.4	38.2	64.2	39.8
Mask-RCNN-ResNet101-FPN-CA^&	20	0.0001	Adam	43.7	69.9	48.8	39.2	63.9	42.5
Mask-RCNN-ResNet50-FPN^&	20	0.005	SGD	73.51	92.08	84.86	66.7	91.04	75.01
Mask-RCNN-ResNet101-FPN^&	20	0.0001	Adam	42.82	80.0	52.18	49.62	78.25	55.29
Mask-RCNN-ResNet101-FPN-CA^&	20	0.0001	Adam	60.97	85.62	68.61	56.03	83.67	61.25

^& Results were obtained by fine-tuning the models using the weight version 2.

Table 2. Best models performance on the test dataset. The mean average precision is given per each object class of the Pascal VOC 2012 dataset. All results are shown in percentages. Mask-RCNN-ResNet101-FPN-CA has customised anchors sizes=(16, 32, 64, 128, 256, 512).

Model	Task	aeroplane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	diningtable	dog	horse	motorbike	person	pottedplant	sheep	sofa	train	tvmonitor
Faster-RCNN-ResNet50-FPN	detection	92.5	68.17	89.73	76.43	69.0	92.32	81.42	85.87	48.28	78.64	53.57	84.38	88.61	93.32	89.21	71.39	57.67	80.08	94.1	87.68
Mask-RCNN-ResNet50-FPN^&	detection	94.91	95.45	93.16	93.03	86.3	99.84	93.23	93.72	84.78	93.18	91.42	94.26	97.36	95.78	94.35	82.14	90.05	80.49	98.77	89.43
Mask-RCNN-ResNet50-FPN^&	segmentation	95.79	95.91	93.16	85.4	86.12	99.84	93.93	93.72	80.58	93.99	89.91	92.57	97.36	95.78	94.21	83.61	90.05	70.72	98.77	89.43
Mask-RCNN-ResNet101-FPN^&	detection	81.08	86.14	81.77	86.67	68.85	95.55	75.43	83.41	69.79	71.82	73.77	82.97	74.76	86.72	89.02	78.17	81.37	58.37	86.88	87.46
Mask-RCNN-ResNet101-FPN^&	segmentation	83.99	75.4	83.76	84.12	64.91	94.15	74.92	83.41	61.93	71.82	66.31	82.77	73.26	86.68	87.67	80.03	83.24	53.21	86.88	86.46
Mask-RCNN-ResNet101-FPN-CA^&	detection	97.11	86.02	92.59	94.73	67.88	89.2	87.47	89.2	78.35	84.82	84.95	82.84	88.24	92.61	90.93	76.63	81.53	68.6	93.47	85.28
Mask-RCNN-ResNet101-FPN-CA^&	segmentation	97.11	73.47	94.68	92.53	67.88	89.2	87.47	89.97	69.28	85.93	69.15	79.87	87.69	93.85	90.3	78.02	82.42	66.41	94.65	83.53

^& Results by class were obtained by fine-tuning the models using the weight version 2.

4.1 Inference

Models inference on the test subset are depicted in Figure 1.

(a)

(b)

(c)

(d)

Figure 1. Trained Models inference on the test set. (a) Faster-RCNN-ResNet-50-FPN. (b) Mask-RCNN-ResNet50-FPN. (c) Mask-RCNN-ResNet101-FPN. (d) Mask-RCNN-ResNet101-FPN-CA.

References

[1] The PASCAL Visual Object Classes Homepage’. http://host.robots.ox.ac.uk/pascal/VOC/

[2] COCO - Common Objects in Context’. https://cocodataset.org/#detection-eval (accessedAug. 02, 2020).

[3] K. Morabia, J. Arora, and T. Vijaykumar, ‘Attention-based Joint Detection of Object and Semantic Part’, arXiv:2007.02419 [cs], Jul. 2020, Accessed: Jul. 02, 2020. [Online]. Available: http://arxiv.org/abs/2007.02419

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
inference		inference
utility		utility
README.md		README.md
pascal_dataset.py		pascal_dataset.py
voc_train_eval.ipynb		voc_train_eval.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Computer Vision: Pascal VOC dataset

Instance Segmentation and detection of Pascal VOC objects

A Ascencio-Cabral

Models

Evaluation - Coco style metrics

1. Introduction

3. Methods

3.1 Environment

3.2 Datasets

3.3 Training

3.4 Evaluation metrics

4. Results - Coco style metrics

4.1 Inference

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Wb-az/Image-Processing-PascalVOC

Folders and files

Latest commit

History

Repository files navigation

Computer Vision: Pascal VOC dataset

Instance Segmentation and detection of Pascal VOC objects

A Ascencio-Cabral

Models

Evaluation - Coco style metrics

1. Introduction

3. Methods

3.1 Environment

3.2 Datasets

3.3 Training

3.4 Evaluation metrics

4. Results - Coco style metrics

4.1 Inference

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages