This work was accepted for full paper presentation at the 2023 International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2023), held virtually and in person in Pilsen, Czech Republic:
- The final version of our paper (as published in Computer Science Research Notes) can be accessed via this link.
- Our dataset of mirrors and reflective surfaces is publicly released for future researchers.
If you find our work useful, please consider citing:
@ARTICLE{2023-E59,
  author={Gonzales, Mark Edward M. and Uy, Lorene C. and Ilao, Joel P.},
  title={Designing a Lightweight Edge-Guided Convolutional Neural Network for Segmenting Mirrors and Reflective Surfaces},
  journal={Computer Science Research Notes},
  year={2023},
  volume={3301},
  pages={107-116},
  doi={10.24132/CSRN.3301.14},
  publisher={Union Agency, Science Press},
  issn={2464-4617},
  abbrev_source_title={CSRN},
  document_type={Article},
  source={Scopus}
}
This repository is also archived on Zenodo.
ABSTRACT: The detection of mirrors is a challenging task due to their lack of a distinctive appearance and the visual similarity of reflections with their surroundings. While existing systems have achieved some success in mirror segmentation, the design of lightweight models remains unexplored, and datasets are mostly limited to clear mirrors in indoor scenes. In this paper, we propose a new dataset consisting of 454 images of outdoor mirrors and reflective surfaces. We also present a lightweight edge-guided convolutional neural network based on PMDNet. Our model uses EfficientNetV2-Medium as its backbone and employs parallel convolutional layers and a lightweight convolutional block attention module to capture both low-level and high-level features for edge extraction. It registered maximum F-measure scores of 0.8483, 0.8117, and 0.8388 on the Mirror Segmentation Dataset (MSD), Progressive Mirror Detection (PMD) dataset, and our proposed dataset, respectively. Applying filter pruning via geometric median resulted in maximum F-measure scores of 0.8498, 0.7902, and 0.8456, respectively, performing competitively with the state-of-the-art PMDNet but with 78.20× fewer floating-point operations per second and 238.16× fewer parameters.
INDEX TERMS: Mirror segmentation, Object detection, Convolutional neural network (CNN), CNN filter pruning
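For readers unfamiliar with the attention mechanism mentioned in the abstract, below is a minimal sketch of a convolutional block attention module (CBAM) in the standard formulation of Woo et al. It is a generic reference implementation, not the exact lightweight variant used in `pmd.py`.

```python
# Generic CBAM-style block: channel attention followed by spatial attention.
# This is a reference sketch only; the lightweight module in pmd.py may differ.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)   # channel-wise max map
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.spatial(self.channel(x))
```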
Run the following command to train the unpruned model:
`python train.py`
- The images should be saved in `<training_path>/image`.
- The ground-truth masks should be saved in `<training_path>/mask`.
- The ground-truth edge maps should be saved in `<training_path>/edge`.
- The training checkpoints will be saved in `<checkpoint_path>`.

`training_path` and `checkpoint_path` can be set in `config.py`.
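For reference, the relevant entries in `config.py` might look like the following; the path values here are placeholders, not the repository defaults:

```python
# Hypothetical values; point these at your own dataset and checkpoint folders.
training_path = "./data/train"      # must contain image/, mask/, and edge/ subfolders
checkpoint_path = "./checkpoints"   # training checkpoints are written here
```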
To retrain the pruned model, follow the instructions in `prune.py`.
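For orientation, filter pruning via geometric median (the technique used to obtain the pruned model) is available in NNI as `FPGMPruner`. The sketch below follows the NNI 2.x API and is a generic example, not the exact workflow in `prune.py`; `build_model()`, the sparsity value, and the input resolution are placeholders.

```python
# Generic filter pruning via geometric median with NNI (2.x-style imports).
import torch
from nni.compression.pytorch.pruning import FPGMPruner
from nni.compression.pytorch.speedup import ModelSpeedup

model = build_model()  # hypothetical constructor for the network to be pruned

# Prune 50% of the filters in every Conv2d layer (illustrative sparsity only).
config_list = [{"sparsity": 0.5, "op_types": ["Conv2d"]}]

pruner = FPGMPruner(model, config_list)
_, masks = pruner.compress()   # rank filters by geometric median and build masks
pruner._unwrap_model()         # remove the pruner's wrappers before speedup

# Physically remove the masked filters to reduce FLOPs and parameter count.
dummy_input = torch.rand(1, 3, 416, 416)   # placeholder input resolution
ModelSpeedup(model, dummy_input, masks).speedup_model()

# The pruned model is then fine-tuned (retrained) with a normal training loop.
```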
Run the following command to perform prediction using the unpruned model:
`python predict.py`
Run the following command to perform prediction using the pruned model:
`python prune.py`
- The images should be saved in `<testing_path>/<dataset_name>/image`.
- The file path to the unpruned model weights should be `<weights_path>`.
- The file path to the pruned model weights should be `<pruned_weights_path>`.
- The predicted masks will be saved in `<result_path>/<dataset_name>`.

`testing_path`, `dataset_name`, `weights_path`, `pruned_weights_path`, and `result_path` can be set in `config.py`.
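As with training, the prediction-related entries in `config.py` might look like the following; the values are placeholders for illustration:

```python
# Hypothetical values; adjust to your own directory layout and weight files.
testing_path = "./data/test"                  # expects <dataset_name>/image subfolders
dataset_name = "MSD"                          # e.g., MSD, PMD, or the proposed dataset
weights_path = "./weights/unpruned.pth"       # unpruned model weights
pruned_weights_path = "./weights/pruned.pth"  # pruned model weights
result_path = "./results"                     # masks go to <result_path>/<dataset_name>
```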
Run the following command to perform model evaluation:
`python misc.py`
- The predicted masks should be saved in `<result_path>/<dataset_name>`.
- The ground-truth masks should be saved in `<testing_path>/<dataset_name>/mask`.

`result_path`, `testing_path`, and `dataset_name` can be set in `config.py`.
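For orientation, the maximum F-measure reported in the abstract is the F-measure maximized over binarization thresholds, with the weighting of beta-squared = 0.3 commonly used in salient object and mirror segmentation. The sketch below is a generic reference implementation; `misc.py` may compute the metric differently.

```python
# Generic maximum F-measure (beta^2 = 0.3) over 256 binarization thresholds.
import numpy as np

def max_f_measure(pred, gt, beta_sq=0.3, eps=1e-8):
    """pred: float array in [0, 1]; gt: binary array of the same shape."""
    gt = gt.astype(bool)
    best = 0.0
    for threshold in np.linspace(0.0, 1.0, 256):
        binary = pred >= threshold
        tp = np.logical_and(binary, gt).sum()
        precision = tp / (binary.sum() + eps)
        recall = tp / (gt.sum() + eps)
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall + eps)
        best = max(best, f)
    return best
```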
By default, `train.py`, `predict.py`, and `prune.py` use the model defined in `pmd.py`, which employs an EfficientNetV2-Medium backbone and our proposed edge extraction and fusion module.

To explore the other feature extraction backbones that we considered in our experiments, refer to the models in `models_experiments` and the weights in this Drive:
| Model | Weights |
|---|---|
| [Best] EfficientNetV2-Medium | Link |
| [Best, Pruned] EfficientNetV2-Medium | Link |
| ResNet-50 | Link |
| ResNet-50 (+ PMD's original EDF module) | Link |
| Xception-65 | Link |
| VoVNet-39 | Link |
| MobileNetV3 | Link |
| EfficientNet-Lite | Link |
| EfficientNetEdge-Large | Link |
EDF stands for edge detection and fusion.
Note: With the exception of ResNet-50 (+ PMD's original EDF module), the models in the table above use our proposed edge extraction and fusion module.
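If you want to inspect the multi-scale features that an EfficientNetV2-Medium backbone produces, a minimal sketch using `timm` (PyTorch Image Models, listed among the libraries used in this project) is shown below. The variant name `tf_efficientnetv2_m` and the input resolution are assumptions for illustration, not values taken from `pmd.py`.

```python
# Extract multi-scale backbone features with timm; names/sizes are illustrative.
import timm
import torch

backbone = timm.create_model("tf_efficientnetv2_m", pretrained=True, features_only=True)
dummy = torch.randn(1, 3, 416, 416)   # placeholder input resolution
features = backbone(dummy)            # list of feature maps at decreasing resolutions

for level, fmap in enumerate(features):
    print(level, tuple(fmap.shape))   # inspect channels and spatial size per level
```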
Our proposed dataset, DLSU-OMRS (De La Salle University – Outdoor Mirrors and Reflective Surfaces), can be downloaded from this link. The images have their respective licenses, and the ground-truth masks are licensed under the BSD 3-Clause "New" or "Revised" License. The use of this dataset is restricted to noncommercial purposes only.
The split PMD dataset, which we used for model training and evaluation, can be downloaded from this link. Our use of this dataset is under the BSD 3-Clause "New" or "Revised" License.
The following Python libraries and modules (other than those that are part of the Python Standard Library) were used:
| Library/Module | Description | License |
|---|---|---|
| PyTorch | Provides tensor computation with strong GPU acceleration and deep neural networks built on a tape-based autograd system | BSD 3-Clause License |
| PyTorch Image Models | Collection of state-of-the-art computer vision models, layers, and utilities | Apache License 2.0 |
| Neural Network Intelligence | Provides tools for hyperparameter optimization, neural architecture search, model compression, and feature engineering | MIT License |
| Pillow | Provides functions for opening, manipulating, and saving image files | Historical Permission Notice and Disclaimer |
| scikit-image | Provides algorithms for image processing | BSD 3-Clause "New" or "Revised" License |
| PyDenseCRF | Python wrapper for dense (fully connected) conditional random fields with Gaussian edge potentials | MIT License |
| tqdm | Allows the creation of progress bars by wrapping around any iterable | Mozilla Public License (MPL) v. 2.0, MIT License |
| NumPy | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License |
| TensorBoardX | Provides visualization and tooling needed for machine learning experimentation | MIT License |
The descriptions are taken from their respective websites.
Note: Although PyDenseCRF can be installed via `pip` or its official repository, we recommend that Windows users install it by running `setup.py` inside the `pydensecrf` directory of our repository to prevent potential issues with `Eigen.cpp` (refer to this issue for additional details).
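For reference, a typical dense-CRF refinement step with PyDenseCRF looks roughly like the sketch below; the two-class setup and the kernel parameters are illustrative assumptions, not the exact values used in this repository.

```python
# Generic dense-CRF post-processing of a binary (mirror / background) probability map.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob, iterations=5):
    """image: HxWx3 uint8 RGB array; prob: HxW float array of mirror probabilities."""
    h, w = prob.shape
    softmax = np.stack([1.0 - prob, prob], axis=0)   # shape (2, H, W)
    unary = unary_from_softmax(softmax)              # negative log-probabilities

    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary)
    d.addPairwiseGaussian(sxy=3, compat=3)           # smoothness kernel
    d.addPairwiseBilateral(sxy=60, srgb=5, rgbim=np.ascontiguousarray(image), compat=5)

    q = d.inference(iterations)                      # mean-field inference
    return np.argmax(np.array(q).reshape(2, h, w), axis=0).astype(np.uint8)
```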
Attributions for reference source code are provided in the individual Python scripts.
- Mark Edward M. Gonzales (mark_gonzales@dlsu.edu.ph)
- Lorene C. Uy (lorene_c_uy@dlsu.edu.ph)
- Dr. Joel P. Ilao (joel.ilao@dlsu.edu.ph)
This is the major course output in a computer vision class for master's students under Dr. Joel P. Ilao of the Department of Computer Technology, De La Salle University. The task is to create an eight-week small-scale project that applies computer vision-based techniques to present a solution to an identified research problem.