# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: SaRLVision
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Matthias
    family-names: Bartolo
    email: matthias.bartolo.21@um.edu.mt
    affiliation: University of Malta
    orcid: 'https://orcid.org/0009-0006-1353-4556'
  - given-names: Dylan
    family-names: Seychell
    email: dylan.seychell@um.edu.mt
    affiliation: University of Malta
    orcid: 'https://orcid.org/0000-0002-2377-9833'
  - given-names: Josef
    family-names: Bajada
    email: josef.bajada@um.edu.mt
    affiliation: University of Malta
    orcid: 'https://orcid.org/0000-0002-8274-6177'
repository-code: 'https://github.com/mbar0075/SaRLVision'
abstract: >-
  In an era where sustainability and transparency are paramount,
  the importance of effective object detection algorithms,
  pivotal for enhancing efficiency, safety, and automation across
  various domains, cannot be overstated. While algorithms such as
  YOLO and Faster R-CNN are notably fast, they unfortunately lack
  transparency in their decision-making processes. This study
  presents a series of object detection experiments that combine
  reinforcement learning-based visual attention methods with
  saliency ranking techniques, in an effort to investigate
  transparent and sustainable solutions. By employing saliency
  ranking techniques that emulate human visual perception, the
  reinforcement learning agent is provided with an initial
  bounding box prediction. The agent then iteratively refines
  this prediction by selecting from a finite set of actions over
  multiple time steps, ultimately achieving accurate object
  detection. This research also investigates various image
  feature extraction methods and explores diverse Deep Q-Network
  (DQN) architectural variations for training deep reinforcement
  learning-based localisation agents. Additionally, it focuses on
  optimising the pipeline at every juncture by prioritising
  lightweight and faster models. Another feature of the proposed
  system is the classification of detected objects, a capability
  absent in previous reinforcement learning approaches. After
  these agents were evaluated on the Pascal VOC 2007 dataset,
  faster and more optimised models were developed. Notably, the
  best mean Average Precision (mAP) achieved in this study was
  51.4, surpassing benchmarks from RL-based single object
  detectors in the literature. The designed system provides a
  distinct edge over previous methods by allowing multiple
  configurable real-time visualisations. These visualisations
  offer users a clear view of the current bounding box
  coordinates and the types of actions being performed, both of
  which enable a more intuitive understanding of algorithmic
  decisions. Ultimately, this fosters trust and transparency in
  object detection systems, aiding the deployment of artificial
  intelligence techniques in high-risk areas while continuously
  advancing research in the field of AI.
keywords:
  - Artificial Intelligence
  - Object Detection
  - Computer Vision
  - Reinforcement Learning
  - Saliency Ranking
  - Deep Learning
  - Self-Explaining AI
license: MIT
version: 1.0.0
date-released: '2024-05-12'
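
For readers less familiar with this class of detector, the refinement loop summarised in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the SaRLVision implementation: the action set, the Box/apply_action/refine names, the 0.1 step size, and the 50-step cap are all assumptions, and select_action merely stands in for the trained DQN policy described above.

from dataclasses import dataclass

@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

# Hypothetical finite action set: shift the box, rescale it, or stop.
ACTIONS = ("left", "right", "up", "down", "bigger", "smaller", "stop")

def apply_action(box: Box, action: str, step: float = 0.1) -> Box:
    """Apply one discrete transformation to the current bounding box."""
    dx, dy = step * box.w, step * box.h
    if action == "left":
        return Box(box.x - dx, box.y, box.w, box.h)
    if action == "right":
        return Box(box.x + dx, box.y, box.w, box.h)
    if action == "up":
        return Box(box.x, box.y - dy, box.w, box.h)
    if action == "down":
        return Box(box.x, box.y + dy, box.w, box.h)
    if action == "bigger":
        return Box(box.x - dx / 2, box.y - dy / 2, box.w + dx, box.h + dy)
    if action == "smaller":
        return Box(box.x + dx / 2, box.y + dy / 2, box.w - dx, box.h - dy)
    return box  # "stop" leaves the box unchanged

def refine(initial_box: Box, select_action, max_steps: int = 50) -> Box:
    """Iteratively refine an initial (e.g. saliency-ranked) bounding box.

    select_action stands in for the trained policy: it maps the current
    box (and, in practice, image features) to one action name.
    """
    box = initial_box
    for _ in range(max_steps):
        action = select_action(box)
        if action == "stop":
            break
        box = apply_action(box, action)
    return box

In the actual system the policy would consume image features cropped to the current box and be trained with deep Q-learning; the sketch only fixes the control flow the abstract describes: saliency-ranked initialisation, discrete refinement actions over multiple time steps, and an explicit stop action.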