This repository contains LUPI-OD, the first method to apply Learning Using Privileged Information (LUPI) to Object Detection (OD). It improves performance without increasing model size, making it ideal for applications that demand lightweight, efficient solutions.
Datasets Used for Evaluation: three UAV-based litter detection datasets and the Pascal VOC 2012 benchmark.
Popular Object Detection Models Used: Faster R-CNN, RetinaNet, FCOS, SSD, and SSD Lite.
Object detection is widely recognised as a foundational task within computer vision, with applications spanning automation, medical imaging, and surveillance. Although numerous models and methods have been developed, attaining high detection accuracy often requires the utilisation of complex model architectures, especially those based on transformers. These models typically demand extensive computational resources for inference and large-scale annotated datasets for training, both of which contribute to the overall difficulty of the task.
To address these challenges, this work introduces a novel methodology that incorporates the Learning Using Privileged Information (LUPI) paradigm into the object detection domain. The proposed approach is compatible with any object detection architecture and operates by providing privileged information to a teacher model during training. This information is then distilled into a student model, resulting in more robust learning and improved generalisation without increasing the model's parameter count or complexity.
The methodology is evaluated on general-purpose object detection tasks and a focused case study involving litter detection in visually complex, highly variable outdoor environments. These scenarios are especially challenging due to the target objects' small size and inconsistent appearance. Evaluation is conducted both within individual datasets and across multiple datasets to assess consistency and generalisation. A total of 120 models are trained, covering five well-established object detection architectures. Four datasets are used in the evaluation: three focused on UAV-based litter detection and one drawn from the Pascal VOC 2012 benchmark to assess performance in multi-label detection and generalisation.
Experimental results demonstrate improvements in detection accuracy across all model types and dataset conditions when employing the LUPI framework. Notably, the approach yields increases of 0.02 to 0.15 in the strict mean Average Precision (mAP@50-95) metric, highlighting its robustness across both general-purpose and domain-specific tasks. Performance improvements were observed in nearly all cases, and the methodology achieves them without increasing the number of parameters or altering the model architecture, supporting its viability as a lightweight and effective modification to existing object detection systems.
This method leverages the Learning Using Privileged Information (LUPI) paradigm to boost object detection performance by providing extra supervision during training. Privileged information is fed to a teacher model and then distilled into a student model. The key steps are:
- **Generating Privileged Information:** For every image, a single-channel bounding box mask is created as additional supervisory input (a minimal sketch of this step follows the list).
- **Training the Teacher Model:** The teacher model receives both the original image and the privileged mask as multi-channel input. It is trained to predict object classes alongside the bounding box masks.
- **Distilling Knowledge to the Student Model:** The student model learns from the teacher's soft labels. A loss function based on the cosine distance between the final backbone layer features of both models guides the student to match the teacher's internal representations (see the sketch after the flowchart below).
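As a concrete illustration of Step 1, the following is a minimal sketch of how such a bounding box mask could be generated, assuming axis-aligned ground-truth boxes given as `[x_min, y_min, x_max, y_max]` in pixel coordinates; the function name and box format are illustrative and not taken from this repository's code.

```python
import numpy as np

def make_box_mask(image_height, image_width, boxes):
    """Build a single-channel binary mask that is 1 inside every
    ground-truth bounding box (the privileged information channel)."""
    mask = np.zeros((image_height, image_width), dtype=np.float32)
    for x_min, y_min, x_max, y_max in boxes:
        x0, y0 = max(int(x_min), 0), max(int(y_min), 0)
        x1, y1 = min(int(x_max), image_width), min(int(y_max), image_height)
        mask[y0:y1, x0:x1] = 1.0  # fill the whole box region
    return mask

# Example: two annotated objects in a 480x640 image
mask = make_box_mask(480, 640, [[10, 20, 100, 120], [300, 200, 420, 330]])
```

The full pipeline is summarised in the flowchart below.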
```mermaid
%%{init: {
  "themeVariables": {
    "fontSize": "16px",
    "edgeLabelFontSize": "14px",
    "edgeLabelColor": "#37474F",
    "primaryColor": "#6A1B9A",
    "primaryBorderColor": "#4A148C",
    "secondaryColor": "#81C784",
    "secondaryBorderColor": "#388E3C",
    "tertiaryColor": "#FFB74D",
    "tertiaryBorderColor": "#F57C00",
    "background": "#FFFFFF",
    "textColor": "#212121"
  }
}}%%
flowchart TD
    classDef privileged fill:#FFB74D,stroke:#F57C00,stroke-width:2px,color:#5D4037,font-weight:bold;
    classDef teacher fill:#81C784,stroke:#388E3C,stroke-width:2px,color:#1B5E20,font-weight:bold;
    classDef student fill:#6A1B9A,stroke:#4A148C,stroke-width:2px,color:#D1C4E9,font-weight:bold;
    classDef step fill:#E3F2FD,stroke:#90CAF9,stroke-width:1px,color:#0D47A1;

    %% Step 1 - Privileged Info
    PI[Step 1: Generate Privileged Information]:::privileged
    PI1[Create single-channel bounding box mask for each image]:::step

    %% Step 2 - Teacher Model Training
    TM[Step 2: Train Teacher Model]:::teacher
    TM1["Input: Original image and privileged mask (multi-channel)"]:::step
    TM2[Output: Predict object classes and bounding box masks]:::step

    %% Step 3 - Student Distillation
    SK[Step 3: Distill Knowledge to Student Model]:::student
    SK1[Train student on teacher's soft labels]:::step
    SK2[Use cosine distance loss on latent features to align representations]:::step

    %% Layout connections
    PI --> PI1
    PI1 --> TM
    TM --> TM1
    TM --> TM2
    TM1 --> SK
    TM2 --> SK
    SK --> SK1
    SK --> SK2
```
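To make Steps 2 and 3 concrete, the sketch below shows one way the teacher-student pairing and the cosine-distance feature loss could be wired up in PyTorch. It is a simplified illustration resting on several assumptions: plain `torchvision` ResNet-50 backbones stand in for the detectors' backbones, the teacher's first convolution is widened to accept the extra mask channel, and all names are hypothetical rather than taken from the repository code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

def widen_first_conv(backbone, in_channels=4):
    """Replace the first conv so the teacher accepts image + mask (4 channels)."""
    old = backbone.conv1
    new = nn.Conv2d(in_channels, old.out_channels, kernel_size=old.kernel_size,
                    stride=old.stride, padding=old.padding, bias=old.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = old.weight                            # keep the RGB filters
        new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # init the mask channel
    backbone.conv1 = new
    return backbone

teacher = widen_first_conv(resnet50(weights=None))  # sees image + privileged mask
student = resnet50(weights=None)                    # sees the image only

def backbone_features(model, x):
    """Final backbone-layer features (output of layer4), pooled per sample."""
    x = model.maxpool(model.relu(model.bn1(model.conv1(x))))
    x = model.layer4(model.layer3(model.layer2(model.layer1(x))))
    return torch.flatten(F.adaptive_avg_pool2d(x, 1), 1)

def cosine_distance_loss(student_feat, teacher_feat):
    """1 - cosine similarity, averaged over the batch."""
    return (1.0 - F.cosine_similarity(student_feat, teacher_feat, dim=1)).mean()

# One toy distillation step; in the full method this alignment term is combined
# with the student's usual detection losses on the teacher's soft labels.
images = torch.randn(2, 3, 224, 224)   # dummy RGB batch
masks = torch.rand(2, 1, 224, 224)     # dummy privileged bounding box masks
with torch.no_grad():
    t_feat = backbone_features(teacher, torch.cat([images, masks], dim=1))
s_feat = backbone_features(student, images)
loss = cosine_distance_loss(s_feat, t_feat)
```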
- **Introducing LUPI to Object Detection:** This research demonstrates how integrating the Learning Using Privileged Information (LUPI) paradigm into object detection, particularly for litter detection, can enhance performance without changing the model architecture or affecting inference speed.
- **Enhanced Accuracy in Litter Detection and Localisation:** Results show significant improvements in detecting litter, especially smaller objects. The approach yields stronger gains in binary object localisation and also improves multi-label detection performance.
- **Model-Agnostic Improvements:** The approach works effectively across multiple detection models without increasing the number of parameters or inference time. While training time rises because of the additional teacher model, inference remains efficient during deployment.
- **Strong Generalisation Across Litter Datasets:** Extensive testing confirms that the approach generalises well within the primary litter detection dataset and across others, improving detection of small and partially occluded objects in varied scenarios.
- **Broader Impact on Object Detection Tasks:** Beyond litter detection, the technique enhances multi-label detection performance on general object detection datasets, although accuracy tends to decrease as the number of object classes grows.
| Model | Config | Size (MB) | Params (M) | Classes | Channels |
|---|---|---|---|---|---|
| Faster R-CNN | Baseline | 157.92 | 41.40 | 21 | 3 |
| Faster R-CNN | Student | 157.92 | 41.40 | 21 | 3 |
| RetinaNet | Baseline | 124.22 | 32.56 | 21 | 3 |
| RetinaNet | Student | 124.22 | 32.56 | 21 | 3 |
| FCOS | Baseline | 122.48 | 32.11 | 21 | 3 |
| FCOS | Student | 122.48 | 32.11 | 21 | 3 |
| SSD | Baseline | 100.27 | 26.29 | 21 | 3 |
| SSD | Student | 100.27 | 26.29 | 21 | 3 |
| SSD Lite | Baseline | 9.42 | 2.47 | 21 | 3 |
| SSD Lite | Student | 9.42 | 2.47 | 21 | 3 |
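The parameter counts above can be reproduced approximately by instantiating detectors with 21 classes in `torchvision`; which exact variants the repository uses is an assumption here (e.g. the ResNet-50 FPN versions of Faster R-CNN, RetinaNet, and FCOS), so treat this as a sanity-check sketch rather than the project's configuration.

```python
from torchvision.models import detection

# Assumed torchvision variants for the architectures in the table above.
models = {
    "Faster R-CNN": detection.fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=21),
    "RetinaNet":    detection.retinanet_resnet50_fpn(weights=None, weights_backbone=None, num_classes=21),
    "FCOS":         detection.fcos_resnet50_fpn(weights=None, weights_backbone=None, num_classes=21),
    "SSD":          detection.ssd300_vgg16(weights=None, weights_backbone=None, num_classes=21),
    "SSD Lite":     detection.ssdlite320_mobilenet_v3_large(weights=None, weights_backbone=None, num_classes=21),
}

for name, model in models.items():
    params = sum(p.numel() for p in model.parameters())
    size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024 ** 2
    print(f"{name:<13} {params / 1e6:6.2f} M params  ~{size_mb:7.2f} MB (fp32)")
```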
Explored Privileged Information Channels
Preliminary Experiment Results
All results shown below reflect the performance of teacher models across key object detection metrics:
| Model | mAP@50-95 | mAP@50 | mAP@75 | mAR@1 | mAR@10 | mAR@100 | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
| RetinaNet | 0.77 | 0.86 | 0.79 | 0.60 | 0.81 | 0.81 | 0.26 | 0.90 | 0.38 |
| FCOS 🥈 | 0.80 | 0.88 | 0.82 | 0.61 | 0.84 | 0.84 | 0.43 | 0.91 | 0.56 |
| Faster R-CNN 🥇 | 0.77 | 0.91 | 0.82 | 0.59 | 0.82 | 0.82 | 0.56 | 0.91 | 0.68 |
| SSD | 0.42 | 0.56 | 0.49 | 0.41 | 0.48 | 0.48 | 0.25 | 0.69 | 0.36 |
| SSDLite | 0.49 | 0.61 | 0.54 | 0.46 | 0.55 | 0.55 | 0.04 | 0.79 | 0.07 |
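Metrics of this kind (COCO-style mAP and mAR at the various thresholds) can be computed with `torchmetrics`; the snippet below is a generic illustration with dummy predictions, not the repository's evaluation code.

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# Reports map (mAP@50-95), map_50, map_75, mar_1, mar_10, mar_100, among others.
metric = MeanAveragePrecision(iou_type="bbox")

preds = [{
    "boxes":  torch.tensor([[10.0, 20.0, 100.0, 120.0]]),  # xyxy format
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([1]),
}]
targets = [{
    "boxes":  torch.tensor([[12.0, 22.0, 98.0, 118.0]]),
    "labels": torch.tensor([1]),
}]

metric.update(preds, targets)
results = metric.compute()
print(results["map"], results["map_50"], results["map_75"], results["mar_100"])
```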
- Python 3.9+
- CUDA-capable GPU (recommended)
```bash
git clone https://github.com/mbar0075/lupi-for-object-detection.git
cd lupi-for-object-detection
pip install -r requirements.txt
```
This research was carried out at the University of Malta and submitted in partial fulfilment of the requirements for the Master of Science Degree by Research. It was supervised by Dr. Dylan Seychell and Dr. Konstantinos Makantasis. The full master’s dissertation, which includes the research question, background, methodology, evaluation, and analysis, can be downloaded below.
```bibtex
@mastersthesis{bartolo2025privilegedinfo,
  title  = {Investigating the Role of Learning using Privileged Information in Object Detection},
  author = {Bartolo, Matthias},
  type   = {{M.Sc.} thesis},
  year   = {2025},
  school = {University of Malta}
}
```
The main findings of this research have also been accepted at the 2025 IEEE 13th European Workshop on Visual Information Processing (EUVIP 2025):
```bibtex
@misc{bartolo2025learningusingprivilegedinformation,
  title         = {Learning Using Privileged Information for Litter Detection},
  author        = {Matthias Bartolo and Konstantinos Makantasis and Dylan Seychell},
  year          = {2025},
  eprint        = {2508.04124},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2508.04124}
}
```
This project is licensed under the MIT License. See the LICENSE file for details.
For questions, collaboration, or feedback, please contact Matthias Bartolo.