This repository contains our work for the ChallengeIA, part of our third year at CentraleSupélec in the AI track.
This challenge is carried out in partnership with Headmind Partners.
The Dior dataset consists of two folders and a CSV file:
data/
│
├── DAM/
│ ├── 01BB01A2102X4847.jpeg
│ ├── ...
│
├── test_image_headmind/
│ ├── image-20210928-102713-12d2869d.jpeg
│ ├── ...
│
└── product_list.csv
- The "DAM" folder contains the reference JPEGs for all items (2,766 items). Each file is named after the item's MMC code as referenced in the CSV. Each image is 256x256 pixels.
- The second folder, "test_image_headmind", contains the 80 test images. Every item shown in these images is referenced in DAM and in the CSV file. Image sizes vary, the images are not annotated, and the file names follow the camera's naming convention.
- The "product_list" CSV file lists the unique MMC code of each item along with its Product_BusinessUnitDesc, which specifies the class of the item (Bags, Shoes, etc.).
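As an illustration of how the CSV and the DAM folder relate, here is a minimal sketch that maps each MMC code to its class and to the reference image path it should correspond to. It uses only the standard library. The column name `MMC`, the helper `load_product_index`, and the sample rows (including the class assigned to each code) are assumptions made for the example, not taken from the actual file.

```python
import csv
import io
from pathlib import Path

# Hypothetical sample rows standing in for data/product_list.csv.
# The "MMC" column name and the class given to each code are placeholders;
# "Product_BusinessUnitDesc" is the column described above.
SAMPLE_CSV = """MMC,Product_BusinessUnitDesc
01BB01A2102X4847,Bags
BOBYR1UXR42FR,Shoes
"""


def load_product_index(csv_file, dam_dir="data/DAM"):
    """Map each MMC code to its class and its expected reference image path."""
    index = {}
    for row in csv.DictReader(csv_file):
        mmc = row["MMC"].strip()
        index[mmc] = {
            "category": row["Product_BusinessUnitDesc"].strip(),
            # Reference images are named after the MMC code.
            "image_path": Path(dam_dir) / f"{mmc}.jpeg",
        }
    return index


product_index = load_product_index(io.StringIO(SAMPLE_CSV))
for mmc, info in product_index.items():
    print(mmc, info["category"], info["image_path"])
```

With the real dataset, the `io.StringIO` sample would be replaced by the opened CSV file, after checking the actual column names.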
The goal of the project is to retrieve the reference of an item from a photo of it. Therefore, the visual characteristics of the objects must be used to identify the item.
Example: given the image ./test_image_headmind/IMG_6880.jpg, the model should return the image ./DAM/BOBYR1UXR42FR.jpeg.
This challenge allowed us to implement a pipeline that retrieves the references of photographed luxury items, using only the image as data.
The methods used in this project are:
- Image processing and data formatting (background removal, cropping, resizing)
- Data augmentation (flip, rotation, color)
- Transfer learning, based on ResNet-50, DINOv2, CLIP, ...
- Fine-tuning
- Pipeline benchmarking
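To make the retrieval idea behind these methods concrete, here is a minimal sketch of ranking reference items by cosine similarity to a query. It is not the project's code: it assumes each image has already been turned into a feature vector by a pretrained backbone, and the names `cosine_similarity`, `rank_references`, and the toy vectors and `REF_*` identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for degenerate (all-zero) vectors
    return dot / (norm_a * norm_b)


def rank_references(query, references, k=5):
    """Return the k reference ids most similar to the query, best first."""
    scored = [(cosine_similarity(query, emb), ref_id)
              for ref_id, emb in references.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ref_id for _, ref_id in scored[:k]]


# Toy 3-dimensional vectors standing in for real image embeddings.
reference_embeddings = {
    "REF_A": [1.0, 0.0, 0.0],
    "REF_B": [0.0, 1.0, 0.0],
    "REF_C": [0.7, 0.7, 0.0],
}
query_embedding = [0.9, 0.1, 0.0]

top_matches = rank_references(query_embedding, reference_embeddings, k=2)
print(top_matches)  # ['REF_A', 'REF_C']
```

In a real pipeline the reference vectors would be computed once for the whole DAM folder and reused for every query.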
The most effective model is ResNet-50, combined with horizontal-flip data augmentation and the cosine metric. The model is evaluated on accuracy, extended to top 3 and top 5: a prediction counts as correct when the exact expected product reference appears among the model's top 3 (respectively top 5) proposals.
| Top 1 Accuracy | Top 3 Accuracy | Top 5 Accuracy |
| --- | --- | --- |
| 45% | 60% | 69% |
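The top-k accuracy used above can be computed as in the following sketch. The function name `top_k_accuracy` and the toy predictions are hypothetical and only illustrate the definition; they are not the project's results.

```python
def top_k_accuracy(ranked_predictions, expected, k):
    """Fraction of queries whose expected reference is in the first k proposals."""
    if not expected:
        raise ValueError("expected must contain at least one label")
    hits = sum(
        1
        for proposals, truth in zip(ranked_predictions, expected)
        if truth in proposals[:k]
    )
    return hits / len(expected)


# Toy example: four queries, each with a ranked list of proposed references.
ranked_predictions = [
    ["REF_A", "REF_B", "REF_C"],
    ["REF_B", "REF_C", "REF_A"],
    ["REF_C", "REF_A", "REF_B"],
    ["REF_B", "REF_A", "REF_C"],
]
expected = ["REF_A", "REF_A", "REF_A", "REF_A"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked_predictions, expected, k):.2f}")
```

By construction, top-k accuracy can only stay equal or increase as k grows, which matches the progression in the table.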
We designed our solution with a product-oriented vision, keeping in mind its final utility for the client. The product could typically be integrated into a mobile application, so that a customer can obtain the exact reference of a product scanned in-store with their phone, or so that a seller can take inventory by scanning items in the shop.
The presentation used for our defense can be found at this link