This repository contains the solution for the ML Challenge 2025, a competition focused on predicting e-commerce product prices from their catalog descriptions and images. The solution employs a powerful and efficient two-stage, multimodal deep learning approach to tackle the problem.
- Best Validation SMAPE Score: 73.9450%
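For reference, one common formulation of SMAPE is sketched below. This is an illustrative helper, not necessarily the exact scorer used by the competition.

```python
import numpy as np

def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Symmetric Mean Absolute Percentage Error, expressed as a percentage (0-200)."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Guard against division by zero when both true and predicted prices are 0.
    denom = np.where(denom == 0, 1.0, denom)
    return float(np.mean(np.abs(y_pred - y_true) / denom) * 100)
```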
To handle the large dataset and the complexity of training multimodal models within a hackathon timeline, a Two-Stage Pre-computation Strategy was implemented. This approach decouples the slow feature extraction from the fast model training, allowing for rapid iteration.
The first stage involves using large, pre-trained deep learning models as feature extractors. These models are run only once to process the entire dataset and save the resulting high-dimensional feature vectors (embeddings).
- Text Feature Extraction: A pre-trained `distilbert-base-uncased` model from the Hugging Face `transformers` library was used to convert product descriptions (`catalog_content`) into 768-dimension text embeddings.
- Image Feature Extraction: A pre-trained `efficientnet_b0` model from the `timm` library was used to convert product images into 1280-dimension image embeddings.
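The snippet below sketches how these two extractors can be loaded and applied to a single batch. The model names match those listed above, but the helper names, the mean-pooling of token embeddings, and the input sizes are illustrative assumptions rather than excerpts from the notebook.

```python
import torch
import timm
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text encoder: DistilBERT yields a 768-dim vector per token; mean-pool over real tokens.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
text_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device).eval()

# Image encoder: EfficientNet-B0 with the classifier removed yields 1280-dim pooled features.
image_model = timm.create_model("efficientnet_b0", pretrained=True, num_classes=0).to(device).eval()

@torch.no_grad()
def embed_text(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256,
                    return_tensors="pt").to(device)
    hidden = text_model(**enc).last_hidden_state            # (B, T, 768)
    mask = enc["attention_mask"].unsqueeze(-1).float()       # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)              # mean over real tokens -> (B, 768)

@torch.no_grad()
def embed_images(pixel_batch):
    # Expects an already-preprocessed batch of shape (B, 3, 224, 224).
    return image_model(pixel_batch.to(device))               # (B, 1280)
```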
This process was executed in a memory-efficient manner by processing data in batches and saving each batch's embeddings directly to disk, preventing RAM crashes in the Colab environment.
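The batch-and-save pattern might look roughly like the following, reusing the `embed_text` helper from the sketch above. The batch size, the `.npy` format, and the file naming are assumptions; the actual notebook may differ.

```python
import os
import numpy as np

def export_embeddings(texts, out_dir, batch_size=256):
    """Embed `texts` in chunks and write each chunk straight to disk,
    so only one batch of embeddings ever lives in RAM."""
    os.makedirs(out_dir, exist_ok=True)
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        emb = embed_text(batch).cpu().numpy()                # (batch, 768)
        np.save(os.path.join(out_dir, f"batch_{start // batch_size:05d}.npy"), emb)

# e.g. export_embeddings(train_df["catalog_content"].tolist(),
#                        "embeddings_batched/train_text")
```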
The second stage involves training a small, fast neural network on the pre-computed embeddings.
- Input: The text and image embeddings from Stage 1 are concatenated to form a single 2048-dimension feature vector for each product.
- Model: A simple feed-forward neural network (Regression Head) with two hidden layers, `BatchNorm`, and `Dropout` was trained to map these features to the final price prediction.
- Speed: This training process is incredibly fast, completing in just a few minutes on a GPU, which allows for extensive experimentation with hyperparameters like learning rate and model architecture.
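A minimal sketch of such a regression head is shown below. The hidden-layer sizes and dropout rate are illustrative assumptions; only the overall shape (2048-dim input, two hidden layers with `BatchNorm` and `Dropout`, scalar output) follows the description above.

```python
import torch.nn as nn

class RegressionHead(nn.Module):
    """Maps a concatenated 2048-dim (768 text + 1280 image) embedding to a price."""
    def __init__(self, in_dim=2048, hidden1=512, hidden2=128, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden1),
            nn.BatchNorm1d(hidden1),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden1, hidden2),
            nn.BatchNorm1d(hidden2),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden2, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)
```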
```
.
├── ML_Challenge_2025/
│ ├── dataset/
│ │ ├── train.csv # Training data
│ │ ├── test.csv # Test data
│ │ └── images/ # Downloaded product images
│ │
│ ├── embeddings_batched/
│ │ ├── train_text/ # Saved text embeddings for training set
│ │ ├── train_image/ # Saved image embeddings for training set
│ │ ├── test_text/ # Saved text embeddings for test set
│ │ └── test_image/ # Saved image embeddings for test set
│ │
│ ├── ML_Challange.ipynb # Main Colab notebook with all code
│ ├── fast_regression_model.pth # Saved weights of the trained model
│ └── test_out.csv # Final submission file
│
└── README.md # You are here
```
This project was developed in Google Colab using a GPU runtime.
- Setup Google Drive:
  - Create a folder named `ML_Challenge_2025` in your Google Drive.
  - Inside it, create a `dataset` folder and upload `train.csv` and `test.csv`.
  - Run an image downloader script to populate the `dataset/images/` folder (a sketch of such a script follows this step).
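The repository does not pin down the downloader itself; a minimal sketch is shown below. It assumes the CSVs expose an image URL column and an ID column (hypothetically named `image_link` and `sample_id` here); adjust the column names to the real schema.

```python
import os
import requests
import pandas as pd

def download_images(csv_path, out_dir="dataset/images",
                    url_col="image_link", id_col="sample_id"):
    """Fetch each product image and save it as dataset/images/<id>.jpg.
    The column names are assumptions about the CSV schema."""
    os.makedirs(out_dir, exist_ok=True)
    df = pd.read_csv(csv_path)
    for _, row in df.iterrows():
        dest = os.path.join(out_dir, f"{row[id_col]}.jpg")
        if os.path.exists(dest):
            continue  # skip images that are already downloaded
        try:
            resp = requests.get(row[url_col], timeout=10)
            resp.raise_for_status()
            with open(dest, "wb") as f:
                f.write(resp.content)
        except requests.RequestException:
            pass  # skip images that fail to download
```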
- Part 1 - Generate Embeddings:
  - Open the `ML_Challange.ipynb` notebook in Google Colab and set the runtime to GPU.
  - Run the "Part 1" code cells. This will process all text and images and save the embeddings into the `embeddings_batched` folder in your Drive. (Note: this is the slow part.)
- Part 2 - Train and Predict:
  - Once Part 1 is complete, run the "Part 2" code cells in the same notebook.
  - This will load the saved embeddings, train the fast regression model, and generate the final `test_out.csv` file in your project directory (see the sketch after these steps).
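In spirit, Part 2 stitches the saved batches back together, scores the test set, and writes the submission. The sketch below illustrates that flow; the loading pattern, file paths, and output column name are assumptions rather than excerpts from the notebook, and it reuses the `RegressionHead` sketch from earlier.

```python
import glob
import numpy as np
import pandas as pd
import torch

def load_batched(dir_path):
    """Concatenate the per-batch .npy files saved in Part 1, in batch order."""
    files = sorted(glob.glob(f"{dir_path}/*.npy"))
    return np.concatenate([np.load(f) for f in files], axis=0)

# Build the 2048-dim test features: 768-dim text + 1280-dim image embeddings.
X_test = np.hstack([load_batched("embeddings_batched/test_text"),
                    load_batched("embeddings_batched/test_image")])

model = RegressionHead()  # defined in the Stage 2 sketch above
model.load_state_dict(torch.load("fast_regression_model.pth", map_location="cpu"))
model.eval()

with torch.no_grad():
    preds = model(torch.from_numpy(X_test).float()).numpy()

# Hypothetical submission format: one predicted price per test row.
pd.DataFrame({"price": preds}).to_csv("test_out.csv", index=False)
```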
- PyTorch: Core deep learning framework.
- Transformers (Hugging Face): For loading the DistilBERT text model.
- timm (PyTorch Image Models): For loading the EfficientNet-B0 image model.
- Pandas: For data manipulation.
- Scikit-learn: For data splitting.
- Pillow: For image processing.