Recommender System 2021 Challenge

This repository contains the code used for the Recommender System 2021 Challenge hosted by the Recommender Systems course at Politecnico di Milano. The repository is split in 2 main folders:

Challenge2021 which contains our custom models and scripts created for the competition
RecSysCourseMaterial which contains the codebase from the course framework repo

Overview

The complete description of the problem can be found in the kaggle competition page.

Briefly, given the User Rating Matrix and some Item Content Matrices, the objective of the competition was to create a recommender for TV series/Movies.

The evaluation metric used was the MAP@10.

After a preprocessing phase, we used the following dataset:

URM,
- 13650 users
- 18059 items
- 2.14% data sparsity
ICM
- 18059 items
- 335 attributes
- 1.29% data sparsity

Strategy

We approached the problem through different stages:

At first, we performed some data exploration, in order to find interesting patterns in the dataset, we discovered in fact the following singularities:
- Some episodes belong to more than one TV series/Movie
- Some TV series/Movie even if without channel have been seen by some users
- Some TV series/Movie even if without episodes have been seen by some users
Then we profiled the base models to find the best performers, both in general and in the different user segments (cold, warm and hot).
The next phase was focused on building hybrids, mainly composed by 2 models at time in order to better control their optimization.

Here a more complete presentation of the steps that we followed towards our best model.

Best Model

The ICMs were not so effective in our experiments, thus we decided to focus on the information contained in the URM. Our best model was in fact a Collaborative Stratified Hybrid, composed by different models aggregated at distinct stages.

In particular the final structure was the following:

We opted for a hierarchical structure that increasingly improved the performance of each submodel.

We first separately trained and fine-tuned the base models:
- SLIM Elastic-Net, that reached a MAP of 0.2501 on the validation set
- SLIM-BPR, that was trained on the Cold user segment reaching a MAP of 0.1446 on the validation set
Then we built our MINT_Cold_v2 hybrid, we co-trained two models:
- IALS
- MINT_KNN_Hybrid, another hybrid made of ItemKNNCF and UserKNNCF
The MINT_Cold_v2 was again trained on the Cold user segment reaching a MAP of 0.1604 on the validation set.
At this point we created the Final_Cold_Hybrid linearly combining the two models trained on the Cold user segment:
- SLIM_BPR
- MINT_Cold_v2
This model reached a MAP of 0.1684 on the validation set considering only the Cold user segment.
Our Final_Hybrid was built segmenting the users (the sizes of the user segments are an hyper-parameter of the model) and linearly combining:
- SLIM Elastic-Net
- Final_Cold_Hybrid
The Final_Hybrid reached a MAP of 0.2575 on the validation set.

Evaluation

Public Leaderboard score: 0.50910 (2nd)
Private Leaderboard score: 0.50787 (2nd)

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Challenge2021		Challenge2021
RecSysCourseMaterial		RecSysCourseMaterial
assets		assets
Presentation.pdf		Presentation.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Recommender System 2021 Challenge

Overview

Strategy

Best Model

Evaluation

Group Members

About

Releases

Packages

Contributors 2

Languages

Menta99/RecSys2021_Mainetti_Menta

Folders and files

Latest commit

History

Repository files navigation

Recommender System 2021 Challenge

Overview

Strategy

Best Model

Evaluation

Group Members

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages