Python version: 2.7.15
Modules needed: joblib, scikit-learn (imported as sklearn), and numpy
This folder contains the Python scripts used to run the experiments of my
coursework at Centro Universitario FEI.
The work aims to exploit cluster specialization in hybrid recommender systems.
Because of the coursework deadline, we adopted only user-based and item-based
collaborative filtering algorithms for recommendation and the k-Means algorithm
for clustering.
Future research will explore other state-of-the-art recommender and clustering
algorithms.
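For reference, the core idea can be sketched in a few lines: cluster users by
their rating vectors with scikit-learn's KMeans, then fit a separate
collaborative filtering model on each cluster's data. This is only an
illustration of the technique, not the repository's actual pipeline; the matrix
shape and cluster count below are arbitrary assumptions.

    # Illustration only: cluster users by rating vectors, then give each
    # cluster its own data slice for a specialized recommender.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.RandomState(0)
    ratings = rng.randint(0, 6, size=(100, 50))  # toy user x item matrix, 0 = unrated

    labels = KMeans(n_clusters=5, random_state=0).fit_predict(ratings)

    for c in range(5):
        sub = ratings[labels == c]  # ratings of cluster c only
        # a user-based or item-based CF model would be trained on `sub` here
        print("cluster %d: %d users" % (c, sub.shape[0]))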
The file coursework_diagram.pdf presents a diagram summarizing the coursework
methodology.
The original dataset used to evaluate the proposed methodology was the 1M
MovieLens Dataset (dataset folder).
To keep computation time manageable, Usage.py is configured to use a smaller
dataset, the 100k MovieLens Dataset (dataset_small folder).
More info about the datasets at: https://grouplens.org/datasets/movielens/
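If you just want to inspect the data, the 100k ratings can be read with numpy
alone. The snippet below assumes the standard MovieLens 100k u.data layout
(tab-separated: user id, item id, rating, timestamp); the actual file name
inside the dataset_small folder may differ.

    # Assumes the standard MovieLens 100k file layout; adjust the path to
    # match the contents of the dataset_small folder.
    import numpy as np

    data = np.loadtxt('dataset_small/u.data', dtype=int, delimiter='\t')
    users, items, scores = data[:, 0], data[:, 1], data[:, 2]
    print("%d ratings, %d users, %d items"
          % (len(scores), len(np.unique(users)), len(np.unique(items))))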
To run the entire experiment, just run the Usage.py script.
There you can find the order in which the scripts should be run and their
respective arguments.
Although the scripts were written with parallel processing in order to exploit
all available computational resources (multithread-based, not cluster-based),
some scripts might take a relatively long time to run due to the dataset size
and disk I/O operations.
Moreover, a considerable amount of disk space is needed (about 2.1 GB for the
default experiment configuration).
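Since joblib is a listed dependency, the parallelism presumably follows
joblib's standard Parallel/delayed pattern. The sketch below shows that general
pattern, not the repository's exact code; evaluate_fold is a hypothetical
stand-in for an expensive per-task computation.

    # General joblib pattern (a sketch): one task per available core.
    from joblib import Parallel, delayed

    def evaluate_fold(fold):
        # hypothetical placeholder for an expensive computation
        return fold * fold

    # n_jobs=-1 uses all cores; pass backend='threading' for thread-based runs
    results = Parallel(n_jobs=-1)(delayed(evaluate_fold)(f) for f in range(8))
    print(results)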
To adapt this code to other rating-based recommender datasets (user-id,
item-id, rating), you must modify the 01_data_split_function.py script so that
it reads and samples your dataset properly, and configure the experiment
parameters as you please (basic Python knowledge is needed).
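As a starting point, a generic read-and-split routine might look like the
sketch below; the file format, delimiter, and 80/20 split are assumptions you
would replace with your dataset's specifics when editing
01_data_split_function.py.

    # Hypothetical read/sample/split for a (user-id, item-id, rating) file.
    import numpy as np

    def load_and_split(path, delimiter=',', test_fraction=0.2, seed=0):
        data = np.loadtxt(path, delimiter=delimiter)  # columns: user, item, rating
        idx = np.random.RandomState(seed).permutation(len(data))
        cut = int(len(data) * (1.0 - test_fraction))
        return data[idx[:cut]], data[idx[cut:]]  # (train, test)

    # train, test = load_and_split('my_ratings.csv')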
Furthermore, you might want to analyze the results generated by the standalone
model evaluation in order to define the hybrid model parameters based on the
standalone results.
To do so, you must modify 07_compute_evaluate_hybrid.py (basic Python knowledge
is needed).
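One common way to combine standalone models, shown purely as an illustration
(the script's actual hybridization scheme may differ), is a weighted blend
whose weight is chosen from the standalone evaluation results:

    # Illustrative weighted hybrid; alpha would be tuned per cluster from the
    # standalone user-based / item-based results.
    import numpy as np

    def hybrid_predict(user_pred, item_pred, alpha=0.5):
        # alpha near 1.0 favors the user-based model, near 0.0 the item-based one
        return alpha * np.asarray(user_pred) + (1.0 - alpha) * np.asarray(item_pred)

    print(hybrid_predict([4.0, 3.0], [3.0, 5.0], alpha=0.7))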