Reproducing HybridSVD paper

This repository contains the full source code for reproducing the results from the HybridSVD paper. If you want to run it on your own machine, make sure to prepare a conda environment according to this configuration file, which lists all required packages (including their versions).

You can also interactively run experiments directly in your browser with the help of Binder cloud technologies. Simply click on the badge below to get started:

This will launch an interactive JupyterLab environment with access to all repository files. By default, it opens the HybridSVD.ipynb notebook, which contains the code for the HybridSVD model evaluated on the Movielens and BookCrossing datasets.

Mind cloud environment restrictions

Due to restrictions on Binder's cloud resources, only small datasets, e.g., Movielens-1M or Amazon Video Games, allow running full experiments without interruption. Attempts to work with larger files will likely crash the environment. Originally, all experiments were conducted on HPC servers with considerably more hardware resources. It is therefore advised to make the following modifications to run the Jupyter notebooks safely in the Binder cloud:

Working with Movielens-1M data

Experiments with this dataset are available in the following files:

Baselines.ipynb
HybridSVD.ipynb
FactorizationMachines.ipynb
LCE.ipynb
ScaledSVD.ipynb
ScaledHybridSVD.ipynb

You need to change the data_labels variable in the Experiment setup section of each notebook from

data_labels = ['ML1M', 'ML10M', 'BX']

to

data_labels = ['ML1M']

Accordingly, do not run the cells under the Movielens10M and BookCrossing headers (these datasets are not provided in the cloud environment). Also make sure that the first argument to get_movielens_data is ../datasets/movielens/ml-1m.zip (the notebooks were originally executed on several machines, which is why the path may vary), i.e., the call should start as:

data_dict[lbl], meta_dict[lbl] = get_movielens_data('../datasets/movielens/ml-1m.zip',
                                                     <other arguments>
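Rather than editing the label list by hand, you could also filter it by which data files are actually present. The following is a minimal sketch of a hypothetical helper (it is not part of the repository); only the ML1M path comes from this README, so any paths for ML10M or BX would be your own assumptions.

```python
import os

# Hypothetical label -> path mapping. Only the ML1M path is given in this
# README; add ML10M/BX entries yourself if you have those files locally.
DATASET_PATHS = {
    'ML1M': '../datasets/movielens/ml-1m.zip',
}


def available_labels(labels, paths=DATASET_PATHS, exists=os.path.exists):
    """Keep only labels that have a known path pointing to an existing file.

    `exists` can be swapped out, e.g. to check the filter without the data.
    """
    return [lbl for lbl in labels if lbl in paths and exists(paths[lbl])]


# In the Binder setup this yields ['ML1M'] if the archive is in place,
# and an empty list otherwise.
data_labels = available_labels(['ML1M', 'ML10M', 'BX'])
```

This only trims data_labels; the cells under the Movielens10M and BookCrossing headers still need to be skipped.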

Working with Amazon Video Games data

Experiments with this dataset are available in the following files:

Baselines_AMZ.ipynb
HybridSVD_AMZ.ipynb
FactorizationMachines_AMZ.ipynb
LCE_AMZ.ipynb
ScaledSVD_AMZ.ipynb
ScaledHybridSVD_AMZ.ipynb

You need to change the data_labels variable in the Experiment setup section of each notebook from

data_labels = ['AMZe', 'AMZvg']

to

data_labels = ['AMZvg']

Accordingly, do not run the cells under the AMZe header. Again, make sure to provide the correct input arguments to get_amazon_data. In this case, they are:

data_dict[lbl], meta_dict[lbl] = get_amazon_data('../datasets/amazon/ratings_Video_Games.csv',
                                                 meta_path='../datasets/amazon/meta/meta_Video_Games.json.gz',
                                                 <other arguments>
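Since a wrong path only surfaces once get_amazon_data tries to read the file, it can help to check both inputs up front. A minimal sketch, assuming the two paths shown above and using only the standard library; missing_inputs is a hypothetical helper, not a repository function.

```python
import os

# The two Amazon Video Games inputs named in this README.
AMZVG_INPUTS = {
    'ratings': '../datasets/amazon/ratings_Video_Games.csv',
    'meta': '../datasets/amazon/meta/meta_Video_Games.json.gz',
}


def missing_inputs(inputs, exists=os.path.isfile):
    """Return the (sorted) names of required input files that cannot be found."""
    return sorted(name for name, path in inputs.items() if not exists(path))


missing = missing_inputs(AMZVG_INPUTS)
if missing:
    # Report the problem early instead of deep inside the data loading call.
    print('Missing Amazon Video Games inputs:', ', '.join(missing))
```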

Reducing training time

Keep in mind that some models require much longer training time than others. For example, the whole HybridSVD experiment on the Movielens-1M dataset, in both the standard and cold start scenarios, completes even before the initial tuning of Factorization Machines for the standard scenario is done. As Binder automatically shuts down long-running tasks, you may not be able to perform all computations before the timeout. To reduce the risk of such a shutdown, you may want to run different notebooks (different models) in independent Binder sessions. You may also want to reduce the number of points considered in the random grid search for tuning non-SVD-based models. For example, in the FM case you can change the ntrials=60 input to ntrials=30 in the fine_tune_fm(model, params, label, ntrials=60) function calls. This may, however, slightly decrease the resulting quality of FM.
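To see why lowering ntrials shortens tuning, consider a generic random grid search: it evaluates exactly ntrials randomly sampled configurations, so its cost grows linearly with that number. The sketch below is a standalone illustration with a hypothetical evaluate callback and a made-up toy grid; it is not the implementation of fine_tune_fm from the notebooks.

```python
import itertools
import random


def random_grid_search(param_grid, evaluate, ntrials=60, seed=42):
    """Evaluate `ntrials` distinct configurations sampled from a grid.

    param_grid: dict mapping parameter name -> list of candidate values.
    evaluate:   callable taking a config dict and returning a score
                (higher is better); the expensive model fit happens here.
    Returns (best_config, best_score, number_of_evaluations).
    """
    names = sorted(param_grid)
    grid = list(itertools.product(*(param_grid[n] for n in names)))
    rng = random.Random(seed)
    # Sample without replacement; never request more points than exist.
    points = rng.sample(grid, min(ntrials, len(grid)))
    best_config, best_score = None, float('-inf')
    for values in points:
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, len(points)


# Toy example: a 4 x 3 x 5 = 60-point grid and a cheap made-up objective
# standing in for an expensive model fit.
toy_grid = {
    'rank': [10, 20, 40, 80],
    'regularization': [0.001, 0.01, 0.1],
    'learn_rate': [0.01, 0.05, 0.1, 0.5, 1.0],
}


def toy_objective(config):
    return -abs(config['rank'] - 40) - config['regularization']


best_half, score_half, evals_half = random_grid_search(toy_grid, toy_objective, ntrials=30)
best_full, score_full, evals_full = random_grid_search(toy_grid, toy_objective, ntrials=60)
```

Halving ntrials halves the number of model fits, which is where the time savings come from; the trade-off is that fewer configurations are explored, so the best one found may be slightly worse.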

Alternatively, you can skip the parameter tuning sections for long-running models and reuse a previously found set of nearly optimal hyperparameters. They are printed at the end of each model tuning section. You can also find them in the View_optimal_parameters notebook.
