This repository contains a movie recommendation system built in Python. It uses the Surprise library for collaborative filtering and scikit-learn for content-based filtering, and it is trained on the MovieLens 25M dataset from GroupLens.
The dataset used in this recommendation system is the MovieLens 25M dataset provided by GroupLens. It contains 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. The dataset includes the following files:
- movies.csv: Contains movie information, including movieId, title, and genres.
- ratings.csv: Contains user movie ratings, including userId, movieId, rating, and timestamp.
- links.csv: Contains identifiers that can be used to link to other sources of movie data, including movieId, imdbId, and tmdbId.
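As a quick illustration of the file layout, the sketch below parses two rows in the movies.csv format using only the standard library. The in-memory sample and the parse_movies helper are illustrative assumptions; the notebook itself may load the files differently (for example with pandas). Note that the genres column is a single pipe-separated string.

```python
import csv
import io

# Two sample rows in the movies.csv layout (movieId,title,genres).
SAMPLE_MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
"""


def parse_movies(handle):
    """Hypothetical helper: read movies.csv rows into a list of dicts."""
    movies = []
    for row in csv.DictReader(handle):
        movies.append({
            "movieId": int(row["movieId"]),
            "title": row["title"],
            # genres is one pipe-separated string in the raw file
            "genres": row["genres"].split("|"),
        })
    return movies


movies = parse_movies(io.StringIO(SAMPLE_MOVIES_CSV))
for movie in movies:
    print(movie["movieId"], movie["title"], movie["genres"])
```

To read the real file, the same helper could be given an open file handle for data/movies.csv instead of the in-memory sample.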
The movie recommendation system combines two popular recommendation techniques:
- Collaborative Filtering: A method that predicts a user's preference for an item based on the preferences of similar users. In this project, the Singular Value Decomposition (SVD) algorithm from the scikit-surprise library is used for collaborative filtering.
- Content-Based Filtering: A method that predicts a user's preference for an item based on the item's features and the user's preferences for similar features. In this project, the Term Frequency-Inverse Document Frequency (TF-IDF) approach is used to create a feature vector for movie genres, and cosine similarity is calculated to measure the similarity between movies based on their genres.
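To make the content-based idea concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity over genre strings, written in plain Python so it runs without extra packages. It is not the notebook's code: the project uses scikit-learn for this step, and the toy catalogue, the smoothed IDF formula, and the function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy catalogue: title -> pipe-separated genres, as in movies.csv.
MOVIES = {
    "Toy Story (1995)": "Adventure|Animation|Children|Comedy|Fantasy",
    "Jumanji (1995)": "Adventure|Children|Fantasy",
    "Heat (1995)": "Action|Crime|Thriller",
}


def tfidf_vectors(docs):
    """Return an L2-normalized TF-IDF vector (as a dict) per document."""
    tokenized = {name: text.split("|") for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many movies does each genre appear?
    df = Counter(g for genres in tokenized.values() for g in set(genres))
    vectors = {}
    for name, genres in tokenized.items():
        tf = Counter(genres)
        # Smoothed IDF, so a genre present in every movie keeps some weight.
        vec = {g: tf[g] * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g in tf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[name] = {g: w / norm for g, w in vec.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def most_similar(title, vectors, n=2):
    """Rank the other movies by genre similarity to `title`."""
    scores = [(cosine(vectors[title], v), other)
              for other, v in vectors.items() if other != title]
    return [other for _, other in sorted(scores, reverse=True)[:n]]


vectors = tfidf_vectors(MOVIES)
print(most_similar("Toy Story (1995)", vectors))
```

Because the vectors are normalized up front, cosine similarity reduces to a dot product. Movies that share no genre score exactly zero, while rarer shared genres contribute more than common ones.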
To run the script, you need to install the following libraries:
- scikit-surprise
- pandas

You can install them using pip:
pip install scikit-surprise
pip install pandas
- Download the MovieLens 25M dataset and extract the files into the same folder as the repository.
- Rename the extracted MovieLens 25M dataset folder to data.
- Run the notebook movie_recommended_system.ipynb using Jupyter Notebook.
The code consists of the following steps:
- Import required libraries and load the dataset
- Prepare the data for collaborative filtering
- Merge the datasets
- Build a collaborative filtering model
- Prepare the data for content-based filtering
- Create a function for content-based recommendations
- Combine collaborative and content-based filtering
- Test the recommendation system
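The "build a collaborative filtering model" step relies on the SVD algorithm from Surprise. As a rough picture of what a model of this kind learns, the sketch below trains a tiny biased matrix factorization with stochastic gradient descent in plain Python. The ratings, hyperparameters, and function names are made up for illustration; this is neither the notebook's code nor the library's implementation.

```python
import math
import random

# Made-up (userId, movieId, rating) triples, mimicking ratings.csv rows.
RATINGS = [
    (1, 1, 5.0), (1, 2, 3.0), (1, 3, 1.0),
    (2, 1, 4.5), (2, 2, 3.5),
    (3, 2, 2.0), (3, 3, 4.5),
    (4, 1, 1.5), (4, 3, 5.0),
]


def train(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + p_u . q_i by SGD on squared error."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient step with L2 regularization on every parameter.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mu, bu, bi, p, q


def predict(model, u, i):
    """Predicted rating; unknown users or items fall back to the biases."""
    mu, bu, bi, p, q = model
    dot = sum(a * b for a, b in zip(p[u], q[i])) if u in p and i in q else 0.0
    return mu + bu.get(u, 0.0) + bi.get(i, 0.0) + dot


def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    return math.sqrt(sum((r - predict(model, u, i)) ** 2
                         for u, i, r in ratings) / len(ratings))


model = train(RATINGS)
baseline = (model[0], {}, {}, {}, {})  # predicts the global mean everywhere
print(f"train RMSE, global mean:  {rmse(baseline, RATINGS):.3f}")
print(f"train RMSE, factor model: {rmse(model, RATINGS):.3f}")
```

The comparison against the global-mean baseline is only a sanity check on the training data. A real evaluation would hold out part of ratings.csv for testing.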
To test the recommendation system, follow these steps:
- Set the user ID, movie title, and the number of recommendations to generate.
- Call the hybrid_recommendations() function to generate n recommendations for the user based on the provided movie title.
- The recommendations will be printed as a list of movie titles with their corresponding ranks. For example:
user_id = 2
title = 'Toy Story (1995)'
n = 10
recommendations = hybrid_recommendations(user_id, title, n)
recommendations_list = recommendations.tolist()
print(f"Top {n} recommendations for User {user_id} who likes '{title}':")
for i, movie_title in enumerate(recommendations_list, start=1):
    print(f"{i}. {movie_title}")
This will print the top 10 recommendations for User 2 who likes 'Toy Story (1995)'.
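How hybrid_recommendations() blends the two signals is not spelled out here. One common pattern, shown below as an assumption rather than the notebook's actual logic, is to take the movies most similar to the given title as candidates and then re-rank them by the rating the collaborative model predicts for the user. The hybrid_rerank function and its stand-in inputs are hypothetical.

```python
def hybrid_rerank(user_id, candidates, predict_rating, n):
    """Re-rank content-based candidates by predicted rating for one user.

    candidates: titles already judged similar to the seed movie
    predict_rating: callable (user_id, title) -> estimated rating
    """
    scored = [(predict_rating(user_id, title), title) for title in candidates]
    # Highest predicted rating first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:n]]


# Stand-in inputs (made up): in the notebook these would come from the
# genre-similarity step and the trained collaborative model.
similar_to_toy_story = ["Jumanji (1995)", "Antz (1998)", "Toy Story 2 (1999)"]
fake_predictions = {
    "Jumanji (1995)": 3.1,
    "Antz (1998)": 3.6,
    "Toy Story 2 (1999)": 4.4,
}

top = hybrid_rerank(
    user_id=2,
    candidates=similar_to_toy_story,
    predict_rating=lambda user, title: fake_predictions[title],
    n=2,
)
print(top)  # ['Toy Story 2 (1999)', 'Antz (1998)']
```

Passing the predictor in as a callable keeps the sketch independent of any particular model object.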
You can customize the recommendation system by modifying the following parameters:
- User ID: Change the user_id variable to the desired user ID to generate recommendations for a specific user.
- Movie Title: Change the title variable to the desired movie title to base the recommendations on a specific movie.
- Number of Recommendations: Change the n variable to the desired number of recommendations to generate a custom number of top recommendations.
For example:
user_id = 10
title = 'The Matrix (1999)'
n = 5
recommendations = hybrid_recommendations(user_id, title, n)
This will generate the top 5 recommendations for User 10 who likes 'The Matrix (1999)'.
Contributions to improve the movie recommendation system are welcome. You can contribute by:
- Forking the repository
- Creating a new branch with your changes
- Submitting a pull request for review

Please ensure your code follows the established coding style and is well documented.