This project focuses on building a movie recommendation system using a dataset from TMDB (The Movie Database). The system provides two types of recommendations:
- Demographic Filtering: Recommends top-rated movies based on a predefined score.
- Content-Based Filtering: Recommends movies based on plot similarity or actor/director/keyword similarity.
The primary purpose of this system is to suggest movies that users might enjoy based on their preferences, enhancing their movie-watching experience.
The recommendation system consists of two main components:
- Demographic Filtering: Uses a predefined score to recommend movies.
- Content-Based Filtering:
  - TF-IDF Vectorizer: For vectorizing movie plots.
  - Count Vectorizer: For vectorizing actors, directors, and keywords.
  - Cosine Similarity: For calculating similarity between movies.
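Cosine similarity compares two movie vectors by the angle between them, cos(a, b) = a·b / (|a| |b|), so it ignores overall length (for example, a longer plot with more words) and only looks at direction. The sketch below computes it by hand with NumPy and then with scikit-learn's `cosine_similarity`; the three toy vectors are made-up term counts, not real TMDB data.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors: a.b / (|a| |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up term counts over a 4-word vocabulary.
movie_a = np.array([1.0, 2.0, 0.0, 1.0])
movie_b = np.array([2.0, 4.0, 0.0, 2.0])  # same direction as movie_a, twice as long
movie_c = np.array([0.0, 0.0, 3.0, 0.0])  # shares no terms with movie_a

print(cosine(movie_a, movie_b))  # ~1.0: length differs, direction is identical
print(cosine(movie_a, movie_c))  # 0.0: orthogonal, nothing in common

# scikit-learn returns the full pairwise similarity matrix in one call.
matrix = cosine_similarity(np.vstack([movie_a, movie_b, movie_c]))
print(matrix.round(3))
```

The pairwise matrix is what a recommender looks up: row i holds the similarity of movie i to every other movie.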
- Data Cleaning:
  - Removed null values.
  - Merged datasets on the movie ID to combine features like cast, crew, and title.
- Feature Engineering:
  - Extracted important features such as cast, crew, keywords, and genres.
  - Converted stringified lists into Python lists using literal_eval (from Python's ast module).
- Normalization/Scaling:
  - Applied vectorization using CountVectorizer and TfidfVectorizer to turn text features into numerical vectors; TfidfVectorizer additionally L2-normalizes each vector by default.
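The cleaning and feature-engineering steps above can be sketched as follows. This is a minimal sketch under assumptions, not the project's actual preprocessing script: the tiny DataFrames are made-up stand-ins for the TMDB movies and credits files, the column names (`id`, `movie_id`, `cast`, `crew`, `keywords`, `genres`, `overview`) are assumed from the common TMDB 5000 layout, and the helpers `get_director`, `top_names`, `clean`, `make_soup` and the `soup` column are hypothetical.

```python
from ast import literal_eval

import pandas as pd

# Made-up stand-ins for the TMDB movies and credits files (column names assumed).
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "overview": ["A hacker fights machines.", "Explorers cross space.", None],
    "keywords": ['[{"name": "hacker"}]', '[{"name": "space"}]', "[]"],
    "genres": ['[{"name": "Action"}]', '[{"name": "Science Fiction"}]', "[]"],
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "cast": ['[{"name": "Ann Lee"}, {"name": "Bo Chan"}]', '[{"name": "Cy Diaz"}]', "[]"],
    "crew": ['[{"job": "Director", "name": "Di Eng"}]', '[{"job": "Writer", "name": "Ed Fox"}]', "[]"],
})

# Data cleaning: merge on the movie ID, then drop rows with no plot.
df = movies.merge(credits, left_on="id", right_on="movie_id").drop(columns="movie_id")
df = df.dropna(subset=["overview"]).reset_index(drop=True)

# Feature engineering: turn stringified lists into real Python lists of dicts.
for col in ["cast", "crew", "keywords", "genres"]:
    df[col] = df[col].apply(literal_eval)


def get_director(crew):
    """Name of the first crew member whose job is 'Director', or '' if none."""
    for member in crew:
        if member.get("job") == "Director":
            return member["name"]
    return ""


def top_names(items, n=3):
    """Names of the first n entries, e.g. top-billed cast or first keywords."""
    return [item["name"] for item in items[:n]]


def clean(name):
    """Lower-case and strip spaces so 'Ann Lee' becomes the single token 'annlee'."""
    return name.lower().replace(" ", "")


def make_soup(row):
    """Join keywords, cast, genres and director into one string for CountVectorizer."""
    tokens = row["keywords"] + row["cast"] + row["genres"] + [row["director"]]
    return " ".join(clean(t) for t in tokens if t)


df["director"] = df["crew"].apply(get_director)
for col in ["cast", "keywords", "genres"]:
    df[col] = df[col].apply(top_names)
df["soup"] = df.apply(make_soup, axis=1)

print(df[["title", "director", "soup"]])
```

Removing spaces from names is a design choice: it keeps "Ann Lee" from matching every other "Ann" once CountVectorizer splits the text into tokens.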
- Demographic Filtering:
  - No training required.
  - Simply sorted movies based on a predefined score.
- Content-Based Filtering:
  - Vectorized the movie plots and other features.
  - Calculated cosine similarity between movies.
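The README does not say what the "predefined score" is. One common choice for demographic filtering, assumed here, is the IMDB-style weighted rating WR = v/(v+m)·R + m/(v+m)·C, where R is a movie's average vote, v its vote count, C the mean vote across all movies, and m the minimum number of votes needed to qualify (often a high percentile of vote counts). The sketch below ranks a made-up sample this way; the `vote_average`/`vote_count` column names and the `top_movies` helper are assumptions.

```python
import pandas as pd

# Made-up sample; "vote_average" and "vote_count" mirror TMDB metadata columns (assumed).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [9.0, 7.5, 8.0, 6.0],
    "vote_count": [10, 5000, 12000, 800],
})


def top_movies(df, quantile=0.9):
    """Rank movies by an IMDB-style weighted rating (assumed scoring rule)."""
    C = df["vote_average"].mean()            # mean rating across all movies
    m = df["vote_count"].quantile(quantile)  # minimum votes required to qualify
    qualified = df[df["vote_count"] >= m].copy()
    v = qualified["vote_count"]
    R = qualified["vote_average"]
    # Movies with few votes are pulled toward the global mean C.
    qualified["score"] = v / (v + m) * R + m / (v + m) * C
    return qualified.sort_values("score", ascending=False)


# A median cutoff keeps more than one movie in this tiny sample.
ranked = top_movies(movies, quantile=0.5)
print(ranked[["title", "score"]])
```

Movie "A" has the highest raw average but only 10 votes, so it does not qualify; the weighting exists precisely to stop a handful of enthusiastic votes from topping the chart.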
- CountVectorizer: Default parameters.
- TfidfVectorizer: Default parameters.
- Hyperparameter tuning: Not applicable, as this is an unsupervised approach.
- Precision and Recall: Evaluated recommendations based on user feedback.
- Cosine Similarity Score: Used to measure similarity between movies.
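Precision and recall on user feedback are usually computed per user over the top k recommendations: precision@k is the share of shown titles the user liked, recall@k the share of liked titles that were shown. The README does not describe how feedback was collected, so the "liked" set and the `precision_recall_at_k` helper below are hypothetical.

```python
from typing import Iterable, List, Tuple


def precision_recall_at_k(recommended: List[str], relevant: Iterable[str], k: int) -> Tuple[float, float]:
    """Precision@k and recall@k for one user's recommendation list."""
    relevant = set(relevant)
    top_k = recommended[:k]
    hits = len(set(top_k) & relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical feedback: titles one user marked as "liked".
recommended = ["Star Voyage", "Galaxy War", "Kitchen Love"]
liked = {"Star Voyage", "Moon Base"}

p, r = precision_recall_at_k(recommended, liked, k=3)
print(f"precision@3={p:.2f} recall@3={r:.2f}")  # precision@3=0.33 recall@3=0.50
```

Averaging these two numbers over all users who gave feedback yields a single system-level figure.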
- Used a subset of the TMDB dataset for testing recommendations.
- The system effectively recommends movies based on plot and other features.
- Higher cosine similarity scores correlate with more relevant recommendations.
- Environment:
  - Streamlit application deployed on a web server.
- APIs/Endpoints:
  - /recommend: Endpoint to get movie recommendations based on user input.
- Testing:
  - Tested the deployment with different movie inputs to ensure accurate recommendations.
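Testing with different movie inputs can be made repeatable with a small smoke-test loop. This is a hypothetical helper, not part of the repository: it takes any function that maps a title to a list of recommended titles and reports basic failures (empty results, the input movie recommending itself, duplicates, too many results). The `fake_recommend` stand-in exists only to exercise the checks.

```python
from typing import Callable, Iterable, List


def smoke_test(recommend: Callable[[str], List[str]], titles: Iterable[str], k: int = 5) -> List[str]:
    """Run recommend() on each title and collect human-readable problems."""
    problems = []
    for title in titles:
        recs = recommend(title)
        if not recs:
            problems.append(f"{title}: no recommendations")
        if title in recs:
            problems.append(f"{title}: recommends itself")
        if len(set(recs)) != len(recs):
            problems.append(f"{title}: duplicate titles")
        if len(recs) > k:
            problems.append(f"{title}: more than {k} results")
    return problems


# Hypothetical stand-in for the real recommender, used only to exercise the checks.
fake_results = {
    "Space Quest": ["Star Voyage", "Galaxy War"],
    "Kitchen Love": ["Kitchen Love"],
}


def fake_recommend(title: str) -> List[str]:
    return fake_results.get(title, [])


report = smoke_test(fake_recommend, ["Space Quest", "Kitchen Love", "Unknown Title"])
print(report)
```

An empty report means every input passed; these checks catch wiring bugs, while relevance itself still needs the precision/recall evaluation.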
The movie recommendation system provides accurate and relevant suggestions based on user preferences. The combination of demographic and content-based filtering enhances the recommendation quality, making it a robust solution for movie enthusiasts.
- TMDB Movie Metadata dataset
- Scikit-learn Documentation
- Streamlit Documentation
- Getting Started with a Movie Recommendation System - Ibtesam Ahmed
- Data preprocessing scripts.
- Sample API requests and responses.
- Python 3.8+
- Streamlit
- Pandas
- Numpy
- Scikit-learn
- Joblib
- Requests
- python-dotenv (only needed if you load the API key from a .env file)
pip install streamlit pandas numpy scikit-learn joblib requests python-dotenv
- Clone the repository:
  git clone https://github.com/agneepradeep/Movie-Media.git
  cd Movie-Media
- Set up a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Set up the OMDB API key:
  - Obtain an API key from the OMDB API.
  - Create a .env file in the root directory of the project and add your API key:
    OMDB_API_KEY=your_api_key_here
  - In place of
    OMDB_API_KEY = st.secrets['API_key']
    write these lines:
    from dotenv import load_dotenv
    import os
    load_dotenv()
    OMDB_API_KEY = os.getenv('OMDB_API_KEY')
  - The app will then read your API key directly from your .env file. The name passed to os.getenv must match the variable name in .env exactly.
- Run the Streamlit app:
  streamlit run app.py
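- Access the application: open your browser and navigate to http://localhost:8501/ (Streamlit's default local address). Streamlit also prints a Network URL that other devices on your local network can use.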
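This README does not show how app.py calls OMDB. As a hedged sketch of how the OMDB_API_KEY could be used once it is loaded (for example by load_dotenv()), the snippet below reads the key from the environment, builds a request to the public OMDB API using its `apikey` and `t` (title) query parameters, and returns the `Poster` field if one exists. It uses the standard library's urllib instead of requests so that it is self-contained; `load_api_key`, `build_omdb_url` and `fetch_poster` are hypothetical names.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

OMDB_URL = "https://www.omdbapi.com/"


def load_api_key(env_var: str = "OMDB_API_KEY") -> str:
    """Read the key from the environment (load_dotenv() can populate it from .env)."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} in your environment or .env file first")
    return key


def build_omdb_url(title: str, api_key: str) -> str:
    """Build an OMDB title-lookup URL; urlencode escapes spaces and special characters."""
    return OMDB_URL + "?" + urllib.parse.urlencode({"apikey": api_key, "t": title})


def fetch_poster(title: str, api_key: str, timeout: float = 10.0) -> Optional[str]:
    """Return the poster URL for `title`, or None if OMDB has no match or no poster."""
    with urllib.request.urlopen(build_omdb_url(title, api_key), timeout=timeout) as resp:
        data = json.load(resp)
    if data.get("Response") == "True" and data.get("Poster") not in (None, "N/A"):
        return data["Poster"]
    return None


print(build_omdb_url("The Dark Knight", "your_api_key_here"))
```

Calling `fetch_poster(title, load_api_key())` performs the real network request; keeping URL construction in its own function makes that part testable without a key or a connection.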
By following these steps, you can set up and run the movie recommendation system locally.
Enjoy exploring and finding new movies to watch on our website, Movie-Media!