A content-based movie recommendation system built using Python, Pandas, and scikit-learn, designed to suggest movies/TV shows based on similarity in metadata such as language, availability, release date, and viewing statistics.
The system supports retraining on new datasets and provides a Flask-based web interface for real-time recommendations.
- Content-Based Filtering: Recommendations are based on similarity in categorical and numerical attributes (e.g., language, content type, popularity).
- Data Cleaning Pipeline: Automatic handling of missing values, duplicate titles, and inconsistent formats.
- Custom Feature Engineering: Generates `Content_ID` for efficient mapping and similarity computation.
- Web Interface: Flask app for user-friendly recommendation queries.
- Retraining Support: Easily retrain on updated or custom datasets.
- Optimized for Deployment: Saves model, preprocessor, and metadata for quick loading.
- Data Loading & Cleaning
  - Reads CSV dataset
  - Cleans numeric fields (`Hours Viewed`)
  - Drops duplicate and missing titles
  - Extracts `Release_Year` from `Release Date`
  - Assigns unique `Content_ID` to each item
- Feature Engineering
  - Categorical Columns: One-hot encoded
  - Numerical Columns: Standard scaled
  - Saves metadata and preprocessor for future use
- Recommendation
  - Computes cosine similarity between feature vectors
  - Returns top-N most similar items to a given title
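
The three stages above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the helper names `clean` and `recommend` are hypothetical, and similarity is computed directly with `cosine_similarity`, whereas the project's `train.py` may use a different approach.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up sample rows using the required column names (illustrative only).
raw = pd.DataFrame({
    "Title": ["Show A", "Show B", "Show C", "Show A", None],
    "Available Globally?": ["Yes", "Yes", "No", "Yes", "No"],
    "Release Date": ["2022-11-23", "2022-10-01", "2015-03-05", "2022-11-23", "2020-01-01"],
    "Hours Viewed": ["1,000,000", "900,000", "50,000", "1,000,000", "10"],
    "Language Indicator": ["English", "English", "Korean", "English", "English"],
    "Content Type": ["Show", "Show", "Movie", "Show", "Movie"],
})


def clean(df):
    """Drop missing/duplicate titles, fix numeric fields, add Release_Year and Content_ID."""
    df = df.dropna(subset=["Title"]).drop_duplicates(subset="Title").copy()
    # Remove thousands separators before converting to numbers.
    hours = df["Hours Viewed"].astype(str).str.replace(",", "", regex=False)
    df["Hours Viewed"] = pd.to_numeric(hours, errors="coerce")
    df["Release_Year"] = pd.to_datetime(df["Release Date"], errors="coerce").dt.year
    # Fill any remaining gaps with the column median so scaling does not see NaN.
    for col in ["Hours Viewed", "Release_Year"]:
        df[col] = df[col].fillna(df[col].median())
    df = df.reset_index(drop=True)
    df["Content_ID"] = df.index
    return df


items = clean(raw)

categorical = ["Available Globally?", "Language Indicator", "Content Type"]
numerical = ["Hours Viewed", "Release_Year"]

# One-hot encode categorical columns and standard-scale numerical ones.
# sparse_threshold=0 forces a dense array, which keeps row indexing simple.
preprocessor = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ],
    sparse_threshold=0,
)
features = preprocessor.fit_transform(items)


def recommend(title, items, features, top_n=5):
    """Return up to top_n titles most similar to `title` (case-insensitive match)."""
    matches = items.index[items["Title"].str.lower() == title.lower()]
    if len(matches) == 0:
        return []
    idx = matches[0]
    scores = cosine_similarity(features[[idx]], features).ravel()
    order = np.argsort(scores)[::-1]   # most similar first
    order = order[order != idx][:top_n]  # exclude the query item itself
    return items.loc[order, "Title"].tolist()


print(recommend("show a", items, features, top_n=2))
```

In this toy data, "Show B" shares every categorical value with "Show A" and has similar numeric values, so it ranks ahead of "Show C".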
git clone https://github.com/yourusername/Netflix_Movie_Recommendation_System.git

cd Netflix_Movie_Recommendation_System

Place your dataset (CSV) inside the `data/` folder.
Required Columns:
- Title
- Available Globally?
- Release Date
- Hours Viewed
- Language Indicator
- Content Type
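
Before training on a custom dataset, it can help to confirm that all required columns are present. The snippet below is a small sketch of such a check; the helper `missing_columns` is hypothetical and not part of the project.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = [
    "Title",
    "Available Globally?",
    "Release Date",
    "Hours Viewed",
    "Language Indicator",
    "Content Type",
]


def missing_columns(df):
    """Return the required column names that are absent from df."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Made-up CSV content that lacks two of the required columns.
sample_csv = io.StringIO(
    "Title,Release Date,Hours Viewed,Language Indicator\n"
    "Show A,2022-11-23,1000000,English\n"
)
sample = pd.read_csv(sample_csv)
print(missing_columns(sample))
```

An empty list means the file has every column the pipeline expects.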
If you want to train on your own dataset:

python train.py --input <datapath> --outdir ./models --neighbors 50

The `--neighbors` argument is optional.
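
For readers curious how a script with these flags might be organized, here is one possible sketch. It is an assumption, not the project's `train.py`: it guesses that `--neighbors` sets the size of a scikit-learn `NearestNeighbors` model with a cosine metric, treats 50 as the default, and saves to a hypothetical file name `nn_model.joblib`.

```python
import argparse
from pathlib import Path

import joblib
import numpy as np
from sklearn.neighbors import NearestNeighbors


def parse_args(argv=None):
    """Parse the three flags shown in the training command."""
    parser = argparse.ArgumentParser(description="Train the recommender (illustrative sketch).")
    parser.add_argument("--input", required=True, help="Path to the CSV dataset.")
    parser.add_argument("--outdir", default="./models", help="Directory for saved artifacts.")
    # Assumed default; the README only shows 50 as an example value.
    parser.add_argument("--neighbors", type=int, default=50, help="Number of neighbors to index.")
    return parser.parse_args(argv)


def fit_and_save(features, outdir, neighbors):
    """Fit a cosine nearest-neighbors model on a feature matrix and save it to outdir."""
    # n_neighbors cannot exceed the number of rows.
    n_neighbors = min(neighbors, features.shape[0])
    model = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine").fit(features)
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    model_path = out / "nn_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    return model_path
```

A saved model can later be reloaded with `joblib.load` and queried with `kneighbors`, which avoids refitting each time the web app starts.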
To start the web interface, run:

python app.py

Then open http://127.0.0.1:5000/ in your browser.
Example Query:
Input: wednesday
Output: Similar shows/movies based on metadata (language, year, popularity, etc.).
The recommendations are not random: they are based on metadata similarity, so the system suggests titles with similar language, release period, and audience engagement patterns.
- Include genre-based similarity from NLP on movie descriptions
- Add collaborative filtering using user ratings
- Support multi-language search
- Deploy the app to Heroku / Render
Dipean Dasgupta
Computer Science Graduate | EdgeAI & ML Enthusiast