CineBrain is a movie recommender system that suggests movies similar to the one a user selects. It is built on the TMDB 5000 Movie Dataset and ranks candidates by the cosine similarity of bag-of-words vectors derived from each movie's text metadata.
- Project Overview
- Dataset
- Model Architecture
- Key Features
- Directory Structure
- Installation
- Usage
- Technologies Used
- Future Improvements
- Contributing
- License
CineBrain is designed to enhance the movie discovery experience by leveraging machine learning and natural language processing techniques. The system analyzes various aspects of movies, including overview, cast, crew, genres, and keywords, to generate meaningful recommendations.
The project uses the TMDB 5000 Movie Dataset from Kaggle, which includes metadata on approximately 5,000 movies from The Movie Database (TMDb).
The recommendation model follows these key steps:
- Data Loading and Preprocessing:
  - Load the dataset into a pandas DataFrame
  - Handle missing values by dropping rows with null entries
  - Convert the JSON strings holding cast and crew information into tags
- Feature Engineering:
  - Tokenize movie overviews
  - Create a cumulative tag for each movie, combining information from overview, crew, cast, genres, and keywords
- Vectorization:
  - Use CountVectorizer from scikit-learn
  - Limit the vocabulary to 5,000 terms and remove English stop words
  - Apply stemming with PorterStemmer from NLTK so that inflected forms of a word collapse into a single token
- Similarity Calculation:
  - Compute cosine similarity between movie vectors
- Recommendation Generation:
  - Implement a recommendation method that looks up the cosine similarity between the selected movie and every other movie in the dataset
  - Return the top 5 movies with the highest similarity scores
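The steps above can be sketched end to end on a toy DataFrame. This is a minimal illustration, not the code from the notebook: the helper names (`names_from_json`, `build_tags`, `build_similarity`, `recommend`), the choice to keep the first three cast members and only the director, and the sample rows are all assumptions made for the example. It relies only on `CountVectorizer`, `cosine_similarity`, and `PorterStemmer` as named in the steps.

```python
import json

import numpy as np
import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()


def names_from_json(raw, limit=None, job=None):
    """Extract space-free name tokens from a TMDB-style JSON string."""
    items = json.loads(raw)
    if job is not None:  # e.g. keep only the director from the crew list
        items = [item for item in items if item.get("job") == job]
    return [item["name"].replace(" ", "") for item in items][:limit]


def build_tags(row):
    """Merge overview, genres, keywords, cast and crew into one stemmed string."""
    tokens = row["overview"].split()
    tokens += names_from_json(row["genres"])
    tokens += names_from_json(row["keywords"])
    tokens += names_from_json(row["cast"], limit=3)         # assumption: first 3 cast
    tokens += names_from_json(row["crew"], job="Director")  # assumption: director only
    return " ".join(stemmer.stem(token) for token in tokens)


def build_similarity(tags):
    """Vectorize tag strings and return the pairwise cosine-similarity matrix."""
    vectorizer = CountVectorizer(max_features=5000, stop_words="english")
    return cosine_similarity(vectorizer.fit_transform(tags))


def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `title`, excluding itself."""
    titles = list(titles)
    position = titles.index(title)
    ranked = np.argsort(similarity[position])[::-1]  # highest score first
    return [titles[i] for i in ranked if i != position][:top_n]


# Toy rows in the same shape as the TMDB columns (invented for illustration).
movies = pd.DataFrame(
    {
        "title": ["Space One", "Space Two", "Love Story"],
        "overview": [
            "astronaut travels space station",
            "astronaut repairs space station",
            "couple falls in love paris",
        ],
        "genres": ['[{"name": "Science Fiction"}]'] * 2 + ['[{"name": "Romance"}]'],
        "keywords": ['[{"name": "space"}]'] * 2 + ['[{"name": "love"}]'],
        "cast": ['[{"name": "Ann Lee"}]'] * 2 + ['[{"name": "Bob Ray"}]'],
        "crew": ['[{"name": "Dan Roe", "job": "Director"}]'] * 2
        + ['[{"name": "Eve Poe", "job": "Director"}]'],
    }
).dropna()

movies["tags"] = movies.apply(build_tags, axis=1)
similarity = build_similarity(movies["tags"])
print(recommend("Space One", movies["title"], similarity, top_n=1))
```

Removing spaces inside names keeps a multi-word name as a single token, so two people who merely share a first name do not count as a match.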
- Select a movie from the list to receive the top 5 recommendations
- User-friendly interface built with Flask, HTMX, and Tailwind CSS
- Efficient data processing and similarity calculation for quick recommendations
cinebrain/
│
├── app/ # Flask application directory
│
├── data/ # Data directory
│ ├── processed_df.pkl
│ ├── tmdb_5000_credits.csv
│ ├── tmdb_5000_movies.csv
│ └── similarity.npy
│
├── recommender_system.ipynb # Jupyter notebook for model development
├── run.py # Script to run the Flask application
├── .env # Environment file for API key
└── README.md
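Given this layout, the precomputed artifacts in `data/` could be loaded at startup roughly as follows. This is a sketch under assumptions: the file names come from the tree above, but that `processed_df.pkl` is a pickled pandas DataFrame and `similarity.npy` a square matrix with one row per movie is inferred from the names, and `load_artifacts` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_artifacts(data_dir="data"):
    """Load the processed movie table and the similarity matrix from data_dir."""
    data_dir = Path(data_dir)
    movies = pd.read_pickle(data_dir / "processed_df.pkl")  # assumed: a DataFrame
    similarity = np.load(data_dir / "similarity.npy")       # assumed: n x n matrix
    if similarity.shape != (len(movies), len(movies)):
        raise ValueError("similarity matrix does not match the number of movies")
    return movies, similarity
```

Checking the shape up front catches the case where one file was regenerated from the notebook and the other was not.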
This project uses The Movie Database (TMDb) API to fetch additional movie information. To use the API:
- Create an account on The Movie Database
- Go to your account settings and navigate to the API section
- Request an API key for developer use
- Create a .env file in the root directory and add your key:
TMDB_API_KEY=your_api_key_here
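Once the key is set, a request URL for the TMDb v3 movie details endpoint can be built as sketched below. The function name `movie_details_url` is hypothetical, and the sketch assumes `TMDB_API_KEY` is already present in the process environment; how the application itself reads the `.env` file is not shown here. No network call is made.

```python
import os
from urllib.parse import urlencode

TMDB_BASE_URL = "https://api.themoviedb.org/3"


def movie_details_url(movie_id, api_key=None):
    """Build the TMDb v3 movie details URL for the given movie id."""
    # Fall back to the environment variable named in the .env file.
    api_key = api_key or os.environ.get("TMDB_API_KEY")
    if not api_key:
        raise RuntimeError("TMDB_API_KEY is not set")
    return f"{TMDB_BASE_URL}/movie/{movie_id}?{urlencode({'api_key': api_key})}"
```

Failing early when the key is missing gives a clearer error than an authorization failure from the API later on.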
- Clone the repository:
git clone https://github.com/ramchaik/cinebrain.git
cd cinebrain
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
- Run the Flask application:
python run.py
- Open a web browser and navigate to http://localhost:5000
- Select a movie from the list to receive recommendations
- Python
- pandas
- scikit-learn
- NLTK
- Flask
- HTMX
- Tailwind CSS
- Implement user authentication and personalized recommendations
- Integrate real-time data updates from TMDb API
- Enhance the user interface with movie posters and additional details
- Develop a mobile application for on-the-go recommendations
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.