A multifaceted approach to predicting movie success that combines traditional data analysis with network analysis and natural language processing.
This project demonstrates a predictive model using Video-On-Demand streaming data to forecast the success of movies early on. It leverages Gradient Boosting Machines and advanced feature engineering techniques.
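As a rough illustration of the modeling idea, the sketch below fits a Gradient Boosting classifier with scikit-learn on synthetic data. The feature names (`first_week_views`, `completion_rate`, `days_since_release`), the binary "success" label, and the hyperparameters are all assumptions made for this example; they are not taken from the project's notebooks or dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical early-window features (synthetic stand-ins for real VOD data).
first_week_views = rng.lognormal(mean=8.0, sigma=1.0, size=n)
completion_rate = rng.uniform(0.2, 0.9, size=n)
days_since_release = rng.integers(1, 15, size=n)

# Simple feature engineering: log-transform the heavy-tailed view counts.
X = np.column_stack([
    np.log1p(first_week_views),
    completion_rate,
    days_since_release,
])

# Synthetic "success" label driven by views and completion rate, plus noise.
score = (
    0.8 * (np.log1p(first_week_views) - 8.0)
    + 3.0 * (completion_rate - 0.55)
    + rng.normal(0.0, 0.5, size=n)
)
y = (score > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative hyperparameters, not tuned values from the project.
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out ROC AUC on synthetic data: {auc:.3f}")
```

The actual features, target definition, and evaluation used in the project are in model.ipynb.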
- EDA.ipynb: Exploratory Data Analysis notebook providing insights into the streaming data.
- NetworkAnalysis(NetflixUS).ipynb: Notebook focusing on network analysis of Netflix US streaming data.
- TextRank4Keyword.py: Python script implementing the TextRank algorithm for keyword extraction.
- model.ipynb: Main notebook where the predictive model is built and evaluated.
- utils.py: Utility functions supporting data processing and analysis.
- Gradient Boosting Machines: Utilizes GBM for robust predictive modeling.
- Advanced Feature Engineering: Implements sophisticated techniques to enhance model performance.
- Exploratory Data Analysis: Provides a thorough analysis of the streaming data.
- Network Analysis: Examines the network structure within the streaming data.
- TextRank for Keyword Extraction: Applies TextRank algorithm for extracting keywords from textual data.
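To make the TextRank feature more concrete, here is a minimal standard-library sketch of the general technique: build a word co-occurrence graph and rank words with a PageRank-style iteration. It is not the code in TextRank4Keyword.py. The function name `textrank_keywords`, the small stopword list, the window size, and the sample synopsis are assumptions for illustration, and filtering stopwords before windowing is a simplification of the original algorithm.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "in", "on", "of", "to", "and", "or",
    "is", "are", "must", "before", "for", "with",
}


def textrank_keywords(text, window=4, damping=0.85, iterations=50, top_k=5):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]

    # Link each word to the distinct words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the undirected, unweighted TextRank update a fixed number of times.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


synopsis = (
    "A heist crew plans a daring heist in Paris. The heist goes wrong and "
    "the crew must escape the city before the police find the crew."
)
keywords = textrank_keywords(synopsis)
print(keywords)
```

Keywords extracted this way could then be turned into model features, for example as indicator columns per keyword; whether the project does so is not stated here.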
To get started with this project:
- Clone the repository.
- Install the necessary dependencies by running: pip install -r requirements.txt
- Run the Jupyter notebooks to understand the data analysis and modeling process.
- Python 3.x
- Libraries listed in requirements.txt, including scikit-learn and pandas.
Contributions to this project are welcome. Please follow the standard fork-and-pull request workflow.
This project is licensed under the MIT License - see the LICENSE file for details.