What if you could predict a show's ratings from its script? I explored and tested this idea on one of my favourite TV shows, Star Trek: The Next Generation.
The project contains (in viewing order):
- KatyaKogan_Capstone_Report.pdf (final report)
- KatyaKogan_Final_Presentation.pdf (final presentation for tech audience)
- KatyaKogan_Demo_Day.pdf (presentation file for demo day)
- Part1_TrekPredict_CleaningEDA.ipynb
- Part2_TrekPredict_Modelling.ipynb
- model_comp.csv (comparing models)
- PCT_graph.csv (modelling dataset)
- TNG.csv.gz (original dataset -> https://github.com/RTrek/startrekTNGdataset)
- total_word_count.csv
Required libraries:
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn (imported as sklearn)
Extras:
- shap (https://shap.readthedocs.io/en/latest/index.html)
- mord (https://pythonhosted.org/mord/)
- time (part of the Python standard library, no install needed)
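
Before opening the notebooks, it can help to confirm that the libraries listed above are importable in your environment. The following is a minimal sketch, not part of the project itself: the helper name `missing_modules` and the `REQUIRED` / `EXTRAS` lists are illustrative assumptions built from the import names above.

```python
from importlib.util import find_spec

# Import names taken from the lists above (hypothetical grouping for this sketch).
# "time" is left out because it ships with Python itself.
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "sklearn"]
EXTRAS = ["shap", "mord"]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("extras", EXTRAS)):
        missing = missing_modules(names)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

Anything reported as missing can usually be installed with pip. Note that the import name `sklearn` corresponds to the `scikit-learn` package.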