Rebecca Buerger, Jack Ewert, Nicholas Litterio, Waddah Moghram, Dhruv Vyas, Yuchen Zhang
Over the last decade, Netflix has cemented its position as a leading streaming media provider. To maintain its dominance, Netflix offered a one-million-dollar prize, awarded in 2009, for an algorithm that improved rating predictions on previously collected real-life customer data. Our team was given the same task, with two months to complete it. By the end of the first month, we had successfully read and visualized the original dataset provided by Netflix and identified strategies for proceeding with the project. These strategies included K-means clustering and Pearson's r correlation. Approaching the conclusion of the project, we were able to supplement about 60% of the movie titles with data from the IMDb online database. In addition, we included some time-series analysis of movie and user trends. This paper was submitted as part of a class entitled Big Data Analytics (IE:4172) on December 7, 2018.
Please refer to the included report, BigDataNetFlixProjectFinalReport_Group_3_Alpaca.pdf, for more details.
Please note that some data files could not be uploaded to GitHub due to the 250 MB size limit allowed by the server. However, these files are available upon request, and the results can be reproduced by running the source files on the existing data files, provided that the needed Python libraries are installed properly.
Update: the GitHub repository URL mentioned in the report is no longer available. The updated URL is: https://github.com/waddahmoghram/BigDataNetflixProject2018