Every year, thousands of movies are released, but only a fraction of them become successful. The aim of this work is to analyze the factors that determine movie profitability and commercial success.
The main objective of this project is to analyze a dataset containing information about 10,000 movies released from 1960 to 2015. The data was collected from The Movie Database (TMDb) and includes film details such as production cost, revenue, rating information, actors, and directors. This work also tries to answer the questions below:
1- What kinds of properties are associated with the most and least successful movies?
- Which movie had the highest or lowest profit?
- In which year did the movie industry make the highest profit?
- In which month did the movie industry make the highest profit?
- Do popular movies get higher profit?
- What were the most or least expensive movies?
- What is the statistical relationship between budget and profit?
- Do movies with the highest budgets get the highest ratings?
- Which movie had the highest or lowest revenue?
- Is there any statistical relationship between revenue and profit, or between revenue and budget?
- What is the movie length most liked by the audience?
- Which movies were rated highest or lowest?
- Do highly rated movies get higher profit?
- Profit
- Budget
- Revenue
- Popularity
- Which genres are more profitable overall?
- Which genres are more profitable from year to year?
- Which genres are more popular overall?
- What is the evolution of the genres according to popularity from year to year?
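
As a rough illustration of how several of the questions above could be approached with pandas, here is a minimal sketch. It assumes that profit is defined as revenue minus budget, that genres are stored as a `|`-separated string, and that the columns are named `original_title`, `release_year`, `budget`, `revenue`, and `genres`. These names and the toy rows are assumptions made for the example, not values taken from the TMDb dataset.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real TMDb figures.
# Column names are assumed and may differ from the actual dataset.
movies = pd.DataFrame({
    "original_title": ["Film A", "Film B", "Film C", "Film D"],
    "release_year": [2001, 2001, 2002, 2002],
    "budget": [100, 50, 200, 10],
    "revenue": [300, 40, 250, 90],
    "genres": ["Action|Drama", "Comedy", "Action", "Drama|Comedy"],
})

# Assumption: profit = revenue - budget.
movies["profit"] = movies["revenue"] - movies["budget"]

# Titles with the highest and lowest profit.
most_profitable = movies.loc[movies["profit"].idxmax(), "original_title"]
least_profitable = movies.loc[movies["profit"].idxmin(), "original_title"]

# Total profit per release year, and the year with the largest total.
profit_by_year = movies.groupby("release_year")["profit"].sum()
best_year = profit_by_year.idxmax()

# Pearson correlation between budget and profit.
budget_profit_corr = movies["budget"].corr(movies["profit"])

# Split the genre string so each movie contributes one row per genre,
# then compare the mean profit of each genre.
genre_rows = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")
mean_profit_by_genre = genre_rows.groupby("genre")["profit"].mean()
top_genre = mean_profit_by_genre.idxmax()

print(most_profitable, least_profitable, best_year, top_genre)
```

A correlation coefficient only describes a linear association, so it is worth pairing it with a scatter plot before drawing conclusions.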
This is a project that I completed for the Udacity Data Analyst Nanodegree. In this project, I'll go through the data analysis process and see how everything fits together. I will use the Python libraries NumPy, pandas, and Matplotlib to make my analysis easier.
You will need an installation of Python, plus the following libraries:
- pandas
- NumPy
- Matplotlib
- csv (part of the Python standard library, so no separate installation is needed)
I recommend installing Anaconda, which comes with all of the necessary packages, as well as IPython notebook.
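
One quick way to confirm that the environment is ready is to try importing each library. The sketch below is only a convenience check, not part of the project itself; the `required` list and the `status` dictionary are names chosen for this example.

```python
import importlib

# Modules the project relies on (import names, not package names).
required = ["pandas", "numpy", "matplotlib", "csv"]

status = {}
for name in required:
    try:
        module = importlib.import_module(name)
        # Fall back to a placeholder if the module exposes no __version__.
        status[name] = getattr(module, "__version__", "unknown version")
    except ImportError:
        # None marks a module that could not be imported.
        status[name] = None

for name, version in status.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```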
- Follow all the steps involved in a typical data analysis process;
- Pose questions that can be answered with a given dataset, then answer those questions;
- Investigate problems in a dataset and wrangle the data;
- Communicate the results of my analysis;
- Use vectorized operations in NumPy and pandas to speed up data analysis code;
- Work more comfortably with pandas' Series and DataFrame objects;
- Use Matplotlib to produce plots showing my findings.
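
To make the point about vectorized operations concrete, here is a small sketch comparing an explicit Python loop with the equivalent pandas and NumPy expressions. The numbers are toy values for illustration and the variable names are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only.
budget = pd.Series([100, 50, 200, 10])
revenue = pd.Series([300, 40, 250, 90])

# Loop version: subtracts element by element in Python.
profit_loop = [r - b for r, b in zip(revenue, budget)]

# Vectorized version: one expression applied to the whole Series at once.
profit_vec = revenue - budget

# NumPy's where labels every element in a single call.
outcome = np.where(profit_vec > 0, "profit", "loss")

print(profit_vec.tolist())
print(outcome.tolist())
```

Both versions give the same numbers. The vectorized form is shorter and, on large datasets, typically much faster because the work happens in compiled code rather than in a Python loop.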
My project was reviewed by a Udacity reviewer. All criteria in the rubric must meet specifications for me to pass.