Investigate-TMDb-Movie-Database

Every year, thousands of movies are released, but only a fraction of them become successful. The aim of this work is to analyze the factors that determine movie profitability and commercial success.

Project overview

The main objective of this project is to analyze a dataset containing information about 10,000 movies released between 1960 and 2015. The data was collected from The Movie Database (TMDb) and includes film details such as the production cost, the revenue generated, rating information, cast and directors. This work also tries to answer the questions below:

Research questions

1- What properties are associated with the most and least successful movies?

Which movie had the highest or lowest profit?
In which year did the movie industry make the highest profit?
In which month did the movie industry make the highest profit?
Do popular movies earn higher profits?
What were the most and least expensive movies?
What is the statistical relationship between budget and profit?
Do movies with the highest budgets get the highest ratings?
Which movie had the highest or lowest revenue?
Is there a statistical relationship between revenue and profit, or between revenue and budget?
What movie length is most liked by the audience?
Which movies were rated highest or lowest?
Do highly rated movies earn higher profits?
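Most of the questions above reduce to a derived profit column plus a few pandas aggregations. The sketch below is a minimal illustration, not the project's actual notebook code: the toy rows are hypothetical, and the column names (`original_title`, `budget`, `revenue`, `popularity`, `vote_average`, `runtime`, `release_date`) are assumed to match a TMDb-style CSV. Profit is taken as revenue minus budget, and the runtime bins are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names assume a TMDb-style CSV.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "budget": [100, 50, 200, 10],
    "revenue": [300, 40, 250, 90],
    "popularity": [5.0, 1.0, 3.0, 2.0],
    "vote_average": [7.5, 5.0, 6.0, 6.8],
    "runtime": [120, 95, 140, 100],
    "release_date": ["2010-06-15", "2011-12-01", "2010-07-04", "2012-06-20"],
})

# Profit computed as revenue minus budget, vectorized over whole columns.
movies["profit"] = movies["revenue"] - movies["budget"]

# Titles of the most and least profitable movies.
best = movies.loc[movies["profit"].idxmax(), "original_title"]
worst = movies.loc[movies["profit"].idxmin(), "original_title"]

# Total profit per release year and per release month.
dates = pd.to_datetime(movies["release_date"])
profit_by_year = movies.groupby(dates.dt.year)["profit"].sum()
profit_by_month = movies.groupby(dates.dt.month)["profit"].sum()

# Pairwise Pearson correlations (budget vs profit, popularity vs profit, etc.).
corr = movies[["budget", "revenue", "profit", "popularity", "vote_average"]].corr()

# Mean rating per runtime bin (bin edges in minutes are an arbitrary choice).
runtime_bins = pd.cut(movies["runtime"], bins=[0, 90, 120, 150, 400])
rating_by_runtime = movies.groupby(runtime_bins, observed=True)["vote_average"].mean()

print(best, worst)
print(profit_by_year.idxmax(), profit_by_month.idxmax())
print(corr.loc["budget", "profit"])
```

A correlation close to 0 only rules out a linear relationship; a scatter plot of the same pair of columns is a useful companion check.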

2- What are the top 10 movies according to different features, in particular:

Profit
Budget
Revenue
Popularity
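A top-10 ranking for each feature can be sketched with `DataFrame.nlargest`. The rows below are hypothetical, and the column names are assumed to match a TMDb-style CSV with a profit column already derived as revenue minus budget; `top_n` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy data with an already-derived profit column.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "budget": [100, 50, 200],
    "revenue": [300, 40, 250],
    "popularity": [5.0, 1.0, 3.0],
})
movies["profit"] = movies["revenue"] - movies["budget"]


def top_n(df, column, n=10):
    """Return the n rows with the largest values in `column` (fewer if df is smaller)."""
    return df.nlargest(n, column)[["original_title", column]]


# One ranking table per feature of interest.
top_by_feature = {
    col: top_n(movies, col) for col in ["profit", "budget", "revenue", "popularity"]
}

for col, table in top_by_feature.items():
    print(col)
    print(table.to_string(index=False))
```

`nlargest` avoids sorting the whole frame when only the first few rows are needed; the least expensive movies can be obtained the same way with `nsmallest`.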

3- Which genres are the most popular and profitable, overall and over time?

Which genres are the most profitable overall?
Which genres are the most profitable from year to year?
Which genres are the most popular overall?
How has the popularity of each genre evolved from year to year?
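Because a movie can belong to several genres, a common approach is to give each (movie, genre) pair its own row before aggregating. The sketch below assumes the `genres` column stores pipe-separated values (e.g. `"Action|Adventure"`) and uses hypothetical toy rows. It ranks genres by mean profit and popularity; using the sum instead would favour genres with many releases.

```python
import pandas as pd

# Hypothetical toy data; assumes genres are stored as pipe-separated strings.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "genres": ["Action|Adventure", "Adventure", "Action|Drama", "Drama"],
    "release_year": [2010, 2010, 2011, 2011],
    "profit": [200, -10, 50, 80],
    "popularity": [5.0, 1.0, 3.0, 2.0],
})

# One row per (movie, genre) pair.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Overall mean profit and mean popularity per genre, best first.
profit_overall = by_genre.groupby("genre")["profit"].mean().sort_values(ascending=False)
popularity_overall = by_genre.groupby("genre")["popularity"].mean().sort_values(ascending=False)

# Mean profit per (year, genre), then the best genre for each year.
yearly_profit = by_genre.groupby(["release_year", "genre"])["profit"].mean()
top_genre_per_year = yearly_profit.unstack().idxmax(axis=1)

# Popularity evolution: one column per genre, one row per year (ready to plot).
popularity_by_year = by_genre.pivot_table(
    index="release_year", columns="genre", values="popularity", aggfunc="mean"
)

print(profit_overall)
print(top_genre_per_year)
```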

4- Who are the top 10 cast members, directors and production companies?
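One possible reading of "top 10" is the most frequently credited names. The sketch below counts individual entries in multi-valued columns, assuming `cast`, `director` and `production_companies` store pipe-separated names and may contain missing values; the data and the helper `top_values` are hypothetical. Ranking by total profit instead would need a `groupby` on the exploded rows.

```python
import pandas as pd

# Hypothetical toy data; assumes pipe-separated names and possible missing values.
movies = pd.DataFrame({
    "cast": ["Actor1|Actor2", "Actor1|Actor3", "Actor2|Actor1"],
    "director": ["Dir1", "Dir2", "Dir1"],
    "production_companies": ["Studio1|Studio2", "Studio1", None],
})


def top_values(series, n=10, sep="|"):
    """Split a multi-valued text column and return the n most frequent entries."""
    return series.dropna().str.split(sep).explode().value_counts().head(n)


top_cast = top_values(movies["cast"])
top_directors = top_values(movies["director"])
top_companies = top_values(movies["production_companies"])

print(top_cast)
print(top_directors)
print(top_companies)
```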

Project Objectives

This project was completed for the Udacity Data Analyst Nanodegree. In it, I go through the data analysis process and see how everything fits together. I use the Python libraries NumPy, pandas and Matplotlib to make the analysis easier.

Loading project requirements

You will need an installation of Python, plus the following libraries:

pandas
NumPy
Matplotlib
csv (part of the Python standard library)

I recommend installing Anaconda, which comes with all of the necessary packages as well as the IPython (Jupyter) Notebook.

Conclusion

What I learned

All the steps involved in a typical data analysis process;
How to pose questions that can be answered with a given dataset, and then answer them;
How to investigate problems in a dataset and wrangle the data;
How to communicate the results of my analysis;
How to use vectorized operations in NumPy and pandas to speed up data analysis code;
How to work with pandas' Series and DataFrame objects;
How to use Matplotlib to produce plots showing my findings.
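To illustrate the difference between a Python loop and a vectorized operation, the sketch below computes profit both ways on hypothetical random budgets and revenues (10,000 rows, roughly the dataset's size) and checks that the results match. The vectorized version is typically much faster because the subtraction runs in compiled code over whole columns; actual timings depend on the machine.

```python
import numpy as np
import pandas as pd

# Hypothetical random data standing in for the budget and revenue columns.
rng = np.random.default_rng(0)
movies = pd.DataFrame({
    "budget": rng.integers(1_000, 1_000_000, size=10_000),
    "revenue": rng.integers(0, 2_000_000, size=10_000),
})


def profit_loop(df):
    """Compute profit row by row in a Python loop (slow)."""
    values = [revenue - budget for budget, revenue in zip(df["budget"], df["revenue"])]
    return pd.Series(values, index=df.index)


def profit_vectorized(df):
    """Compute profit with one column-wise subtraction (fast)."""
    return df["revenue"] - df["budget"]


# Both approaches must produce the same values.
same = profit_loop(movies).equals(profit_vectorized(movies))
print(same)
```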

Evaluation

My project was reviewed by a Udacity reviewer. All criteria in the rubric had to meet specifications for me to pass.

Fuenj/Exploratory-Data-Analysis-EDA-with-Python-The-Movie-Database-

Folders and files

Latest commit

History

Repository files navigation

Investigate-TMDb-Movie-Database

Project overview

Research questions

Project Objectives

Loading project requirements

Conclusion

What I learned

Evaluation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages