Udacity - Data Analyst Nanodegree
This project uses the following Python libraries:
- NumPy
- Pandas
- Matplotlib
In this project, we analyze a dataset, TMDB Movie Data in this case, and then communicate our findings about it.
- Data Wrangling: Employed Python libraries to clean and organize the TMDB movie dataset.
- Exploratory Analysis: Conducted thorough exploratory analysis to uncover underlying patterns in the data.
- Visualization: Created compelling visualizations to present findings on movie popularity, ratings, and revenue trends.
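
As an illustration of the wrangling step, here is a minimal sketch of the kind of cleaning involved. The sample rows are made up, the column names `original_title`, `budget`, and `revenue` are assumptions about the dataset, and treating a value of 0 as "not recorded" is also an assumption rather than something the dataset documents.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real project would load the TMDb CSV instead.
movies = pd.DataFrame({
    "original_title": ["Film A", "Film A", "Film B", "Film C"],
    "budget": [1000, 1000, 0, 5000],
    "revenue": [3000, 3000, 800, 0],
})

# Remove exact duplicate rows.
cleaned = movies.drop_duplicates()

# Assumption: a budget or revenue of 0 means "not recorded", so mark it as missing.
cleaned = cleaned.assign(
    budget=cleaned["budget"].replace(0, np.nan),
    revenue=cleaned["revenue"].replace(0, np.nan),
)

# Keep only rows where both values are known.
cleaned = cleaned.dropna(subset=["budget", "revenue"]).reset_index(drop=True)

print(cleaned)
```

Whether rows with missing values should be dropped or kept depends on the question being asked, so this is one possible choice, not the only one.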
This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.
Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters. The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
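
Pipe-separated columns are easier to analyze once each value has its own row. The sketch below shows one way to do that with Pandas; the two sample rows are invented for illustration, and `original_title` is an assumed column name.

```python
import pandas as pd

# Hypothetical sample rows mimicking the pipe-separated 'genres' column.
movies = pd.DataFrame({
    "original_title": ["Film A", "Film B"],
    "genres": ["Action|Adventure", "Drama"],
})

# Split each string on the pipe character (a single-character pattern is
# treated literally), then expand the lists so every genre gets its own row.
by_genre = (
    movies.assign(genres=movies["genres"].str.split("|"))
          .explode("genres")
          .reset_index(drop=True)
)

print(by_genre)
```

After this step, a movie with several genres appears once per genre, which makes per-genre grouping straightforward but means row counts no longer equal movie counts.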
## Analysis of the Dataset

Brainstorm some questions that could be answered using the chosen dataset, then start answering them. Try to suggest questions that promote looking at relationships between multiple variables.
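
As a rough sketch of what a multi-variable question can look like in code, the example below asks two things: whether larger budgets go with larger revenues, and how average revenue differs by release year. The numbers are invented, and the column names `release_year`, `budget_adj`, and `revenue_adj` are assumptions about the dataset.

```python
import pandas as pd

# Hypothetical values, for illustration only.
movies = pd.DataFrame({
    "release_year": [2000, 2000, 2010, 2010],
    "budget_adj": [10.0, 20.0, 30.0, 40.0],
    "revenue_adj": [15.0, 45.0, 60.0, 100.0],
})

# Question 1: do bigger budgets go with bigger revenues? (Pearson correlation)
budget_revenue_corr = movies["budget_adj"].corr(movies["revenue_adj"])

# Question 2: how does mean adjusted revenue differ by release year?
mean_revenue_by_year = movies.groupby("release_year")["revenue_adj"].mean()

print(budget_revenue_corr)
print(mean_revenue_by_year)
```

A correlation only describes how the two columns move together; it does not show that a higher budget causes higher revenue.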
What I learned:
- The steps involved in a typical data analysis process.
- How to pose questions that can be answered with a given dataset, and then answer them.
- How to investigate problems in a dataset and wrangle the data into a usable format.
- How to communicate the results of an analysis.
- How to use vectorized operations in NumPy and Pandas to speed up data analysis code.
- How to work with Pandas Series and DataFrame objects, which make data easier to access.
- Last but not least, how to use Matplotlib and Seaborn to produce plots showing findings.
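
To make the point about vectorized operations concrete, here is a small sketch that compares a Python-level loop with the equivalent column-wise operation. The values are made up, and the columns `budget_adj`, `revenue_adj`, `profit_adj`, and `made_profit` are names assumed or introduced for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only.
movies = pd.DataFrame({
    "budget_adj": [10.0, 20.0, 30.0],
    "revenue_adj": [15.0, 45.0, 25.0],
})

# Loop version: one Python-level subtraction per row.
profit_loop = [r - b for r, b in zip(movies["revenue_adj"], movies["budget_adj"])]

# Vectorized version: a single operation on whole columns (Series objects).
movies["profit_adj"] = movies["revenue_adj"] - movies["budget_adj"]

# NumPy functions also work element-wise on Pandas columns.
movies["made_profit"] = np.where(movies["profit_adj"] > 0, "yes", "no")

print(movies)
```

Both versions give the same numbers; the vectorized form is shorter and, on a dataset of thousands of rows, typically much faster because the work happens in compiled code.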