Investigate a Dataset - Data Analyst Nanodegree with Udacity

pooja-rane7/Investigate-a-Dataset

Investigate-a-Dataset

This project was completed as part of the course requirements of Udacity's Data Analyst Nanodegree certification.

Overview:

In this project, I analysed the TMDb movie dataset, which is available on Kaggle. The dataset contains information on around 10,000 movies collected from TMDb. It includes each movie's viewer rating, budget, revenue, genres, production companies, director, cast, associated keywords, popularity and runtime. The dataset can help in understanding factors such as profitability, trends in runtime and popularity over the years, the genres most associated with profit, and the connection between popularity ratings and profit. It can also reveal which directors, cast members and production companies were most profitable over the period covered.

What do I need to install?

You will need an installation of Python, plus the following libraries:

  • pandas
  • NumPy
  • Matplotlib
  • csv (part of the Python standard library, so no separate installation is needed)

I recommend installing Anaconda, which comes with all of the necessary packages, as well as IPython notebook.

Project Details: Dataset

TMDb movie data (cleaned from original data on Kaggle)

Project Details: Sample Questions

  1. Which movie durations are most liked by audiences, as measured by popularity?
  2. What is the average revenue of a movie?
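The two sample questions can be approached with a few lines of pandas. The sketch below is only an illustration and not the notebook's actual code. It uses a handful of made-up rows instead of the real CSV. The column names (`runtime`, `popularity`, `revenue`), the runtime bands, and the choice to treat a revenue of 0 as "not reported" are all assumptions.

```python
import pandas as pd

# Hypothetical toy rows standing in for the TMDb data; values are invented.
# Column names are assumed, not confirmed by this README.
movies = pd.DataFrame({
    "runtime": [85, 95, 110, 125, 150, 170],          # minutes
    "popularity": [0.5, 0.7, 1.2, 1.6, 2.0, 1.0],
    "revenue": [0, 10_000_000, 50_000_000, 0, 200_000_000, 40_000_000],
})

# Question 1: bucket runtimes into bands (band edges are an arbitrary choice)
# and compare the mean popularity of each band.
bins = [0, 90, 120, 150, 1000]
labels = ["<=90 min", "91-120 min", "121-150 min", ">150 min"]
movies["runtime_band"] = pd.cut(movies["runtime"], bins=bins, labels=labels)
popularity_by_band = (
    movies.groupby("runtime_band", observed=True)["popularity"].mean()
)
most_popular_band = popularity_by_band.idxmax()

# Question 2: average revenue, assuming a revenue of 0 means "not reported"
# and should be excluded rather than averaged in.
reported_revenue = movies.loc[movies["revenue"] > 0, "revenue"]
average_revenue = reported_revenue.mean()

print(popularity_by_band)
print("Most popular band:", most_popular_band)
print("Average reported revenue:", average_revenue)
```

Whether zero-revenue rows are dropped or kept changes the average considerably, so that decision is worth stating alongside any reported figure.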

Lessons Learnt:

  • I know how to investigate problems in a dataset and wrangle the data into a format I can use.
  • I'm able to use vectorized operations in NumPy and pandas to speed up my data analysis code.
  • I'm familiar with pandas and NumPy, which let me access my data more conveniently, and I can generate creative visualizations using Matplotlib.
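As a minimal sketch of what "vectorized operations" means here, the example below computes profit for several movies at once with NumPy, next to an equivalent Python loop. The budget and revenue figures are invented for illustration, and the variable names are hypothetical rather than taken from the project.

```python
import numpy as np

# Hypothetical figures in millions of dollars; not real TMDb values.
budget = np.array([10.0, 20.0, 5.0])
revenue = np.array([30.0, 15.0, 5.0])

# Loop version: one subtraction per movie, executed in Python.
profit_loop = [r - b for r, b in zip(revenue, budget)]

# Vectorized version: a single expression applied to whole arrays at once.
profit = revenue - budget

# Comparisons are vectorized too and yield a boolean mask.
profitable = profit > 0

print(profit)             # element-wise profit per movie
print(profitable)         # which movies made money
print(revenue[profitable])  # revenues of the profitable movies only
```

The same expressions work on pandas Series, which is why a column-wise calculation such as revenue minus budget usually needs no explicit loop.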

Limitations:

Findings are tentative and have not been verified with statistical tests or machine learning. The conclusions are not foolproof: they do not guarantee that a movie with these characteristics will succeed, but they suggest a high probability of high profits for movies with similar characteristics.