This repository contains an analysis of a TMDB movie dataset. The purpose of the project is to practice basic data analysis; after the analysis, a set of questions is answered to summarize the findings. It was completed as part of Udacity's Become A Data Analyst Nanodegree.
I was given a choice of several datasets and chose the TMDB dataset, which contains information about movies from the last 50+ years. It comprises 21 columns and 10k+ rows.
I cleaned the dataset by removing or replacing redundant and unusable data, converting it into a form better suited to downstream work such as exploratory data analysis (EDA).
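As a rough illustration of this kind of cleaning, here is a minimal pandas sketch. It is not the exact code from the notebook: the helper `clean_movies`, the toy data, and the column names `title`, `budget`, and `homepage` are all assumptions made for the example, as is the choice to treat a zero budget as a missing value.

```python
import numpy as np
import pandas as pd


def clean_movies(df, drop_cols=(), zero_as_missing=()):
    """Return a cleaned copy of df (hypothetical helper, not from the repo).

    - drops exact duplicate rows
    - drops the columns named in drop_cols, if present
    - replaces 0 with NaN in the columns named in zero_as_missing
    """
    cleaned = df.drop_duplicates().copy()
    cleaned = cleaned.drop(columns=[c for c in drop_cols if c in cleaned.columns])
    for col in zero_as_missing:
        if col in cleaned.columns:
            cleaned[col] = cleaned[col].replace(0, np.nan)
    return cleaned.reset_index(drop=True)


# Toy data standing in for the real dataset; column names are assumed.
raw = pd.DataFrame({
    "title": ["A", "A", "B", "C"],
    "budget": [100, 100, 0, 250],
    "homepage": ["x", "x", None, "y"],
})

cleaned = clean_movies(raw, drop_cols=["homepage"], zero_as_missing=["budget"])
print(cleaned)
```

The helper works on a copy, so the raw DataFrame stays available for comparison after cleaning.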
In the exploratory phase I analyzed the dataset to find interesting insights and answer questions about it. I used a range of statistical techniques and plots to explore the data, and the results are reported in the final findings.
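The sketch below shows two typical exploratory steps of this sort: a per-year average and a correlation between two numeric columns. It is only an example under stated assumptions, not the analysis from the notebook: the helper `yearly_mean`, the toy data, and the column names `release_year`, `budget`, and `revenue` are hypothetical.

```python
import pandas as pd


def yearly_mean(df, year_col, value_col):
    """Mean of value_col for each year in year_col (hypothetical helper)."""
    return df.groupby(year_col)[value_col].mean().sort_index()


# Toy data standing in for the cleaned dataset; column names are assumed.
movies = pd.DataFrame({
    "release_year": [2000, 2000, 2001, 2001],
    "budget": [10.0, 30.0, 20.0, 40.0],
    "revenue": [20.0, 60.0, 40.0, 80.0],
})

# Average revenue per release year.
mean_revenue = yearly_mean(movies, "release_year", "revenue")

# Pearson correlation between budget and revenue.
budget_revenue_corr = movies["budget"].corr(movies["revenue"])

print(mean_revenue)
print(budget_revenue_corr)
```

The resulting Series is indexed by year, so it can be handed directly to a plotting library to draw a trend line.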
This project helped me, and should help you, practice detailed data analysis. It should also improve your understanding of Python, pandas, NumPy, and data visualization techniques. I have uploaded the dataset to this repo along with all the code, which includes comments and detailed descriptions.
More details about the project can be found here: https://wahabaftab.com/Investigate-a-TMDB-Dataset/