This project cleans a dataset of movies and TV shows and performs basic analysis on it.
Data source: Movies Dataset from Kaggle: https://www.kaggle.com/datasets/bharatnatrayn/movies-dataset-for-feature-extracion-prediction?select=movies.csv
Tools: MS SQL Server
- Clean the movies dataset to remove inconsistencies, errors, and missing values.
- Standardize data formats and representations for uniformity.
- Perform basic exploratory data analysis (EDA) to understand trends, patterns, and correlations within the dataset.
- Derive insights that can inform decision-making processes in the entertainment industry.
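
The cleaning and standardization goals above can be illustrated with a minimal sketch. The project itself uses MS SQL Server; this sketch instead runs SQL against an in-memory SQLite database through Python's standard `sqlite3` module so that it runs anywhere. The table names, column names, sample rows, and the choice to drop rows with a missing rating or vote count are all assumptions made for illustration, not the project's actual schema or rules.

```python
import sqlite3

# Hypothetical raw rows (title, year, genre, rating, votes).
# Names and values are illustrative assumptions, not real dataset rows.
RAW_ROWS = [
    ("  Example Show A ", "(2010-2015)", "\nDrama, Crime ", "8.5", "12,345"),
    ("Example Movie B", "(2019)", "\nComedy ", "7.0", "1,000"),
    ("Example Movie B", "(2019)", "\nComedy ", "7.0", "1,000"),  # exact duplicate
    ("Example Movie C", "(2021)", "Action", "6.1", "987"),
    ("Example Movie D", "(2020)", "Horror", None, None),  # missing values
]


def clean_movies(rows):
    """Load raw rows, then build a cleaned table and return its contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies_raw "
        "(title TEXT, year TEXT, genre TEXT, rating TEXT, votes TEXT)"
    )
    conn.executemany("INSERT INTO movies_raw VALUES (?, ?, ?, ?, ?)", rows)

    conn.execute(
        """
        CREATE TABLE movies_clean AS
        SELECT DISTINCT                                  -- drop exact duplicates
            TRIM(title)                               AS title,
            -- take the four digits after the opening parenthesis
            CAST(SUBSTR(TRIM(year), 2, 4) AS INTEGER) AS release_year,
            -- remove newlines, then surrounding spaces
            TRIM(REPLACE(genre, char(10), ''))        AS genre,
            CAST(rating AS REAL)                      AS rating,
            -- strip thousands separators before casting
            CAST(REPLACE(votes, ',', '') AS INTEGER)  AS votes
        FROM movies_raw
        WHERE rating IS NOT NULL AND votes IS NOT NULL   -- assumed policy
        """
    )
    return conn.execute(
        "SELECT title, release_year, genre, rating, votes "
        "FROM movies_clean ORDER BY title"
    ).fetchall()


if __name__ == "__main__":
    for row in clean_movies(RAW_ROWS):
        print(row)
```

The same steps should translate to T-SQL with small syntax changes, for example `SUBSTRING` in place of `SUBSTR` and `CHAR(10)` for the newline character. Whether rows with missing values are dropped or imputed is a judgment call that depends on the analysis.
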
- How has the number of movies and TV shows released each year changed over time?
- What are the trends in ratings and votes over the years?
- Which genres are the most popular among viewers?
- How does the popularity of different genres vary by year?
- Which movies or TV shows have the highest ratings and votes?
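
The questions above map onto simple aggregate queries. The following is a rough sketch, again using Python's `sqlite3` with an in-memory database rather than MS SQL Server. The `movies_clean` table, its columns, the sample rows, and the use of total votes as a proxy for genre popularity are assumptions made for illustration.

```python
import sqlite3
from collections import Counter

# Hypothetical cleaned rows (title, release_year, genre, rating, votes).
CLEAN_ROWS = [
    ("Example Show A", 2019, "Drama, Crime", 8.5, 12000),
    ("Example Movie B", 2019, "Comedy", 7.0, 1000),
    ("Example Movie C", 2021, "Action, Crime", 6.0, 3000),
]


def load(rows):
    """Create an in-memory table holding the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies_clean "
        "(title TEXT, release_year INTEGER, genre TEXT, rating REAL, votes INTEGER)"
    )
    conn.executemany("INSERT INTO movies_clean VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def yearly_summary(conn):
    """Titles released, average rating, and total votes per year."""
    return conn.execute(
        """
        SELECT release_year, COUNT(*), AVG(rating), SUM(votes)
        FROM movies_clean
        GROUP BY release_year
        ORDER BY release_year
        """
    ).fetchall()


def top_rated(conn, limit=2):
    """Highest-rated titles, with votes as the tie-breaker."""
    return conn.execute(
        "SELECT title, rating, votes FROM movies_clean "
        "ORDER BY rating DESC, votes DESC LIMIT ?",
        (limit,),
    ).fetchall()


def votes_by_genre(conn):
    """Total votes per genre; a title with several genres counts toward each."""
    totals = Counter()
    for genre, votes in conn.execute("SELECT genre, votes FROM movies_clean"):
        for name in genre.split(","):
            totals[name.strip()] += votes
    return totals.most_common()


if __name__ == "__main__":
    conn = load(CLEAN_ROWS)
    print(yearly_summary(conn))
    print(top_rated(conn))
    print(votes_by_genre(conn))
```

In T-SQL, `LIMIT` would be written as `TOP`, and the genre split could likely be done in the database with `STRING_SPLIT` (available from SQL Server 2016) instead of in Python. Adding `release_year` to the genre grouping would give the per-year genre breakdown.
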
- With the foundation laid by this project, numerous additional questions can be explored in the future.
- Interactive dashboards or visualization tools can be developed to view insights and to interact with the data.
- By cleaning the data and addressing inconsistencies and inaccuracies, this project sets the stage for further analysis and for reliable insights.