Field Name | Description | Type |
---|---|---|
Title | Movie title. | object |
Overview | Summary of the movie's plot. | object |
Release Date | Release date of the movie. | object |
Vote Average | Average rating given by users. | float |
Vote Count | Number of votes received by the movie. | int |
Runtime | Duration of the movie in minutes. | int |
Budget | Budget allocated for the movie. | int |
Revenue | Revenue generated by the movie. | int |
Popularity | Popularity score of the movie. | float |
Production Countries | Countries where the movie was produced. | object |
Production Companies | Production companies involved in making the movie. | object |
Genres | Genres of the movie (e.g., Action, Drama, Science Fiction). | object |
- The data is collected from themoviedb's API.
- API documentation: https://developer.themoviedb.org/reference/intro/getting-started
- The dataset covers up to 500 US-affiliated movies per year from 2010 to 2022, chosen as the movies with the highest vote counts on themoviedb for each year.
- Collect movie data from themoviedb's API endpoint
- Perform data cleaning and feature engineering on the data
- Create an interactive multi-page PowerBI dashboard to visualize the data
- Refer to data_collection_api.ipynb for the detailed steps
- The script authenticates with an API key that you provide. (You can get an API key for free after registering an account at themoviedb.)
- It retrieves movie details, including title, overview, release date, vote average, vote count, runtime, budget, revenue, popularity, production countries, production companies, and genres.
- Data is collected based on vote counts, sorted in descending order.
- The script ensures a maximum of 500 movies per year and writes the information to a CSV file.
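The collection loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in data_collection_api.ipynb: the helper names (`discover_params`, `fetch_page`, `collect_year`) are hypothetical, and the sketch assumes themoviedb's v3 `discover/movie` endpoint with its `sort_by`, `primary_release_year`, and `page` query parameters. It leaves out the US-affiliation filter and the per-movie detail lookups needed for fields such as budget and revenue.

```python
from typing import Callable

# Assumption: the v3 "discover" endpoint is used to list movies.
TMDB_DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"


def discover_params(api_key: str, year: int, page: int) -> dict:
    """Query parameters for one page of a year's movies, most-voted first."""
    return {
        "api_key": api_key,
        "primary_release_year": year,
        "sort_by": "vote_count.desc",
        "page": page,
    }


def fetch_page(api_key: str, year: int, page: int) -> dict:
    """Fetch one page of results from themoviedb and return the decoded JSON."""
    # Imported here so the pure helpers in this sketch work without requests.
    import requests

    response = requests.get(
        TMDB_DISCOVER_URL,
        params=discover_params(api_key, year, page),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def collect_year(year: int, get_page: Callable[[int, int], dict],
                 limit: int = 500) -> list:
    """Collect at most `limit` movies for `year`, walking pages in order."""
    movies = []
    page = 1
    while len(movies) < limit:
        payload = get_page(year, page)
        results = payload.get("results", [])
        if not results:
            break  # nothing more to read
        movies.extend(results)
        if page >= payload.get("total_pages", page):
            break  # last page reached
        page += 1
    return movies[:limit]
```

With a real key, one year could be collected with `collect_year(2010, lambda y, p: fetch_page(api_key, y, p))`. Passing the page fetcher in as an argument keeps the 500-movie cap easy to check without network access.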
- Replace 'API KEY' with your actual themoviedb API key:
api_key = 'API KEY'
- Set the desired start and end years for data collection.
- The script writes the collected data to a CSV file.
- csv: Module for reading and writing CSV files.
- requests: Module for sending HTTP requests.
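As a rough illustration of the CSV step, the sketch below writes movie records with the standard `csv` module. The column names in `FIELDNAMES` and the `write_movies` helper are assumptions made for this example; the notebook may name or order its columns differently.

```python
import csv
import io

# Assumed column names, one per field listed in the data dictionary.
FIELDNAMES = [
    "title", "overview", "release_date", "vote_average", "vote_count",
    "runtime", "budget", "revenue", "popularity",
    "production_countries", "production_companies", "genres",
]


def write_movies(rows, handle) -> None:
    """Write movie dicts to an open text handle as CSV with a header row."""
    # extrasaction="ignore" drops keys that are not in FIELDNAMES;
    # missing keys are written as empty strings.
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Example: write a single record to an in-memory buffer.
buffer = io.StringIO()
write_movies([{"title": "Inception", "vote_count": 100, "id": 1}], buffer)
```

To write to disk instead, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass the handle to `write_movies`; `newline=""` is what the `csv` module expects for file handles.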
- themoviedb also offers data for TV shows. Read the documentation if that interests you.
- Refer to feature_engineering.ipynb for the detailed steps
- Remove rows with values of 0
- Create new columns: profits, ROI, year, month, day
- Convert columns to the appropriate data types
- One-hot-encode the genres column. (This is used later in the PowerBI dashboard to build a slicer that filters by movie genre, so users can explore and analyze the dataset by different combinations of genres.)
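The steps above might look roughly like this in pandas. This is a sketch under stated assumptions, not the code in feature_engineering.ipynb: the column names are hypothetical, zeros are only checked in `budget` and `revenue`, profit is taken as revenue minus budget, ROI as profit divided by budget, and `genres` is assumed to be a comma-separated string.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw movie table and add derived columns."""
    # Assumption: a zero budget or revenue means "unknown", so drop those rows.
    out = df[(df["budget"] != 0) & (df["revenue"] != 0)].copy()

    # Convert the release date from text to a datetime column.
    out["release_date"] = pd.to_datetime(out["release_date"])

    # Derived columns (assumed definitions of profit and ROI).
    out["profit"] = out["revenue"] - out["budget"]
    out["roi"] = out["profit"] / out["budget"]
    out["year"] = out["release_date"].dt.year
    out["month"] = out["release_date"].dt.month
    out["day"] = out["release_date"].dt.day

    # One-hot-encode genres: one 0/1 column per genre name.
    genre_flags = out["genres"].str.get_dummies(sep=", ")
    return pd.concat([out, genre_flags], axis=1)
```

The 0/1 genre columns are what a PowerBI slicer can filter on, since a movie with several genres gets a 1 in each matching column.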
- Load the cleaned.csv file into PowerBI and create the dashboard
- Main Page: A comprehensive overview of key metrics and trends.
- Graphs: Visual representations of data trends, enabling in-depth analysis.
- Movie Details: Dive into specifics with detailed information about each movie.
- Key Influencers: Discover the factors influencing key aspects.
- Sidebar Navigation: A user-friendly sidebar with buttons for easy page navigation.
- Bookmarks: Seamlessly toggle between different states, such as showing or hiding the sidebar.
- Field Parameters: Effortlessly filter visuals based on different fields, such as top movies by budget, popularity, and more.
- Simple Slicers: Easily filter movies by release date and genres.
- Drillthrough feature: Added Drillthrough capability for the movie details
- Key influencers analysis: Added key influencers tools to analyze factors that have the most impact on a particular outcome.
- Right-click any movie title in the table to navigate to the drillthrough page.
To explore the dashboard, download and open the PowerBI Desktop app, then load the movie_viz.pbix file.
- IMPORTANT NOTE: You need PowerBI Desktop installed to open the pbix file.
- You can download the pbix file from this repository.