Video Games Analysis

Git Repository for Project Group 4

Link to Data

Link to dataset: https://www.kaggle.com/datasets/thedevastator/discovering-hidden-trends-in-global-video-games

Link to data: https://sbcharitybucket.s3.us-west-2.amazonaws.com/Video_Games.csv

Presentation

Presentation

List of Files

Deliverable 1: Planning our Analysis

Project Overview & Selected Topic

Can we predict global revenues within the first year of a new game's release? Discover the key features for revenue predictions.
Is there a direct connection between score, rating, & sales?

Dataset Description & Why We Selected It

Our dataset covers video games launched from 1980 to 2020. Columns include game features such as genre, launch year, publisher, sales by market, and critic and user scores, which allow us to understand the gaming industry and popular games over the years.
We selected this data because we are interested in the gaming industry and want to find out which factors make games stand out from more than 15k competitors, judged by their revenue and scores.

List of Technologies Used

Python
SQL
Tableau
R
PgAdmin

ERD

The dataset has been divided into three tables: Games, Sales and Ratings. ERD: https://github.com/bbinvt/project-group-4/blob/main/Database/ERD.PNG

Cleaning the Data Set

Drop null values
Drop columns: Critic Count, User Count, Developer, Rating (alternatively, try filling in null values with the column averages)
Normalize critic score and revenue
Weighted average of revenues per year for direct comparisons between games, i.e. 80% of revenue comes within the first year
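
The first three cleaning steps could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's notebook code: the sample rows, the underscore-style column names (`Critic_Score`, `Global_Sales`, and so on), and the choice of min-max scaling for "normalize" are all assumptions. It also drops the unwanted columns before dropping rows with nulls, so that a null in a discarded column does not cost a row.

```python
# Hypothetical sample rows; real column names in Video_Games.csv may differ.
rows = [
    {"Name": "Game A", "Critic_Score": 80.0, "Global_Sales": 2.0, "Developer": "X"},
    {"Name": "Game B", "Critic_Score": None, "Global_Sales": 1.0, "Developer": "Y"},
    {"Name": "Game C", "Critic_Score": 60.0, "Global_Sales": 4.0, "Developer": None},
]

# Columns the plan says to drop (names assumed to use underscores).
DROP_COLUMNS = {"Critic_Count", "User_Count", "Developer", "Rating"}


def clean(rows):
    """Remove the unwanted columns, then drop any row that still has a null."""
    kept = [{k: v for k, v in r.items() if k not in DROP_COLUMNS} for r in rows]
    return [r for r in kept if all(v is not None for v in r.values())]


def min_max(rows, column):
    """Rescale one numeric column to the range [0, 1] (assumed normalization)."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]


cleaned = clean(rows)
normalized = min_max(min_max(cleaned, "Critic_Score"), "Global_Sales")
```

Here Game B is removed because its critic score is missing, while Game C survives because its only null sat in a dropped column.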

Proposed ML Model:

We are designing a model to predict video game revenues by geography and, ultimately, globally. First, we examine the statistical relationships among the dataset's features to determine their relevance. Then we will employ a linear regression model to predict video game revenues. We will examine different methods to improve the accuracy of our model: altering label encoding, dropping variables, and changing how data points are grouped.
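
To make the linear regression step concrete, here is a small sketch of an ordinary least squares fit with a single feature, written with the standard library only. The numbers are made up for illustration, the names `fit_line`, `critic_scores`, and `global_sales` are hypothetical, and the actual project model uses several features rather than one.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustration data: critic score vs. global sales (millions of units).
critic_scores = [60.0, 70.0, 80.0, 90.0]
global_sales = [1.0, 1.5, 2.0, 2.5]

slope, intercept = fit_line(critic_scores, global_sales)
predicted = slope * 85.0 + intercept  # predicted sales for a critic score of 85
```

With several features, the same idea generalizes to a multiple regression, which a library such as scikit-learn would normally handle.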

Proposed Visualization Analysis

Best selling genre by market
User/critic score by games/genres
Sales by genre by region

Deliverable 2: Building and Assembling the Pieces

Important Candidate Features

Through R, we saw that the statistically important features are as follows:

Years_on_Market
Critic_Score
Genre
User_Score
Publisher
Rating

Preliminary Visualizations

Presentation Structure

Project Overview

Dataset overview (where we started)
Why we selected this topic

Interesting Highlights

Bestselling game globally and its platform
Top user score game globally and its platform
Top critic score game globally and its platform
Top genre by each market

Questions Answered

Can we predict global revenues within the first year of a new game's release? Discover the key features for revenue predictions.
Is there a direct connection between score, rating, & sales?

Methodology

Tools we use
Models we use
How we improved the model

Results

Prediction results
Accuracy score
Visuals from both Python and Tableau

Deliverable 3: Put it All Together

Results

Linear regression models were used to predict global sales. They performed quite poorly in general: the R² value (the share of variance that could be explained by the features) ranged from 6.3% to 17.1%, depending on the features and the type of encoding and binning used.
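
For readers unfamiliar with the metric, R² compares the model's squared errors with those of a baseline that always predicts the mean. The sketch below computes it by hand on made-up numbers; the function name `r_squared` and the sample values are illustrative only and are not taken from the project's results.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot, the share of variance explained."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

score = r_squared(actual, predicted)
baseline = r_squared(actual, [2.5] * 4)  # always predicting the mean
```

In this toy case `score` comes out at about 0.9, and the mean-only baseline scores 0, which is the reference point against which values such as 6.3% or 17.1% should be read.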

Using XGBoost to predict global sales has so far provided far better results. The best result so far is an R² value of 0.402, i.e. 40.2% of the variance can be explained by the features. For that model, the features used to predict global sales were Years_On_Market, Critic_Score, User_Score, Genre, Rating, Publisher, and Platform (with the last four features being one-hot encoded).
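
One-hot encoding turns a categorical column into one 0/1 column per category, so that a model can use it as numeric input. The following is a minimal standard-library sketch of the idea; the helper `one_hot` and the sample rows are hypothetical, and in practice a library routine (for example in pandas or scikit-learn) would usually do this.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        # Copy every field except the categorical one being encoded.
        out = {k: v for k, v in r.items() if k != column}
        for c in categories:
            out[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(out)
    return encoded


# Made-up rows for illustration.
games = [
    {"Genre": "Action", "Critic_Score": 80.0},
    {"Genre": "Sports", "Critic_Score": 70.0},
    {"Genre": "Action", "Critic_Score": 65.0},
]

encoded = one_hot(games, "Genre")
```

Each row keeps its numeric fields and gains `Genre_Action` and `Genre_Sports` indicator columns in place of the original `Genre` value.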

By creating models to predict sales for specific regions and game genres, we were able to build high-performing models. The best of these was the model predicting sales of action games in North America (R² of 82.2%).

Dashboard & Expansion of Visualizations

Dashboard
