bbinvt/project-group-4

Video Games Analysis

Git Repository for Project Group 4

Link to Data

Link to dataset: https://www.kaggle.com/datasets/thedevastator/discovering-hidden-trends-in-global-video-games

Link to data: https://sbcharitybucket.s3.us-west-2.amazonaws.com/Video_Games.csv

Presentation

List of Files

Deliverable 1: Planning our Analysis

Project Overview & Selected Topic

  • Can we predict global revenues within the first year of a new game's release? Which features are key to revenue predictions?
  • Is there a direct connection between score, rating, & sales?

Dataset Description & Why We Selected It

  • Our dataset covers video games launched from 1980 to 2020. Columns include game features such as genre, launch year, publisher, sales by market, and critic and user scores, which will allow us to understand the gaming industry and popular games over the years.
  • We selected this dataset because we are interested in the gaming industry and want to find out which factors make games stand out from more than 15,000 competitors, as measured by revenue and scores.

List of Technologies Used

  • Python
  • SQL
  • Tableau
  • R
  • PgAdmin

ERD

The dataset has been divided into three tables: Games, Sales and Ratings. ERD: https://github.com/bbinvt/project-group-4/blob/main/Database/ERD.PNG

Cleaning the Data Set

  • Drop null values
  • Drop columns: Critic Count, User Count, Developer, Rating (alternatively, try filling null values with the column average)
  • Normalize critic score and revenue
  • Weighted average of revenues per year for direct comparison between games (e.g. 80% of revenue comes within the first year)
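
The first three cleaning steps can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the column names (Critic_Count, User_Count, Global_Sales and so on) are assumed, and min-max scaling is only one possible reading of "normalize".

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, not taken from the project code.
raw = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Critic_Score": [80.0, 60.0, None, 90.0],
    "User_Score": [8.1, 6.5, 7.0, 9.0],
    "Global_Sales": [10.0, 2.0, 1.0, 6.0],
    "Critic_Count": [50, 20, 5, 40],
    "User_Count": [300, 100, 10, 250],
    "Developer": ["Dev A", "Dev B", None, "Dev D"],
    "Rating": ["E", "T", None, "E"],
})


def clean(df):
    """Drop unused columns and rows with nulls, then min-max scale two columns."""
    out = df.drop(columns=["Critic_Count", "User_Count", "Developer", "Rating"])
    out = out.dropna().reset_index(drop=True)
    for col in ["Critic_Score", "Global_Sales"]:
        lo, hi = out[col].min(), out[col].max()
        # Min-max scaling to [0, 1]; an assumed choice of normalization.
        out[col + "_norm"] = (out[col] - lo) / (hi - lo)
    return out


cleaned = clean(raw)
print(cleaned)
```

Dropping the four columns before calling dropna keeps rows whose only missing values sit in columns that are discarded anyway.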

Proposed ML Model:

We are designing a model to predict video game revenues by geography and, ultimately, globally. First, we examine the statistical relationships among the dataset's features to determine their relevance. Then we will employ a linear regression model to predict video game revenues. We will explore ways to improve the model's accuracy by altering the label encoding, dropping variables, and changing how data points are grouped.
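
As a rough sketch of the linear regression step, the example below one-hot encodes a categorical feature and fits ordinary least squares with NumPy's solver. The data is invented, the column names are assumptions, and the solver stands in for whichever library the project actually used. The R² it returns is computed on the training rows, so a real evaluation would use held-out data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values and column names are illustrative only.
games = pd.DataFrame({
    "Critic_Score": [80, 60, 75, 90, 55, 70],
    "Years_on_Market": [5, 10, 3, 8, 12, 6],
    "Genre": ["Action", "Sports", "Action", "Puzzle", "Sports", "Puzzle"],
    "Global_Sales": [10.0, 2.0, 4.0, 6.0, 1.5, 3.0],
})


def fit_linear(df, target):
    """Fit ordinary least squares on one-hot encoded features.

    Returns a dict of coefficients and the in-sample R^2.
    """
    # One-hot encode categorical columns; drop_first avoids a redundant column.
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True).astype(float)
    y = df[target].to_numpy(dtype=float)
    # Prepend a column of ones so the model has an intercept.
    A = np.column_stack([np.ones(len(X)), X.to_numpy()])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return dict(zip(["intercept", *X.columns], coef)), 1.0 - ss_res / ss_tot


coefs, r2 = fit_linear(games, "Global_Sales")
print(coefs)
print("in-sample R^2:", round(r2, 3))
```

Changing the encoding or dropping a column only requires editing the DataFrame passed to fit_linear, which makes it easy to compare variants.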

Proposed Visualization Analysis

  • Best-selling genre by market
  • User/critic score by games/genres
  • Sales by genre by region

Deliverable 2: Building and Assembling the Pieces

Important Candidate Features

Using R, we found the following features to be statistically important:

  1. Years_on_Market
  2. Critic_Score
  3. Genre
  4. User_Score
  5. Publisher
  6. Rating

Preliminary Visualizations

Presentation Structure

Project Overview

  • Dataset overview (where we started)
  • Why we selected this topic

Interesting Highlights

  • Best-selling game globally and its platform
  • Top user score game globally and its platform
  • Top critic score game globally and its platform
  • Top genre by each market

Questions Answered

  • Can we predict global revenues within the first year of a new game's release? Which features are key to revenue predictions?
  • Is there a direct connection between score, rating, & sales?

Methodology

  • Tools we use
  • Models we use
  • How we improved the model

Results

  • Prediction results
  • Accuracy score
  • Visuals from both Python and Tableau

Deliverable 3: Put it All Together

Results

Linear regression models were used to predict global sales. They performed poorly overall: the R² value (the share of variance explained by the features) ranged from 6.3% to 17.1%, depending on the features, encoding, and binning used.

Using XGBoost to predict global sales has so far given far better results. The best result so far is an R² of 0.402, i.e. 40.2% of the variance can be explained by the features. That model used Years_On_Market, Critic_Score, User_Score, Genre, Rating, Publisher, and Platform as features, with the last four one-hot encoded.

By building separate models for specific regions and genres, we obtained much better-performing models. The best was the model predicting sales of action games in North America (R² of 82.2%).
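
The idea of fitting one model per segment can be sketched as below. This is a simplified illustration under stated assumptions: the rows are invented, NA_Sales is an assumed name for the North American sales column, and a single-feature least-squares line stands in for the project's actual models.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; NA_Sales as the regional target column is an assumed name.
games = pd.DataFrame({
    "Genre": ["Action"] * 4 + ["Sports"] * 4,
    "Critic_Score": [60, 70, 80, 90, 60, 70, 80, 90],
    "NA_Sales": [1.0, 2.0, 3.0, 4.0, 2.0, 1.0, 2.5, 1.5],
})


def r2_by_segment(df, segment_col, feature, target):
    """Fit one least-squares line per segment and return each in-sample R^2."""
    scores = {}
    for name, group in df.groupby(segment_col):
        x = group[feature].to_numpy(dtype=float)
        y = group[target].to_numpy(dtype=float)
        slope, intercept = np.polyfit(x, y, deg=1)
        pred = slope * x + intercept
        ss_res = np.sum((y - pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        scores[name] = float(1.0 - ss_res / ss_tot)
    return scores


scores = r2_by_segment(games, "Genre", "Critic_Score", "NA_Sales")
for genre, score in scores.items():
    print(genre, round(score, 3))
```

In this toy data the feature tracks sales closely in one genre and hardly at all in the other, which is the kind of difference a single pooled model would average away.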

Dashboard & Expansion of Visualizations
