Here is the Medium article that summarises the most important results: https://medium.com/@anudeepvanjavakam/what-i-learned-from-analyzing-30k-video-game-ratings-df17ce4aa38b
IGDB.com is a video game database (acquired by Amazon-owned Twitch) intended for both game consumers and video game professionals. IGDB stands for Internet Game Database and consists of 2 primary parts, one of which is a website and 3 mobile apps that provide game consumers with structured, up-to-date information about games and the gaming world.
This dataset of more than 200K games was collected from the IGDB API using igdb-api-v4 for Python, but only a little over 30K of those games have ratings. This notebook explores common trends among games that have ratings from IGDB and external critics.
Most importantly, we will try to answer these questions:
- What is the proportion of games on each gaming platform?
- How do game ratings look across platforms, genres, themes, age ratings and player perspectives, and why?
- Do multiplayer games get higher ratings than single-player games regardless of platform, genre, theme, age rating and player perspective?
- Which gaming platform sees the most rating success in each genre?
- Which groups contribute most to great games (games with a rating higher than 75)?
- Project Title
- Demo-Preview
- Table of contents
- Installation
- File Descriptions
- Licensing and Acknowledgements
- Results
To use this project, first clone the repo on your device using the command below:
git clone https://github.com/anudeepvanjavakam1/video_games_rating_analysis.git
This was developed using Python 3.10.4 and the following libraries: numpy, pandas, empiricaldist, matplotlib, seaborn, missingno, plotly, tqdm, colorama.
- exploring_igdb_games_decluttered.ipynb -- decluttered Jupyter notebook with fewer code cells and visualizations.
- exploring_igdb_games_data.ipynb -- Jupyter notebook that includes extra code and visualization content.
This project is licensed under the MIT License (Open Source Initiative).
Some of the plot ideas/code are adapted from:
- https://github.com/miykael/miykael.github.io/blob/master/assets/nb/03_advanced_eda/nb_advanced_eda.ipynb
- https://towardsdatascience.com/my-6-part-powerful-eda-template-that-speaks-of-ultimate-skill-6bdde3c91431
Key Takeaways:
- Most of the data was either missing or structured in a way that could not be easily analyzed. The data was pre-processed and simplified under some assumptions.
- The majority of ratings fall between 50 and 80, peaking at 70.
- Proportion of games in the dataset: Windows ~28%, Mac ~9%, PS4 ~7%, Linux ~6.5%, Xbox One ~6%, Nintendo Switch ~6%. Every other platform has less than 6%.
- The large number of unique values in each categorical feature makes it harder to gain meaningful insights, so most of the analysis considers only the top 10 most frequently occurring values.
- With great moderation comes great quality! iOS had a better rating distribution than other platforms.
- 7th-generation consoles (PS3 and Xbox 360) had better ratings than their next-gen counterparts. This might be because of a bigger library of great games with backward compatibility and a longer time on the market.
- Linux and Mac have better ratings than Windows, BUT only successful or well-known franchise games tend to be made compatible with Linux and Mac, which makes their ratings look better. They therefore have an obvious advantage over Windows, which has a plethora of both bad and good games.
- The number of follows for a game doesn't affect its rating, but it does affect how many people rate the game.
- Multiplayer games received higher ratings than single-player games. This is expected, as these are the games that receive (at least in the majority of cases) bigger budgets, better development teams, and relatively lenient schedules. These are the games that publishers want to be the biggest of the year, and they do everything they can up to launch to ensure those results. They are more profitable (in-game purchases) and face a higher bar, catering to thousands if not millions of gamers, whereas most single-player games are a one-time purchase and fall a little short in generating recurring profits.
- Although they make up less than 5% of the games, Mature-rated games received higher ratings than Teen- or Everyone-rated games. Mature games tend to take the awards as well. They have more freedom in storytelling, character and level design, and gameplay, and pay more attention to detail. This gives them a better chance to deliver impact and a memorable experience to a gamer.
- Platform with the highest median rating in each genre:
  - Simulator: Linux followed by Mac and Xbox 360.
  - Point and Click: PS3 followed by Xbox 360. Windows and Linux have the worst ratings. I wasn't expecting this at all for point and click games.
  - Adventure: Linux followed by PS4. Linux has unusually high ratings here, but that may be because (most of the time) only successful Windows games are made compatible with Linux.
  - Shooter: Android followed by Mac. It is surprising that Android has higher ratings than other platforms, since those platforms offer a better shooter experience. Windows has the lowest median rating.
  - Platform: PS3 followed by Nintendo Switch.
  - RPG: PS3 followed by Xbox 360.
  - Fighting: PS3 followed by Xbox 360.
  - Puzzle: iOS followed by Android.
  - Strategy: Android.
  - Racing: PS3 followed by Mac.
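The per-genre ranking above boils down to grouping ratings by (genre, platform), taking the median of each group, and sorting platforms within each genre. The block below is a minimal, dependency-free sketch of that idea. The rows, the function name `rank_platforms_by_genre`, and the numbers are made up for illustration and are not taken from the notebooks, which may compute this differently (for example with pandas).

```python
from collections import defaultdict
from statistics import median

# Hypothetical (genre, platform, rating) rows; values are illustrative only.
rows = [
    ("Puzzle", "iOS", 80.0),
    ("Puzzle", "iOS", 70.0),
    ("Puzzle", "Android", 72.0),
    ("Puzzle", "Windows", 60.0),
    ("Racing", "PS3", 78.0),
    ("Racing", "Mac", 74.0),
    ("Racing", "Windows", 65.0),
]


def rank_platforms_by_genre(rows):
    """Return {genre: [(platform, median_rating), ...]}, best platform first."""
    # Collect every rating observed for each (genre, platform) pair.
    ratings = defaultdict(list)
    for genre, platform, rating in rows:
        ratings[(genre, platform)].append(rating)

    # Reduce each pair to its median rating and group the results by genre.
    ranking = defaultdict(list)
    for (genre, platform), values in ratings.items():
        ranking[genre].append((platform, median(values)))

    # Sort platforms within each genre from highest to lowest median.
    for genre in ranking:
        ranking[genre].sort(key=lambda item: item[1], reverse=True)
    return dict(ranking)


ranking = rank_platforms_by_genre(rows)
for genre, platforms in ranking.items():
    print(genre, platforms)
```

The median is used rather than the mean because it is less sensitive to a handful of extreme ratings on platforms with few games.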
Group | % of games that are great, given the category group |
---|---|
Mature | 49%. In other words, within the age rating category, 49% of the games in the Mature group are great. |
Mac | 44%. This is only 4% higher than PC Windows and 2% higher than other platforms. Maybe platform doesn't matter? |
Fantasy | 48% |
RPG | 46% |
Text View | 49% (there are very few text games, but their ratings seem to be high). If we only consider popular views, then first person and side view have the most great games (43%). |
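Each percentage in the table is a conditional share: of the games that belong to a group, what fraction has a rating higher than 75. The following sketch shows that calculation on invented data. The function name `great_share_by_group` and the sample ratings are assumptions for illustration; only the threshold of 75 comes from this README.

```python
from collections import defaultdict

GREAT_THRESHOLD = 75  # README definition: a great game has a rating higher than 75

# Hypothetical (group, rating) pairs; values are illustrative only.
games = [
    ("Mature", 82.0),
    ("Mature", 70.0),
    ("Teen", 76.0),
    ("Teen", 60.0),
    ("Teen", 74.0),
    ("Everyone", 75.0),
]


def great_share_by_group(games, threshold=GREAT_THRESHOLD):
    """Return {group: percentage of games in that group rated above threshold}."""
    totals = defaultdict(int)
    greats = defaultdict(int)
    for group, rating in games:
        totals[group] += 1
        if rating > threshold:  # strictly higher, so exactly 75 does not count
            greats[group] += 1
    return {group: 100.0 * greats[group] / totals[group] for group in totals}


shares = great_share_by_group(games)
print(shares)
```

Groups with very few games (such as Text View in the table) can show high percentages from just a handful of titles, so it may help to report the group size alongside the share.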
- Last but not least, we should accept that the results here rest on several assumptions (for example, that genres and themes are tagged correctly for games). One of them is that this dataset is a representative sample of the games released worldwide and is not already biased toward any particular group. For example, if only well-known PS3 games had been added to the IGDB database, our results would be invalid.
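Two of the takeaways above, the platform proportions and the restriction to the top 10 most frequent values, can be illustrated with simple counting. The sketch below assumes each game carries a list of platforms and that shares are computed over game-platform pairs. Both are assumptions, as are the names `platform_shares` and `keep_top_n` and the sample data; the notebooks may define the proportions differently.

```python
from collections import Counter

# Hypothetical games; a game can be listed on several platforms.
games = [
    {"name": "A", "platforms": ["Windows", "Mac", "Linux"]},
    {"name": "B", "platforms": ["Windows"]},
    {"name": "C", "platforms": ["PS4", "Windows"]},
    {"name": "D", "platforms": ["Mac"]},
    {"name": "E", "platforms": []},  # missing platform data contributes nothing
]


def platform_shares(games):
    """Return {platform: percentage of all game-platform pairs}, largest first."""
    counts = Counter(p for game in games for p in game["platforms"])
    total = sum(counts.values())
    return {p: 100.0 * n / total for p, n in counts.most_common()}


def keep_top_n(values, n, other="Other"):
    """Keep the n most frequent values and replace everything else with `other`."""
    top = {value for value, _ in Counter(values).most_common(n)}
    return [value if value in top else other for value in values]


shares = platform_shares(games)
flat = [p for game in games for p in game["platforms"]]
reduced = keep_top_n(flat, n=2)
print(shares)
print(reduced)
```

Collapsing rare values into a single bucket keeps plots readable, at the cost of hiding any pattern among the less common categories.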
Expansion Ideas
- Publishers and Developers data can be analyzed to see how they do in terms of rating.
- If there were separate ratings for gameplay, controls, level design, challenge, characters, etc., they would give more insight into the role each one plays in making a game great.
- It would be interesting to see to what extent a higher rating influences game sales.
- Do games that have no similar games associated with them receive higher ratings than games that have many similar games?
- It would be interesting to see whether games with associated websites have higher hype or ratings.
- Assign more weight to a rating based on how many people have rated the game.
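For the last idea, one common way to weight a rating by its number of raters is a Bayesian (shrinkage) average, which pulls ratings backed by few votes toward a global mean. This is only a sketch of one possible approach, not something the notebooks implement. The function name `weighted_rating`, the prior mean of 70 and the `min_votes` value of 50 are all illustrative assumptions.

```python
def weighted_rating(rating, votes, prior_mean, min_votes):
    """Shrink a game's rating toward prior_mean when it has few votes.

    rating     -- the game's average rating
    votes      -- how many people rated the game
    prior_mean -- global mean rating used as the fallback (assumed value)
    min_votes  -- pseudo-vote count; must be > 0 (assumed tuning parameter)
    """
    return (votes * rating + min_votes * prior_mean) / (votes + min_votes)


# A game rated 90 by 5 people versus a game rated 85 by 1000 people.
few = weighted_rating(rating=90.0, votes=5, prior_mean=70.0, min_votes=50)
many = weighted_rating(rating=85.0, votes=1000, prior_mean=70.0, min_votes=50)
print(few, many)
```

With these inputs the heavily rated game ends up ranked above the sparsely rated one, even though its raw rating is lower. How strongly ratings are pulled toward the mean depends entirely on the choice of `min_votes`.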
Image by cromaconceptovisual from Pixabay