We obtained the Board Game Review dataset from ThaWeatherman's github repository(MIT License) and used Jupyter Notebook as the platform for the purpose of coding. Our methodology involves use of classification techniques like Linear Regression and Random Forest Regression.
Feature selection is finding the subset of original features by different approaches based on the information they provide, accuracy, prediction errors. The features used in the project are:
- yearpublished
- minplayers
- maxplayers
- playingtime
- minplaytime
- maxplaytime
- minage
- users_rated
- total_owners
- total_traders
- total_wanters
- total_wishers
- total_comments
- total_weights
- average_weight
- Linear Regression
- Random Forest Regressor
The data taken is from www.github.com/ThaWeatherman/scrapers/blob/master/boardgamegeek/games.csv
80% of above data is training and 20% is testing data.
- Anywhere between 1->10
To find the accuracy, Squared Error Function under sklearn library is used. The Squared Error found is:
- Linear Regression => 2.0788190326293243
- Random Forest Regressor => 1.4458560046071653
- source.ipynb(Jupyter Notebook-https://jupyter.org/)
- source.pdf(Just a pdf print of jupyter notebook)
- games.csv(File that conntains Train and Test Data)
In source.ipynb, data is visualized using Histograms and Heat Plots.