- Language used: Python
- Packages used: pandas, numpy, sklearn
- Data source: UCI Machine Learning Repository, Mushroom dataset: https://archive.ics.uci.edu/dataset/73/mushroom
- 8124 data points
- 23 columns (22 categorical features plus the class label)
-
Data cleaning:
- Convert categorical features to dummy variables
- Convert response to binary
- Split data into training and test groups manually (not using a package)
-
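
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny stand-in frame, the column names (`class`, `cap_shape`, `odor`), the coding of poisonous as 1, the 80/20 split, and the helper `manual_split` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the mushroom data (the real set has 8124 rows).
# Column names and category codes here are assumed for illustration.
df = pd.DataFrame({
    "class": ["p", "e", "e", "p", "e", "p", "e", "e", "p", "e"],
    "cap_shape": ["x", "x", "b", "x", "b", "f", "x", "f", "b", "x"],
    "odor": ["p", "a", "l", "p", "n", "f", "a", "n", "p", "l"],
})

# Convert the response to binary: 1 = poisonous, 0 = edible (assumed coding).
y = (df["class"] == "p").astype(int).to_numpy()

# Convert categorical features to dummy (one-hot) variables.
X = pd.get_dummies(df.drop(columns="class")).to_numpy(dtype=float)


def manual_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and slice them, without a package helper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X_train, X_test, y_train, y_test = manual_split(X, y)
print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible, so later model comparisons all see the same test rows.
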
Train models:
- Hyperparameters were tuned with GridSearchCV, which cross-validates each candidate setting
- Models trained: Random Forest, Support Vector Machine, XGBoost, Neural Network
-
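
A hedged sketch of the tuning loop, shown for two of the four model families only. The synthetic data from `make_classification`, the parameter grids, the 5-fold setting, and the F1 scoring choice are assumptions for illustration; the grids actually searched in this project are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the dummy-encoded training data (assumption).
X_train, y_train = make_classification(
    n_samples=200, n_features=10, random_state=0
)

# Hypothetical grids; the values are illustrative, not the project's.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
    "svm": (
        SVC(random_state=0),
        {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]},
    ),
}

best_models = {}
for name, (estimator, grid) in candidates.items():
    # GridSearchCV runs k-fold cross-validation for every grid point,
    # then refits the best setting on the full training set.
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    best_models[name] = search.best_estimator_
    print(name, search.best_params_, round(search.best_score_, 3))
```

The same loop would extend to an XGBoost or neural network estimator by adding an entry to `candidates`, provided the estimator follows the scikit-learn `fit`/`predict` interface.
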
Assess model performance on test data:
- Performance assessed with: accuracy, F1 score, and ROC AUC
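
The three metrics can be computed with scikit-learn as sketched below. The helper `evaluate`, the 0.5 decision threshold, and the six hand-written labels and scores are assumptions for illustration, not results from this project.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def evaluate(y_true, y_score, threshold=0.5):
    """Return accuracy, F1, and ROC AUC for binary labels and scores."""
    # Accuracy and F1 need hard labels; ROC AUC uses the raw scores.
    y_pred = [int(s >= threshold) for s in y_score]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
    }


# Toy test labels and predicted probabilities (made up for the example).
y_test = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]

metrics = evaluate(y_test, y_score)
print(metrics)
```

Passing scores rather than thresholded labels to `roc_auc_score` matters, since the AUC measures ranking quality across all thresholds.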