In this study, I propose a data mining approach to understand how different physicochemical properties affect wine quality and to predict wine taste preferences from analytical tests that are readily available at the certification step. Six machine learning techniques were applied under a computationally efficient procedure that performs simultaneous variable and model selection. The support vector machine achieved promising results, outperforming the other methods. Such a model is useful for supporting oenologists' wine tasting evaluations and for improving wine production.
Dataset: Kaggle https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
The data set contains 1599 red wine samples as rows and 12 columns: 11 physicochemical features plus the quality score.
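As a quick check of these counts, here is a minimal sketch of loading the data with pandas. It assumes the Kaggle CSV has been downloaded locally and that the target column is named `quality`; the helper name `load_wine` is hypothetical and not part of the original project.

```python
import pandas as pd


def load_wine(path):
    """Read the wine CSV and split it into features X and target y.

    Assumption: the target column is named "quality", as in the Kaggle file.
    """
    df = pd.read_csv(path)
    X = df.drop(columns=["quality"])  # physicochemical measurements
    y = df["quality"]                 # sensory quality score
    return X, y
```

With the file saved as, say, `winequality-red.csv`, calling `X, y = load_wine("winequality-red.csv")` should give 1599 rows in both `X` and `y`, with 11 feature columns in `X`.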
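The following six classifiers were compared:
- Logistic Regression classifier
- Random Forest classifier
- Decision Tree classifier
- Quadratic Discriminant Analysis (QDA) classifier
- K-Nearest Neighbors classifier
- Support Vector Machine classifier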
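The comparison of these six classifiers can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual code: the source does not say how the quality score was encoded into classes, which hyperparameters were used, or how the data was split, so the function below uses library defaults, a stratified 80/20 split, and whatever label vector `y` is passed in. The name `compare_classifiers` is hypothetical.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.2, seed=0):
    """Fit each classifier on a train split and return test accuracy per model."""
    # Assumption: a stratified hold-out split; the source does not specify one.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        # Scale-sensitive models get a StandardScaler in front.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "QDA": QuadraticDiscriminantAnalysis(),
        "K-Nearest Neighbors": make_pipeline(
            StandardScaler(), KNeighborsClassifier()
        ),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns the mean accuracy on the held-out data.
        scores[name] = model.score(X_test, y_test)
    return scores
```

Sorting the returned dictionary by value, for example `sorted(scores.items(), key=lambda kv: kv[1], reverse=True)`, gives a ranking of the models by test accuracy.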
Across all algorithms, SVM gave the best model for this dataset, followed by Random Forest. Here, accuracy is the fraction of test samples whose class the model predicts correctly. The regression analysis also yields a model that highlights the most significant attributes, such as alcohol and sulphates, which have the greatest effect on the quality of the wine.
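One simple way to see which attributes carry the most weight is to fit a linear regression on standardized features and rank the attributes by the size of their coefficients. The source does not state which regression method was used, so ordinary least squares is swapped in here purely for illustration; the helper name `rank_attributes` and its `names` parameter are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def rank_attributes(X, y, names):
    """Rank attributes by the absolute coefficient of a standardized linear fit."""
    # Standardize so that coefficients are comparable across attributes.
    X_scaled = StandardScaler().fit_transform(X)
    model = LinearRegression().fit(X_scaled, y)
    # Indices sorted by coefficient magnitude, largest first.
    order = np.argsort(-np.abs(model.coef_))
    return [(names[i], float(model.coef_[i])) for i in order]
```

If `X` is a pandas DataFrame of the features, `rank_attributes(X, y, list(X.columns))` returns `(attribute, coefficient)` pairs with the most influential attribute first; the sign of each coefficient indicates the direction of the effect.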
Take the decision tree as an example: