This project predicts the quality of red wine using several machine learning algorithms, supported by data analysis and visualization. The dataset comprises physicochemical and sensory variables for the red and white variants of the Portuguese "Vinho Verde" wine.
The datasets can be treated as either a classification or a regression task, with physicochemical measurements as inputs and a sensory score as the output. The classes are ordered and imbalanced (there are many more average wines than excellent or poor ones), which makes wine quality hard to predict accurately. Because of privacy and logistical constraints, only physicochemical and sensory variables are available; grape type, wine brand, and selling price are not included.
For detailed information, refer to the original publication by Cortez et al. (2009).
Input Variables (Physicochemical Tests):
- Fixed acidity
- Volatile acidity
- Citric acid
- Residual sugar
- Chlorides
- Free sulfur dioxide
- Total sulfur dioxide
- Density
- pH
- Sulphates
- Alcohol
Output Variable (Sensory Data):
- Quality (Score between 0 and 10)
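The variables listed above can be checked when the data is loaded. The following is a minimal sketch using only the Python standard library. It assumes a comma-separated file with a header row and lowercase column names; the column spellings, the helper `load_wine_rows`, and the two sample rows are hypothetical placeholders, not values taken from the dataset. Some distributions of the data may use a different delimiter, so it is exposed as a parameter.

```python
import csv
import io

# Assumed column names; adjust them to match the header of your copy of the file.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def load_wine_rows(handle, delimiter=","):
    """Return a list of (features, quality) pairs read from a CSV file handle.

    Raises ValueError if an expected column is absent or a value is not numeric.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in FEATURES + [TARGET] if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        features = [float(record[name]) for name in FEATURES]
        rows.append((features, int(record[TARGET])))
    return rows


# Made-up placeholder rows standing in for the real file.
SAMPLE = io.StringIO(
    ",".join(FEATURES + [TARGET]) + "\n"
    "7.0,0.5,0.1,2.0,0.08,15,40,0.997,3.3,0.6,10.0,5\n"
    "8.0,0.3,0.4,2.5,0.07,10,30,0.996,3.2,0.8,12.0,7\n"
)
rows = load_wine_rows(SAMPLE)
```

With a real file, the same function could be called as `load_wine_rows(open(path, newline=""))`, where `path` points to your local copy.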
Consider exploring classification tasks by setting a cutoff for wine quality, e.g., classifying scores of 7 or higher as 'good/1' and the rest as 'not good/0'. Experiment with hyperparameter tuning, decision tree algorithms, ROC curves, and AUC values.
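As a rough illustration of the cutoff idea, the sketch below converts quality scores into binary labels and computes the area under the ROC curve from a model's predicted scores. It uses only the standard library; the function names, the quality values, and the predicted scores are made up for the example. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of the hand-written version.

```python
def binarize_quality(scores, cutoff=7):
    """Label a wine 1 ('good') if its quality is at least `cutoff`, else 0."""
    return [1 if score >= cutoff else 0 for score in scores]


def roc_auc(labels, predicted):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example receives a
    higher predicted score than a randomly chosen negative one (ties count 0.5).
    """
    positives = [p for label, p in zip(labels, predicted) if label == 1]
    negatives = [p for label, p in zip(labels, predicted) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical quality scores and hypothetical model outputs, for illustration only.
quality = [5, 6, 7, 4, 8, 6]
labels = binarize_quality(quality)
model_scores = [0.2, 0.6, 0.7, 0.1, 0.4, 0.3]
auc = roc_auc(labels, model_scores)
```

Because the classes are imbalanced, AUC is usually a more informative summary than plain accuracy, which a model could inflate by always predicting 'not good'.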
The analysis proceeds through the following steps:
- Importing Libraries
- Loading Data
- Understanding Data
- Missing Values
- Exploring Variables (Data Analysis)
- Feature Selection
- Proportion of Good vs Bad Wines
- Preparing Data for Modeling
- Applying Different Models
- Choosing the Right Model
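Two of the steps above, checking the proportion of good versus bad wines and preparing the data for modeling, can be sketched as follows. This is a standard-library illustration under stated assumptions: the labels are made up, and `class_proportions` and `stratified_split` are hypothetical helpers, not the code used in this project. A library routine such as scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose.

```python
import random
from collections import Counter


def class_proportions(labels):
    """Return the share of each class label as a fraction of all examples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split example indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one example of every class in the test part.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Made-up imbalanced labels: 16 'not good' (0) and 4 'good' (1).
labels = [0] * 16 + [1] * 4
proportions = class_proportions(labels)
train_idx, test_idx = stratified_split(labels)
```

Stratifying matters here because the 'good' class is small; a purely random split could leave too few good wines in the test set to evaluate a model reliably.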
Utilize machine learning to identify physicochemical properties that contribute to a wine being classified as 'good'!
The dataset is also available from the UCI Machine Learning Repository. Please include the citation below if you use this dataset:
Citation: P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.