White Wine Quality Clustering

Project Overview

This project analyzes the chemical properties and sensory quality ratings of white wines produced in a specific region of Portugal. The objective is to explore how these properties relate to one another and to identify clusters of similar wines using partitioning clustering (k-means). Understanding how chemical properties relate to perceived quality can support more objective wine certification and quality assurance.

Dataset

The dataset used in this project (whitewine_v6.xls) consists of 2700 white wine samples. Each sample has been tested for 12 attributes, including 11 physicochemical properties and 1 sensory quality rating. The physicochemical properties are continuous variables, while the quality rating is an ordinal variable ranging from 1 (worst) to 10 (best).

Attributes

fixed acidity: Non-volatile acids in wine.
volatile acidity: Acetic acid content; high levels give the wine a vinegar taste.
citric acid: Adds freshness and flavor to wines.
residual sugar: Sugar remaining after fermentation.
chlorides: Salt content in the wine.
free sulfur dioxide: Prevents microbial growth and oxidation.
total sulfur dioxide: Total SO2 content.
density: Wine density, influenced by alcohol and sugar content.
pH: Acidity/basicity scale (0-14).
sulphates: Contributes to SO2 levels.
alcohol: Alcohol content percentage.
quality: Sensory quality score (1-10).

Project Structure

The project is divided into two main subtasks:

Clustering with All Attributes

Objectives

Pre-processing:
- Scaling the data.
- Outlier detection and removal.
Determine the Number of Clusters:
- Using four automated tools: NbClust, the elbow method, the gap statistic, and the silhouette method.
K-means Clustering:
- Perform k-means analysis with the chosen number of clusters.
- Evaluate clustering using BSS/TSS ratio, BSS, and WSS indices.
Silhouette Analysis:
- Provide silhouette plot and average silhouette width score.
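
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it runs on synthetic stand-in data rather than whitewine_v6.xls, uses a 1.5 * IQR rule for outliers (the source does not say which rule was used), removes outliers before scaling, and picks k by average silhouette width alone rather than with all four tools. The names wine, is_outlier, and best_k are hypothetical.

```r
library(cluster)  # silhouette()

set.seed(42)

# Synthetic stand-in for the 11 physicochemical columns (NOT the real data):
# two groups of 150 samples whose column means differ by 3.
wine <- as.data.frame(rbind(
  matrix(rnorm(150 * 11, mean = 0), ncol = 11),
  matrix(rnorm(150 * 11, mean = 3), ncol = 11)
))

# Outlier detection (assumed rule): flag values beyond 1.5 * IQR of a column.
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Keep only rows with no flagged value in any column, then scale.
keep <- rowSums(sapply(wine, is_outlier)) == 0
wine_clean <- wine[keep, ]
wine_scaled <- scale(wine_clean)

# Choose k by the highest average silhouette width over k = 2..6.
d <- dist(wine_scaled)
candidate_k <- 2:6
avg_sil <- sapply(candidate_k, function(k) {
  fit <- kmeans(wine_scaled, centers = k, nstart = 25)
  mean(silhouette(fit$cluster, d)[, "sil_width"])
})
best_k <- candidate_k[which.max(avg_sil)]

# Final k-means fit and the indices listed above.
km <- kmeans(wine_scaled, centers = best_k, nstart = 25)
bss <- km$betweenss          # between-cluster sum of squares
wss <- km$tot.withinss       # total within-cluster sum of squares
bss_tss <- bss / km$totss    # share of total variance explained by clusters

# Silhouette analysis: plot(sil) draws the silhouette plot.
sil <- silhouette(km$cluster, d)
avg_width <- mean(sil[, "sil_width"])
```

On the real data, the synthetic block would be replaced by the 11 physicochemical columns of the spreadsheet, and the silhouette-only choice of k would be cross-checked against the other tools named above.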

Clustering with PCA-Reduced Attributes

Objectives

Principal Component Analysis (PCA):
- Reduce dimensionality of the dataset.
- Retain the leading principal components whose cumulative explained variance exceeds 85%.
Determine the Number of Clusters for PCA Data:
- Using the same four automated tools.
K-means Clustering on PCA Data:
- Perform k-means analysis with the chosen number of clusters.
- Evaluate clustering using BSS/TSS ratio, BSS, and WSS indices.
Silhouette Analysis for PCA Data:
- Provide silhouette plot and average silhouette width score.
Calinski-Harabasz Index:
- Evaluate clustering quality using this index.
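
A rough sketch of the PCA-based workflow is shown below, again on synthetic stand-in data rather than the real spreadsheet. The value k = 2 is an assumption made for the illustration, and the Calinski-Harabasz index is computed directly from its definition, CH = (BSS / (k - 1)) / (WSS / (n - k)), so that the example depends only on base R and the cluster package. The names wine, scores, and ch are hypothetical.

```r
library(cluster)  # silhouette()

set.seed(42)

# Synthetic stand-in for the 11 physicochemical columns (NOT the real data).
wine <- rbind(
  matrix(rnorm(150 * 11, mean = 0), ncol = 11),
  matrix(rnorm(150 * 11, mean = 3), ncol = 11)
)

# PCA on centered, scaled variables.
pca <- prcomp(wine, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components whose cumulative variance exceeds 85%.
n_pc <- which(cum_var > 0.85)[1]
scores <- pca$x[, 1:n_pc, drop = FALSE]

# k-means on the retained component scores (k = 2 is assumed here).
k <- 2
km <- kmeans(scores, centers = k, nstart = 25)
bss_tss <- km$betweenss / km$totss

# Average silhouette width on the PCA scores.
sil <- silhouette(km$cluster, dist(scores))
avg_width <- mean(sil[, "sil_width"])

# Calinski-Harabasz index from its definition.
n <- nrow(scores)
ch <- (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
```

Higher Calinski-Harabasz values indicate clusters that are better separated relative to their internal spread, so the index is mainly useful for comparing several candidate values of k on the same data.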

Usage

Prerequisites

Ensure you have the following R packages installed:

install.packages(c("cluster", "factoextra", "NbClust", "readxl", "fpc"))
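
Once the packages are installed, loading the data might look like the sketch below. It assumes that whitewine_v6.xls sits in the working directory and that the rating column is named quality, as in the attribute list; neither is confirmed by the source. The names data_path, wine, and features are hypothetical.

```r
# Assumed location of the spreadsheet; adjust to where the file actually is.
data_path <- "whitewine_v6.xls"

wine <- NULL
features <- NULL

if (file.exists(data_path)) {
  # read_excel() from readxl reads both .xls and .xlsx files.
  wine <- readxl::read_excel(data_path)

  # Assumption: the sensory rating is stored in a column called "quality".
  # Everything else is treated as a physicochemical input for clustering.
  features <- wine[, names(wine) != "quality"]
} else {
  message("File not found: ", data_path)
}
```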
