
FearlessFrench/provider-segmentation


Provider Segmentation

About

A complete data clustering internship project that uses K-Means, Hierarchical Clustering, DBSCAN, Spectral Clustering, and Gaussian Mixture Model (GMM) clustering to segment service providers into groups based on selected features, and compares the best achievable performance of each model.



Key Principles

Data clustering is an unsupervised machine learning technique that organizes objects, data points, or observations into groups (clusters) based on their similarities or shared patterns. Unlike supervised learning, clustering does not rely on labeled data; instead, it aims to find natural groupings within the data.
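As a minimal sketch of the idea, assuming scikit-learn and synthetic data generated with `make_blobs` (the real provider dataset is not public), the snippet below throws away the generated labels and lets K-Means recover the groups from the features alone. The feature count, number of clusters, and random seeds are illustrative choices, not values taken from the notebooks.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for provider features (the real data is confidential).
# make_blobs also returns ground-truth labels; we discard them, because
# clustering is unsupervised and must find the groups without them.
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)

# Put every feature on the same scale so no single feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Illustrative settings: 3 clusters, 10 random initialisations, fixed seed.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)  # one cluster id (0, 1 or 2) per row

print("points per cluster:", np.bincount(labels))
```

Standardizing first matters because K-Means works on Euclidean distances, so a feature measured in large units would otherwise dominate the grouping.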

Clustering is used to identify underlying trends, patterns, and outliers in a dataset. It can be applied in various scenarios, such as exploratory data analysis, preprocessing, and anomaly detection. Clustering helps in reducing the complexity of large datasets by grouping similar data points together, which can simplify further analysis and visualization.
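For the outlier and anomaly-detection use case, density-based methods such as DBSCAN mark points that do not belong to any dense region with the label `-1`. The sketch below assumes scikit-learn and a hand-built toy dataset: two tight groups plus three distant points standing in for outliers. The `eps` and `min_samples` values are illustrative, not tuned for any real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Assumption: toy data with two dense groups of 100 points each ...
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5]], cluster_std=0.5, random_state=0)
# ... plus three hand-placed points far away from both groups (rows 200-202).
outliers = np.array([[20.0, -20.0], [-15.0, 25.0], [30.0, 30.0]])
X = np.vstack([X, outliers])

# A point is a core point if at least min_samples points (itself included) lie
# within eps of it. Points not reachable from any core point are labelled -1 (noise).
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

n_clusters = len(set(labels.tolist()) - {-1})
outlier_idx = np.flatnonzero(labels == -1)

print("clusters found:", n_clusters)
print("rows flagged as outliers:", outlier_idx)
```

Unlike K-Means, DBSCAN does not need the number of clusters up front, and it is not forced to assign every point to a cluster, which is what makes it usable for flagging outliers.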


Table of Contents

1st Phase - Exploratory Data Analysis (Data Cleaning & Transformation + Feature Engineering)

Clustering Pipeline

2nd Phase - Data Modeling & Analysis


Main Process

  • Exploratory Data Analysis (EDA)
  • Data Preprocessing (Data Cleaning & Transformation)
  • Feature Engineering (Feature Extraction & Selection)
  • Data Preparation
  • K-Means Clustering
  • Hierarchical Clustering
  • DBSCAN
  • Spectral Clustering
  • Gaussian Mixture Model Clustering
  • Silhouette Score
  • Davies-Bouldin Index
  • Calinski-Harabasz Index
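The five models and three metrics listed above could be compared along the following lines. This is a hedged sketch assuming scikit-learn and synthetic `make_blobs` data; the hyperparameters (four clusters, Ward linkage, `eps=0.8`, and so on) are placeholders, not the settings used in provider_segmentation_clustering.ipynb. Higher Silhouette and Calinski-Harabasz scores indicate better-separated clusters, while a lower Davies-Bouldin index is better.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, standardized data in place of the confidential provider features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=7)
X = StandardScaler().fit_transform(X)

# Illustrative hyperparameters only; in practice each model would be tuned separately.
models = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=7),
    "Hierarchical": AgglomerativeClustering(n_clusters=4, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
    "Spectral": SpectralClustering(n_clusters=4, random_state=7),
    "GMM": GaussianMixture(n_components=4, random_state=7),
}


def evaluate(X, labels):
    """Return the three internal validity scores, or None if fewer than 2 clusters."""
    mask = labels != -1  # drop DBSCAN noise points; other models never emit -1
    if len(set(labels[mask].tolist())) < 2:
        return None  # all three metrics are undefined for a single cluster
    X_kept, labels_kept = X[mask], labels[mask]
    return {
        "silhouette": silhouette_score(X_kept, labels_kept),               # higher is better
        "davies_bouldin": davies_bouldin_score(X_kept, labels_kept),       # lower is better
        "calinski_harabasz": calinski_harabasz_score(X_kept, labels_kept), # higher is better
    }


results = {}
for name, model in models.items():
    labels = model.fit_predict(X)  # every model above exposes fit_predict
    results[name] = evaluate(X, labels)

for name, scores in results.items():
    if scores is None:
        print(f"{name}: fewer than 2 clusters found, metrics undefined")
    else:
        print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

Excluding DBSCAN's noise points before scoring keeps the metrics defined, but it can make DBSCAN look better than models that must assign every point to a cluster, so its scores are best read alongside the fraction of points it left unassigned.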

Getting Started

This project contains two Jupyter Notebooks that document the process and results of the internship work.
Since the dataset may be confidential, it is not included in this repository.

  • provider_segmentation_eda.ipynb – Exploratory Data Analysis (EDA) of provider-related data.
  • provider_segmentation_clustering.ipynb – Clustering process and resulting segmentation.

You can open these notebooks in Jupyter Notebook, JupyterLab, or Google Colab to review the workflow and outputs.


License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
