This project explores unsupervised machine learning techniques to segment credit card customers based on their spending habits.
Project Goals:
- Identify distinct customer groups with similar spending patterns.
- Develop a comprehensive notebook for educational purposes.
Content:
- Data Exploration & Cleaning: Analyze and prepare the credit card transaction data for modeling.
- Clustering Algorithms:
  - Implement K-Means clustering to identify natural customer segments.
  - Explore techniques to determine the optimal number of clusters (Elbow Method, Silhouette Score, Gap Statistic).
  - Apply hierarchical clustering to uncover nested group structure.
  - Use DBSCAN to identify clusters of arbitrary shape while flagging potential outliers as noise.
  - Use Gaussian Mixture Models (fitted with EM) to model customer behavior with probability distributions.
- Dimensionality Reduction:
  - Use t-SNE to visualize the high-dimensional data in two or three dimensions.
  - Apply PCA to capture the directions of greatest variance in the data.
  - Use Kernel PCA to capture non-linear structure that may make clusters easier to separate.
- Cluster Evaluation: Assess the quality of the formed clusters using various methods.
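
As a rough illustration of the K-Means, cluster-count, and PCA steps listed above, here is a minimal sketch. It is not the notebook's code: it runs on synthetic data from `make_blobs` instead of the credit card dataset, and the names `score_cluster_counts`, `SEED`, and the range of `k` values are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

SEED = 42  # hypothetical seed, fixed so repeated runs give the same result

# Synthetic stand-in for customer features (assumption: NOT the real dataset).
X, _ = make_blobs(n_samples=600, centers=4, n_features=8, random_state=SEED)

# K-Means is distance based, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)


def score_cluster_counts(data, k_values, seed=SEED):
    """Fit K-Means for each k and return {k: (inertia, silhouette)}."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(data)
        # Inertia feeds the elbow plot; silhouette is a separate quality score.
        results[k] = (model.inertia_, silhouette_score(data, labels))
    return results


scores = score_cluster_counts(X_scaled, range(2, 9))
best_k = max(scores, key=lambda k: scores[k][1])  # highest silhouette wins

for k, (inertia, sil) in scores.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
print("k with the highest silhouette score:", best_k)

# Two principal components, e.g. as coordinates for a scatter plot of clusters.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
print("2-D projection shape:", X_2d.shape)
```

The silhouette score is only one of the criteria mentioned above; in practice it is usually read alongside the elbow plot of inertia rather than trusted on its own.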
Learning Outcomes:
This project provides a hands-on exploration of unsupervised machine learning techniques for customer segmentation. The accompanying notebook serves as a detailed guide, explaining each step from data wrangling to algorithm selection and result interpretation.
Additional Information:
- Feel free to reach out with any questions about the project or the notebook.
Note:
The description written after each cluster may differ from one run to another, due to the randomness of the algorithms. I set the random seed, but only late in the process.
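
One common way to reduce this kind of run-to-run variation, sketched here on the assumption that scikit-learn estimators are used, is to pass the same `random_state` to every estimator that has a random component. The data, the `SEED` value, and the helper `fit_labels` are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

SEED = 42  # hypothetical value; any fixed integer works

# Small random dataset standing in for the customer features (assumption).
rng = np.random.default_rng(SEED)
X = rng.normal(size=(300, 5))


def fit_labels(seed):
    """Fit K-Means and a Gaussian mixture with the given seed; return both label arrays."""
    km_labels = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    gmm_labels = GaussianMixture(n_components=3, random_state=seed).fit_predict(X)
    return km_labels, gmm_labels


# Two independent fits with the same seed should assign identical labels.
km_a, gmm_a = fit_labels(SEED)
km_b, gmm_b = fit_labels(SEED)
print("K-Means labels repeat:", np.array_equal(km_a, km_b))
print("GMM labels repeat:", np.array_equal(gmm_a, gmm_b))
```

Setting the seed at the top of a notebook, before any model is fitted, keeps the written cluster descriptions in step with the output.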
Software Used:
- Python
- NumPy & pandas
- Seaborn & Matplotlib & Plotly
- scikit-learn
- scikit-learn-intelex
- SciPy
Data Source: