Credit Clustering with KMeans Model and Jupyter Dash

Introduction

This project clusters credit data with a KMeans model and visualizes the results in a Jupyter Dash application. The dataset is US survey data from 2019.

Objectives

Identify columns or features with large variances.
Perform data processing using the trimmed variance method to handle outliers.
Build an unsupervised model to cluster individuals who are not creditworthy or who are at risk of being declined credit.
Create centroids for the different clusters.
Visualize the clusters using Principal Component Analysis (PCA) in Jupyter Dash.
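
The first two objectives can be illustrated with a small sketch. It is not the notebook's actual code: it assumes that "trimmed variance" means the variance computed after dropping a fixed proportion of the smallest and largest values, and the function names, the 10% trimming proportion, and the column names in the usage note are all hypothetical.

```python
from statistics import variance


def trimmed_variance(values, proportion=0.1):
    """Sample variance after dropping `proportion` of the points from each tail.

    Assumption: this mirrors the usual definition of a trimmed variance;
    the notebook may use a library routine with different defaults.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of points cut from each end
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return variance(trimmed)  # needs at least two remaining points


def top_variance_features(columns, n=5, proportion=0.1):
    """Return the names of the `n` columns with the largest trimmed variance.

    `columns` maps a column name to a list of numeric values.
    """
    scores = {name: trimmed_variance(vals, proportion) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

For example, `top_variance_features({"income": [...], "debt": [...]}, n=1)` would return the single column whose values vary most once the extreme tails are ignored, which reduces the influence of a few outliers on feature selection.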

Workflow

The project follows this workflow:

Importing Packages: This section imports the necessary packages and libraries for data analysis, visualization, and Jupyter Dash.
Data Import and Cleaning: The 2019 Survey dataset is imported, and initial cleaning operations are performed.
Exploratory Data Analysis (EDA): This section explores the dataset, examines its shape and characteristics, and prepares the data for clustering.
KMeans Clustering: The KMeans model is applied to the preprocessed data to cluster individuals who are not creditworthy.
Centroid Creation: Centroids are generated for each cluster.
Visualization with Jupyter Dash: The clusters are visualized using Principal Component Analysis (PCA) within a Jupyter Dash application.
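
The clustering, centroid, and PCA steps in the workflow above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's code: the function name, the choice of three clusters, and the standardization step before clustering are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X, n_clusters=3, random_state=42):
    """Cluster the rows of X, then project them to 2D for plotting.

    Returns (labels, centroids, points_2d). The centroids are reported
    in the original feature units.
    """
    X = np.asarray(X, dtype=float)

    # Assumption: features are standardized so that no single
    # large-scale column dominates the distance computation.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X_scaled)

    # Map the cluster centers back to the original units.
    centroids = scaler.inverse_transform(kmeans.cluster_centers_)

    # Reduce to two principal components for a scatter plot.
    pca = PCA(n_components=2, random_state=random_state)
    points_2d = pca.fit_transform(X_scaled)
    return labels, centroids, points_2d
```

The two columns of `points_2d`, colored by `labels`, are what a dashboard scatter plot would typically display. In practice the number of clusters would be chosen with a diagnostic such as inertia or silhouette score rather than fixed in advance.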

Getting Started

To run this project, you need to have Jupyter Notebook and Jupyter Dash installed. Clone the repository and open the Jupyter Notebook file (.ipynb) in your Jupyter environment. Ensure that the required packages mentioned in the "Importing Packages" section are installed in your Python environment.

The dataset used in this project should be named "SCFP2019.csv" and placed in the data folder.
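
As a rough sketch of the loading step, the snippet below reads the file with pandas and fails early with a clear message if it is missing. The helper name `load_survey` and the relative path are assumptions; adjust the path to match the folder name in your clone.

```python
from pathlib import Path

import pandas as pd


def load_survey(path="data/SCFP2019.csv"):
    """Load the survey CSV into a DataFrame.

    The default path is an assumption based on the file name and
    folder mentioned above.
    """
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(
            f"Expected the dataset at {csv_path}; place SCFP2019.csv in the data folder."
        )
    return pd.read_csv(csv_path)
```

Calling `load_survey().shape` in a notebook cell is a quick way to confirm that the file was found and parsed.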

To launch the Jupyter Dash application, run the Jupyter Dash code provided in the repository. The application starts a web server; open the displayed URL in your web browser to access the dashboard. The dashboard is interactive, so you can explore the dataset and analyze the credit clusters using the provided features.

Acknowledgments

The 2019 Survey dataset used in this project was sourced from the Federal Reserve.
The KMeans algorithm is implemented using the scikit-learn library.
Principal Component Analysis (PCA) is performed using the scikit-learn library.
Jupyter Dash is used for creating the interactive dashboard.

For suggestions or improvements, feel free to reach out.
