This project clusters credit data with a KMeans model and visualizes the results in a Jupyter Dash application. The dataset is the 2019 US Survey of Consumer Finances (SCF).
- Identify columns or features with large variances.
- Process the data using the trimmed variance method to limit the effect of outliers.
- Build an unsupervised model to cluster individuals who are not creditworthy or who are at risk of being declined credit.
- Compute the centroid of each cluster.
- Visualize the clusters using Principal Component Analysis (PCA) in Jupyter Dash.
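
As a rough illustration of the first two objectives, the sketch below ranks features by trimmed variance. The 10% trimming fraction, the helper name `trimmed_var`, and the synthetic columns (`debt`, `income`, `age`) are assumptions for illustration, not values taken from the notebook.

```python
import numpy as np
import pandas as pd


def trimmed_var(values, trim=0.1):
    """Variance after dropping the lowest and highest `trim` fraction of values."""
    arr = np.sort(np.asarray(values, dtype=float))
    k = int(len(arr) * trim)  # number of observations to drop at each end
    trimmed = arr[k:len(arr) - k] if k > 0 else arr
    return float(np.var(trimmed))


# Synthetic stand-in for the survey data (hypothetical columns and scales).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "debt": rng.normal(0, 1000, 200),
    "income": rng.normal(0, 500, 200),
    "age": rng.normal(0, 10, 200),
})

# Rank features by trimmed variance and keep the two largest.
ranked = df.apply(trimmed_var).sort_values(ascending=False)
top_features = ranked.head(2).index.tolist()
print(top_features)
```

Trimming both tails before computing the variance keeps a handful of extreme households from dominating the ranking, which is the usual motivation for this method.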
The project follows this workflow:
- Importing Packages: This section imports the necessary packages and libraries for data analysis, visualization, and Jupyter Dash.
- Data Import and Cleaning: The 2019 Survey dataset is imported, and initial cleaning operations are performed.
- Exploratory Data Analysis (EDA): This section explores the dataset, examines its shape and characteristics, and prepares the data for clustering.
- KMeans Clustering: The KMeans model is applied to the preprocessed data to cluster individuals who are not creditworthy.
- Centroid Creation: Centroids are generated for each cluster.
- Visualization with Jupyter Dash: The clusters are visualized using Principal Component Analysis (PCA) within a Jupyter Dash application.
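
The clustering, centroid, and PCA steps of the workflow might look roughly like the sketch below. It runs on synthetic data, and the feature names, the use of `StandardScaler`, and the choice of two clusters are assumptions for illustration; the notebook may use different settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data with two well-separated groups (hypothetical features).
rng = np.random.default_rng(42)
X = pd.DataFrame(
    np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(6, 1, (100, 3))]),
    columns=["feat_1", "feat_2", "feat_3"],
)

# Scale the features (an assumption), then fit KMeans.
X_scaled = StandardScaler().fit_transform(X)
model = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(X_scaled)

# Centroids in the original units: the mean of each feature per cluster.
centroids = X.groupby(labels).mean()

# Project to two principal components for plotting.
pca = PCA(n_components=2, random_state=42)
X_pca = pd.DataFrame(pca.fit_transform(X_scaled), columns=["PC1", "PC2"])
X_pca["cluster"] = labels.astype(str)

print(centroids)
print(X_pca.head())
```

Computing centroids with `groupby` on the unscaled frame keeps them in the original units, which tends to be easier to interpret than the scaled `model.cluster_centers_`.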
To run this project, you need to have Jupyter Notebook and Jupyter Dash installed. Clone the repository and open the Jupyter Notebook file (.ipynb) in your Jupyter environment. Ensure that the required packages mentioned in the "Importing Packages" section are installed in your Python environment.
The dataset used in this project should be named "SCFP2019.csv" and placed in the data folder.
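
A minimal loading sketch, assuming the `data` folder sits next to the notebook. The helper name `load_survey` is hypothetical, and no cleaning steps are shown because they depend on the notebook.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the file relative to the notebook.
DATA_PATH = Path("data") / "SCFP2019.csv"


def load_survey(path=DATA_PATH):
    """Read the survey CSV, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Expected the survey file at {path.resolve()}")
    df = pd.read_csv(path)
    print(f"Loaded {df.shape[0]} rows and {df.shape[1]} columns")
    return df
```

Calling `load_survey()` with no arguments reads from the assumed default path; pass a different path if the file lives elsewhere.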
To launch the Jupyter Dash application, run the Jupyter Dash cells in the notebook. The application starts a local web server; open the displayed URL in your web browser to access the dashboard. The dashboard is interactive, so you can explore the dataset and analyze the credit clusters using the provided features.
- The 2019 Survey dataset used in this project was sourced from the Federal Reserve.
- The KMeans algorithm is implemented using the scikit-learn library.
- Principal Component Analysis (PCA) is performed using the scikit-learn library.
- Jupyter Dash is used for creating the interactive dashboard.
If you have suggestions for improvement, please reach out.