Project Description: K-Means Clustering for Data Prediction and Visualization In this project, I utilized the K-means clustering algorithm to perform data prediction and visualization. The project consists of two parts:
- Synthetic Data Clustering:
I generated synthetic data using the make_blobs function from the scikit-learn library. The data includes 100 samples with four clusters and a specified cluster standard deviation. After applying K-means clustering with four clusters, the model successfully predicted cluster labels for each data point. The scatter plot visually represents the original data points and their assigned clusters. Additionally, the cluster centers are highlighted in red. 2. Iris Dataset Dimensionality Reduction and Clustering:
I utilized the Iris dataset, loaded using the seaborn library, to showcase the application of K-means clustering on real-world data. The dataset was preprocessed by one-hot encoding categorical variables. Principal Component Analysis (PCA) was employed for dimensionality reduction to visualize the data in a 2D space. K-means clustering with three clusters was applied to the reduced dataset. The scatter plot illustrates the clustered data points in the PCA-transformed space, with cluster centers highlighted in red. Technologies Used:
Python with libraries such as NumPy, Matplotlib, scikit-learn, seaborn, and pandas. Conclusion:
The project demonstrates the versatility of the K-means algorithm in both synthetic and real-world datasets. Visualization techniques aid in understanding the clustering results and provide insights into the structure of the data. This project serves as an illustrative example of implementing K-means clustering for data prediction and visualization using synthetic and real-world datasets.