- Module 19 Challenge - This assignment compares two methods: directly applying the K-means model to scaled data and using Principal Component Analysis (PCA) to reduce data to 3 components before clustering.
Main Branch: Crypto_Clustering.ipynb
The main Jupyter Notebook file containing clustering analysis.
Resources Folder: crypto_market_data.csv
The original dataset used for the clustering analysis.
Images Folder: Contains 7 PNG files of all the plots and composites created in the Jupyter Notebook.
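The two approaches compared in this assignment can be sketched as follows. This is a minimal sketch rather than the notebook's actual code: it assumes scikit-learn, uses random data as a stand-in for `crypto_market_data.csv`, and the helper name `elbow_inertias` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def elbow_inertias(data, k_values, seed=0):
    """Fit K-means once per k and return the inertia of each fit."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(data)
        inertias.append(model.inertia_)
    return inertias


# Synthetic stand-in for the market data (assumption: arbitrary shape and values).
rng = np.random.default_rng(0)
raw = rng.normal(size=(40, 7))

# Method 1 input: standard-scaled features.
scaled = StandardScaler().fit_transform(raw)
# Method 2 input: the scaled features reduced to 3 principal components.
reduced = PCA(n_components=3).fit_transform(scaled)

k_values = list(range(1, 11))
scaled_inertias = elbow_inertias(scaled, k_values)
pca_inertias = elbow_inertias(reduced, k_values)
```

Plotting `scaled_inertias` and `pca_inertias` against `k_values` gives one elbow curve per method, which is the kind of comparison the answers below rely on.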
- The best value for `k` is `4`, as it provides the most distinct elbow, with a similar drop and pattern for both methods. The standard scaled K-means has slightly higher `y` values, but `4` is still the distinct optimal `k`.
- The total explained variance of the three principal components is 89.5%, with the components `[0.3719856, 0.34700813, 0.17603793]`.
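The 89.5% figure can be checked by summing the three values. The sketch below assumes they are the per-component explained variance ratios (for example, as exposed by scikit-learn's `explained_variance_ratio_` attribute); the variable names are illustrative.

```python
# Per-component values as reported above (assumed to be explained variance ratios).
explained_variance_ratio = [0.3719856, 0.34700813, 0.17603793]

# Total share of variance retained by the three components.
total = sum(explained_variance_ratio)
print(f"Total explained variance: {total:.1%}")

# Cumulative share after each successive component.
cumulative = []
running = 0.0
for ratio in explained_variance_ratio:
    running += ratio
    cumulative.append(running)
```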
After visually analyzing the cluster analysis results, what's the impact of using fewer features to cluster using K-Means?
- The principal component plot is tightly grouped within `y` values (-2, 2) and `x` values (1, 1). The 3 distinct outliers are theta-token, celsius-degree-token and ethlend.
- The standard clusters cover a larger area and are less patterned, with `y` values (-2, 2.5) and `x` values (-2, 2). Group 0 shows the largest impact from the 24-hour `x` value. Group 2 is positively impacted by the 7-day `y` values. Group 0 shows less 7-day impact and is closest to (0, 0).
- Below are the 3 tokens most appropriately shown as outliers from the overall patterns in the PCA cluster plot. Only ethlend is more accurately captured by the principal components method.
- Clusters = 3
- PCA Scatter Plot: `PCA1` = 8.089 and `PCA2` = -3.897.
- Standard Scatter Plot: `price_change_percentage_24h` = -4.981 and `price_change_percentage_7d` = -0.04581.
- Comparison: The higher `PCA1` value reflects significant variance in the first principal component.
- Original data: `ethlend` saw -13.53% (24h) and +4.22% (7d).
- Method: Both the PCA and standard plots show ethlend as an outlier.
- Clusters = 1
- PCA Scatter Plot: `PCA1` = 4.792 and `PCA2` = 6.768.
- Standard Scatter Plot: `price_change_percentage_24h` = 1.046 and `price_change_percentage_7d` = -0.618.
- Comparison: The higher `PCA2` value shows that the second component has a greater impact than `PCA1`.
- Original data: +2.51% (24h) and +0.6% (7d).
- Method: In the case of `celsius-degree-token`, the standard plot is a better predictor of the true value.
- Clusters = 0
- PCA Scatter Plot: `PCA1` = 2.677 and `PCA2` = -0.01395.
- Standard Scatter Plot: `price_change_percentage_24h` = -1.612 and `price_change_percentage_7d` = -1.682.
- Comparison: `PCA1` has a greater impact than `PCA2`. The standard scatter plot has almost exactly the same outcome for both measurements.
- Original data: -4.56% (24h) and -6.09% (7d).
- Method: In the case of `theta-token`, the standard scatter plot more accurately predicts the true value.
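The standard scatter plot values track the original data closely because standardization is a linear rescaling of each column, `z = (x - mean) / std`. As a rough check, and assuming this kind of per-column scaling was used (the notebook's exact scaler is not shown here), the sketch below recovers the implied 24h mean and standard deviation from two of the tokens above and uses them to predict the third. The helper name `fit_linear_scaling` is hypothetical.

```python
def fit_linear_scaling(point_a, point_b):
    """Recover (mean, std) from two (original, scaled) pairs, assuming z = (x - mean) / std."""
    (x_a, z_a), (x_b, z_b) = point_a, point_b
    std = (x_a - x_b) / (z_a - z_b)
    mean = x_a - z_a * std
    return mean, std


# 24h values quoted above: (original percent change, standard scatter plot value).
ethlend = (-13.53, -4.981)
celsius = (2.51, 1.046)
theta = (-4.56, -1.612)

# Fit the scaling on two tokens, then predict the scaled value of the third.
mean_24h, std_24h = fit_linear_scaling(ethlend, celsius)
predicted_theta = (theta[0] - mean_24h) / std_24h

print(f"implied mean={mean_24h:.3f}, std={std_24h:.3f}")
print(f"theta-token predicted={predicted_theta:.3f}, reported={theta[1]}")
```

The prediction agrees with the reported value to within the rounding of the quoted figures, which is consistent with the standard plot being a shifted and rescaled view of the original percentages. The PCA coordinates, by contrast, mix all input columns.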
Data for this dataset was generated by edX Boot Camps LLC, and is intended for educational purposes only.
This project is licensed under the terms of the GNU General Public License v3.0. For more details, see the LICENSE file.