cmhillm75/CryptoClustering

CryptoClustering

  • Module 19 Challenge - This assignment compares two methods: directly applying the K-means model to scaled data and using Principal Component Analysis (PCA) to reduce data to 3 components before clustering.
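As a rough sketch of the two approaches being compared, the snippet below clusters the same scaled data once directly and once after reducing it to 3 principal components. It assumes scikit-learn and uses a small synthetic array as a stand-in for `crypto_market_data.csv`; the variable names, array shape, and `random_state` values are illustrative and are not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data (assumption): 40 rows x 7 features.
rng = np.random.default_rng(0)
raw = rng.normal(size=(40, 7))

# Standardize each feature to zero mean and unit variance.
scaled = StandardScaler().fit_transform(raw)

# Method 1: K-means directly on the scaled features.
labels_scaled = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

# Method 2: reduce to 3 principal components, then run K-means.
pca = PCA(n_components=3)
reduced = pca.fit_transform(scaled)
labels_pca = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(reduced)

print(reduced.shape)  # (40, 3)
print(pca.explained_variance_ratio_.sum())
```

With the real dataset, the two label arrays can be compared row by row to see which coins change cluster once the features are reduced.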

Setup

Main Branch: Crypto_Clustering.ipynb - The main Jupyter Notebook file containing the clustering analysis.

Resources Folder: crypto_market_data.csv - The original dataset used for the clustering analysis.

Images Folder: Contains 7 PNG files of all the plots and composites created in the Jupyter Notebook.

What are the best values for k?

  • The best value for k is 4 for both methods: each elbow curve has its most distinct bend at k = 4, and the two curves share a similar drop and pattern. The standard scaled K-means curve has slightly higher y values, but 4 is still the clear optimal k.
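The elbow comparison above can be sketched as follows. This is a minimal example, assuming the elbow curve plots K-means inertia against k and that k ranges from 1 to 10; the synthetic data stands in for the scaled market data and is not from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data (assumption): four well-separated 2-D groups.
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
data = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])

# Fit K-means for k = 1..10 and record the inertia of each fit.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(data)
    inertia.append(model.inertia_)

# The elbow is the k after which the drop in inertia flattens out.
for k, value in zip(k_values, inertia):
    print(k, round(value, 1))
```

Running the same loop on the scaled data and on the PCA data gives two curves that can be overlaid to compare where each one bends.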

What is the total explained variance of the principal components?

  • The total explained variance is about 89.5%, the sum of the explained variance ratios of the three components ([0.3719856, 0.34700813, 0.17603793]).
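The total is simply the sum of the three ratios quoted above, which can be checked directly:

```python
# Explained variance ratios reported above for the three principal components.
ratios = [0.3719856, 0.34700813, 0.17603793]

total = sum(ratios)
print(f"{total:.1%}")  # 89.5%
```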

After visually analyzing the cluster analysis results, what's the impact of using fewer features to cluster using K-Means?

  • The principal component plot is tightly grouped within y values (-2, 2) and x values (1,1). The 3 distinct outliers are theta-token, celsius-degree-token and ethlend.

  • The standard clusters cover a larger area and are less patterned, with y values in (-2, 2.5) and x values in (-2, 2). Group 0 shows the largest impact from the 24-hour x value. Group 2 is positively impacted by the 7-day y values. Group 0 shows less 7-day impact and is closest to (0, 0).

  • Below are the 3 tokens that appear most clearly as outliers from the overall patterns in the PCA cluster plot. Only ethlend is more accurately captured by the principal components method.

coin_id = ethlend

  • Clusters = 3

  • PCA Scatter Plot: PCA1 = 8.089 and PCA2 = -3.897.

  • Standard Scatter Plot: price_change_percentage_24h = -4.981 and price_change_percentage_7d = -0.04581.

  • Comparison: The higher PCA1 value reflects significant variance in the first principal component.

  • Original data: ethlend saw -13.53% (24h) and +4.22% (7d).

  • Method: Both PCA and standard plots show ethlend as an outlier.

coin_id = celsius-degree-token

  • Clusters = 1

  • PCA Scatter Plot: PCA1 = 4.792 and PCA2 = 6.768.

  • Standard Scatter Plot: price_change_percentage_24h = 1.046 and price_change_percentage_7d = -0.618.

  • Comparison: The higher PCA2 value shows that the second component has a greater impact than the first.

  • Original data: +2.51% (24h) and +0.6% (7d).

  • Method: In the case of celsius-degree-token, the standard plot is a better predictor of the true value.

coin_id = theta-token

  • Clusters = 0

  • PCA Scatter Plot: PCA1 = 2.677 and PCA2 = -0.01395.

  • Standard Scatter Plot: price_change_percentage_24h = -1.612 and price_change_percentage_7d = -1.682.

  • Comparison: PCA1 has a greater impact than PCA2. The standard scatter plot has almost exactly the same value for both measurements.

  • Original data: -4.56% (24h) and -6.09% (7d)

  • Method: In the case of theta-token, the standard scatter plot more accurately predicts the true value.
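One way to read the standard scatter plot values against the original percentages in the three coin summaries is as z-scores. The sketch below assumes the standard scatter plot values come from standard scaling, z = (x - mean) / std. It uses only the 24h figures quoted above: two coins are used to solve for the implied mean and standard deviation, and the third is used as a check. The variable names are illustrative.

```python
# (scaled value, original % change) pairs for price_change_percentage_24h,
# copied from the three coin summaries above.
ethlend = (-4.981, -13.53)
celsius = (1.046, 2.51)
theta = (-1.612, -4.56)

# Standard scaling is z = (x - mean) / std, so x = mean + std * z.
# Two coins are enough to solve for the implied mean and std.
std = (celsius[1] - ethlend[1]) / (celsius[0] - ethlend[0])
mean = celsius[1] - std * celsius[0]

# Check the fit against the third coin.
theta_estimate = mean + std * theta[0]
print(round(mean, 2), round(std, 2), round(theta_estimate, 2))
```

The estimate for theta-token lands within rounding of the reported -4.56%, which is consistent with the scaled axes being a linear rescaling of the original percentages.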

References

Data for this dataset was generated by edX Boot Camps LLC, and is intended for educational purposes only.

License

This project is licensed under the terms of the GNU General Public License v3.0. For more details, see the LICENSE file.
