Unsupervised learning, Transformation and Clustering, PCA
Create a report that covers which cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment. Since there is no known output to look for, the task calls for unsupervised learning. To group the cryptocurrencies, use a clustering algorithm. Finally, use data visualizations to share the findings with the board. The work involves:
- preprocessing (cleaning and scaling) the dataset,
- using Principal Component Analysis (PCA) to reduce the dimensionality of the data,
- clustering the cryptocurrencies with the K-Means algorithm,
- visualizing the clustered results with 2D and 3D scatter plots.
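The cleaning and scaling step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame is made up, and the `CoinName` and `Algorithm` columns are assumptions (only `TotalCoinMined` and `TotalCoinSupply` are named in this report).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for the crypto dataset (all values are made up).
# "CoinName" and "Algorithm" are hypothetical column names.
raw = pd.DataFrame({
    "CoinName": ["A", "B", "C", "D", "E"],
    "Algorithm": ["SHA-256", "Scrypt", "SHA-256", "X11", "Scrypt"],
    "TotalCoinMined": [1.0e6, 5.0e7, np.nan, 2.0e9, 3.0e5],
    "TotalCoinSupply": [2.1e7, 8.4e7, 1.0e8, 1.0e10, 1.0e6],
})

# Cleaning: drop rows with missing values.
clean = raw.dropna().reset_index(drop=True)

# Keep the names aside; they identify coins but are not features.
names = clean["CoinName"]

# Encode the categorical column as numeric indicator columns.
features = pd.get_dummies(
    clean.drop(columns=["CoinName"]), columns=["Algorithm"], dtype=float
)

# Scaling: give every feature zero mean and unit variance.
scaled = StandardScaler().fit_transform(features)
```

Scaling matters here because both PCA and K-Means are distance based, so a feature with a very large range (such as a coin supply in the billions) would otherwise dominate the result.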
Tools: Python, Jupyter Notebook, scikit-learn
Unsupervised machine learning is used to identify clusters of cryptocurrencies. We produced the elbow curve below using the K-Means method, iterating over k values from 1 to 10 to find the ideal number of clusters based on the data features. Based on the elbow curve, we use 4 clusters to categorize the cryptocurrencies.
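A minimal sketch of how the inertia values behind an elbow curve can be computed with scikit-learn is shown here. It runs on synthetic blobs rather than the cryptocurrency data, and it assumes the range 1 to 10 is inclusive; `n_init` and `random_state` are illustrative choices, not settings taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated groups (not the real crypto features).
X, _ = make_blobs(
    n_samples=300, n_features=3, centers=4, cluster_std=0.8, random_state=0
)

k_values = list(range(1, 11))  # k = 1 .. 10
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    # inertia_ is the sum of squared distances of samples to their
    # closest cluster center; lower means tighter clusters.
    inertia.append(model.inertia_)

# Drop in inertia from each k to the next; the "elbow" is where
# these drops stop being large.
drops = [inertia[i] - inertia[i + 1] for i in range(len(inertia) - 1)]
```

Plotting `inertia` against `k_values` with any plotting library gives the elbow curve; the chosen k is the point after which adding clusters yields only small improvements.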
3D scatter plot of the clusters, with PCA used to reduce the cryptocurrency features to three principal components
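The reduction to three principal components followed by K-Means with 4 clusters might look like the sketch below. It uses synthetic data, and it assumes the clustering is fit on the principal components; the report does not state whether the notebook clusters the components or the full feature set. The column names `PC 1`, `PC 2`, `PC 3`, and `Class` are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled feature matrix (not the real crypto data).
X, _ = make_blobs(n_samples=200, n_features=8, centers=4, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Reduce the 8 features to three principal components.
pca = PCA(n_components=3, random_state=1)
pcs = pca.fit_transform(X_scaled)

# Cluster the reduced data into 4 groups.
model = KMeans(n_clusters=4, n_init=10, random_state=1)
labels = model.fit_predict(pcs)

# One row per sample: three coordinates plus the assigned class.
plot_df = pd.DataFrame(pcs, columns=["PC 1", "PC 2", "PC 3"])
plot_df["Class"] = labels

# One option for the 3D plot, if Plotly is installed:
# plotly.express.scatter_3d(plot_df, x="PC 1", y="PC 2", z="PC 3", color="Class")
```

`pca.explained_variance_ratio_` reports how much of the original variance each component keeps, which is a quick check on how much information the three-dimensional view preserves.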
We created hv_table, an interactive table in the Jupyter notebook. As the picture below shows, with the table sorted by class, most of the cryptocurrencies belong to classes #0 and #1, and only one cryptocurrency, BitTorrent (highlighted here), belongs to class #3.
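The same kind of summary (rows sorted by class, plus the size of each class) can be obtained with plain pandas. The sketch below uses a made-up six-row table with hypothetical `CoinName` and `Class` columns; it does not reproduce hv_table or the real class assignments.

```python
import pandas as pd

# Made-up clustering result; names and class labels are illustrative only.
clustered = pd.DataFrame({
    "CoinName": ["Coin A", "Coin B", "Coin C", "Coin D", "Coin E", "Coin F"],
    "Class": [0, 1, 0, 3, 1, 0],
})

# Sort rows by class (then by name) for a readable table.
by_class = clustered.sort_values(["Class", "CoinName"]).reset_index(drop=True)

# Number of coins in each class, ordered by class label.
class_sizes = clustered["Class"].value_counts().sort_index()

# Classes that contain exactly one coin.
singletons = class_sizes[class_sizes == 1].index.tolist()
```

A class with a single member, like class 3 in this toy table, usually marks an outlier whose feature values differ sharply from every other coin.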
A scatter plot of just two cryptocurrency features does not separate the classes well. As seen earlier, reducing the data with PCA gives a clearer visualization.
We identified 4 clusters among 532 cryptocurrencies, grouped by the similarity of their features using unsupervised machine learning algorithms. The 3D PCA plot shows the 4 clusters clearly: most of the cryptocurrencies fall into clusters #0 and #1, 6 belong to cluster #2, and only 1 belongs to cluster #3. Two features alone (TotalCoinMined vs. TotalCoinSupply) do not separate the classes well, so using more features gives more power to classify the cryptocurrencies in this case.