
Use Python and unsupervised machine learning to predict if cryptocurrencies are affected by 24-hour or 7-day price changes.


jaezak/crypto_unsupervised_machine_learning


Cryptocurrency Price Change Prediction

Use Python and unsupervised learning to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes. The project explores cryptocurrency market data and builds a model intended to help cryptocurrency enthusiasts and traders make informed decisions.

Overview

Cryptocurrency markets are known for their volatility, and traders often seek to understand short-term (24-hour) and longer-term (7-day) price trends. In this challenge, your primary goals are as follows:

  • Data Collection: Gather relevant data on cryptocurrency prices, including historical price changes for both 24-hour and 7-day periods.

  • Data Preprocessing: Clean and preprocess the data, handling missing values, outliers, and other data quality issues.

  • Unsupervised Learning: Apply unsupervised learning techniques to discover patterns and relationships within the cryptocurrency data.

  • Classification Model: Build a predictive classification model to determine whether cryptocurrencies are affected by 24-hour or 7-day price changes.

  • Model Evaluation: Assess the performance of your model using appropriate evaluation metrics, such as accuracy, precision, recall, and F1-score.

Dataset

You will have access to a dataset containing historical cryptocurrency price data, including features related to different cryptocurrencies, their prices, and price changes over 24-hour and 7-day periods. This dataset will serve as the foundation for your predictive modeling efforts.

Getting Started

  1. Rename the Crypto_Clustering_starter_code.ipynb file as Crypto_Clustering.ipynb.

  2. Load the crypto_market_data.csv into a DataFrame.

  3. Get the summary statistics and plot the data to see what the data looks like before proceeding.
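The loading and summary steps above might look like the following minimal sketch. It assumes the CSV has a "coin_id" column that serves as the index, and it reads a three-row in-memory stand-in with made-up values (not real market data) so that it runs without the actual file.

```python
import io

import pandas as pd

# Stand-in for crypto_market_data.csv: three made-up rows so the sketch runs
# without the real file. The coin names and values are purely illustrative.
csv_text = """coin_id,price_change_percentage_24h,price_change_percentage_7d
coin_a,1.2,7.6
coin_b,-0.4,10.4
coin_c,0.1,-2.3
"""

# With the real file, pass its path instead of the StringIO object:
#   pd.read_csv("crypto_market_data.csv", index_col="coin_id")
df_market_data = pd.read_csv(io.StringIO(csv_text), index_col="coin_id")

# Summary statistics: count, mean, std, min, quartiles, and max per column.
summary = df_market_data.describe()

print(df_market_data.head())
print(summary)
```

In a notebook, a plotting call of your choice (for example hvPlot's `df_market_data.hvplot.line()`) can then be used to inspect the data visually.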

Prepare the Data

  • Use the StandardScaler class from scikit-learn to standardize the data from the CSV file, so that each feature has zero mean and unit variance.

  • Create a DataFrame with the scaled data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.

    • The first five rows of the scaled DataFrame should appear as follows:

      [Image: the first five rows of the scaled DataFrame]
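As a rough illustration of the preparation steps, the sketch below scales a small synthetic DataFrame and rebuilds a DataFrame that keeps the "coin_id" index. The random values and the `coin_0` ... `coin_9` identifiers are assumptions standing in for the real CSV contents.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data (hypothetical values and coin names).
rng = np.random.default_rng(0)
df_market_data = pd.DataFrame(
    rng.normal(size=(10, 2)) * np.array([2.0, 8.0]),
    columns=["price_change_percentage_24h", "price_change_percentage_7d"],
    index=pd.Index([f"coin_{i}" for i in range(10)], name="coin_id"),
)

# fit_transform returns a plain NumPy array, so the column names and the
# index have to be reattached when building the new DataFrame.
scaled_values = StandardScaler().fit_transform(df_market_data)
df_scaled = pd.DataFrame(
    scaled_values,
    columns=df_market_data.columns,
    index=df_market_data.index,
)

print(df_scaled.head())
```

Note that StandardScaler divides by the population standard deviation, so `df_scaled.std(ddof=0)` is 1 for every column, while the pandas default `std()` (sample standard deviation) is slightly larger.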

Find the Best Value for k Using the Original Scaled DataFrame

Use the elbow method to find the best value for k using the following steps:

  • Create a list of k values from 1 to 11.
  • Create an empty list to store the inertia values.
  • Create a for loop to compute the inertia with each possible value of k.
  • Create a dictionary with the data to plot the elbow curve.
  • Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
  • Answer the following question in your notebook: What is the best value for k?
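The elbow steps above can be sketched as follows. The data are three synthetic blobs standing in for the scaled DataFrame, "1 to 11" is read as inclusive, and the `n_init` and `random_state` settings are choices made here for reproducibility rather than requirements of the project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled DataFrame: three blobs of ten points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])
df_scaled = pd.DataFrame(
    points,
    columns=["price_change_percentage_24h", "price_change_percentage_7d"],
)

# k values 1 through 11 (assuming the upper bound is inclusive).
k_values = list(range(1, 12))

# Fit one K-means model per k and record its inertia
# (the sum of squared distances from each point to its cluster center).
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(df_scaled)
    inertia.append(model.inertia_)

# Dictionary with the data for the elbow curve, wrapped in a DataFrame.
elbow_data = {"k": k_values, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)

print(df_elbow)
```

Plotting `df_elbow` as a line chart of inertia against k shows where the curve flattens; for blob-like data such as this stand-in, the bend is expected near the true number of blobs.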

Cluster Cryptocurrencies with K-means Using the Original Scaled Data

Use the following steps to cluster the cryptocurrencies for the best value for k on the original scaled data:

  • Initialize the K-means model with the best value for k.
  • Fit the K-means model using the original scaled DataFrame.
  • Predict the clusters to group the cryptocurrencies using the original scaled DataFrame.
  • Create a copy of the original data and add a new column with the predicted clusters.
  • Create a scatter plot using hvPlot as follows:
    • Set the x-axis as "price_change_percentage_24h" and the y-axis as "price_change_percentage_7d".
    • Color the graph points with the labels found using K-means.
    • Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
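A minimal sketch of the clustering steps follows, again on synthetic stand-in data. `best_k = 3` is a hypothetical value chosen to match the stand-in blobs, the "cluster" column name is an assumption, and `make_scatter` is an illustrative helper that only works when the optional hvplot package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled DataFrame (hypothetical coins and values).
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])
df_scaled = pd.DataFrame(
    points,
    columns=["price_change_percentage_24h", "price_change_percentage_7d"],
    index=pd.Index([f"coin_{i}" for i in range(30)], name="coin_id"),
)

best_k = 3  # hypothetical; use the value read off your own elbow curve

# Initialize, fit, and predict with K-means.
model = KMeans(n_clusters=best_k, n_init=10, random_state=0)
model.fit(df_scaled)
clusters = model.predict(df_scaled)

# Work on a copy so the scaled data stay unchanged.
df_clustered = df_scaled.copy()
df_clustered["cluster"] = clusters


def make_scatter(df):
    """Build the hvPlot scatter plot; needs the optional hvplot package."""
    import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor)

    # reset_index turns the coin_id index into a regular column for hovering.
    return df.reset_index().hvplot.scatter(
        x="price_change_percentage_24h",
        y="price_change_percentage_7d",
        by="cluster",
        hover_cols=["coin_id"],
    )


print(df_clustered.head())
```

In a notebook with hvPlot available, evaluating `make_scatter(df_clustered)` in a cell renders the interactive plot.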

Optimize Clusters with Principal Component Analysis

  • Using the original scaled DataFrame, perform a PCA and reduce the features to three principal components.

  • Retrieve the explained variance to determine how much information can be attributed to each principal component and then answer the following question in your notebook:

    • What is the total explained variance of the three principal components?
  • Create a new DataFrame with the PCA data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.

    • The first five rows of the PCA DataFrame should appear as follows:

      [Image: the first five rows of the PCA DataFrame]
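The PCA steps might be sketched as below. The five-column synthetic DataFrame is a stand-in: only the two price-change column names come from this README, and `feature_3` through `feature_5` are hypothetical placeholders for the remaining features.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled DataFrame (hypothetical values).
rng = np.random.default_rng(3)
columns = [
    "price_change_percentage_24h",
    "price_change_percentage_7d",
    "feature_3",  # hypothetical placeholder column
    "feature_4",  # hypothetical placeholder column
    "feature_5",  # hypothetical placeholder column
]
df_scaled = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=columns,
    index=pd.Index([f"coin_{i}" for i in range(20)], name="coin_id"),
)

# Reduce the features to three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(df_scaled)

# Fraction of the total variance captured by each component, and their sum.
explained = pca.explained_variance_ratio_
total_explained = explained.sum()

# New DataFrame with the PCA data, keeping the coin_id index.
df_pca = pd.DataFrame(
    components,
    columns=["PC1", "PC2", "PC3"],
    index=df_scaled.index,
)

print(explained, total_explained)
print(df_pca.head())
```

The sum of `explained_variance_ratio_` is the figure the notebook question asks for; on the real data it will differ from whatever this random stand-in produces.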

Find the Best Value for k Using the PCA Data

Use the elbow method on the PCA data to find the best value for k using the following steps:

  • Create a list of k values from 1 to 11.
  • Create an empty list to store the inertia values.
  • Create a for loop to compute the inertia with each possible value of k.
  • Create a dictionary with the data to plot the elbow curve.
  • Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
  • Answer the following questions in your notebook:
    • What is the best value for k when using the PCA data?
    • Does it differ from the best k value found using the original data?
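One way to compare the two elbow curves is to compute both with a shared helper, as in this sketch. `inertia_curve` is a hypothetical helper name, and the five-dimensional blobs are a synthetic stand-in for the scaled data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def inertia_curve(data, k_values):
    """Return the K-means inertia for each k in k_values."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
        for k in k_values
    ]


# Synthetic stand-in for the scaled data: three blobs in five dimensions.
rng = np.random.default_rng(4)
centers = np.array(
    [[2.0, 2.0, 0.0, 0.0, 0.0], [-2.0, 2.0, 1.0, 0.0, 0.0], [0.0, -2.0, -1.0, 0.0, 0.0]]
)
scaled = np.vstack([c + rng.normal(scale=0.3, size=(10, 5)) for c in centers])

# Three principal components of the same data.
pca_data = PCA(n_components=3).fit_transform(scaled)

k_values = list(range(1, 12))  # 1 through 11, assuming an inclusive range
curves = pd.DataFrame(
    {
        "k": k_values,
        "original": inertia_curve(scaled, k_values),
        "pca": inertia_curve(pca_data, k_values),
    }
)

print(curves)
```

Because PCA discards part of the variance, the PCA inertia at k = 1 cannot exceed the original one; what matters for the question is whether the bend in the two curves falls at the same k.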

Cluster Cryptocurrencies with K-means Using the PCA Data

Use the following steps to cluster the cryptocurrencies for the best value for k on the PCA data:

  • Initialize the K-means model with the best value for k.
  • Fit the K-means model using the PCA data.
  • Predict the clusters to group the cryptocurrencies using the PCA data.
  • Create a copy of the DataFrame with the PCA data and add a new column to store the predicted clusters.
  • Create a scatter plot using hvPlot as follows:
    • Set the x-axis as "PC1" and the y-axis as "PC2".
    • Color the graph points with the labels found using K-means.
    • Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
  • Answer the following question:
    • What is the impact of using fewer features to cluster the data using K-means?
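To put a number on the last question, one option (not part of the original instructions) is to compare the cluster labels from the full feature set with those from the PCA data using the adjusted Rand index. The sketch below does this on synthetic blobs; `best_k = 3` and the "cluster" column name are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the scaled data: three blobs in five dimensions.
rng = np.random.default_rng(5)
centers = np.array(
    [[2.0, 2.0, 0.0, 0.0, 0.0], [-2.0, 2.0, 1.0, 0.0, 0.0], [0.0, -2.0, -1.0, 0.0, 0.0]]
)
scaled = np.vstack([c + rng.normal(scale=0.3, size=(10, 5)) for c in centers])
index = pd.Index([f"coin_{i}" for i in range(30)], name="coin_id")

# PCA DataFrame with three components, indexed by coin_id.
df_pca = pd.DataFrame(
    PCA(n_components=3).fit_transform(scaled),
    columns=["PC1", "PC2", "PC3"],
    index=index,
)

best_k = 3  # hypothetical; use the value from your own elbow curves

# Cluster once on all five features and once on the three components.
labels_full = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(scaled)
labels_pca = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(df_pca)

# Copy the PCA data and store the predicted clusters in a new column.
df_pca_clustered = df_pca.copy()
df_pca_clustered["cluster"] = labels_pca

# The adjusted Rand index ignores label numbering; 1.0 means identical groupings.
agreement = adjusted_rand_score(labels_full, labels_pca)
print(agreement)
```

A score close to 1 suggests the reduced feature set preserves the grouping; a lower score on the real data would indicate that dropping features changed which coins cluster together.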

References

The data for this project was generated by edX Boot Camps LLC and is intended for educational purposes only.
