This repository contains a project that predicts cryptocurrency prices with several machine learning models. The goal is to analyze historical cryptocurrency data and build models that estimate future prices. The dataset is crypto-markets.csv, taken from Kaggle (https://www.kaggle.com/datasets/jessevent/all-crypto-currencies/data); it includes features such as open, high, low, and close prices, volume, and rank.
The project includes the following steps:
- Data Preprocessing
- Data Visualization
- Feature Selection
- Model Training and Evaluation
- Results Visualization
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
import plotly.express as px
from sklearn.svm import SVR
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
import plotly.graph_objects as go
from sklearn.linear_model import BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
- Loading Data:
df = pd.read_csv('crypto-markets.csv')
df.head()
- Basic Data Analysis:
df.info()
df.describe()
df.isna().sum()
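- Handling Outliers:
# Keep only rows whose numeric values all lie within 3 standard deviations of the column mean
z_scores = stats.zscore(df.select_dtypes(include=['float64', 'int64']))
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
# .copy() avoids a SettingWithCopyWarning when columns are reassigned later
df = df[filtered_entries].copy()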
- Scaling Numerical Features:
scaler = StandardScaler()
df[df.select_dtypes(include=['float64', 'int64']).columns] = scaler.fit_transform(df.select_dtypes(include=['float64', 'int64']))
- Close Price Scatter Plot:
fig = go.Figure(data=[go.Scatter(y=df['close'])])
fig.update_layout(title='Crypto Close Price', yaxis_title='Price')
fig.show()
- Closing Price Trend:
fig = go.Figure(data=[go.Scatter(x=df['name'], y=df['close'], name='Closing Price')])
fig.update_layout(title='Closing Price Trend', xaxis_title='Name', yaxis_title='Closing Price')
fig.show()
- Closing Price Over Time:
fig = go.Figure(data=[go.Scatter(x=df['date'], y=df['close'], name='Closing Price')])
fig.update_layout(title='Closing Price Trend Over Time', xaxis_title='Date', yaxis_title='Closing Price')
fig.show()
- Distribution of Closing Prices:
plt.figure(figsize=(10, 6))
sns.histplot(df['close'], bins=50, kde=True)
plt.title('Distribution of Closing Prices')
plt.xlabel('Closing Price')
plt.ylabel('Frequency')
plt.show()
- Proportion of Rank Now by Name:
df_positive_ranknow = df[df['ranknow'] > 0]
ranknow_name = df_positive_ranknow.groupby('name')['ranknow'].sum()
sample_data = ranknow_name.sample(20)
# Convert the seaborn palette to hex strings so Plotly can use it for the slices
colors = sns.color_palette('magma', len(sample_data)).as_hex()
fig = go.Figure(data=[go.Pie(labels=sample_data.index, values=sample_data.values, hole=0.5, marker=dict(colors=colors))])
fig.update_layout(title='Proportion of Rank Now by Name')
fig.show()
- Correlation Matrix:
numerical_features = df.select_dtypes(include=['float64', 'int64']).columns
corr_matrix = df[numerical_features].corr()
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
- Lasso Regression:
numerical_features = df.select_dtypes(include=['float64', 'int64']).columns
X = df[numerical_features].drop(columns=['close'])
y = df['close']
lasso = Lasso(alpha=0.01)
lasso.fit(X, y)
model = SelectFromModel(lasso, prefit=True)
selected_features = X.columns[model.get_support()]
print('Selected features by Lasso:', selected_features)
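The model snippets in this README use `X_train`, `X_test`, `y_train`, and `y_test`, but the split that creates them is not shown. A minimal sketch of how such a split could look is given here. It runs on synthetic data; `toy_df`, `feature_cols`, `test_size=0.2`, and `random_state=42` are assumptions for illustration, not values taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed DataFrame (assumption: the real
# data comes from crypto-markets.csv and has already been cleaned and scaled).
rng = np.random.default_rng(0)
toy_df = pd.DataFrame({
    "open": rng.normal(size=100),
    "high": rng.normal(size=100),
    "low": rng.normal(size=100),
    "volume": rng.normal(size=100),
})
toy_df["close"] = (
    0.5 * toy_df["open"] + 0.3 * toy_df["high"] + rng.normal(scale=0.1, size=100)
)

# Hypothetical feature list; in the project this would likely be the
# columns kept by the Lasso selection step.
feature_cols = ["open", "high", "low", "volume"]
X = toy_df[feature_cols]
y = toy_df["close"]

# test_size and random_state are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

For time-ordered price data, a chronological split (for example `shuffle=False`) may be more appropriate than a random one; which of the two the project used is not stated.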
The following machine learning models were used to predict cryptocurrency prices:
- Linear Regression:
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
- Decision Tree:
tree_model = DecisionTreeRegressor(random_state=42)
tree_model.fit(X_train, y_train)
y_pred = tree_model.predict(X_test)
- Random Forest:
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
- Bayesian Ridge Regression:
bayesian_model = BayesianRidge()
bayesian_model.fit(X_train, y_train)
y_pred = bayesian_model.predict(X_test)
- Support Vector Regression (SVR):
svr_model = SVR(kernel='rbf')
svr_model.fit(X_train, y_train)
y_pred = svr_model.predict(X_test)
- Gradient Boosting:
gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)
y_pred = gb_model.predict(X_test)
- XGBoost Regressor:
xgb_model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
xgb_model.fit(X_train, y_train)
y_pred = xgb_model.predict(X_test)
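Because every model above follows the same fit-then-predict pattern, the steps could also be written as one loop. The sketch below is an illustrative alternative, not the project's code: it uses synthetic data from `make_regression` and only three of the models, and the names `models` and `predictions` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data so the sketch runs without crypto-markets.csv.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Map a display name to an unfitted estimator (a subset of the models above).
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Bayesian Ridge": BayesianRidge(),
}

# Fit each model and keep its test-set predictions under its own name,
# so later models do not overwrite a shared y_pred variable.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Storing predictions per model name avoids reusing a single `y_pred` variable, which makes it easier to compute metrics for every model afterwards.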
The performance of each model was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2) scores. The results were plotted to compare the models.
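The code that computes the individual values (such as `LRmae` or `XGr2`) is not shown in this README. As a rough sketch, the four metrics could be computed with a small helper like the one below; `regression_metrics` is a hypothetical name, and the toy arrays only illustrate the calculation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R-squared for one set of predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = float(np.sqrt(mse))  # RMSE is the square root of MSE
    r2 = r2_score(y_true, y_pred)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R-squared": r2}


# Toy values chosen for illustration only, not project results.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
scores = regression_metrics(y_true, y_pred)
print(scores)
```

Calling the helper once per model with `y_test` and that model's predictions would give the values needed for the comparison.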
metrics = {
'Model': ['Linear Regression', 'Decision Tree', 'Random Forest', 'Bayesian Regression', 'Gradient Boosting', 'XGBoost'],
'MAE': [LRmae, DTmae, RFmae, BRmae, GBmae, XGmae],
'MSE': [LRmse, DTmse, RFmse, BRmse, GBmse, XGmse],
'RMSE': [LRrmse, DTrmse, RFrmse, BRrmse, GBrmse, XGrmse],
'R-squared': [LRr2, DTr2, RFr2, BRr2, GBr2, XGr2]
}
plt.figure(figsize=(14, 10))
# MAE plot
plt.subplot(2, 2, 1)
plt.plot(metrics['Model'], metrics['MAE'], marker='o', linestyle='-', color='b', label='MAE')
plt.title('Mean Absolute Error (MAE) Comparison')
plt.xlabel('Models')
plt.ylabel('MAE')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()
# MSE plot
plt.subplot(2, 2, 2)
plt.plot(metrics['Model'], metrics['MSE'], marker='o', linestyle='-', color='r', label='MSE')
plt.title('Mean Squared Error (MSE) Comparison')
plt.xlabel('Models')
plt.ylabel('MSE')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()
# RMSE plot
plt.subplot(2, 2, 3)
plt.plot(metrics['Model'], metrics['RMSE'], marker='o', linestyle='-', color='g', label='RMSE')
plt.title('Root Mean Squared Error (RMSE) Comparison')
plt.xlabel('Models')
plt.ylabel('RMSE')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()
# R-squared plot
plt.subplot(2, 2, 4)
plt.plot(metrics['Model'], metrics['R-squared'], marker='o', linestyle='-', color='m', label='R-squared')
plt.title('R-squared Comparison')
plt.xlabel('Models')
plt.ylabel('R-squared')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
This project demonstrates the process of predicting cryptocurrency prices with several machine learning models. By preprocessing the data, visualizing trends, selecting relevant features, and applying multiple regression techniques, I was able to build predictive models that estimate future cryptocurrency prices.