The purpose of these notebooks is to test various machine learning predictive models on two sets of asset buckets: tech stocks and cryptocurrencies. The code uses closing prices pulled from the Alpaca API and CoinAPI and converted to CSV files for the stock and crypto assets, respectively. The notebooks use the past 15 months of data, a highly volatile period, so that every model captures the post-COVID-19 market shock. Because the data in this study is time series in nature, we use ARIMA, XGBoost, and LSTM models, all of which are well suited to time series forecasting.
Language: Python 3 (with the Pandas library)
ARIMA Requirements:
numpy, pandas, math, matplotlib (with the %matplotlib inline magic), Path from pathlib, plot_acf and plot_pacf from statsmodels.graphics.tsaplots, ARIMA from statsmodels.tsa.arima.model, metrics from sklearn
XGBoost Requirements:
numpy, pandas, xgboost, train_test_split and GridSearchCV from sklearn.model_selection, matplotlib.pyplot, plot_importance and plot_tree from xgboost, mean_squared_error from sklearn.metrics, MinMaxScaler from sklearn.preprocessing
LSTM Requirements:
numpy, pandas, math, hvplot.pandas, Path from pathlib, seed from numpy.random, random from tensorflow, MinMaxScaler from sklearn.preprocessing, Sequential from tensorflow.keras.models, and LSTM, Dense, and Dropout from tensorflow.keras.layers
External Resources:
BTC_data.csv, ETH_data.csv, and LTC_data.csv pulled from CoinAPI; AMZN_GOOG_MSFT_data.csv pulled from the Alpaca API
Developed with Google Colab
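All three notebooks start by reading one of these CSVs into a DataFrame. Below is a minimal sketch of that step, assuming a `date` index column and a `close` price column (the actual file layout may differ).

```python
# Loading one of the exported CSVs; the "date" index and "close" column
# names are assumptions about the file layout.
import pandas as pd
from pathlib import Path

btc_df = pd.read_csv(Path("BTC_data.csv"), index_col="date", parse_dates=True)
closing_prices = btc_df[["close"]]
```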
ARIMA:
- Set up:
- Read in closing price data
- Calculate daily returns as percent changes of the closing prices
- Usage:
- Specify desired date range
- Split the date range into train and test sets at the desired split ratio
- Run PACF and ACF to help determine parameters
- Optimize parameters to improve performance
- Convert the predicted percent-change returns back to price predictions (see the sketch after this list)
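Below is a minimal sketch of this workflow. The CSV layout, date range, 70/30 split, and (p, d, q) order are illustrative assumptions rather than the notebook's exact settings.

```python
# Minimal sketch of the ARIMA workflow above; all concrete values are
# illustrative assumptions, not the notebook's exact settings.
import pandas as pd
from pathlib import Path
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# Read in closing price data and slice to the desired date range.
close = pd.read_csv(Path("BTC_data.csv"), index_col="date", parse_dates=True)["close"]
close = close.loc["2020-01-01":"2021-03-31"]          # illustrative 15-month window

# Model daily percent-change returns rather than raw prices.
returns = close.pct_change().dropna()

# Chronological train/test split (70/30 is an assumed ratio).
split = int(len(returns) * 0.7)
train, test = returns.iloc[:split], returns.iloc[split:]

# ACF/PACF plots help choose the AR (p) and MA (q) orders.
plot_acf(train)
plot_pacf(train)

# Fit ARIMA with an illustrative order and forecast over the test window.
fitted = ARIMA(train, order=(1, 0, 1)).fit()
pred_returns = fitted.forecast(steps=len(test))

# Convert predicted percent changes back to price predictions, compounding
# from the last price in the training window.
pred_prices = close.iloc[split] * (1 + pred_returns).cumprod()
```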
XGBoost:
- Set up:
- Read in closing price data
- Calculate financial indicators and add them to the closing price DataFrame
- Usage:
- Specify the train, validate, test split ratio
- Set the parameters
- Option to change the parameters and/or the split ratio to improve performance (see the sketch after this list)
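Below is a minimal sketch of this workflow. The indicator features, 70/15/15 split, and parameter grid are illustrative assumptions rather than the notebook's exact settings.

```python
# Minimal sketch of the XGBoost workflow above; the features, split ratios,
# and parameter grid are illustrative assumptions.
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

# Read in closing price data (assumed CSV layout).
df = pd.read_csv("BTC_data.csv", index_col="date", parse_dates=True)

# Example financial indicators added to the closing price DataFrame
# (placeholders for the indicators computed in the notebook).
df["sma_10"] = df["close"].rolling(10).mean()
df["ema_10"] = df["close"].ewm(span=10).mean()
df["target"] = df["close"].shift(-1)          # next-day close as the target
df = df.dropna()

X, y = df.drop(columns=["target"]), df["target"]

# Chronological train/validate/test split (assumed 70/15/15 ratios).
i_train, i_val = int(len(df) * 0.70), int(len(df) * 0.85)
X_train, y_train = X.iloc[:i_train], y.iloc[:i_train]
X_val, y_val = X.iloc[i_train:i_val], y.iloc[i_train:i_val]
X_test, y_test = X.iloc[i_val:], y.iloc[i_val:]

# Grid-search a small, illustrative parameter grid.
param_grid = {"n_estimators": [200, 500], "max_depth": [3, 5], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(xgb.XGBRegressor(objective="reg:squarederror"), param_grid, cv=3)
search.fit(X_train, y_train)
model = search.best_estimator_

print("validation RMSE:", np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
print("test RMSE:", np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

# Feature importance plot from xgboost.
xgb.plot_importance(model)
```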
LSTM:
- Set Up:
- Read in closing price data
- Create a dataframe for each set of closing price data
- Create a list of all dataframes to be run in the model
- Usage:
- Specify number of units and dropout fraction for the model
- Remove or add layers to model as necessary for performance
- If using the window_data function, specify the window size
- Select dates to slice the data if necessary
- Specify train/test split ratio
- Option to change model optimizer to improve performance
- Specify the number of epochs and the batch size to optimize training (see the sketch after this list)
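Below is a minimal sketch of this workflow. The window size, number of units, dropout fraction, optimizer, epochs, and batch size are illustrative assumptions rather than the notebook's exact settings.

```python
# Minimal sketch of the LSTM workflow above; window size, units, dropout
# fraction, epochs, and batch size are illustrative assumptions.
import numpy as np
import pandas as pd
from numpy.random import seed
from tensorflow import random
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

seed(1)
random.set_seed(2)

# Read in closing price data (assumed CSV layout).
close = pd.read_csv("BTC_data.csv", index_col="date", parse_dates=True)[["close"]]

def window_data(df, window, feature_col, target_col):
    """Build rolling windows of `window` closes as features, next close as target."""
    X, y = [], []
    for i in range(len(df) - window):
        X.append(df.iloc[i : i + window, feature_col].values)
        y.append(df.iloc[i + window, target_col])
    return np.array(X), np.array(y).reshape(-1, 1)

window_size = 10                               # assumed window size
X, y = window_data(close, window_size, 0, 0)

# Chronological train/test split (70/30 is an assumed ratio).
split = int(len(X) * 0.7)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Scale features and targets to [0, 1], fitting the scalers on training data only.
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train)
X_train, X_test = x_scaler.transform(X_train), x_scaler.transform(X_test)
y_train, y_test = y_scaler.transform(y_train), y_scaler.transform(y_test)

# Reshape to (samples, timesteps, features) for the LSTM layers.
X_train = X_train.reshape((X_train.shape[0], window_size, 1))
X_test = X_test.reshape((X_test.shape[0], window_size, 1))

# Stacked LSTM with dropout; units and dropout fraction are illustrative,
# and layers can be added or removed to tune performance.
units, dropout_fraction = 30, 0.2
model = Sequential([
    LSTM(units, return_sequences=True, input_shape=(window_size, 1)),
    Dropout(dropout_fraction),
    LSTM(units),
    Dropout(dropout_fraction),
    Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X_train, y_train, epochs=10, batch_size=32, shuffle=False, verbose=1)

# Predict and convert the scaled outputs back to price levels.
predicted = y_scaler.inverse_transform(model.predict(X_test))
```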
Drew Disbrow Marnell: dldmarnell@gmail.com
Yoko Yamamoto: yyamamo222@gmail.com
Apexa Patel: apexa.dhirubhai@gmail.com
Matt Epler: epler.matt@gmail.com
MIT License Copyright (c) 2021 Drew Disbrow Marnell