The goal of this project was to see how effective a reinforcement learning type training model would be for financial management. Normal models rely on linear regression and predictions of the next price of a stock. Instead, this project sought to determine if policy gradient, a type of reinforcement model, may be effective for portfolio management.
- 3rd Place at Delware Valley Science Fair for Computer Science in 10th Grade Division
- Pytorch for developing the policy gradient neural network
- Pandas for dataframe manipulation of S&P 500
- Numpy for mathematical functions
- Load Stock from the S&P 500
- Reset the Agent to have $100,000 balance and 0 shares of any stocks
- Provide state of the current stock prices of the day to the model
- Execute order if applicable based on the buy, hold, sell signals from the model for each stock
- Provide reward to the model for if portfolio value has increased or profit has been made (greater reward if profit increase)
- High, Low, Open, Close, Volume
- Values returned as differences from the previous value
balance + sum(number of shares * share price) for each stock
- Learning Rate (how aggressive model optimizations are) = 1e-6
- Gamme (how to value previous timestep data when making current decision) = 0.99
- Optimizer = Adam
As seen here, although the model occasionally has a negative return, it has more positive returns, which are all greater than the negative returns the model obtains
As seen here, the rate of positive returns are much more likely that the minimal risk of negative returns observered based on the testing procedure
- Null Hypothesis: The machine learning model will not be significantly better than the risk free return
- Anti-Hypothesis: The use of machine learning models on stock data will outperform the risk-free return on a 5 year scale
- Assessed risk-return ratio by calculating Sharpe's Ratio and comparing to risk-free investment along with t-test
- P-value = 1.1% < 5%
- Sharpes Ratio is calculated to be 0.3