Skip to content

Implementation and improvement of Two‑Stage Deep Ensemble Paradigm Based on Optimal Multi‑scale Decomposition and Multi‑factor Analysis for Stock Price Prediction by Wang and Liu

License

Notifications You must be signed in to change notification settings

cbobowic/bigru_model_implementation

Repository files navigation

README

Overview

The goal of this project was to implement and improve the results of a recent paper by Jujie Wang and Jing Liu titled "Two‑Stage Deep Ensemble Paradigm Based on Optimal Multi‑scale Decomposition and Multi‑factor Analysis for Stock Price Prediction"1 The paper proposed a stock price prediction algorithm which consists of significant data preprocessing followed by a Bidirectional Gated Recurrent Unit (BiGRU) prediction model. By improving the frequency BiGRU models, the frequency predictions were sufficiently accurate to remove the final BiGRU integration model, outperforming the original paper and all 7 other models compared in the paper in all three error metrics.

The data preprocessing consists of two distinct stages, followed by the prediction stage:

  1. Optimal Multi-Scale Decomposition
    • The target variable (Close price) is first decomposed using Single Spectrum Analysis (SSA). To extract the meaningful frequencies of the data and reduce noise, only the first three components are retained. These components are reconstructed to produce a denoised version of the close price.
    • The denoised close price is then decomposed into the optimal number of modes via Variational Mode Decomposition (VMD). The optimal number of modes is determined algorithmically, instead of manual selection. Starting at two modes, the algorithm computes the Sample Entropy10 (SampEn) of the adjacent modes. If the SampEn is sufficiently similar between adjacent modes, the algorithm stops and selects the number of modes as the current number of modes. Otherwise, the algorithm continues to add modes until the difference in SampEn reaches the necessary threshold.
    • The VMD decomposition is then split into "High Frequency" and "Low Frequency" by calculating the Approximate Entropy9 (ApEn) of each mode. Modes with ApEn greater than the mean ApEn are classified as "High Frequency" and the rest are classified as "Low Frequency".
    • The "High Frequency" and "Low Frequency" modes are then each reconstructed to produce the denoised close price optimally decomposed by frequency.
  2. Multivariate Analysis
    • For each of the "High Frequency" and "Low Frequency" timeseries, 15 variables are considered for multivariate analysis. These variables include the open, high, low, 5-day moving average (MA), 10-day MA, 20-day exponential MA, Bollinger Band highs, Bollinger Band lows, William's variable accumulation distribution, Chaikin A/D line, Chaikin A/D oscillator, on-balance volume, time series forcast, commodity channel index, and moving average convergence divergence.
    • Only variables which both pass the Spearman correlation coefficient test (when the p-value is less than 0.05) and have a mutual information (MI) indicator greater than the mean are further considered in the model.
    • A Principal Component Analysis (PCA) is then performed on the selected variables to reduce the dimensionality of the data.
  3. Prediction
    • For each frequency, a separate BiGRU model is then trained. The PCA-transformed data along with the respective frequency with a first-order lag is used as input to the BiGRU model. The model is trained to predict the close price of the next day. A BiGRU size of 512 units was selected to balance performance and training efficiency. Each BiGRU is trained to minimize mean square error (MSE) and tested on the last 20% of the data.
    • For the final prediction, a third BiGRU model takes both frequency predictions as input and predicts the final close price.

Novel Contributions

The original paper proposed a two-step BiGRU integration model for the prediction stage. I found that the model could be improved in two key areas:

  1. Dropout Layer: The original paper did not include a dropout layer in the BiGRU model. I found that adding a dropout layer between the BiGRU and dense prediction layer significantly improved the performance of the low-frequency prediction model. The dropout layer reduced overfitting and allowed the model to better capture the long patterns in the data.
  2. Direct Summation of Frequencies: The improved performance from dropout layers at the frequency level resulted in highly accurate frequency predictions. Since the original signal can be reconstructed by summing the low-frequency and high-frequency series, I found that the final prediction model could be simplified by removing the final BiGRU layer and instead directly summing the low-frequency and high-frequency predictions. This reduced the complexity of the model and improved the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) for the SZI index, with all three metrics outperforming the results reported in the original paper.

Results

SPY

The SPY prices from January 1, 2012 to January 1, 2023 were used as a sample stock. The data was split into training and testing sets with a 80/20 split.

SPY Initial Closing Prices

The initial closing prices of the SPY stock are shown above, along with the recomposed signal after SSA was performed.

SPY Frequency Decompositions

The frequency decompositions of the SPY stock are shown above. The optimal number of modes for VMD decomposition was 5. Per ApEn scores, the low frequency was reconstructed from imf1, imf4, imf5, and the high frequency was reconstructed from imf2 and imf3.

For the high frequency data, the following variables were retained: open, high, low, MA5, MA10, EMA20, BOLL_high, BOLL_low, OBV, TSF, and MACD. The low frequency data retained all of the above except MACD. The SPY data varied significantly from SZI and the results in the paper in that, through PCA, only 1 component was retained for the low frequency and 2 components for the high frequency while retaining >95% of the variance. The PCA-transformed data was then used as input to the BiGRU models.

High-Frequency BiGRU Prediction Model:

Layer Output Shape Param #
Bidirectional GRU (None, 10, 1024) 1,588,224
Dropout (None, 10, 1024) 0
Dense (None, 10, 1) 1,025

SPY High Frequency Prediction

Low-Frequency BiGRU Prediction Model:

Layer Output Shape Param #
Bidirectional GRU (None, 10, 1024) 1,585,152
Dropout (None, 10, 1024) 0
Dense (None, 10) 1,025

SPY Low Frequency Prediction

Final BiGRU Prediction Model:

Layer Output Shape Param #
Bidirectional GRU (None, 1024) 1,585,152
Dense (None, 1) 1,025

Although Wang and Liu used just one BiGRU layer followed by the dense prediction layer, I found that the low- and high-frequency predictions were significantly improved by adding a dropout layer. The integration model did not benefit from the dropout layer. The models were trained for 20 epochs with a batch size of 16, as well as an early stopping callback to reduce overfitting. Training locally took approximately 2s per epoch.

SPY Final Prediction

Alternative Prediction Method

Since the low- and high-frequencies were modeled with such high accuracy with the improved BiGRU architecture, directly summing the two frequency predictions resulted in even lower error rates:

SPY Final Prediction Novel

The numerical results are reported in the table below:

Model MAE RMSE MAPE (%)
SSA-VMD-MFA-BiGRU 4.3789 5.6107 1.0599
Improved SSA-VMD-MFA-BiGRU 2.9181 3.8153 0.7172

The direct summation technique outperformed the SSA-VMD-MFA-BiGRU model by Wang and Liu in all three metrics.

SZI

To compare performance with the original paper, the Shenzhen Component Index (SZI) prices from October 28th, 2014 to October 28th, 2022 were used as a sample stock. As in the paper, the optimal VMD decomposition was 7 modes. Imf4 to imf7 were used for high frequency and imf1 to imf3 were used for low frequency. The same high-frequency variables from the SPY data were retained except for the high Bollinger Bands, which were not retained. The low-frequency variables were the same as the SPY data. PCA was performed on the high-frequency data with 3 components retained and the low-frequency data with just 2 components retained. The prediction steps were the same as used in SPY.

SZI Low Frequency Predictions

SZI High Frequency Predictions

SZI Final Prediction

For some unknown reason, the SZI final prediction did not quite achieve the error reported by Wang and Liu. However, the direct summation technique outperformed the original paper in all three metrics, as well as each GRU and LSTM model compared in the paper:

Model MAE RMSE MAPE (%)
GRU2 221.3521 290.9207 1.6643
CEEMDAN-LSTM3 223.2584 281.5606 1.6427
SSA-VMD-MFA-BiGRU 171.1673 221.2086 1.2796
Improved SSA-VMD-MFA-BiGRU 160.1349 190.2766 1.1747
SSA-SVR4 446.6226 486.6464 3.9549
CNN-BILSTM5 479.4441 588.2834 4.6337
SSA-BIGRU6 187.7221 247.5064 1.4110
VMD-LSTM7 183.1706 235.6836 1.3634
VDM-SE-GRU8 178.0437 230.4729 1.3526

SSEC

The other stock used for comparison in the paper was the Shanghai Stock Exchange Composite Index (SSEC) from October 28th, 2014 to October 28th, 2022. Similarly results followed, with the direct summation technique outperforming the original paper in all three metrics:

Original prediction method by Wang and Liu SSEC Prediction

Direct summation prediction method SSEC Prediction Novel

Model MAE RMSE MAPE (%)
GRU2 37.3347 50.9810 1.1155
CEEMDAN-LSTM3 102.4273 123.4081 3.0580
SSA-VMD-MFA-BiGRU 29.9973 39.3508 0.8946
Improved SSA-VMD-MFA-BiGRU 16.2963 20.8189 0.4832
SSA-SVR4 109.1372 117.2263 3.3783
CNN-BILSTM5 133.8181 167.2553 4.5655
SSA-BIGRU6 76.6564 89.0099 2.2300
VMD-LSTM7 32.7635 41.5849 0.9680
VDM-SE-GRU8 30.5862 39.5800 0.9073

Citations

1 Wang, J., Liu, J. Two-Stage Deep Ensemble Paradigm Based on Optimal Multi-scale Decomposition and Multi-factor Analysis for Stock Price Prediction. Cogn Comput 16, 243–264 (2024). https://doi.org/10.1007/s12559-023-10203-x.

2 Saud AS, Shakya S. Analysis of look back period for stock price prediction with RNN variants: a case study on banking sector of NEPSE. Procedia Comput Sci. 2020. https://doi.org/10.1016/j.procs.2020.03.419.

3 Lin Y, Lin ZX, Liao Y, Li YZh, Xu JL, Yan Y. Forecasting the realized volatility of stock price index: a hybrid model integrating CEEMDAN and LSTM. Expert Syst Appl. 2022. https://doi.org/10.1016/j.eswa.2022.117736.

4 Lahmiri S. Minute-ahead stock price forecasting based on singular spectrum analysis and support vector regression. Appl Math Comput. 2017. https://doi.org/10.1016/j.amc.2017.09.049.

5 Barua R, Sharma AK. Dynamic Black Litterman portfolios with views derived via CNN-BiLSTM predictions. Financ Res Lett. 2022. https://doi.org/10.1016/j.frl.2022.103111.

6 Li XCh, Ma XF, Xiao FCh, Xiao C, Wang F, Zhang Sh. Timeseries production forecasting method based on the integration of Bidirectional Gated Recurrent Unit (Bi-GRU) network and Sparrow Search Algorithm (SSA). J Petrol Sci Eng. 2021. https://doi.org/10.1016/j.petrol.2021.109309.

7 Yu YY, Lin Y, Hou XP, Zhang X. Novel optimization approach for realized volatility forecast of stock price index based on deep reinforcement learning model. Expert Syst Appl. 2023. https://doi.org/10.1016/j.eswa.2023.120880.

8 Zhang SQ, Luo J, Wang SY, Liu F. Oil price forecasting: a hybrid GRU neural network based on decomposition–reconstruction methods. Expert Syst Appl. 2023. https://doi.org/10.1016/j.eswa.2023.119617.

9 Wikipedia contributors. (2023, September 5). Approximate entropy. In Wikipedia, The Free Encyclopedia. Retrieved 01:38, May 31, 2024, from https://en.wikipedia.org/w/index.php?title=Approximate_entropy&oldid=1173919176

10 Wikipedia contributors. (2024, March 14). Sample entropy. In Wikipedia, The Free Encyclopedia. Retrieved 01:41, May 31, 2024, from https://en.wikipedia.org/w/index.php?title=Sample_entropy&oldid=1213663278

Data was sourced from Yahoo Finance.

About

Implementation and improvement of Two‑Stage Deep Ensemble Paradigm Based on Optimal Multi‑scale Decomposition and Multi‑factor Analysis for Stock Price Prediction by Wang and Liu

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published