Authors and affiliation:
Prof. Johannes Ruf j.ruf@lse.ac.uk, http://www.maths.lse.ac.uk/Personal/jruf/, London School of Economics and Political Science.
Weiguan Wang weiguanwang@outlook.com, https://weiguanwang.github.io/, Shanghai University.
22 April 2021
Suggested citation:
J. Ruf and W. Wang, Hedging with Linear Regressions and Neural Networks, SSRN 3580132, 2021. Forthcoming in the Journal of Business and Economic Statistics. Download at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3580132
Supplementary reading:
J. Ruf and W. Wang, Neural Networks for Option Pricing and Hedging: A Literature Review, Journal of Computational Finance, volume 24, number 1, pages 1-45, 2020. Download at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3486363
J. Ruf and W. Wang (2021b), Information Leakage in Backtesting. SSRN 3836631. Download at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3836631
Introduction
This documentation explains the code structure and the data folders needed to reproduce the results in Ruf and Wang (2021). To run the code provided here, the user needs to:
- Overwrite the `DATA_DIR` variable in `setup.py` with a directory of your choice.
- Obtain the raw data (should you want to work with historical datasets) and rename the files as detailed in Data folder structure.
The code consists of four subfolders: `library`, `Simulation`, `OptionMetrics`, and `Euroxx`. The `library` folder contains functions used by other parts of the code. It consists of:
- `bs.py`: This file contains a function used to simulate the Black-Scholes dataset.
- `cleaner_aux.py`: This file contains functions used to clean raw data.
- `common.py`: This file contains functions that calculate and inspect the hedging error.
- `heston.py`: This file contains functions used to simulate the Heston dataset and to calculate option prices in the Heston model.
- `loader_aux.py`: This file contains functions used to load clean data (before training the ANN or running the linear regressions).
- `network.py`: This file implements HedgeNet and auxiliary functions.
- `plot.py`: This file contains functions used to plot diagnostic figures.
- `regression_aux.py`: This file contains functions that implement the linear regression methods.
- `simulation.py`: This file contains functions that implement the CBOE rules and organize data.
- `stoxx.py`: This file contains functions used to clean the Euro Stoxx 50 dataset only.
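As a rough illustration of what "hedging error" and MSHE refer to, here is a minimal sketch. It is not the code in `common.py`; the function names, the sign convention (a short option hedged with `delta` units of the underlying, with the remainder in the bank account), and the use of a simple one-period return are our assumptions.

```python
def hedging_error(delta, s0, s1, c0, c1, r, dt):
    """One-period error of hedging a short option with `delta` units of stock.

    Assumed convention (for illustration only): the hedger sells the option
    for c0, buys `delta` units of the underlying at s0, and keeps the rest
    in a bank account that earns the simple return r * dt.
    """
    bank = c0 - delta * s0
    return delta * s1 + (1.0 + r * dt) * bank - c1


def mshe(errors):
    """Mean squared hedging error over a collection of samples."""
    errors = list(errors)
    return sum(e * e for e in errors) / len(errors)


# Made-up numbers: (delta, S0, S1, C0, C1) for three samples.
samples = [
    (0.50, 100.0, 102.0, 5.0, 6.0),
    (0.45, 100.0, 99.0, 4.5, 4.2),
    (0.60, 100.0, 101.0, 6.0, 6.5),
]
errors = [hedging_error(d, s0, s1, c0, c1, r=0.0, dt=1 / 252)
          for d, s0, s1, c0, c1 in samples]
print(mshe(errors))
```

A smaller MSHE means that the chosen hedging ratio offsets the option price change more closely on average.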
In each of the other three folders, there are two Python files that are used by the notebooks:
- `setup.py`: This file contains all the flags to configure experiments. It varies by dataset and contains two major configurations:
  - It specifies the hedging period, time window size, data cleaning choice, and other experimental settings.
  - It specifies the location of the raw data, the clean data, and the stored results.
- `Load_Clean_aux.py`: This file loads the clean data and implements some extra cleaning before running the linear regressions or ANNs.
The notebooks have a very similar structure, as follows:
- In the `Simulation` folder, the first notebook implements the data simulation. In the `OptionMetrics` and `Euroxx` folders, the first notebook implements the cleaning of the historical raw datasets downloaded from data providers.
- `2_Regression_Generate.ipynb` implements all linear regressions on sensitivities and stores the PNL (MSHE) files.
- `3_Tuning_Hyper.ipynb` implements the tuning of the $L^2$ regularisation parameters.
- `4_Network.ipynb` implements the training of the ANN and stores the PNL files (MSHE of the ANN).
- `5_Diagnostic.ipynb` creates tables that summarize the PNL (MSHE) files in terms of a given performance measure across several experimental setups, i.e., globally for each dataset.
- `6_Local_Diag_And_Plots.ipynb` implements the diagnostics of the PNL files for a single experimental setup. Plots made from the PNL files are generated in this notebook. They include linear regression coefficients, mean squared hedging error (MSHE) plots, MSHE vs. sensitivities, confidence intervals, etc.
- `7_Analysis_of_(Semi-)CleanData.ipynb` implements the analysis of raw and clean data. This includes histograms of certain features, the number of samples in each time window, volatility, leverage effect, etc.
- `8_Bucket_Moneyness.ipynb` splits the dataset by moneyness into several buckets and runs statistical models on each bucket independently.
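The bucketing idea in the last notebook can be sketched as follows. This is an illustration only, not the notebook's code: the bucket edges, the field names (`moneyness`, `dS`, `dC`), and the stand-in model (a least-squares slope through the origin of option price changes on underlying price changes) are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_by_moneyness(samples, edges):
    """Group samples into buckets delimited by the sorted list `edges`.

    Bucket 0 holds moneyness < edges[0]; bucket i holds
    edges[i-1] <= moneyness < edges[i]; the last bucket holds the rest.
    """
    buckets = defaultdict(list)
    for sample in samples:
        buckets[bisect_right(edges, sample["moneyness"])].append(sample)
    return dict(buckets)


def fit_slope(bucket):
    """Least-squares slope through the origin of dC on dS (stand-in model)."""
    num = sum(s["dS"] * s["dC"] for s in bucket)
    den = sum(s["dS"] ** 2 for s in bucket)
    return num / den if den else float("nan")


# Made-up samples; the edges 0.98 and 1.02 are arbitrary.
samples = [
    {"moneyness": 0.90, "dS": 1.0, "dC": 0.2},
    {"moneyness": 0.95, "dS": -2.0, "dC": -0.4},
    {"moneyness": 1.00, "dS": 1.0, "dC": 0.5},
    {"moneyness": 1.10, "dS": 2.0, "dC": 1.6},
]
buckets = bucket_by_moneyness(samples, edges=[0.98, 1.02])
slopes = {index: fit_slope(bucket) for index, bucket in sorted(buckets.items())}
print(slopes)
```

Each bucket is fitted on its own samples only, so a model estimated on one moneyness range is unaffected by data from the others.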
Before running the code, one needs to specify the directory that stores the simulation data (or historical data) and the results. This is done by overwriting the `DATA_DIR` variable in each of the `setup.py` files.
The data folders have two common subfolders:
- `CleanData`: It stores simulated data in the case of Black-Scholes or Heston data, or cleaned data generated by `1_Clean.ipynb` in the case of historical data.
- `Result`: It stores the PNL files and other auxiliary files, from either the linear regressions or the ANN. It also includes tables made by `5_Diagnostic.ipynb`. For the ANN, it additionally contains loss plots, checkpoints, etc. For the linear regressions, it additionally contains regression coefficients, standard errors, etc.
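A minimal sketch of preparing such a data directory is given below. The helper `make_data_folders` and the folder name `hedging_data` are hypothetical, and we do not know whether the repository's `setup.py` expects `DATA_DIR` as a string or a path object, or whether it creates the subfolders itself; check the file before copying this.

```python
import tempfile
from pathlib import Path


def make_data_folders(data_dir):
    """Create the two common subfolders under `data_dir` if they are missing."""
    data_dir = Path(data_dir)
    for name in ("CleanData", "Result"):
        (data_dir / name).mkdir(parents=True, exist_ok=True)
    return data_dir


# In practice one would point this at a permanent location of one's choice;
# a temporary directory is used here only to keep the sketch runnable.
DATA_DIR = make_data_folders(Path(tempfile.mkdtemp()) / "hedging_data")
print(sorted(p.name for p in DATA_DIR.iterdir()))
```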
For the two historical datasets, there is an extra folder `RawData` to store the data given by data providers. The data needs to be arranged and renamed in the following way for the code to run.
- For the S&P 500 data, there are four files:
  - `option_price.csv` contains option quotes downloaded from OptionMetrics.
  - `spx500.csv` contains the close-of-day price of the S&P 500.
  - `onr.csv` contains the overnight LIBOR rate downloaded from Bloomberg.
  - `interest_rate.csv` contains the interest rates derived from zero-coupon bonds for maturities larger than 7 days, downloaded from OptionMetrics.
- For the Euro Stoxx data, the data needs to be put in four folders:
  - `futures` contains two files, `futures.csv` and `refData.csv`; the former contains the tick trading data of futures, and the latter contains the contract specifications of the futures in the former.
  - `options` contains two files, `options.csv` and `refData.csv`; they are the tick trading data of options and their reference data.
  - `interest_rate` contains seven files. They are `LIBOR_EURO_ON`, `LIBOR_EURO_1M.csv`, `LIBOR_EURO_3M.csv`, `LIBOR_EURO_6M.csv`, and `LIBOR_EURO_12M.csv`; namely, the LIBOR rates for overnight and for maturities of 1, 3, 6, and 12 months. The other two files are `ZERO_EURO_5Y` and `ZERO_EURO_10Y`; namely, interest rates derived from zero-coupon bonds with maturities of 5 and 10 years.
  - `stoxx50.csv` is the end-of-day spot of the Euro Stoxx 50 index.
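Before running the cleaning notebooks, it may help to check that the raw files are in place. The sketch below is not part of the repository; the helper `missing_raw_files` is hypothetical, and the file names are copied as listed above (including those listed without a `.csv` extension). Placing `stoxx50.csv` directly under the raw data folder is our assumption.

```python
import tempfile
from pathlib import Path

SP500_FILES = ["option_price.csv", "spx500.csv", "onr.csv", "interest_rate.csv"]

EUROSTOXX_FILES = [
    "futures/futures.csv",
    "futures/refData.csv",
    "options/options.csv",
    "options/refData.csv",
    "interest_rate/LIBOR_EURO_ON",
    "interest_rate/LIBOR_EURO_1M.csv",
    "interest_rate/LIBOR_EURO_3M.csv",
    "interest_rate/LIBOR_EURO_6M.csv",
    "interest_rate/LIBOR_EURO_12M.csv",
    "interest_rate/ZERO_EURO_5Y",
    "interest_rate/ZERO_EURO_10Y",
    "stoxx50.csv",  # assumed to sit next to the three folders
]


def missing_raw_files(raw_dir, expected):
    """Return the expected relative paths that do not exist under `raw_dir`."""
    raw_dir = Path(raw_dir)
    return [rel for rel in expected if not (raw_dir / rel).exists()]


# Demonstration on a throwaway folder holding three of the four S&P 500 files.
demo_dir = Path(tempfile.mkdtemp())
for name in SP500_FILES[:3]:
    (demo_dir / name).touch()
missing = missing_raw_files(demo_dir, SP500_FILES)
print(missing)
```

An empty list means that every expected file was found under the given folder.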
- We use a business-day convention when counting and offsetting days, where business days consist of all weekdays. However, the stock/option trading days are a subset of the business days because of certain public holidays. For instance, Martin Luther King Day is not a trading day on the Chicago Board Options Exchange, where the S&P 500 options are traded. The current code does not take this difference into account, and hence unnecessarily removes samples when it cannot obtain the stock/option price at the end of a hedging period. This problem has no significant impact on the results and conclusions presented here, as it only reduces the sample size, and only by a minuscule amount.
- The code uses continuous compounding for computing the single-period return on the risk-free asset. However, Equation (1) in the paper uses simple compounding. This is slightly inconsistent but does not seem to change the results in any significant way.
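To give a sense of the size of the compounding discrepancy, here is a small numerical sketch. The rate of 2% and the one-day period of 1/252 years are illustrative values chosen by us, not parameters taken from the code or the paper.

```python
import math


def simple_growth(r, dt):
    """Growth factor of the bank account under simple compounding."""
    return 1.0 + r * dt


def continuous_growth(r, dt):
    """Growth factor of the bank account under continuous compounding."""
    return math.exp(r * dt)


r, dt = 0.02, 1.0 / 252  # illustrative annual rate and one-day period
gap = continuous_growth(r, dt) - simple_growth(r, dt)
leading_term = (r * dt) ** 2 / 2  # first term of exp(x) - (1 + x)
print(gap, leading_term)
```

Over one period the two growth factors differ by roughly `(r * dt) ** 2 / 2`, which is of order 1e-9 for these values.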
We thank Yiren Wu and Max Yang for reporting these two issues.
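The business-day issue can be illustrated with a short sketch. It only mirrors the weekdays-only convention described above and is not the repository's date handling; the helper names and the example date are ours.

```python
from datetime import date, timedelta


def next_weekday(d):
    """Next business day when business days are taken to be all weekdays."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def mlk_day(year):
    """Martin Luther King Day: the third Monday of January."""
    first = date(year, 1, 1)
    first_monday = first + timedelta(days=(7 - first.weekday()) % 7)
    return first_monday + timedelta(days=14)


start = date(2019, 1, 18)  # a Friday, chosen for illustration
end = next_weekday(start)  # end of a one-business-day hedging period
lands_on_holiday = end == mlk_day(2019)
print(end, lands_on_holiday)
```

With this convention, a one-day hedging period starting on that Friday ends on a day without exchange quotes, so no end-of-period price is available and the sample is dropped.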
Package | Version |
---|---|
Anaconda | 2019.03 |
Keras | 2.2.4 |
Python | 3.6 |
Numpy | 1.16.3 |
Pandas | 0.24.2 |
Scikit-learn | 0.20.3 |
Scipy | 1.2.1 |
Seaborn | 0.9 |
Tensorflow | 1.13.1 |