This project explores delta hedging strategies in quantitative finance, comparing traditional analytical methods with emerging Reinforcement Learning (RL) approaches for risk management in options trading. By implementing and evaluating these strategies across simulated market environments, including the Black-Scholes and Heston models, the study highlights the strengths and limitations of each approach, taking into account factors such as transaction costs and stochastic volatility. The findings suggest that while analytical methods offer a solid foundation, RL-based strategies show promising adaptability, providing valuable insights for optimizing risk management in complex financial markets.
Here you can find the full master's thesis, and here the presentation.
The reinforcement learning algorithm used to train the RL agents is TD3 (Twin Delayed Deep Deterministic Policy Gradient).
The methodology involves comparing the TD3 strategy with the following benchmark strategies in two simulated environments:

- Black-Scholes (BS) Environment:
  - BS Strategy ($\pi^{BS}$)
  - Wilmott Strategy ($\pi^{W}$)
- Heston Environment:
  - Heston Strategy ($\pi^{H}$)
  - Wilmott Strategy ($\pi^{W}$) (equipped with Heston hedging delta)
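For reference, the Black-Scholes hedging delta underlying $\pi^{BS}$ has a closed form, $N(d_1)$. Below is a minimal sketch; the function name, defaults, and example values are illustrative and not taken from the thesis code.

```python
from math import log, sqrt
from scipy.stats import norm

def bs_call_delta(S, K, T, sigma, r=0.0):
    """Black-Scholes delta of a European call option, N(d1)."""
    if T <= 0:
        return 1.0 if S > K else 0.0  # at expiry the delta collapses to the payoff indicator
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return norm.cdf(d1)

# Example: at-the-money call, one year to expiry, 20% volatility
print(bs_call_delta(S=100.0, K=100.0, T=1.0, sigma=0.2))  # ~0.54
```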
The following parameters were used for the experiments:
- Initial asset price: $S_0 = 100$
- Risk-free rate: $r = 0$ (risk-neutral measure)
- Heston model parameters: $v_0 = 0.04$, $\theta = 0.04$, $\sigma = 0.1$, $\rho = -0.7$, $\kappa = 0.1$
- Transaction cost function: $\xi = 1$, $\eta = \textit{Ticksize}$
- Risk-aversion parameter: $\lambda = 0.1$
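As a quick illustration of the Heston dynamics under the parameters above, here is a minimal full-truncation Euler scheme; the step count, seed, and discretization choice are illustrative assumptions, not details from the thesis.

```python
import numpy as np

def simulate_heston_path(S0=100.0, v0=0.04, theta=0.04, kappa=0.1,
                         sigma=0.1, rho=-0.7, r=0.0, T=1.0, n_steps=252, seed=0):
    """Simulate one Heston path (S_t, v_t) with a full-truncation Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    S[0], v[0] = S0, v0
    for t in range(n_steps):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()  # correlated shocks
        v_pos = max(v[t], 0.0)                                           # full truncation keeps variance usable
        v[t + 1] = v[t] + kappa * (theta - v_pos) * dt + sigma * np.sqrt(v_pos * dt) * z2
        S[t + 1] = S[t] * np.exp((r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
    return S, v
```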
- Network architecture: two hidden layers of 64 fully connected neurons each, with ReLU activation functions
- Separate RL agents were trained for Black-Scholes and Heston environments.
- Different transaction cost scenarios were employed:
  - No transaction costs: Ticksize = 0.00
  - Low transaction costs: Ticksize = 0.01
  - High transaction costs: Ticksize = 0.05
- Each RL agent was trained for 8000 episodes, covering diverse market conditions:
  - Black-Scholes: varied moneyness, volatilities, times to expiration, and rebalancing frequencies
  - Heston: varied moneyness, times to expiration, and rebalancing frequencies
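The sketch below shows how such an agent could be set up with the stable-baselines3 implementation of TD3 and the 2×64 ReLU architecture described above. The environment is a stand-in: the thesis uses a custom hedging environment, which is not reproduced here, and the timestep budget shown is an assumption.

```python
import gymnasium as gym
import torch.nn as nn
from stable_baselines3 import TD3

# Stand-in continuous-action environment; in the thesis setting this would be the
# custom hedging environment (Black-Scholes or Heston dynamics, transaction costs,
# and per-episode randomization of moneyness, volatility, expiry, rebalancing frequency).
env = gym.make("Pendulum-v1")

policy_kwargs = dict(net_arch=[64, 64], activation_fn=nn.ReLU)  # two 64-unit ReLU hidden layers

model = TD3("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=0)
model.learn(total_timesteps=10_000)  # in the thesis: 8000 hedging episodes per agent
```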
The performance of the hedging strategies was evaluated using the following metrics:
- Replication Error: the terminal difference between the hedging portfolio value and the option payoff, $\Pi_{T} - (X_{T} - K)^{+}$
- Key Metrics:
  - Mean Error Normalized (MEN):

    $$MEN = \frac{MHE}{\Pi_0}$$

  - Standard Error Normalized (SEN):

    $$SEN = \frac{\sqrt{\frac{1}{n} \sum_{i = 1}^n \left( \left( \Pi_{T, i} - (X_{T, i} - K_i)^{+} \right) - MHE \right)^2}}{\Pi_0}$$

where the Mean Hedging Error (MHE) over the $n$ test episodes is

$$MHE = \frac{1}{n} \sum_{i = 1}^n \left( \Pi_{T, i} - (X_{T, i} - K_i)^{+} \right)$$
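These metrics can be computed directly from the terminal portfolio values and option payoffs of the test episodes. A minimal sketch, with illustrative array names:

```python
import numpy as np

def men_sen(pi_T, X_T, K, pi_0):
    """Compute MEN and SEN from terminal portfolio values pi_T, terminal asset
    prices X_T, per-episode strikes K, and the initial option premium pi_0."""
    hedging_errors = pi_T - np.maximum(X_T - K, 0.0)   # Pi_{T,i} - (X_{T,i} - K_i)^+
    mhe = hedging_errors.mean()                        # Mean Hedging Error
    men = mhe / pi_0
    sen = np.sqrt(np.mean((hedging_errors - mhe) ** 2)) / pi_0
    return men, sen
```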
The experiments are organized as follows (a minimal single-path delta-hedging rollout is sketched after this list):

- Call Option
  - Black-Scholes delta hedging with $\Delta^{BS}$
  - Black-Scholes delta hedging with $\Delta^{W}$ (Wilmott delta)
  - Heston delta hedging with $\Delta^{H}$
  - Heston delta hedging with $\Delta^{W}$ (Wilmott delta)
- Spread Option
- Reinforcement Learning Agent training
  - Hedging in the BS environment using TD3, no transaction costs
  - Hedging in the BS environment using TD3, low transaction costs
  - Hedging in the BS environment using TD3, high transaction costs
  - Hedging in the Heston environment using TD3, no transaction costs
  - Hedging in the Heston environment using TD3, low transaction costs
  - Hedging in the Heston environment using TD3, high transaction costs
- Results
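To make the delta-hedging experiments concrete, here is a minimal single-path rollout under Black-Scholes dynamics with a simple proportional transaction cost. The cost specification (ticksize per share traded), the weekly rebalancing, and all default values are illustrative assumptions rather than the thesis's exact setup.

```python
import numpy as np
from math import log, sqrt, exp
from scipy.stats import norm

def bs_d1(S, K, T, sigma, r=0.0):
    return (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))

def bs_call_price(S, K, T, sigma, r=0.0):
    d1 = bs_d1(S, K, T, sigma, r)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d1 - sigma * sqrt(T))

def hedge_call_once(S0=100.0, K=100.0, T=1.0, sigma=0.2, r=0.0,
                    n_steps=52, ticksize=0.01, seed=0):
    """Delta-hedge one short European call along a simulated GBM path and return
    the terminal replication error Pi_T - (S_T - K)^+."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = S0
    delta = norm.cdf(bs_d1(S, K, T, sigma, r))            # initial Black-Scholes delta
    cash = bs_call_price(S, K, T, sigma, r) - delta * S   # receive the premium, buy delta shares
    cash -= ticksize * abs(delta)                         # illustrative per-share transaction cost
    for i in range(1, n_steps + 1):
        S *= exp((r - 0.5 * sigma ** 2) * dt + sigma * sqrt(dt) * rng.standard_normal())
        cash *= exp(r * dt)
        if i < n_steps:                                   # rebalance at every step before expiry
            new_delta = norm.cdf(bs_d1(S, K, T - i * dt, sigma, r))
            cash -= (new_delta - delta) * S               # buy/sell the hedge adjustment
            cash -= ticksize * abs(new_delta - delta)     # cost of the adjustment
            delta = new_delta
    portfolio = cash + delta * S                          # terminal portfolio value Pi_T
    return portfolio - max(S - K, 0.0)                    # replication error for this episode
```

Averaging the returned replication errors over many simulated paths and normalizing by the initial premium $\Pi_0$ yields the MEN and SEN statistics defined above.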