A 2022 paper by Hans Buehler, an accomplished researcher in quantitative finance, presents a reinforcement learning (RL) framework for dynamically hedging portfolios of financial instruments (e.g., securities, derivatives) using historical data. The approach solves a Bellman equation to optimize hedging strategies over continuous state and action spaces, incorporating trading frictions (e.g., transaction costs, liquidity constraints) and risk-adjusted return objectives.
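To make the idea of a Bellman recursion with frictions and a risk-adjusted objective concrete, here is a minimal toy sketch. It is not the paper's method: it assumes a discrete binomial price tree and a small grid of hedge positions instead of continuous state and action spaces, it uses an entropic (exponential-utility) certainty equivalent as a stand-in risk-adjusted objective, and it models frictions as a proportional transaction cost only. The function names `entropic` and `bellman_hedge` and all parameters are hypothetical, chosen for illustration.

```python
import math


def entropic(values, probs, lam):
    """Certainty equivalent under exponential utility (assumed risk adjustment)."""
    return -math.log(sum(p * math.exp(-lam * v) for v, p in zip(values, probs))) / lam


def bellman_hedge(s0=100.0, strike=100.0, up=1.1, down=0.9, p_up=0.5,
                  steps=3, cost_rate=0.01, lam=0.1,
                  actions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Backward Bellman recursion for hedging a short call on a binomial tree.

    State: (number of up moves j, current stock holding h).
    Action: the new holding a, chosen from a discrete grid.
    """
    # Terminal value: the hedger owes the call payoff, whatever the holding.
    value = {(j, h): -max(s0 * up**j * down**(steps - j) - strike, 0.0)
             for j in range(steps + 1) for h in actions}

    for t in range(steps - 1, -1, -1):
        new_value = {}
        for j in range(t + 1):
            s = s0 * up**j * down**(t - j)
            for h in actions:
                best = -math.inf
                for a in actions:
                    # Proportional transaction cost for rebalancing from h to a.
                    cost = cost_rate * s * abs(a - h)
                    # Hedge gain minus cost plus continuation value, per branch.
                    outcomes = [a * (s * up - s) - cost + value[(j + 1, a)],
                                a * (s * down - s) - cost + value[(j, a)]]
                    best = max(best, entropic(outcomes, [p_up, 1.0 - p_up], lam))
                new_value[(j, h)] = best
        value = new_value

    # Value at inception, starting with no stock position.
    return value[(0, 0.0)]


if __name__ == "__main__":
    print("value without costs:", bellman_hedge(cost_rate=0.0))
    print("value with 1% costs:", bellman_hedge(cost_rate=0.01))
```

Raising `cost_rate` can only lower the value, because every rebalancing decision becomes weakly more expensive. A learned approach along the lines described above would replace the exhaustive loops over states and actions with function approximation trained on data.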
The fast American option pricing is based on the paper "High Performance American Option Pricing".