From 8d20d2c24c953403fcbfe49dd3625355aa5e407a Mon Sep 17 00:00:00 2001
From: sydneyvernon
Date: Thu, 3 Oct 2024 14:48:14 -0700
Subject: [PATCH] fix for latex formatting?

---
 docs/src/accelerators.md | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/docs/src/accelerators.md b/docs/src/accelerators.md
index d5a03e111..f1572eeac 100644
--- a/docs/src/accelerators.md
+++ b/docs/src/accelerators.md
@@ -6,8 +6,20 @@ While developed for use with ensemble Kalman inversion (EKI), Accelerators have
 
 ## "Momentum" Acceleration in Gradient Descent
 
-In traditional gradient descent, one iteratively solves for $x^*$, the minimizer of a function $f(x)$, by performing the update step $x_{k+1} = x_{k} + \alpha \nabla f(x_{k})$, where $\alpha$ is a step size parameter.
-In 1983, Nesterov's momentum method was introduced to accelerate gradient descent. In the modified algorithm, the update step becomes $x_{k+1} = x_{k} + \beta (x_{k} - x_{k-1}) + \alpha \nabla f(x_{k} + \beta (x_{k} - x_{k-1}))$, where $\beta$ is a momentum coefficient. Intuitively, the method mimics a ball gaining speed while rolling down a constantly-sloped hill.
+In traditional gradient descent, one iteratively solves for $x^*$, the minimizer of a function $f(x)$, by performing the update step
+
+```math
+x_{k+1} = x_{k} - \alpha \nabla f(x_{k}),
+```
+
+where $\alpha$ is a step size parameter.
+In 1983, Nesterov's momentum method was introduced to accelerate gradient descent. In the modified algorithm, the update step becomes
+
+```math
+x_{k+1} = x_{k} + \beta (x_{k} - x_{k-1}) - \alpha \nabla f(x_{k} + \beta (x_{k} - x_{k-1})),
+```
+
+where $\beta$ is a momentum coefficient. Intuitively, the method mimics a ball gaining speed while rolling down a constantly-sloped hill.
 
 ## Implementation in Ensemble Kalman Inversion Algorithm
 
@@ -15,8 +27,7 @@ EKI can be understood as an approximation of gradient descent ([Kovachki and Stu
 The traditional update step for EKI is as follows, with $j = 1, ..., J$ denoting the ensemble member and $k$ denoting iteration number.
 
 ```math
- \tag{2}
- u_{k+1}^j = u_{k}^j + \Delta t C_{k}^{u\mathcal{G}} (\frac{1}{\Delta t}\Gamma + C^{\mathcal{G}\mathcal{G}}_k)^{-1} \left(y - \mathcal{G}(u_k^j)\right)
+u_{k+1}^j = u_{k}^j + \Delta t C_{k}^{u\mathcal{G}} (\frac{1}{\Delta t}\Gamma + C^{\mathcal{G}\mathcal{G}}_k)^{-1} \left(y - \mathcal{G}(u_k^j)\right)
 ```
 
 When using the ``NesterovAccelerator``, this update step is modified to include a term reminiscent of that in Nesterov's momentum method for gradient descent.
@@ -24,14 +35,12 @@ When using the ``NesterovAccelerator``, this update step is modified to include
 We first compute intermediate values:
 
 ```math
- \tag{3}
- v_k^j = u_k^j+ \beta_k (u_k^j - u_{k-1}^j)
+v_k^j = u_k^j + \beta_k (u_k^j - u_{k-1}^j)
 ```
 
 We then update the ensemble:
 
 ```math
- \tag{4}
- u_{k+1}^j = v_{k}^j + \Delta t C_{k}^{u\mathcal{G}} (\frac{1}{\Delta t}\Gamma + C^{\mathcal{G}\mathcal{G}}_k)^{-1} \left(y - \mathcal{G}(v_k^j)\right)
+u_{k+1}^j = v_{k}^j + \Delta t C_{k}^{u\mathcal{G}} (\frac{1}{\Delta t}\Gamma + C^{\mathcal{G}\mathcal{G}}_k)^{-1} \left(y - \mathcal{G}(v_k^j)\right)
 ```
 The momentum coefficient $\beta_k$ here is recursively computed as $\beta_k = \theta_k(\theta_{k-1}^{-1}-1)$ in the ``NesterovAccelerator``, as derived in ([Su et al](https://jmlr.org/papers/v17/15-084.html)). Alternative accelerators are the ``FirstOrderNesterovAccelerator``, which uses $\beta_k = 1-3k^{-1}$, and the ``ConstantNesterovAccelerator``, which uses a specified constant coefficient, with the default being $\beta_k = 0.9$. The recursive ``NesterovAccelerator`` coefficient has generally been found to be the most effective in most test cases.
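The Nesterov momentum update described in the patched section is easy to sanity-check numerically. The following is a minimal Python sketch (not part of the patch or the Julia package); the quadratic objective, step size `alpha`, momentum coefficient `beta`, and iteration count are illustrative choices, and the gradient is subtracted since we minimize $f$.

```python
import numpy as np

def nesterov_gd(grad_f, x0, alpha, beta, n_iters):
    """Nesterov-accelerated gradient descent (illustrative sketch).

    Each step evaluates the gradient at the "look-ahead" point
    x_k + beta * (x_k - x_{k-1}) before taking the gradient step.
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_iters):
        v = x + beta * (x - x_prev)            # momentum extrapolation
        x_prev, x = x, v - alpha * grad_f(v)   # gradient step from look-ahead point
    return x

# Minimize f(x) = ||x||^2 / 2, whose gradient is x and whose minimizer is 0.
x_star = nesterov_gd(lambda x: x, x0=[5.0, -3.0], alpha=0.1, beta=0.9, n_iters=200)
```

With these (illustrative) constant coefficients the iterate contracts toward the minimizer at a geometric rate, so `x_star` ends up very close to the origin.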
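The momentum-modified EKI update (intermediate ensemble $v_k^j$, then the Kalman-style correction) can likewise be sketched. This is a hypothetical NumPy toy, not the package's Julia implementation: the linear forward map `A`, observation covariance `Gamma`, step size `dt = 1`, ensemble size, and constant momentum `beta = 0.5` are all illustrative assumptions (the docs' `ConstantNesterovAccelerator` defaults to 0.9).

```python
import numpy as np

def eki_nesterov_step(u, u_prev, forward, y, Gamma, dt, beta):
    """One EKI iteration with a Nesterov-style momentum term (sketch).

    u, u_prev : (J, d) current and previous parameter ensembles.
    Implements v^j = u^j + beta (u^j - u_prev^j), then
    u_new^j = v^j + dt C^{uG} (Gamma/dt + C^{GG})^{-1} (y - G(v^j)).
    """
    v = u + beta * (u - u_prev)               # intermediate "momentum" ensemble
    g = np.array([forward(vj) for vj in v])   # forward-model evaluations, (J, p)
    du = v - v.mean(axis=0)                   # parameter anomalies
    dg = g - g.mean(axis=0)                   # output anomalies
    J = u.shape[0]
    C_ug = du.T @ dg / (J - 1)                # cross-covariance C^{uG}
    C_gg = dg.T @ dg / (J - 1)                # output covariance C^{GG}
    gain = dt * C_ug @ np.linalg.inv(Gamma / dt + C_gg)
    return v + (y - g) @ gain.T               # updated ensemble u_{k+1}

# Toy linear inverse problem: recover u_true from y = A u_true.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
u_true = np.array([1.0, -1.0])
y = A @ u_true
Gamma = 0.01 * np.eye(3)

u = rng.normal(size=(50, 2))                  # initial ensemble, J = 50
u_prev = u.copy()                             # zero momentum on the first step
for _ in range(30):
    u, u_prev = eki_nesterov_step(u, u_prev, lambda x: A @ x, y, Gamma,
                                  dt=1.0, beta=0.5), u
u_mean = u.mean(axis=0)
```

Since the toy problem is noiseless and `A` has full column rank, the ensemble mean should land near `u_true`; in the package, the same bookkeeping (storing the previous ensemble and choosing $\beta_k$) is what the `Accelerator` objects manage.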