
Possible speedup of localization by looping observations #175

Open
tommyod opened this issue Nov 21, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@tommyod
Collaborator

tommyod commented Nov 21, 2023

@Blunde1 writes:

One of the great things about the KF is that it runs in $O(m)$ time for the update instead of $O(m^3)$ when we have $m$ measurements. This is possible when the measurement errors are independent, so that the error covariance is diagonal. See Understanding the EnKF, Section 4.2 "Serial Updating": https://folk.ntnu.no/joeid/Katzfuss.pdf. It refers to this paper, which I believe is the original paper on covariance localization: https://www.researchgate.net/publication/228817876_A_Sequential_Ensemble_Kalman_Filter_for_Atmospheric_Data_Assimilation

Our measurement errors are independent, so much so that I believe we have even built this assumption into the API. At least for current practical purposes, $\Sigma_\epsilon$ will be diagonal. Why, then, can we not also do sequential updates? There is probably a good reason, but I am not seeing it.

So, to throw something wild out there: how about iterating over observations and assimilating them sequentially, and then iterating over parameters within each sequential update?

for (d_j, y_j, Sigma_eps_j) in zip(observations, responses, variances):
    H = LLS coefficients of y_j on X[mask, :]  # 1 x p, but using a masked X that can be p2 x n for p2 < p
    Sigma_yj = H @ X @ X.T @ H.T               # 1 x 1
    Sigma_d = Sigma_yj + Sigma_eps_j           # 1 x 1
    for i in realizations:
        T_ji = (d_j - y_ji) / Sigma_d          # 1 x 1, where y_ji is response j in realization i
        for X_ki in realization_i_of_parameters:
            if d_j is not masked for x_k:
                X_ki = X_ki + cov_xk_yj * T_ji

Is this obviously wrong, or wrong for some other reason? I tried to account for the subtle reasons that using the full transition matrix was wrong (by writing H = LLS coefficients of y_j on X[mask, :]). It therefore assumes the masking is pre-computed. Is this unreasonable?

Edit: If you do not have immediate objections (either theoretical or computational), I could try implementing it for the Gauss-linear notebook example and see if the results make sense.
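
For reference, a self-contained sketch of what such a serial Gauss-linear experiment could look like (the forward model, dimensions and noise levels below are made up for illustration and are not taken from the notebook):

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, N = 5, 8, 1000                      # parameters, observations, ensemble members

G = rng.standard_normal((m, n))           # made-up linear forward model
x_true = rng.standard_normal(n)
obs_sd = 0.1 * np.ones(m)                 # independent observation errors
d = G @ x_true + obs_sd * rng.standard_normal(m)

X = rng.standard_normal((n, N))           # prior ensemble

# Assimilate one observation at a time: each step is a rank-1 correction.
for j in range(m):
    Y_j = G[j, :] @ X                                   # predicted response j, shape (N,)
    D_j = d[j] + obs_sd[j] * rng.standard_normal(N)     # perturbed observations
    X_c = X - X.mean(axis=1, keepdims=True)
    cov_xy = X_c @ (Y_j - Y_j.mean()) / (N - 1)         # cov(x_k, y_j), shape (n,)
    var_d = np.var(Y_j, ddof=1) + obs_sd[j] ** 2        # scalar innovation variance
    X = X + np.outer(cov_xy / var_d, D_j - Y_j)         # rank-1 update of the ensemble

print(np.round(X.mean(axis=1) - x_true, 2))             # should be small
```

In the exact Kalman-filter setting, serial and joint assimilation of independent observations give the same posterior; with a finite ensemble the two only agree approximately, which is what the notebook experiment would probe.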

@tommyod
Collaborator Author

tommyod commented Dec 6, 2023

$$X_\text{updated} = \text{cov}(X, X) G^T (G \text{cov}(X, X) G^T + \alpha C_D)^{-1} (D - Y)$$

Let there be $n$ parameters, $m$ responses and $N$ ensemble members. Consider a single $y_j$, a row vector corresponding to response $j$. Then $C_D$ becomes a scalar (the diagonal entry $(j, j)$) and we have the shapes

$$X_\text{updated} = \underbrace{\text{cov}(X, X)}_{n \times n} \underbrace{G^T}_{n \times 1} (\underbrace{G}_{1 \times n} \underbrace{\text{cov}(X, X)}_{n \times n} \underbrace{G^T}_{n \times 1} + \underbrace{\alpha C_D}_{1 \times 1})^{-1} \underbrace{(d_j - y_j)}_{1 \times N}$$

or

$$X_\text{updated} = \underbrace{\text{cov}(X, X)}_{n \times n} \underbrace{G^T}_{n \times 1} \underbrace{(G \text{cov}(X, X) G^T + \alpha C_D)^{-1}}_{1 \times 1} \underbrace{(d_j - y_j)}_{1 \times N}$$

If we know that $X$ is sampled from a multivariate normal with independent components, then we know $\text{cov}(X, X)$ and it is diagonal. Assume it is the identity; then

$$X_\text{updated} = \underbrace{G^T}_{n \times 1} (\underbrace{G}_{1 \times n} \underbrace{G^T}_{n \times 1} + \underbrace{\alpha C_D}_{1 \times 1})^{-1} \underbrace{(d_j - y_j)}_{1 \times N}$$

which has the structure of a rank-1 update.
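
In code, one such single-observation step could be sketched as follows (a sketch under the notation above; `cov_xx`, the row `g_j` of $G$, `alpha`, the scalar $C_D$ entry and the innovation $d_j - y_j$ are assumed given, and the function name is only illustrative):

```python
import numpy as np

def serial_obs_update(cov_xx, g_j, alpha, c_dj, innovation_j):
    """One observation's contribution to X_updated, following the shapes above.

    cov_xx: (n, n), g_j: (1, n) row of G, alpha and c_dj: scalars,
    innovation_j: (1, N) containing d_j - y_j.
    """
    k_j = cov_xx @ g_j.T                        # n x 1
    s_j = float(g_j @ k_j) + alpha * c_dj       # 1 x 1, a plain scalar
    return (k_j / s_j) @ innovation_j           # (n x 1)(1 x N): a rank-1, n x N matrix
```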

@tommyod
Collaborator Author

tommyod commented Dec 14, 2023

I looked a bit at looping over the observations $y_i$. The problem, as I see it, is that we do not know which parameters correlate with $y_i$. Once we do know, those parameters have to be loaded from disk. So if $y_1$ is correlated with one parameter set $X_1$, $y_2$ with $X_2$, and so on, we have to load each of these from disk in order to update.

Imagine a situation with little overlap. There will be a lot of loading from and writing to disk. Example: we have $10$ parameters in $X$ and can only hold $3$ in memory at a time. $y_1$ says that $\{1, 2, 3, 4\}$ should be updated, so we must load two groups, e.g. $\{1, 2\}$ and $\{3, 4\}$. Then $y_2$ says that $\{1, 2\}$ should be updated; even though we had these in memory a moment ago, they have now been written back to disk, so we must load them again. Then $y_3$ says that $\{1, 7\}$ should be updated, so we pretty much have to fetch 7 from disk as well, probably by writing $\{1, 2\}$ first and then loading $\{1, 7\}$.
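
Purely to illustrate the counting, here is a toy sketch of that access pattern with a small LRU-style cache (nothing here is ERT-specific; the cache size, eviction policy and parameter sets just mirror the numbers above):

```python
from collections import OrderedDict

cache_capacity = 3                                 # parameters we can hold in memory
touched_by_obs = [{1, 2, 3, 4}, {1, 2}, {1, 7}]    # parameters each observation updates

cache = OrderedDict()                              # LRU cache of parameters in memory
disk_loads = 0

for needed in touched_by_obs:
    for k in sorted(needed):
        if k not in cache:
            disk_loads += 1                        # parameter must be (re)loaded from disk
            if len(cache) >= cache_capacity:
                cache.popitem(last=False)          # write back / evict the oldest parameter
        cache[k] = True
        cache.move_to_end(k)

print(disk_loads)                                  # 7 loads for 8 parameter accesses
```

Only one of the eight parameter accesses is a cache hit; every other access is a disk round trip.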

@Blunde1
Contributor

Blunde1 commented Dec 14, 2023

The problem you describe is not necessarily a problem, but a trade-off between reading from and writing to disk vs. inverting large dense matrices very many times. In the algorithm above we have a loop over observations, then a loop over realizations, then a loop over parameters. Parameters can be loaded in large batches, so it should not be a huge problem?

There are two problems I see that are unrelated to the above:

  • H = LLS coefficients yj on X[mask,:] # 1xp: if this requires all parameters in memory at once, it is a problem for memory. I am not sure, though, whether having all of them in memory is necessary for LLS. Do you know anything here?
  • y_ji is approximated by H_j @ x_i, whereas in ES++ we have y_ji = h_j(x_i) (H is the average gradient of h). This therefore introduces an approximation, which may or may not be acceptable (see the sketch below).
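
To make the second point concrete, here is a tiny sketch (the scalar forward model `h_j` below is made up) of the gap between the exact per-realization responses $y_{ji} = h_j(x_i)$ and the linear proxy obtained from a least-squares fit on the ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 500
X = rng.standard_normal((n, N))                    # parameter ensemble

def h_j(x):
    return np.sin(x[0]) + x[1] * x[2]              # a made-up nonlinear scalar response

y = np.array([h_j(X[:, i]) for i in range(N)])     # exact responses y_ji, shape (N,)

# Least-squares fit of y on X (with intercept): an "average gradient" H_j.
A = np.vstack([X, np.ones(N)]).T                   # N x (n + 1) design matrix
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_lin = A @ coef                                   # linear proxy H_j @ x_i (+ intercept)

print(np.std(y - y_lin))                           # per-realization approximation error
```

The residual spread here is the error the bullet point refers to; whether it is acceptable depends on how nonlinear the forward model is.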

@tommyod added the enhancement New feature or request label Dec 14, 2023
@Blunde1
Contributor

Blunde1 commented Dec 18, 2023

@eivindjahren added and removed the christmas-review Issues and PRs for Christmas review label Dec 13, 2024