Commit: resolved merge conflicts

Tommy Odland committed Nov 24, 2023
2 parents d544242 + 31e9b54 commit c7049fb

Showing 9 changed files with 246 additions and 46 deletions.
2 changes: 1 addition & 1 deletion .readthedocs.yaml
@@ -10,7 +10,7 @@ python:
- method: pip
path: .
extra_requirements:
- docs
- doc

sphinx:
builder: html
30 changes: 30 additions & 0 deletions LITERATURE.md
@@ -0,0 +1,30 @@
# Literature

A list of literature that we find useful for Data Assimilation.
Could be moved to `docs/source` eventually.

## Papers

### SIES

- (2019) [Efficient Implementation of an Iterative Ensemble Smoother for Data Assimilation and Reservoir History Matching](https://www.frontiersin.org/articles/10.3389/fams.2019.00047/full) by Evensen et al.

### ESMDA

- (2012) [History matching time-lapse seismic data using the ensemble Kalman filter with multiple data assimilations](https://link.springer.com/article/10.1007/s10596-012-9275-5) by Emerick et al.
- (2013) [Ensemble smoother with multiple data assimilation](https://www.sciencedirect.com/science/article/abs/pii/S0098300412000994) by Emerick et al.

### Covariance regularization

- (2022) [GraphSPME: Markov Precision Matrix Estimation and Asymptotic Stein-Type Shrinkage](https://arxiv.org/abs/2205.07584) by Lunde et al.

## Books

### Preliminaries

- (2006) [Pattern Recognition and Machine Learning](https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738) by Bishop

### Data Assimilation

- (2014) [Data Assimilation: The Ensemble Kalman Filter](https://www.amazon.com/Data-Assimilation-Ensemble-Kalman-Filter/dp/3642424767/) by Evensen
- (2022) [Data Assimilation Fundamentals: A Unified Formulation of the State and Parameter Estimation Problem](https://www.amazon.com/Data-Assimilation-Fundamentals-Formulation-Environment/dp/3030967085/) by Evensen
20 changes: 9 additions & 11 deletions README.md
@@ -26,22 +26,20 @@ This algorithm is particularly effective for problems with a large number of par
pip install iterative_ensemble_smoother
```

## Usage

**iterative_ensemble_smoother** mainly implements `SIES` and `ESMDA` classes. Check out
the examples section to see how to use them.


## Building from source

To build **iterative_ensemble_smoother** from source:
If you want to do development, then run:

```bash
```text
git clone https://github.com/equinor/iterative_ensemble_smoother.git
cd iterative_ensemble_smoother
pip install .
<create environment>
pip install --editable '.[doc,dev]'
```

## Usage

**iterative_ensemble_smoother** mainly implements the two classes `SIES` and `ESMDA`.
Check out the examples section to see how to use them.
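
As a quick illustration, here is a minimal `ESMDA` loop (a sketch distilled
from the `LinearRegression` example added in this commit; the toy forward
model and problem sizes are made up for illustration):

```python
import numpy as np

from iterative_ensemble_smoother import ESMDA

rng = np.random.default_rng(42)
A = rng.standard_normal(size=(10, 3))  # toy linear forward model g(x) = A @ x

X = rng.standard_normal(size=(3, 20))  # prior ensemble: 3 parameters, 20 members
observations = A @ np.ones(3) + rng.standard_normal(size=10)

smoother = ESMDA(covariance=np.ones(10), observations=observations, alpha=5, seed=1)
for _ in range(smoother.num_assimilations()):
    X = smoother.assimilate(X, Y=A @ X)
```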

## Building the documentation

```bash
172 changes: 172 additions & 0 deletions docs/source/LinearRegression.py
@@ -0,0 +1,172 @@
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#     jupytext_version: 1.15.2
#   kernelspec:
#     display_name: Python 3 (ipykernel)
#     language: python
#     name: python3
# ---

# %%
# ruff: noqa: E402

# %% [markdown]
# # Linear regression with ESMDA
#
# We solve a linear regression problem using ESMDA.
# First we define the forward model as $g(x) = Ax$,
# then we set up a prior ensemble on the linear
# regression coefficients, so $x \sim \mathcal{N}(0, 1)$.
#
# As shown in the 2013 paper by Emerick et al., when a set of
# inflation weights $\alpha_i$ is chosen so that $\sum_i \alpha_i^{-1} = 1$,
# ESMDA yields the correct posterior mean for the linear-Gaussian case.
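
# %% [markdown]
# A quick numerical check of this condition (an illustrative aside, not
# part of the original example): below we pass the integer `alpha=5` to
# `ESMDA`, which we assume expands into five equal inflation factors
# $\alpha_i = 5$; their inverses then sum to one.

# %%
import numpy as np

alpha = np.full(5, 5.0)  # five assimilations, alpha_i = 5 each
assert np.isclose(np.sum(1.0 / alpha), 1.0)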

# %% [markdown]
# ## Import packages

# %%
import numpy as np
from matplotlib import pyplot as plt

from iterative_ensemble_smoother import ESMDA

# %% [markdown]
# ## Create problem data
#
# Some settings worth experimenting with:
#
# - Decreasing `prior_std=1` will pull the posterior solution toward zero.
# - Increasing `num_ensemble` will increase the quality of the solution.
# - Increasing `num_observations / num_parameters`
# will increase the quality of the solution.

# %%
num_parameters = 25
num_observations = 100
num_ensemble = 30
prior_std = 1

# %%
rng = np.random.default_rng(42)

# Create a problem with g(x) = A @ x
A = rng.standard_normal(size=(num_observations, num_parameters))


def g(X):
    """Forward model."""
    return A @ X


# Create observations: obs = g(x) + N(0, 1)
x_true = np.linspace(-1, 1, num=num_parameters)
observation_noise = rng.standard_normal(size=num_observations)
observations = g(x_true) + observation_noise

# Initial ensemble X ~ N(0, prior_std**2)
X = rng.normal(size=(num_parameters, num_ensemble)) * prior_std

# Diagonal observation covariance with ones,
# matching the noise added to the observations above
covariance = np.ones(num_observations)

# %% [markdown]
# ## Solve the maximum likelihood problem
#
# We solve $Ax = b$, where $b$ is the vector of observations,
# for the maximum likelihood estimate.
# Notice that, unlike Ridge regression,
# solving $Ax = b$ directly uses no prior information.

# %%
x_ml, *_ = np.linalg.lstsq(A, observations, rcond=None)

plt.figure(figsize=(8, 3))
plt.scatter(np.arange(len(x_true)), x_true, label="True parameter values")
plt.scatter(np.arange(len(x_true)), x_ml, label="ML estimate (no prior)")
plt.xlabel("Parameter index")
plt.ylabel("Parameter value")
plt.grid(True, ls="--", zorder=0, alpha=0.33)
plt.legend()
plt.show()
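
# %% [markdown]
# An illustrative aside (not part of the original example, and assuming
# scikit-learn is available): with the Gaussian prior
# $x \sim \mathcal{N}(0, \sigma^2)$, where $\sigma$ is `prior_std`, and
# unit observation noise, the MAP estimate coincides with Ridge
# regression with regularization strength $1/\sigma^2$.

# %%
from sklearn.linear_model import Ridge

# MAP estimate under the prior, via the equivalent Ridge problem
ridge = Ridge(alpha=1 / prior_std**2, fit_intercept=False)
x_map = ridge.fit(A, observations).coef_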

# %% [markdown]
# ## Solve using ESMDA
#
# We create an `ESMDA` instance and solve the Gauss-linear problem.

# %%
smoother = ESMDA(
    covariance=covariance,
    observations=observations,
    alpha=5,
    seed=1,
)

X_i = np.copy(X)
for i, alpha_i in enumerate(smoother.alpha, 1):
    print(
        f"ESMDA iteration {i}/{smoother.num_assimilations()}"
        + f" with inflation factor alpha_i={alpha_i}"
    )
    X_i = smoother.assimilate(X_i, Y=g(X_i))


X_posterior = np.copy(X_i)

# %% [markdown]
# ## Plot and compare solutions
#
# Compare the true parameters with both the ML estimate
# from linear regression and the posterior means obtained using `ESMDA`.

# %%
plt.figure(figsize=(8, 3))
plt.scatter(np.arange(len(x_true)), x_true, label="True parameter values")
plt.scatter(np.arange(len(x_true)), x_ml, label="ML estimate (no prior)")
plt.scatter(
    np.arange(len(x_true)), np.mean(X_posterior, axis=1), label="Posterior mean"
)
plt.xlabel("Parameter index")
plt.ylabel("Parameter value")
plt.grid(True, ls="--", zorder=0, alpha=0.33)
plt.legend()
plt.show()
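
# %% [markdown]
# A numeric comparison (an illustrative addition): the root-mean-square
# error of each estimate against the true parameter values.

# %%
def rmse(estimate):
    """Root-mean-square error against the true parameter values."""
    return np.sqrt(np.mean((estimate - x_true) ** 2))


print(f"RMSE of ML estimate    : {rmse(x_ml):.3f}")
print(f"RMSE of posterior mean : {rmse(np.mean(X_posterior, axis=1)):.3f}")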

# %% [markdown]
# We now include the posterior samples as well.

# %%
plt.figure(figsize=(8, 3))
plt.scatter(np.arange(len(x_true)), x_true, label="True parameter values")
plt.scatter(np.arange(len(x_true)), x_ml, label="ML estimate (no prior)")
plt.scatter(
    np.arange(len(x_true)), np.mean(X_posterior, axis=1), label="Posterior mean"
)

# Loop over every ensemble member and plot it
for j in range(num_ensemble):
    # Jitter along the x-axis a little bit
    x_jitter = np.arange(len(x_true)) + rng.normal(loc=0, scale=0.1, size=len(x_true))

    # Plot this ensemble member
    plt.scatter(
        x_jitter,
        X_posterior[:, j],
        label=("Posterior values" if j == 0 else None),
        color="black",
        alpha=0.2,
        s=5,
        zorder=0,
    )
plt.xlabel("Parameter index")
plt.ylabel("Parameter value")
plt.grid(True, ls="--", zorder=0, alpha=0.33)
plt.legend()
plt.show()
5 changes: 2 additions & 3 deletions docs/source/conf.py
@@ -20,11 +20,10 @@
version = re.sub(r"\.dev.*$", r".dev", ies.__version__)
release = version

# convert the python file to a notebook
# Convert Python files to notebooks
check_output(["jupytext", "Polynomial.py", "-o", "Polynomial.ipynb"])

# Do the same for this file
check_output(["jupytext", "Oscillator.py", "-o", "Oscillator.ipynb"])
check_output(["jupytext", "LinearRegression.py", "-o", "LinearRegression.ipynb"])

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
1 change: 1 addition & 0 deletions docs/source/examples.rst
@@ -7,5 +7,6 @@ Example Usage

   Polynomial
   Oscillator
   LinearRegression

* :ref:`genindex`
26 changes: 13 additions & 13 deletions src/iterative_ensemble_smoother/esmda.py
@@ -242,7 +242,7 @@ def assimilate(
corresponds to a parameter in the model, and each column corresponds
to an ensemble member (realization).
Y : np.ndarray
2D array of shape (num_parameters, ensemble_size), containing
2D array of shape (num_observations, ensemble_size), containing
responses when evaluating the model at X. In other words, Y = g(X),
where g is the forward model.
overwrite : bool
@@ -257,7 +257,7 @@
Returns
-------
X_posterior : np.ndarray
2D array of shape (num_inputs, num_ensemble_members).
2D array of shape (num_parameters, num_ensemble_members).
"""
if self.iteration >= self.num_assimilations():
@@ -270,9 +270,9 @@
num_ensemble == num_emsemble2
), "Number of ensemble members in X and Y must match"
if not np.issubdtype(X.dtype, np.floating):
raise TypeError("Argument `X` must be contain floats")
raise TypeError("Argument `X` must contain floats")
if not np.issubdtype(Y.dtype, np.floating):
raise TypeError("Argument `Y` must be contain floats")
raise TypeError("Argument `Y` must contain floats")

assert 0 < truncation <= 1.0

@@ -316,15 +316,15 @@ def compute_transition_matrix(
alpha: float,
truncation: float = 1.0,
) -> npt.NDArray[np.double]:
"""Return a matrix K such that X_posterior = X_prior + X_prior @ K.
"""Return a matrix T such that X_posterior = X_prior + X_prior @ T.
The purpose of this method is to facilitate row-by-row, or batch-by-batch,
updates of X. This is useful if X is too large to fit in memory.
Parameters
----------
Y : np.ndarray
2D array of shape (num_parameters, ensemble_size), containing
2D array of shape (num_observations, ensemble_size), containing
responses when evaluating the model at X. In other words, Y = g(X),
where g is the forward model.
alpha : float
@@ -338,19 +338,19 @@
Returns
-------
K : np.ndarray
A matrix K such that X_posterior = X_prior + X_prior @ K.
T : np.ndarray
A matrix T such that X_posterior = X_prior + X_prior @ T.
It has shape (num_ensemble_members, num_ensemble_members).
"""

# Recall the update equation:
# X += C_MD @ (C_DD + alpha * C_D)^(-1) @ (D - Y)
# X += X @ center(Y).T / (N-1) @ (C_DD + alpha * C_D)^(-1) @ (D - Y)
# We form K := center(Y).T / (N-1) @ (C_DD + alpha * C_D)^(-1) @ (D - Y),
# We form T := center(Y).T / (N-1) @ (C_DD + alpha * C_D)^(-1) @ (D - Y),
# so that
# X_new = X_old + X_old @ K
# X_new = X_old + X_old @ T
# or
# X += X @ K
# X += X @ T

D = self.perturb_observations(size=Y.shape, alpha=alpha)
inversion_func = self._inversion_methods[self.inversion]
@@ -359,9 +359,9 @@
C_D=self.C_D,
D=D,
Y=Y,
X=None, # We don't need X to compute the factor K
X=None, # We don't need X to compute the factor T
truncation=truncation,
return_K=True, # Ensures that we don't need X
return_T=True, # Ensures that we don't need X
)


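
The renamed factor makes the batched-update pattern explicit. Here is a
sketch (not part of the commit) of how `compute_transition_matrix` could
drive a row-batch update, using only the API shown in the docstring above;
the problem sizes and batch size are illustrative assumptions:

```python
import numpy as np

from iterative_ensemble_smoother import ESMDA

rng = np.random.default_rng(0)
A = rng.standard_normal(size=(50, 200))  # forward model g(x) = A @ x
X = rng.standard_normal(size=(200, 25))  # 200 parameters, 25 members
observations = A @ np.ones(200) + rng.standard_normal(size=50)

smoother = ESMDA(covariance=np.ones(50), observations=observations, alpha=3, seed=1)
for alpha_i in smoother.alpha:
    T = smoother.compute_transition_matrix(Y=A @ X, alpha=alpha_i)
    # Update the parameters in row batches, so the full update never
    # has to be formed at once: X_posterior = X_prior + X_prior @ T
    for start in range(0, X.shape[0], 64):
        rows = slice(start, start + 64)
        X[rows] += X[rows] @ T
```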
