pyshemkf
(py
thon environment for handling shem
at-suite output
of en kf
simulations) is a python environment for post-processing
SHEMAT-Suite (Rath 2006) output.
pyshemkf
can generate graphs and 2D-figures of variables and
parameters provided by typical output of SHEMAT-Suite.
Additionally, pyshemkf
contains scripts and functions for the
evaluation and visualization of RMSE distributions from Ensemble
Kalman Filter (EnKF, Evensen 2003) synthetic experiments computed with
SHEMAT-Suite.
For Python scripts used in the WRR paper Keller 2018 written by the author, see Manuscript scripts. The version of the Python scripts, which corresponds to the time of the work on the manuscript, has been saved under:
The data used by the Python scripts is saved under:
The scripts are written for python3.7.3
. They make use of the
following Python modules:
matplotlib
,matplotlib.pyplot
,matplotlib.cm
,matplotlib.colors
,matplotlib.mlab
are used for plottingnumpy
,scipy
,scipy.stats
provide numerical and matrix functionsshutil
,os
,subprocess
,shlex
are used to run commands on the operating systemstring
provides string manipulation utilitiesvtk
andh5py
provide functions to read vtk and hdf files.
The location of the included module pskf
has to be added to the
environment variable PYTHONPATH
:
export PYTHONPATH=${PYTHONPATH}:<path_to>/pyshemkf/site-packages
Another important prerequisite of pyshemkf
is the proper
SHEMAT-Suite output directory structure (see SHEMAT-Suite output
directory).
Getting started with pyshemkf
. This tutorial explains the usage of a
typical script of pyshemkf
.
A good point to start is the script runplot.py
in the forward
directory:
$HOME/pyshemkf/scripts/forward/runplot.py
This script is made up of different parts. We will discuss the first three parts of the script.
- Loading Python modules
- Setting switches
- Setting output specifications
At first, Python modules are loaded
import numpy as np
import matplotlib.pyplot as plt
from pskf.tools.plot import specs as sc
import pskf.scripts.forward.arrays as fa
import pskf.scripts.forward.plot as fp
import pskf.scripts.forward.read as fr
import pskf.tools.run.pythonmodule as pm
import pskf.tools.plot.plotfunctions as pf
The first lines load general modules, the later lines load modules
from the included modules pskf
(see Module =pskf=).
The next part of the script contains switches.
# Switches
is_read = 0
is_plot = 0
is_save = 0
is_show = 0
is_backup = 0
These switches determine, whether following parts of the script are executed.
After the switches, output specifications are set. These
specifications always include the model name, the date of the output
directory and the letter of the output directory. Additionally,
tag-specific specifications can be set. For the tag forward
, the
name of the variable to be included in the figure is saved under
varname
.
# Specs
model_name = 'model'
dat = '2010_01_30'
let = 'a'
varname = 'uindex'
At this point, we have discussed the first part of a typical script.
Now please run the IPython script
ipython $HOME/pyshemkf/scripts/forward/runplot.ipy
Since all switches are set to zero, this execution should have
produced no output. Executing the script was rather to test whether
all Python modules are loaded. If you receive an ImportError
, you
should do one of the following two things:
ImportError
fornumpy
ormatplotlib
: Check your general Python setup.ImportError
forpskf
: Most probably,PYTHONPATH
is not set to includepyshemkf
(see Prerequisites)
If no error message is shown, you can move forward to Reading SHEMAT-Suite output.
If all Python modules are loaded correctly, the is_read
switch can
be set to 1
.
# Switches
is_read = 1
is_plot = 0
is_save = 0
is_show = 0
is_backup = 0
If you re-run the script now, the execution will yield the following error:
IOError: [Errno 2] No such file or directory:
'$HOME/shematOutputDir/model_output/2010_01_30/2010_01_30_a/samples_output/MODEL_EO_time_out_0.vtk'
The directory in the error message is generated from the
specifications under # Specs
. The error message tells us which
output file the script is looking for and how the SHEMAT-Suite output
directories should be structured.
To find the right file we need to:
- Set up the correct SHEMAT-Suite output directory
- Put the right specifications under
# Specs
.
If the output files are still not found, the function fr.read
offers
the possibility to explicitly specify directory and name of a given
vtk-output using the function parameters fname
and fdir
.
# Read
if is_read:
numpy_array, numpy_array_name = fr.read(
model_name,
dat,
let,
varname=varname,
fname='example_output.vtk'
fdir='example_dir/to/output'
)
np.save(numpy_array_name, numpy_array)
print('Saved as ' + numpy_array_name)
After successfully saving the python array, its name is printed.
Saved as $HOME/pyshemkf/output/forward/npy/<varname>_<model_name>_<dat>_<let>_1.npy
If this message is displayed after executing runplot.py
, you can
move on to Plotting.
If the numpy array is saved, the switches of runplot.py
can be
changed as follows:
# Switches
is_read = 0
is_plot = 1
is_save = 0
is_show = 1
is_backup = 0
The output specifications can (and should) be left the same as for reading the output files.
While calling the plotting routine, the appearance of the plot can be influenced through input parameters.
# Plot
if is_plot:
# Figure
fig = plt.figure(1, figsize=[15, 10])
# Run plot function
ax, pic_name = fp.plot(
fig.add_subplot(1, 1, 1),
model_name,
dat,
let,
varname=varname,
)
# Monitoring points
ax = pf.scatter(
ax,
model_name,
dat,
let,
)
# Colorbar
cb_ax = pf.cb(
fig.add_subplot(1, 2, 1),
ax,
varname=varname,
)
# Save
if is_save:
plt.savefig(pic_name)
print('Saved as ' + pic_name)
# Show
if is_show:
plt.show()
else:
plt.clf()
Via the switch is_save
, the figure can be saved, via the switch
is_backup
, a backup of runplot.ipy
is generated in the
subdirectory backup/
.
In the case of forward
, monitoring points are included as well as a
colorbar. If these function calls cause any problems (for example,
when there are no monitoring points in the given SHEMAT-Suite output),
they can be removed.
First, the directory structure of pyshemkf
is explained. Then, a
naming convention for directories of SHEMAT-Suite output is
introduced. This naming convention is required for compatibility with
pyshemkf
.
There are three directories in the root directory of pyshemkf
: One
for output, one for IPython-scripts and one for the Python module
pskf
.
Directory for all output. output/
has one subdirectory for each tag
(see Tags). Each of these tag-subdirectories contains subdirectories,
whose names correspond to file endings: npy/
, png/
, pdf/
and
eps/
. The scripts of pyshemkf
write output of a format to the
directory with the corresponding name. Example:
$HOME/pyshemkf/output/pdf/example_output.pdf
The directories dists
(output/dists/) and specs
(output/specs/)
contain only numpy arrays in the subdirectory npy/
. dists
contains
RMSE distributions, specs
contains specifications of the simulated
model (for example the discretization).
IPython scripts for reading and plotting SHEMAT-Suite output sorted by tags (see Tags, Scripts).
Module containing functions used by the IPython scripts of pyshemkf
.
Some functions (for reading and plotting) are meant to be used by
specific IPython scripts in /scripts
, others are general functions
used throughout pyshemkf
(see Module =pskf=).
For the module pskf
to be loaded by Python, its path has to be added
to the environment variable PYTHONPATH
(see Prerequisites).
pyshemkf
needs a specific naming convention of SHEMAT-Suite output
directories. A single output directory should be named as follows:
$HOME/shematOutputDir/<model_name>_output/<dat>/<dat>_<let>
An example with <model_name>=wavereal
, <dat>=2010_01_30
,
<let>=a
:
$HOME/shematOutputDir/wavereal_output/2010_01_30/2010_01_30_a
Inside the SHEMAT-Suite output directories, input files are saved alongside output directories.
- Input files
- general input file
<MODEL_NAME>
(WAVEREAL
) - true input file
<MODEL_NAME>_TRUE
(WAVEREAL_TRUE
) - EnKF input file
<MODEL_NAME>.enkf
(WAVEREAL.enkf
) - SGSim input files
sgsim_k_<modelname>_true.par
(sgsim_k_wavereal_true
),sgsim_k_<modelname>.par
(sgsim_k_wavereal
)
- general input file
- Output directories
samples_output/
: forward outputenkf_output/
: EnKF outputsingle_cell_output/
: output at single cells
Tags are used to organize different groups of read and plot
routines. They determine the output-path, the script-path and the path
of to the function definitions of pskf
.
There are two groups of tags in pyshemkf
, corresponding roughly to
the following functionalities: analysis
, forward
, monitor
and
singlecell
are scripts reading general SHEMAT-Suite. errorplot
,
gaussianity
and numcomp
provide visualization of RMSE
distributions of large numbers of EnKF synthetic experiments.
2D-Images of ensemble mean variable/parameter fields or single realization variable/parameter fields from EnKF-simulations in SHEMAT-Suite.
Figures showing RMSE means of different EnKF-methods.
2D-Images of variable/parameter fields in a single forward run of SHEMAT-Suite.
RMSE distributions from a large number of EnKF synthetic experiments with SHEMAT-Suite.
Visualizing monitoring point output from SHEMAT-Suite.
Matrix plots visualizing RMSE statistics from a large number of EnKF synthetic experiments with SHEMAT-Suite.
Visualizing single cell output from SHEMAT-Suite.
The script endresread.ipy
(scripts/endresread.ipy) is not part of
one of the scripting tags. It has the preliminary task of reading RMSE
distributions from SHEMAT-Suite
output.
For each tag, there is a runplot.ipy general script that calls the
read and plot functions from pskf
(see Module =pskf=). If wanted,
numpy arrays and figures are saved, figures are shown and a backup of
the script is generated in the corresponding backup
directory.
A /scripts/templates
directory will not be part of the
git-repository and can for example be used for new scripts, before
they are ready to be committed to the repository.
The following scripts generate the figures of the WRR manuscript associated with this repository:
- /scripts/forward/runplot_figure_1_a.ipy
- /scripts/forward/runplot_figure_1_b.ipy
- /scripts/errorplot/runplot_figure_2_a.ipy
- /scripts/errorplot/runplot_figure_2_b.ipy
- /scripts/errorplot/runplot_figure_2_c.ipy
- /scripts/numcomp/runplot_figure_3.ipy
- /scripts/errorplot/runplot_figure_4_a.ipy
- /scripts/errorplot/runplot_figure_4_b.ipy
- /scripts/errorplot/runplot_figure_4_c.ipy
- /scripts/numcomp/runplot_figure_5.ipy
- /scripts/errorplot/runplot_figure_6_a.ipy
- /scripts/errorplot/runplot_figure_6_b.ipy
- /scripts/gaussianity/runplot_figure_7_a.ipy
- /scripts/gaussianity/runplot_figure_7_b.ipy
- /scripts/errorplot/runplot_figure_8_a.ipy
- /scripts/errorplot/runplot_figure_8_b.ipy
- /scripts/errorplot/runplot_figure_9_a.ipy
- /scripts/errorplot/runplot_figure_9_b.ipy
The functions in the scripts
directory
(/site-packages/pskf/scripts/) are tag-specific, i.e. they are meant
to be used by the runplot.ipy
scripts under a certain tag (for
example analysis
). Three typical file types exist in one tag
directory:
read.py
(Example /site-packages/pskf/scripts/analysis/read.py) contains functions for reading the specific SHEMAT-Suite output needed under a tag and turning the output into numpy arrays.plot.py
(Example /site-packages/pskf/scripts/analysis/plot.py) contains functions for plotting the numpy arrays read in underread.py
.arrays.py
orvariables.py
(Example /site-packages/pskf/scripts/analysis/arrays.py) contain useful tag-specific variables and arrays. One example is the tag name itself.
The tools
directory (/site-packages/pskf/tools/) contains general
functions (opposed to the tag-specific functions in scripts
).
General variables and functions related to plotting.
Important collection of dates, letters, number of runs and number of observations for different EnKF runs. According to this information, specifiers for the different output are defined and standardized.
Plotting functions for handling vtk-input, grid properties, colormaps, colorbars, scatterplots, hdf (not yet fully tested).
Utility functions for reading grid properties from SHEMAT-Suite output files in SHEMAT-Suite output directories. Important functions defining the specifiers used to standardize output of the IPython scripts.
Python-related directory variables
python_dir
python_scripts_dir
python_output_dir
Python-related functions for generating specific directories, filenames for saving and backups.
General utility functions for replacing strings, make temporal files,
handling letter endings of specifiers, running shell scripts, reading
and manipulating SHEMAT-Suite input files, compiling SHEMAT, running
matlab, generating lists of SHEMAT-Suite specific files and
directories. Some of these functions are used in scripts to run
SHEMAT-Suite that are not part of the pyshemkf
repository.
Rath, V., Wolf, A., & Bücker, H. M., Joint three-dimensional inversion of coupled groundwater flow and heat transfer based on automatic differentiation: sensitivity calculation, verification, and synthetic examples, Geophysical Journal International, 167(1), 453–466 (2006). http://dx.doi.org/10.1111/j.1365-246x.2006.03074.x
Evensen, G., The ensemble kalman filter: theoretical formulation and practical implementation, Ocean Dynamics, 53(4), 343–367 (2003). http://dx.doi.org/10.1007/s10236-003-0036-9
Keller, J., Hendricks Franssen, H.-J., & Marquart, G. (2018). Comparing seven variants of the ensemble Kalman filter: How many synthetic experiments are needed? Water Resources Research, 54. https://doi.org/10.1029/2018WR023374