
Min model error #28

Merged
merged 41 commits into from
Jan 29, 2024
Changes from all commits
41 commits
595ae52
Separated out data processing to own module
EricSaboya Oct 31, 2023
3dc3578
Added new basis function algorithms
EricSaboya Oct 31, 2023
013063b
updated input file to match new input variables
EricSaboya Oct 31, 2023
7166f9f
Updated to modular format
EricSaboya Oct 31, 2023
be2649d
Updated model error calculation, added offset posterior outputs, adde…
EricSaboya Oct 31, 2023
751e657
changed meas_period to averaging_period
EricSaboya Nov 1, 2023
5cbaf8a
updated to offset trace output
EricSaboya Nov 1, 2023
ae01cfe
updated offset trace
EricSaboya Nov 1, 2023
4fed2e4
changed meas_period to averaging_period
EricSaboya Nov 1, 2023
0120081
updated input files with new input variables
EricSaboya Nov 1, 2023
5450497
added option to save or reload a merged data object
Nov 14, 2023
54e09b9
Merge pull request #27 from openghg/save_merged_data_option
EricSaboya Nov 15, 2023
afab013
Added option for getting individual fward sims for flux sectors
EricSaboya Nov 15, 2023
dcc8d13
Merge branch 'iss18_filter_error' into eric_devel
brendan-m-murphy Nov 24, 2023
cc7af06
Added minor input validation to check_str
brendan-m-murphy Dec 4, 2023
75df746
Added --kwargs command line option to run_hbmcmc
brendan-m-murphy Dec 5, 2023
cc34473
Merge branch 'main' into min_model_error
brendan-m-murphy Dec 5, 2023
6d73dc1
Modifications to make tests work.
brendan-m-murphy Dec 5, 2023
63667da
Added min_error parameter to inversions.
brendan-m-murphy Dec 5, 2023
9b312a2
Fixed error in `get_country` class
brendan-m-murphy Dec 5, 2023
4b8b117
Converted `get_country` from class to function
brendan-m-murphy Dec 5, 2023
c1ffd5f
Added option to pass flux from fp_all to postproc
brendan-m-murphy Dec 6, 2023
4c82ee2
Added option to save arviz InferenceData
brendan-m-murphy Dec 7, 2023
646a2dd
Fixed typing to work with python 3.9
brendan-m-murphy Dec 7, 2023
27b0792
Added PBLH filter from co2 branch
brendan-m-murphy Dec 12, 2023
5182034
Added missing code for pblh filter
brendan-m-murphy Dec 12, 2023
ed5ab02
added option for store name to addaveraging error, and bug fix in emi…
Jan 25, 2024
f43fc82
bug fix to include tempdir=None when not using basis_algorithm
Jan 25, 2024
07f67b8
bug fixes for basis function creation
Jan 25, 2024
0f659aa
added print out of n obs removed when filtering
Jan 25, 2024
7d1e122
brought hbmcmc_post_process up to date so it can be used to calc the …
Jan 25, 2024
854e161
added try/except loop to drop sites from an inversion when data proce…
Jan 25, 2024
2bc0c2c
bug fixes for saving a merged data pickle and dropping sites when rea…
Jan 25, 2024
95dc372
prior fluxes are now extracted from the merged data object, and the c…
Jan 25, 2024
898904e
Merge branch 'update_CI_requirements' into min_model_error
brendan-m-murphy Jan 26, 2024
127b816
Updates in response to comments
brendan-m-murphy Jan 26, 2024
1938b6e
Merge pull request #39 from openghg/min_model_error_alice_updates
brendan-m-murphy Jan 26, 2024
ba62c2a
Merged in main-formatting
brendan-m-murphy Jan 26, 2024
8722c6b
Added change log
brendan-m-murphy Jan 29, 2024
23b8dae
Merge branch 'main' into min_model_error
brendan-m-murphy Jan 29, 2024
c46dbe3
Removed errant parameter from function call
brendan-m-murphy Jan 29, 2024
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# OpenGHG Inversions Change Log

# Version 0.3 (not yet released)

- Formatted code base using `black` with line length 110. Configuration files set up for `black` and `flake8` with line length 110.

275 changes: 275 additions & 0 deletions openghg_inversions/basis_functions.py
@@ -253,3 +253,278 @@ def qtoptim(x):
mode="w",
)
return outputdir


# BUCKET BASIS FUNCTIONS
def load_landsea_indices():
"""
Load UKMO array with indices that separate
land and sea regions in EUROPE domain
--------------
land = 1
sea = 0
"""
landsea_indices = xr.open_dataset("../countries/country-EUROPE-UKMO-landsea-2023.nc")
return landsea_indices["country"].values


def bucket_value_split(grid, bucket, offset_x=0, offset_y=0):
"""
Algorithm that splits the input grid (e.g. fp * flux)
such that the sum of each basis function region is at most
the bucket value, unless the region is a single array element.

The number of regions will be determined by the bucket value
i.e. smaller bucket value ==> more regions
larger bucket value ==> fewer regions
------------------------------------
Args:
grid: np.array
2D grid of footprints * flux, or whatever
grid you want to split. Could be: population
data, spatial distribution of bakeries, you choose!

bucket: float
Maximum value for each basis function region

Returns:
array of tuples that define the indices for each basis function region
[(ymin0, ymax0, xmin0, xmax0), ..., (yminN, ymaxN, xminN, xmaxN)]
"""

if np.sum(grid) <= bucket or grid.shape == (1, 1):
return [(offset_y, offset_y + grid.shape[0], offset_x, offset_x + grid.shape[1])]

else:
if grid.shape[0] >= grid.shape[1]:
half_y = grid.shape[0] // 2
return bucket_value_split(grid[0:half_y, :], bucket, offset_x, offset_y) + bucket_value_split(
grid[half_y:, :], bucket, offset_x, offset_y + half_y
)

elif grid.shape[0] < grid.shape[1]:
half_x = grid.shape[1] // 2
return bucket_value_split(grid[:, 0:half_x], bucket, offset_x, offset_y) + bucket_value_split(
grid[:, half_x:], bucket, offset_x + half_x, offset_y
)
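A minimal standalone sketch of the recursive split above (the function body is copied here so the snippet runs on its own; the uniform 4 x 4 grid is synthetic test data):

```python
import numpy as np

# Copy of the recursive bucket split above, so this example is self-contained.
def bucket_value_split(grid, bucket, offset_x=0, offset_y=0):
    if np.sum(grid) <= bucket or grid.shape == (1, 1):
        return [(offset_y, offset_y + grid.shape[0], offset_x, offset_x + grid.shape[1])]
    if grid.shape[0] >= grid.shape[1]:
        half_y = grid.shape[0] // 2
        return (bucket_value_split(grid[:half_y, :], bucket, offset_x, offset_y)
                + bucket_value_split(grid[half_y:, :], bucket, offset_x, offset_y + half_y))
    half_x = grid.shape[1] // 2
    return (bucket_value_split(grid[:, :half_x], bucket, offset_x, offset_y)
            + bucket_value_split(grid[:, half_x:], bucket, offset_x + half_x, offset_y))

# A uniform 4x4 grid sums to 16; bucket=4 yields four 2x2 regions.
grid = np.ones((4, 4))
regions = bucket_value_split(grid, bucket=4.0)
print(regions)  # -> [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 0, 2), (2, 4, 2, 4)]
```

Halving the bucket to 2.0 doubles the region count to eight, matching the smaller-bucket, more-regions rule stated in the docstring.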


# Optimize bucket value to number of desired regions
def get_nregions(bucket, grid):
"""Returns no. of basis functions for bucket value"""
return np.max(bucket_split_landsea_basis(grid, bucket))


def optimize_nregions(bucket, grid, nregion, tol):
"""
Optimize bucket value to obtain nregion basis functions
within +/- tol.
"""
# print(bucket, get_nregions(bucket, grid))
if get_nregions(bucket, grid) <= nregion + tol and get_nregions(bucket, grid) >= nregion - tol:
return bucket

if get_nregions(bucket, grid) < nregion + tol:
bucket = bucket * 0.995
return optimize_nregions(bucket, grid, nregion, tol)

elif get_nregions(bucket, grid) > nregion - tol:
bucket = bucket * 1.005
return optimize_nregions(bucket, grid, nregion, tol)
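The feedback loop in `optimize_nregions` can be sketched in condensed form. This version counts regions with `len(bucket_value_split(...))` instead of the land-sea labelled maximum, so it illustrates only the bucket-size adjustment, not the full pipeline; the 8 x 8 grid and starting bucket are arbitrary test values:

```python
import numpy as np

# Copy of the recursive split above, so the example runs on its own.
def bucket_value_split(grid, bucket, offset_x=0, offset_y=0):
    if np.sum(grid) <= bucket or grid.shape == (1, 1):
        return [(offset_y, offset_y + grid.shape[0], offset_x, offset_x + grid.shape[1])]
    if grid.shape[0] >= grid.shape[1]:
        half_y = grid.shape[0] // 2
        return (bucket_value_split(grid[:half_y, :], bucket, offset_x, offset_y)
                + bucket_value_split(grid[half_y:, :], bucket, offset_x, offset_y + half_y))
    half_x = grid.shape[1] // 2
    return (bucket_value_split(grid[:, :half_x], bucket, offset_x, offset_y)
            + bucket_value_split(grid[:, half_x:], bucket, offset_x + half_x, offset_y))

def optimize_nregions(bucket, grid, nregion, tol):
    """Condensed version of the search above: shrink the bucket when there
    are too few regions, grow it when there are too many."""
    n = len(bucket_value_split(grid, bucket))
    if nregion - tol <= n <= nregion + tol:
        return bucket
    factor = 0.995 if n < nregion else 1.005
    return optimize_nregions(bucket * factor, grid, nregion, tol)

grid = np.ones((8, 8))
bucket = optimize_nregions(3.5, grid, nregion=16, tol=1)
print(len(bucket_value_split(grid, bucket)))  # 16 regions, within tolerance
```

Starting from bucket 3.5 the grid splits into 32 regions, so the bucket is grown by 0.5% per step until the count falls inside the tolerance band.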


def bucket_split_landsea_basis(grid, bucket):
"""
Same as bucket_split_basis, but includes a
land-sea split, i.e. basis functions cannot span both sea and land
------------------------------------
Args:
grid: np.array
2D grid of footprints * flux, or whatever
grid you want to split. Could be: population
data, spatial distribution of bakeries, you choose!

bucket: float
Maximum value for each basis function region

Returns:
2D array with basis function values

"""
landsea_indices = load_landsea_indices()
myregions = bucket_value_split(grid, bucket)

mybasis_function = np.zeros(shape=grid.shape)

for i in range(len(myregions)):
ymin, ymax = myregions[i][0], myregions[i][1]
xmin, xmax = myregions[i][2], myregions[i][3]

inds_y0, inds_x0 = np.where(landsea_indices[ymin:ymax, xmin:xmax] == 0)
inds_y1, inds_x1 = np.where(landsea_indices[ymin:ymax, xmin:xmax] == 1)

count = np.max(mybasis_function)

if len(inds_y0) != 0:
count += 1
for i in range(len(inds_y0)):
mybasis_function[inds_y0[i] + ymin, inds_x0[i] + xmin] = count

if len(inds_y1) != 0:
count += 1
for i in range(len(inds_y1)):
mybasis_function[inds_y1[i] + ymin, inds_x1[i] + xmin] = count

return mybasis_function
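The labelling step inside the region loop can be shown with a tiny synthetic mask in place of the UKMO file. The helper name `label_regions` and the 2 x 2 mask are illustrative, not part of the module:

```python
import numpy as np

def label_regions(regions, landsea, shape):
    """Assign an incrementing label per (region, surface-type) pair,
    sea (0) first and land (1) second, as in the loop above."""
    basis = np.zeros(shape)
    for ymin, ymax, xmin, xmax in regions:
        count = basis.max()
        for value in (0, 1):
            ys, xs = np.where(landsea[ymin:ymax, xmin:xmax] == value)
            if len(ys) != 0:
                count += 1
                basis[ys + ymin, xs + xmin] = count
    return basis

mask = np.array([[0, 0], [1, 1]])   # top row sea, bottom row land
regions = [(0, 2, 0, 2)]            # one region covering the whole grid
basis = label_regions(regions, mask, mask.shape)
print(basis)  # sea cells -> 1.0, land cells -> 2.0
```

Because each region restarts from the running maximum label, a region spanning both surface types contributes two basis-function indices, which is how the land-sea separation is enforced.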


def nregion_landsea_basis(grid, bucket=1, nregion=100, tol=1):
"""
Obtain basis function with nregions (for land-sea split)
------------------------------------
Args:
grid: np.array
2D grid of footprints * flux, or whatever
grid you want to split. Could be: population
data, spatial distribution of bakeries, you choose!

bucket: float
Initial bucket value for each basis function region.
Defaults to 1

nregion: int
Number of desired basis function regions
Defaults to 100

tol: int
Tolerance to find number of basis function regions.
i.e. optimizes nregions to +/- tol
Defaults to 1

Returns:
basis_function np.array
2D basis function array

"""
bucket_opt = optimize_nregions(bucket, grid, nregion, tol)
basis_function = bucket_split_landsea_basis(grid, bucket_opt)
return basis_function


def bucketbasisfunction(
emissions_name,
fp_all,
sites,
start_date,
domain,
species,
outputname,
outputdir=None,
nbasis=100,
abs_flux=False,
):
"""
Basis functions calculated using a weighted region approach,
where each basis function / scaling region contains approximately
the same summed value
-----------------------------------
Args:
emissions_name (str/list):
List of keyword "source" args used for retrieving emissions files
from the Object store.
fp_all (dict):
fp_all dictionary object as produced from get_data functions
sites (str/list):
List of measurements sites being used.
start_date (str):
Start date of period of inference
domain (str):
Name of model domain
species (str):
Name of atmospheric species of interest
outputname (str):
Name of inversion run
outputdir (str):
Directory where inversion run outputs are saved
nbasis (int):
Desired number of basis function regions
abs_flux (bool):
If True, use the absolute values of the flux array

"""
if abs_flux:
print("Using absolute values of flux array")
if emissions_name is None:
flux = (
np.absolute(fp_all[".flux"]["all"].data.flux.values)
if abs_flux
else fp_all[".flux"]["all"].data.flux.values
)
meanflux = np.squeeze(flux)
else:
if isinstance(fp_all[".flux"][emissions_name[0]], dict):
arr = fp_all[".flux"][emissions_name[0]]
flux = np.absolute(arr.data.flux.values) if abs_flux else arr.data.flux.values
meanflux = np.squeeze(flux)
else:
flux = (
np.absolute(fp_all[".flux"][emissions_name[0]].data.flux.values)
if abs_flux
else fp_all[".flux"][emissions_name[0]].data.flux.values
)
meanflux = np.squeeze(flux)
meanfp = np.zeros((fp_all[sites[0]].fp.shape[0], fp_all[sites[0]].fp.shape[1]))
div = 0
for site in sites:
meanfp += np.sum(fp_all[site].fp.values, axis=2)
div += fp_all[site].fp.shape[2]
meanfp /= div

if meanflux.shape != meanfp.shape:
meanflux = np.mean(meanflux, axis=2)
fps = meanfp * meanflux

bucket_basis = nregion_landsea_basis(fps, 1, nbasis)

lon = fp_all[sites[0]].lon.values
lat = fp_all[sites[0]].lat.values

base = np.expand_dims(bucket_basis, axis=2)

time = [pd.to_datetime(start_date)]
newds = xr.Dataset(
{"basis": (["lat", "lon", "time"], base)},
coords={"time": (["time"], time), "lat": (["lat"], lat), "lon": (["lon"], lon)},
)
newds.lat.attrs["long_name"] = "latitude"
newds.lon.attrs["long_name"] = "longitude"
newds.lat.attrs["units"] = "degrees_north"
newds.lon.attrs["units"] = "degrees_east"
newds.attrs["creator"] = getpass.getuser()
newds.attrs["date created"] = str(pd.Timestamp.today())

if outputdir is None:
cwd = os.getcwd()
tempdir = os.path.join(cwd, f"Temp_{str(uuid.uuid4())}")
os.mkdir(tempdir)
os.mkdir(os.path.join(tempdir, f"{domain}/"))
newds.to_netcdf(
os.path.join(
tempdir,
domain,
f"weighted_{species}-{outputname}_{domain}_{start_date.split('-')[0]}{start_date.split('-')[1]}.nc",
),
mode="w",
)
return tempdir
else:
basisoutpath = os.path.join(outputdir, domain)
if not os.path.exists(basisoutpath):
os.makedirs(basisoutpath)
newds.to_netcdf(
os.path.join(
basisoutpath,
f"weighted_{species}-{outputname}_{domain}_{start_date.split('-')[0]}{start_date.split('-')[1]}.nc",
),
mode="w",
)
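Both branches above build the output path with the same pattern; a small sketch (the helper name `weighted_basis_path` is illustrative) shows how the file name embeds the species, run name, domain, and the YYYYMM taken from `start_date`:

```python
import os

def weighted_basis_path(outputdir, domain, species, outputname, start_date):
    # start_date is an ISO "YYYY-MM-DD" string, as in the code above
    year, month = start_date.split("-")[0], start_date.split("-")[1]
    fname = f"weighted_{species}-{outputname}_{domain}_{year}{month}.nc"
    return os.path.join(outputdir, domain, fname)

path = weighted_basis_path("outputs", "EUROPE", "ch4", "test-run", "2019-01-01")
print(path)  # e.g. outputs/EUROPE/weighted_ch4-test-run_EUROPE_201901.nc
```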
4 changes: 2 additions & 2 deletions openghg_inversions/config/config.py
@@ -193,9 +193,9 @@ def str_check(string, error=True):
Returns:
string (formatted)
"""

if string is None:
return None
# Remove any ' or " symbols surrounding the input string from the config_file

string = string.strip() # Strip any whitespace just in case
if string[0] == "'" and string[-1] == "'":
string = string[1:-1]
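A minimal sketch of the quote-stripping behaviour above (assumption: the elided remainder of the function also strips double quotes symmetrically):

```python
def str_check(string):
    # Simplified stand-in for the config helper above.
    if string is None:
        return None
    string = string.strip()  # strip whitespace first, as in the original
    if len(string) >= 2 and string[0] == string[-1] and string[0] in ("'", '"'):
        string = string[1:-1]
    return string

print(str_check("  'EUROPE'  "))  # -> EUROPE
print(str_check(None))            # -> None
```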