- About this document
- New in this version
- Introduction
- Input and validation data
- Methods
- Product evaluation and assessment
- Product description
- Conclusions
- Data and code availability
- Algorithms
- Data
- Quality control
- Supplemental material
- Figures
- Overview
- Basin changes with changing k
- Bulk observation vs. RCM scatter plots
- Modified Tukey plot for all observations
- Bamber 2018
- Observations vs. RCM, map + ts + scatter
- Watson River
- Watson Adjustments
- Leverett Glacier
- Kiattuut Sermiat
- Kingigtorssuaq
- Oriartorfik
- Teqinngalip
- Kobbefjord
- Røde Elv
- Zackenberg
- Qaanaaq
- Elevation histogram
- README
- Appendix
- Misc journal sections
- References
- Meta
- LaTeX setup
This document is an Emacs Org Mode plain-text file with code and text embedded. If you are viewing:
- A DOC, Google Doc, or PDF file, then it was generated by exporting from Org. Not all of the Org parts (code, results, comments, etc.) were exported. The Org source file is available upon request, and may be embedded in the PDF. Most non-Apple PDF viewers provide easy access to embedded or attached files.
- A webpage somewhere, then this is a subset of the code and text that the website renderer has decided to display to you through the browser. You can choose to view the raw source and/or download it and view it locally on your computer.
- A file with an =org= extension in something other than Emacs, then you are seeing the canonical version and the full source, but without any syntax highlighting, document structure, or the ability to execute the code blocks.
- An Org file within Emacs, then this is the canonical version. You should be able to fully interact with and reproduce the contents of this document, although it may require 3rd-party applications (Python, etc.), a similar Emacs configuration, and the data files. This is available upon request.
To recreate this work:
- Open this file in Emacs Org Mode.
- Check that you have the necessary software dependencies installed. See section: Code.
- Download and set up the necessary data files as per the Data section.
- Tangle the embedded code blocks
  - Execute ~C-c C-v C-t~ to run the (org-babel-tangle) function
- Execute
  - Run =make=
    - This should probably be run in an external terminal because it takes hours to days…
  - Update Babel result blocks throughout the document by
    - Cleaning all result blocks with ~C-u C-c C-v k~ or (org-babel-remove-result-one-or-many t), then
    - Executing all blocks (without =:eval no=) using ~C-c C-v C-b~ or (org-babel-execute-buffer)

This is captured programmatically by workflow-update
The discussion of quality control, which previously treated land runoff with depth < 0 as invalid, has been updated. Land runoff with depth << 0 is possible and occurs when land-source runoff enters the subglacial system and discharges subglacially at depth from a marine-terminating glacier. This often occurs when runoff is sourced from nunataks. This scenario also generates land outlets that exist outside of land basins.
Various input data products have been upgraded. BedMachine has been upgraded from v3 to v5. MAR has been upgraded from version 3.11 to 3.12, and the simulation period has been extended from 1979 through September 2019, to 1950 through December 2022. The RACMO simulation period has been extended from 1958 through September 2019 to 1958 through December 2022. Prior to 1990, the RACMO data is unchanged (6-hourly ERA-Interim forcing). From 1990 onward, we now use 3-hourly ERA5 forcing. Both the MAR and RACMO domain boundaries have been corrected, fixing a small alignment error in the initial version.
Additional metadata now includes the citet:mouginot_2019_data basins and regions and the citet:zwally_2012_data sector nearest to each outlet, and the shortest distance between the outlet and the basin, region, or sector boundary. This should be used with caution - some peripheral ice cap outlets may be assigned to the ice sheet, or land-terminating outlets may be nearest to an ice basin that does not overlap with the hydrological basin. The distance can be used as a filter. We also include the nearest named glacier from citet:bjork_2015, and the distance to the citet:bjork_2015 point. We also include the nearest citet:mankoff_2020_solid gate and the distance to that gate. Because citet:mankoff_2020_solid is a continually updating product, and gate locations and IDs may change in the future, we note here that we use gates V3 (file: https://doi.org/10.22008/promice/data/ice_discharge/gates/v02/GSEWLR) that are associated with discharge V55. The gate IDs used here are also valid with many previous ice discharge versions and are likely to be valid with many following ice discharge versions.
\introduction
Over the past decades, liquid runoff from Greenland has increased citep:mernild_2012,bamber_2018_freshwater,trusel_2018,perner_2019 contributing to mass decrease citep:sasgen_2020. When that runoff leaves the ice sheet and discharges into fjords and coastal seas, it influences a wide range of physical citep:straneo_2011,an_2012,mortensen_2013,bendtsen_2015,cowton_2015,mankoff_2016,fried_2019,cowton_2019,beckmann_2019, chemical citep:kanna_2018,balmonte_2019, and biological citep:kamenos_2012,kanna_2018,balmonte_2019 systems citep:catania_2020. The scales of the impacts range from instantaneous at the ice–ocean boundary to decadal in the distal ocean citep:gillard_2016. The influence of freshwater on multiple domains and disciplines citep:catania_2020 is the reason several past studies have estimated runoff and discharge at various temporal and spatial scales (e.g., citet:mernild_2008,mernild_2009,mernild_2010,langen_2015,ahlstrom_2017,citterio_2017,van-as_2018,bamber_2018_freshwater,perner_2019,slater_2019).
To date no product provides discharge estimates at high spatial resolution (~100 m; resolving individual streams), daily temporal resolution, for all of Greenland, covering a broad time span (1950 through December 2022), from multiple regional climate models (RCMs), and with simple database access software to support downstream users. Here we present these data. In the following description and methods, we document the inputs, assumptions, methodologies, and results we use to estimate Greenland discharge from 1950 through December 2022.
Freshwater discharge from Greenland primarily takes three forms: solid ice from calving at marine-terminating glaciers; submarine meltwater from ice-ocean boundary melting at marine-terminating glaciers; and liquid runoff from melted inland surface ice, rain, and condensation. A recent paper by citet:mankoff_2020_ice targets the solid ice discharge plus submarine melt budget by estimating the ice flow rate across gates 5 km upstream from all fast-flowing marine-terminating glaciers in Greenland. Complementing that paper, this paper targets Greenland’s point-source liquid water discharge budget by partitioning RCM runoff estimates to all ice margin and coastal outlets. The sum of these data and citet:mankoff_2020_ice is an estimate of the majority of freshwater (in both liquid and solid form) volume flow rates into Greenland fjords. Those two terms comprise the bulk but not all freshwater - they exclude precipitation directly onto the fjord or ocean surface, as well as relatively minor contributions from evaporation and condensation, sea ice formation and melt, or subglacial basal melting.
The static products (streams, outlets, and basins (Fig. fig:overview)) are derived from an ice sheet surface digital elevation model (DEM), an ice sheet bed DEM, an ice sheet mask, the land surface DEM, and an ocean mask. For the surface DEM, we use ArcticDEM v7 100 m citep:ArcticDEM. Subglacial routing uses ArcticDEM and ice thickness from BedMachine v5 citep:NSIDC_BedMachine_GL_v5,morlighem_2017. Both DEMs are referenced to the WGS84 ellipsoid. For the ice mask we use the Programme for Monitoring of the Greenland Ice Sheet (PROMICE) ice extent citep:citterio_2013. For the ocean mask we use the Making Earth System Data Records for Use in Research Environments (MEaSUREs) Greenland Ice Mapping Project (GIMP) Land Ice and Ocean Classification Mask, Version 1 citep:NSIDC_0714,howat_2014.
The time series product (daily discharge) is derived from daily runoff estimates from RCM calculations over the land and ice areas of Greenland. We use the Modèle Atmosphérique Régional (MAR; citet:fettweis_2017) and the Regional Atmospheric Climate Model (RACMO; citet:noel_2019). Runoff, \(R\), is defined by
\begin{equation} \label{eq:runoff} R = ME + RA - RT - RF. \end{equation}
In Eq. \ref{eq:runoff}, \(ME\) is melt, \(RA\) is rainfall, \(RT\) is retention, and \(RF\) is refreezing. In RACMO, retention occurs only when firn is present (not with bare ice); MAR does include a delay for bare-ice runoff. Neither model delays land runoff. Both RCM outputs were provided regridded to the same 1 km grid using an offline statistical downscaling technique based on the local vertical runoff gradient applied to the subgrid topography citep:noel_2016,fettweis_2020. MAR (v 3.13; citet:delhasse_2020) ran with 7.5 km resolution and 6-hourly ERA5 forcing. RACMO (v 2.3p2; citet:noel_2018,noel_2019) ran with 5.5 km resolution and 6-hourly ERA-Interim forcing. Runoff is assigned an uncertainty of ±15 % (Sect. \ref{sec:uncertain:RCM}).
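As a minimal worked example of Eq. \ref{eq:runoff}, the sketch below applies the runoff definition cell-wise to hypothetical daily component grids; the array names and values are illustrative placeholders, not RCM output.

#+begin_src python :eval no
# Minimal sketch: apply the runoff equation cell-wise to hypothetical daily
# component grids; all values are illustrative, in mm w.e. per day.
import numpy as np

ME = np.array([[12.0, 3.5], [0.0, 7.2]])   # melt
RA = np.array([[ 1.0, 0.0], [0.5, 0.0]])   # rainfall
RT = np.array([[ 2.0, 1.5], [0.0, 0.7]])   # retention
RF = np.array([[ 4.0, 1.0], [0.0, 2.5]])   # refreezing

R = ME + RA - RT - RF                      # runoff
R_lo, R_hi = R * 0.85, R * 1.15            # the +/- 15 % uncertainty used later
print(R)
#+end_src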
We use 10 river discharge daily time series to validate the results of this work. The name, location, time coverage, and relevant data and scientific publications associated with each of these observational data are listed in Table tbl_obs.
#+NAME: tbl_obs
| Location              | Lon (°W) | Lat (°N) | Time      | Data                     | Publication         | Fig(s).                                |
|-----------------------+----------+----------+-----------+--------------------------+---------------------+----------------------------------------|
| Kiattuut Sermiat      |    45.33 |    61.21 | 2013      | citet:hawkings_2016_data | citet:hawkings_2016 | 1 3 4 5 \ref{fig:Ks}                   |
| Kingigtorssuaq (Nuuk) |  51.5801 |  64.1387 | 2008-2018 | citet:GEM_data           |                     | 1 3 4 \ref{fig:K}                      |
| Kobbefjord (Nuuk)     |  51.3810 |  64.1336 | 2006-2017 | citet:GEM_data           |                     | 1 3 4 \ref{fig:Kb}                     |
| Leverett Glacier      |    50.17 |    67.06 | 2009-2012 | citet:tedstone_2017      | citet:hawkings_2015 | 1 3 4 5 \ref{fig:L}                    |
| Oriartorfik (Nuuk)    |  51.4066 |  64.1707 | 2007-2018 | citet:GEM_data           |                     | 1 3 4 \ref{fig:O}                      |
| Qaanaaq               |  69.3030 |  77.4753 | 2017-2018 | citet:qaanaaq_data       | citet:sugiyama_2014 | 1 3 4 5 \ref{fig:Q}                    |
| Røde Elv (Disko)      |  53.4989 |  69.2534 | 2017      | citet:GEM_data           |                     | 1 3 4 5 6 \ref{fig:R}                  |
| Teqinngalip (Nuuk)    |  51.5484 |  64.1586 | 2007-2018 | citet:GEM_data           |                     | 1 3 4 \ref{fig:T}                      |
| Watson River          |    50.68 |    67.01 | 2006-2019 | citet:van-as_2018        | citet:van-as_2018   | 1 3 4 5 \ref{fig:W} \ref{fig:W_adjust} |
| Zackenberg            |  20.5628 |  74.4722 | 1996-2018 | citet:GEM_data           |                     | 1 3 4 5 \ref{fig:Z}                    |
Extract the longitude, latitude, and abbreviation from tbl_obs and import into GRASS for use for the analysis elsewhere in this document.
- Extract the abbreviation for each location
- Convert (lon, lat) to EPSG:3413 (x, y)
# "tbl" is an associative array of tbl_obs rows keyed by the Location column
# (passed in via the org-babel :var header).
for key in "${!tbl[@]}"; do
  row=(${tbl[$key]})
  IFS=" " read lon lat time data pub fig <<< ${row[@]}
  # (lat, -lon) -> EPSG:3413 (x, y); cs2cs output fields are joined with ":"
  IFS=":" read x y z <<< $(echo "${lat} -${lon}" | cs2cs EPSG:4326 EPSG:3413 -f "%.0f" | sed 's/[[:space:]]/:/g')
  echo ${key/\ */} $x $y   # first word of the location name, then x, y
done | sort
Note - to solve the jupyter-python issue with =:cache= results:
- Manually edit this table
- Add =#+NAME: tbl_obs_xy_cache=
- Reduce the 1st column to a 1-letter abbreviation, except
  - Kobbefjord -> Kb
  - Kiattuut Sermiat -> Ks
- Adjust Leverett to “-216646 | -2504583”
- Then access via the table name and not the babel code block name.
#+NAME: tbl_obs_xy_cache
| Ks | -18335  | -3183360 |
| K  | -326372 | -2829354 |
| Kb | -316602 | -2831048 |
| L  | -216646 | -2504583 |
| O  | -317396 | -2826710 |
| Q  | -560538 | -1241281 |
| R  | -335678 | -2246371 |
| T  | -324548 | -2827284 |
| W  | -249713 | -2510668 |
| Z  | 699990  | -1540459 |
We use the following terminology throughout the document:
- Runoff refers to the unmodified RCM data products – melted ice, rain, condensation, and evaporation – that comprise the RCM runoff output variable.
- Discharge refers to the runoff after it has been processed by this work - routed to and aggregated at the outlets. Depending on context, discharge may also refer to the observed stream discharge (Table \ref{tbl_obs}).
- Basins refer to the 100 m x 100 m gridded basins derived from a combination of the ArcticDEM product and the mask.
- Mask refers to the surface classification on that 100 m x 100 m grid and is one of ice, land, or ocean (also called fjord or water). When referring to the surface classification in the RCM, we explicitly state “RCM mask”.
- MAR and RACMO refer to the RCMs, but when comparing discharge estimates between them or to observations, we use MAR and RACMO to refer to our discharge product derived from the MAR and RACMO RCM runoff variables rather than repeatedly explicitly stating “discharge derived from [MAR|RACMO] runoff”. The use should be clear from context.
Streams are calculated from the hydraulic head \(h\) which is the DEM surface for land surface routing, or the subglacial pressure head elevation for subglacial routing. \(h\) is defined as
\begin{equation} \label{eq:head} h = z_b + k \frac{ρ_i}{ρ_w} (z_s - z_b), \end{equation}
with \(z_b\) the ice-free land surface and basal topography, \(k\) the flotation fraction, \(ρ_i\) the density of ice (917 kg m-3), \(ρ_w\) the density of water (1000 kg m-3), and \(z_s\) the land surface for both ice-free and ice-covered surfaces.
Eq. eq:head comes from citet:shreve_1972, where the hydropotential has units of pascals (Pa), but here it is divided by gravitational acceleration \(g\) times the density of water \(ρ_w\) to convert the units from pascals to meters (Pa to m). We compute \(h\) and from that streams, outlets, basins, and runoff for a range of subglacial pressures, implemented as a range of \(k\) values: 0.8, 0.9, and 1.0. We use these three scenarios to estimate sensitivity of the outlet location for all upstream cells, but otherwise only use results from the \(k = 1.0\) scenario. Eq. eq:head makes the assumption that when ice is present all water routes subglacially, meaning that water flows from the surface to the bed in the grid cell where it is generated. In reality, internal catchments and moulins likely drain waters to the bed within a few kilometers of their source citep:yang_2016_internally. The difference between some supraglacial flow and immediate subglacial flow is not likely to impact results because discharge is reported only at the outlet locations.
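The sketch below shows Eq. eq:head applied over the three \(k\) scenarios, assuming the surface and bed DEMs are available as NumPy arrays on the common 100 m grid; the array names and values here are placeholders, not the production workflow.

#+begin_src python :eval no
# Sketch of the hydraulic head (Eq. eq:head); z_s, z_b and the values below
# are placeholders, not the ArcticDEM/BedMachine grids.
import numpy as np

rho_i, rho_w = 917.0, 1000.0               # ice and water density (kg m-3)

def hydraulic_head(z_s, z_b, k=1.0):
    """h = z_b + k (rho_i/rho_w) (z_s - z_b); where ice-free, z_s == z_b so h == z_b."""
    return z_b + k * (rho_i / rho_w) * (z_s - z_b)

z_s = np.array([[1200.0, 800.0], [500.0, 300.0]])   # surface elevation (m)
z_b = np.array([[ 400.0, 350.0], [200.0, 300.0]])   # bed / land elevation (m)

heads = {k: hydraulic_head(z_s, z_b, k) for k in (0.8, 0.9, 1.0)}
#+end_src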
We use the GRASS GIS software citep:neteler_2012,GRASS_GIS_software and the =r.stream.extract= command configured for single-flow direction from eight neighbors (SFD-8) to calculate streams and outlets at the ice edge and coast. Streams are defined only if their upstream contributing area is above a threshold (> 3 km2), so small basins may have outlets but no streams. The software fills all sinks so that all water flows to the domain edge. We then use the =r.stream.basins= tool citep:jasiewicz_2011 to calculate basins upstream from each outlet. Basins < 1 km2 are absorbed into their largest neighbor and the associated outlets are dropped.
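A minimal sketch of this GRASS step is shown below, assuming a GRASS session is already open with the 100 m region set and the hydraulic head available as a raster named =head=. The raster names, the 300-cell threshold (3 km2 of 100 m cells), and the use of the =-l= flag to keep only terminal-segment basins are illustrative choices, not necessarily the exact invocation used to produce the released data.

#+begin_src python :eval no
# Sketch only: stream, outlet, and basin extraction in GRASS, assuming an open
# GRASS session; names, threshold, and flags are illustrative.
import grass.script as gs

# SFD-8 streams and flow directions from the hydraulic head
gs.run_command("r.stream.extract", elevation="head",
               threshold=300,                # ~3 km2 contributing area at 100 m
               stream_raster="streams", stream_vector="streams",
               direction="dir")

# Basins upstream of each terminal stream segment (outlet)
gs.run_command("r.stream.basins", flags="l", direction="dir",
               stream_rast="streams", basins="basins")
#+end_src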
The three choices of \(k\) generate three scenarios of basins and outlets, and we use these to show the sensitivity of every ice grid cell to these choices. Across the three \(k\) scenarios, each cell has three possible outlets, where each outlet is an (x, y) coordinate. To show results in a map view, we reduce these six values (three 2D coordinates) to a single value. For every grid cell in the ice domain we compute the distance between each pair of outlets (six values become three), and we then select the maximum (three become one). Fig. fig:k_basin_change displays this maximum distance - a worst-case scenario of how far the outlet of every inland ice cell may move due to basal routing assumptions.
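A sketch of this reduction, assuming the three outlet coordinate sets are available as (n_cells, 2) NumPy arrays (random placeholder values):

#+begin_src python :eval no
# Sketch of the basin-change metric: maximum pairwise distance between the
# three outlets (one per k scenario) of each ice cell.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
outlets = {k: rng.uniform(0, 1e5, (5, 2)) for k in (0.8, 0.9, 1.0)}  # (x, y) per cell

# three pairwise distances per cell ("six become three") ...
dists = [np.hypot(*(outlets[a] - outlets[b]).T) for a, b in combinations(outlets, 2)]
# ... then the maximum of the three ("three become one")
max_move = np.max(dists, axis=0)
#+end_src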
RCM runoff is summed over each basin for each day of RCM data and assigned to each outlet for that day. This assumes routing between the runoff and the outlet is instantaneous, so all analyses done here include a 7 d smooth applied to the RCM discharge product (cf. citet:van-as_2017). The released data do not include any smoothing.
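For reference, a minimal sketch of the 7 d smoothing applied in the analyses (the released data are not smoothed); the daily series is synthetic, and whether the window is centered is an illustrative choice.

#+begin_src python :eval no
# Sketch of the 7 d smoothing on a synthetic daily discharge series.
import numpy as np
import pandas as pd

t = pd.date_range("2012-01-01", periods=365, freq="D")
discharge = pd.Series(np.random.default_rng(1).gamma(2.0, 5.0, t.size), index=t)  # m3 s-1

smoothed = discharge.rolling(window=7, center=True, min_periods=1).mean()
#+end_src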
The RCM grid and the basins do not always align. Each 100 m x 100 m ArcticDEM pixel is classified as ice citep:citterio_2013, ocean citep:NSIDC_0714, or land (defined as neither ice nor ocean). However, the classification of the mask cells and of the 1 km2 RCM grid cells do not always agree - for example, when a mask cell is classified as ice but the matching RCM cell is land. This disagreement occurs almost everywhere along the ice margin because the 1 km RCM boundary and the 100 m mask boundary rarely align perfectly. The ice margin is where most runoff occurs per unit area, due to the highest temperatures occurring at the lowest ice elevations, so small changes in masks in these locations can introduce large changes in RCM outputs.
We adjust for this imprecise overlap and scale the RCM results to the basin area. Where the mask reports ice and an RCM reports land, the RCM land runoff fraction is discarded, and the RCM ice runoff fraction over this basin is adjusted for the uncovered basin cells (and vice versa for basin land and RCM ice). Small basins with no RCM coverage of the same type have no runoff.
Runoff adjustments using this method are underestimated for large basins with large inland high-elevation regions with low runoff, because this method fills in misaligned cells with each day’s average discharge, but the misalignment (missing runoff) occurs at the ice sheet edge where maximum runoff occurs. However, given that the basin is large, misalignment is proportionally small, and therefore errors are proportionally small. Conversely, when misalignment is proportionally large (e.g., a basin is only 1 % covered by the same RCM classification), this implies a small basin. Because the basin is small, the covered region (no matter how much smaller) must be nearby and not climatically different.
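A sketch of this coverage adjustment for a single basin and day, with illustrative numbers:

#+begin_src python :eval no
# Sketch of the coverage adjustment for one basin on one day; numbers are
# illustrative. RCM ice runoff summed over the overlapping RCM ice cells is
# scaled up to the full basin ice area.
basin_ice_area   = 250.0e6   # m2 of basin cells classified as ice (100 m mask)
covered_ice_area = 235.0e6   # m2 of that area overlapped by RCM ice cells
covered_runoff   = 1.2e6     # m3 per day summed from the overlapping RCM ice cells

if covered_ice_area > 0:
    # fill uncovered basin cells with the day's mean covered runoff rate
    basin_runoff = covered_runoff * basin_ice_area / covered_ice_area
else:
    basin_runoff = 0.0       # small basins with no same-type RCM coverage get no runoff
#+end_src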
RCM inputs are also scaled to adjust for the EPSG:3413 non-equal-area projection. This error is up to 8 % for some grid cells, ranges from -6 % to +8 % over Greenland, and the cumulative error for the entire ice sheet is < 8 %.
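One way to obtain the areal distortion of EPSG:3413 at a given cell is via the projection scale factors. The sketch below uses pyproj (an assumption; not necessarily the tool used in the production code) and illustrative coordinates; the areal scale factor is the ratio of projected to true area, so the true cell area is the nominal 1 km2 divided by this factor.

#+begin_src python :eval no
# Sketch: areal scale factor of EPSG:3413 at a few latitudes using pyproj.
from pyproj import Proj

p3413 = Proj("EPSG:3413")
for lat in (60.0, 70.0, 83.0):
    f = p3413.get_factors(longitude=-45.0, latitude=lat)
    print(f"{lat:4.0f} N  areal scale {f.areal_scale:.3f}  "
          f"area correction {1 / f.areal_scale:.3f}")
#+end_src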
We validate the modeled outlet discharge against the observations first in bulk and then individually. Bulk comparisons are done with scatter plots (Figs. \ref{fig:scatter_daily} & \ref{fig:scatter_yearsum}) and modified Tukey plots comparing observations vs. the ratio of the RCMs to observations (Fig. \ref{fig:tukey}, based on Tukey mean-difference plots, also known as Bland–Altman plots citep:altman_1983,martin_1986).
We introduce the graphics here as part of the methods to reduce replication in figure captions - we show 10 nearly identical graphics (Figs. \ref{fig:W} and \ref{fig:L} through \ref{fig:Q}) for 10 different observation locations, and each graphic uses the same template of six panels.
For each figure (Figs. \ref{fig:W} and \ref{fig:L} to \ref{fig:Q}), the top panel (a) shows a satellite basemap with the land portion of the basin of interest (if it exists) outlined in dark green, the streams within that basin in light green, the basin outlet as an orange filled diamond, and the stream gauge location as an orange unfilled diamond. Ice basin(s) that drain to the outlet are outlined in thick dark blue if they exist, and all other ice basins are outlined in thin dark blue. Both MAR and RACMO use the same domains. The RCM ice domain is in light blue, and the RCM land domain is not shown, but is outside the light blue ice domain (not including the water). The scale of each map varies, but the basin lines (green and dark blue) are on a 100 m grid, and the RCM grid cells (light blue) are on a 1 km grid.
Panel b shows an example time series - whatever data are available for the last calendar year of the observations.
Panel c shows a scatter plot of observations vs. RCM-derived discharge. This is the same data shown in Fig. \ref{fig:scatter_daily} but subset to just the basin of interest. Color encodes day of year, and a kernel density estimation (KDE) of the discharge highlights where most points occur - not necessarily visible without the KDE because the points overlap (total number of plotted points is printed on the graphic near “n:”). The r2 correlation coefficient for each RCM-derived discharge is displayed. The gray band shows the 5 to 95 % prediction interval, and the three solid lines mark the 1:1, 1:5, and 5:1 ratios.
Panel d shows observations vs. the ratio of the RCM to the observations. This is the same data shown in Fig. \ref{fig:tukey}, but subset to just the basin of interest. Color denotes sample density (similar to the KDE in panel c). The horizontal lines mark the mean and the 0.05 and 0.95 quantiles of the ratio between the RCM and the observations. A value of 1 (or 100 %) indicates agreement between observations and the RCM, and a value of 2 or 0.5 is a factor of 2, or a +100/-50 % disagreement. The horizontal split marks the bottom one-third and top two-thirds quantiles of discharge.
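A sketch of the panel (d) statistics - the mean and 5 to 95 % quantiles of the RCM-to-observation ratio, split at the bottom one-third of observed discharge - using synthetic placeholder data:

#+begin_src python :eval no
# Sketch of the ratio statistics on synthetic data; column names are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({"obs": rng.gamma(2.0, 20.0, 1000)})
df["rcm"] = df["obs"] * rng.lognormal(0.0, 0.5, 1000)
df["ratio"] = df["rcm"] / df["obs"]

split = df["obs"].quantile(1 / 3)            # bottom one-third vs. top two-thirds
for is_top, part in df.groupby(df["obs"] > split):
    q05, q95 = part["ratio"].quantile([0.05, 0.95])
    label = "top 2/3" if is_top else "bottom 1/3"
    print(f"{label}: mean ratio {part['ratio'].mean():.2f}, 5-95 % {q05:.2f}-{q95:.2f}")
#+end_src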
Results of this work include (1) ice-margin-terminating streams, outlets, and basins; (2) coast-terminating streams, outlets, and basins; (3) discharge at the ice marginal outlets from ice runoff; and (4) discharge at the coastal outlets from land runoff. Discharge products are provided from both the MAR and RACMO RCMs. We note that our subglacial streams represent where the model routes the water and do not indicate actual streams, unlike the land streams that do appear near actual streams when compared to satellite imagery. Even so, these streams routed using simple subglacial theory show remarkable alignment with ice surface streams and lakes visible in satellite imagery. This may support the theory that basal topography exerts a strong control on supraglacial hydrology citep:lampkin_2011,sergienko_2013_glaciological,crozier_2018, or may indicate a poorly represented and smooth bed in BedMachine, and therefore Eq. eq:head is effectively applying surface routing in these locations.
Of the 361,950 km2 of basin land cells, the RCMs cover 339,749 km2 (~94 %) with their land grid cells, and 22,201 km2 (~6 %) of basin grid cells are filled in with our coverage algorithm (Sect. \ref{sec:methods:discharge}; the RCMs have these as ice or ocean). A total of 51,532 km2 of RCM land is discarded because the basins classify part or all of these cells as ice or ocean. Of the 1,781,816 km2 of basin ice cells, the RCMs cover 1,760,912 km2 (~99 %) with their ice cells, and 20,904 km2 (~1 %) of basin grid cells are filled in (the RCMs have these as land or ocean). A total of 21,793 km2 of RCM ice is discarded because the basins classify part or all of these cells as land or ocean (table and data available at https://github.com/GEUS-Glaciology-and-Climate/freshwater citep:github_freshwater).
Our coverage correction (Sect. \ref{sec:methods:discharge}) adjusts RCM ice runoff values by ~3 %. Discarding RCM ice runoff that does not match the underlying mask ice cells results in a 5 % reduction in discharge. However, applying our coverage algorithm to adjust RCM inputs for regions where basins have ice but the RCMs do not results in an 8 % increase from the reduced discharge (net gain of ~3 %). A similar adjustment occurs for RCM land runoff.
Our static products - streams, outlets, and basins - have been previously estimated. citet:lewis_2009 identified 293 distinct hydrologic ice basins and provided a data set of ice basins and ice margin outlets. Our work, a decade later, has ~2 orders of magnitude more basins and outlets because of the higher resolution of the input data, and includes additional data. We provide ice basins, ice margin outlets, ice streams with metadata, land basins, coastal outlets, and land streams with metadata. citet:lewis_2009 generated basins from a 5 km DEM, compared to the 100 m DEM used here. Routing with a 5 km DEM that does not capture small-scale topography is likely to cause some basins and outlets to drain into an incorrect fjord - we find that some land basins delineated with even the 150 m BedMachine land surface may drain into the incorrect fjord, but we did not find similar errors with the 100 m ArcticDEM product used in this work.
Similar products to our time series (discharge) also exist. The most recent of these is from citet:bamber_2018_freshwater, who provide a data product at lower spatial resolution (5 km), lower temporal resolution (monthly), and only coastal discharge, not coastal basins, ice basins, or ice margin outlets and discharge. However, citet:bamber_2018_freshwater surpasses our product in that its spatial coverage includes a larger portion of the Arctic, including Iceland, Svalbard, and Arctic Canada. Furthermore, by providing data at 5 km spatial and monthly temporal resolution, citet:bamber_2018_freshwater implements the main strategy suggested here to increase the signal-to-noise ratio of the data - averaging discharge in space or time (see Sect. \ref{sec:mitigation}).
We show both the geospatial and temporal differences between this product and citet:bamber_2018_freshwater for an example location - Disko Island (Fig. fig:bamber_2018). Spatially our product allows assessment of discharge at interior locations, necessary when comparing with observations that are not at the coast (for example, the Leverett Glacier observations, Fig. fig:L). Temporally, the MAR and RACMO runoff summed over all of Disko Island and to monthly resolution is similar to the monthly Disko Island discharge of citet:bamber_2018_freshwater, but the daily resolution shows increased variability and individual discharge events (from warm days or rain) not seen in the monthly view.
A similar GIS workflow was presented by citet:pitcher_2016, but focusing only on the discharge uncertainty from basal routing assumptions (the \(k\) parameter in Eq. \ref{eq:head}). We find these differences to be smaller than the differences between RCMs or between RCMs and observations (see Sect. \ref{sec:uncertainty}).
\label{sec:validation_obs}
Here we compare our results to all publicly accessible observations we could find or those willing to become open and publicly accessible as part of this work (Table tbl_obs).
This validation compares observations with discharge at stream gauges derived from RCM runoff estimates, much of it coming from far inland on the ice sheet. Disagreement is expected and does not indicate any specific issues in the RCMs but is instead likely due to our routing algorithm (Sect. \ref{sec:methods:discharge}).
Below we discuss first the validation for all discharge estimates together and then the individual outlets. For the individual outlets we begin by focusing on the problematic results in order of severity – Watson River (Figs. \ref{fig:W} & \ref{fig:W_adjust}), Leverett Glacier (Fig. \ref{fig:L}), and Kiattuut Sermiat (Fig. \ref{fig:Ks}) – and show that for two of these three, simple solutions are available, although manual intervention is needed to detect the issue and then adjust results.
A comparison of every day of observational data with discharge > 0 (15,778 days) and the two RCMs (Fig. fig:scatter_daily) shows good agreement with r2 of 0.45 and 0.88 for discharge derived from MAR and RACMO runoff respectively (hereafter “MAR” and “RACMO”). This comparison covers more than 4 orders of magnitude of modeled and observed discharge. The RACMO vs. observed discharge is within a factor of 5 (e.g., plus or minus half an order of magnitude), although both RCMs report only ~50 % of the observed discharge for the largest volumes at the Watson River outlet (Fig. \ref{fig:W}). The reason for the disagreement at the Watson River outlet is discussed in detail in Sect. \ref{sec:W}.
The four near-Nuuk GEM basins (Table \ref{tbl_obs}, Sect. \ref{sec:K}) have ice basins but either no or limited coverage in the RCMs. When excluding these basins from the comparison, the r2 agreement changes to 0.59 and 0.78 for MAR and RACMO respectively, and the 5 to 95 % prediction interval is significantly smaller for MAR (red band in Fig. \ref{fig:scatter_daily}). The largest disagreements throughout this work come from these small basins with no RCM coverage. These disagreements are therefore indicative of differences between the land/ice classification mask used by the RCMs and the basin masks used here, and not necessarily of an insufficient ability of the models to simulate melting (near-surface) climate conditions.
Fig. fig:scatter_yearsum shows a similar view to Fig. fig:scatter_daily, but here each observational data set and associated daily discharge is summed by year for the days in that year when observations exist (hence units m3 and not m3 yr-1; for example, four “L” means there are four calendar years with some observations at the Leverett outlet). Here it is more clear that the Watson River outlet (Sect. \ref{sec:W}) reports ~50 % of the observed discharge, the Kiattuut Sermiat outlet (Sect. \ref{sec:Ks}) overestimates discharge, and the remainder fall within the factor-of-2 lines, except for low discharge at Kingigtorssuaq in MAR, where the RCMs do not cover that small glacier (Sect. \ref{sec:K}).
\label{sec:method:tukey}
Because discharge spans a wide range (~4 orders of magnitude, Fig. \ref{fig:scatter_daily}), a high correlation (r2 of 0.88, Fig. \ref{fig:scatter_daily}) may be due primarily to the range, which is larger than the error citep:altman_1983,martin_1986. Fig. \ref{fig:tukey} compensates for this by comparing the observations with the ratio of the RCM to the observations. This graphic again excludes the four near-Nuuk GEM basins. From Fig. \ref{fig:tukey}, the top two-thirds of observed discharge has modeled discharge underestimated by a factor of 0.78 (MAR) and 0.73 (RACMO), with a 5 to 95 % quantile range of 0.30 to 2.08. The top two-thirds of observed discharge spans ~2 orders of magnitude (width of horizontal lines, from ~10^1 to ~10^3 m3 s-1). The ratio of the RCMs to these observations for the top two-thirds has a 5 to 95 % quantile range of ~1 order of magnitude (distance between horizontal lines; log10(2.08) - log10(0.3) = 0.84). The 5 to 95 % quantile range of the ratio between the RCMs and the observations is therefore half the range of the observations. Put differently, days with high observed discharge may have modeled discharge within ±0.5 order of magnitude, or plus or minus a factor of 5, or +500/-80 %. The modeled discharge is not likely to move farther than this from the observations, and high discharge remains high.
The bottom third of discharge is where the largest disagreement occurs. The mean model values are near the observed - the mean ratio of RCM to observed discharge is 0.67 for MAR (~33 % low) and 1.08 for RACMO (~8 % high) - but the 5 to 95 % quantile range of the ratio between RCM and observations is large. Although large uncertainties for low discharge may not seem to matter for some uses (e.g., estimates of total discharge from Greenland, which is dominated by the largest quantities of discharge), they may matter for other uses. The bottom one-third quantile of observed discharge spans 3 orders of magnitude (10^-2 to ~10^1 m3 s-1), but the uncertainty of the RCM-to-observations ratio spans ~4 and ~2 orders of magnitude for MAR and RACMO respectively (~10^-3 to ~2.2x10^1 for MAR; ~10^-1 to ~2.2x10^1 for RACMO).
| abbrev | total (km2) | land (km2) | land %  | ice (km2) | ice %   |
|--------+-------------+------------+---------+-----------+---------|
| Ks     |      693.33 |     391.31 | 56.4392 |    302.02 | 43.5608 |
| K      |        7.56 |       5.64 | 74.6032 |      1.92 | 25.3968 |
| Kb     |       37.52 |      28.71 | 76.5192 |      8.81 | 23.4808 |
| L      |     1360.88 |          0 |       0 |   1360.88 |     100 |
| O      |       16.37 |      14.39 | 87.9047 |      1.98 | 12.0953 |
| Q      |       13.22 |        2.2 | 16.6415 |     11.02 | 83.3585 |
| R      |      100.12 |      72.39 | 72.3032 |     27.73 | 27.6968 |
| T      |       24.85 |      18.89 | 76.0161 |      5.96 | 23.9839 |
| W      |     1881.79 |     520.91 | 27.6816 |   1360.88 | 72.3184 |
| Z      |      487.02 |     377.76 | 77.5656 |    109.26 | 22.4344 |
The Watson River discharge basin area is 1882 km2, of which 521 km2 (28 %) is land and 1361 km2 (72 %) is ice (Fig. fig:Wa). The partial (last calendar year) discharge time series shows MAR and RACMO agree well with each other but have a maximum of 500 m3 s-1, while observations are up to 4x more (Fig. fig:Wb). Low discharge (both early and late season) is overestimated and high discharge is underestimated, approximately equally for both RCMs (Fig. fig:Wc). The low-discharge overestimate ranges from a mean multiple of 1.68 (MAR) and 1.57 (RACMO) to a 95 % quantile ratio of ~70 (MAR) and ~52 (RACMO). The high-discharge underestimate has a mean multiple of ~0.5 for both MAR and RACMO and a 5 to 95 % quantile range of 0.23 to 1.09.
The Watson River discharge presented here is approximately half of the citet:van-as_2018 discharge for high discharge. The large underestimate for high discharge may be due to either errors in the basin delineation used in this study, errors in the stage–discharge relationship used by citet:van-as_2018, errors in the RCM runoff estimates, or a combination of the above three. All three of these error sources increase with high discharge (and associated melt): basin delineation becomes less certain with inland distance from the ice sheet margin. The river stage–discharge conversion becomes less certain at high stage levels. Runoff calculations become less certain from a snow surface than an ice surface because of, e.g., snow density, subsurface refreezing, and surface darkening.
The complexity of estimating the area of the Watson River catchment is described by citet:monteban_2020, who note that previous studies have used values ranging from 6131 km2 citep:mernild_2010_runoff to 12547 km2 citep:van-as_2012. Our basin is smaller than the basin used in citet:van-as_2018 and similar to that of citet:mernild_2018, who attributed the difference between their modeled outflow and the observations from citet:van-as_2017 to their decision to use surface rather than subglacial routing, and applied a correction term. We find that our basin does not include a separate basin to the south that is part of the Watson River ice basin in citet:van-as_2018 (from citet:lindback_2015 and citet:lindback_2014). We are able to recreate the citet:van-as_2018 basin, but only when using the citet:lindback_2014 bed and the citet:bamber_2013 surface. When using any other combination of bed DEM, surface DEM, or \(k\) values, we are unable to match the citet:lindback_2015 basin. Instead, all our basins resemble the basin shown in Fig. fig:W. To solve this, we manually select two large ice basins to the south of the Watson River ice basin. Modeled and observed discharge agree after including these two basins (Fig. fig:W_adjust), suggesting that basin delineation, not the stage–discharge relationship or the RCM runoff, is the primary cause of this disagreement. Furthermore, it is the additional width at lower elevation from the larger basin, not the increased inland high-elevation area, that likely contributes the runoff needed to match the observations, because 85 % of all surface runoff occurs below 1350 m, and almost all below 1850 m citep:van-as_2017.
At the Watson River outlet, nothing in this product alone suggests that it underestimates observed discharge by 50 %; the observations are needed to highlight the disagreement. Once this disagreement is apparent, it is also not clear how to reduce it without the previous efforts by citet:lindback_2015 and citet:lindback_2014. Basin delineation is discussed in more detail in the uncertainty section (Sect. \ref{sec:uncertain:basin}). The other two problematic areas highlighted above (Sect. \ref{sec:validation_obs}) can be detected and improved without observational support.
The Leverett Glacier basin area is 1361 km2 and 100 % ice (Fig. fig:La). The partial (last calendar year) discharge time series shows MAR and RACMO agree well with each other and with the observations (Fig. fig:Lb), with no seasonal dependence (Fig. fig:Lc). The 5 to 95 % prediction interval for MAR is generally within the 1:5 and 5:1 bands, with a larger spread for RACMO (Fig. fig:Lc). High model discharge is 3 % higher than observed (MAR) or 25 % higher than observed (RACMO), and the 5 to 95 % quantile range of the ratio is 0.73 to 1.62 (MAR) and 0.83 to 2.02 (RACMO). Low model discharge is also centered near the observations, but as always larger errors exist for low discharge (Fig. fig:Ld).
This basin is problematic because the basin feeding the outlet is small (< 5 km2), but even without the observational record, satellite imagery shows a large river discharging from the ice sheet here. Meanwhile, a large (hundreds of km2) ice basin does discharge just a few hundred meters away, but not upstream of this gauge location. We therefore adjust the gauge location onto the ice (equivalent to selecting a different outlet) so that our database access software selects what appears to be the correct basin given the size of the stream in the satellite imagery (Fig. fig:L).
The plots shown here use the adjusted gauge location and modeled discharge appears to match the observed discharge. When plotting (not shown) the modeled discharge for the outlet just upstream of the true gauge location, results are clearly incorrect. This issue - small basins at the margin and incorrect outlet location - is persistent throughout this product and discussed in more detail in Sect. \ref{sec:uncertain:basin}.
The Leverett Glacier basin is a subset of the Watson River outlet basin (Sect. \ref{sec:W}). The strong agreement here supports our claim that the Watson River disagreement is not from the RCM runoff or the stage–discharge relationship but more likely due to basin area. The correct Watson River basin should include some basins outside of the Leverett Glacier basin that still drain to the Watson River outlet gauge location.
<<py_init>>
<<load_all_obs>>
# Kiattuut Sermiat (Ks): total the observed and modeled discharge and
# compute the land-runoff share of each RCM product.
df = obs['Ks']
df['MAR'] = df['MAR_ice'] + df['MAR_ice_upstream']        # ice discharge at and upstream of the outlet
df['RACMO'] = df['RACMO_ice'] + df['RACMO_ice_upstream']
df = df[['obs','MAR','MAR_land','RACMO','RACMO_land']]
df = df.sum(axis='rows')                                   # sum over all days
df['MAR land %'] = df['MAR_land'] / (df['MAR']+df['MAR_land']) * 100
df['RACMO land %'] = df['RACMO_land'] / (df['RACMO']+df['RACMO_land']) * 100
df
The Kiattuut Sermiat discharge basin area is 693 km2, of which 391 km2 (56 %) is land and 302 km2 (44 %) is ice. The basin area is incorrectly large because the land basin reported and shown includes the entire basin that contains the discharge point, of which some is downstream (Fig. fig:Ksa). However, only ~25 % of runoff comes from the land, and only a small portion of the land basin is downstream of the gauge location, so this is not enough to explain the discharge vs. observation disagreement. The partial (last calendar year) discharge time series shows MAR and RACMO agree well with each other, but are significantly higher than the observations (Fig. fig:Ksb). Both low and high discharge are overestimated, but the 5 to 95 % quantile ranges of the ratio are within a factor of 5 (Fig. fig:Ksc), with mean ratios ranging from 1.71 (RACMO, bottom one-third of discharge) to 2.44 (MAR, top two-thirds of discharge).
The Kiattuut Sermiat gauge is in a problematic location in terms of determining the actual (nontheoretical) upstream contributing area. Similar to the Leverett Glacier gauge location, the issues here can be estimated independent of observational data. Specifically, it is not clear if this stream includes water from the larger glacier to the east and east-northeast that feeds this glacier (Fig. fig:Ksa) - in our delineation it does not. Furthermore, several glaciers to the north-northwest and detached from the glacier near the stream gauge appear to drain into a lake that then drains under the glacier and then to the stream gauge. This latter issue is observable in any cloud-free satellite imagery and does not need the basin delineations provided here to highlight the complexities of this field site. Nonetheless, RCM discharge estimates are only slightly more than double the observations.
The Kiattuut Sermiat gauge location may have been selected in part due to its accessibility - it is walking distance from the Narsarsuaq Airport. The data may also suit their intended purpose well and there are likely many results that can be derived independent of the area or location of the upstream source water. However, if the location or area of the upstream contributions is important, then gauge location should balance ease of access and maintenance with the ease with which the data can be interpreted in the broader environment.
\label{sec:K} \label{sec:Kb} \label{sec:O} \label{sec:T}
Four Greenland Ecosystem Monitoring (GEM) program stream gauges are located near Nuuk, with similar basin properties. All are small (7.56 to 37.52 km2) and 10 % to 25 % ice in the basin mask, but two of the four (Kingigtorssuaq, Fig. fig:K; and Oriartorfik, Fig. fig:O) contain small glaciers contributing to observed discharge but no RCM ice cells cover those glaciers, and the remaining two (Teqinngalip, Fig. fig:T; and Kobbefjord, Fig. fig:Kb) have several small glaciers, but only one per basin has RCM ice coverage.
All four of these basins show some weak agreement. The maximum r2 is 0.47 (Fig. fig:Tc) and the minimum is 0.11 (Fig fig:Kc), but we note that the worst agreement comes from a basin with no glaciers in the RCM domain, and that in all cases the mean high discharge agrees well, suggesting high discharge in these small basins with few small glaciers may be due to rain (captured in the RCMs) rather than warm days and melted ice.
\label{sec:R} \label{sec:Z} \label{sec:Q}
Three additional stream gauges remain: Røde Elv, Zackenberg, and Qaanaaq.
The Røde Elv basin is situated at the southern edge of Disko Island (Fig. fig:bamber_2018). It has an area of 100 km2, of which 72 km2 is land and 28 km2 is ice (Fig fig:Ra). The partial (last calendar year) discharge time series shows MAR, RACMO, and the observations all in approximately the same range but with high variability (Fig. fig:Rb). Of the few samples here (n = 98), most are within the factor-of-5 bands for MAR and a few more are outside the bands for RACMO (Fig. fig:Rc). Mean discharge offset ranges from a ratio of 0.82 (RACMO low) to 1.85 (MAR low), with high-discharge estimates slightly closer to observations - a 48 % and 77 % overestimate for MAR and RACMO respectively (Fig. fig:Rd).
The Zackenberg basin in NE Greenland has an area of 487 km2, of which 378 km2 (78 %) is land and 109 km2 (22 %) is ice (Fig. fig:Za). The partial (last calendar year) discharge time series shows disagreements between MAR and RACMO that generally bound the observations (Fig. fig:Zb). RACMO-derived discharge is consistently high for low discharge early in the year, but both discharge products fall mostly within the factor-of-5 bands (Fig. fig:Zc). For high discharge, mean modeled discharge is 9 % high (MAR) and 24 % low (RACMO), with a worst-case 5 to 95 % quantile ratio as low as 0.29 (Fig. fig:Zd).
The Qaanaaq basin in NW Greenland has an area of 13.2 km2, of which 2.2 km2 (17 %) is land and 11 km2 (83 %) is ice (Fig. fig:Qa). The partial (last calendar year) discharge time series shows disagreements between MAR and RACMO that generally bound the observations (Fig fig:Qb). Of the few samples (n = 82), MAR preferentially overestimates and RACMO underestimates discharge, but both generally within a factor of 5 (Fig fig:Qc). The mean high-discharge ratio is 1.26 (MAR) and 0.4 (RACMO) from Fig. fig:Qd.
The volume of data generated here is such that manually examining all of it or editing it to remove artifacts or improve the data would be time and cost prohibitive. A similar warning is provided with the ArcticDEM data used here. However, any ArcticDEM issues interior to a basin do not impact results here that are aggregated by basin and reported at the outlet. ArcticDEM issues that cross basin boundaries should only impact a small part of the basin near the issue.
Uncertainties from the RCM inputs and the observations are considered external to this work, although they are still discussed (Sects. \ref{sec:uncertain:RCM} and \ref{sec:uncertain:obs}). In this work, we introduce one new source of uncertainty - the routing model, which generates both temporal (runoff delay) and spatial (basin delineation) uncertainty.
The RCMs include a time lag between when water melts in the model and when it leaves a grid cell. RACMO retention occurs only when there is firn cover (no retention when bare ice melts); MAR includes a time delay of up to 10 days that is primarily a function of surface slope citep:zuo_1996,yang_2019. However, neither model includes a subglacial system. Properly addressing time delays with runoff requires addressing storage and release of water across a variety of timescales in a variety of media: firn (e.g., citet:munneke_2014,vandecrux_2019), supraglacial streams and lakes (e.g., citet:zuo_1996,smith_2015,yang_2019), the subglacial system (e.g., citet:rennermalm_2013), possibly terrestrial streams and lakes (e.g., citet:van-as_2018) and a variety of other physical processes that are not within the scope of surface mass balance (SMB) modeling. Runoff delay can be implemented outside the RCMs (e.g., citet:liston_2012,mernild_2018), but for this version of the product we assume that once an RCM classifies meltwater as runoff, it is instantly transported to the outlet. Actual lags between melt and discharge range from hours to years citep:colgan_2011_hydrology,van-as_2017,rennermalm_2013,livingston_2013.
Data released here include no additional lag beyond the RCM lag, although a 7 d running mean (cf. citet:van-as_2017) is included in all of the results presented here except Fig. fig:bamber_2018, which shows monthly summed data, and Fig. fig:scatter_yearsum, which shows yearly summed data. When increasing the signal to noise by summing by year (Fig. fig:scatter_yearsum vs. Fig. fig:scatter_daily), model results more closely match observations.
Basin uncertainty is a function of the subglacial routing assumptions (the \(k\) parameter in Eq. eq:head, which in reality varies in both space and time). However, basin uncertainty does not necessarily translate to discharge uncertainty. For example, when comparing two \(k\) simulations, a large basin in simulation \(k_0\) may change only its outlet by a few grid cells in \(k_1\). A small micro-basin may appear in \(k_1\) with its outlet in the same grid cell as the large \(k_0\) outlet. The large change in discharge between the two outlets at the same location in \(k_0\) and \(k_1\) is not an appropriate estimate of uncertainty - rather the large basin in \(k_0\) should be compared with the almost entirely overlapping large basin in \(k_1\) with the different outlet. This fluidity of basins and outlets between \(k\) scenarios makes it almost impossible to define, identify, and compare basins between scenarios, unless working manually with individual basins (as we did, for example, at the Leverett Glacier observation location, modeled upstream basin, and adjusted upstream basin; see Sect. \ref{sec:L}).
Another example has a large basin in simulation \(k_0\) and a similarly large basin in simulation \(k_1\) draining out of the same grid cell, but overlapping only at the outlet grid cell. Upstream the two do not overlap and occupy different regions of the ice sheet. These two basins sharing one outlet (between different \(k\) simulations) could have similar discharge. Put differently, although inland grid cells may change their outlet location by large distances under different routing assumptions (Fig. fig:k_basin_change), that does not imply upstream basin area changes under different routing assumptions. Large changes in upstream catchment area are possible citep:chu_2016_rerouting, but we note citet:chu_2016_rerouting highlight changes at only a few outlets and under the extreme scenario of \(k = 1.11\) describing an overpressured system. Because \(ρ_w/ρ_i = 1.09\), setting \(k = 1.09\) reduces Eq. \ref{eq:head} to \(h = z_s\) and is equivalent to an overpressured system with surface routing of the water. In a limited examination comparing our results with \(k ∈ [0.8, 0.9, 1.0]\), we did not detect basins with large changes in upstream area. In addition, all time series graphics show the mean RCM discharge for \(k = 1.0\), but the uncertainty among all three \(k\) values (not shown) is small enough that it is difficult to distinguish the three separate uncertainty bands - the difference between RCMs or between RCMs and observations is much larger than uncertainty from the \(k\) parameter.
The above issues are specific to ice basins. Land basin outlets do not change location, and the range of upstream runoff from different \(k\) simulations to a land outlet provides one metric of uncertainty introduced by the \(k\) parameter. This uncertainty among all three \(k\) values is small at ice margin outlets. It is even smaller at land outlets which act as spatial aggregators and increase the signal-to-noise ratio.
Below, we discuss the known uncertainties, ranging from least to most uncertain.
The basins presented here are static approximations, based on the 100 m DEM, of a dynamic system. Land basin boundaries are likely to be more precise and accurate than ice basin boundaries because the land surface is better resolved, has larger surface slopes, has negligible subsurface flow, and is less dynamic than the ice surface. Even if basins and outlets seem visually correct from the 100 m product, the basin outline still has uncertainty on the order of hundreds of meters and will therefore include many minor errors and nonphysical properties, such as drainage basin boundaries bisecting lakes. However, all artifacts we did find are significantly smaller than the 1 km2 grid of the RCM inputs. We do not show but note that when doing the same work with the 150 m BedMachine land surface DEM, some basins change their outlet locations significantly - draining on the opposite side of a spit or isthmus and into a different fjord than the streams do when observed in satellite imagery. We have not observed these errors in streams and basins derived from the 100 m ArcticDEM in a visual comparison with Google Earth, although they may still exist.
Moving from land basins to subglacial ice basins, the uncertainty increases because subglacial routing is highly dynamic on timescales from minutes to seasons (e.g., citet:werder_2013). This dynamic system may introduce large spatial changes in outflow location (water or basin “piracy”; citet:ahlstrom_2002,lindback_2015 and citet:chu_2016_rerouting), but citet:stevens_2018 suggests basins switching outlet locations may not be as common as earlier work suggests, and our sensitivity analysis suggests that near the margin, where the majority of runoff occurs, outlet location often changes by less than 10 km under different routing assumptions (Fig. fig:k_basin_change). The largest (> 100 km) changes in outlet location in Fig. fig:k_basin_change occur when the continental or ice flow divides move, and one or two of the \(k\) scenario(s) drain cells to an entirely different coast or sector of the ice sheet. Finally, in some locations water is routed supraglacially, not subglacially (cf. citet:li_2022).
The regions near the domain edges - both the land coast and the ice margin - are covered by many small basins, and in this work basins < 1 km2 are absorbed into their largest neighbor (see Methods section). By definition these basins are now hydraulically incorrect. An example can be seen in the Zackenberg basin (Fig. fig:Za, southwest corner of the basin), where one small basin on the southern side of a hydraulic divide was absorbed into the large Zackenberg basin that should be defined by and limited to the northern side of the mountain range.
Quality issues also exist near the ice margin. At the margin, many of the small basins (absorbed or not) may be incorrect because the bed uncertainty is larger relative to the ice thickness, and therefore uncertainty has a larger influence on routing. Minor mask misalignments may cause hydraulic jumps (waterfalls) at the margin, or sinks that then need to be filled by the algorithm and may overflow (i.e., the stream continues onward) somewhere at the sink edge different from the location of the real stream. The solution for individual outlets is to visually examine the modeled outlet location, nearby streams in satellite imagery, and the area of upstream catchments, as we did for the Leverett Glacier outlet (Sect. \ref{sec:L}). Alternatively, selecting several outlets in an area will likely include the nearby correct outlet. This can be automated: an effective method to aggregate all the micro ice basins that occur at the domain edge is to select the downstream land basin associated with one ice outlet and then all upstream ice outlets for that land basin.
In addition to the basin delineation issues discussed above, the runoff product from the RCMs also introduces uncertainty into the product generated here. The RCM input products do not provide formal time- or space-varying error estimates, but of course do contain errors because they represent a simplified and discretized reality. RCM uncertainty is shown here with a value of ±15 %. The MAR uncertainty comes from an evaluation by the Greenland SMB Model Intercomparison Project (GrSMBMIP; citet:fettweis_2020) that examined the uncertainty of modeled SMB for 95 % of the 10767 in situ measurements over the main ice sheet. The mean bias between the model and the measurements was 15 %, with a maximum of 1000 mmWE yr-1. GrSMBMIP uses values integrated over several months of SMB, suggesting larger uncertainty of modeled runoff at the daily timescale. The RACMO uncertainty comes from an estimated average 5 % runoff bias in RACMO2.3p2 compared to annual cumulative discharge from the Watson River citep:noel_2019. The bias increases to a maximum of 20 % for extreme runoff years (e.g., 2010 and 2012), so here we select 15 %, a value between the reported 5 % and the maximum 20 % that matches the MAR uncertainty. We display ±15 % uncertainty in the graphics here and suggest this is a minimum value for daily runoff data.
The 15 % RCM uncertainty is represented graphically in the time series plots when comparing to each of the observations. It is not shown in the scatter plots because the log–log scaling and many points make it difficult to display. In the time series plots, we show the mean value from the \(k = 1.0\) scenario and note that discharge from the other two \(k\) scenarios covered approximately the same range.
When comparing against observations, additional uncertainty is introduced because the stage–discharge relationship is neither completely precise nor accurate. We use published observation uncertainty when it exists. Only two observational data sets come with uncertainty: Watson River and Qaanaaq. Similar to the RCM uncertainty, they are displayed in the time series but not in the scatter plots.
Traditional uncertainty propagation is further complicated because it is not clear to what extent the three uncertainties (observational, RCM, and routing model) should be treated as independent from each other - all three uncertainties are likely to show some correlation with elevation, slope, air temperature, or other shared properties or processes.
Many of the uncertainties discussed here can be mitigated by increasing the signal-to-noise ratio of the product. Because we provide a high spatial and temporal resolution product, this is equivalent to many signals, each of which has some uncertainty (noise). Averaging results spatially or temporally, if possible for a downstream use of this product, will increase the signal-to-noise ratio and reduce uncertainty.
For example, because we provide basins for the entire ice sheet, total discharge is not subject to basin uncertainty. Any error in the delineation of one basin must necessarily be corrected by the inclusion (if underestimated) or exclusion (if overestimated) of area in a neighboring basin, although neighboring basins may introduce their own errors. Therefore, summing basins reduces the error introduced by basin outline uncertainty and should be done if a downstream product does not need an estimate of discharge from a single outlet. This feature is built in to coastal outlet discharge, which is not as sensitive to our routing algorithm as ice margin outlet discharge because most coast outlets include a range of upstream ice margin outlets (e.g., Fig. fig:W vs. fig:L). Conversely, at the ice margin, outlet location and discharge volume are more uncertain than at the land coast. However, most runoff is generated near the ice margin, and as runoff approaches the margin, there are fewer opportunities to change outlet location (Fig. fig:k_basin_change).
Our coverage algorithm (Sect \ref{sec:methods:discharge}) only fills in glaciated regions that have at least some RCM coverage. When working with basins that have glaciated areas and no RCM coverage as in the case for all four of the GEM outlets near Nuuk, discharge could be approximated by estimating discharge from the nearest covered glaciated area with a similar climatic environment.
Temporally, errors introduced by this study’s assumption of instantaneous discharge can be reduced by summing or averaging discharge over larger time periods, or applying a lag function to the time series as done here and in citet:van-as_2017. Although a given volume of water may remain in storage long term, if one assumes that storage is in roughly steady state, then long-term storage shown by, for example, dye trace studies, can be ignored - the volume with the dye may be stored, but a similar volume should be discharged in its place.
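As a minimal sketch of such temporal aggregation (synthetic numbers, not from the product; window lengths are illustrative):

# Minimal sketch: temporal averaging and a simple lag applied to a daily
# discharge series. The series is synthetic; window lengths are illustrative.
import numpy as np
import pandas as pd

t = pd.date_range("2012-06-01", periods=90, freq="D")
q = pd.Series(np.random.lognormal(mean=2, sigma=1, size=t.size), index=t)  # stand-in daily discharge [m3/s]

weekly = q.resample("7D").mean()           # aggregate to weekly means
smooth = q.rolling(7, center=True).mean()  # 7 day running mean
lagged = q.shift(1)                        # crude 1 day lag; see citet:van-as_2017 for a physically based lag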
The scale of the data is such that manual editing to remove artifacts is time- and cost-prohibitive. Here we provide one example of incorrect metadata. The elevation of each outlet is included as metadata by looking up the bed elevation in the BedMachine data set at the location of each outlet. Errors in BedMachine or in the outlet location (defined by the GIMP ocean mask) introduce errors in outlet elevation.
A large basin in NW Greenland has metadata outlet elevation > 0 (gray in Fig. fig:overview) but appears to be marine terminating when viewed in satellite imagery. Elsewhere the land- vs. marine-terminating color coding in Fig. fig:overview appears to be mostly correct, but this view only provides information about the sign of the elevation and not the magnitude (i.e., if the reported depth is correct). Ice outlets can occur above, at, or below 0 m. It is easier to validate the land-terminating basins, which should in theory all have an outlet elevation of ≤ 0 m. That is not the case (Fig. fig:elev). Outlets at 0 m are coastal outlets. Outlets < 0 m occur when land-source runoff enters the subglacial system and discharges at depth from a marine terminating glacier, but we expect these outlets to be << 0 m, not near 0 m. This edge case also produces land outlets that exist outside of land basins. It is possible for land outlets to be correctly assigned an elevation > 0 m, if a land basin outlet occurs at a waterfall off a cliff (as might occur at the edges of Petermann fjord) or due to DEM discretization of steep cells. However, most of the land outlets near but not at 0 m are likely due to mask misalignment placing a section of coastline in a fjord (negative land elevation) or inland (positive land elevation), or discretization issues in BedMachine. The bulk of land discharge (75 %) occurs within 0 ±10 m elevation and 90 % within 0 ±30 m elevation (Fig. fig:elev).
The liquid water discharge product provided here is only one source of freshwater that leaves the ice sheet and affects fjords and coastal seas. The other primary freshwater source is iceberg calving and submarine melt at the ice/ocean boundary of marine-terminating glaciers. A companion to the liquid water discharge product introduced here is provided by citet:mankoff_2019_ice,mankoff_2020_ice, which estimates solid ice volume flow rates across gates near marine-terminating glaciers. That downstream ice enters fjords as either calving icebergs or liquid water from submarine melting.
Both this product and citet:mankoff_2020_ice provide liquid or solid freshwater volume flow rates at outlets (this product) or grounding lines citep:mankoff_2020_ice, but actual freshwater discharge into a fjord occurs at a more complicated range of locations. Solid ice melts throughout the fjord and beyond (e.g., citet:enderlin_2016,moon_2017), and the freshwater discharge presented here may enter at the reported depth (Sect. \ref{sec:QC}) but rapidly rises up the ice front and eventually flows into the fjord at some isopycnal (see citet:mankoff_2016). The eventual downstream location of the fresh water is not addressed in this work.
Freshwater inputs directly to the water surface are also not included in this product. The flux (per square meter) to the water surface should be similar to the flux to the non-ice-covered land surface - assuming the orographic effects on precipitation produce similar fluxes to the near-land water surface.
Finally, basal melt from 1) geothermal heating (e.g., citet:fahnestock_2001), 2) frictional heating (e.g., citet:echelmeyer_1990), and 3) viscous heat dissipation from runoff (e.g., citet:mankoff_2017) contributes additional discharge beyond the surface melt (see for example citet:johannesson_2020). Geothermal and frictional heating are approximately in steady state and contribute freshwater throughout the winter months.
Of the 20 comparisons between the two RCMs and the 10 observations we note the following.
- In general this product shows good agreement between observations and the modeled discharge from the RCM runoff routed to the outlets, when comparing across multiple basins, especially when ignoring small basins with small glaciers that are not included in the RCMs (Fig. \ref{fig:scatter_daily}). The agreement is not as good when estimating the discharge variability within individual basins. From this, the product is more appropriately used to estimate the magnitude of the discharge from any individual basin, and perhaps provide some idea of the statistical variability, but not necessarily the precise amount of discharge for any specific day, because routing delays are neglected.
- The majority of the 20 comparisons have the 5 to 95 % prediction interval between scales of 1:5 and 5:1. From this, the model results match observations within plus or minus a factor of 5, or half an order of magnitude. Put differently, the daily RCM values for single or few basins have an uncertainty of +500 % or -80 %.
- The uncertainty of +500%/-80% is for “raw” data: daily discharge for one or few basins with a simple temporal smooth. When averaging spatially or temporally over larger areas or longer times, uncertainty decreases (Sect. \ref{sec:uncertainty}). For example, when moving from daily data (Fig. fig:scatter_daily) to annual sums (Fig. fig:scatter_yearsum), the uncertainty is reduced to +100%/-50%.
- The two RCMs agree best with each other for the three observations dominated by large ice domains (Watson River, Sect. \ref{sec:W} & Fig. \ref{fig:W}); Leverett Glacier, Sect. \ref{sec:L} & Fig. \ref{fig:L} which is a subset of the Watson River basin; and Kiattuut Sermiat, Sect. \ref{sec:Ks} & Fig. \ref{fig:Ks}). RCMs agree best with observations for ice-dominated basins with well-resolved bed topography in BedMachine (i.e., correct basins modeled in this work) - here only Leverett Glacier (Sect. \ref{sec:L} & Fig. \ref{fig:L}) meets this criterion.
- Runoff errors increase with low discharge (panels ‘d’ in Figs. \ref{fig:W}, \ref{fig:L} to \ref{fig:Q}).
- For land basins, subglacial routing errors no longer exist, basins are well defined, and the remaining errors come from neglecting runoff delays and from the RCM runoff estimates themselves.
- For ice basins, errors are dominated by basin uncertainty. Errors between similar-sized and neighboring basins are likely to offset and may even cancel each other. Even so, a conservative treatment might consider errors between basins as random and reduce by the sum of the squares when summing discharge from multiple similar-sized and neighboring basins.
These data contain a static map of Greenland’s hydrological outlets, basins, and streams and a time series of discharge from each outlet.
The output data are provided in the following formats:
The stream data are provided as a GeoPackage standard GIS product and a metadata CSV that includes the stream type (start or intermediate segment), network, stream along-flow length, stream straight length, sinuosity, source elevation, outlet elevation, and a variety of stream indices such as the Strahler, Horton, Shreve, Hack, and other parameters citep:jasiewicz_2011. We note that the subglacial streams are unvalidated with respect to actual subglacial conduits, and they should be used with caution.
The outlet data are also provided as a GeoPackage and CSV, each of which include the outlet ID (linked to the basin ID), the longitude, latitude, EPSG:3413 x and y, and the outlet elevation. The outlet elevation is the BedMachine bed elevation at the outlet location, and users should be aware of quality issues identified in Sect. \ref{sec:QC}. The ice outlet metadata includes the ID, longitude, latitude, x, and y of the downstream land outlet, if one exists.
The basin product GeoPackage includes the geospatial region that defines the basin. The metadata CSV includes the basin ID (linked to the outlet ID), and the area of each basin.
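As a sketch of working with the vector products, assuming the freshwater/land_100/ layout used by the export step in this workflow (the published archive layout may differ):

# Minimal sketch: load the outlet and basin GeoPackages with GeoPandas.
# The freshwater/land_100/ path follows the export layout used in this
# workflow and is an assumption; adjust to wherever the data are stored.
import geopandas as gpd

outlets = gpd.read_file("freshwater/land_100/outlets.gpkg")
basins = gpd.read_file("freshwater/land_100/basins.gpkg")
print(outlets[["cat", "lon", "lat", "elev"]].head())  # outlet ID and location metadata
print(basins.area.sum() / 1e6)                        # total basin area [km2] (EPSG:3413 meters)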
The time series discharge product is provided as four NetCDF files per year, one for each domain (ice margin and land coast) and one for each RCM (MAR and RACMO). The NetCDF files contain an unlimited time dimension, usually containing 365 or 366 d; much of the same metadata as the outlets CSV file, including the outlet (also known as station) ID, the latitude, longitude, and elevation of the outlet; and a runoff variable with dimensions (station, time) and units of cubic meters per second (m3 s-1).
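A minimal sketch of reading one of these NetCDF files with xarray; the file name is illustrative, and the variable and dimension names follow the description above:

# Minimal sketch: open one year of discharge for one RCM and one domain.
# File name is illustrative; dimensions are (station, time) per the text above.
import xarray as xr

ds = xr.open_dataset("MAR_ice_2012.nc")
q_total = ds["runoff"].sum(dim="station")     # ice-sheet-wide daily discharge [m3 s-1]
q_annual = (q_total * 86400).sum(dim="time")  # approximate annual volume [m3]
print(ds.dims, float(q_annual))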
The data can be accessed with custom code from the raw data files. However, to support downstream users we provide a tool to access the outlets, basins, and discharge for any region of interest (ROI). The ROI can be a point, a list describing a polygon, or a file, with units in longitude and latitude (EPSG:4326) or meters (EPSG:3413). If the ROI includes any land basins, an option can be set to include all upstream ice basins and outlets, if they exist. The script can be called from the command line (CLI), returning CSV formatted tables, or from within Python, returning standard Python data structures (from the GeoPandas or xarray packages).
For example, to query for discharge at one point (50.5 °W, 67.2 °N), the following command is issued:
python ./discharge.py --base ./freshwater --roi=-50.5,67.2 --discharge
where discharge.py is the provided script, ./freshwater is the folder containing the downloaded data, and --discharge tells the program to return RCM discharge (as opposed to --outlets, which would return basin and outlet information). The program documentation and usage examples are available at https://github.com/GEUS-Glaciology-and-Climate/freshwater citep:github_freshwater.
Because the --upstream option is not set, the --discharge option is set, and the point is over land, the results of this command are a time series of the MAR and RACMO land discharge for the basin containing this point. A small subset (the first 10 days of June 2012) is shown as an example:
| time       | MAR_land  | RACMO_land |
|------------+-----------+------------|
| 2012-06-01 | 0.043025  | 0.382903   |
| 2012-06-02 | 5.5e-05   | 0.095672   |
| 2012-06-03 | 5e-05     | 0.009784   |
| 2012-06-04 | 9e-06     | -0.007501  |
| 2012-06-05 | 0.008212  | 0.007498   |
| 2012-06-06 | 28.601947 | 0.607345   |
| 2012-06-07 | 0.333926  | 0.05691    |
| 2012-06-08 | 0.489437  | 0.204384   |
| 2012-06-09 | 0.038816  | 0.167325   |
| 2012-06-10 | 5.1e-05   | 0.011415   |
If the --upstream option is set, two additional columns are added: one for each of the two RCM ice domains. A maximum of six columns may be returned: 2 RCMs × (1 land + 1 ice + 1 upstream ice domain), because results are summed across all outlets within each domain when the script is called from the command line (summing is not done when the script is accessed from within Python).
If the --outlets option is set instead of the --discharge option, then the results are a table of outlets. For example, moving 10 ° east over the ice,
python ./discharge.py --base ./freshwater --roi=-40.5,67.2 --outlets
results in the following.
| index | id     | lon     | lat    | x      | y        | elev | domain | upstream | coast_id | … |
|-------+--------+---------+--------+--------+----------+------+--------+----------+----------+---|
| 0     | 118180 | -38.071 | 66.33  | 313650 | -2580750 | -78  | land   | False    | -1       | … |
| 1     | 67133  | -38.11  | 66.333 | 311850 | -2580650 | -58  | ice    | False    | 118180   | … |
If the script is accessed from within Python, then the discharge option returns an xarray Dataset of discharge, without aggregating by outlet, and the outlets option returns a GeoPandas GeoDataFrame that includes the geospatial location of all outlets and the outline of all basins, and can be saved to GIS-standard file formats for further analysis.
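If importing the script is not convenient, the documented CLI can also be driven from Python; a minimal sketch, where the CSV parsing details (delimiter, index column) are assumptions:

# Minimal sketch: call the documented CLI and parse its CSV output with pandas.
# Only the flags shown in the text above are used; parsing details are assumed.
import io
import subprocess
import pandas as pd

cmd = ["python", "./discharge.py", "--base", "./freshwater",
       "--roi=-50.5,67.2", "--discharge"]
out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
df = pd.read_csv(io.StringIO(out), index_col=0, parse_dates=True)
print(df.loc["2012-06-01":"2012-06-10"])  # should resemble the example table above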
\conclusions
We provide a 100 m spatial resolution data set of streams, outlets, and basins, and a 1 day temporal resolution data set of discharge through those outlets for the entire ice sheet area from 1950 through December 2022. Access to this database is made simple for nonspecialists with a Python script. Comparing the two RCM-derived discharge products to 10 gauged streams shows the uncertainty is approximately plus or minus a factor of 5, or half an order of magnitude, or +500%/-80%, when comparing daily discharge for single or few basins.
Because of the high spatial (individual basins) and temporal (daily) resolution, uncertainty is larger than when working over larger areas or longer time steps. These larger areas and times can be achieved through spatial and temporal aggregation or by implementing a lag function.
This liquid freshwater volumetric flow rate product is complemented by a solid ice discharge product citep:mankoff_2020_ice. Combined, these provide an estimate of the majority of freshwater (total solid ice and liquid) flow rates from the Greenland Ice Sheet into fjords and coastal seas, at high temporal resolution and process-level spatial resolution (i.e., glacier terminus for solid ice discharge, stream for liquid discharge).
This estimate of freshwater volume flow rate into Greenland fjords aims to support further studies of the impact of freshwater on ocean physical, chemical, and biological properties; fjord nutrient, sediment, and ecosystems; and larger societal impacts of freshwater on the fjord and surrounding environments.
The data from this work are available at doi:10.22008/promice/freshwater citep:GEUS_freshwater_paper.
The code and a website for postpublication updates are available at https://github.com/GEUS-Glaciology-and-Climate/freshwater citep:github_freshwater where we document changes to this work and use the GitHub Issues feature to collect suggested improvements, document those improvements as they are implemented, and document problems that made it through review. This version of the document is generated with git commit version \input{|"git describe --always --dirty='*'"}.
The hydrological basins are defined based on a range of subglacial flow routing regimes.
The head gradient is defined as:
| Location | Description                                       |
|----------+---------------------------------------------------|
| Sea      | Undefined                                         |
| Land     | ArcticDEM 100 m                                   |
| Ice      | (ArcticDEM - thickness) + k * 0.917 * thickness   |
- thickness is from BedMachine.
- k is equal to one of 0.7, 0.8, 0.9, 1.0, or 1.09, where 1.09 ~ \(ρ_w/ρ_i\), or surface routing.
- bed is the ArcticDEM surface minus the BedMachine thickness.
log_info "Calculating subglacial head with k: ${k} (${domain})"
r.mapcalc "head = if(mask_o_l_i@ArcticDEM == 1, null(), 0) + if(mask_o_l_i@ArcticDEM == 2, z_s@ArcticDEM, 0) + if(mask_o_l_i@ArcticDEM == 3, (z_s@ArcticDEM - thickness@BedMachine) + ${k} * 0.917 * thickness@BedMachine)"
Then, we calculate the streams, outlets, and basins based on the head:
<<streams>>
<<outlets>>
<<basins>>
Putting it all together, we want to calculate streams, outlets, and basins to the ice edge (domain = ice), and once to the coast (domain=land). See Section Domains for implementation. This is the top-level ./sob.sh code that implements the streams, outlets, and basins routing and exports the results to disk.
- NOTE :: We only run the land domain once. The upstream basins are a function of subglacial pressure, but the exposed portion of the land basins is independent of subglacial pressure. We run it with k = 1.0 (matching ice_100), although any k could be used because there is no ice overburden on land.
- NOTE :: The land domain is run first, because the ice domains need information from the land domain (the downstream land outlet).
<<init>>
for domain in land ice; do
for k_pct in 100 90 80; do
# when land, only 100
[[ ${domain} == land ]] && [[ ${k_pct} != 100 ]] && break
k=$(echo "scale=2; ${k_pct}/100" | bc -l)
log_info "Setting domain to ${domain}_${k_pct}"
g.mapset -c ${domain}_${k_pct}
<<mask_domain>>
<<head>>
<<sob>>
<<adjust_basins>>
<<metadata>>
<<export>>
done
done
Below, we’ll build out the code defined above.
After calculating the head, we use 3rd-party tools to get the flow direction and streams.
THRESH=300
log_warn "Using threshold: ${THRESH}"
log_info "r.stream.extract..."
r.stream.extract elevation=head threshold=${THRESH} memory=16384 direction=dir stream_raster=streams stream_vector=streams
- The flow direction dir is negative where flow leaves the domain. These are the outlets.
- Encode each outlet with a unique id.
log_info "Calculating outlets"
r.mapcalc "outlets_1 = if(dir < 0, 1, null())"
r.out.xyz input=outlets_1 | \
cat -n | \
tr '\t' '|' | \
cut -d"|" -f1-3 | \
v.in.ascii input=- output=outlets_uniq separator=pipe \
columns="x int, y int, cat int" x=2 y=3 cat=1
Using r.stream.basins, we can get basins for every outlet.
log_info "r.stream.basins..."
r.stream.basins -m direction=dir points=outlets_uniq basins=basins_uniq memory=16384 --verbose
<<absorb_small_basins>>
- For the ice domain, the domain boundary is the ice/land edge.
- For the land domain, the domain boundary is the land/fjord edge.
g.region -dp
log_info "Masking domain to ${domain}"
[[ ${domain} == "ice" ]] && r.mask raster=mask_o_l_i@ArcticDEM maskcats=3 --o # mask to ice
[[ ${domain} == "land" ]] && r.mask raster=mask_o_l_i@ArcticDEM maskcats="2 3" --o # mask to land & ice
<<mask_small_areas>>
I make the following basin adjustments:
- Ice basins have nunatuks masked out. They needed to be classified as “ice” for the routing algorithm, otherwise streams, outlets, and basins (SOBs) occur inland and are not routed to the ice margin, but the “basin” raster used for masking RCM data should not include the nunatuk area.
- Land basins have nunatuks and inland “land islands in the ice” included in the land basin. No outlets occur here, but the land area and basin ID match the associated outlet.
if [[ ${domain} == "ice" ]]; then
log_info "Adjusting ice basins..."
g.rename raster=basins,basins_filled vector=basins,basins_filled
r.mapcalc "basins = basins_filled * mask_ice_holes@Citterio_2013"
r.to.vect -v input=basins output=basins type=area
v.db.dropcolumn map=basins column="label"
fi
if [[ ${domain} == "land" ]]; then
log_info "Adjusting land basins..."
g.rename raster=basins,basins_filled vector=basins,basins_filled
r.mapcalc "basins = basins_filled * not_ice@Citterio_2013"
r.to.vect -v input=basins output=basins type=area
v.db.dropcolumn map=basins column="label"
fi
<<add_metadata>>
<<add_stream_indices>>
- streams [2/2]
- [X] stream indices
- [X] stream length
- basin [2/2]
- [X] area
- [X] ice - has some ice contribution
- outlet [5/5]
- [X] acc value - no, can use area
- [X] BedMachine z_b
- [X] lon, lat
- [X] EPSG 3413 x, y
- [X] link margin outlets to coast outlet
log_info "Adding metadata..."
###
### streams
###
# v.db.addcolumn map=streams column="length DOUBLE"
v.to.db map=streams option=length column=length
###
### outlets
###
v.db.addcolumn map=outlets column="lon DOUBLE PRECISION"
v.db.addcolumn map=outlets column="lat DOUBLE PRECISION"
v.db.addcolumn map=outlets column="x INT"
v.db.addcolumn map=outlets column="y INT"
# v.db.addcolumn map=outlets column="cells INT"
v.db.addcolumn map=outlets column="elev INT"
# r.mask -r
v.what.rast map=outlets raster=x@PERMANENT column=x
v.what.rast map=outlets raster=y@PERMANENT column=y
v.what.rast map=outlets raster=z_b@BedMachine column=elev
# probably a more efficient way to get lon,lat column from x,y...
mkdir -p tmp
db.select -c sql='select x,y,cat from outlets' | m.proj -od input=- | tr '|' ',' > ./tmp/lonlat.csv
db.in.ogr input=./tmp/lonlat.csv output=lonlat
db.select table=lonlat|head
v.db.join map=outlets column=cat other_table=lonlat other_column=field_3
v.db.update map=outlets column=lon query_column=field_1
v.db.update map=outlets column=lat query_column=field_2
v.db.dropcolumn map=outlets columns=field_1,field_2,field_3
db.select table=outlets | head
# distance from outlet ice or coast
if [[ ${domain} == "ice" ]]; then # processing ice domain.
# Find which coast basin we're inside of
ice_domain=$(g.mapset -p)
land_domain=land_100
v.db.addcolumn map=outlets column="coast_id int"
v.what.rast map=outlets type=point raster=basins_filled@${land_domain} column=coast_id
v.db.addcolumn map=outlets column="coast_lon double"
v.db.addcolumn map=outlets column="coast_lat double"
v.db.addcolumn map=outlets column="coast_x int"
v.db.addcolumn map=outlets column="coast_y int"
g.copy vector=outlets@${land_domain},oland
db.execute sql='UPDATE outlets SET coast_lon=(SELECT lon from oland WHERE outlets.coast_id=oland.cat)'
db.execute sql='UPDATE outlets SET coast_lat=(SELECT lat from oland WHERE outlets.coast_id=oland.cat)'
db.execute sql='UPDATE outlets SET coast_x=(SELECT x from oland WHERE outlets.coast_id=oland.cat)'
db.execute sql='UPDATE outlets SET coast_y=(SELECT y from oland WHERE outlets.coast_id=oland.cat)'
fi
# Nearest upstream Zwally sector or Mouginot basin/region for each hydro outlet
v.db.addcolumn map=outlets columns="Z2012_sector INT"
v.db.addcolumn map=outlets columns="Z2012_sector_dist INT"
v.distance from=outlets to=sectors@Zwally_2012 upload=to_attr column=Z2012_sector to_column=cat_
v.distance from=outlets to=sectors@Zwally_2012 upload=dist column=Z2012_sector_dist
v.db.addcolumn map=outlets columns="M2019_ID INT"
v.db.addcolumn map=outlets columns="M2019_ID_dist INT"
v.db.addcolumn map=outlets columns="M2019_basin CHAR(99)"
v.db.addcolumn map=outlets columns="M2019_region CHAR(99)"
v.distance from=outlets to=basins@Mouginot_2019 upload=to_attr column=M2019_ID to_column=cat output=connection
v.distance from=outlets to=basins@Mouginot_2019 upload=dist column=M2019_ID_dist
v.distance from=outlets to=basins@Mouginot_2019 upload=to_attr column=M2019_basin to_column=NAME
v.distance from=outlets to=basins@Mouginot_2019 upload=to_attr column=M2019_region to_column=SUBREGION1
# Nearest Mankoff 2020 gate and distance to gate
v.db.addcolumn map=outlets columns="M2020_gate INT"
v.db.addcolumn map=outlets columns="M2020_gate_dist INT"
v.distance from=outlets to=gates@Mankoff_2020 upload=to_attr column=M2020_gate to_column=gate
v.distance from=outlets to=gates@Mankoff_2020 upload=dist column=M2020_gate_dist
# Nearest Bjørk 2015d name and distance to point
v.db.addcolumn map=outlets columns="B2015_name CHAR(99)"
v.db.addcolumn map=outlets columns="B2015_dist INT"
v.distance from=outlets to=names@Bjork_2015 upload=to_attr column=B2015_name to_column=Official_n
v.distance from=outlets to=names@Bjork_2015 upload=dist column=B2015_dist
###
### basins
###
# area of basin
v.to.db map=basins option=area column=area
log_info "r.stream.order: BEGIN"
date
time r.stream.order -m stream_rast=streams direction=dir elevation=head accumulation=ones@PERMANENT stream_vect=stream_vect strahler=strahler horton=horton shreve=shreve hack=hack topo=topological memory=16384
date
log_info "r.stream.order: END"
# g.copy vector=streams,foo --o
# g.copy vector=stream_vect,bar --o
for c in $(echo strahler horton shreve hack drwal_old topo_dim); do
db.execute sql="ALTER TABLE streams ADD COLUMN ${c} INT"
db.execute sql="UPDATE streams SET ${c}=(SELECT ${c} from stream_vect WHERE stream_vect.cat=streams.cat)"
done
for c in $(echo stright sinosoid cum_length source_elev outlet_elev); do
db.execute sql="ALTER TABLE streams ADD COLUMN ${c} double"
db.execute sql="UPDATE streams SET ${c}=(SELECT ${c} from stream_vect WHERE stream_vect.cat=streams.cat)"
done
# # fix typo: sinosoid -> sinusoid; stright -> straight
db.execute sql="ALTER TABLE streams ADD COLUMN sinusoid DOUBLE"
db.execute sql="UPDATE streams SET sinusoid = sinosoid"
# db.execute sql="ALTER TABLE streams DROP COLUMN sinosoid"
v.db.dropcolumn map=streams columns=sinosoid
db.execute sql="ALTER TABLE streams ADD COLUMN straight DOUBLE"
db.execute sql="UPDATE streams SET straight = stright"
# db.execute sql="ALTER TABLE streams DROP COLUMN stright"
v.db.dropcolumn map=streams columns=stright
log_info "Exporting..."
MAPSET=$(g.mapset -p)
mkdir -p freshwater/${MAPSET}
# db.select table=streams | tr '|' ',' > ./freshwater/${MAPSET}/streams.csv
# db.select table=outlets | tr '|' ',' > ./freshwater/${MAPSET}/outlets.csv
# db.select table=basins | tr '|' ',' > ./freshwater/${MAPSET}/basins.csv
parallel --bar "db.select table={} | tr '|' ',' > ./freshwater/${MAPSET}/{}.csv" ::: streams outlets basins
# v.out.ogr input=streams output=./freshwater/${MAPSET}/streams.gpkg --o
# v.out.ogr input=outlets output=./freshwater/${MAPSET}/outlets.gpkg --o
# v.out.ogr input=basins output=./freshwater/${MAPSET}/basins.gpkg --o
parallel --bar "v.out.ogr input={} output=./freshwater/${MAPSET}/{}.gpkg --o" ::: streams outlets basins basins_filled
- I set DATADIR as a bash environment variable in my login scripts.
- This is so that Python babel blocks can also easily get that property.
import os
DATADIR = os.environ['DATADIR']
<<get_DATADIR>>
print(DATADIR)
set -o nounset
set -o pipefail
# set -o errexit
### uncomment the above line when doing initial run. When rerunning and
### counting on GRASS failing w/ overwrite issues (speed increase), the
### line above must be commented
red='\033[0;31m'; orange='\033[0;33m'; green='\033[0;32m'; nc='\033[0m' # No Color
log_info() { echo -e "${green}[$(date --iso-8601=seconds)] [INFO] ${@}${nc}"; }
log_warn() { echo -e "${orange}[$(date --iso-8601=seconds)] [WARN] ${@}${nc}"; }
log_err() { echo -e "${red}[$(date --iso-8601=seconds)] [ERR] ${@}${nc}" >&2; }
trap ctrl_c INT # trap ctrl-c and call ctrl_c()
function ctrl_c() {
log_warn "Caught CTRL-C"
log_warn "Killing process"
kill -term $$ # send this program a terminate signal
}
debug() { if [[ ${debug:-} == 1 ]]; then log_warn "debug:"; echo "$@"; fi; }
<<GRASS_config>>
https://grass.osgeo.org/grass74/manuals/variables.html
| GRASS_VERBOSE | Behavior                                                        |
|---------------+-----------------------------------------------------------------|
| -1            | complete silence (also errors and warnings are discarded)       |
| 0             | only errors and warnings are printed                            |
| 1             | progress and important messages are printed (percent complete)  |
| 2             | all module messages are printed                                 |
| 3             | additional verbose messages are printed                         |
export GRASS_VERBOSE=3
# export GRASS_MESSAGE_FORMAT=silent
if [ -z ${DATADIR+x} ]; then
echo "DATADIR environment varible is unset."
echo "Fix with: \"export DATADIR=/path/to/data\""
exit 255
fi
set -x # print commands to STDOUT before running them
log_info "x, y, ones..."
MAPSET=$(g.mapset -p)
g.mapset PERMANENT
r.mapcalc "x = x()"
r.mapcalc "y = y()"
r.mapcalc "ones = 1"
g.mapset ${MAPSET}
Don’t process tiny land islands.
- This is supposed to remove small islands in the ocean
- It may also remove very small nunatuks
# remove islands
# frink "90 m^2 * 10 -> hectares" # 8.1
# frink "1 km^2 -> hectares" # 100
# value is in hectares
if [[ ${domain} == "land" ]]; then
r.reclass.area -d input=MASK output=MASK_nosmall value=100.1 mode=lesser method=rmarea
r.mask MASK_nosmall --o
fi
- Merge small (< 1 km^2) basins with their largest neighbor.
- Drop associated outlets too.
# absorb small basins and outlets
# frink "1.0 km^2 / ((90 * 90) m^2)" # 123.4567
# frink "1.0 km^2 / ((150 * 150) m^2)" # 45
# frink "1.0 km^2 / ((100 * 100) m^2)# #100
# minsize is in cells
r.clump -d input=basins_uniq output=basins_nosmall minsize=101
r.mode base=basins_nosmall cover=basins_uniq output=basins
r.to.vect -v input=basins output=basins type=area
v.db.dropcolumn map=basins column="label"
v.to.rast input=outlets_uniq output=outlets_uniq use=cat
# r.mapcalc "outlets = if(outlets_streams == basins, basins, null())"
r.mapcalc "outlets = if(outlets_uniq == basins, basins, null())"
r.to.vect -v input=outlets output=outlets type=point
v.db.dropcolumn map=outlets column=label
# db.dropcolumn -f table=outlets column=area
- Launches GRASS if not running.
- Changes to specified mapset if not already in it.
[[ -z ${mapset} ]] && mapset=PERMANENT
if [[ ${GRASS_VERSION:-x} == "x" ]]; then
[[ -d ./G ]] || grass -e -c EPSG:3413 ./G
[[ -d ./G/${mapset} ]] || grass -e -c ./G/${mapset}
grass ./G/${mapset}
else
[[ ${mapset} == $(g.mapset -p) ]] || g.mapset -c ${mapset} --q
fi
Example usage:
<<grass_init_mapset>>
echo "MAPSET is: " $(g.mapset -p)
echo ""
echo ""
echo "${data}" | tr '>' '\n' | grep -v -E "^ ?$" | grep -v "GRASS"
Example Usage:
<<grass_init_mapset>>
g.region -p
- This correction needs to be applied to any model data.
- It is easiest and fastest to generate an area correction raster for each of the two models on their exact grid.
- To do this, we set up model domains in GRASS, estimate the area correction for each cell, write out a NetCDF file of that raster, and then apply that to each day of the model data.
<<init>>
log_info "Area Error..."
MAR NetCDF files don’t contain projection information that can be used by GRASS. So I find the bottom, top, left, and right edges by…
- longitude where lat is max,
- longitude where lat is min
- latitude where lon is max
- latitude where lon is min
Then pass those four through m.proj to get the x,y bounds of the region in GRASS.
log_info "Creating MAR mapset..."
g.mapset -c MAR
# NOTE: The origin of the hard-code values used here can be found in the Org source file.
T=-45.039822,83.948792
B=-30.694536,58.800426
L=-89.264137,81.557274
R=7.516274,80.071167
Txy=$(m.proj -i coordinates=$T)
Bxy=$(m.proj -i coordinates=$B)
Lxy=$(m.proj -i coordinates=$L)
Rxy=$(m.proj -i coordinates=$R)
echo $Txy $Bxy $Lxy $Rxy
N=$(echo ${Txy} | cut -d"|" -f2)
S=$(echo ${Bxy} | cut -d"|" -f2)
E=$(echo ${Rxy} | cut -d"|" -f1)
W=$(echo ${Lxy} | cut -d"|" -f1)
g.region e=$E w=$W s=$S n=$N -pl res=1000
g.region w=w-500 e=e+500 n=n+500 s=s-500 res=1000 -p
g.region save=MAR
Note - this check diagnosed a misalignment of 1/2 MAR grid cell shifted SOUTH. A correction was implemented during the import phase, so when re-running this, everything may appear to line up properly initially, and then be shifted 1/2 cell (500 m) too far NORTH if the adjustment below is applied a second time.
Review:
grass ./G/MAR
g.region region=MAR -p
d.mon start=wx0
g.list type=raster mapset=ice_surf -m |cat
d.rast basins@ice_surf
g.list type=raster mapset=MAR -m |cat
r.to.vect input=mask_ice output=mask_ice type=area
d.vect mask_ice fill_color=none
- zoom in N - MAR mask seems 0.5 to 1 grid cell shifted south.
- same near Disko
- same in S
Let’s shift everything N by half a grid cell
g.region region=MAR -p
r.mapcalc "mask_ice_shift = mask_ice"
g.region s=s+500 n=n+500
r.region -c map=mask_ice_shift
diff <(g.region -up region=MAR) <(g.region -up)
diff <(g.region -up region=MAR) <(g.region -up raster=mask_ice_shift)
diff <(g.region -up) <(g.region -up raster=mask_ice_shift)
r.to.vect input=mask_ice_shift output=mask_ice_shift type=area
d.vect mask_ice_shift fill_color=none color=red
- EPSG:3413 has projection errors of ~±8 % in Greenland (see the sketch after this list)
- Method
- Email: Re: {GRASS-user} scale error for each pixel
- Webmail: https://www.mail-archive.com/grass-user@lists.osgeo.org/msg35005.html
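A minimal cross-check of the same areal distortion, assuming pyproj is installed (the sample coordinate is arbitrary):

# Minimal sketch: areal scale factor of EPSG:3413 at one point via pyproj,
# and its inverse, which is the per-cell multiplier (err_2D_area) used below.
from pyproj import Proj

p = Proj("EPSG:3413")
f = p.get_factors(longitude=-45.0, latitude=67.0)  # arbitrary point in Greenland
print(f.areal_scale)        # projected cell area / true area
print(1.0 / f.areal_scale)  # multiplier applied to per-cell values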
MAPSET=$(g.mapset -p)
log_info "2D Area Error for ${MAPSET}"
v.mkgrid map=grid position=region type=point
v.out.ascii grid | m.proj input=- -od | cut -d"|" -f1,2 | tr '|' ' ' > ./tmp/distortion_ll_${MAPSET}.txt
PROJSTR=$(g.proj -j | grep -v 'crs')
echo $PROJSTR
cat ./tmp/distortion_ll_${MAPSET}.txt \
| proj -VS ${PROJSTR} \
| grep Areal \
| column -t \
| sed s/\ \ /,/g \
| cut -d, -f4 \
> ./tmp/distortion_err_${MAPSET}.txt
paste -d " " <(m.proj -i input=./tmp/distortion_ll_${MAPSET}.txt separator=space | cut -d" " -f1,2) ./tmp/distortion_err_${MAPSET}.txt | r.in.xyz input=- output=err_2D_inv separator=space
r.mapcalc "err_2D_area = 1/(err_2D_inv)" # convert to multiplier
r.mapcalc "err_2D_line = 1/(err_2D_inv^0.5)" # convert area error to linear error
r.out.gdal --o -c -m input=err_2D_area output=./tmp/err_2D_area_${MAPSET}.nc type=Float32 format=netCDF
if [[ "" == $(g.list type=raster pattern=err_2D_area mapset=.) ]]; then
<<area_error>>
fi
log_info "Creating RACMO mapset..."
g.mapset -c RACMO
FILE=${DATADIR}/RACMO/freshwater/Icemask_Topo_Iceclasses_lon_lat_average_1km.nc
cdo -sinfo ${FILE}
x0=$(ncdump -v x ${FILE} | grep "^ x =" | cut -d" " -f4 | cut -d, -f1)
x1=$(ncdump -v x ${FILE} | tail -n2 | head -n1 | tr ',' '\n' | tail -n1 | cut -d" " -f2)
y0=$(ncdump -v y ${FILE} | grep "^ y =" | cut -d" " -f4 | cut -d, -f1)
y1=$(ncdump -v y ${FILE} | tail -n2 | head -n1 | tr ',' '\n' | tail -n1 | cut -d" " -f2)
g.region w=$x0 e=$x1 s=$y0 n=$y1 res=1000 -p
g.region s=s-500 n=n+500 e=e+500 w=w-500 -p
g.region save=RACMO
See: 2D Area Error
if [[ "" == $(g.list type=raster pattern=err_2D_area mapset=.) ]]; then
<<area_error>>
fi
At this point the static work is done - we have basins, streams, and outlets for the full ice and land domains.
The remaining tasks are to SUM the daily model variables by each basin, and assign results to the outlet for each basin.
Working with the provided NetCDF files directly is possible, but accessing them is very slow (in GRASS). I believe this has to do with extracting one variable from a multi-variable file, and/or chunking and compression issues. Given that the code operates day-by-day, from 1950 through December 2022 there are src_bash{echo $(( ($(date -d 2022-12-31 +%s) - $(date -d 1950-01-01 +%s)) / 86400 ))} {{{results(26662)}}} days, so cutting access time by 1 second saves ~7.4 hours. In reality, by pre-processing the NetCDF files access time is reduced from 10 seconds to 1 second, saving ~2.8 days of processing time (roughly halving the total processing time). The cost for this savings is a few dozen lines of code and a few hours of processing.
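A quick check of that arithmetic:

# Quick check of the time-savings arithmetic above.
from datetime import date

n_days = (date(2022, 12, 31) - date(1950, 1, 1)).days  # 26662 days of daily fields
print(n_days / 3600)       # hours saved per 1 s of access time (~7.4 h)
print(n_days * 9 / 86400)  # days saved reducing access from 10 s to 1 s (~2.8 d)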
- MAR: One RUNOFF variable that contains appropriate runoff for both ice and land
- RACMO: One RUNOFF variable that contains appropriate runoff for both ice and land
import xarray as xr
import numpy as np
import os
import glob
from tqdm import tqdm
from dask.diagnostics import ProgressBar
<<get_DATADIR>>
From Xavier:
RU(CORR) = the (sub grid topography corrected) runoff over the permanent ice part of a pixel. In your case, I suggest you to use RUcorr instead of RU.
RU2 = runoff over the tundra part of a pixel (land runoff).
SF/RF are provided everywhere.
MSK_GIMP = land/sea/ice mask.
Over land you can use RUcorr and RU2 as a function of MSK_GIMP. Over sea, you can use RF+SF.
- Use xarray citep:hoyer_2017 and dask citep:DASK so that we can efficiently process the ~330 GB of data (much larger than RAM).
area = xr.open_dataset("./tmp/err_2D_area_MAR.nc")
ROOT=DATADIR+"/MAR/3.13-freshwater"
filelist = np.sort(glob.glob(ROOT+"/MARv3.13-daily-ERA5-????.nc"))
years = [_.split("-")[-1].split(".")[0] for _ in filelist]
for y in years:
outfile = "./tmp/MAR_runoff_ice_" + y + ".nc"
if os.path.exists(outfile):
continue
infile = ROOT+"/MARv3.13-daily-ERA5-" + y + ".nc"
print("infile: ", infile, "outfile: ", outfile)
ds = xr.open_mfdataset(infile,
chunks={'time': 30},
combine='by_coords')
da = (ds['RUcorr'] * (ds['MSK'] == 2) * area['Band1'].values) # runoff over ice
da.variable.attrs = {'units':'mmWE/day','long_name':'Water runoff','standard_name':'Water runoff'}
ds_sub = da.to_dataset(name='runoff')
delayed_obj = ds_sub.to_netcdf(outfile, compute=False)
with ProgressBar(): results = delayed_obj.compute() # takes a few min...
del(ds); del(da); del(ds_sub); del(delayed_obj)
for y in years:
outfile = "./tmp/MAR_runoff_land_" + y + ".nc"
if os.path.exists(outfile):
continue
infile = ROOT+"/MARv3.13-daily-ERA5-" + y + ".nc"
print("infile: ", infile, "outfile: ", outfile)
ds = xr.open_mfdataset(infile,
chunks={'time': 30},
combine='by_coords')
da = ds['RU2'] * (ds['MSK'] == 1) * area['Band1'].values # runoff over land
da.variable.attrs = {'units':'mmWE/day','long_name':'Water runoff','standard_name':'Water runoff'}
ds_sub = da.to_dataset(name='runoff')
delayed_obj = ds_sub.to_netcdf(outfile, compute=False)
with ProgressBar(): results = delayed_obj.compute()
del(ds); del(da); del(ds_sub); del(delayed_obj)
- Sum land runoff for one year
- Sum ice runoff for one year
- There should be no overlap (and no gaps at the boundaries)
grass -c ./G/tmp/
g.region MAR@MAR
export GRASS_OVERWRITE=1
d.mon start=wx0
d.erase
### MAR ice
seq=$(seq 365)
parallel --bar 'r.external -o source="netCDF:./tmp/MAR_runoff_ice_2000.nc:runoff" output=ice.{} band={}' ::: ${seq}
parallel --bar 'r.region -c map=ice.{} --q' ::: ${seq}
r.series input=$(g.list type=raster pattern=ice.* separator=,) output=ice method=maximum
r.mapcalc "ice0 = if(ice == 0, null(), ice)"
r.colors -a map=ice0 color=blues
### MAR land
parallel --bar 'r.external -o source="netCDF:./tmp/MAR_runoff_land_2000.nc:runoff" output=land.{} band={}' ::: ${seq}
parallel --bar 'r.region -c map=land.{} --q' ::: ${seq}
r.series input=$(g.list type=raster pattern=land.* separator=,) output=land method=maximum
r.mapcalc "land0 = if(land == 0, null(), land)"
r.colors -a map=land0 color=green
r.mapcalc "test = if(ice0) + if(land0)"
r.info test # should be min=NULL max=NULL showing no overlap. What about gaps?
r.mapcalc "test = if(isnull(ice0), 0, 1) + if(isnull(land0), 0, 2)"
# min = 0, max = 2
TODO: Problem: There appears to be no land runoff.
Brice email: Re: All freshwater
The data consist of:
1 km: ice covered regions
- Daily total precipitation (rain + snow) and runoff in mm w.e. or kg/m2 per day covering the whole GrIS and peripheral ice caps (GICS) for the year 1979-2017.
- Mask file including: lon/lat, PROMICE mask (3 = GrIS, 1 and 2 = GICs) and topography from the GIMP DEM upscaled to 1 km.
The data are statistically downscaled from the output of RACMO2.3p2 (5.5 km) to 1 km resolution.
The projection used is Polar Stereographic North (EPSG:3413) and the resolution is exactly 1 km x 1 km.
5.5 km: tundra region
- Daily runoff and evaporation in mm w.e. or kg/m2 per day at 5.5 km for 1979-2017.
Use of the data:
Note that in RACMO2 a positive subl means condensation while negative subl actually means sublimation.
- This data set was provided in its current form by Brice.
- 1 km gridded ice runoff on the EPSG:3413 grid
area = xr.open_dataset("./tmp/err_2D_area_RACMO.nc")
ROOT=DATADIR+"/RACMO/freshwater"
msk = xr.open_dataset(ROOT+"/Icemask_Topo_Iceclasses_lon_lat_average_1km.nc")
msk['landmask'] = (msk['Promicemask'] == 0) & (msk['Topography'] != 0)
filelist = np.sort(glob.glob(ROOT+"/runoff_ice_1km/runoff*.nc"))
years = np.unique([_.split(".")[1].split("_")[0] for _ in filelist])
for i,y in enumerate(tqdm(years)):
outfile = "./tmp/RACMO_runoff_ice_" + y + ".nc"
if os.path.exists(outfile):
continue
filelist_year = [_ for _ in filelist if str(y) in _]
print("infile: ", filelist_year, "outfile: ", outfile)
ds = xr.open_mfdataset(filelist_year,
chunks={'time': 30},
combine='by_coords')
da = (ds['runoffcorr'] * (msk['Promicemask'] != 0) * area['Band1'].values) # ice runoff
da.variable.attrs = {'units':'mm w.e. per day',
'long_name':'Downscaled corrected snowmelt',
'standard_name':'Downscaled_corrected_snowmelt'}
ds_sub = da.to_dataset(name='runoff')
delayed_obj = ds_sub.to_netcdf(outfile, compute=False)
with ProgressBar(): results = delayed_obj.compute()
del(ds); del(da); del(ds_sub); del(delayed_obj)
- This data set provided at 5.5 km on a rotated pole grid
- Code to regrid onto 1 km EPSG:3413 was also provided.
- See the ${DATADIR}/RACMO/freshwater/runoff_land_1km_regrid/ folder for details.
area = xr.open_dataset("./tmp/err_2D_area_RACMO.nc")
ROOT=DATADIR+"/RACMO/freshwater"
ROOT_REGRID = ROOT + "/runoff_land_1km_regrid"
msk = xr.open_dataset(ROOT_REGRID+"/Tundra_Mask_1km.nc")
filelist = np.sort(glob.glob(ROOT_REGRID + "/out/runoff*.nc"))
years = np.unique([_.split(".")[1] for _ in filelist])
for i,y in enumerate(tqdm(years)):
outfile = "./tmp/RACMO_runoff_land_" + y + ".nc"
if os.path.exists(outfile):
continue
print("infile: ", filelist[i], "outfile: ", outfile)
ds = xr.open_mfdataset(filelist[i], chunks={'time': 30}, combine='by_coords')
da = ds['runoff'] * (msk['Tundra']) * area['Band1'].values # runoff over land
da.variable.attrs = {'units':'mm w.e. per day',
'long_name':'Downscaled corrected snowmelt',
'standard_name':'Downscaled_corrected_snowmelt'}
ds_sub = da.to_dataset(name='runoff')
delayed_obj = ds_sub.to_netcdf(outfile, compute=False)
with ProgressBar(): results = delayed_obj.compute()
del(ds); del(da); del(ds_sub); del(delayed_obj)
Domains should be independent.
grass -c ./G/tmp/
g.region RACMO@RACMO
export GRASS_OVERWRITE=1
d.mon start=wx0
d.erase
### RACMO ice
seq=$(seq 365)
parallel --bar 'r.external -o source="netCDF:./tmp/RACMO_runoff_ice_2000.nc:runoff" output=ice.{} band={}' ::: ${seq}
parallel --bar 'r.region -c map=ice.{} --q' ::: ${seq}
r.series input=$(g.list type=raster pattern=ice.* separator=,) output=ice method=maximum
r.mapcalc "ice0 = if(ice == 0, null(), ice)"
r.colors -a map=ice0 color=blues
### RACMO land
parallel --bar 'r.external -o source="netCDF:./tmp/RACMO_runoff_land_2000.nc:runoff" output=land.{} band={}' ::: ${seq}
parallel --bar 'r.region -c map=land.{} --q' ::: ${seq}
r.series input=$(g.list type=raster pattern=land.* separator=,) output=land method=maximum
r.mapcalc "land0 = if(land == 0, null(), land)"
r.colors -a map=land0 color=green
r.mapcalc "test = if(ice0) + if(land0)"
r.info test # should be min=NULL max=NULL showing no overlap.
r.mapcalc "test = if(isnull(ice0), 0, 1) + if(isnull(land0), 0, 2)" # What about gaps?
r.info test
# min = 0, max = 2
d.rast test
d.legend test
- Not an issue when the RCM provides two different domains and a mask (MAR)
- May be an issue when RCM results are interpolated onto a grid from elsewhere (RACMO)
import xarray as xr
import numpy as np
import glob
import os
from matplotlib.pyplot import imshow  # used for the quick visual checks below
area = xr.open_dataset("./tmp/err_2D_area_RACMO.nc")
DATADIR='/home/kdm/data'
ROOT=DATADIR+"/RACMO/freshwater"
### ICE
msk = xr.open_dataset(ROOT+"/Icemask_Topo_Iceclasses_lon_lat_average_1km.nc")
msk['landmask'] = (msk['Promicemask'] == 0) & (msk['Topography'] != 0)
filelist = np.sort(glob.glob(ROOT+"1km/runoff*.nc"))
years = np.unique([_.split(".")[1].split("_")[0] for _ in filelist])
i = 42
y = '2000'
filelist_year = [_ for _ in filelist if str(y) in _]
ds = xr.open_mfdataset(filelist_year,
chunks={'time': 30},
combine='by_coords')
imshow(ds['runoffcorr'][200,:,:], origin='lower')
### LAND
ROOT_REGRID = ROOT + "/regrid_5.5-to-1km"
msk = xr.open_dataset(ROOT_REGRID+"/Tundra_Mask_1km.nc")
filelist = np.sort(glob.glob(ROOT_REGRID + "/out/runoff*.nc"))
years = np.unique([_.split(".")[2].split(".")[0] for _ in filelist])
i = 3
y = '2000'
ds = xr.open_mfdataset(filelist[i], chunks={'time': 30}, combine='by_coords')
imshow(msk['Tundra'], origin='lower')
land_runoff = ds.squeeze()['runoff'].max(axis=0).compute()
imshow(land_runoff, origin='lower')
imshow(land_runoff * msk['Tundra'], origin='lower') # appears that ice runoff "bleeds" into land runoff.
- I expect not - the difference should be due to the area scaling
cdo -v selyear,2000 ./tmp/MAR_runoff_ice.nc ./tmp/MAR_Y2K_tmp.nc
cdo -v yearsum ./tmp/MAR_Y2K_tmp.nc ./tmp/MAR_Y2K.nc
cdo -v selyear,2000 ./tmp/MAR_runoff_land.nc ./tmp/MAR_Y2K_tmp.nc
cdo -v yearsum ./tmp/MAR_Y2K_tmp.nc ./tmp/MAR_Y2K_land.nc
cdo -v selyear,2000 ./tmp/RACMO_runoff_ice.nc ./tmp/RACMO_Y2K_tmp.nc
cdo -v yearsum ./tmp/RACMO_Y2K_tmp.nc ./tmp/RACMO_Y2K.nc
rm ./tmp/{MAR,RACMO}_Y2K_tmp.nc
<<init>>
Have:
| product | loc        | k         | RCM   | DEM       | variable   |
|---------+------------+-----------+-------+-----------+------------|
| runoff  | ice margin | 80,90,100 | MAR   | ArcticDEM | RUcorr     |
| runoff  | ice margin | 80,90,100 | RACMO | ArcticDEM | runoffcorr |
| runoff  | land coast | 100       | MAR   | ArcticDEM | RUcorr     |
| runoff  | land coast | 100       | RACMO | ArcticDEM | runoffcorr |
Can expand to include:
- [ ] product
- More products (e.g. precip, condensation, etc.)
- [ ] location
- Add Fjord Surface. Improve linking ice margin to coast
- [ ] RCM
- More RCMs? More times (future simulations?) Near-realtime Wx forecast?
- [ ] DEM
- Multiple DEMs? Higher res ArcticDEM?
- [X] Other
- Multiple DEM routing schemes (subglacial)
- The algorithm is designed to capture model data that is over basin ice.
- This works without extra effort when operating in the ice domain routed to the ice edge, because data outside the domain is masked.
- However, when working over the ice domain routed to the coast, this includes the land domain. Now 1 km^2 model grid cells that only partially cover an ice basin are included in their entirety.
- As a result, ice runoff routed to the ice edge would not equal ice runoff routed over land to the coast, even though the two should be equal.
- Therefore, I build a basin mask here
- Later, we’ll apply the smallest version of AND(basin_mask, model_mask) limiting to the smallest ice subset.
log_info "Initilazing MAR mapset..."
g.mapset MAR
g.region -d # default region, or raster=basins_merged@coast
r.mapcalc "mask_ice_basin = if(basins@ice_100, 1, null())"
r.mapcalc "mask_land_basin = if(not(isnull(basins@land_100)) & isnull(mask_ice_basin), 1, null())"
r.mapcalc "mask_ice_land_basin = if(not(isnull(basins_filled@land_100)), 1, null())"
g.region MAR@MAR -p
r.in.gdal -o input="NetCDF:${DATADIR}/MAR/3.13-freshwater/MARv3.13-daily-ERA5-2000.nc:MSK" output=MSK
r.region map=MSK region=MAR@MAR
r.mapcalc "mask_ice_MAR = if((MSK == 2) & mask_ice_land_basin, 1, null())"
r.mapcalc "mask_land_MAR = if((MSK == 1) & mask_ice_land_basin, 1, null())"
g.region -d
Below is…
- NON PARALLEL
- not tangled
- Run for 5 days
- Run parallelized version for 5 days
- Check outputs are the same
Timing Test:
r.mapcalc "model = 1" # One synthetic day
r.region map=model region=MAR@MAR --q
r.mask -r
time r.univar -t map=model zones=basins@ice_100 > /dev/null # 10-15 sec
### LOOP VERSION (~11 seconds per day)
rm ./tmp/MAR_ice_runoff.noparallel.bsv
rm ./tmp/MAR_ice_runoff.bsv
for t in $(seq 0 4); do
date
tt=$(( $t + 1 )) # T0 is 0-based. r.external is 1-based
DATESTR=$(date -d"2000-01-01+${t} days" +"%Y-%m-%d")
r.external -o source="netCDF:./tmp/MAR_runoff_ice_2000.nc:runoff" output=model band=${tt} --o --q
r.region map=model region=MAR@MAR --q
r.univar -t map=model zones=basins@ice_100 --q | cut -d"|" -f13 | datamash -t"|" transpose | sed s/^sum/${DATESTR}/ >> ./tmp/MAR_ice_runoff.noparallel.bsv
done
Implement above code but in parallel
function basin_partition() {
JOB="$1" # 0 to ndays-1
SLOT="$2" # 0 to ncpus-1
ZONES="$3" # basins@land_100, basins@ice_80, etc.
RCMfilevar=$4 # MAR_runoff_ice_2000.nc:runoff
REGION=$5 # MAR@MAR
RCMfile=${RCMfilevar/:*/} # <RCM>_runoff_<domain>_<k>.nc
r.external -o source="netCDF:tmp/${RCMfilevar}" output=model_${SLOT} band=${JOB} --o --q 2>/dev/null
r.region map=model_${SLOT} region=${REGION} --q
DATESTR0=$(ncdump -h tmp/${RCMfile} |grep time:units)
DATESTR0=${DATESTR0: 27:10}
DATESTR=$(date -d"${DATESTR0}+$(( ${JOB}-1 )) days" +"%Y-%m-%d")
r.univar -t map=model_${SLOT} zones=${ZONES} --q 2>/dev/null | cut -d"|" -f13 | datamash -t"|" transpose | sed s/^sum/${DATESTR}/
}
export -f basin_partition
DEBUG: Process a few days and make sure the output is the same as the NON PARALLEL version above.
seq 5 | parallel --keep-order --bar basin_partition {#} {%} basins@ice_100 "MAR_runoff_ice_1980.nc:runoff" MAR@MAR >> ./tmp/MAR_ice_runoff.bsv # DEBUG
diff -q ./tmp/MAR_ice_runoff.bsv ./tmp/MAR_ice_runoff.noparallel.bsv
head -n1 ./tmp/MAR_ice_runoff.bsv | tr '|' '\n' | wc
tail -n1 ./tmp/MAR_ice_runoff.bsv | tr '|' '\n' | wc
We want to calculate
- ice runoff to ice margin
- land runoff to coast
Each is calculated separately, so it is easiest to build the ZONES input for r.univar to cover only what we want.
Set up header
log_info "Building BSV headers"
# r.external -o source="netCDF:./tmp/MAR_runoff_ice.nc:runoff" output=model band=1 --o --q # load in a sample day
# r.region map=model region=MAR@MAR
r.mapcalc "model = 1" --o # dummy data to build headers
for k_pct in 100 90 80; do
r.univar -t map=model zones=basins@ice_${k_pct} \
| sed s/^zone/cat/ \
| cut -d"|" -f1 \
| datamash -t"|" transpose \
> ./tmp/header_ice_${k_pct}.bsv
done
# land only 100
k_pct=100
r.univar -t map=model zones=basins@land_${k_pct} \
| sed s/^zone/cat/ \
| cut -d"|" -f1 \
| datamash -t"|" transpose \
> ./tmp/header_land_${k_pct}.bsv
log_info "Using EXTRA GNU parallel options via \${PARALLEL}: ${PARALLEL:-Not set}"
for y in $(seq 1950 2022); do
for k_pct in 100 90 80; do
file_base=MAR_runoff_ice_${y}
infile=${file_base}.nc
outfile=${file_base}_${k_pct}.bsv
log_info "Calculating MAR ice runoff at ice outlets: ${outfile}"
if [[ -e ./dat/${outfile} ]]; then
log_warn "Output file ${outfile} exists. Skipping..."
else
cp ./tmp/header_ice_${k_pct}.bsv ./dat/${outfile}
T1=$(ncdump -v time ./tmp/${infile} | grep -A 100000 "^ time =" | tr -dc [0-9,] | rev | cut -d, -f1 | rev)
seq 0 ${T1} \
| parallel --nice 9 --keep-order --bar basin_partition {#} {%} basins@ice_${k_pct} "${infile}:runoff" MAR@MAR \
>> ./dat/${outfile}
fi
done
done
I expect outputs to be less than inputs because not every model ice cell is fully sampled: not every model ice cell has an ice basin under it, and model ice cells over land are not counted.
import pandas as pd
import xarray as xr
import numpy as np
<<get_DATADIR>>
mask = xr.open_dataset(DATADIR+"/MAR/3.13-freshwater/MARv3.13-daily-ERA5-2000.nc")['MSK']
# First day match? # convert mm/grid cell to m^3
o = pd.read_csv('./dat/runoff_ice2margin_MAR.bsv', sep="|", index_col=0, nrows=1) * 1E-3 * 100 * 100
# first year of input
i = xr.open_dataset('./dat/MAR_runoff_ice.nc').isel(time=0)['runoff'] * 1E-3 * 1000 * 1000 * (mask == 2)
osum = o.sum().sum()
isum = i.sum().values
print(isum, osum)
print(isum/osum)
# check for first year
o = pd.read_csv('./dat/runoff_ice2margin_MAR.bsv', sep="|", index_col=0, nrows=366) * 1E-3 * 100 * 100
i = xr.open_dataset('./dat/MAR_runoff_ice.nc').isel(time=np.arange(366))['runoff'] * 1E-3 * 1000 * 1000 * (mask == 2)
osum = o.sum().sum()
isum = i.sum().values
print(isum, osum)
print(isum/osum)
- Inputs are ~4 % larger than outputs over the first year
- Due to EPSG:3413 scaling?
- Due to Arctic Canada in this mask?
log_info "Using EXTRA GNU parallel options via \${PARALLEL}: ${PARALLEL:-Not set}"
k_pct=100
for y in $(seq 1950 2022); do
file_base=MAR_runoff_land_${y}
infile=${file_base}.nc
outfile=${file_base}_${k_pct}.bsv
log_info "Calculating MAR land runoff at land outlets: ${outfile}"
if [[ -e ./dat/${outfile} ]]; then
log_warn "Output file ${outfile} exists. Skipping..."
else
cp ./tmp/header_land_${k_pct}.bsv ./dat/${outfile}
T1=$(ncdump -v time ./tmp/${infile} | grep -A 100000 "^ time =" | tr -dc [0-9,] | rev | cut -d, -f1 | rev)
seq 0 ${T1} \
| parallel --nice 9 --keep-order --bar basin_partition {#} {%} basins@land_${k_pct} "${infile}:runoff" MAR@MAR \
>> ./dat/${outfile}
fi
done
g.mapset -c RACMO
g.region -d
r.mapcalc "mask_ice_basin = if(basins@ice_100, 1, null())"
r.mapcalc "mask_land_basin = if(not(isnull(basins@land_100)) & isnull(mask_ice_basin), 1, null())"
r.mapcalc "mask_ice_land_basin = if(not(isnull(basins_filled@land_100)), 1, null())"
g.region RACMO@RACMO -p
FILE=${DATADIR}/RACMO/freshwater/Icemask_Topo_Iceclasses_lon_lat_average_1km.nc
r.in.gdal -o input="NetCDF:${FILE}:Promicemask" output=mask_ice
r.region map=mask_ice region=RACMO@RACMO
r.mapcalc "mask_ice_RACMO = if((mask_ice != 0) & mask_ice_land_basin, 1, null())"
r.mapcalc "mask_land_RACMO = if((mask_ice == 0) & mask_ice_land_basin, 1, null())"
g.region -d
r.mapcalc "model = 1" # One synthetic day
r.region map=model region=RACMO@RACMO
g.mapset RACMO
log_info "Using EXTRA GNU parallel options via \${PARALLEL}: ${PARALLEL:-Not set}"
# ice
for y in $(seq 1958 2022); do
for k_pct in 100 90 80; do
file_base=RACMO_runoff_ice_${y}
infile=${file_base}.nc
outfile=${file_base}_${k_pct}.bsv
log_info "Calculating RACMO ice runoff at ice outlets: ${outfile}"
if [[ -e ./dat/${outfile} ]]; then
log_warn "Output file ${outfile} exists. Skipping..."
else
cp ./tmp/header_ice_${k_pct}.bsv ./dat/${outfile}
T0=$(ncdump -v time ./tmp/${infile} | grep -A 100000 "^ time =" | tr -dc [0-9,] | cut -d, -f1)
T1=$(ncdump -v time ./tmp/${infile} | grep -A 100000 "^ time =" | tr -dc [0-9,] | rev | cut -d, -f1 | rev)
seq ${T0} ${T1} \
| parallel --nice 9 --keep-order --bar basin_partition {#} {%} basins@ice_${k_pct} "${infile}:runoff" RACMO@RACMO \
>> ./dat/${outfile}
fi
done
done
# land
k_pct=100
for y in $(seq 1958 2022); do
file_base=RACMO_runoff_land_${y}
infile=${file_base}.nc
outfile=${file_base}_${k_pct}.bsv
log_info "Calculating RACMO land runoff at land outlets: ${outfile}"
if [[ -e ./dat/${outfile} ]]; then
log_warn "Output file ${outfile} exists. Skipping..."
else
cp ./tmp/header_land_${k_pct}.bsv ./dat/${outfile}
T0=$(ncdump -v time ./tmp/${infile} | grep -A 100000 "^ time =" | tr -dc [0-9,] | cut -d, -f1)
T1=$(ncdump -v time ./tmp/${infile} | grep -A 100000 "^ time =" | tr -dc [0-9,] | rev | cut -d, -f1 | rev)
seq ${T0} ${T1} \
| parallel --nice 9 --keep-order --bar basin_partition {#} {%} basins@land_${k_pct} "${infile}:runoff" RACMO@RACMO \
>> ./dat/${outfile}
fi
done
grass -c ./G/fjords
g.region -d res=1000 -a -p
r.mask -r
d.mon start=wx0
d.erase
d.rast shade
d.rast land_shade
# fill everything (Disko bay, Geiki, fjords, etc.)
r.grow -m input=head@coast output=grow radius=50000 old=-1 new=1 --o
# shrink back to just the coastal region
r.grow -m input=grow output=shrink radius=-44000 --o
r.null map=shrink setnull=-1
# fill in cells that are part land/coast
# but might have land at the center at 30 m res
r.grow -m input=shrink output=buffer_coast radius=1001 new=1 --o
d.erase
d.rast shade
d.rast buffer_coast
# for display purposes?
r.to.vect input=buffer_coast output=buffer_coast type=point
# then, r.out.xyz for the model data using buffer_coast as a mask.
# This should produce "x,y,val" for each time step
grass -c ./G/tmp/
export GRASS_OVERWRITE=1
d.mon start=wx0
d.erase
### ICE
d.rast basins@ice_80
# OR
r.to.vect input=basins@ice_80 output=zones type=area
d.vect zones fill_color=none color=red
### MAR
band=250
r.external -o source="netCDF:./tmp/MAR_runoff_ice_2000.nc:runoff" output=map band=${band}
r.region map=map region=MAR@MAR --q
r.colors -a map=map color=viridis
d.rast map
d.vect zones fill_color=none color=red
### RACMO
band=200
r.external -o source="netCDF:./tmp/RACMO_runoff_ice_1958.nc:runoff" output=map band=${band}
r.region map=map region=RACMO@RACMO --q
r.colors -a map=map color=viridis
d.rast map
### LAND
# r.to.vect input=basins@land_100 output=zones type=area
# d.vect zones fill_color=none color=red
d.rast basins@land_100
# MAR
band=220
r.external -o source="netCDF:./tmp/MAR_runoff_land_1999.nc:runoff" output=map band=${band}
r.region map=map region=MAR@MAR --q
r.colors -a map=map color=viridis
d.rast map
# RACMO
band=220
r.external -o source="netCDF:./tmp/RACMO_runoff_land_1999.nc:runoff" output=map band=${band}
r.region map=map region=RACMO@RACMO --q
r.colors -a map=map color=viridis
d.rast map
# r.to.vect input=map_${band} output=map_${band} type=area
# d.vect map_${band} fill_color=none color=blue
<<init>>
See [ Coverage ]
- How much of each ice basin is (1) (is not (*)) covered by model ice cells.
for k in 80 90 100; do
mapset=ice_${k}
g.mapset ${mapset}
r.mask -r
r.stats --q -aN input=basins@${mapset},mask_ice_MAR@MAR separator=, \
> ./tmp/coverage_MAR_ice_${k}.csv
r.stats --q -aN input=basins@${mapset},mask_ice_RACMO@RACMO separator=, \
> ./tmp/coverage_RACMO_ice_${k}.csv
done
- How much of each land basin is (1) (is not (*)) covered by model land cells.
k=100
mapset=land_${k}
g.mapset ${mapset}
r.mask -r
r.stats --q -aN input=basins@${mapset},mask_land_MAR@MAR separator=, \
> ./tmp/coverage_MAR_land_${k}.csv
r.stats --q -aN input=basins@${mapset},mask_land_RACMO@RACMO separator=, \
> ./tmp/coverage_RACMO_land_${k}.csv
Seems the easiest (and fastest) way to apply coverage is to write to NetCDF that is in the same format as the runoff NetCDF files so that they can easily be multiplied together.
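A minimal sketch of that multiplication; the coverage file and variable names are illustrative:

# Minimal sketch: multiply a runoff NetCDF by a coverage NetCDF written on the
# same grid / with the same shape. The coverage file name and variable are
# hypothetical; the runoff file follows the ./tmp naming used in this workflow.
import xarray as xr

runoff = xr.open_dataset("tmp/MAR_runoff_ice_2012.nc")
coverage = xr.open_dataset("tmp/MAR_coverage_ice.nc")  # hypothetical coverage file
corrected = runoff["runoff"] * coverage["coverage"]    # element-wise; broadcasts over time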
Convert annual BSV files to annual NetCDF format files
Want to produce CF-compliant and user-friendly NetCDF that will work with nco and panoply.
- SEE: https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/ timeSeriesProfile
- https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/timeSeriesOrthogonal.cdl
- http://cfconventions.org/cf-conventions/cf-conventions.html#time-series-data
Also modeling after:
- ~/data/Bamber_2018/FWF17.v3_a.nc
- See also: https://gis.stackexchange.com/questions/67012/using-netcdf-for-point-time-series-observations
This code:
- Can be run 1x within Org to produce ds.nc for testing purposes
- Is also included in the tangled bsv2netcdf.py script
In the bsv2netcdf.py script, pass the input file in, and the output file is determined from the input file name.
Can be run in parallel (for example in the Makefile) with: parallel --bar "python bsv2netcdf.py {}" ::: $(ls dat/*_runoff_*.bsv)
import pandas as pd
import xarray as xr
import numpy as np
import datetime
import subprocess
import os
# if "BSVFILE" not in locals(): BSVFILE="./tmp/runoff_ice2margin_MAR.bsv.1979"
#if "BSVFILE" not in locals(): BSVFILE="./tmp/runoff_ice2margin_MAR.bsv.2000"
if "BSVFILE" not in locals(): BSVFILE="./dat/RACMO_runoff_land_1979_100.bsv"
df = pd.read_csv(BSVFILE, index_col=0, delimiter="|")
df.index = pd.to_datetime(df.index)
RCM, var, domain, year, k = os.path.basename(os.path.splitext(BSVFILE)[0]).split("_")
# patch RACMO dates
if (RCM == 'RACMO') & (domain == 'land'):
df.index = pd.date_range(start=str(year)+'-01-01', end=str(year)+'-12-31')[0:df.index.size]
metadir = domain + "_" + k
meta = pd.read_csv("freshwater/" + metadir + "/outlets.csv" , index_col=0).T
meta = meta[df.columns.astype(int)]
ds = xr.Dataset()
ds["station"] = (("station"), df.columns.astype(np.str).astype(np.uint32))
ds["time"] = (("time"), df.index)
ds["station"].attrs["long_name"] = "outlet id"
ds["station"].attrs["cf_role"] = "timeseries_id"
# ds["station"].attrs["units"] = ""
ds["time"].attrs["long_name"] = "time of measurement"
ds["time"].attrs["standard_name"] = "time"
# ds["time"].attrs["units"] = "day of year"
# ds["time"].attrs["calendar"] = "julian"
ds["time"].attrs["axis"] = "T"
# gridsize = 100 x 100
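# CONV converts the BSV values (mm w.e. day-1 on the 100 m x 100 m basin grid,
# per the gridsize note above) to m3 s-1:
# mm -> m (1E-3), per-cell area (100 m * 100 m), day -> s (1/86400).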
CONV = 1E-3 * 100 * 100 / 86400
ds["discharge"] = (("station", "time"), df.values.T*CONV)
ds["discharge"].attrs["long_name"] = RCM + " discharge"
ds["discharge"].attrs["standard_name"] = "water_volume_transport_in_river_channel"
ds["discharge"].attrs["units"] = "m3 s-1"
ds["discharge"].attrs["coordinates"] = "lat lon alt station"
ds["lat"] = (("station"), meta.loc['lat'].astype(np.float32))
#ds["lat"].attrs["coordinates"] = "station"
ds["lat"].attrs["long_name"] = "latitude"
ds["lat"].attrs["standard_name"] = "latitude"
ds["lat"].attrs["units"] = "degrees_north"
ds["lat"].attrs["axis"] = "Y"
ds["lon"] = (("station"), meta.loc['lon'].astype(np.float32))
#ds["lon"].attrs["coordinates"] = "station"
ds["lon"].attrs["long_name"] = "longitude"
ds["lon"].attrs["standard_name"] = "longitude"
ds["lon"].attrs["units"] = "degrees_east"
ds["lon"].attrs["axis"] = "X"
ds["alt"] = (("station"), meta.loc['elev'].astype(np.float32))
ds["alt"].attrs["long_name"] = "height_above_mean_sea_level"
ds["alt"].attrs["standard_name"] = "altitude"
# ds["alt"].attrs["long_name"] = "height above mean sea level"
# ds["alt"].attrs["standard_name"] = "height"
ds["alt"].attrs["units"] = "m"
ds["alt"].attrs["positive"] = "up"
ds["alt"].attrs["axis"] = "Z"
ds["Z2012_sector"] = (("station"), meta.loc["Z2012_sector"].replace(np.nan, value=-1).astype(np.uint32))
ds["Z2012_sector"].attrs["long_name"] = "Zwally (2012) sector nearest outlet"
ds["Z2012_sector_dist"] = (("station"), meta.loc["Z2012_sector_dist"].replace(np.nan, value=-1).astype(np.uint32))
ds["Z2012_sector_dist"].attrs["long_name"] = "Zwally (2012) sector nearest outlet distance"
ds["M2019_ID"] = (("station"), meta.loc["M2019_ID"].replace(np.nan, value=-1).astype(np.uint32))
ds["M2019_ID"].attrs["long_name"] = "Mouginot (2019) basin nearest outlet ID"
ds["M2019_ID_dist"] = (("station"), meta.loc["M2019_ID_dist"].replace(np.nan, value=-1).astype(np.uint32))
ds["M2019_ID_dist"].attrs["long_name"] = "Mouginot (2019) basin nearest outlet ID distance"
ds["M2019_basin"] = (("station"), meta.loc["M2019_basin"])
ds["M2019_basin"].attrs["long_name"] = "Mouginot (2019) basin nearest outlet name"
ds["M2019_region"] = (("station"), meta.loc["M2019_region"])
ds["M2019_region"].attrs["long_name"] = "Mouginot (2019) region nearest outlet"
ds["M2020_gate"] = (("station"), meta.loc["M2020_gate"].replace(np.nan, value=-1).astype(np.uint32))
ds["M2020_gate"].attrs["long_name"] = "Mankoff (2020) gate nearest outlet"
ds["M2020_gate_dist"] = (("station"), meta.loc["M2020_gate_dist"].replace(np.nan, value=-1).astype(np.uint32))
ds["M2020_gate_dist"].attrs["long_name"] = "Mankoff (2020) gate nearest outlet distance"
ds["B2015_name"] = (("station"), meta.loc["B2015_name"])
ds["B2015_name"].attrs["long_name"] = "Bjørk (2015) Official Name gate nearest outlet"
ds["B2015_dist"] = (("station"), meta.loc["B2015_dist"])
ds["B2015_dist"].attrs["long_name"] = "Bjørk (2015) Official Name point distance"
if 'ice' in BSVFILE:
ds["coast_id"] = (("station"), meta.loc["coast_id"].replace(np.nan, value=-1).astype(np.uint32))
ds["coast_id"].attrs["long_name"] = "ID of coastal outlet"
ds["coast_lat"] = (("station"), meta.loc['coast_lat'].astype(np.float32))
ds["coast_lat"].attrs["long_name"] = "latitude"
ds["coast_lat"].attrs["standard_name"] = "latitude"
ds["coast_lat"].attrs["units"] = "degrees_north"
ds["coast_lat"].attrs["axis"] = "Y"
ds["coast_lon"] = (("station"), meta.loc['coast_lon'].astype(np.float32))
ds["coast_lon"].attrs["long_name"] = "longitude"
ds["coast_lon"].attrs["standard_name"] = "longitude"
ds["coast_lon"].attrs["units"] = "degrees_east"
ds["coast_lon"].attrs["axis"] = "X"
ds["coast_alt"] = (("station"), meta.loc['elev'].astype(np.float32))
ds["coast_alt"].attrs["long_name"] = "height_above_mean_sea_level"
ds["coast_alt"].attrs["standard_name"] = "altitude"
ds["coast_alt"].attrs["units"] = "m"
ds["coast_alt"].attrs["positive"] = "up"
ds["coast_alt"].attrs["axis"] = "Z"
ds.attrs["featureType"] = "timeSeries"
ds.attrs["title"] = "Greenland discharge"
ds.attrs["summary"] = "Greenland RCM discharge at basin outlets"
ds.attrs["keywords"] = "Hydrology; Greenland; Runoff; Discharge; Freshwater"
ds.attrs["Conventions"] = "CF-1.7"
ds.attrs["source"] = "git commit: " + \
subprocess.check_output(["git", "describe", "--always"]).strip().decode('UTF-8')
# ds.attrs["comment"] = "TODO"
# ds.attrs["acknowledgment"] = "TODO"
# ds.attrs["license"] = "TODO"
# ds.attrs["date_created"] = datetime.datetime.now().strftime("%Y-%m-%d")
ds.attrs["creator_name"] = "Ken Mankoff"
ds.attrs["creator_email"] = "kdm@geus.dk"
ds.attrs["creator_url"] = "http://kenmankoff.com"
ds.attrs["institution"] = "GEUS"
# ds.attrs["time_coverage_start"] = "TODO"
# ds.attrs["time_coverage_end"] = "TODO"
# ds.attrs["time_coverage_resolution"] = "TODO"
ds.attrs["references"] = "10.22008/promice/freshwater"
ds.attrs["product_version"] = 1.0
And now noweb-weave the bsv2xarray block above into bsv2netcdf.py to convert all the BSV files to NetCDF files.
import numpy as np
import pandas as pd
import xarray as xr
import sys
import os
f = sys.argv[1]
# f = './dat/MAR_runoff_ice_1979_100.bsv'
RCM, var, domain, year, k = os.path.basename(os.path.splitext(f)[0]).split("_")
#MAR runoff ice 2000 100
# Set up output folder and file name
OUTDIR = "./freshwater/" + domain + "_" + k + "/discharge" # ./freshwater_90 or ./freshwater_80
# if k == '100': OUTDIR = "./freshwater/" + domain + "/discharge" # ./freshwater/
if not os.path.exists(OUTDIR):
os.makedirs(OUTDIR, exist_ok=True)
OUTBASE = RCM + "_" + year
OUTFILE = OUTDIR + "/" + OUTBASE + ".nc"
# Generate xarray structure
BSVFILE=f
<<bsv2xarray>>
# load and apply coverage scaling
cf = "coverage_" + RCM + "_" + domain +"_" + k + ".nc"
c = xr.open_dataset("./tmp/"+cf)
c = c.where(c['runoff'] != 0, np.nan, drop=False)
ds['discharge'].values = ds['discharge']/c['runoff'] # apply coverage
ds.to_netcdf(OUTFILE, unlimited_dims='time', encoding={'discharge': {'zlib': True, 'complevel': 2}}, engine='netcdf4', mode='w')
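A quick, optional check of a resulting per-year file (not part of the tangled workflow; the file name is only an example):
import xarray as xr
ds = xr.open_dataset("./freshwater/ice_100/discharge/MAR_2000.nc")
print(ds)                                  # stations, time, discharge, and outlet metadata
print(ds["discharge"].sum(dim="station"))  # total discharge across all outlets, per day [m3 s-1]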
Note:
ln -s ice_100 ice
ln -s land_100 land
import xarray as xr
import numpy as np
for d in ['ice','land']:
for r in ['RACMO','MAR']:
print(d,r)
ds = xr.open_mfdataset('./freshwater/'+d+'/discharge/'+r+'_*.nc', combine='by_coords')
for v in ['Z2012_sector', 'Z2012_sector_dist',
'M2019_ID','M2019_ID_dist','M2019_region','M2019_basin',
'M2020_gate','M2020_gate_dist',
'B2015_name','B2015_dist']:
ds[v] = ds[v].isel({'time':1}).squeeze()
if d == 'ice':
for v in ['coast_id','coast_lat','coast_lon','coast_alt']:
ds[v] = ds[v].isel({'time':1}).squeeze()
for v in ['M2019_basin','M2019_region','B2015_name']:
ds[v] = ds[v].astype(str)
for v in ['B2015_dist','M2020_gate_dist','M2020_gate',
'M2019_ID_dist','M2019_ID','Z2012_sector_dist']:
ds[v] = ds[v].astype(np.uint32)
ds.to_netcdf('./freshwater/'+d+'/'+r+'.nc',
unlimited_dims='time',
encoding={'discharge': {'zlib': True, 'complevel': 2},
'Z2012_sector': {'zlib': True, 'complevel': 2},
'M2019_ID': {'zlib': True, 'complevel': 2},
'M2019_basin': {'zlib': True, 'complevel': 2},
'M2019_region': {'zlib': True, 'complevel': 2},
'B2015_name': {'zlib': True, 'complevel': 2},
'M2020_gate': {'zlib': True, 'complevel': 2},
},
engine='netcdf4', mode='w')
This code, and all code files in this project, are derived products tangled from the sob.org source file.
freshwater.zip: G run
# dist
G:
grass -e -c EPSG:3413 ./G
run: FORCE
# basins, outlets, and streams
grass ./G/PERMANENT --exec ./import.sh
grass ./G/PERMANENT --exec ./sob.sh
# how do basins change under different SOB routing?
grass ./G/PERMANENT --exec ./k_basin_change.sh
# 2D EPSG area error
grass ./G/PERMANENT --exec ./area_error.sh
# RCM
mkdir -p dat
python ./rcm_preprocess.py
grass ./G/PERMANENT --exec ./rcm.sh # partition RCM output to basins
# coverage
grass ./G/PERMANENT --exec ./coverage_calc.sh
python coverage_csv2nc.py
# create end-user data product
parallel --bar "python bsv2netcdf.py {}" ::: $$(ls dat/*_runoff_*.bsv)
python netcdf_combine.py
dist: freshwater.zip
zip -r freshwater.zip freshwater
FORCE: # dummy target
clean_all:
rm -fR G tmp dat freshwater freshwater.zip
clean_RCM:
rm -fR G/MAR G/RACMO
clean_sob:
rm -fR G/land_* G/ice_*
- Show enough information about data files and metadata to aid in reproducibility
file=${DATADIR}/ArcticDEM/arcticdem_mosaic_100m_v4.1_dem.tif
md5sum ${file}
echo ""
gdalinfo ${file}
be66611e17eb925404daa080e690522a /home/kdm/data/ArcticDEM/arcticdem_mosaic_100m_v4.1_dem.tif Driver: GTiff/GeoTIFF Files: /home/kdm/data/ArcticDEM/arcticdem_mosaic_100m_v4.1_dem.tif Size is 74002, 75002 Coordinate System is: PROJCRS["WGS 84 / NSIDC Sea Ice Polar Stereographic North", BASEGEOGCRS["WGS 84", ENSEMBLE["World Geodetic System 1984 ensemble", MEMBER["World Geodetic System 1984 (Transit)"], MEMBER["World Geodetic System 1984 (G730)"], MEMBER["World Geodetic System 1984 (G873)"], MEMBER["World Geodetic System 1984 (G1150)"], MEMBER["World Geodetic System 1984 (G1674)"], MEMBER["World Geodetic System 1984 (G1762)"], MEMBER["World Geodetic System 1984 (G2139)"], ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]], ENSEMBLEACCURACY[2.0]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",4326]], CONVERSION["US NSIDC Sea Ice polar stereographic north", METHOD["Polar Stereographic (variant B)", ID["EPSG",9829]], PARAMETER["Latitude of standard parallel",70, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8832]], PARAMETER["Longitude of origin",-45, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8833]], PARAMETER["False easting",0, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",0, LENGTHUNIT["metre",1], ID["EPSG",8807]]], CS[Cartesian,2], AXIS["easting (X)",south, MERIDIAN[45, ANGLEUNIT["degree",0.0174532925199433]], ORDER[1], LENGTHUNIT["metre",1]], AXIS["northing (Y)",south, MERIDIAN[135, ANGLEUNIT["degree",0.0174532925199433]], ORDER[2], LENGTHUNIT["metre",1]], USAGE[ SCOPE["Polar research."], AREA["Northern hemisphere - north of 60°N onshore and offshore, including Arctic."], BBOX[60,-180,90,180]], ID["EPSG",3413]] Data axis to CRS axis mapping: 1,2 Origin = (-4000100.000000000000000,4100100.000000000000000) Pixel Size = (100.000000000000000,-100.000000000000000) Metadata: AREA_OR_POINT=Area Image Structure Metadata: COMPRESSION=LZW INTERLEAVE=BAND LAYOUT=COG PREDICTOR=3 Corner Coordinates: Upper Left (-4000100.000, 4100100.000) (179d17'33.71"E, 40d21'17.26"N) Lower Left (-4000100.000,-3400100.000) ( 94d38' 7.22"W, 44d 3'59.59"N) Upper Right ( 3400100.000, 4100100.000) ( 95d19'55.26"E, 43d27'54.66"N) Lower Right ( 3400100.000,-3400100.000) ( 0d 0' 0.01"E, 47d34'59.22"N) Center ( -300000.000, 350000.000) (175d36' 4.66"E, 85d44'47.27"N) Band 1 Block=512x512 Type=Float32, ColorInterp=Gray NoData Value=-9999 Overviews: 37001x37501, 18500x18750, 9250x9375, 4625x4687, 2312x2343, 1156x1171, 578x585, 289x292
file=${DATADIR}/Citterio_2013/PROMICE_250_2019-12-04_EPSG4326/PROMICE_250_2019-12-04.shp
md5sum ${file}
a69e8f7fa340d8998c69aef539de8c58 /home/kdm/data/Citterio_2013/PROMICE_250_2019-12-04_EPSG4326/PROMICE_250_2019-12-04.shp
file=${DATADIR}/GIMP/0714/1999.07.01/GimpOceanMask_90m
md5sum ${file}
echo ""
gdalinfo ${file}
99e30ed3209ef4c6d4fe4e84cd76596c /home/kdm/data/GIMP/0714/1999.07.01/GimpOceanMask_90m Driver: GTiff/GeoTIFF Files: /home/kdm/data/GIMP/0714/1999.07.01/GimpOceanMask_90m Size is 16620, 30000 Coordinate System is: PROJCRS["unnamed", BASEGEOGCRS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",4326]], CONVERSION["Polar Stereographic (variant B)", METHOD["Polar Stereographic (variant B)", ID["EPSG",9829]], PARAMETER["Latitude of standard parallel",70, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8832]], PARAMETER["Longitude of origin",-45, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8833]], PARAMETER["False easting",1, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",1, LENGTHUNIT["metre",1], ID["EPSG",8807]]], CS[Cartesian,2], AXIS["(E)",south, MERIDIAN[90, ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]], ORDER[1], LENGTHUNIT["metre",1]], AXIS["(N)",south, MERIDIAN[180, ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]], ORDER[2], LENGTHUNIT["metre",1]]] Data axis to CRS axis mapping: 1,2 Origin = (-640000.000000000000000,-655550.000000000000000) Pixel Size = (90.000000000000000,-90.000000000000000) Metadata: AREA_OR_POINT=Area Image Structure Metadata: COMPRESSION=DEFLATE INTERLEAVE=BAND Corner Coordinates: Upper Left ( -640000.000, -655550.000) ( 89d18'44.40"W, 81d33'26.51"N) Lower Left ( -640000.000,-3355550.000) ( 55d47'53.80"W, 59d11'57.96"N) Upper Right ( 855800.000, -655550.000) ( 7d32'51.02"E, 80d 4'20.46"N) Lower Right ( 855800.000,-3355550.000) ( 30d41'32.31"W, 58d47'45.95"N) Center ( 107900.000,-2005550.000) ( 41d55'13.60"W, 71d36'44.99"N) Band 1 Block=16620x1 Type=Int16, ColorInterp=Gray
md5sum ${DATADIR}/Morlighem_2017/BedMachineGreenland-v5.nc
echo ""
ncdump -chs ${DATADIR}/Morlighem_2017/BedMachineGreenland-v5.nc
7387182a059dd8cad66ce7638eb0d7cd /home/kdm/data/Morlighem_2017/BedMachineGreenland-v5.nc netcdf BedMachineGreenland-v5 { dimensions: x = 10218 ; y = 18346 ; variables: char mapping ; mapping:grid_mapping_name = "polar_stereographic" ; mapping:latitude_of_projection_origin = 90. ; mapping:standard_parallel = 70. ; mapping:straight_vertical_longitude_from_pole = -45. ; mapping:false_easting = 0. ; mapping:false_northing = 0. ; mapping:proj4text = "+proj=stere +lat_0=90 +lat_ts=70 +lon_0=-45 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs" ; mapping:crs_wkt = "PROJCS[\"WGS 84 / NSIDC Sea Ice Polar Stereographic North\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Polar_Stereographic\"],PARAMETER[\"latitude_of_origin\",70],PARAMETER[\"central_meridian\",-45],PARAMETER[\"scale_factor\",1],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"X\",EAST],AXIS[\"Y\",NORTH],AUTHORITY[\"EPSG\",\"3413\"]]" ; mapping:_Storage = "contiguous" ; int x(x) ; x:long_name = "Cartesian x-coordinate" ; x:standard_name = "projection_x_coordinate" ; x:units = "meter" ; x:_Storage = "contiguous" ; x:_Endianness = "little" ; int y(y) ; y:long_name = "Cartesian y-coordinate" ; y:standard_name = "projection_y_coordinate" ; y:units = "meter" ; y:_Storage = "contiguous" ; y:_Endianness = "little" ; byte mask(y, x) ; mask:long_name = "mask (0 = ocean, 1 = ice-free land, 2 = grounded ice, 3 = floating ice, 4 = non-Greenland land)" ; mask:grid_mapping = "mapping" ; mask:valid_range = 0b, 4b ; mask:flag_values = 0b, 1b, 2b, 3b, 4b ; mask:flag_meanings = "ocean ice_free_land grounded_ice floating_ice non_greenland_land" ; mask:source = "gimpmask v2.0 (https://byrd.osu.edu/research/groups/glacier-dynamics/data/icemask) combined with coastline from Jeremie Mouginot" ; mask:_Storage = "contiguous" ; float surface(y, x) ; surface:long_name = "ice surface elevation" ; surface:standard_name = "surface_altitude" ; surface:units = "meters" ; surface:grid_mapping = "mapping" ; surface:source = "gimpdem v2.1 (https://byrd.osu.edu/research/groups/glacier-dynamics/data/gimpdem)" ; surface:_Storage = "contiguous" ; surface:_Endianness = "little" ; float thickness(y, x) ; thickness:long_name = "ice thickness" ; thickness:standard_name = "land_ice_thickness" ; thickness:units = "meters" ; thickness:grid_mapping = "mapping" ; thickness:source = "Mass conservation (Mathieu Morlighem)" ; thickness:_Storage = "contiguous" ; thickness:_Endianness = "little" ; float bed(y, x) ; bed:long_name = "bed topography" ; bed:standard_name = "bedrock_altitude" ; bed:units = "meters" ; bed:grid_mapping = "mapping" ; bed:source = "Mass conservation (Mathieu Morlighem)" ; bed:_FillValue = -9999.f ; bed:_Storage = "contiguous" ; bed:_Endianness = "little" ; short errbed(y, x) ; errbed:long_name = "bed topography/ice thickness error" ; errbed:units = "meters" ; errbed:grid_mapping = "mapping" ; errbed:source = "Mathieu Morlighem" ; errbed:_FillValue = -9999s ; errbed:_Storage = "chunked" ; errbed:_ChunkSizes = 3670, 2044 ; errbed:_DeflateLevel = 2 ; errbed:_Shuffle = "true" ; errbed:_Endianness = "little" ; byte source(y, x) ; source:long_name = "data source (0 = none, 1 = gimpdem, 2 = Mass conservation, 3 = synthetic, 4 = interpolation, 5 = 
hydrostatic equilibrium, 6 = kriging, 7 = RTOPO-2, 8 = gravity inversion, 9 = Millan et al. 2021, 10+ = bathymetry data)" ; source:grid_mapping = "mapping" ; source:valid_range = 0b, 50b ; source:flag_values = 0b, 1b, 2b, 3b, 4b, 5b, 6b, 7b, 8b, 9b, 10b, 11b, 12b, 13b, 14b, 15b, 16b, 17b, 18b, 19b, 20b, 21b, 22b, 23b, 24b, 25b, 26b, 27b, 28b, 29b, 30b, 31b, 32b, 33b, 34b, 35b, 36b, 37b, 38b, 39b, 40b, 41b, 42b, 43b, 44b, 45b, 46b, 47b, 48b, 49b, 50b ; source:flag_meanings = "none gimpdem mass_conservation synthetic interpolation hydrodstatic_equilibrium kriging RTopo_2 gravity_inversion millan_etal_2021 bathymetry1 bathymetry2 bathymetry3 bathymetry4 bathymetry5 bathymetry6 bathymetry7 bathymetry8 bathymetry9 bathymetry10 bathymetry11 bathymetry12 bathymetry13 bathymetry14 bathymetry15 bathymetry16 bathymetry17 bathymetry18 bathymetry19 bathymetry20 bathymetry21 bathymetry22 bathymetry23 bathymetry24 bathymetry25 bathymetry26 bathymetry27 bathymetry28 bathymetry29 bathymetry30 bathymetry31 bathymetry32 bathymetry33 bathymetry34 bathymetry35 bathymetry36 bathymetry37 bathymetry38 bathymetry39 bathymetry40 bathymetry41" ; source:source = "Mathieu Morlighem" ; source:_Storage = "contiguous" ; byte dataid(y, x) ; dataid:long_name = "data id" ; dataid:grid_mapping = "mapping" ; dataid:valid_range = 1b, 10b ; dataid:flag_values = 1b, 2b, 7b, 10b ; dataid:flag_meanings = "GIMPdem Radar seismic multibeam" ; dataid:source = "Mathieu Morlighem" ; dataid:_Storage = "chunked" ; dataid:_ChunkSizes = 4587, 2555 ; dataid:_DeflateLevel = 1 ; dataid:_Shuffle = "true" ; short geoid(y, x) ; geoid:long_name = "EIGEN-6C4 Geoid - WGS84 Ellipsoid difference" ; geoid:standard_name = "geoid_height_above_reference_ellipsoid" ; geoid:units = "meters" ; geoid:grid_mapping = "mapping" ; geoid:geoid = "eigen-6c4 (Forste et al 2014)" ; geoid:_Storage = "chunked" ; geoid:_ChunkSizes = 3670, 2044 ; geoid:_DeflateLevel = 2 ; geoid:_Shuffle = "true" ; geoid:_Endianness = "little" ; // global attributes: :Conventions = "CF-1.7" ; :Title = "BedMachine Greenland" ; :Author = "Mathieu Morlighem" ; :version = "28-Jul-2022 (v5.5)" ; :nx = 10218. ; :ny = 18346. ; :Projection = "Polar Stereographic North (70N, 45W)" ; :proj4 = "+init=epsg:3413" ; :sea_water_density\ \(kg\ m-3\) = 1023. ; :ice_density\ \(kg\ m-3\) = 917. ; :xmin = -652925 ; :ymax = -632675 ; :spacing = 150 ; :no_data = -9999. ; :license = "No restrictions on access or use" ; :Data_citation = "Morlighem M. et al., (2017), BedMachine v3: Complete bed topography and ocean bathymetry mapping of Greenland from multi-beam echo sounding combined with mass conservation, Geophys. Res. Lett., 44, doi:10.1002/2017GL074954. (http://onlinelibrary.wiley.com/doi/10.1002/2017GL074954/full)" ; :_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.8.12" ; :_SuperblockVersion = 0 ; :_IsNetcdf4 = 0 ; :_Format = "netCDF-4 classic model" ; }
- WARNING
- May be stale. Using pre-computed md5sum because it takes so long to calculate.
ROOT=${DATADIR}/MAR/3.13-freshwater
LIST=$(cd ${ROOT}; ls *.nc | { tee >(head -n4 >&3; cat >/dev/null) | tail -n4; } 3>&1 ) # first and last four files
parallel --keep-order md5sum ${ROOT}/{} ::: ${LIST}
f678065481615c0d74078dacf0350457 /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2000.nc ae0e868b79f0a9554779d85c0260115e /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2001.nc 792b2794f606127aa3f00b215d641859 /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2002.nc 19ad04f82cce52403533582c4df236cf /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2003.nc f678065481615c0d74078dacf0350457 /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2000.nc ae0e868b79f0a9554779d85c0260115e /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2001.nc 792b2794f606127aa3f00b215d641859 /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2002.nc 19ad04f82cce52403533582c4df236cf /home/kdm/data/MAR/3.12-freshwater/MARv3.12-daily-ERA5-2003.nc
ROOT=${DATADIR}/MAR/3.13-freshwater
ncdump -chs ${ROOT}/MARv3.13-daily-ERA5-2000.nc
netcdf MARv3.13-daily-ERA5-2000 { dimensions: y = 2700 ; x = 1496 ; time = 366 ; variables: float LAT(y, x) ; LAT:units = "degrees" ; LAT:long_name = "Latitude" ; LAT:standard_name = "Latitude" ; LAT:_FillValue = 9.96921e+36f ; LAT:missing_value = 9.96921e+36f ; LAT:_Storage = "chunked" ; LAT:_ChunkSizes = 2700, 1496 ; LAT:_DeflateLevel = 6 ; LAT:_Shuffle = "true" ; LAT:_Endianness = "little" ; float LON(y, x) ; LON:units = "degrees" ; LON:long_name = "Longitude" ; LON:standard_name = "Longitude" ; LON:_FillValue = 9.96921e+36f ; LON:missing_value = 9.96921e+36f ; LON:_Storage = "chunked" ; LON:_ChunkSizes = 2700, 1496 ; LON:_DeflateLevel = 6 ; LON:_Shuffle = "true" ; LON:_Endianness = "little" ; float MSK(y, x) ; MSK:units = "-" ; MSK:long_name = "Land/Ice Mask" ; MSK:standard_name = "Land_Ice_Mask" ; MSK:_FillValue = 9.96921e+36f ; MSK:missing_value = 9.96921e+36f ; MSK:_Storage = "chunked" ; MSK:_ChunkSizes = 2700, 1496 ; MSK:_DeflateLevel = 6 ; MSK:_Shuffle = "true" ; MSK:_Endianness = "little" ; float MSK_MAR(y, x) ; MSK_MAR:units = "-" ; MSK_MAR:long_name = "Original MAR 5x 5km2 Ice Mask" ; MSK_MAR:standard_name = "Original_MAR__5x_5km2_Ice_Mask" ; MSK_MAR:_FillValue = 9.96921e+36f ; MSK_MAR:missing_value = 9.96921e+36f ; MSK_MAR:_Storage = "chunked" ; MSK_MAR:_ChunkSizes = 2700, 1496 ; MSK_MAR:_DeflateLevel = 6 ; MSK_MAR:_Shuffle = "true" ; MSK_MAR:_Endianness = "little" ; float RU2(time, y, x) ; RU2:units = "mmWE/day" ; RU2:long_name = "Water run-off (sub-pixel 2)" ; RU2:standard_name = "Water_run-off__sub-pixel_2_" ; RU2:_FillValue = 9.96921e+36f ; RU2:missing_value = 9.96921e+36f ; RU2:_Storage = "chunked" ; RU2:_ChunkSizes = 100, 2700, 1496 ; RU2:_DeflateLevel = 6 ; RU2:_Shuffle = "true" ; RU2:_Endianness = "little" ; float RUcorr(time, y, x) ; RUcorr:units = "mmWEday" ; RUcorr:long_name = "Water run-off (sub-pixel 1)" ; RUcorr:standard_name = "Water_run-off__sub-pixel_1_" ; RUcorr:_FillValue = 9.96921e+36f ; RUcorr:missing_value = 9.96921e+36f ; RUcorr:_Storage = "chunked" ; RUcorr:_ChunkSizes = 100, 2700, 1496 ; RUcorr:_DeflateLevel = 6 ; RUcorr:_Shuffle = "true" ; RUcorr:_Endianness = "little" ; float SMB2(time, y, x) ; SMB2:units = "mmWE/day" ; SMB2:long_name = "Surface Mass Balance (sub-pixel 2)" ; SMB2:standard_name = "Surface_Mass_Balance__sub-pixel_2_" ; SMB2:_FillValue = 9.96921e+36f ; SMB2:missing_value = 9.96921e+36f ; SMB2:_Storage = "chunked" ; SMB2:_ChunkSizes = 100, 2700, 1496 ; SMB2:_DeflateLevel = 6 ; SMB2:_Shuffle = "true" ; SMB2:_Endianness = "little" ; float SMBcorr(time, y, x) ; SMBcorr:units = "mmWE/day" ; SMBcorr:long_name = "Surface Mass Balance (sub-pixel 1)" ; SMBcorr:standard_name = "Surface_Mass_Balance__sub-pixel_1_" ; SMBcorr:_FillValue = 9.96921e+36f ; SMBcorr:missing_value = 9.96921e+36f ; SMBcorr:_Storage = "chunked" ; SMBcorr:_ChunkSizes = 100, 2700, 1496 ; SMBcorr:_DeflateLevel = 6 ; SMBcorr:_Shuffle = "true" ; SMBcorr:_Endianness = "little" ; float SRF(y, x) ; SRF:units = "m" ; SRF:long_name = "Surface height" ; SRF:standard_name = "Surface_height" ; SRF:_FillValue = 9.96921e+36f ; SRF:missing_value = 9.96921e+36f ; SRF:_Storage = "chunked" ; SRF:_ChunkSizes = 2700, 1496 ; SRF:_DeflateLevel = 6 ; SRF:_Shuffle = "true" ; SRF:_Endianness = "little" ; float SRF_MAR(y, x) ; SRF_MAR:units = "m" ; SRF_MAR:long_name = "Original MAR 5x 5km2 Surface height" ; SRF_MAR:standard_name = "Original_MAR__5x_5km2_Surface_height" ; SRF_MAR:_FillValue = 9.96921e+36f ; SRF_MAR:missing_value = 9.96921e+36f ; SRF_MAR:_Storage = "chunked" ; SRF_MAR:_ChunkSizes = 
2700, 1496 ; SRF_MAR:_DeflateLevel = 6 ; SRF_MAR:_Shuffle = "true" ; SRF_MAR:_Endianness = "little" ; float time(time) ; time:units = "DAYS since 2000-01-01 12:00:00" ; time:long_name = "time" ; time:standard_name = "time" ; time:_Storage = "chunked" ; time:_ChunkSizes = 100 ; time:_DeflateLevel = 6 ; time:_Shuffle = "true" ; time:_Endianness = "little" ; float x(x) ; x:units = "m" ; x:long_name = "x" ; x:standard_name = "x" ; x:_Storage = "chunked" ; x:_ChunkSizes = 1496 ; x:_DeflateLevel = 6 ; x:_Shuffle = "true" ; x:_Endianness = "little" ; float y(y) ; y:units = "m" ; y:long_name = "y" ; y:standard_name = "y" ; y:_Storage = "chunked" ; y:_ChunkSizes = 2700 ; y:_DeflateLevel = 6 ; y:_Shuffle = "true" ; y:_Endianness = "little" ; // global attributes: :title = "Daily MARv3.12 outputs in 2000 interpolated on the 1x1km^2 grid from Noel et al. using ERA5" ; :institution = "University of Liège (Belgium)" ; :contact = "xavier.fettweis@uliege.be" ; :institute = "University of Liège (Belgium)" ; :model = "regional climate model MARv3.12.0" ; :forcing = "ERA5" ; :creation_date = "2021-11-18-T114559Z" ; :history = "Thu Nov 18 11:45:59 2021: ncks --cnk_dmn time,100 -L 6 MARv3.12-daily-ERA5-2000.nc MARv3.12-daily-ERA5-2000.nc4" ; :NCO = "netCDF Operators version 4.8.1 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ; :_NCProperties = "version=2,netcdf=4.7.3,hdf5=1.10.5," ; :_SuperblockVersion = 0 ; :_IsNetcdf4 = 1 ; :_Format = "netCDF-4" ; }
- WARN
- Only check some to save compute time.
echo 'ice'
(cd ${DATADIR}/RACMO/freshwater/runoff_ice_1km; find . -type f -name "*.nc" | LC_ALL=C sort | head -n8 | parallel --keep-order md5sum)
(cd ${DATADIR}/RACMO/freshwater/runoff_ice_1km; find . -type f -name "*.nc" | LC_ALL=C sort | tail -n8 | parallel --keep-order md5sum)
echo 'land_5.5_km'
(cd ${DATADIR}/RACMO/freshwater/runoff_land_5.5km; find . -type f -name "*.nc" | LC_ALL=C sort | head -n8 | parallel --keep-order md5sum)
(cd ${DATADIR}/RACMO/freshwater/runoff_land_5.5km; find . -type f -name "*.nc" | LC_ALL=C sort | tail -n8 | parallel --keep-order md5sum)
echo 'land_1_km'
(cd ${DATADIR}/RACMO/freshwater/runoff_land_1km_regrid; find . -type f -name "*.nc" | LC_ALL=C sort | head -n8 | parallel --keep-order md5sum)
(cd ${DATADIR}/RACMO/freshwater/runoff_land_1km_regrid; find . -type f -name "*.nc" | LC_ALL=C sort | tail -n8 | parallel --keep-order md5sum)
ice | |
219834ba9f1bc98e0a87ab37ea88863d | ./runoff.1990_AMJ.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
7bde9226e0651e2cfbb8082424ba5181 | ./runoff.1990_JAS.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
76d77c3ac9881521825d93c1c43af7d5 | ./runoff.1990_JFM.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
17dd60fa5964af322624e72d9b58ea7c | ./runoff.1990_OND.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
fcdd6121b4c1c94d1fe5018966deacac | ./runoff.1991_AMJ.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
48c894ad8d7c95b5521892c18183ea41 | ./runoff.1991_JAS.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
d405a62f91a201e3b7be931c12e1f678 | ./runoff.1991_JFM.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
bcaadbf2e742b533a612417f748dd5f0 | ./runoff.1991_OND.BN_RACMO2.3p2_ERA5_3h_FGRN055.1km.DD.nc |
acf9b55165baf20b613067383ea3d1fc | ./runoff_WJB_int.1988_AMJ.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
08d48cb9b85912742785fc089a8462bb | ./runoff_WJB_int.1988_JAS.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
9c26be97b007d967bdcc48c34ad1c3ea | ./runoff_WJB_int.1988_JFM.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
0673a072bd62d2957f01d5ca6c6cf570 | ./runoff_WJB_int.1988_OND.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
c9d47dea1a7fe5317447734ac19b7faa | ./runoff_WJB_int.1989_AMJ.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
46b5f01a3dce922eee06f48f415ec9fb | ./runoff_WJB_int.1989_JAS.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
dbfdf673dc82d957b5109b34724db0a6 | ./runoff_WJB_int.1989_JFM.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
af207835fe80b60d58e06b5f6035bf18 | ./runoff_WJB_int.1989_OND.BN_RACMO2.3p2_FGRN055_1km.DD.nc |
land_5.5_km | |
f60e7505e75e58133e82d381db5caa43 | ./runoff.1957.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
7b2de3637a9b79647aa600c8da7d2990 | ./runoff.1958.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
cf60a1b81f84975ee741b483ddedb46c | ./runoff.1959.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
fd74c2d30a0d6589303b8ccbc4679694 | ./runoff.1960.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
d364c5db102ee36981d55e0b46277673 | ./runoff.1961.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
fcb4b8154fce45145ba468c7b0c2a1e6 | ./runoff.1962.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
b87dfb7d13426080a58745d9a917de2b | ./runoff.1963.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
3d055d1d9c0ddacae46acad23a421a9e | ./runoff.1964.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc |
32e7fa643ae40f346999e09c1b17a6d5 | ./runoff.2014.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
1a6d4fdbb8dd14aa5311b3d43954c331 | ./runoff.2015.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
a5f91f7731fb2a7418a7e89e9f67fb43 | ./runoff.2016.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
90634adf3dd1b9ee409002a29ac3e08f | ./runoff.2017.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
906751f527cdb51f5448fd324a705a29 | ./runoff.2018.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
0e481615b966397f0a4c50f48ba1dab9 | ./runoff.2019.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
a813eb69a66bd4c23cf15afcac592420 | ./runoff.2020.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
c5ae487f3ecc5987a17c3c06083ebd5a | ./runoff.2021.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc |
land_1_km | |
879f40c2e9c4598a83458ab93bcd2246 | ./Grid_regular_EPSG3413_1km.nc |
c2caab22f5dc8b816d26b15b07c6c732 | ./Tundra_Mask_1km.nc |
c0c69a24c76cb909ea2ee463d79d10b6 | ./mask.nc |
a3c0a6c3762155ad404fe6be75c49251 | ./out/runoff.1957.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc.1km-bilin.nc |
76d34a54d246b6872d913e7889f66772 | ./out/runoff.1958.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc.1km-bilin.nc |
5f1f04983c6f79fc1d89ccb15ec5946b | ./out/runoff.1959.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc.1km-bilin.nc |
b5d9ffdd8b73d22d9b6bc7da5cbc3c0d | ./out/runoff.1960.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc.1km-bilin.nc |
dc269d5d295640ed71f033bce1d30489 | ./out/runoff.1961.FGRN055_BN_RACMO2.3p2_FGRN055.DD.nc.1km-bilin.nc |
78000c74d9e9553e5237c20205b6f891 | ./out/runoff.2017.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc.1km-bilin.nc |
ced4b9dd79d5c2e0a612b01bd07d2afd | ./out/runoff.2018.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc.1km-bilin.nc |
9f520292d3ab4dbf40246bc462cda4e3 | ./out/runoff.2019.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc.1km-bilin.nc |
59b9237449e3c8dfe1b62e8a330bffbd | ./out/runoff.2020.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc.1km-bilin.nc |
c03ce8c3865abad695537b78e5fcbe13 | ./out/runoff.2021.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc.1km-bilin.nc |
0d0023a324f077136cf8e69b96440617 | ./out/runoff.2022.FGRN055_BN_RACMO2.3p2_ERA5_3h_FGRN055.DD.nc.1km-bilin.nc |
c5e011d91b9feb6b7a533b9f146486f8 | ./weights_FGRN055_to_EPSG3413_1km_bilinear.nc |
8f7bea66a91abcea8f2995a9d36b1132 | ./weights_FGRN055_to_EPSG3413_1km_neareststod.nc |
ROOT=${DATADIR}/RACMO/freshwater/runoff_ice_1km
ncdump -chs ${ROOT}/runoff_WJB_int.1979_AMJ.BN_RACMO2.3p2_FGRN055_1km.DD.nc
netcdf runoff_WJB_int.1979_AMJ.BN_RACMO2.3p2_FGRN055_1km.DD { dimensions: time = 91 ; x = 1496 ; y = 2700 ; variables: float time(time) ; time:units = "DAYS since 1979-04-01 00:00:00" ; time:long_name = "time" ; time:standard_name = "time" ; time:_Storage = "chunked" ; time:_ChunkSizes = 91 ; time:_DeflateLevel = 1 ; time:_Endianness = "little" ; time:_NoFill = "true" ; float x(x) ; x:units = "km" ; x:long_name = "x" ; x:standard_name = "x" ; x:_Storage = "chunked" ; x:_ChunkSizes = 1496 ; x:_DeflateLevel = 1 ; x:_Endianness = "little" ; x:_NoFill = "true" ; float y(y) ; y:units = "km" ; y:long_name = "y" ; y:standard_name = "y" ; y:_Storage = "chunked" ; y:_ChunkSizes = 2700 ; y:_DeflateLevel = 1 ; y:_Endianness = "little" ; y:_NoFill = "true" ; float LON(y, x) ; LON:units = "km" ; LON:long_name = "Easting" ; LON:standard_name = "Easting" ; LON:actual_range = 0.f, 0.f ; LON:missing_value = 0.f, -1.225254e+28f ; LON:_Storage = "chunked" ; LON:_ChunkSizes = 1350, 748 ; LON:_DeflateLevel = 1 ; LON:_Endianness = "little" ; LON:_NoFill = "true" ; float LAT(y, x) ; LAT:units = "km" ; LAT:long_name = "Northing" ; LAT:standard_name = "Northing" ; LAT:actual_range = 0.f, 0.f ; LAT:missing_value = 0.f, -1.225254e+28f ; LAT:_Storage = "chunked" ; LAT:_ChunkSizes = 1350, 748 ; LAT:_DeflateLevel = 1 ; LAT:_Endianness = "little" ; LAT:_NoFill = "true" ; float runoffcorr(time, y, x) ; runoffcorr:units = "mm w.e. per day" ; runoffcorr:long_name = "Downscaled corrected snowmelt" ; runoffcorr:standard_name = "Downscaled_corrected_snowmelt" ; runoffcorr:actual_range = 0.f, 508.326f ; runoffcorr:missing_value = -1.e+30f ; runoffcorr:_Storage = "chunked" ; runoffcorr:_ChunkSizes = 12, 338, 187 ; runoffcorr:_DeflateLevel = 1 ; runoffcorr:_Endianness = "little" ; runoffcorr:_NoFill = "true" ; // global attributes: :title = "Daily runoff field (RACMO2.3p2)" ; :institution = "IMAU (Brice Noel)" ; :grid = "Map Projection:Polar Stereographic Ellipsoid - Map Reference Latitude: 90.0 - Map Reference Longitude: -39.0 - Map Second Reference Latitude: 71.0 - Map Eccentricity: 0.081819190843 ;wgs84 - Map Equatorial Radius: 6378137.0 ;wgs84 meters - Grid Map Origin Column: 160 - Grid Map Origin Row: -120 - Grid Map Units per Cell: 5000 - Grid Width: 301 - Grid Height: 561" ; :history = "libUN (2013.05.22) - Fri Mar 16 18:19:46 2018" ; " ; :_NCProperties = "version=1|netcdflibversion=4.6.0|hdf5libversion=1.10.0" ; :_SuperblockVersion = 0 ; :_IsNetcdf4 = 0 ; :_Format = "netCDF-4 classic model" ; }
FILE=${DATADIR}/Bamber_2018/FWF17.v3_a.nc
md5sum ${FILE}
ncdump -chs ${FILE}
ed9d9bd0580e124146a5503832ced95e /home/kdm/data/Bamber_2018/FWF17.v3_a.nc netcdf FWF17.v3_a { dimensions: Y = 785 ; X = 752 ; TIME = 708 ; variables: double Y(Y) ; Y:_FillValue = NaN ; Y:units = "meters" ; Y:standard_name = "projection_y_coordinate" ; Y:point_spacing = "even" ; Y:axis = "Y" ; Y:_Storage = "contiguous" ; Y:_Endianness = "little" ; double X(X) ; X:_FillValue = NaN ; X:units = "meters" ; X:standard_name = "projection_x_coordinate" ; X:point_spacing = "even" ; X:axis = "X" ; X:_Storage = "contiguous" ; X:_Endianness = "little" ; int64 TIME(TIME) ; TIME:units = "days since 1958-01-01 00:00:00" ; TIME:calendar = "proleptic_gregorian" ; TIME:_Storage = "contiguous" ; TIME:_Endianness = "little" ; short runoff_tundra(TIME, Y, X) ; runoff_tundra:_FillValue = -9999s ; runoff_tundra:long_name = "Tundra runoff" ; runoff_tundra:units = "km3" ; runoff_tundra:grid_mapping = "polar_stereographic" ; runoff_tundra:description = "WARNING! This variable contains runoff routed from all land tundra (i.e. including CAA, Svalbard, Iceland), whereas the paper only shows tundra runoff from Greenland. To reproduce Greenland-only runoff you must mask the tundra variable with the LSMGr mask provided in order to set all tundra runoff originating outside the Greenland land mass to zero." ; runoff_tundra:scale_factor = 0.01 ; runoff_tundra:_Storage = "chunked" ; runoff_tundra:_ChunkSizes = 118, 131, 126 ; runoff_tundra:_DeflateLevel = 4 ; runoff_tundra:_Shuffle = "true" ; runoff_tundra:_Endianness = "little" ; double lon(Y, X) ; lon:_FillValue = NaN ; lon:grid_mapping = "polar_stereographic" ; lon:units = "degrees" ; lon:standard_name = "longitude" ; lon:_Storage = "contiguous" ; lon:_Endianness = "little" ; short runoff_ice(TIME, Y, X) ; runoff_ice:_FillValue = -9999s ; runoff_ice:long_name = "Ice sheet runoff" ; runoff_ice:units = "km3" ; runoff_ice:grid_mapping = "polar_stereographic" ; runoff_ice:scale_factor = 0.01 ; runoff_ice:_Storage = "chunked" ; runoff_ice:_ChunkSizes = 118, 131, 126 ; runoff_ice:_DeflateLevel = 4 ; runoff_ice:_Shuffle = "true" ; runoff_ice:_Endianness = "little" ; double lat(Y, X) ; lat:_FillValue = NaN ; lat:grid_mapping = "polar_stereographic" ; lat:units = "degrees" ; lat:standard_name = "latitude" ; lat:_Storage = "contiguous" ; lat:_Endianness = "little" ; byte polar_stereographic ; polar_stereographic:grid_mapping_name = "polar_stereographic" ; polar_stereographic:scale_factor_at_central_origin = 1. ; polar_stereographic:standard_parallel = 70. ; polar_stereographic:straight_vertical_longitude_from_pole = -45. ; polar_stereographic:false_easting = 0. ; polar_stereographic:false_northing = 0. ; polar_stereographic:latitude_of_projection_origin = 90. ; byte LSMGr(Y, X) ; LSMGr:_FillValue = -99b ; LSMGr:long_name = "Hole-filled Greenland land mass mask" ; LSMGr:units = "none" ; LSMGr:grid_mapping = "polar_stereographic" ; LSMGr:_Storage = "chunked" ; LSMGr:_ChunkSizes = 785, 752 ; LSMGr:_DeflateLevel = 4 ; LSMGr:_Shuffle = "true" ; byte ocean_basins(Y, X) ; ocean_basins:_FillValue = -99b ; ocean_basins:long_name = "ID number of oceanographic basin which each coastal pixel drains into." ; ocean_basins:units = "none" ; ocean_basins:grid_mapping = "polar_stereographic" ; ocean_basins:basins = ", 26:Labrador Sea, 27:Hudson Strait, 28:Davis Strait, 29:Baffin Bay, 30:Lincoln Sea, 32:Irish Sea and St. 
George\'s Channel, 33:Inner Seas off the West Coast of Scotland, 55:Gulf of Riga, 56:Baltic Sea, 57:Gulf of Finland, 58:Gulf of Bothnia, 59:White Sea, 76:North Atlantic Ocean, 77:Gulf of St. Lawrence, 80:Celtic Sea, 82:Hudson Bay, 83:The Northwestern Passages, 84:Arctic Ocean, 86:Barentsz Sea, 87:Greenland Sea, 88:North Sea, 96:Kattegat, 98:Skagerrak, 99:Norwegian Sea" ; ocean_basins:history = "Basins are as defined by \"Limits of Oceans & Seas, Special Publication No. 23\" published by the IHO in 1953. The dataset was composed by the Flanders Marine Data and Information Centre and downloaded from www.marineregions.org on 6 February 2017. Coastal runoff pixels were allocated an ocean basin to run off into using fwf/allocate_coast_to_basin.py." ; ocean_basins:_Storage = "chunked" ; ocean_basins:_ChunkSizes = 785, 752 ; ocean_basins:_DeflateLevel = 4 ; ocean_basins:_Shuffle = "true" ; short solid_ice(TIME, Y, X) ; solid_ice:_FillValue = -9999s ; solid_ice:long_name = "Solid ice discharge" ; solid_ice:units = "km3" ; solid_ice:grid_mapping = "polar_stereographic" ; solid_ice:description = "the monthly discharge data are mean annual values divided by 12." ; solid_ice:scale_factor = 0.001 ; solid_ice:_Storage = "chunked" ; solid_ice:_ChunkSizes = 118, 131, 126 ; solid_ice:_DeflateLevel = 4 ; solid_ice:_Shuffle = "true" ; solid_ice:_Endianness = "little" ; // global attributes: :Conventions = "CF-1.4" ; :institution = "University of Bristol (Andrew Tedstone)" ; :title = "Monthly freshwater fluxes to the ocean across the Arctic, 1958-2016" ; :nx = 752. ; :ny = 785. ; :xmin = -1777980. ; :ymax = -67308. ; :spacing = 5000. ; :description = "This is the dataset that underlies the paper \"Bamber, J., A. Tedstone, M. King, I. Howat, E. Enderlin, M. van den Broeke and B. Noel (2018) Land ice freshwater budget of the Arctic and North Atlantic Oceans. Part I: Data, methods and results. Journal of Geophysical Research: Oceans.\"" ; :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.10.1" ; :_SuperblockVersion = 0 ; :_IsNetcdf4 = 1 ; :_Format = "netCDF-4" ; }
dir=${DATADIR}/van_As_2018
md5sum ${dir}/*
echo ""
head ${dir}/Watson_discharge_day_v03.txt
echo ""
tail ${dir}/Watson_discharge_day_v03.txt
176d156b2881b7b4c3ed75b2ede522e0 /home/kdm/data/van_As_2018/readme Watson.txt 9bde46206b085e4fb7d8c4c62dc9758e /home/kdm/data/van_As_2018/Supplemental Data Set.xls a1b9c2ae0c25e18014a28e873d18b873 /home/kdm/data/van_As_2018/uaar_a_1433799_sm6008.zip 0589c8ac34275f5002ca8cfd5a23b8a6 /home/kdm/data/van_As_2018/Watson_discharge_day_v03.txt b5bc413b8db8b6944b35f3b5b5a0d0b2 /home/kdm/data/van_As_2018/Watson_discharge_hour_v03.txt db56b7f1a7848f15b7158238df1943b1 /home/kdm/data/van_As_2018/Watson River discharge reconstruction 1949-2018.txt Year MonthOfYear DayOfMonth DayOfYear DayOfCentury WaterFluxDiversOnly(m3/s) Uncertainty(m3/s) WaterFluxDivers&Temperature(m3/s) Uncertainty(m3/s) WaterFluxCumulative(km3) Uncertainty(km3) 2006 4 10 100 2292 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 11 101 2293 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 12 102 2294 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 13 103 2295 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 14 104 2296 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 15 105 2297 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 16 106 2298 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 17 107 2299 -999.00 -999.00 0.00 0.00 0.00 0.00 2006 4 18 108 2300 -999.00 -999.00 0.00 0.00 0.00 0.00 2019 10 18 291 7231 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 19 292 7232 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 20 293 7233 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 21 294 7234 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 22 295 7235 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 23 296 7236 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 24 297 7237 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 25 298 7238 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 26 299 7239 -999.00 -999.00 0.00 0.00 8.33 1.37 2019 10 27 300 7240 -999.00 -999.00 0.00 0.00 8.33 1.37
root=~/data.me/qaanaaq
md5sum ${root}/discharge2018.txt
head ${root}/discharge2018.txt
for f in $(ls ${DATADIR}/Tedstone_2017/leverett_Q_????_UTC.csv); do
md5sum ${f}
done
echo " "
head -n15 ${f}
# ls ${DATADIR}/GEM
md5sum ${DATADIR}/GEM/GEM.csv
echo ""
head ${DATADIR}/GEM/GEM.csv
echo ""
tail ${DATADIR}/GEM/GEM.csv
md5sum ${DATADIR}/Hawkings_2016/NarsarsuaqDischarge2013.xlsx
<<init>>
g.mapset -c GIMP_0714
r.in.gdal -o input=${DATADIR}/GIMP/0714/1999.07.01/GimpOceanMask_90m output=mask_ocean_lakes
# ocean mask has some cut-off oceans a.k.a lakes. See for example near -274731,-2747470.
g.region raster=mask_ocean_lakes
# Fill the lakes and treat as land.
r.null map=mask_ocean_lakes setnull=0 # only operate on ocean cells
r.clump input=mask_ocean_lakes output=clumps
main_clump=$(r.stats -c -n clumps sort=desc | head -n1 | cut -d" " -f1)
r.mapcalc "mask_ocean = if(clumps == ${main_clump}, 1, null())"
r.null mask_ocean null=0
g.mapset -c Citterio_2013
mkdir -p tmp
# fix bad shapefile that doesn't import directly into GRASS.
ogr2ogr -s_srs EPSG:4326 -t_srs EPSG:3413 ./tmp/PROMICE_3413 ${DATADIR}/Citterio_2013/PROMICE_250_2019-12-04_EPSG4326
v.in.ogr input=./tmp/PROMICE_3413 output=mask
g.region raster=mask_ocean@GIMP_0714
v.to.rast input=mask output=mask_ice_holes use=val val=1
# Everything should route to the coast. This means we need to fill in
# all the nunatuks so that there are no interior drainage basins.
r.mapcalc "not_ice = if(isnull(mask_ice_holes), 1, null())"
r.clump input=not_ice output=clumps
main_clump=$(r.stats -c -n clumps sort=desc | head -n1 | cut -d" " -f1) # ocean
r.mapcalc "mask_ice = mask_ice_holes ||| if(clumps != ${main_clump})"
r.null mask_ice null=0
log_info "Importing ArcticDEM"
g.mapset -c ArcticDEM
r.external source=${DATADIR}/ArcticDEM/arcticdem_mosaic_100m_v4.1_dem.tif output=arctic_raw # all arctic
g.region raster=mask_ocean@GIMP_0714 align=arctic_raw
g.region save=ArcticDEM
g.mapset PERMANENT
g.region region=ArcticDEM@ArcticDEM
g.region -sp # default location
g.mapset ArcticDEM
r.mapcalc "mask_land = if((mask_ice@Citterio_2013 == 0) && (mask_ocean@GIMP_0714 == 0), 1, 0)"
# set up mask: (o)cean=1 (l)and=2 (i)ce=3
r.mapcalc "mask_o_l_i_4 = (mask_ocean@GIMP_0714 + mask_land*2 + mask_ice@Citterio_2013*3)"
r.mapcalc "mask_o_l_i = if(mask_o_l_i_4 == 4, 1, mask_o_l_i_4)" # cleanup some multi-flagged as ocean.
r.mapcalc "z_s_ocean = arctic_raw" # subset and move from external to internal
r.null map=z_s_ocean null=0
r.mapcalc "z_s = if(mask_o_l_i == 1, null(), z_s_ocean)"
log_info "Importing BedMachine"
g.mapset -c BedMachine
ROOT=${DATADIR}/Morlighem_2017
for var in mask surface thickness bed errbed geoid source; do
echo $var
r.in.gdal -o input="NetCDF:${DATADIR}/Morlighem_2017/BedMachineGreenland-v5.nc:"${var} output=${var}
done
g.rename raster=surface,z_s
g.rename raster=bed,z_b
g.region raster=z_s
g.region res=150 -pa
log_info "Importing NSIDC 0713"
g.mapset -c NSIDC_0713
g.region -d
r.external -o input=${DATADIR}/GIMP/0713/RGB.vrt output=rgb
log_info "Loading Mouginot 2019"
g.mapset -c Mouginot_2019
v.import input=${DATADIR}/Mouginot_2019/Greenland_Basins_PS_v1.4.2.shp output=basins snap=1
# v.to.rast input=basins output=basins use=cat label_column=NAME
# v.to.rast input=basins output=regions use=cat label_column=SUBREGION1
log_info "Loading Zwally 2012"
g.mapset -c Zwally_2012
v.import input=${DATADIR}/Zwally_2012/sectors output=sectors snap=1
# v.to.rast input=sectors output=sectors use=attr column=cat_
- According to the "New in this version" section, use doi:10.22008/promice/data/ice_discharge/gates/v02/UFW2WY
g.mapset -c Mankoff_2020
log_info "Importing Mankoff 2020 gates"
mkdir -p tmp/Mankoff_2020_gates
dest=./tmp/Mankoff_2020_gates/gates.gpkg
if [[ ! -f ${dest} ]]; then
wget "https://dataverse.geus.dk/api/access/datafile/:persistentId?persistentId=doi:10.22008/promice/data/ice_discharge/gates/v02/UFW2WY" -O ${dest}
fi
md5sum ./tmp/Mankoff_2020_gates/*
# d28f4a1c4c72490dc83893c18352f579 ./tmp/Mankoff_2020_gates/gates.gpkg
v.in.ogr input=./tmp/Mankoff_2020_gates/gates.gpkg output=gates
v.db.select map=gates|head
g.mapset -c Bjork_2015
log_info "Importing Bjørk 2015 names"
# ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:3413 EPSG_3413 GreenlandGlacierNames_GGNv01_WGS84.shp
v.in.ogr input=${DATADIR}/Bjørk_2015/EPSG_3413/ output=names
v.db.select map=names|head
<<xy_permanent>>
This is just for the outlines on the Disko island figure.
Sum the data over time, then treat any grid cell where the sum is not 0 as an outlet grid cell.
g.mapset -c Bamber_2018
FILE=${DATADIR}/Bamber_2018/FWF17.v3_a.nc
cdo -v -b F32 timsum ${FILE} ./tmp/B2018.nc # timcumsum?
r.external source=netcdf:./tmp/B2018.nc:runoff_ice output=ice # band=707
r.external source=netcdf:./tmp/B2018.nc:runoff_tundra output=tundra # band=707
r.mask -r
r.mapcalc "MASK = if(ice == 0, null(), 1)" --o
r.to.vect input=ice output=ice type=area
r.mask -r
r.mapcalc "MASK = if(tundra == 0, null(), 1)" --o
r.to.vect input=tundra output=tundra type=area
r.mask -r
Data are too large for Google Earth (app), but using Google Earth tiles in QGIS works fine.
See ./freshwater.qgz and add one or more of the following to QGIS Tile Server (XYZ)
- Google Maps: https://mt1.google.com/vt/lyrs=r&x={x}&y={y}&z={z}
- Google Satellite: http://www.google.cn/maps/vt?lyrs=s@189&gl=cn&x={x}&y={y}&z={z}
- Google Satellite Hybrid: https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}
- Google Terrain: https://mt1.google.com/vt/lyrs=t&x={x}&y={y}&z={z}
- Bing Aerial: http://ecn.t3.tiles.virtualearth.net/tiles/a{q}.jpeg?g=1
g.mapset -c lakes
r.fill.dir input=head@land_surf output=filled direction=dir areas=problems format=grass
r.mapcalc "lakes = if(filled - head@land_surf > 0.0, 1, null() )"
r.to.vect input=lakes output=lakes type=area
Then load in QGIS
- All land outlets should be at 0 m, or within ArcticDEM uncertainty.
- Some other elevations may exist because masks are not aligned.
- In reality, outlets could exist at high elevations if they enter the ocean via waterfall from a cliff. See for example outlets at the edge of Petermann fjord.
TODO: Weight by runoff
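The TODO above is not implemented here; the figure code below uses unweighted outlet counts. A hedged sketch of one way the weighting could be done, assuming the combined product from netcdf_combine.py with the discharge and alt variables defined earlier (older product versions name the variable runoff instead):
import xarray as xr
import numpy as np
ds = xr.open_dataset("./freshwater/land/MAR.nc")         # combined land product (assumed path)
w = ds["discharge"].mean(dim="time").fillna(0).values    # mean runoff per outlet [m3 s-1]
elev = np.abs(ds["alt"].values)
h, b = np.histogram(elev, bins=100, weights=w)           # runoff-weighted elevation histogram
cdf = np.cumsum(h) / h.sum() * 100                       # % of mean runoff below each elevation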
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib
from matplotlib import rc
rc('font', size=12)
rc('text', usetex=False)
# matplotlib.pyplot.xkcd()
# plt.close(1)
fig = plt.figure(1, figsize=(5,4)) # w,h
# get_current_fig_manager().window.move(0,0)
fig.clf()
fig.set_tight_layout(True)
# import matplotlib.gridspec as gridspec
# gs = gridspec.GridSpec(1, 1) #w,h
# ax = plt.subplot(gs[:,:])
ax = fig.add_subplot(211)
df = pd.read_csv("./freshwater/ice_100/outlets.csv")
df['elev'].plot.hist(bins = 100,
alpha = 0.5,
bottom = 0.1,
color = 'b',
label="Ice",
ax = ax)
df = pd.read_csv("./freshwater/land_100/outlets.csv")
df['elev'].plot.hist(bins = 100,
alpha = 0.5,
bottom = 0.1,
color = 'g',
label="Land",
ax = ax)
ax.set_yscale("log")
ax.set_xlabel("Outlet elevation [m]")
ax.set_ylabel("Count")
ax.set_yticks([1E0,1E1,1E2,1E3,1E4])
ax2 = fig.add_subplot(212)
mm = df['elev'].abs().max()
h,b = np.histogram(df['elev'].abs().values,
bins=mm)
ax2.plot(b[1:], np.cumsum(h),
alpha = 0.5,
label="Land",
color = 'g')
# mm = df['elev'].max()
# h,b = np.histogram(df[df['elev'] >= 0]['elev'].values,
# bins=mm)
# ax2.plot(b[1:], np.cumsum(h),
# alpha = 0.5,
# linestyle = '--',
# label="Land > 0",
# color = 'g')
# # ax2.set_xlim([0.1,mm])
ax2.set_ylim([1E0,3.1E4])
ax2.set_yticks([0,1E4,2E4,3E4])
ax2.set_xlim([1,1000])
# ax2.set_yscale("log")
ax2.set_xscale("log")
# ax2.set_yticks([1E0,1E1,1E2,1E3,1E4,1E5])
ax2.set_ylabel("Cumulative\ncounts")
ax2.set_xlabel("Elevation [m]")
ax.legend()
# ax2.legend(loc=4)
from adjust_spines import adjust_spines as adj
adj(ax, ['left','bottom'])
adj(ax2, ['left','bottom'])
df = pd.DataFrame()
df['Elevation [m]'] = np.append(np.arange(10), np.arange(10, 101, 10))
df.set_index('Elevation [m]', inplace=True)
df['CDF [%]'] = np.cumsum(h)[df.index] / np.cumsum(h).max() * 100
plt.savefig('./fig/outlet_elevation_histogram_cdf.png', transparent=True, bbox_inches='tight', dpi=300)
df
# (cumsum(h)[0:100+1:10])/cumsum(h).max()
Elevation [m] | CDF [%] |
---|---|
0 | 0.200127 |
1 | 39.3483 |
2 | 45.9024 |
3 | 51.4292 |
4 | 56.6392 |
5 | 61.2721 |
6 | 65.2647 |
7 | 68.5634 |
8 | 71.4352 |
9 | 73.9168 |
10 | 76.0015 |
20 | 86.0745 |
30 | 89.8769 |
40 | 92.055 |
50 | 93.5259 |
60 | 94.5566 |
70 | 95.3604 |
80 | 95.9841 |
90 | 96.5078 |
100 | 96.9147 |
conda activate freshwater
cfchecks ./freshwater/land/runoff/MAR_2000.nc
CHECKING NetCDF FILE: ./freshwater/land/runoff/MAR_2000.nc ===================== Using CF Checker Version 4.0.0 Checking against CF Version CF-1.7 Using Standard Name Table Version 76 (2020-10-13T12:38:27Z) Using Area Type Table Version 10 (23 June 2020) Using Standardized Region Name Table Version 4 (18 December 2018) ------------------ Checking variable: station ------------------ WARN: (3.1): units attribute should be present ------------------ Checking variable: time ------------------ ------------------ Checking variable: runoff ------------------ ------------------ Checking variable: lat ------------------ ------------------ Checking variable: lon ------------------ ------------------ Checking variable: alt ------------------ ERRORS detected: 0 WARNINGS given: 1 INFORMATION messages: 0
ncdump -chs ./freshwater/land/runoff/MAR_2000.nc
netcdf MAR_2000 { dimensions: time = UNLIMITED ; // (366 currently) station = 29635 ; variables: uint64 station(station) ; station:long_name = "outlet id" ; station:cf_role = "timeseries_id" ; station:_Storage = "contiguous" ; station:_Endianness = "little" ; int64 time(time) ; time:long_name = "time of measurement" ; time:standard_name = "time" ; time:axis = "T" ; time:units = "days since 2000-01-01 00:00:00" ; time:calendar = "proleptic_gregorian" ; time:_Storage = "chunked" ; time:_ChunkSizes = 512 ; time:_Endianness = "little" ; double runoff(station, time) ; runoff:_FillValue = NaN ; runoff:long_name = "MAR runoff" ; runoff:standard_name = "water_volume_transport_in_river_channel" ; runoff:units = "m3 s-1" ; runoff:coordinates = "lat lon alt station" ; runoff:_Storage = "chunked" ; runoff:_ChunkSizes = 29635, 1 ; runoff:_DeflateLevel = 2 ; runoff:_Shuffle = "true" ; runoff:_Endianness = "little" ; float lat(station) ; lat:_FillValue = NaNf ; lat:long_name = "latitude" ; lat:standard_name = "latitude" ; lat:units = "degrees_north" ; lat:axis = "Y" ; lat:_Storage = "contiguous" ; lat:_Endianness = "little" ; float lon(station) ; lon:_FillValue = NaNf ; lon:long_name = "longitude" ; lon:standard_name = "longitude" ; lon:units = "degrees_east" ; lon:axis = "X" ; lon:_Storage = "contiguous" ; lon:_Endianness = "little" ; float alt(station) ; alt:_FillValue = NaNf ; alt:long_name = "height_above_mean_sea_level" ; alt:standard_name = "altitude" ; alt:units = "m" ; alt:positive = "up" ; alt:axis = "Z" ; alt:_Storage = "contiguous" ; alt:_Endianness = "little" ; // global attributes: :featureType = "timeSeries" ; :title = "Greenland runoff" ; :summary = "Greenland RCM runoff at basin outlets" ; :keywords = "Hydrology; Greenland; Runoff; Freshwater" ; :Conventions = "CF-1.7" ; :source = "git commit: 29aa681" ; :creator_name = "Ken Mankoff" ; :creator_email = "kdm@geus.dk" ; :creator_url = "http://kenmankoff.com" ; :institution = "GEUS" ; :references = "10.22008/promice/freshwater" ; :product_version = 1. ; :_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.10.5" ; :_SuperblockVersion = 0 ; :_IsNetcdf4 = 1 ; :_Format = "netCDF-4" ; }
Yes — annual totals for k = 80, 90, and 100 agree to within ~0.3 %.
import xarray as xr
import glob as glob
import pandas as pd
df = pd.DataFrame()
df[80] = xr.open_mfdataset(glob.glob("./freshwater/ice_80/runoff/MAR_*.nc"), combine='by_coords')\
.resample({'time':'A'})\
.sum()\
.sum(dim='station')['runoff']\
.to_dataframe()\
.values
df[90] = xr.open_mfdataset(glob.glob("./freshwater/ice_90/runoff/MAR_*.nc"), combine='by_coords')\
.resample({'time':'A'})\
.sum()\
.sum(dim='station')['runoff']\
.to_dataframe()\
.values
df[100] = xr.open_mfdataset(glob.glob("./freshwater/ice_100/runoff/MAR_*.nc"), combine='by_coords')\
.resample({'time':'A'})\
.sum()\
.sum(dim='station')['runoff']\
.to_dataframe()\
.values
df['range'] = df.max(axis='columns') - df.min(axis='columns')
df['%'] = df['range'] / df.max(axis='columns') * 100
df.describe()
The 1) MAR and RACMO models, 2) basins based on the GIMP surface, and 3) reality, are not always correctly aligned. Misalignment includes the following:
Reality | Basin | Model | Comment |
---|---|---|---|
Glacier | Glacier | Glacier | aligned |
Land | Land | Land | aligned |
Glacier | Glacier | Land | |
Glacier | Land | Glacier | |
Glacier | Land | Land | |
Land | Land | Glacier | |
Land | Glacier | Land | |
Land | Glacier | Glacier |
We ignore “Reality” and treat the basin output as the standard, which reduces Table tab:misalignment to just two cases where basins and model disagree, with the following effects:
- Where basin = glacier and model = land, glacier runoff is lost
- Where basin = land and model = glacier, land runoff is lost
Mask category | Model category | Description | Area [km2] |
---|---|---|---|
1 | | land (total) | 361,950 |
1 | 0 | model fjord | 717 |
1 | 1 | model land | 339,749 |
1 | 2 | model ice | 21,484 |
* | | no data (total) | 3,676,860 |
* | 0 | model fjord | 1,863,957 |
* | 1 | model land | 51,532 |
* | 2 | model ice | 1,761,221 |
* | * | no data | 150 |
TOTAL | | | 4,038,810 |
Mask category | Model category | Description | Area [km2] |
---|---|---|---|
1 | | ice (total) | 1,781,816 |
1 | 0 | model fjord | 27 |
1 | 1 | model land | 20,876 |
1 | 2 | model ice | 1,760,912 |
* | | no data (total) | 2,256,994 |
* | 0 | model fjord | 1,864,647 |
* | 1 | model land | 370,404 |
* | 2 | model ice | 21,793 |
* | * | no data | 150 |
TOTAL | | | 4,038,810 |
We adjust the model results to the basins in the following ways. For speed, this is done once using the basin and model masks, not the daily values: a coverage factor is calculated one time and then used as a daily multiplier later in the workflow.
Where a basin reports ice and the model reports land, we divide the daily runoff values of the basin cells that do have model ice by the fraction of the basin area covered by model ice (i.e., cells with runoff data). That is, the model land runoff is discarded, and the nearby model ice runoff is scaled up to compensate for the uncovered basin cells. For example, if an ice basin is 90 % covered by model ice, model runoff is divided by 0.9 to estimate what 100 % coverage would report. This scenario occurs frequently at the ice margin because the 1 km model grid cells rarely align with the 90 m resolution basin boundary. It also occurs wherever nunataks exist, because interior “holes” in the ice sheet are filled; otherwise they would act as interior drains that block large areas from routing to the coast, something unlikely to occur in reality. When no model cells cover any part of a small basin, that basin never has any reported runoff.
Where a basin reports land and the model reports ice, the same treatment as above is performed, but for land. That is, the model ice runoff is discarded, and the nearby model land runoff is scaled to compensate for the uncovered basin cells. As above, this occurs frequently at the ice margin for the same reason. Large areas of model ice (and real ice) are occasionally reported over land basins where outlet glaciers are not properly captured in the model masks (FIGURE near UAK).
Where the model mask reports ocean, no basins exist and model runoff, both land and ice, is discarded. Large areas of real ocean and fjord are classified as land in the MAR model mask variable.
The above means that model runoff is not conserved.
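As a minimal sketch of this adjustment (the basin IDs, coverage fractions, and runoff values below are hypothetical; the real coverage factors are computed from the basin and model masks in the workflow that follows), the daily multiplier is simply the inverse of the fraction of each basin covered by model cells with runoff data:

import pandas as pd

# Hypothetical coverage fractions for three ice basins: the fraction of each
# basin's area covered by model ice cells (1.0 = fully covered).
coverage = pd.Series({101: 1.00, 102: 0.90, 103: 0.75})

# Hypothetical daily runoff [m3 s-1] summed from only the model cells that
# do cover each basin.
runoff = pd.DataFrame({101: [10.0, 12.0], 102: [9.0, 9.9], 103: [3.0, 4.5]},
                      index=pd.date_range("2000-06-01", periods=2))

# Dividing by the coverage fraction scales the partial-coverage runoff up to
# an estimate of what 100 % coverage would report (e.g. 90 % coverage -> /0.9).
runoff_adjusted = runoff / coverage
print(runoff_adjusted)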
Scale the model area covering each basin so that it equals the basin area.
First, let's test this with some visual examination
rm -fR ./G/test_coverage
grass -c ./G/test_coverage
g.mapset -c coverage_test
g.region -d
r.mask -r
# pick one and zoom in
r.mapcalc "land_basin = if(basins@ice_surf == 353, 353, null())" --q --o
r.colors map=land_basin color=green
r.mapcalc "ice_basin = if(basins@ice_surf == 12, 12, null())" --q --o
r.colors map=ice_basin color=water
g.region zoom=land_basin --q
d.mon start=wx0
d.erase
d.rast land_basin
d.rast ice_basin
# # check if model ice exists here
d.rast mask_ice_MAR@MAR # yes
r.report --q -h map=ice_basin,mask_ice_MAR@MAR units=k | grep -A4 "|12|"
# SAME -> setting mask not needed for ice.
g.region zoom=land_basin --q
r.report --q -h map=land_basin,mask_land_MAR@MAR units=k | grep -A4 "|353|"
g.region zoom=land_basin --q
r.report --q -h map=ice_basins@land_surf,mask_ice_MAR@MAR units=k | grep -A4 "|353|"
# coverage report
r.report --q -h map=basins@land_surf,mask_ice_MAR@MAR units=k
Generate a test file
grass -c ./G/test_coverage
r.report --q -h map=basins@ice_surf,mask_ice_MAR@MAR units=k | head -n 100
r.stats --q -aN input=basins@ice_surf,mask_ice_MAR@MAR separator=, > ./tmp/coverage_test.csv
exit
Compute coverage
The key line is:
r.stats --q -aN input=basins@ice_surf,mask_ice_MAR@MAR separator=, > ./tmp/coverage_test.csv
In the main processing loop, I’ll add a line like:
r.stats --q -aN input=basins@<MAPSET>,mask_<domain>_<model>@<model> separator=, \
> ./tmp/coverage_<mapset>_<domain>_<model>.csv
And then in post-processing I will calculate coverage and apply it to each basin.
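A minimal sketch of that post-processing step, assuming the three comma-separated columns written by r.stats above are basin category, model mask category, and cell area in m2 (null mask cells printed as "*"):

import pandas as pd

# Per-basin area broken down by model mask category, from the r.stats call above.
cov = pd.read_csv("./tmp/coverage_test.csv", header=None,
                  names=["basin", "mask", "area"])

# Area of each basin covered by non-null model cells vs. total basin area.
covered = cov[cov["mask"].astype(str) != "*"].groupby("basin")["area"].sum()
total = cov.groupby("basin")["area"].sum()

# Coverage fraction per basin; the daily runoff is later divided by this factor.
coverage = (covered / total).fillna(0)
print(coverage.describe())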
g.mapset -c test_alignment
g.region n=-3227490 s=-3253500 w=7110 e=48960 res=90 -pa align=mask_ice_basin@MAR
g.region save=test --o
# MAR grid to this mapset and region subset
r.to.vect input=mask_ice_MAR@MAR output=MAR_grid_ice type=area
r.to.vect input=mask_land_MAR@MAR output=MAR_grid_land type=area
# MAR grid centers at MAR res to this mapset and region subset
g.region res=1000 -p
r.to.vect input=mask_ice_MAR@MAR output=MAR_grid_ice_point type=point
r.to.vect input=mask_land_MAR@MAR output=MAR_grid_land_point type=point
g.region region=test -p
d.mon start=wx1
d.erase
d.vect basins@land_surf fill_color=178:223:138 color=100:100:100 legend_label="Land Basin"
d.vect basins@ice_surf fill_color=166:206:227 color=100:100:100 legend_label="Ice Basin"
# d.vect outlets@land_surf fill_color=green icon=basic/circle size=7 legend_label="Ice Outlet"
# d.vect outlets@ice_surf fill_color=blue icon=basic/circle size=7 legend_label="Land Outlet"
d.vect MAR_grid_ice fill_color=none color=black width=3 legend_label="MAR Ice/Land Boundary"
d.vect MAR_grid_ice_point icon=basic/box color=black fill_color=blue size=3 legend_label="MAR Ice Cell"
d.vect MAR_grid_land_point icon=basic/box color=black fill_color=green size=3 legend_label="MAR Land Cell"
NOTE - a better figure is generated in QGIS using the data created above. See ./fig/alignment.qgz and output ./fig/basin_MAR_align.png
ps.map -e input=./tmp/basin_MAR_align.psmap output=./fig/basin_MAR_align.eps --o
convert -trim -density 300 ./fig/basin_MAR_align.eps ./fig/basin_MAR_align.png
o ./fig/basin_MAR_align.png
border n
# scale 1:1000000
paper a4
end
vareas MAR_grid_ice
color black
pat $GISBASE/etc/paint/patterns/diag_down.eps
scale 1
fcolor 55:126:184
width 1
label "MAR ice"
end
vareas MAR_grid_land
color black
pat $GISBASE/etc/paint/patterns/diag_up.eps
scale 1
fcolor 51:160:44
width 1
label "MAR land"
end
# hatch fill
vareas basins@ice_surf
# pat $GISBASE/etc/paint/patterns/diag_up.eps
# scale 0.5
# fcolor 55:126:184
color 55:126:184
# fcolor 166:206:227
fcolor 240:240:240
label "Basin ice"
end
# vareas basins@ice_surf
# color 55:126:184
# fcolor 166:206:227
# end
vareas basins@land_surf
color 51:160:44
fcolor 127:223:138
label "Basin land"
end
# # crosses on land
# vpoints MAR_grid_land_point
# type point
# color 51:160:44
# scale 3
# # see ${GISBAS}/etc/symbol/basic/
# symbol basic/x
# label "Model Land Grid Cell"
# end
vlegend
where 5.7 1.1
fontsize 14
end
scalebar f
length 4
units kilometers
segment 2
where 1 5
fontsize 14
end
<<py_init>>
<<py_init_graphics>>
import geopandas as gp
from shapely.geometry import Point
import matplotlib
from matplotlib import rc
rc('font', size=12)
rc('text', usetex=False)
# matplotlib.pyplot.xkcd()
# plt.close(1)
fig = plt.figure(1, figsize=(8,11)) # w,h
# get_current_fig_manager().window.move(0,0)
fig.clf()
fig.set_tight_layout(True)
ax = fig.add_subplot(111)
if 'land' not in locals():
land = (gp.read_file("./freshwater/land_100/outlets.gpkg").set_index("cat"))\
.merge(gp.read_file("./freshwater/land_100/basins_filled.gpkg").set_index("cat"), left_index=True, right_index=True)\
.rename(columns={"geometry_x":"outlet", "geometry_y":"basin"})\
.set_geometry('basin')
ice = (gp.read_file("./freshwater/ice_100/outlets.gpkg").set_index("cat"))\
.merge(gp.read_file("./freshwater/ice_100/basins.gpkg").set_index("cat"), left_index=True, right_index=True)\
.rename(columns={"geometry_x":"outlet", "geometry_y":"basin"})\
.set_geometry('basin')
kw = {'ax':ax, 'linewidth':0.3}
land.plot(color=C_lightgreen, edgecolor=C_darkgreen, **kw)
ice[ice.elev > -10].plot(color=(0.8, 0.8, 0.8), edgecolor=C_darkblue, **kw)
ice[ice.elev <= -10].plot(color=C_lightblue, edgecolor=C_darkblue, **kw)
from matplotlib_scalebar.scalebar import ScaleBar
ax.add_artist(ScaleBar(1))
gdf = gp.GeoDataFrame(obs_xy, columns=['abbrev','x','y'], crs="EPSG:3413").set_index("abbrev")
gdf['geometry'] = [Point(x,y) for x,y in zip(gdf['x'],gdf['y'])]
import matplotlib.patheffects as pe
kw = {'color':'k', 'path_effects':[pe.withStroke(linewidth=4, foreground="white")]}
for obs in gdf.index:
basin = land[land.contains(gdf.loc[obs].geometry)].geometry
if obs == 'L':
basin = ice[ice.contains(gdf.loc[obs].geometry)].geometry
box = basin.envelope.scale(1.1, 1.1)
box.plot(ax=ax, color='none', edgecolor='k')
xy = box.iloc[0].exterior.coords.xy
x,y = xy[0][-2], xy[1][-2]
txt=obs
# if obs == 'L': txt = ''
# if obs == 'W':
# txt = 'W & L'
# ax.text(x-35000,y+12000, txt, horizontalalignment='left', **kw)
if obs not in ['K','Kb','O']:
ax.text(x-3000,y+5000, txt, horizontalalignment='right', **kw)
else: # obs in ['K','Kb','O']:
if obs == 'K':
x,y = xy[0][-1], xy[1][-1]
ax.text(x-3000,y-5000, txt, horizontalalignment='right', verticalalignment='top', **kw)
if obs == 'O':
x,y = xy[0][2], xy[1][2]
ax.text(x+3000,y+5000, txt, horizontalalignment='left', verticalalignment='bottom', **kw)
if obs == 'Kb':
x,y = xy[0][1], xy[1][1]
ax.text(x+3000,y-5000, txt, horizontalalignment='left', verticalalignment='top', **kw)
plt.axis('off')
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
plt.savefig("./fig/overview.png", bbox_inches='tight', dpi=300, pad_inches=0)
- To find out how much each upstream cell moved between two basin delineations, B0 and B1…
- For each outlet in B0 and B1 get the x and y coordinates and the outlet ID (same as the basin ID).
- Build four raster maps, B0_x, B0_y, B1_x, B1_y, where each cell in x (y) holds the x (y) coordinate of the outlet for that cell.
- The distance between outlet locations is then sqrt((B0_x - B1_x)^2 + (B0_y - B1_y)^2) for each cell.
- When comparing many (n) basin delineations, each cell has n possible (x,y) outlet locations.
- Calculate mean_x, mean_y for each cell from the n (x,y) possibilities.
- Then calculate the average/max/std distance for each cell from the n (x,y) to that mean.
- Then one map captures all n possible delineations (a minimal numerical sketch follows this list).
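A minimal numerical sketch of the per-cell statistic described above (the coordinates are made up; the workflow below computes this from the exported rasters):

import numpy as np

# Rows are cells, columns are the n basin delineations. x and y hold the
# outlet coordinate assigned to each cell by each delineation.
x = np.array([[0., 100., 200.],   # a cell whose outlet moves between delineations
              [50., 50., 50.]])   # a cell whose outlet never moves
y = np.array([[0., 0., 0.],
              [10., 10., 10.]])

# Mean outlet location per cell across the n delineations
mx = x.mean(axis=1, keepdims=True)
my = y.mean(axis=1, keepdims=True)

# Distance from each delineation's outlet to the mean outlet, per cell
dist = np.hypot(x - mx, y - my)

# One map per statistic then captures all n delineations
print("mean distance per cell:", dist.mean(axis=1))
print("max distance per cell: ", dist.max(axis=1))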
<<init>>
g.mapset -c k_basin_change
log_info "Exporting outlet coordinates"
mapsets=$(g.mapsets -l separator="\n" | grep -E "^ice_*")
parallel --bar "r.out.xyz input=outlets@{1} | awk -F'|' '{print \$3, \$1}' > ./tmp/outlets_{1}_x" ::: ${mapsets}
parallel --bar "r.out.xyz input=outlets@{1} | awk -F'|' '{print \$3, \$2}' > ./tmp/outlets_{1}_y" ::: ${mapsets}
- First, encode the outlet location as the category information of each basin, rather than as the value of each (x,y) cell.
- This means we encode information for n basins (20,000?) rather than for n grid cells (4.5 M?).
- Then create new rasters where the grid cells contain the outlet coordinates directly, which makes the math easier further down in the code.
log_info "Encoding outlet coordinates"
coord="x y"
parallel --bar "g.copy basins@{1},cat_{1}_{2}" ::: ${mapsets} ::: ${coord}
parallel --bar "r.category map=cat_{1}_{2} rules=./tmp/outlets_{1}_{2} separator=space" ::: ${mapsets} ::: ${coord}
parallel --bar "r.mapcalc '{1}_{2} = @cat_{1}_{2}'" ::: ${mapsets} ::: ${coord}
log_info "Exporting cells..."
log_warn "Reducing resolution."
LIST=$(g.list type=raster pattern='^[i|l][ceand]*_*[x|y]$' separator=,)
echo "x,y,${LIST}" | tr "," "|" > ./tmp/cells_xy.bsv
g.region res=1000 -a
r.out.xyz input=${LIST} output=- >> ./tmp/cells_xy.bsv
g.region -d
LIST=ice_100_x,ice_100_y,ice_80_x,ice_80_y,ice_90_x,ice_90_y,z_s@ArcticDEM
echo "x,y,${LIST}" | tr "," "|" > ./tmp/cells_elev.bsv
g.region res=1000 -a
r.out.xyz input=${LIST} output=- >> ./tmp/cells_elev.bsv
g.region -d
import pandas as pd
import numpy as np
df_xy = pd.read_csv("./tmp/cells_xy.bsv", delimiter="|", usecols=['x','y']).astype(np.int)
header = pd.read_csv("./tmp/cells_xy.bsv", delimiter="|", nrows=1)
cols_in = header.columns[['ice' in _ for _ in header.columns]]
df = pd.read_csv("./tmp/cells_xy.bsv", delimiter="|", usecols=cols_in)
multi_hdr = [np.array([_.split("_")[1] for _ in cols_in]).astype(np.int),
[_.split("_")[2] for _ in cols_in]]
df.columns = pd.MultiIndex.from_arrays(multi_hdr, names=['k','coord'])
k = np.unique(multi_hdr[0])
for kk in k:
# mean distance for this k =
# all other columns minus this column squared for x
# also for y
# square root of all that
# take the mean of all columns
df[(kk,'avg_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).mean(axis='columns')
# same for max, except last step
df[(kk,'max_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).max(axis='columns')
# df[(kk,'std_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).std(axis='columns')
df_out = pd.DataFrame()
df_out['avg_avg'] = df.xs('avg_dist', level='coord', axis='columns').mean(axis='columns')
# df_out['avg_max'] = df.xs('max_dist', level='coord', axis='columns').mean(axis='columns')
# df_out['avg_std'] = df.xs('std_dist', level='coord', axis='columns').mean(axis='columns')
# df_out['max_avg'] = df.xs('avg_dist', level='coord', axis='columns').max(axis='columns')
df_out['max_max'] = df.xs('max_dist', level='coord', axis='columns').max(axis='columns')
# df_out['max_std'] = df.xs('std_dist', level='coord', axis='columns').max(axis='columns')
# df_out['std_avg'] = df.xs('avg_dist', level='coord', axis='columns').std(axis='columns')
# df_out['std_max'] = df.xs('max_dist', level='coord', axis='columns').std(axis='columns')
# df_out['std_std'] = df.xs('std_dist', level='coord', axis='columns').std(axis='columns')
df_out = df_out.merge(df_xy, left_index=True, right_index=True)
df_out.to_csv("./tmp/cells_xy_ice_stats.bsv", sep="|", float_format='%d')
NO - now that we only generate a single land_100 delineation, this method no longer works
cols_in = header.columns[['land' in _ for _ in header.columns]]
df = pd.read_csv("./tmp/cells_xy.bsv", delimiter="|", usecols=cols_in)
df.columns = pd.MultiIndex.from_arrays(multi_hdr, names=['k','coord'])
for kk in k:
df[(kk,'avg_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).mean(axis='columns')
df[(kk,'max_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).max(axis='columns')
df_out = pd.DataFrame()
df_out['avg_avg'] = df.xs('avg_dist', level='coord', axis='columns').mean(axis='columns')
df_out['max_max'] = df.xs('max_dist', level='coord', axis='columns').max(axis='columns')
df_out = df_out.merge(df_xy, left_index=True, right_index=True)
df_out.to_csv("./tmp/cells_xy_land_stats.bsv", sep="|", float_format='%d')
# run python code above to calculate basin outlet statistics
log_info "Calculating cell outlet statistics..."
python ./k_basin_change.py
log_info "Re-importing rasters."
log_warn "Reduced resolution."
g.region res=1000 -a
r.in.xyz input=./tmp/cells_xy_ice_stats.bsv output=ice_max method=mean x=4 y=5 z=3 skip=1
r.in.xyz input=./tmp/cells_xy_ice_stats.bsv output=ice_avg method=mean x=4 y=5 z=2 skip=1
# r.in.xyz input=./tmp/cells_xy_land_stats.bsv output=land_max method=mean x=4 y=5 z=3 skip=1
# r.in.xyz input=./tmp/cells_xy_land_stats.bsv output=land_avg method=mean x=4 y=5 z=2 skip=1
-1 = -1
0 thru 2999 = 1 0 to < 3 km
3000 thru 10000 = 2 3 to < 10 km
10001 thru 29999 = 3 10 to < 30 km
30000 thru 99999 = 4 30 to < 100 km
100000 thru 10000000 = 5 > 100 km
# for raster in ice_max ice_avg land_max land_avg; do
for raster in ice_max ice_avg; do
r.mask raster=mask_o_l_i@ArcticDEM maskcats="2 3" --o
r.mapcalc "${raster}_land = if(isnull(${raster}), -1 * (mask_o_l_i@ArcticDEM >=2), ${raster})" --o
r.mask -r
r.reclass input=${raster}_land output=${raster}_class --o rules=./tmp/categories.txt
cat << EOF | r.colors map=${raster}_class rules=-
-1 128:128:128
5 255:255:204
4 161:218:180
3 65:182:196
2 44:127:184
1 37:52:148
EOF
done
# https://colorbrewer2.org/?type=sequential&scheme=YlGnBu&n=5
g.mapset k_basin_change
g.region -d
r.mapcalc "outline = if(mask_o_l_i@ArcticDEM != 1)"
r.to.vect input=outline output=outline type=area
g.region res=1000 -a
r.mapcalc "outline_low = if(mask_o_l_i@ArcticDEM != 1, 1, null())"
r.to.vect input=outline_low output=outline_low type=area
g.region -d
- Could be useful to show an elevation contour representative of the max melt elevation
r.mask outline_low maskcats=1 --o
g.region res=1000 -a
r.mapcalc "z_s = z_s@BedMachine" # mask
r.contour input=z_s output=z_s levels=1000,1500,2000
g.region -d
r.mask -r
border n
# scale 1:1000000
paper a3
end
raster ${raster}
# vareas outline_low
# color gray
# fcolor gray
# # width 0.1
# end
vlines z_s
type line
where level = 1500
color black
width 1
end
vlines z_s
type line
where level = 1500
color white
width 5
end
colortable y
nodata n
where 5.5 12
fontsize 24
end
scalebar s
length 100
units kilometers
segment 1
where 7 11.5
fontsize 24
end
# All of Greenland
g.region -d
g.region res=500 -a
raster=land_avg_class
cat << EOF | ps.map -e input=- output=./tmp/${raster}.eps --o
<<psmap>>
<<psmap_legend_GL>>
EOF
convert -trim ./tmp/${raster}.{eps,png}
o ./tmp/${raster}.png
g.region -d
- Use the PSMAP block above
log_info "Generating graphic..."
mkdir -p ./fig
for domain in ice; do
for stat in max avg; do
raster=${domain}_${stat}_class
g.region -d
g.region res=1000 -a
r.mask -r
cat << EOF | ps.map input=- output=./tmp/${raster}.ps --o
<<psmap>>
EOF
convert -trim -transparent white ./tmp/${raster}.ps ./fig/${raster}.png
g.region -d
done
done
import pandas as pd
import numpy as np
df_xy = pd.read_csv("./tmp/cells_elev.bsv", delimiter="|", usecols=['x','y','z_s@ArcticDEM']).astype(np.int)
header = pd.read_csv("./tmp/cells_elev.bsv", delimiter="|", nrows=1)
cols_in = header.columns[['ice' in _ for _ in header.columns]]
df = pd.read_csv("./tmp/cells_elev.bsv", delimiter="|", usecols=cols_in)
multi_hdr = [np.array([_.split("_")[1] for _ in cols_in]).astype(np.int),
[_.split("_")[2] for _ in cols_in]]
df.columns = pd.MultiIndex.from_arrays(multi_hdr, names=['k','coord'])
k = np.unique(multi_hdr[0])
for kk in k:
# mean distance for this k =
# all other columns minus this column squared for x
# also for y
# square root of all that
# take the mean of all columns
df[(kk,'avg_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).mean(axis='columns')
# same for max, except last step
df[(kk,'max_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).max(axis='columns')
# df[(kk,'std_dist')] = (((df.xs('x', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('x', level='coord', axis='columns')[kk]))**2 + (df.xs('y', level='coord', axis='columns').drop(kk, axis='columns').apply(lambda c: c-df.xs('y', level='coord', axis='columns')[kk]))**2)**0.5).std(axis='columns')
df_out = pd.DataFrame()
df_out['avg_avg'] = df.xs('avg_dist', level='coord', axis='columns').mean(axis='columns')
# df_out['avg_max'] = df.xs('max_dist', level='coord', axis='columns').mean(axis='columns')
# df_out['avg_std'] = df.xs('std_dist', level='coord', axis='columns').mean(axis='columns')
# df_out['max_avg'] = df.xs('avg_dist', level='coord', axis='columns').max(axis='columns')
df_out['max_max'] = df.xs('max_dist', level='coord', axis='columns').max(axis='columns')
# df_out['max_std'] = df.xs('std_dist', level='coord', axis='columns').max(axis='columns')
# df_out['std_avg'] = df.xs('avg_dist', level='coord', axis='columns').std(axis='columns')
# df_out['std_max'] = df.xs('max_dist', level='coord', axis='columns').std(axis='columns')
# df_out['std_std'] = df.xs('std_dist', level='coord', axis='columns').std(axis='columns')
df_out = df_out.merge(df_xy, left_index=True, right_index=True)
# df_out.to_csv("./tmp/cells_xy_ice_stats.bsv", sep="|", float_format='%d')
import matplotlib.pyplot as plt
import matplotlib
from matplotlib import rc
rc('font', size=12)
rc('text', usetex=False)
# matplotlib.pyplot.xkcd()
# plt.close(1)
fig = plt.figure(1, figsize=(5,4)) # w,h
# get_current_fig_manager().window.move(0,0)
fig.clf()
fig.set_tight_layout(True)
# import matplotlib.gridspec as gridspec
# gs = gridspec.GridSpec(1, 1) #w,h
# ax = plt.subplot(gs[:,:])
ax = fig.add_subplot(111)
df2 = df_out.copy(deep=True)
df2 = df2[df2['z_s@ArcticDEM'] < 1500]
df2 = df2[df2['max_max'] < 250*1E3]
df2 = df2[df2['max_max'] > 100*10*np.sqrt(2)]
ax.hexbin(df2['max_max']/1E3,df2['z_s@ArcticDEM'], mincnt=1, gridsize=(50,25), bins='log')
# ax.set_xlim(0,100E3)
# ax.set_ylim(0,3000)
# ax.plot(np.sin(np.arange(0,2*pi,0.1)))
# plt.savefig('.png', transparent=True, bbox_inches='tight', dpi=300)
# this is in Bash w/o GRASS
rm -f ./tmp/obs_loc.bsv
for key in "${!tbl_obs[@]}"; do
row=(${tbl_obs[$key]})
echo ${row[0]}"|"${row[1]}"|"${key}"|" >> ./tmp/obs_loc.bsv
done
# in GRASS
<<grass_init_mapset>>
g.mapset -c obs
v.in.ascii input=./tmp/obs_loc.bsv output=obs_loc x=1 y=2
v.db.renamecolumn map=obs_loc column=int_1,x
v.db.renamecolumn map=obs_loc column=int_2,y
v.db.renamecolumn map=obs_loc column=str_1,label
v.db.dropcolumn map=obs_loc column=int_3
db.select table=obs_loc
Recall locations:
for key in "${!tbl[@]}"; do
row=(${tbl[$key]})
echo ${row[@]} ${key}
done
# also
# cat ./tmp/obs_loc.bsv
import geopandas as gpd
from shapely.geometry import Point
gdf = gpd.GeoDataFrame(tbl, columns=['abbrev','x','y'], crs="EPSG:3413").set_index("abbrev")
gdf['geometry'] = [Point(x,y) for x,y in zip(gdf['x'],gdf['y'])]
gdf.to_file("./tmp/obs.gpkg", driver="GPKG")
gdf
| abbrev | x       | y        | geometry                 |
|--------+---------+----------+--------------------------|
| Ks     | -18335  | -3183360 | POINT (-18335 -3183360)  |
| K      | -326372 | -2829354 | POINT (-326372 -2829354) |
| Kb     | -316602 | -2831048 | POINT (-316602 -2831048) |
| L      | -216646 | -2504583 | POINT (-216646 -2504583) |
| O      | -317396 | -2826710 | POINT (-317396 -2826710) |
| Q      | -560538 | -1241281 | POINT (-560538 -1241281) |
| R      | -335678 | -2246371 | POINT (-335678 -2246371) |
| T      | -324548 | -2827284 | POINT (-324548 -2827284) |
| W      | -249713 | -2510668 | POINT (-249713 -2510668) |
| Z      | 699990  | -1540459 | POINT (699990 -1540459)  |
View
qgis ./tmp/obs.gpkg &
Issues
- Leverett is always in a micro-basin at the ice sheet edge. Suggest manually moving it or using =upstream= to collect other nearby micro-catchments.
  - Using =upstream= from the original Leverett location gave too much runoff.
  - The new Leverett location is adjusted into the largest nearby ice basin.
- Narsarsuaq is in a micro-basin at the glacier snout. It seems likely that =upstream= fixes this. It appears that upstream glaciers drain to land and then under this glacier.
source ~/.bash_profile
conda activate freshwater
mkdir -p ./dat/runoff
for line in $(cat ./tmp/obs_loc.bsv); do
echo $line
IFS="|" read x y abb <<< ${line[@]}
echo $abb $x $y
python ./discharge.py --base ./freshwater --roi=${x},${y} -u -d > ./dat/runoff/${abb}.csv
done
This takes an hour to run…
g.mapset -c obs
# | Color | R | G | B | hex |
# |-------------+-----+-----+-----+---------|
# | light blue | 166 | 206 | 227 | #a6cee3 |
# | dark blue | 31 | 120 | 180 | #1f78b4 |
# | light green | 178 | 223 | 138 | #b2df8a |
# | dark green | 51 | 160 | 44 | #33a02c |
# | pink | 251 | 154 | 153 | #fb9a99 |
# | red | 227 | 26 | 28 | #e31a1c |
# | pale orange | 253 | 191 | 111 | #fdbf6f |
# | orange | 255 | 127 | 0 | #ff7f00 |
C_lightblue=#a6cee3
C_darkblue=#1f78b4
C_lightgreen=#b2df8a
C_darkgreen=#33a02c
C_pink=#fb9a99
export GRASS_OVERWRITE=1
# for label in $(db.select -c sql="select label from obs_loc where label == 'L'"); do
for label in $(db.select -c sql="select label from obs_loc"); do
v.extract input=obs_loc where="label=='${label}'" output=this
# set up the rasters and vectors
<<plan_view_obs_loc>>
# Make the map
cat << EOF | ps.map -e input=- output=./fig/${label}.eps --o
<<plan_view_obs_loc_psmap>>
EOF
convert -trim -density 300 ./fig/${label}.eps ./fig/${label}.png
# o ./fig/${label}.png
done
## Open each in Inkscape, adjust scalebar if necessary (x4 for L and W), save all (even if no adjustments were made)
# for f in $(ls fig/{?,??}.eps); do inkscape $f; done
## Make PNGs:
# parallel --verbose --bar "convert -trim -density 300 ./fig/{1}.eps ./fig/{1}.png" ::: Kb K Ks L O Q R T W Z
# # B2018 cell nearby
# x=$(db.select -c sql="select round(x) from b2018 where label == '${label}'")
# y=$(db.select -c sql="select round(y) from b2018 where label == '${label}'")
# g.region w=$(( ${x} - 2500 )) e=$(( ${x} + 2499 )) s=$(( ${y} - 2500 )) n=$(( ${y} + 2499 )) -pa
# v.in.region output=b2018_this # vector outlining region. Useful for cropping.
v.select ainput=basins_filled@land_100 binput=this output=selected operator=contains
g.region vector=selected -pa
# expand by 10 %
eval $(g.region -pg)
w_adj=$(echo "(${e} - ${w}) * 0.05" | bc)
h_adj=$(echo "(${n} - ${s}) * 0.05" | bc)
g.region e=e+${w_adj} w=w-${w_adj} n=n+${h_adj} s=s-${h_adj} -pa
g.region w=w-${w_adj}
g.region w=w-${w_adj} # another 10 % for the legend
v.in.region output=region # vector outlining region. Useful for cropping.
# All basins within region
v.overlay ainput=region binput=basins@land_100 operator=and output=basins_land
v.overlay ainput=region binput=basins_filled@land_100 operator=and output=basins_land_filled
v.overlay ainput=region binput=basins@ice_100 operator=and output=basins_ice
# this land basin + outlet + stream
v.select ainput=basins_land binput=this operator=contains output=basins_land_this
v.select ainput=basins_land_filled binput=this operator=contains output=basins_land_filled_this
v.select ainput=outlets@land_100 binput=basins_land_this operator=within output=outlets_land_this
v.select ainput=streams@land_100 atype=line binput=basins_land_this operator=overlap output=streams_land_this
# ice outlets, basins, and streams draining into this basin
v.select ainput=outlets@ice_100 binput=basins_land_filled_this operator=within output=outlets_ice_this
v.select ainput=basins_ice binput=outlets_ice_this operator=contains output=basins_ice_this
v.select ainput=streams@ice_100 atype=line binput=basins_ice_this operator=overlap output=streams_ice_this
if [[ ${label} == "L" ]]; then
v.select ainput=this binput=basins_ice operator=within output=outlets_ice_this
v.select ainput=basins_ice binput=this operator=contains output=outlets_ice_this
v.select ainput=basins_ice binput=outlets_ice_this operator=contains output=basins_ice_this
v.select ainput=streams@ice_100 atype=line binput=basins_ice_this operator=overlap output=streams_ice_this
fi
### Basemap
g.region res=15 -a
parallel --verbose --bar "r.mapcalc \"{}float = (rgb.{#}@NSIDC_0713)^0.33\"" ::: r g b
eval $(r.univar map=rfloat,gfloat,bfloat -g) # get max of all three
parallel --verbose --bar "r.mapcalc \"{} = int( round( ({}float - ${min}) / (${max} - ${min}) * 255 ) )\"" ::: r g b
r.colors map=r,g,b color=grey
# RCM cells here
r.to.vect input=mask_ice_MAR@MAR output=MAR_ice type=area
r.to.vect input=mask_ice_RACMO@RACMO output=RACMO_ice type=area
# d.rgb red=r green=g blue=b
# d.vect basins@land_100 fill_color=none color=${C_darkgreen} width=2
# d.vect outlets@land_100 icon=basic/cross2 fillcolor=${C_darkgreen} color=${C_darkgreen} size=10
# d.vect streams@land_100 color=${C_lightgreen}
# d.vect basins@ice_100 fill_color=none color=${C_darkblue} width=2
# d.vect outlets@ice_100 icon=basic/cross2 fillcolor=${C_darkblue} color=${C_darkblue} size=10
# d.vect streams@ice_100 color=${C_lightblue}
# d.vect this icon=basic/cross2 color=red fill_color=red size=15
# d.out.file output=./fig/b.png size=${width},${height}
border n
paper a4
end
vareas basins_land
color ${C_darkgreen}
fcolor none
lpos 0
width 0.33
end
vareas basins_ice_this
color ${C_darkblue}
fcolor none
label Ice basins
end
vareas basins_ice
color ${C_darkblue}
fcolor none
lpos 0
width 0.25
end
vareas basins_land_this
color ${C_darkgreen}
fcolor none
label Land basin
end
# vlines streams_ice_this
# color ${C_lightblue}
# # lpos 0
# label Streams
# end
vlines streams_land_this
color ${C_lightgreen}
# lpos 0
label Streams
end
vpoints outlets_ice_this
fcolor 255:127:0
# fcolor white
color none
# symbol basic/box
label Outlets
size 5
end
vpoints outlets_land_this
# color 51:160:44
fcolor 255:127:0
color none
size 5
lpos 0
end
vpoints this
type point
color 255:127:0
fcolor none
label Station
end
vareas MAR_ice
color ${C_lightblue}
# pat $GISBASE/etc/paint/patterns/diag_down.eps
# fcolor 126:206:227
fcolor none
scale 1
width 3
label RCM ice
end
vareas RACMO_ice
color ${C_lightblue}
# pat $GISBASE/etc/paint/patterns/diag_up.eps
# fcolor 126:206:227
fcolor none
scale 1
width 1
# label RACMO ice
lpos 0
end
# vareas b2018_this
# color ${C_pink}
# # pat $GISBASE/etc/paint/patterns/diag_up.eps
# # fcolor 126:206:227
# fcolor none
# scale 1
# width 1
# # label RACMO ice
# lpos 0
# end
rgb r g b
vlegend
where 0.1 1
fontsize 8
end
text 3% 3% a
background white
fontsize 20
end
scalebar s
length 1
units kilometers
segment 1
where 1 3.25
fontsize 12
end
<<py_init>>
<<py_init_graphics>>
<<load_all_obs>>
# k='W'; obs={k:obs[k]}
for k in obs.keys():
fig = plt.figure(1, figsize=(8,3.5)) # w,h
fig.clf()
fig.set_tight_layout(True)
ax1 = fig.add_subplot(111)
adjust_spines(ax1, ['left','bottom'])
df = obs[k]
name = df.attrs['name']
df = df.replace(0, np.nan)
kw = {'ax': ax1, 'drawstyle':'steps-post'}
dfly = df[df.index.year == df.dropna().index.year[-1]] # .dropna() # last year
dfly = dfly.loc[dfly['obs'].first_valid_index():dfly['obs'].last_valid_index()]
# dfly = df
dfly['obs'].plot(color=C_darkblue, **kw, alpha=1, label=name + ' observed')
dfly['MAR'].plot(color=C_MAR, **kw, alpha=0.5, label='MAR')
dfly['RACMO'].plot(color=C_RACMO, **kw, alpha=0.5, label='RACMO')
# \pm 15 % according to Xavier
kw={'step':'post', 'alpha':0.25}
ax1.fill_between(dfly.index, dfly['MAR']*0.85, dfly['MAR']*1.15,color=C_MAR, **kw)
ax1.fill_between(dfly.index, dfly['RACMO']*0.85, dfly['RACMO']*1.15,color=C_RACMO, **kw)
# ax1.fill_between(dfly.index, dfly['MAR_90']*0.85, dfly['MAR_90']*1.15, color=C_MAR, **kw)
# ax1.fill_between(dfly.index, dfly['RACMO_90']*0.85, dfly['RACMO_90']*1.15, color=C_RACMO, **kw)
# ax1.fill_between(dfly.index, dfly['MAR_80']*0.85, dfly['MAR_80']*1.15, color=C_MAR, **kw)
# ax1.fill_between(dfly.index, dfly['RACMO_80']*0.85, dfly['RACMO_80']*1.15, color=C_RACMO, **kw)
if k == 'Q': # 9 % 2017, 22 % 2018 and 2019 error according to Ken Kondo email
dfly['err'] = dfly['obs'] * 0.22
ax1.fill_between(dfly.index, dfly['obs']-dfly['err'], dfly['obs']+dfly['err'], color=C_darkblue, step='post', alpha=0.25)
if k == 'W': # Dirk provides err w/ his data
ax1.fill_between(dfly.index, dfly['obs']-dfly['err'], dfly['obs']+dfly['err'], color=C_darkblue, step='post', alpha=0.25)
plt.legend(fontsize=8, fancybox=False, frameon=False)
ax1.set_xlabel("Time")
ax1.set_ylabel("Runoff [m$^{3}$ s$^{-1}$]")
ax1.text(-0.14, 0.95, 'b', size=15, color='white', bbox=dict(facecolor='black'), transform=ax1.transAxes)
plt.savefig("./fig/" + k + "_ts.png", bbox_inches='tight', dpi=300)
<<py_init>>
<<py_init_graphics>>
<<load_all_obs>>
# k="Q"; obs = {k:obs[k]}
for k in obs.keys():
fig = plt.figure(1, figsize=(8,6.5)) # w,h
fig.clf()
fig.set_tight_layout(True)
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
adjust_spines(ax1, ['left','bottom'])
adjust_spines(ax2, ['right','bottom'])
adjust_spines(ax3, ['left','bottom'])
adjust_spines(ax4, ['right','bottom'])
df = obs[k]
df = df.replace(0, np.nan).dropna()
df = np.log10(df)
# df = df[~df.isin([np.nan, np.inf, -np.inf]).any(1)] # clean up nan and inf
kw = {'alpha': 1, 'cmap': cm.viridis, 'marker':'.', 'c':df.index.dayofyear, 'edgecolor':'none', 'clip_on':False}
s = ax1.scatter(df['obs'], df['MAR'], **kw)
s = ax2.scatter(df['obs'], df['RACMO'], **kw)
kw = {'color':'k', 'alpha':0.66, 'linewidth':0.5, 'levels':5}
sns.kdeplot(df['obs'], df['MAR'], ax=ax1, **kw)
sns.kdeplot(df['obs'], df['RACMO'], ax=ax2, **kw)
lims = [np.min([ax1.get_xlim()[0], ax1.get_ylim()[0], ax2.get_xlim()[0], ax2.get_ylim()[0]]),
np.max([ax1.get_xlim()[1], ax1.get_ylim()[1], ax2.get_xlim()[1], ax2.get_ylim()[1]])]
# drop 5/95 outliers
# q = df['obs'].quantile([0.05, 0.95])
# df = df[(df['obs'] > q[0.05]) & (df['obs'] < q[0.95])]
df.sort_values(by='obs', inplace=True)
x = df['obs'].values
y_MAR = df['MAR'].values
y_RACMO = df['RACMO'].values
X = sm.add_constant(x)
# X = x
model = sm.OLS(y_MAR, X)
results = model.fit()
prstd, iv_l, iv_u = wls_prediction_std(results)
ax1.fill_between(x, iv_u, iv_l, color="grey", alpha=0.25)
ax1.text(0.05, 0.85, 'r$^{2}$:' + str(round(results.rsquared,2)), transform=ax1.transAxes, horizontalalignment='left')
model = sm.OLS(y_RACMO, X)
results = model.fit()
prstd, iv_l, iv_u = wls_prediction_std(results)
ax2.fill_between(x, iv_u, iv_l, color="grey", alpha=0.25)
ax2.text(0.05, 0.85, 'r$^{2}$:' + str(round(results.rsquared,2)), transform=ax2.transAxes, horizontalalignment='left')
ax1.text(0.05, 0.95, 'n:' + str(df['obs'].size), transform=ax1.transAxes, horizontalalignment='left')
ax1.set_xlabel('Observed [m$^{3}$ s$^{-1}$]')
ax1.set_ylabel('MAR [m$^{3}$ s$^{-1}$]')
ax2.set_xlabel('Observed [m$^{3}$ s$^{-1}$]')
ax2.set_ylabel('RACMO [m$^{3}$ s$^{-1}$]')
if lims[0] < -3: lims[0] = -3
lims_upstreamp = np.log10((10**np.array(lims))*5)
lims_down = np.log10((10**np.array(lims))/5)
kw = {'alpha':0.5, 'linewidth':1, 'color':'k', 'linestyle':'-'}
for ax in [ax1,ax2]:
ax.plot(([lims[0],lims[1]]), ([lims[0],lims[1]]), **kw)
ax.plot(([lims[0],lims[1]]), ([lims_upstreamp[0],lims_upstreamp[1]]), **kw)
ax.plot(([lims[0],lims[1]]), ([lims_down[0],lims_down[1]]), **kw)
ax.set_ylim(lims[0], lims[1])
ax.set_xlim(lims[0], lims[1])
ticks = np.arange(round(lims[0]), round(lims[1])+1)
ax.set_yticks(ticks)
labels = ['10$^{' + str(int(_)) + '}$' for _ in ticks]
ax.set_yticklabels(labels)
ax.set_xticks(ax.get_yticks())
ax.set_xticklabels(ax.get_yticklabels())
# cax = fig.add_axes([0.46, 0.6, 0.01, 0.2])
# cb = fig.colorbar(s, cax=cax)
cax = fig.add_axes([0.40, 0.53, 0.2, 0.015])
cb = fig.colorbar(s, cax=cax, orientation='horizontal')
cax.xaxis.set_ticks_position('top')
cax.xaxis.set_label_position('top')
cb.set_label('Day of year')
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=30)
plt.setp(ax2.xaxis.get_majorticklabels(), rotation=30)
### LOWER PANELS
# Included here so that we can easily match the x-axis limits
df = obs[k]
df['x'] = df['obs']
df['y_MAR'] = df['MAR'] / df['obs']
df['y_RACMO'] = df['RACMO'] / df['obs']
df = df.replace(0, np.nan).dropna()
df = df[~df.isin([np.nan, np.inf, -np.inf]).any(1)]
df = np.log10(df)
kw = {'mincnt':1,
'bins':'log',
'clip_on':True,
'gridsize':20,
'cmap':cm.cividis}
# plot all to get max of both for colorbar range
h_MAR = ax3.hexbin(df['x'], df['y_MAR'], alpha=0, **kw)
h_RACMO = ax4.hexbin(df['x'], df['y_RACMO'], alpha=0, **kw)
ymin = np.floor(np.min([ax1.get_ylim()[0], ax2.get_ylim()[0]]))
ymax = np.ceil(np.max([ax1.get_ylim()[1], ax2.get_ylim()[1]]))
THRESH=3
if ymax > THRESH: ymax = THRESH
kw['alpha'] = 1
kw['extent'] = [lims[0], lims[1], ymin, ymax]
kw['vmax'] = max([h_MAR.get_array().max(),h_RACMO.get_array().max()])
h_MAR = ax3.hexbin(df['x'], df['y_MAR'], **kw)
h_RACMO = ax4.hexbin(df['x'], df['y_RACMO'], **kw)
df_top = df[df['obs'] > df['obs'].quantile(0.33)]
df_bot = df[df['obs'] < df['obs'].quantile(0.33)]
kwline = {'color':'k'}
kwtext = {'path_effects':[pe.withStroke(linewidth=2, foreground="white")],
'color':'k',
'fontsize':10,
'verticalalignment':'center'}
for d in [df_top, df_bot]:
for ax in [ax3,ax4]:
if d is df_top:
max_MAR = np.max([_[0] for _ in h_MAR.get_offsets()])
max_RACMO = np.max([_[0] for _ in h_RACMO.get_offsets()])
xpos = np.max([max_MAR, max_RACMO])
# xpos = round(xmax)
kwtext['horizontalalignment'] = 'left'
elif d is df_bot:
min_MAR = np.min([_[0] for _ in h_MAR.get_offsets()])
min_RACMO = np.min([_[0] for _ in h_RACMO.get_offsets()])
xpos = np.min([min_MAR, min_RACMO])
kwtext['horizontalalignment'] = 'right'
if ax is ax3: yy = d['y_MAR']
if ax is ax4: yy = d['y_RACMO']
y = yy.mean()
ax.plot([d['x'].min(),d['x'].max()], [y,y], **kwline)
ax.text(xpos, y, str(round(10**y,2)), **kwtext)
y = yy.quantile(0.95)
ax.plot([d['x'].min(),d['x'].max()], [y,y], linestyle='--', **kwline)
ax.text(xpos, y, str(round(10**y,2)), **kwtext)
y = yy.quantile(0.05)
ax.plot([d['x'].min(),d['x'].max()], [y,y], linestyle='--', **kwline)
ax.text(xpos, y-0.15, str(round(10**y,2)), **kwtext)
ax3.set_xlabel('Observed [m$^{3}$ s$^{-1}$]')
ax3.set_ylabel('MAR / Observed')
ax4.set_xlabel('Observed [m$^{3}$ s$^{-1}$]')
ax4.set_ylabel('RACMO / Observed')
for ax in [ax3,ax4]:
ax.set_xlim(lims[0], lims[1])
ax.set_xticks(ax1.get_xticks())
ax.set_xticklabels(ax1.get_xticklabels())
ticks = np.arange(round(ymin), round(ymax)+1)
labels = ['10$^{' + str(int(_)) + '}$' for _ in ticks]
ax.set_yticks(ticks)
ax.set_ylim([ticks[0], ticks[-1]])
ax.set_yticklabels(labels)
ax1.text(-0.33, 0.95, 'c', size=15, color='white', bbox=dict(facecolor='black'), transform=ax1.transAxes)
ax3.text(-0.33, 0.99, 'd', size=15, color='white', bbox=dict(facecolor='black'), transform=ax3.transAxes)
cax = fig.add_axes([0.40, 0.5, 0.2, 0.015])
cb = fig.colorbar(h_MAR, cax=cax, orientation='horizontal')
if k in ['Ks','R','Q']: cax.xaxis.set_minor_formatter(plt.ScalarFormatter())
cb.set_label('Count')
plt.setp(ax3.xaxis.get_majorticklabels(), rotation=30)
plt.setp(ax4.xaxis.get_majorticklabels(), rotation=30)
# clean up upper panel
adjust_spines(ax1,['left'])
adjust_spines(ax2,['right'])
ax1.set_xlabel('')
ax2.set_xlabel('')
plt.savefig("./fig/" + k + "_tukey_scatter.png", bbox_inches='tight', dpi=300)
mkdir ./tmp/stack
f=$(ls fig/{?,??}_ts.png|head -n1) # debug
for f in $(ls fig/{?,??}_ts.png); do
ff=$(basename $f)
k=$(echo ${ff} | cut -d"_" -f1)
echo $ff $k
cp fig/${k}.png ./tmp/stack/map.png
cp fig/${k}_ts.png ./tmp/stack/ts.png
cp fig/${k}_tukey_scatter.png ./tmp/stack/tukey_scatter.png
convert ./tmp/stack/{map,ts,tukey_scatter}.png -gravity center -append fig/${k}_stack.png
done
convert -pointsize 40 -fill red -draw 'text 850,250 "X"' ./fig/L_stack.png ./fig/tmp.png
mv ./fig/tmp.png ./fig/L_stack.png
import matplotlib.pyplot as plt
import matplotlib
from matplotlib import rc
rc('font', size=12)
rc('text', usetex=False)
# matplotlib.pyplot.xkcd()
from discharge import discharge
import geopandas as gp
from adjust_spines import adjust_spines as adjust_spines
if 'r_land' not in locals():
# runoff at Watson land outlet, and for two ice basins to the S not included in the upstream Watson outlets
r_land = discharge(base="./freshwater", roi="-50.68,67.01", upstream=True, quiet=False).discharge()
r_ice0 = discharge(base="./freshwater", roi="-49.40,66.90", upstream=False, quiet=False).discharge()
r_ice1 = discharge(base="./freshwater", roi="-49.25,66.80", upstream=False, quiet=False).discharge()
g_land = discharge(base="./freshwater", roi="-50.68,67.01", upstream=True, quiet=False).outlets().set_geometry('basin')
g_ice0 = discharge(base="./freshwater", roi="-49.40,66.90", upstream=False, quiet=False).outlets().set_geometry('basin')
g_ice1 = discharge(base="./freshwater", roi="-49.25,66.80", upstream=False, quiet=False).outlets().set_geometry('basin')
# plt.close(1)
fig = plt.figure(1, figsize=(4,4.5)) # w,h
fig.clf()
fig.set_tight_layout(True)
import matplotlib.gridspec as gridspec
gs = gridspec.GridSpec(2, 2, width_ratios=[1,1000], height_ratios=[1,2]) #h, w
ax1 = plt.subplot(gs[0, :])
ax2 = plt.subplot(gs[1, :])
C_lightblue = np.array((166, 206, 227))/255
C_darkblue = np.array((31, 120, 180))/255
C_lightgreen = np.array((127, 223, 138))/255
C_darkgreen = np.array((51, 160, 44))/255
kw = {'ax':ax1, 'linewidth':0.3, 'clip_on':False}
g_land[g_land['domain'] == 'land'].plot(color=C_lightgreen, edgecolor='k', **kw)
g_land[g_land['domain'] == 'ice'].plot(color='orange', edgecolor='k', **kw)
g_land[g_land['domain'] == 'land'].set_geometry('outlet').plot(color='r', **kw)
g_ice0[g_ice0['domain'] == 'ice'].plot(color=C_darkblue, edgecolor='k', **kw)
g_ice1[g_ice1['domain'] == 'ice'].plot(color=C_darkblue, edgecolor='k', **kw)
# ax1.set_ylim([-2.59E6,-2.49E6])
ax1.set_ylim([-2.60E6,-2.50E6])
# write out for Google Earth so we can draw approx location of contour line
# g_ice1[(g_ice1['domain'] == 'ice') & (g_ice1['k'] == 100)].drop(columns="outlet").to_file('./tmp/g_ice1.shp')
# 1500 @ -48.3
# 1850 @ -47.33
# most runoff below that (see van As 2017)
from shapely.geometry import Point, Polygon
xy = gp.GeoSeries(data=Point(-48.3, 66.6), crs="EPSG:4326").to_crs("EPSG:3413").loc[0].xy
ax1.plot(list(xy[0])*2,[-2.52E6,-2.57E6], 'k--', alpha=0.5, clip_on=False)
xy = gp.GeoSeries(data=Point(-47.33, 66.6), crs="EPSG:4326").to_crs("EPSG:3413").loc[0].xy
ax1.plot(list(xy[0])*2,[-2.54E6,-2.59E6], 'k--', alpha=0.5, clip_on=False)
# ax1.margins(0,0)
# ax1.xaxis.set_major_locator(plt.NullLocator())
# ax1.yaxis.set_major_locator(plt.NullLocator())
adjust_spines(ax1, [])
from matplotlib_scalebar.scalebar import ScaleBar
ax1.add_artist(ScaleBar(1))
W = pd.read_csv("./dat/runoff/obs_W.csv", index_col=0, parse_dates=True)
MAR = r_land[['MAR_land','MAR_ice_upstream']]\
.sum(dim=['land','ice_upstream'])\
.to_dataframe()\
.merge(r_ice0['MAR_ice']\
.squeeze()\
.to_dataframe()['MAR_ice'], left_index=True, right_index=True)\
.rename(columns={"MAR_ice":"ice0"})\
.merge(r_ice1['MAR_ice']\
.squeeze()\
.to_dataframe()['MAR_ice'], left_index=True, right_index=True)\
.rename(columns={"MAR_ice":"ice1"})\
.merge(W, left_index=True, right_index=True)
combine = ['MAR_land','MAR_ice_upstream']
MAR['land'] = MAR[combine].sum(axis='columns')
MAR.drop(columns=combine, inplace=True)
# identical to MAR
RACMO = r_land[['RACMO_land','RACMO_ice_upstream']]\
.sum(dim=['land','ice_upstream'])\
.to_dataframe()\
.merge(r_ice0['RACMO_ice']\
.squeeze()\
.to_dataframe()['RACMO_ice'], left_index=True, right_index=True)\
.rename(columns={"RACMO_ice":"ice0"})\
.merge(r_ice1['RACMO_ice']\
.squeeze()\
.to_dataframe()['RACMO_ice'], left_index=True, right_index=True)\
.rename(columns={"RACMO_ice":"ice1"})\
.merge(W, left_index=True, right_index=True)
combine = ['RACMO_land','RACMO_ice_upstream']
RACMO['land'] = RACMO[combine].sum(axis='columns')
RACMO.drop(columns=combine, inplace=True)
df = (MAR+RACMO)/2
df['merge'] = df['land'] + df['ice0'] + df['ice1']
df = np.log10(df).dropna()
df = df[~df.isin([np.nan, np.inf, -np.inf]).any(1)] # clean up nan and inf
import seaborn as sns
kw = {'alpha':0.66, 'shade_lowest':False}
sns.kdeplot(df['Observed'], df['land'], cmap=cm.Oranges, ax=ax2, shade=True, **kw)
sns.kdeplot(df['Observed'], df['merge'], cmap=cm.Blues, ax=ax2, **kw)
lims = np.array((ax2.get_xlim(),ax2.get_ylim())).flatten()
ax2.set_xlim(1, 3.8)
ax2.set_ylim(ax2.get_xlim())
ax2.set_xlabel('Observed [m$^{3}$ s$^{-1}$]')
lims = [1,3.8]
if lims[0] < -3: lims[0] = -3
lims_upstreamp = np.log10((10**np.array(lims))*2)
lims_down = np.log10((10**np.array(lims))/2)
kw = {'alpha':0.5, 'linewidth':1, 'color':'k'}
ax2.plot(([lims[0],lims[1]]), ([lims[0],lims[1]]), **kw)
# ax2.plot(([lims[0],lims[1]]), ([lims_upstreamp[0],lims_upstreamp[1]]), **kw)
ax2.plot(([lims[0],lims[1]]), ([lims_down[0],lims_down[1]]), linestyle='--', **kw)
ax2.set_ylim(lims[0], lims[1])
ax2.set_xlim(lims[0], lims[1])
ticks = np.arange(round(lims[0]), round(lims[1])+1)
ax2.set_yticks(ticks)
labels = ['10$^{' + str(int(_)) + '}$' for _ in ticks]
ax2.set_yticklabels(labels)
ax2.set_xticks(ax2.get_yticks())
ax2.set_xticklabels(ax2.get_yticklabels())
adjust_spines(ax2, ['left','bottom'])
ax2.set_ylabel('RCM mean [m$^{^3}$ s$^{-1}$]')
plt.savefig('./fig/watson_adjusted_south.png', transparent=True, bbox_inches='tight', dpi=300)
README for "Greenland liquid water runoff from 1950 through December 2022"
Data DOI: doi:10.22008/promice/freshwater
Source: https://github.com/GEUS-Glaciology-and-Climate/freshwater
* Data Description
Data sets released as part of this work include:
+ Streams
+ Outlets
+ Basins
+ Runoff
** Streams, outlets, and basins
Streams, outlets, and basins are calculated for both land-coast outlets and ice-margin outlets, together with the upstream basins and streams that feed those outlets. For each of these two domains (land coast, ice margin) we estimate streams, outlets, and basins, assuming subglacial water pressure is equal to the ice overburden pressure.
Streams over land closely follow actual streams based on comparison with satellite imagery.
Subglacial streams are merely representative of where the model hydropotential routes most of the water.
Outlets are marked where streams leave the domain.
Basins are the upstream grid cells that feed into each outlet.
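For example, the outlets and basins can be loaded with geopandas (a sketch; paths follow the layout described in the Datasets section below):

import geopandas as gpd

# Land-coast outlets and their basins
outlets = gpd.read_file("./land/outlets.gpkg")
basins = gpd.read_file("./land/basins.gpkg")
print(outlets.head())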
** Runoff
Runoff comes from daily MAR and RACMO RCM runoff, partitioned into each basin, and assigned to each outlet.
The ice dataset reports the ice runoff. The land dataset reports the land runoff. The land dataset also includes the IDs of the upstream ice basins that drain into each land basin, allowing one to find the ice melt runoff at any land outlet.
* Datasets
Each dataset is in a folder named <domain>, where =domain= is one of "ice" or "land". Within each folder the following files and sub-folders are provided:
| Filename | Description |
|--------------+---------------------------------------|
| streams.gpkg | GeoPackage of stream locations |
| streams.csv | Metadata for streams |
| outlets.gpkg | GeoPackage of outlet locations |
| outlets.csv | Metadata for outlets |
| basins.gpkg | GeoPackage of basin locations |
| basins.csv | Metadata for basins |
| runoff | NetCDF files of runoff at each outlet |
In the runoff sub-folders, files are named <RCM>_<YYYY>.nc, where =RCM= is one of "MAR" or "RACMO", and =YYYY= is the year of the data. Within each file are daily runoff values at each outlet.
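For example, one year of runoff can be read with xarray (a sketch; variable names follow the NetCDF headers, e.g. runoff(station, time) in m3 s-1, with =station= holding the outlet id):

import xarray as xr

# One year of MAR runoff at every ice-margin outlet
ds = xr.open_dataset("./ice/runoff/MAR_2000.nc")

# Daily runoff time series at a single outlet
one_outlet = ds["runoff"].isel(station=0)
print(one_outlet.to_dataframe().head())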
* Database access software
The =discharge.py= script provides high-level access to the database. See =README.org= for usage examples.
Complete documentation and example use cases with code are available at https://github.com/GEUS-Glaciology-and-Climate/freshwater, or by contacting Ken Mankoff <kdm@geus.dk>
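For example (a sketch mirroring the calls used elsewhere in this document; the base path and ROI coordinates are illustrative):

from discharge import discharge

# Runoff routed to the outlet nearest a lon,lat point of interest,
# including the upstream ice outlets that drain into that land basin.
r = discharge(base="./freshwater", roi="-50.68,67.01",
              upstream=True, quiet=False).discharge()

# The corresponding outlets and basins as a GeoDataFrame
geom = discharge(base="./freshwater", roi="-50.68,67.01",
                 upstream=True, quiet=False).outlets()

The same query is available from the command line, as used elsewhere in this document: =python ./discharge.py --base ./freshwater --roi=<x>,<y> -u -d > out.csv=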
This work was performed using only open-source software, primarily GRASS GIS
citep:neteler_2012, CDO citep:CDO, NCO citep:NCO, GDAL citep:GDAL, and Python
citep:van-rossum_1995, in particular the Jupyter
citep:kluyver_2016, dask
citep:dask_sw,dask_paper, pandas
citep:mckinney_2010, geopandas
citep:geopandas, numpy
citep:oliphant_2006, xarray
citep:hoyer_2017, and Matplotlib
citep:hunter_2007 packages. The entire work was performed in Emacs
citep:stallman_1981 using Org Mode
citep:schulte_2012 on GNU/Linux and using many GNU utilities (See https://github.com/GEUS-Glaciology-and-Climate/freshwater citep:github_freshwater). The parallel
citep:tange_2011 tool was used to speed up processing. We used proj4
citep:proj4 to compute the errors in the EPSG 3413 projection. The color map for Fig. fig:k_basin_change comes from citet:colorbrewer.
\authorcontribution{KDM produced this work and wrote the code and the text. XF and BN supplied RCM inputs. APA, WIC, DVA, and RSF helped with discussions of methods, quality control, or writing. KK and SS supplied Qaanaaq data. KL provided GEM data.}
\competinginterests{The authors declare that they have no conflict of interest.}
This project uses software: Bash, GRASS, Python, etc. The Python environment is reproducible if you have Conda installed. Below I provide the versions of the software used to create this document, in order to support the goal of bit-matching reproducibility.
for tool in gdal-bin parallel sed gawk netcdf-bin proj-bin nco cdo bash grass-gui datamash; do dpkg -l | grep "ii ${tool} " | cut -c5-90; done| sort
(org-version nil t)
Org mode version 9.5.5 (release_9.5.5 @ /home/kdm/local/src/org-mode/lisp/)
The code below produces ./environment.yml when this file is exported. If that file exists, then =mamba env create -f environment.yml= followed by =mamba activate freshwater= will install all the Python packages used in this document.
conda env export --name freshwater | cat | tee environment.yml
name: freshwater channels: - conda-forge - defaults dependencies: - _libgcc_mutex=0.1=conda_forge - _openmp_mutex=4.5=0_gnu - attrs=19.3.0=py_0 - backcall=0.2.0=pyh9f0ad1d_0 - bleach=3.1.5=pyh9f0ad1d_0 - bokeh=2.1.1=py38h32f6830_0 - boost-cpp=1.72.0=h8e57a91_0 - bzip2=1.0.8=h516909a_2 - ca-certificates=2020.6.20=hecda079_0 - cairo=1.16.0=hcf35c78_1003 - certifi=2020.6.20=py38h924ce5b_2 - cfchecker=4.0.0=py_0 - cfitsio=3.470=hce51eda_6 - cftime=1.2.1=py38h8790de6_0 - cfunits=3.2.8=pyh9f0ad1d_0 - click=7.1.2=pyh9f0ad1d_0 - click-plugins=1.1.1=py_0 - cligj=0.5.0=py_0 - cloudpickle=1.5.0=py_0 - curl=7.71.1=he644dc0_3 - cycler=0.10.0=py_2 - cytoolz=0.10.1=py38h516909a_0 - dask=2.15.0=py_0 - dask-core=2.15.0=py_0 - dbus=1.13.6=he372182_0 - decorator=4.4.2=py_0 - defusedxml=0.6.0=py_0 - descartes=1.1.0=py_4 - distributed=2.21.0=py38h32f6830_0 - entrypoints=0.3=py38h32f6830_1001 - expat=2.2.9=he1b5a44_2 - fiona=1.8.13=py38h033e0f6_1 - fontconfig=2.13.1=h86ecdb6_1001 - freetype=2.10.2=he06d7ca_0 - freexl=1.0.5=h516909a_1002 - fsspec=0.7.4=py_0 - future=0.18.2=py38h32f6830_1 - gdal=3.0.4=py38h172510d_6 - geopandas=0.7.0=py_1 - geos=3.8.1=he1b5a44_0 - geotiff=1.5.1=h05acad5_10 - gettext=0.19.8.1=hc5be6a0_1002 - giflib=5.2.1=h516909a_2 - glib=2.65.0=h6f030ca_0 - gst-plugins-base=1.14.5=h0935bb2_2 - gstreamer=1.14.5=h36ae1b5_2 - h5netcdf=0.8.0=py_0 - h5py=2.10.0=nompi_py38h513d04c_102 - hdf4=4.2.13=hf30be14_1003 - hdf5=1.10.5=nompi_h3c11f04_1104 - heapdict=1.0.1=py_0 - icu=64.2=he1b5a44_1 - importlib-metadata=1.7.0=py38h32f6830_0 - importlib_metadata=1.7.0=0 - ipdb=0.13.3=pyh9f0ad1d_0 - ipykernel=5.3.4=py38h23f93f0_0 - ipython=7.16.1=py38h23f93f0_0 - ipython_genutils=0.2.0=py_1 - ipywidgets=7.5.1=py_0 - jedi=0.17.2=py38h32f6830_0 - jinja2=2.11.2=pyh9f0ad1d_0 - jpeg=9d=h516909a_0 - json-c=0.13.1=hbfbb72e_1002 - jsonschema=3.2.0=py38h32f6830_1 - jupyter=1.0.0=py_2 - jupyter_client=6.1.6=py_0 - jupyter_console=6.1.0=py_1 - jupyter_core=4.6.3=py38h32f6830_1 - kealib=1.4.13=hec59c27_0 - kiwisolver=1.2.0=py38hbf85e49_0 - krb5=1.17.1=hfafb76e_1 - lcms2=2.11=hbd6801e_0 - ld_impl_linux-64=2.34=h53a641e_7 - libblas=3.8.0=17_openblas - libcblas=3.8.0=17_openblas - libclang=9.0.1=default_hde54327_0 - libcurl=7.71.1=hcdd3856_3 - libdap4=3.20.6=h1d1bd15_1 - libedit=3.1.20191231=h46ee950_1 - libffi=3.2.1=he1b5a44_1007 - libgcc-ng=9.2.0=h24d8f2e_2 - libgdal=3.0.4=h3dfc09a_6 - libgfortran-ng=7.5.0=hdf63c60_6 - libgomp=9.2.0=h24d8f2e_2 - libiconv=1.15=h516909a_1006 - libkml=1.3.0=hb574062_1011 - liblapack=3.8.0=17_openblas - libllvm9=9.0.1=he513fc3_1 - libnetcdf=4.7.4=nompi_h9f9fd6a_101 - libopenblas=0.3.10=pthreads_hb3c22a3_3 - libpng=1.6.37=hed695b0_1 - libpq=12.3=h5513abc_0 - libsodium=1.0.17=h516909a_0 - libspatialindex=1.9.3=he1b5a44_3 - libspatialite=4.3.0a=h2482549_1038 - libssh2=1.9.0=hab1572f_4 - libstdcxx-ng=9.2.0=hdf63c60_2 - libtiff=4.1.0=hc7e4089_6 - libuuid=2.32.1=h14c3975_1000 - libwebp-base=1.1.0=h516909a_3 - libxcb=1.13=h14c3975_1002 - libxkbcommon=0.10.0=he1b5a44_0 - libxml2=2.9.10=hee79883_0 - locket=0.2.0=py_2 - lz4-c=1.9.2=he1b5a44_1 - markupsafe=1.1.1=py38h1e0a361_1 - matplotlib=3.2.1=0 - matplotlib-base=3.2.1=py38h2af1d28_0 - mistune=0.8.4=py38h1e0a361_1001 - msgpack-python=1.0.0=py38hbf85e49_1 - munch=2.5.0=py_0 - nbconvert=5.6.1=py38h32f6830_1 - nbformat=5.0.7=py_0 - ncurses=6.2=he1b5a44_1 - netcdf4=1.5.3=nompi_py38heb6102f_103 - notebook=6.0.3=py38h32f6830_1 - nspr=4.27=he1b5a44_0 - nss=3.47=he751ad9_0 - numpy=1.18.5=py38h8854b6b_0 - olefile=0.46=py_0 - openjpeg=2.3.1=h981e76c_3 - 
openssl=1.1.1h=h516909a_0 - packaging=20.4=pyh9f0ad1d_0 - pandas=1.0.4=py38hcb8c335_0 - pandoc=2.10=h14c3975_0 - pandocfilters=1.4.2=py_1 - parso=0.7.0=pyh9f0ad1d_0 - partd=1.1.0=py_0 - patsy=0.5.1=py_0 - pcre=8.44=he1b5a44_0 - pexpect=4.8.0=py38h32f6830_1 - pickleshare=0.7.5=py38h32f6830_1001 - pillow=7.2.0=py38h9776b28_1 - pip=20.1.1=py_1 - pixman=0.38.0=h516909a_1003 - poppler=0.67.0=h14e79db_8 - poppler-data=0.4.9=1 - postgresql=12.3=h8573dbc_0 - proj=7.0.0=h966b41f_5 - prometheus_client=0.8.0=pyh9f0ad1d_0 - prompt-toolkit=3.0.5=py_1 - prompt_toolkit=3.0.5=1 - psutil=5.7.2=py38h1e0a361_0 - pthread-stubs=0.4=h14c3975_1001 - ptyprocess=0.6.0=py_1001 - pygments=2.6.1=py_0 - pyparsing=2.4.7=pyh9f0ad1d_0 - pyproj=2.6.1.post1=py38h7521cb9_0 - pyqt=5.12.3=py38ha8c2ead_3 - pyrsistent=0.16.0=py38h1e0a361_0 - python=3.8.5=cpython_h4d41432_0 - python-dateutil=2.8.1=py_0 - python_abi=3.8=1_cp38 - pytz=2020.1=pyh9f0ad1d_0 - pyyaml=5.3.1=py38h1e0a361_0 - pyzmq=19.0.1=py38ha71036d_0 - qt=5.12.5=hd8c4c69_1 - qtconsole=4.7.5=pyh9f0ad1d_0 - qtpy=1.9.0=py_0 - readline=8.0=he28a2e2_2 - rtree=0.9.4=py38h08f867b_1 - scipy=1.4.1=py38h18bccfc_3 - seaborn=0.10.1=1 - seaborn-base=0.10.1=py_1 - send2trash=1.5.0=py_0 - setuptools=49.2.0=py38h32f6830_0 - shapely=1.7.0=py38hd168ffb_3 - six=1.15.0=pyh9f0ad1d_0 - sortedcontainers=2.2.2=pyh9f0ad1d_0 - sqlite=3.32.3=hcee41ef_1 - statsmodels=0.11.1=py38h1e0a361_2 - tabulate=0.8.7=pyh9f0ad1d_0 - tbb=2020.1=hc9558a2_0 - tblib=1.6.0=py_0 - terminado=0.8.3=py38h32f6830_1 - testpath=0.4.4=py_0 - tiledb=1.7.7=h8efa9f0_3 - tk=8.6.10=hed695b0_0 - toolz=0.10.0=py_0 - tornado=6.0.4=py38h1e0a361_1 - tqdm=4.46.0=pyh9f0ad1d_0 - traitlets=4.3.3=py38h32f6830_1 - typing_extensions=3.7.4.2=py_0 - tzcode=2020a=h516909a_0 - udunits2=2.2.27.6=h4e0c4b3_1001 - wcwidth=0.2.5=pyh9f0ad1d_0 - webencodings=0.5.1=py_1 - wheel=0.34.2=py_1 - widgetsnbextension=3.5.1=py38h32f6830_1 - xarray=0.15.1=py_0 - xerces-c=3.2.2=h8412b87_1004 - xlrd=1.2.0=py_0 - xorg-kbproto=1.0.7=h14c3975_1002 - xorg-libice=1.0.10=h516909a_0 - xorg-libsm=1.2.3=h84519dc_1000 - xorg-libx11=1.6.9=h516909a_0 - xorg-libxau=1.0.9=h14c3975_0 - xorg-libxdmcp=1.1.3=h516909a_0 - xorg-libxext=1.3.4=h516909a_0 - xorg-libxrender=0.9.10=h516909a_1002 - xorg-renderproto=0.11.1=h14c3975_1002 - xorg-xextproto=7.3.0=h14c3975_1002 - xorg-xproto=7.0.31=h14c3975_1007 - xz=5.2.5=h516909a_1 - yaml=0.2.5=h516909a_0 - zeromq=4.3.2=he1b5a44_2 - zict=2.0.0=py_0 - zipp=3.1.0=py_0 - zlib=1.2.11=h516909a_1006 - zstd=1.4.5=h6597ccf_1 - pip: - astroid==2.4.2 - grass-session==0.5 - isort==4.3.21 - lazy-object-proxy==1.4.3 - matplotlib-scalebar==0.6.2 - mccabe==0.6.1 - pylint==2.5.3 - pyqt5-sip==4.19.18 - pyqtchart==5.12 - pyqtwebengine==5.12.1 - toml==0.10.1 - wrapt==1.12.1 prefix: /home/kdm/local/miniconda3/envs/freshwater
(add-to-list 'org-latex-classes
`("copernicus"
"\\documentclass{copernicus}
[NO-DEFAULT-PACKAGES]
[NO-PACKAGES]
[EXTRA]"
("\\section{%s}" . "\\section*{%s}")
("\\subsection{%s}" . "\\subsection*{%s}")
("\\subsubsection{%s}" . "\\subsubsection*{%s}")
("\\paragraph{%s}" . "\\paragraph*{%s}")
("\\subparagraph{%s}" . "\\subparagraph*{%s}"))
)
(setq-local org-latex-title-command "")