FluxRGNN

A spatio-temporal modeling framework for large-scale migration forecasts based on static sensor network data (e.g. from weather radars). FluxRGNN is a recurrent graph neural network that is based on a generic mechanistic description of population-level movements across space and time. Unlike previous approaches, this hybrid model capitalises on local associations between environmental conditions and migration intensity as well as on spatio-temporal dependencies inherent to the movement process.

The original FluxRGNN approach models movements on the Voronoi tessellation of sensor locations (the paper can be found here). In our follow-up paper we have introduced FluxRGNN+, which extends the original FluxRGNN approach to arbitrary tessellations by decoupling the sensor network from the computational grid on which movements are modeled. This repository provides the code for both approaches. The original implementation of the FluxRGNN approach and the associated experiments and analysis scripts can be found under version v1.1.1. Note that the latest version may not be compatible with all settings and experiments of the original paper.

Requirements and setup

First, make sure you have conda installed.

To install all other dependencies and the FluxRGNN package itself, switch to the FluxRGNN directory and run:

bash install.sh

This will create a new conda environment called fluxrgnn and will install the FluxRGNN package into this environment. Later on, it is enough to activate the environment with

conda activate fluxrgnn

before getting started.

Note that after making changes to files in the fluxrgnn directory, you need to reinstall the associated python package by running

python setup.py install

Additional dependencies

If you want to use your GPU, you may need to manually install a matching PyTorch version.

To recreate geographical visualisations from our paper, some additional packages are required. They can be installed by running

conda env update --name fluxrgnn --file plotting_environment.yml

To make the conda environment visible for the jupyter notebooks, run

python -m ipykernel install --user --name=fluxrgnn

To install additional packages required to run the radar data preprocessing (see below), run

conda env update --name fluxrgnn --file preprocessing_environment.yml

To generate contrastive explanations (i.e. Shapley value attributions of deviations in predicted quantities from a local reference to different input features) a modified version of the shap Python package is required, which can be found here. Make sure this package is available in the fluxrgnn conda environment, either by running

python setup.py install

in the shap directory, or by adding the shap directory to your PYTHONPATH.

Getting started

Hydra config

FluxRGNN uses Hydra to build a hierarchical configuration that can be composed dynamically and overridden from the command line. Have a look at the scripts/conf folder to get familiar with the structure of the config files. The default settings correspond to the settings used in our FluxRGNN+ paper.

You can, for example, easily switch between models (e.g. FluxRGNN+ and FluxRGNN_voronoi), by simply adding model=FluxRGNN+ or model=FluxRGNN_voronoi to your command line when running one of the provided scripts. Similarly, you could change the forecasting horizon to, say, 24 hours by adding model.horizon=24.
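To make the override syntax more concrete, here is a minimal pure-Python sketch of how dotted key=value overrides such as model.horizon=24 map onto a nested configuration. It is a toy illustration and not Hydra's implementation: Hydra additionally resolves config groups (e.g. model=FluxRGNN+), type conversion and interpolation. The function name apply_overrides and the default values are hypothetical.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Toy illustration only: values are kept as strings unless they parse as int.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the hierarchy, creating levels that do not exist yet
            node = node.setdefault(name, {})
        try:
            value = int(raw_value)
        except ValueError:
            value = raw_value
        node[leaf] = value
    return result


# Hypothetical defaults; the real ones live in the Hydra config files.
defaults = {"model": {"name": "FluxRGNN+", "horizon": 48}, "season": "fall"}
config = apply_overrides(defaults, ["model.horizon=24", "season=spring"])
print(config)
```

Only the overridden entries change; everything else keeps its default value, which is why a single extra command line argument is enough to adjust one setting.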

Dataloader

The dataloader expects the preprocessed data (including environmental and sensor network data) to be in the following path:

FluxRGNN/data/preprocessed/{t_unit}_{edge_type}_{buffer}_{info}/{datasource}/{season}/{year}

where info can be either ndummy={n_dummy_radars} (if edge_type is set to voronoi) or res={h3_resolution} (if edge_type is set to hexagons). The values of t_unit, edge_type, buffer, n_dummy_radars, h3_resolution, datasource, season and year can be specified in the hydra configuration files in the scripts/conf directory.
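As a quick check that your data ends up where the dataloader looks for it, the path template above can be assembled as in the following sketch. The helper preprocessed_dir is hypothetical (it is not part of the fluxrgnn package), and the example values for t_unit, buffer and the H3 resolution are placeholders rather than the settings used in the papers.

```python
from pathlib import Path


def preprocessed_dir(root, t_unit, edge_type, buffer, datasource, season, year,
                     n_dummy_radars=None, h3_resolution=None):
    """Assemble the directory in which preprocessed data is expected."""
    if edge_type == "voronoi":
        info = f"ndummy={n_dummy_radars}"
    elif edge_type == "hexagons":
        info = f"res={h3_resolution}"
    else:
        raise ValueError(f"unsupported edge_type: {edge_type!r}")
    folder = f"{t_unit}_{edge_type}_{buffer}_{info}"
    return Path(root) / "data" / "preprocessed" / folder / datasource / season / str(year)


# Placeholder values, for illustration only
path = preprocessed_dir("FluxRGNN", t_unit="1H", edge_type="hexagons", buffer=150000,
                        datasource="nexrad", season="fall", year=2015, h3_resolution=3)
print(path.as_posix())
```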

Preprocessed European data (datasource=radar) can be downloaded here, and preprocessed US data (datasource=nexrad) can be downloaded here. These datasets include bird density and velocity measurements, atmospheric reanalysis data, and other relevant spatial and temporal features. They were generated using the script scripts/run_preprocessing.py in combination with this code base. To run this yourself, follow the birdMigration README to install the birds python package in your fluxrgnn conda environment and to download raw radar data, if needed. Then, from the FluxRGNN/scripts directory, run

python run_preprocessing.py datasource={datasource} +raw_data_dir={path/to/raw/data}

If you would like to apply FluxRGNN to your own data, you need to generate the following files (for each season and year):

delaunay.gpickle: graph structure underlying the desired tessellation as a networkx.DiGraph where nodes represent grid cells and edges between cells exist if they are adjacent. You can use this code base to construct Voronoi or hexagonal tessellations and the associated graph structure from a set of sensor locations.
tessellation.shp: shape file specifying the geometry of grid cells
radar_buffers.shp: shape file specifying the geometry of radar buffers (i.e. the area around the radar antenna used to obtain measurements)
cell_to_radar_edges.csv: dataframe containing edges between grid cells (cidx) and sensors (ridx), the distance between cell centers and sensor locations, and the area of overlap (intersection) between the cell and the sensor measurement area. This graph is used to define a simplified observation model mapping cell quantities to sensor measurements.
radar_to_cell_edges.csv: dataframe containing edges between sensors (ridx) and grid cells (cidx) and their distance. This graph is used to infer cell quantities from sparse sensor measurements during forecast initialization.
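To illustrate why cell_to_radar_edges.csv stores the area of overlap, the sketch below maps cell quantities to radar measurements with an overlap-area-weighted average. This is only one plausible reading of the "simplified observation model" mentioned above and not necessarily what FluxRGNN implements. The function name cells_to_radars and the tuple layout of the edges are assumptions.

```python
from collections import defaultdict


def cells_to_radars(cell_values, edges):
    """Aggregate cell quantities to radars using overlap-area weights.

    cell_values: dict mapping cell index (cidx) to a quantity, e.g. bird density
    edges: iterable of (cidx, ridx, overlap_area) tuples (assumed layout)
    """
    weighted_sum = defaultdict(float)
    total_area = defaultdict(float)
    for cidx, ridx, area in edges:
        weighted_sum[ridx] += area * cell_values[cidx]
        total_area[ridx] += area
    # Radars without any overlapping cell area are left out
    return {ridx: weighted_sum[ridx] / total_area[ridx]
            for ridx in total_area if total_area[ridx] > 0}


# Two cells and two radars with made-up densities and overlap areas
cell_values = {0: 10.0, 1: 30.0}
edges = [(0, 0, 1.0), (1, 0, 3.0), (1, 1, 2.0)]
radar_values = cells_to_radars(cell_values, edges)
print(radar_values)
```

Radar 0 overlaps both cells and therefore receives a value between the two cell densities, weighted towards the cell it overlaps most.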

static_radar_features.csv: dataframe containing the following static features of radars:

feature	description	data type
ID	radar identifier used to define graph structures	integer
radar	name/label of radar	string
observed	true if data is available for this radar, false otherwise	boolean
x	x-component of radar location in local coordinate reference system	float
y	y-component of radar location in local coordinate reference system	float
lon	longitude of radar location	float
lat	latitude of radar location	float
area_km2	measurement area in km^2	float
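If you assemble this file by hand, a sketch along the following lines can help to get the column set right. It uses only the Python standard library and the column names from the table above. The radar names, coordinates and areas are made-up placeholders, and writing booleans as True/False is an assumption about what the dataloader accepts.

```python
import csv
import io

COLUMNS = ["ID", "radar", "observed", "x", "y", "lon", "lat", "area_km2"]


def write_static_radar_features(rows, handle):
    """Write radar rows as CSV after checking that no column is missing."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        missing = set(COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row {row.get('ID')} lacks columns: {sorted(missing)}")
        writer.writerow(row)


# Placeholder radars, for illustration only
rows = [
    {"ID": 0, "radar": "radar_A", "observed": True, "x": 0.0, "y": 0.0,
     "lon": 5.0, "lat": 52.0, "area_km2": 100.0},
    {"ID": 1, "radar": "radar_B", "observed": False, "x": 1000.0, "y": 2000.0,
     "lon": 5.1, "lat": 52.1, "area_km2": 100.0},
]

buffer = io.StringIO()  # replace with open("static_radar_features.csv", "w", newline="")
write_static_radar_features(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```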

static_cell_features.csv: dataframe containing the following static features of grid cells:

feature	description	data type
ID	cell identifier used to define graph structures	integer
h3_id	H3 cell identifier (if hexagonal H3 tessellation is used)	string
radar	list of radars located within the cell	list of strings
observed	true if at least one radar is located within the cell, false otherwise	boolean
x	x-component of cell center location in local coordinate reference system	float
y	y-component of cell center location in local coordinate reference system	float
lon	longitude of cell center location	float
lat	latitude of cell center location	float
area_km2	cell area in km^2	float
boundary	true if cell is at the boundary of the modeled domain, false otherwise	boolean
nlcd_maj	the NLCD land cover class dominating the cell	integer
nlcd_cX	the fraction of the cell covered by NLCD land cover class X (for X=0,...,18)	float

dynamic_radar_features.csv: dataframe containing the following dynamic radar features, i.e. variables that change over time:

feature	description	data type
ID	radar identifier used to define graph structures	integer
radar	name/label of radar	string
datetime	timestamp defining the beginning of the time step (e.g. "2015-08-01 12:00:00+00:00")	string
dayofyear	day of the year (determined based on the beginning of the time step)	integer
tidx	time index used for indexing, sorting and aligning data sequences of multiple radars	integer
solarpos	solar position (in degrees)	float
solarpos_dt	change in solar position relative to the previous time step (in degrees)	float
night	true if at any point during the time step the sun angle is below -6 degrees, false otherwise	boolean
birds_km2	bird density (birds/km^2) measured by the radar	float
bird_u	u-component of the bird velocity measured by the radar	float
bird_v	v-component of the bird velocity measured by the radar	float
missing_birds_km2	true if bird density data is missing, false otherwise	boolean
missing_bird_uv	true if bird_u or bird_v data is missing, false otherwise	boolean
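Several of these columns can be derived from the raw measurements. The sketch below shows one way to compute dayofyear, tidx and the two missing-data flags with the Python standard library. It assumes that tidx simply counts time steps from 0 in chronological order and that missing measurements are encoded as NaN. Both are assumptions, and the function names are hypothetical.

```python
import math
from datetime import datetime


def derive_time_columns(timestamps):
    """Derive dayofyear and tidx from chronologically sorted timestamp strings."""
    records = []
    for tidx, stamp in enumerate(timestamps):
        start = datetime.fromisoformat(stamp)  # beginning of the time step
        records.append({
            "datetime": stamp,
            "dayofyear": start.timetuple().tm_yday,
            "tidx": tidx,  # assumption: consecutive index starting at 0
        })
    return records


def missing_flags(birds_km2, bird_u, bird_v):
    """Flag missing measurements, assuming they are encoded as NaN."""
    return {
        "missing_birds_km2": math.isnan(birds_km2),
        "missing_bird_uv": math.isnan(bird_u) or math.isnan(bird_v),
    }


records = derive_time_columns(["2015-08-01 12:00:00+00:00", "2015-08-01 13:00:00+00:00"])
flags = missing_flags(birds_km2=12.5, bird_u=float("nan"), bird_v=3.0)
print(records)
print(flags)
```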

dynamic_cell_features.csv: dataframe containing the following dynamic cell features, i.e. variables that change over time:

feature	description	data type
ID	cell identifier used to define graph structures	integer
datetime	timestamp defining the beginning of the time step (e.g. "2015-08-01 12:00:00+00:00")	string
dayofyear	day of the year (determined based on the beginning of the time step)	integer
tidx	time index used for indexing, sorting and aligning data sequences of multiple cells	integer
solarpos	solar position (in degrees)	float
solarpos_dt	change in solar position relative to the previous time step (in degrees)	float
dusk	true if at any point during the time step the sun angle drops below -6 degrees, false otherwise	boolean
dawn	true if at any point during the time step the sun angle rises above -6 degrees, false otherwise	boolean
night	true if at any point during the time step the sun angle is below -6 degrees, false otherwise	boolean
nightID	night identifier used to group data belonging to the same night	integer
...	any relevant environmental variables can be added here. The variable names should correspond to those specified in the env_vars list in the datasource config file.

FluxRGNN+ training and testing

To train FluxRGNN+ on NEXRAD data, switch to the scripts directory and run

python run_neural_nets.py model=FluxRGNN+ datasource=nexrad model.scale=0.001 season=fall

for fall migratory movements, or

python run_neural_nets.py model=FluxRGNN+ datasource=nexrad model.scale=0.002 season=spring

for spring migratory movements.
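Because the two commands differ only in the season and the matching model.scale value, they can also be assembled programmatically, for example when launching both runs from a script. The sketch below only builds and prints the command lines. The helper training_command and the dictionary SCALE_BY_SEASON are hypothetical names, while the scale values are the ones given above.

```python
import shlex

# Scale values taken from the commands above
SCALE_BY_SEASON = {"fall": 0.001, "spring": 0.002}


def training_command(season, model="FluxRGNN+", datasource="nexrad"):
    """Build the argument list for run_neural_nets.py for a given season."""
    scale = SCALE_BY_SEASON[season]
    return [
        "python", "run_neural_nets.py",
        f"model={model}",
        f"datasource={datasource}",
        f"model.scale={scale}",
        f"season={season}",
    ]


for season in SCALE_BY_SEASON:
    # Pass the list to subprocess.run(..., cwd="scripts") to actually start a run
    print(shlex.join(training_command(season)))
```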

To run the same on a cluster using Slurm and CUDA, run

sbatch run_neural_nets.job 'model=FluxRGNN+ datasource=nexrad model.scale={scale} season={season} device=cluster'

To generate predictions using a trained model which is stored in /path/to/model.ckpt (can be downloaded here), run

python run_neural_nets.py model=FluxRGNN+ datasource=nexrad model.scale={scale} season={season} task=predict missing_data_threshold=1.0 model.load_states_from=/path/to/model.ckpt model.horizon={horizon}

where horizon can be freely adjusted depending on how far into the future you would like to forecast. Setting missing_data_threshold=1.0 makes sure that predictions for all sequences are generated, independent of the amount of weather radar measurements available.

Baseline models

To train and evaluate one of the baseline models (HA, GAM, GBT, or XGBoost), replace FluxRGNN+ in model=FluxRGNN+ with the corresponding model name.

Contrastive explanations

To analyse the short-term effects of weather on predicted migratory movements using Shapley value-based contrastive explanations, make sure the modified shap package is installed (see above) and run

python explain_forecast.py model=FluxRGNN+ datasource=nexrad task=explain model.load_states_from={/path/to/model.ckpt} model.horizon={horizon} model.scale=0.001 task.n_seq_samples=100 task.seqID_start=31 task.seqID_end=76 season=fall

to explain 100 randomly chosen nights during the fall peak migration period (seqID_start and seqID_end are specific for the provided NEXRAD dataset and correspond to the period between 1 September and 15 October). To do the same for the spring peak migration period (10 April to 25 May), run

python explain_forecast.py model=FluxRGNN+ datasource=nexrad task=explain model.load_states_from={/path/to/model.ckpt} model.horizon={horizon} model.scale=0.002 task.n_seq_samples=100 task.seqID_start=41 task.seqID_end=86 season=spring

This will estimate Shapley values for a range of model outputs (for FluxRGNN+ this includes bird densities, take-off, landing, migration traffic rate, flight direction, and flight speed).

Alternatively, the Weights & Biases sweep configurations scripts/sweep_explanations_spring.yaml and scripts/sweep_explanations_fall.yaml can be used to run multiple nights in parallel.
