Code for training and evaluating global-scale species range estimation models. This code enables the recreation of the results from our ICML 2023 paper Spatial Implicit Neural Representations for Global-Scale Species Mapping.
Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of thousands of species simultaneously. SINRs scale gracefully, making better predictions as we increase the number of training species and the amount of training data per species. We introduce four new range estimation and spatial representation learning benchmarks, and we use them to demonstrate that noisy and biased crowdsourced data can be combined with implicit neural representations to approximate expert-developed range maps for many species.
Above we visualize predictions from one of our SINR models trained on data from iNaturalist. On the left we show the learned species embedding space, where each point represents a different species. On the right we see the predicted range of the species corresponding to the red dot on the left.
- We recommend using an isolated Python environment to avoid dependency issues. Install the Anaconda Python 3.9 distribution for your operating system from here.
- Create a new environment and activate it:
conda create -y --name sinr_icml python==3.9
conda activate sinr_icml
- After activating the environment, install the required packages:
pip3 install -r requirements.txt
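Optionally, you can run a quick sanity check to confirm that the core dependencies resolved correctly. This is a minimal sketch that assumes numpy and torch are among the packages listed in requirements.txt:

```python
# Optional sanity check that key dependencies import inside the new environment.
# numpy and torch are assumed to be in requirements.txt; adjust if they are not.
import numpy as np
import torch

print("numpy:", np.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```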
Continue to Data Download and Preparation
- Navigate to the repository root directory:
cd /path/to/sinr/
- Download the data file:
curl -L https://data.caltech.edu/records/b0wyb-tat89/files/data.zip --output data.zip
- Extract the data:
unzip -q data.zip
- Clean up:
rm data.zip
- Navigate to the directory for the environmental features:
cd /path/to/sinr/data/env
- Download the data:
curl -L https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_5m_bio.zip --output wc2.1_5m_bio.zip
curl -L https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_5m_elev.zip --output wc2.1_5m_elev.zip
- Extract the data:
unzip -q wc2.1_5m_bio.zip
unzip -q wc2.1_5m_elev.zip
- Run the formatting script:
python format_env_feats.py
- Clean up:
rm *.zip
rm *.tif
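Optionally, you can confirm that the formatting step produced a usable array. This minimal check only assumes that format_env_feats.py writes bioclim_elevation_scaled.npy, as shown in the data layout below:

```python
# Optional: verify the formatted environmental features load correctly.
# Run from data/env after format_env_feats.py has finished.
import numpy as np

feats = np.load("bioclim_elevation_scaled.npy")
print("shape:", feats.shape)
print("dtype:", feats.dtype)
```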
After following these instructions, the data directory should have the following structure:
data
├── README.md
├── env
│   ├── bioclim_elevation_scaled.npy
│   └── format_env_feats.py
├── eval
│   ├── geo_feature
│   │   ├── ABOVE_GROUND_CARBON.tif
│   │   ├── ELEVATION.tif
│   │   ├── LEAF_AREA_INDEX.tif
│   │   ├── NON_TREE_VEGITATED.tif
│   │   ├── NOT_VEGITATED.tif
│   │   ├── POPULATION_DENSITY.tif
│   │   ├── SNOW_COVER.tif
│   │   ├── SOIL_MOISTURE.tif
│   │   └── TREE_COVER.tif
│   ├── geo_prior
│   │   ├── geo_prior_model_meta.csv
│   │   ├── geo_prior_model_preds.npz
│   │   └── taxa_subsets.json
│   ├── iucn
│   │   └── iucn_res_5.json
│   └── snt
│       └── snt_res_5.npy
├── masks
│   ├── LAND_MASK.tif
│   ├── ocean_mask.npy
│   ├── ocean_mask_hr.npy
│   └── USA_MASK.tif
└── train
    ├── geo_prior_train.csv
    └── geo_prior_train_meta.json
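To get a feel for the training data, you can peek at the occurrence CSV with pandas. The snippet below prints whatever columns are present rather than assuming their names, since they are not documented in this README:

```python
# Optional: inspect the iNaturalist training occurrences downloaded above.
# Run from the repository root.
import pandas as pd

train = pd.read_csv("data/train/geo_prior_train.csv")
print("columns:", train.columns.tolist())
print("rows:", len(train))
print(train.head())
```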
Now you should be all set! Continue below to learn how to use the models to make predictions and run experiments.
There are a variety of ways to use the models and make predictions; this section walks you through how to use this codebase effectively.
To generate predictions for a model in the form of an image, run the following command:
python viz_map.py --taxa_id 130714
Here, --taxa_id is the taxon ID for a species of interest from iNaturalist. If you want to generate predictions for a random species, add the --rand_taxa flag instead.
Note that before you run this command you need to download the data as described in Set up instructions. In addition, if you want to evaluate some of the pretrained models from the paper, you need to download those first and place them at sinr/pretrained_models. See Web App for Visualizing Model Predictions below for more details.
To train and evaluate a model, run the following command (requires a GPU):
python train_and_evaluate_models.py
Common parameters of interest can be set within train_and_evaluate_models.py. All other parameters are exposed in setup.py.
By default, trained models and evaluation results will be saved to a folder in the experiments directory. Evaluation results will also be printed to the command line.
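If you are unsure where a particular run ended up, a small sketch like the following lists what each run directory contains; it only assumes that outputs land somewhere under experiments/:

```python
# Optional: list what a run of train_and_evaluate_models.py left behind.
from pathlib import Path

for run_dir in sorted(Path("experiments").glob("*")):
    if run_dir.is_dir():
        contents = sorted(p.name for p in run_dir.iterdir())
        print(run_dir.name, "->", contents)
```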
Gradio app for exploring different model predictions.
To use the web app, you must first download the pretrained models from here and place them at sinr/pretrained_models. See app.py for the expected paths.
Activate the SINR environment:
conda activate sinr_icml
Navigate to the web_app directory:
cd /path/to/sinr/web_app
Launch the app:
python app.py
Click on or copy the local URL printed to the command line and open it in your web browser to view the web app. It will look something like:
Running on local URL: http://127.0.0.1:7860
- From here use your mouse and the dropdown menus to choose which model and species you wish to visualize.
- Taxon IDs are aligned with those from iNaturalist, so if you wish to find a specific taxon you can search within the iNaturalist site and then copy the taxon ID into the web app. Note that not all taxa from iNaturalist are present in all models.
- For example, to view the predicted species range for the Northern Cardinal, navigate to the iNaturalist page for this taxon (https://www.inaturalist.org/taxa/9083-Cardinalis-cardinalis), set the taxon ID in the app to 9083, and click "Run Model".
- To generate a thresholded predicted range, select the "threshold" button and use the slider to choose the threshold value.
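Conceptually, the threshold slider turns the model's continuous per-location presence scores into a binary range map. The sketch below illustrates the idea with a dummy NumPy array; the variable names are illustrative and not taken from app.py:

```python
# Illustrative only: turn a continuous predicted range into a binary one.
# `probs` stands in for a model's per-location presence probabilities.
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random((180, 360))      # dummy lat/lon grid of probabilities
threshold = 0.5                     # value chosen with the app's slider

binary_range = (probs >= threshold).astype(np.uint8)
print("predicted-present cells:", int(binary_range.sum()))
```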
- Navigate to the root of the project directory.
- Create a new directory reproduce.
- Navigate inside ./reproduce and create a new directory repr.
Your file structure should look like:
data
├── README.md
└── ... (all other files)
reproduce
└── repr
Now run python reproduce.py to recreate results from the pretrained models. While executing, it should create .npy files inside ./reproduce/repr. After execution, you can run reproduce_tables.ipynb to generate tables from the data and compare them to the values presented in the paper.
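If you want to inspect the intermediate outputs before opening the notebook, the arrays can be loaded directly. The sketch below only assumes that reproduce.py writes standard .npy files into ./reproduce/repr:

```python
# Optional: inspect the .npy files produced by reproduce.py.
# Run from the repository root.
from pathlib import Path
import numpy as np

for npy_path in sorted(Path("reproduce/repr").glob("*.npy")):
    arr = np.load(npy_path, allow_pickle=True)
    print(npy_path.name, arr.shape, arr.dtype)
```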
If you run into any errors, contact Ozel.
To extract annotation data from iNatAtor, use data_extraction.ipynb or data_extraction.py (to submit as jobs). You will need to configure a .env file that holds the connection secrets. If your database is running on your local computer and you want to use Unity to extract data as a job, you will need to set up a tunnel. For development, we suggest running this script on the same computer as your database and then uploading the resulting .csv file to Unity at inaturalist-sinr/data/annotation.
You can use any .csv data as long as it contains the required columns described in data_extraction.ipynb. An example.csv is provided as sample data to test on.
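Before uploading a custom CSV, it can help to check its columns against the list in data_extraction.ipynb. In the sketch below, REQUIRED_COLUMNS is a placeholder that you should replace with the real list from the notebook:

```python
# Check that an annotation CSV has the required columns before uploading it.
# REQUIRED_COLUMNS is a placeholder; copy the real list from data_extraction.ipynb.
import pandas as pd

REQUIRED_COLUMNS = ["taxon_id", "latitude", "longitude"]  # hypothetical names

df = pd.read_csv("example.csv")
missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
if missing:
    raise ValueError(f"example.csv is missing required columns: {missing}")
print(f"OK: {len(df)} rows with columns {list(df.columns)}")
```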
Make sure you have followed the steps for downloading the training data and pretrained models described above. You have two options for running fine-tuning on Unity.
When your data is ready, you can use the .sh scripts in scripts/ to submit Slurm jobs.
You can run them with sbatch scripts/fine_tune.sh; the job outputs will be saved in scripts/out. You can use the results_from.sh script to fill in the model command details and submit the job to generate the prediction image, which will be saved in images/.
Open an interactive VSCode session on Unity to start fine-tuning (interactive VSCode is preferred because you can change hyperparameters and re-run the trainer quickly). If you don't have access to Unity or Slurm, you can still follow this option with your local VSCode or any other code editor.
Run python fine_tune_main.py to start fine-tuning. You can then run python viz_map.py --name fine_tuned --model_path /path/to/inaturalist-sinr/fine-tuned/demo/${YOUR_SAVED_MODEL_NAME}.pt --taxa_id ${TAXA} to see how your predictions have changed.
Please refer to the [fine-tuning report](https://docs.google.com/document/d/17t-MRulBXyp-WsPVg_WaYRnvEutDUfsqRNVWgjVVFww/edit?usp=sharing) to get a deeper understanding of the changes and updates.
Currently, the pretrained_models/model_an_full_input_enc_sin_cos_distilled_from_env.pt model cannot be fine-tuned. We noticed that the saved model and a training batch had different dimensions than what was returned from the environmental input encoders.
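If you want to dig into that mismatch, one option is to print the parameter shapes stored in the checkpoint and compare them with the shapes produced during fine-tuning. This sketch assumes a standard torch-saved file; the "state_dict" key is an assumption and may not match this repository exactly:

```python
# Debugging aid: print parameter shapes stored in the problematic checkpoint so
# they can be compared with the shapes seen during fine-tuning.
# Assumes a standard torch-saved file; the "state_dict" key is an assumption.
import torch

ckpt = torch.load(
    "pretrained_models/model_an_full_input_enc_sin_cos_distilled_from_env.pt",
    map_location="cpu",
)
state = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
if isinstance(state, dict):
    for name, tensor in state.items():
        shape = tuple(tensor.shape) if hasattr(tensor, "shape") else type(tensor)
        print(name, shape)
else:
    print("Checkpoint stores an object of type:", type(state))
```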
This project was enabled by data from the Cornell Lab of Ornithology, The International Union for the Conservation of Nature, iNaturalist, NASA, USGS, JAXA, CIESIN, and UC Merced. We are especially indebted to the iNaturalist and eBird communities for their data collection efforts. We also thank Matt Stimas-Mackey and Sam Heinrich for their help with data curation. This project was funded by the Climate Change AI Innovation Grants program, hosted by Climate Change AI with the support of the Quadrature Climate Foundation, Schmidt Futures, and the Canada Hub of Future Earth. This work was also supported by the Caltech Resnick Sustainability Institute and an NSF Graduate Research Fellowship (grant number DGE1745301).
If you find our work useful in your research please consider citing our paper.
@inproceedings{SINR_icml23,
title = {{Spatial Implicit Neural Representations for Global-Scale Species Mapping}},
author = {Cole, Elijah and Van Horn, Grant and Lange, Christian and Shepard, Alexander and Leary, Patrick and Perona, Pietro and Loarie, Scott and Mac Aodha, Oisin},
booktitle = {ICML},
year = {2023}
}
Extreme care should be taken before making any decisions based on the outputs of models presented here. Our goal in this work is to demonstrate the promise of large-scale representation learning for species range estimation, not to provide definitive range maps. Our models are trained on biased data and have not been calibrated or validated beyond the experiments illustrated in the paper.