Merged
21 changes: 15 additions & 6 deletions .github/workflows/python-package.yml
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ on:
jobs:
build:
runs-on: ubuntu-latest
container: ghcr.io/osgeo/gdal:ubuntu-small-3.10.3
container: ghcr.io/osgeo/gdal:ubuntu-small-3.11.4
strategy:
fail-fast: false
matrix:
@@ -22,22 +22,31 @@ jobs:
- name: Install system
run: |
apt-get update -qqy
apt-get install -y git python3-pip libpq5 libpq-dev r-base libtirpc-dev
apt-get install -y git python3-pip libpq5 libpq-dev r-base libtirpc-dev shellcheck
- uses: actions/checkout@v4
with:
submodules: 'true'

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
with:
python-version: ${{ matrix.python-version }}

- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install gdal[numpy]==3.10.3
python -m pip install gdal[numpy]==3.11.4
python -m pip install -r requirements.txt

- name: Lint with pylint
run: |
python3 -m pylint utils prepare_layers prepare_species threats
run: python3 -m pylint utils prepare_layers prepare_species threats

- name: Type checking with mypy
run: python3 -m mypy utils prepare_layers prepare_species threats

- name: Tests
run: python3 -m pytest ./tests

- name: Script checks
run: |
python3 -m pytest ./tests
shellcheck ./scripts/run.sh
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

4 changes: 4 additions & 0 deletions .mypy.ini
@@ -0,0 +1,4 @@
[mypy]
ignore_missing_imports = True
explicit_package_bases = False
no_namespace_packages = True
7 changes: 5 additions & 2 deletions Dockerfile
@@ -10,13 +10,14 @@ WORKDIR /go/littlejohn
RUN go mod tidy
RUN go build

FROM ghcr.io/osgeo/gdal:ubuntu-small-3.10.0
FROM ghcr.io/osgeo/gdal:ubuntu-small-3.11.4

RUN apt-get update -qqy && \
apt-get install -qy \
git \
cmake \
python3-pip \
shellcheck \
r-base \
libpq-dev \
libtirpc-dev \
@@ -27,7 +28,7 @@ COPY --from=reclaimerbuild /go/reclaimer/reclaimer /bin/reclaimer
COPY --from=littlejohnbuild /go/littlejohn/littlejohn /bin/littlejohn

RUN rm /usr/lib/python3.*/EXTERNALLY-MANAGED
RUN pip install gdal[numpy]==3.10.0
RUN pip install gdal[numpy]==3.11.4

COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt
@@ -53,3 +54,5 @@ ENV PYTHONPATH=/root/star

RUN python3 -m pytest ./tests
RUN python3 -m pylint prepare_layers prepare_species utils tests
RUN python3 -m mypy prepare_layers prepare_species utils tests
RUN shellcheck ./scripts/run.sh
91 changes: 76 additions & 15 deletions README.md
@@ -4,40 +4,41 @@ An implementation of the threat based [STAR biodiversity metric by Muir et al](h

See [method.md](method.md) for a description of the methodology, or `scripts/run.sh` for how to execute the pipeline.

# Running the pipeline

## Checking out the code

This repository uses submodules, so once you have cloned it, you need to fetch the submodules:
The code is available on GitHub and can be checked out from there:

```shell
$ git clone https://github.com/quantifyearth/star.git
$ cd star
$ git submodule update --init --recursive
$ git clone https://github.com/quantifyearth/STAR.git
...
$ cd STAR
```

## Running the pipeline
## Additional inputs

There are some additional inputs required to run the pipeline, which should be placed in the directory you use to store the pipeline results.

The easiest way to get started will be to run `scripts/run.sh` under a linux environment.
* SpeciesList_generalisedRangePolygons.csv - A list of species with generalised ranges on the IUCN Redlist.
* BL_Species_Elevations_2023.csv (optional) - corrections to the elevations of BirdLife species on the IUCN Redlist, taken from the BirdLife data.

### Running on Ubuntu
The script also assumes you have a Postgres database containing the IUCN Redlist data.

## Running the pipeline

The following extra utilities will need to be installed:
There are two ways to run the pipeline. The easiest is to use Docker, if you have it available, as it manages all the dependencies for you. Alternatively, you can check out and run the code locally, which requires a little more effort.

* [Reclaimer](https://github.com/quantifyearth/reclaimer/) - a utility for downloading data from various primary sources.
* [Littlejohn](https://github.com/quantifyearth/littlejohn/) - a utility to run jobs in parallel driven by a CSV file.
### Running with Docker

### Running in Docker

A Dockerfile is included, based on the GDAL container image, which installs everything needed to run the pipeline. You can build it using:

```
```shell
$ docker buildx build -t star .
```

You can then invoke the run script using this image. You should map an external folder into the container as a place to store the intermediary data and final results, and you should provide details of the Postgres instance with the IUCN Redlist:

```
```shell
$ docker run --rm -v /some/local/dir:/data \
-e DB_HOST=localhost \
-e DB_NAME=iucnredlist \
@@ -46,6 +47,66 @@ $ docker run --rm -v /some/local/dir:/data \
star ./scripts/run.sh
```

### Running without Docker

If you prefer not to use Docker, you will need:

* Python3 >= 3.10
* GDAL
* R (required for validation)
* [Reclaimer](https://github.com/quantifyearth/reclaimer/) - a Go tool for fetching data from Zenodo
* [Littlejohn](https://github.com/quantifyearth/littlejohn/) - a Go tool for running scripts in parallel

If you are using macOS, please note that the default Python install Apple ships is now several years out of date (Python 3.9, released October 2020), and you'll need to install a more recent version (for example, using [homebrew](https://brew.sh)).

With those installed, you should set up a Python virtual environment in which to install the required packages. The one trick is that the Python GDAL package version must match your installed GDAL version. For example, on my machine I did the following:

```shell
$ python3 -m venv ./venv
$ . ./venv/bin/activate
(venv) $ gdalinfo --version
GDAL 3.11.3 "Eganville", released 2025/07/12
(venv) $ pip install gdal[numpy]==3.11.3
...
(venv) $ pip install -r requirements.txt
```
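If you want to script the version-matching step, the pin can be derived from the `gdalinfo --version` banner shown above. This is a sketch; `gdal_pip_pin` is our own helper, not part of the pipeline:

```python
import re

def gdal_pip_pin(version_banner: str) -> str:
    """Turn a `gdalinfo --version` banner into a pip requirement string."""
    match = re.match(r"GDAL (\d+\.\d+\.\d+)", version_banner)
    if match is None:
        raise ValueError(f"unrecognised banner: {version_banner!r}")
    return f"gdal[numpy]=={match.group(1)}"

print(gdal_pip_pin('GDAL 3.11.3 "Eganville", released 2025/07/12'))
# → gdal[numpy]==3.11.3
```

Alternatively, if `gdal-config` is on your PATH, `pip install gdal[numpy]==$(gdal-config --version)` achieves the same in one line.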

You will also need to install the R stats packages required for the validation stage:

```shell
$ R -e "install.packages(c('lme4', 'lmerTest'), repos='https://cran.rstudio.com/')"
```

Before running the pipeline you will need to set several environment variables to tell the script where to store data and where to find the database with the IUCN Redlist. You can set these manually, or we recommend using a tool like [direnv](https://direnv.net).

```shell
export DATADIR=[PATH WHERE YOU WANT THE RESULTS]
export DB_HOST=localhost
export DB_NAME=iucnredlist
export DB_PASSWORD=supersecretpassword
export DB_USER=postgres
```

Once you have all that in place, you can run the pipeline:

```shell
(venv) $ ./scripts/run.sh
```

# Credits

The author of this package is greatly indebted to both [Francesca Ridley](https://www.ncl.ac.uk/nes/people/profile/francescaridley.html) from the University of Newcastle and [Simon Tarr](https://www.linkedin.com/in/simon-tarr-22069b209/) of the IUCN for their guidance and review.

## Data Attribution

The crosswalk table `data/crosswalk_bin_T.csv` was created by [Francesca Ridley](https://www.ncl.ac.uk/nes/people/profile/francescaridley.html) and is derived from:

```
Lumbierres, M., Dahal, P.R., Di Marco, M., Butchart, S.H.M., Donald, P.F.,
& Rondinini, C. (2022). Translating habitat class to land cover to map area
of habitat of terrestrial vertebrates. Conservation Biology, 36, e13851.
https://doi.org/10.1111/cobi.13851
```

The paper is licensed under CC BY-NC. It is used in this STAR implementation to crosswalk between the IUCN Habitat classes in the Redlist and the land classes in the Copernicus data layers.
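As an illustration of how the crosswalk is read, the sketch below queries which IUCN habitat classes (the `H_*` columns) map to a given Copernicus land-cover value. The two rows are copied from `data/crosswalk_bin_T.csv`; the helper function is hypothetical and not part of the pipeline:

```python
import io

import pandas as pd

# Two rows copied from data/crosswalk_bin_T.csv; "U" marks an unknown
# mapping and is deliberately not treated as a match.
CSV = """CGLS100_name,CGLS100_value,Label,H_1,H_2,H_3,H_4,H_5,H_6,H_7,H_8
CLS_20_shrubs,20,shrubs,0,1,1,0,0,0,U,1
CLS_30_Herbaceous_vegetation,30,Herbaceous_vegetation,0,0,0,1,0,0,U,0
"""

def habitats_for_landcover(table: pd.DataFrame, value: int) -> list[str]:
    """Return the IUCN habitat classes crosswalked to a CGLS land-cover value."""
    row = table[table["CGLS100_value"] == value].iloc[0]
    return [col for col in table.columns if col.startswith("H_") and row[col] == "1"]

table = pd.read_csv(io.StringIO(CSV), dtype=str)
table["CGLS100_value"] = table["CGLS100_value"].astype(int)
print(habitats_for_landcover(table, 20))  # → ['H_2', 'H_3', 'H_8']
```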

1 change: 0 additions & 1 deletion aoh-calculator
Submodule aoh-calculator deleted from c24def
18 changes: 18 additions & 0 deletions data/crosswalk_bin_T.csv
@@ -0,0 +1,18 @@
CGLS100_name,CGLS100_value,Label,H_1,H_2,H_3,H_4,H_5,H_6,H_7,H_8,H_14.1,H_14.2,H_14.3,H_14.6,H_14.4,H_14.5,H_15
CLS_20_shrubs,20,shrubs,0,1,1,0,0,0,U,1,0,0,0,0,0,0,0
CLS_30_Herbaceous_vegetation,30,Herbaceous_vegetation,0,0,0,1,0,0,U,0,0,0,0,0,0,0,0
CLS_40_CultivatedandManaged_VegetationAgriculture,40,CultivatedandManaged_VegetationAgriculture,0,0,0,1,1,0,U,0,1,1,0,0,0,0,0
CLS_50_Urban_builtup,50,Urban_builtup,0,0,0,0,0,0,U,0,0,0,0,0,1,1,0
CLS_60_bare_sparsevegetation,60,bare_sparsevegetation,0,0,1,0,0,1,U,1,0,0,0,0,0,0,0
CLS_80_permanent_water,80,permanent_water,0,0,0,0,1,0,U,0,0,0,0,0,0,0,0
CLS_90_Herbaceous_wetland,90,Herbaceous_wetland,0,0,0,0,1,0,U,0,0,0,0,0,0,0,1
CLS_111_Closedforest_evergreen_needle,111,Closedforest_evergreen_needle,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_112_Closedforest_evergreen_broad,112,Closedforest_evergreen_broad,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_114_Closedforest_deciduous_broad,114,Closedforest_deciduous_broad,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_115_Closedforest_mixed,115,Closedforest_mixed,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_116_Closedforest_unknown,116,Closedforest_unknown,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_121_Openforest_evergreen_needle,121,Openforest_evergreen_needle,1,0,0,0,0,1,U,0,0,0,0,0,0,0,0
CLS_122_Openforest_evergreen_broad,122,Openforest_evergreen_broad,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_124_Openforest_deciduous_broad,124,Openforest_deciduous_broad,0,1,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_125_Openforest_mixed,125,Openforest_mixed,1,0,0,0,0,0,U,0,0,0,0,0,0,0,0
CLS_126_Openforest_unknown,126,Openforest_unknown,0,0,0,0,0,0,U,0,0,0,0,0,0,0,0
24 changes: 21 additions & 3 deletions method.md
@@ -120,8 +120,12 @@ python3 ./prepare_layers/make_masks.py --habitat_layers /data/habitat_layers/cur
To assist with provenance, we download the data from the Zenodo ID.

```shark-run:reclaimer
curl -o FABDEM.zip https://data.bris.ac.uk/datasets/tar/s5hqmjcdj8yo2ibzi9b4ew3sn.zip
...
curl -o /data/FABDEM.zip https://data.bris.ac.uk/datasets/tar/s5hqmjcdj8yo2ibzi9b4ew3sn.zip
```

```shark-run:gdalonly
python3 tbd.py --input /data/FABDEM.zip \
--output /data/elevation.tif
```

Similarly to the habitat map, we need to resample to 1km; however, rather than picking the mean elevation, we select both the min and max elevation for each pixel, and then check whether the species' elevation range overlaps that interval when we calculate the AoH.
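The min/max resampling and range check can be sketched with numpy. This is a sketch using a hypothetical 4×4 grid and a 2×2 coarsening factor; the real pipeline does its resampling with GDAL-based tooling:

```python
import numpy as np

def coarsen_min_max(elevation: np.ndarray, factor: int):
    """Downsample an elevation grid, keeping the min and max of each block."""
    h, w = elevation.shape
    blocks = elevation.reshape(h // factor, factor, w // factor, factor)
    return blocks.min(axis=(1, 3)), blocks.max(axis=(1, 3))

def species_in_range(min_elev, max_elev, lower, upper):
    """A coarse pixel counts towards AoH if the species' elevation band
    overlaps the [min, max] interval observed within that pixel."""
    return (max_elev >= lower) & (min_elev <= upper)

# A hypothetical fine-resolution elevation grid, coarsened by a factor of 2.
fine = np.array([
    [100, 200, 900, 950],
    [150, 250, 910, 960],
    [300, 310, 500, 520],
    [305, 315, 510, 530],
])
mins, maxs = coarsen_min_max(fine, 2)
print(species_in_range(mins, maxs, lower=0, upper=400).tolist())
# → [[True, False], [True, False]]
```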
@@ -214,4 +218,18 @@ python3 ./aoh-calculator/validation/validate_map_prevelence.py --collated_aoh_da

```shark-publish
/data/validation/model_validation.csv
```
```

## Threats

```shark-run:aohbuilder
python3 ./threats/threat_processing.py --speciesdata /data/species-info/* \
--aoh /data/aohs/ \
--output /data/threat_rasters

python3 ./threats/threat_summation.py --threat_rasters /data/threat_rasters --output /data/threat_results
```

```shark-publish
/data/threat_results
```
9 changes: 5 additions & 4 deletions prepare_layers/convert_crosswalk.py
@@ -1,4 +1,5 @@
import argparse
from pathlib import Path

import pandas as pd

@@ -28,8 +29,8 @@
}

def convert_crosswalk(
original_path: str,
output_path: str,
original_path: Path,
output_path: Path,
) -> None:
original = pd.read_csv(original_path)

@@ -56,14 +57,14 @@ def main() -> None:
parser = argparse.ArgumentParser(description="Convert IUCN crosswalk to minimal common format.")
parser.add_argument(
'--original',
type=str,
type=Path,
help="Original format",
required=True,
dest="original_path",
)
parser.add_argument(
'--output',
type=str,
type=Path,
help='Destination minimal file',
required=True,
dest='output_path',
47 changes: 21 additions & 26 deletions prepare_layers/make_masks.py
@@ -1,62 +1,57 @@
import argparse
import os
import sys
from glob import glob
from pathlib import Path
from typing import Set

import numpy as np
from yirgacheffe.layers import RasterLayer
import yirgacheffe as yg
import yirgacheffe.operators as yo

OPEN_SEA_LCC = "lcc_200.tif"
NO_DATA_LCC = "lcc_0.tif"

def prepare_mask(
layers: Set[str],
output_path: str,
layers: Set[Path],
output_path: Path,
at_least: bool = True,
) -> None:
assert layers
rasters = [RasterLayer.layer_from_file(x) for x in layers]

intersection = RasterLayer.find_intersection(rasters)
for r in rasters:
r.set_window_for_intersection(intersection)
rasters = [yg.read_raster(x) for x in layers]

calc = rasters[0]
for r in rasters[1:]:
calc = calc + r
if at_least:
calc = calc.numpy_apply(lambda a: np.where(a >= 0.5, 1.0, 0.0))
calc = yo.where(calc >= 0.5, 1.0, 0.0)
else:
calc = calc.numpy_apply(lambda a: np.where(a > 0.5, 1.0, 0.0))
calc = yo.where(calc > 0.5, 1.0, 0.0)

with RasterLayer.empty_raster_layer_like(rasters[0], filename=output_path) as result:
calc.parallel_save(result)
calc.to_geotiff(output_path, parallelism=128)

def prepare_masks(
habitat_layers_path: str,
output_directory_path: str,
habitat_layers_path: Path,
output_directory_path: Path,
) -> None:
os.makedirs(output_directory_path, exist_ok=True)

layer_files = set(glob("lcc_*.tif", root_dir=habitat_layers_path))
layer_files = set(habitat_layers_path.glob("lcc_*.tif"))
if not layer_files:
sys.exit(f"Found no habitat layers in {habitat_layers_path}")

marine_layers = layer_files & set([OPEN_SEA_LCC])
terrerstrial_layers = layer_files - set([OPEN_SEA_LCC, NO_DATA_LCC])
marine_layers = {x for x in layer_files if x.name == OPEN_SEA_LCC}
terrerstrial_layers = {x for x in layer_files if x.name not in [OPEN_SEA_LCC, NO_DATA_LCC]}

assert len(marine_layers) == 1
assert len(terrerstrial_layers) == len(layer_files) - 2
assert len(terrerstrial_layers) < len(layer_files)

prepare_mask(
{os.path.join(habitat_layers_path, x) for x in marine_layers},
os.path.join(output_directory_path, "marine_mask.tif"),
marine_layers,
output_directory_path / "marine_mask.tif",
)

prepare_mask(
{os.path.join(habitat_layers_path, x) for x in terrerstrial_layers},
os.path.join(output_directory_path, "terrestrial_mask.tif"),
terrerstrial_layers,
output_directory_path / "terrestrial_mask.tif",
at_least=True,
)

@@ -66,14 +61,14 @@ def main() -> None:
parser = argparse.ArgumentParser(description="Generate terrestrial and marine masks.")
parser.add_argument(
'--habitat_layers',
type=str,
type=Path,
help="directory with split and scaled habitat layers",
required=True,
dest="habitat_layers"
)
parser.add_argument(
'--output_directory',
type=str,
type=Path,
help="Folder for output mask layers",
required=True,
dest="output_directory"