ahuarte47/geodataflow

GeodataFlow

Geospatial processing framework for geographical & Earth Observation (EO) data in Python.

GeodataFlow is a geoprocessing framework for fetching, translating and manipulating geospatial data (raster, vector, EO/STAC collections) by applying a pipeline, that is, a sequence of operations, to the input data. It is similar in spirit to the GDAL library, which handles raster and vector data.

The project is split up into several namespace packages or components:

  • geodataflow.core

    The main subpackage of GeodataFlow which implements basic building blocks (Pipeline engine & Modules) and commonly used functionalities.

  • geodataflow.api

    Web API component, built with FastAPI, which provides access to the GeodataFlow backend via REST API calls.

  • workbench/ui

    GeodataFlow Workbench is a static JavaScript application that lets users easily draw and run their own workflows in the web browser.

    NOTE: There is no installer for GeodataFlow Workbench yet, but you can test it by loading the docker-compose.yml. Please read the related section below.

Backends:

  • spatial

    Installs the geodataflow.spatial backend implementation for GeodataFlow using GDAL/OGR.

  • dataframes

    Installs the geodataflow.dataframes backend implementation for GeodataFlow using GeoPandas.

  • pySpark, Geospatial SQL, ... ?

Videos demonstrating GeodataFlow:

Workflow examples

Assuming you are using geodataflow.spatial (GDAL/OGR) as the active backend implementation, GeodataFlow can run workflows such as the following:

  • Converting a Shapefile to GeoPackage:

    # ==============================================================
    # Pipeline sample to convert a Shapefile to GeoPackage.
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
          "connectionString": "input.shp"
        },
        # Extract the Centroid of input geometries.
        {
          "type": "GeometryCentroid"
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
  • Fetching metadata of a S2L2A Product (STAC):

    # ==============================================================
    # Pipeline sample to fetch metadata of a S2L2A Product (STAC).
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
    
          # Define the input AOI in an embedded GeoJson.
          "connectionString": {
            "type": "FeatureCollection",
            "crs": {
              "type": "name",
              "properties": { "name": "EPSG:4326" }
            },
            "features": [
              {
                "type": "Feature",
                "properties": { "id": 0, "name": "My AOI for testing" },
                "geometry": {
                  "type": "Polygon",
                  "coordinates": [[
                      [-1.746826,42.773227],
                      [-1.746826,42.860866],
                      [-1.558685,42.860866],
                      [-1.558685,42.773227],
                      [-1.746826,42.773227]
                  ]]
                }
              }
            ]
          }
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Fetch metadata of EO Products that match a spatio-temporal criterion.
        {
          "type": "EOProductCatalog",
    
          "driver": "STAC",
          "provider": "https://earth-search.aws.element84.com/v0/search",
          "product": "sentinel-s2-l2a-cogs",
    
          "startDate": "2021-09-25",
          "endDate": "2021-10-05",
          "closestToDate": "2021-09-30",
          "filter": "",
    
          "preserveInputCrs": true
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
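
The two samples above mix JSON with `#` comment lines, which Python's standard `json` module rejects as-is. The following is a minimal sketch of how such a file could be inspected from Python, assuming that comments always occupy a whole line (as they do in the samples). `load_pipeline` is a hypothetical helper written for this illustration; it is not part of GeodataFlow, whose own parser may behave differently.

```python
import json


def load_pipeline(text: str) -> dict:
    """Parse a pipeline document, ignoring full-line '#' comments.

    Hypothetical helper: only lines whose first non-blank character
    is '#' are dropped; inline comments are not handled.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return json.loads("\n".join(kept))


# Shortened copy of the Shapefile-to-GeoPackage sample.
SAMPLE = """
# Pipeline sample to convert a Shapefile to GeoPackage.
{
  "pipeline": [
    { "type": "FeatureReader", "connectionString": "input.shp" },
    # Extract the Centroid of input geometries.
    { "type": "GeometryCentroid" },
    { "type": "GeometryTransform", "sourceCrs": 4326, "targetCrs": 32630 },
    { "type": "FeatureWriter", "connectionString": "output.gpkg" }
  ]
}
"""

pipeline = load_pipeline(SAMPLE)
# List the stage types in execution order.
stage_types = [stage["type"] for stage in pipeline["pipeline"]]
print(stage_types)
```

Listing the `type` of each stage is a quick way to check the order of operations before running a workflow.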

Installation

GeodataFlow is composed of several namespace packages, some of which are optional (e.g. backend implementations). Install the ones you want by adding them as extras to the install command.

To read and write Cloud Optimized GeoTIFFs (COG), GDAL version 3.1 or greater is required. If your system GDAL is older than version 3.1, consider using Docker or Conda to get a modern GDAL.
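
The version requirement above can be checked from Python before installing. This is a minimal sketch, assuming the GDAL Python bindings (`osgeo`) are available; `supports_cog` is a hypothetical helper introduced here, not a GeodataFlow or GDAL function.

```python
import re


def supports_cog(release_name: str) -> bool:
    """Return True if a GDAL release name such as '3.4.1' is >= 3.1."""
    match = re.match(r"(\d+)\.(\d+)", release_name)
    if match is None:
        raise ValueError(f"Unrecognized GDAL version: {release_name!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (3, 1)


try:
    from osgeo import gdal
except ImportError:
    print("GDAL Python bindings are not installed")
else:
    # VersionInfo("RELEASE_NAME") returns a string such as "3.4.1".
    release = gdal.VersionInfo("RELEASE_NAME")
    verdict = "can handle COG" if supports_cog(release) else "is too old for COG"
    print(f"GDAL {release} {verdict}")
```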

Using PyPI

To install the latest stable version from PyPI, run this in the command line:

> pip install geodataflow[api,dataframes,eodag,gee]

The geodataflow package installs geodataflow.core and geodataflow.spatial by default. You can also use the namespace package installers (e.g. api); they have the same effect as the generic one.
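
To see which components ended up in the current environment, a minimal sketch like the following may help. It assumes each component is importable under the dotted name used in this README; `is_installed` is a hypothetical helper, not part of GeodataFlow.

```python
from importlib.util import find_spec

# Component names as listed in this README (assumed to be importable as-is).
COMPONENTS = (
    "geodataflow.core",
    "geodataflow.spatial",
    "geodataflow.dataframes",
    "geodataflow.api",
)


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without importing it fully."""
    try:
        return find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is missing.
        return False


status = {name: is_installed(name) for name in COMPONENTS}
for name, present in status.items():
    print(f"{name}: {'installed' if present else 'missing'}")
```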

Optional extras for Backends:

  • eodag

    EODAG - Earth Observation Data Access Gateway is a Python package for searching and downloading remotely sensed images while offering a unified API for data access regardless of the data provider.

  • gee

    GEE - Google Earth Engine API is a geospatial processing service. With Earth Engine, you can perform geospatial processing at scale, powered by Google Cloud Platform. GEE requires authentication; please read the available documentation.

To view all available CLI tool commands and options:

> geodataflow --help

Listing all available modules:

> geodataflow --modules

Run a workflow in the command-line interface:

> geodataflow --pipeline_file "/geodataflow/spatial/tests/data/test_eo_stac_catalog.json"
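
A pipeline file can also be generated from a script and then handed to the same command. The following is a minimal sketch, assuming the CLI accepts a plain JSON file through `--pipeline_file` as shown above; the file names `input.shp` and `output.gpkg` are placeholders taken from the earlier sample.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path

# Pipeline definition built as a plain Python dictionary.
pipeline = {
    "pipeline": [
        {"type": "FeatureReader", "connectionString": "input.shp"},
        {"type": "GeometryTransform", "sourceCrs": 4326, "targetCrs": 32630},
        {"type": "FeatureWriter", "connectionString": "output.gpkg"},
    ]
}

# Write the pipeline to a temporary JSON file.
pipeline_path = Path(tempfile.mkdtemp()) / "pipeline.json"
pipeline_path.write_text(json.dumps(pipeline, indent=2), encoding="utf-8")

command = ["geodataflow", "--pipeline_file", str(pipeline_path)]

if shutil.which("geodataflow") is None:
    # The CLI is not on PATH; only show what would be executed.
    print("geodataflow CLI not found; would run:", " ".join(command))
else:
    result = subprocess.run(command)
    print("exit code:", result.returncode)
```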

Using docker-compose

docker-compose.yml builds images and starts GeodataFlow API and Workbench components to easily run Workflows with GeodataFlow.

PACKAGE_WITH_GEODATAFLOW_PIPELINE_CONTEXT in the yml file indicates the backend implementation to load. The default value is geodataflow.spatial. If you prefer another backend, change it before starting.
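
As an illustration of the default described above, the sketch below resolves the backend name from the environment. It assumes the setting reaches the process as an environment variable of the same name, which this README does not confirm; `resolve_backend` is a hypothetical helper and does not show how GeodataFlow itself loads the backend.

```python
import os

ENV_VAR = "PACKAGE_WITH_GEODATAFLOW_PIPELINE_CONTEXT"
DEFAULT_BACKEND = "geodataflow.spatial"  # default stated in this README


def resolve_backend(environ=os.environ) -> str:
    """Return the configured backend package name, or the default if unset/empty."""
    value = environ.get(ENV_VAR, "").strip()
    return value or DEFAULT_BACKEND


print(resolve_backend())
```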

Run this in the command line from the root folder of the project:

> docker-compose up

Then, type in your favorite Web Browser:

To remove all resources:

> docker-compose down --rmi all -v --remove-orphans

Testing

Each package provides a collection of tests. Run the tests in each package's tests folder to validate it.

Contribute

Have you spotted a typo in our documentation? Have you observed a bug while running GeodataFlow? Do you have a suggestion for a new feature?

Don't hesitate to open an issue or submit a pull request; contributions are most welcome!

License

GeodataFlow is licensed under Apache License v2.0. See LICENSE file for details.

Credits

GeodataFlow is built on top of amazingly useful open source projects. See NOTICE file for details about those projects and their licenses.

Thank you to all the authors of these projects!

Authors

GeodataFlow was created by Alvaro Huarte (https://www.linkedin.com/in/alvarohuarte).