diff --git a/.gitignore b/.gitignore index 4fc65075..fc4ef159 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ OSC/OHC_4DATLANTIC_200204_202212_V2-0.nc .DS_Store -example_4datlantic \ No newline at end of file +example_4datlantic +_build/ diff --git a/OSC/creating_an_item_catalog.ipynb b/OSC/creating_an_item_catalog.ipynb deleted file mode 100644 index d7e79cd7..00000000 --- a/OSC/creating_an_item_catalog.ipynb +++ /dev/null @@ -1,2296 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "a69d53fb-b6de-4e54-8b1f-ee3604b1cb49", - "metadata": {}, - "source": [ - "# Creating a STAC Item Catalog\n", - "## Context\n", - "### Purpose\n", - "The purpose of this tutorial is to learn how to share research outcomes with the wider research community in the Open Science Catalog. This can be done by creating STAC Catalogs that\n", - "describe a specific dataset you want to share.\n", - "\n", - "This tutorial provides the steps necessary to create a STAC Item Catalog in a semi-automated way, using the PySTAC library. By following these steps you will be able to create a __self-contained__ STAC Catalog with individual Items in JSON format. This catalog should be hosted in your own (or institutional) public GitHub repository to ensure it is accessible. (See more on the requirements for this Catalog in the [documentation](https://esa-earthcode.github.io/documentation/documentation/Technical%20Documentation/Data/Contributing%20to%20the%20EarthCODE%20Catalog#step-2-creating-and-uploading-a-stac-item-catalog)).\n", - "\n", - "In this example we will upload it to an open-access repository on GitHub. In the next tutorial we will create the actual Open Science Catalog entry, where we will create a full metadata description of our dataset with a link to this Item Catalog.\n", - "\n", - "#### STAC Items\n", - "A STAC Item is the lowest-level component of a STAC catalog. 
All STAC Items must have an associated data __Asset__. In addition to the Asset (which you can think of as a data file), the Item also contains metadata about the data itself, such as:\n", - "- Spatiotemporal extent, including start and end time and geographical extent (coordinates)\n", - "- Variables\n", - "- File type\n", - "- File size\n", - "\n", - "\n", - ":::{important}\n", - "Think about the persistence of your data!\n", - "If your data files are not currently stored in open-access, persistent storage, you can [contact the ESA team](mailto:earthcode@esa.int), who will help you upload your data to the ESA Project Results Repository (PRR). The same applies to the repository we will upload our STAC Item Catalog to!\n", - ":::\n", - "\n", - "### Prerequisites\n", - "In this example we assume that the data files are already uploaded to remote storage, and that we have access to the download URLs, but feel free to modify this example for your own data files!\n", - "\n", - "We will be using this supraglacial lakes dataset: https://zenodo.org/records/7568049#.ZDbG4nbP1aQ. 
We have also noted down the download links for 5 of the assets:\n", - "\n", - "```bash\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20171111T205337_20171111T205438_008239_00E91A_F8D1.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20190224T203744_20190224T203844_015093_01C356_B9C1.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20170620T205332_20170620T205433_006139_00AC89_6857.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20180923T202118_20180923T202218_012847_017B82_7DD5.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20181108T203747_20181108T203847_013518_01903B_D463.tif\n", - "```\n", - "\n", - "This notebook uses `Python 3.10.11`\n", - "\n", - ":::{tip}\n", - "You can reuse this example with your own data links, as long as they point to an open access, persistent and remote storage!\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "a588ae9a-b476-4fd4-ba03-d3d4b9ed6b17", - "metadata": {}, - "source": [ - "## Loading Libraries\n", - "Make sure you have installed the requirements, e.g. with `pip install pystac rasterio`" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "a2569c50-3381-4de0-8ca8-b1a58bf3b780", - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "import time\n", - "import requests\n", - "\n", - "import pystac\n", - "import rasterio\n", - "from pathlib import Path" - ] - }, - { - "cell_type": "markdown", - "id": "c76844f5-2d18-4daa-9410-30e2262f5693", - "metadata": {}, - "source": [ - "# Creating the Item Catalog\n", - "To correctly reference our Items, they need to be linked in the STAC Catalog. This catalog should also include the following minimum information:\n", - "- Title of the Catalog\n", - "- Description\n", - "- ID (e.g. 
catalog-id: can be short name of the dataset)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "e7b3d981-3665-4d41-9add-b2f1dceed59d", - "metadata": {}, - "outputs": [], - "source": [ - "title = \"Item Catalog Example\"\n", - "description = \"A collection of supraglacial lakes data in a very useful example notebook.\"\n", - "catalog_id = \"supraglacial-lakes-example-2025\"\n", - "\n", - "catalog = pystac.Catalog(\n", - " id=catalog_id,\n", - " title=title,\n", - " description=description,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "2abc2760-b6a0-43cc-9a84-f20e339641e2", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "catalog" - ] - }, - { - "cell_type": "markdown", - "id": "3dde85d3-3466-4faf-8e14-174b2c8d33f2", - "metadata": {}, - "source": [ - "That's all! Most of the metadata will be added to the Items which we will add to this catalog shortly." - ] - }, - { - "cell_type": "markdown", - "id": "3506f64b-223a-4d53-87d6-33e1374ce638", - "metadata": {}, - "source": [ - "## Creating a single STAC Item\n", - "Manually creating STAC Items can be cumbersome and is prone to errors (but possible!). Luckily there are many tools that can make the process a lot easier. \n", - "\n", - "Here we will use `rio_stac` [(documentation here)](https://developmentseed.org/rio-stac/api/rio_stac/stac/) which is a library that we can use to open and extract metadata from raster datasets." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "0d632f56-c7cd-49d3-bd75-32475e0641ef", - "metadata": {}, - "outputs": [], - "source": [ - "filenames = [\n", - " \"https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20171111T205337_20171111T205438_008239_00E91A_F8D1.tif\",\n", - " \"https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20190224T203744_20190224T203844_015093_01C356_B9C1.tif\",\n", - " \"https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20170620T205332_20170620T205433_006139_00AC89_6857.tif\",\n", - " \"https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20180923T202118_20180923T202218_012847_017B82_7DD5.tif\",\n", - " \"https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20181108T203747_20181108T203847_013518_01903B_D463.tif\",\n", - "]" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "63ba669d-10e8-4c84-b483-425034f17c1b", - "metadata": {}, - "outputs": [], - "source": [ - "from rio_stac import create_stac_item" - ] - }, - { - "cell_type": "code", - 
"execution_count": 17, - "id": "2275bcc2-2800-49b4-8ad1-c57194aa254e", - "metadata": {}, - "outputs": [], - "source": [ - "item = create_stac_item(\n", - " source=filenames[0],\n", - " id=\"item_1\",\n", - " asset_name=\"data\", # EarthCODE standard asset name\n", - " # all the metadata!\n", - " with_eo=True,\n", - " with_proj=True,\n", - " with_raster=True,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "a02d6174-f97f-47c9-9f2b-975b2fab70be", - "metadata": {}, - "source": [ - "Inspecting the result we can see that this function has extracted rich information about our raster file. This information is attached to the `Item`. This Item also has an `\"assets\"` attribute which references the actual data.\n", - "\n", - "
\n", - " Important: Verify that your references always point to the remote open storage!\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "475b8490-967a-4ba2-9321-78680a0a937c", - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "
[Item HTML output, condensed]
type: Feature
stac_version: 1.1.0
stac_extensions: https://stac-extensions.github.io/projection/v1.1.0/schema.json, https://stac-extensions.github.io/raster/v1.1.0/schema.json, https://stac-extensions.github.io/eo/v1.1.0/schema.json
id: item_1
geometry: Polygon, coordinates [[[-50.17968937855544, 66.77834561360399], [-47.9894188361956, 66.83503196763441], [-48.056356656894216, 67.37093506267574], [-50.295235368856346, 67.31275872920898], [-50.17968937855544, 66.77834561360399]]]
bbox: [-50.295235368856346, 66.77834561360399, -47.9894188361956, 67.37093506267574]
properties:
  proj:epsg: 32623
  proj:geometry: Polygon, coordinates [[[272312.5, 7416137.5], [368812.5, 7416137.5], [368812.5, 7475962.5], [272312.5, 7475962.5], [272312.5, 7416137.5]]]
  proj:bbox: [272312.5, 7416137.5, 368812.5, 7475962.5]
  proj:shape: [2393, 3860]
  proj:transform: [25.0, 0.0, 272312.5, 0.0, -25.0, 7475962.5, 0.0, 0.0, 1.0]
  datetime: 2025-04-07T10:01:20.994526Z
links: [] (0 items)
assets:
  data:
    href: https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20171111T205337_20171111T205438_008239_00E91A_F8D1.tif
    type: image/tiff; application=geotiff
    raster:bands: [{data_type: int16, scale: 1.0, offset: 0.0, sampling: area, nodata: 0.0, statistics: {mean: 1.0, minimum: 1, maximum: 1, stddev: 0.0, valid_percent: 0.24083415354330706}, histogram: {count: 11, min: 0.5, max: 1.5, buckets: [0, 0, 0, 0, 0, 1566, 0, 0, 0, 0]}}]
    eo:bands: [{name: b1, description: gray}]
    roles: [] (0 items)
" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "item" - ] - }, - { - "cell_type": "markdown", - "id": "657f2524-f050-4c6e-b67b-d0fa4720f1a5", - "metadata": {}, - "source": [ - "## Creating the rest of our Items\n", - "Now that we have shown how to generate a single Item using `rio_stac`, we can repeat the process for the rest of our data files. The goal is to create a list of STAC Items, that we can add to our Catalog with the buit-in `Catalog.add_items()` method.\n", - "\n", - "We could in principle just iterate over the method above, but in order to respect the rate limits for our data provider (Zenodo), we define a function which reads the response headers and responds appropriately.\n", - "\n", - "This function also saves the file to a local temporary destination instead of reading the data from Zenodo directly." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "bcfa894a-b367-460f-be43-12d72f081004", - "metadata": {}, - "outputs": [], - "source": [ - "import time\n", - "import requests\n", - "\n", - "def download_zenodo_file(url: str, local_path: str, max_retries: int = 5) -> None:\n", - " \"\"\"\n", - " Download a file from Zenodo into local_path, respecting rate limits if we hit 429 responses.\n", - " \n", - " :param url: The direct download URL from Zenodo.\n", - " :param local_path: Where to save the file locally.\n", - " :param max_retries: Number of times to retry the download if repeatedly rate-limited.\n", - " \"\"\"\n", - " attempt = 0\n", - " \n", - " while attempt < max_retries:\n", - " response = requests.get(url, stream=True)\n", - " \n", - " if response.status_code == 200:\n", - " with open(local_path, 'wb') as f:\n", - " for chunk in response.iter_content(chunk_size=8192):\n", - " f.write(chunk)\n", - " return \n", - " \n", - " # If rate-limited (HTTP 429), then check rate-limit headers and wait.\n", - " elif response.status_code == 
429:\n", - " attempt += 1\n", - " # Parenthesize the walrus so the header value (not the boolean) is bound.\n", - " if (reset_timestamp := response.headers.get(\"X-RateLimit-Reset\")) is not None:\n", - " now = time.time()\n", - " wait_seconds = int(reset_timestamp) - int(now)\n", - " wait_seconds = max(wait_seconds, 1) # Wait at least 1 second.\n", - " print(f\"Got 429 Too Many Requests. Waiting ~{wait_seconds} seconds.\")\n", - " time.sleep(wait_seconds)\n", - " else:\n", - " response.raise_for_status()\n", - " \n", - " raise RuntimeError(f\"Failed to download {url} after {max_retries} retries.\")" - ] - }, - { - "cell_type": "markdown", - "id": "95eb0444-b091-4b54-9386-279a106cf8b6", - "metadata": {}, - "source": [ - "Now we can iterate over the rest of our data files and create the STAC Items." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "a784725e-0aa5-4f2f-b773-f7abf16335b2", - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "import rasterio\n", - "\n", - "items = []\n", - "local_tmp_file = \"tmp.tif\"\n", - "\n", - "for idx, remote_url in enumerate(filenames[0:]):\n", - " # Save our dataset to the temporary file\n", - " download_zenodo_file(remote_url, local_tmp_file)\n", - "\n", - " # Inspect the local file and create a STAC Item\n", - " item = create_stac_item(\n", - " source=local_tmp_file,\n", - " id=f\"item_{idx+1}\",\n", - " asset_name=\"data\",\n", - " asset_href=remote_url, # Explicitly set the asset reference to the remote one!\n", - " with_eo=True,\n", - " with_proj=True,\n", - " with_raster=True,\n", - " )\n", - "\n", - " items.append(item)" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "87bf869f-7a23-4391-a121-59152458d869", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20171111T205337_20171111T205438_008239_00E91A_F8D1.tif\n", - 
"https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20190224T203744_20190224T203844_015093_01C356_B9C1.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20170620T205332_20170620T205433_006139_00AC89_6857.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20180923T202118_20180923T202218_012847_017B82_7DD5.tif\n", - "https://zenodo.org/records/7568049/files/extent_S1B_EW_GRDH_1SDH_20181108T203747_20181108T203847_013518_01903B_D463.tif\n" - ] - } - ], - "source": [ - "# Verify that our items all point to the correct reference\n", - "for item in items:\n", - " print(item.assets['data'].href)" - ] - }, - { - "cell_type": "markdown", - "id": "205ac2a5-404b-477f-9c69-9ddfb8e48259", - "metadata": {}, - "source": [ - "Looks good!" - ] - }, - { - "cell_type": "markdown", - "id": "a86efb05-9b40-4d91-a655-827ba483fe76", - "metadata": {}, - "source": [ - "## Adding Items to our Item Catalog\n", - "Now that we have defined our items, we can add them to our catalog." - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "0b730d79-374a-40d4-9587-aaacf18ada3c", - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "[>,\n", - " >,\n", - " >,\n", - " >,\n", - " >]" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "catalog.add_items(items)" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "13759136-395a-411f-a016-4cb4d011e26a", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "
[Catalog HTML output, condensed]
type: Catalog
id: supraglacial-lakes-example-2025
stac_version: 1.1.0
description: A collection of supraglacial lakes data in a very useful example notebook.
links: 5 items, each {rel: item, href: None, type: application/geo+json}
title: Item Catalog Example
" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "catalog" - ] - }, - { - "cell_type": "markdown", - "id": "f07db6c1-814c-400e-a9c3-c6de45a5dfa3", - "metadata": {}, - "source": [ - "## Saving the Catalog\n", - "If we inspect the Catalog, we can see that it now contains Links to the Items, but the links themselves don't contain any references.\n", - "\n", - "With PySTAC, the real magic happens when you __normalize and save__ the Catalog. This is when all the links are resolved, and a folder structure will be laid out following best practices, automatically!\n", - "\n", - "What we will do is to specify a target location, which will become the root folder of our Catalog. When we __normalize__ the Catalog to this folder, all the internal references will be resolved with relative paths. When we __save__ the Catalog, PySTAC will generate the JSON files in the folder we just normalized to.\n", - "\n", - "We normalize and save the Catalog as \"self contained\". Here is the description of a self-contained catalog from the [PySTAC API documentation](https://pystac.readthedocs.io/en/stable/api/catalog.html#pystac.catalog.CatalogType):\n", - "\n", - ">\"A ‘self-contained catalog’ is one that is designed for portability. Users may want to download an online catalog from and be able to use it on their local computer, so all links need to be relative.\"\n", - "\n", - "In other words, exactly what we want to make our data accessible!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "b97752b5-0e90-484f-b391-3cd544a0fd6e", - "metadata": {}, - "outputs": [], - "source": [ - "output_folder = \"supraglacial-lakes-item-catalog\"" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "d12452c9-01aa-4ce4-8aea-94b1601e3160", - "metadata": {}, - "outputs": [], - "source": [ - "catalog.normalize_and_save(root_href=output_folder, catalog_type=pystac.CatalogType.SELF_CONTAINED)" - ] - }, - { - "cell_type": "markdown", - "id": "a1d5a2af-14aa-4cea-a704-fd137da6c4f8", - "metadata": {}, - "source": [ - "If you inspect the Catalog you can see that PySTAC has added a few more links to our Catalog, namely to the root catalog and to itself, which in this instance are the same.\n", - "\n", - "Also notice that the `Link.href` attributes show absolute paths in the notebook; the JSON files saved in the `output_folder` use relative paths instead.\n", - "\n", - "The folder will have the following structure:\n", - "```\n", - "supraglacial-lakes-item-catalog\n", - "├── catalog.json\n", - "├── item_1\n", - "│   └── item_1.json\n", - "├── item_2\n", - "│   └── item_2.json\n", - "├── item_3\n", - "│   └── item_3.json\n", - "├── item_4\n", - "│   └── item_4.json\n", - "└── item_5\n", - " └── item_5.json\n", - "```\n", - "\n", - "Looking at the `catalog.json`:\n", - "```json\n", - "{\n", - " \"type\": \"Catalog\",\n", - " \"id\": \"supraglacial-lakes-example-2025\",\n", - " \"stac_version\": \"1.1.0\",\n", - " \"description\": \"A collection of supraglacial lakes data in a very useful example notebook.\",\n", - " \"links\": [\n", - " {\n", - " \"rel\": \"root\",\n", - " \"href\": \"./catalog.json\",\n", - " \"type\": \"application/json\",\n", - " \"title\": \"Item Catalog Example\"\n", - " },\n", - " {\n", - " \"rel\": \"item\",\n", - " \"href\": \"./item_1/item_1.json\",\n", - " \"type\": \"application/geo+json\"\n", - " },\n", - " {\n", - " \"rel\": \"item\",\n", - " \"href\": 
\"./item_2/item_2.json\",\n", - " \"type\": \"application/geo+json\"\n", - " },\n", - " {\n", - " \"rel\": \"item\",\n", - " \"href\": \"./item_3/item_3.json\",\n", - " \"type\": \"application/geo+json\"\n", - " },\n", - " {\n", - " \"rel\": \"item\",\n", - " \"href\": \"./item_4/item_4.json\",\n", - " \"type\": \"application/geo+json\"\n", - " },\n", - " {\n", - " \"rel\": \"item\",\n", - " \"href\": \"./item_5/item_5.json\",\n", - " \"type\": \"application/geo+json\"\n", - " }\n", - " ],\n", - " \"title\": \"Item Catalog Example\"\n", - "}\n", - "```\n", - "we can verify that everything looks correct.\n", - "\n", - "The item JSON files should have the following links:\n", - "```json\n", - " \"links\": [\n", - " {\n", - " \"rel\": \"root\",\n", - " \"href\": \"../catalog.json\",\n", - " \"type\": \"application/json\",\n", - " \"title\": \"Item Catalog Example\"\n", - " },\n", - " {\n", - " \"rel\": \"parent\",\n", - " \"href\": \"../catalog.json\",\n", - " \"type\": \"application/json\",\n", - " \"title\": \"Item Catalog Example\"\n", - " }\n", - " ],\n", - "```\n", - "... among all the other metadata we have added.\n", - "\n", - "
\n", - " That's it! We have now created a self-contained STAC Item Catalog that contains all the metadata of our data, in compliance with the EarthCODE specifications for FAIR and open science. Now we just need to upload it to somewhere people can access it.\n", - "
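The manual check above can also be scripted. The following is a minimal sketch using only the Python standard library, not a PySTAC feature: the helper name `check_self_contained` and the temporary demo folder are hypothetical, and the check only covers the hierarchy links (`root`, `parent`, `child`, `item`) shown in the JSON above. It reports any such link whose `href` is missing, absolute, or does not resolve to a file next to the catalog.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Link relations that a self-contained catalog is expected to keep relative.
HIERARCHY_RELS = {"root", "parent", "child", "item"}


def check_self_contained(json_path):
    """Return a list of problems found in the hierarchy links of one STAC JSON file."""
    json_path = Path(json_path)
    document = json.loads(json_path.read_text())
    problems = []
    for link in document.get("links", []):
        rel = link.get("rel")
        if rel not in HIERARCHY_RELS:
            continue  # external links (e.g. "via") may legitimately be absolute
        href = link.get("href")
        if href is None:
            problems.append(f"{rel}: href is missing")
        elif urlparse(href).scheme or href.startswith("/"):
            problems.append(f"{rel}: absolute href {href}")
        elif not (json_path.parent / href).resolve().exists():
            problems.append(f"{rel}: target not found {href}")
    return problems


# Demo on a throwaway folder (hypothetical layout): item_2 is deliberately missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "item_1").mkdir()
    (root / "item_1" / "item_1.json").write_text(json.dumps({"links": []}))
    (root / "catalog.json").write_text(json.dumps({
        "links": [
            {"rel": "root", "href": "./catalog.json"},
            {"rel": "item", "href": "./item_1/item_1.json"},
            {"rel": "item", "href": "./item_2/item_2.json"},
        ]
    }))
    demo_problems = check_self_contained(root / "catalog.json")
    print(demo_problems)
```

To use it on the catalog from this tutorial, you would call `check_self_contained("supraglacial-lakes-item-catalog/catalog.json")`; an empty list means every hierarchy link is relative and points at an existing file.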
" - ] - }, - { - "cell_type": "markdown", - "id": "bf6040df-be53-4b9e-9fa9-4fe614b8ac17", - "metadata": {}, - "source": [ - "## Upload the Item Catalog\n", - "In this part we will upload the Catalog in order to make it available. __Feel free to do this in any way you like as long as you are sure the files will remain accessible!__\n", - "\n", - "A good option is to upload the files we just created to GitHub. In the next part of the tutorial, when we create an entry in the Open Science Catalog, we will only need the URL of the `catalog.json` in our root folder. The STAC browser will read the files directly from this repository and extract all the information from our Items automatically.\n", - "\n", - "We will now show how this can be done with GitHub and the git CLI." - ] - }, - { - "cell_type": "markdown", - "id": "5ba27d20-8ced-4df1-9cb7-45c655bab909", - "metadata": {}, - "source": [ - "### Create a public GitHub repository\n", - "Go to [github.com/new](https://github.com/new) and create a remote repository. 
Here we give it the same name as our local folder, set it to __public__, and leave everything else at its defaults.\n", - "\n", - ":::{figure} ./images/gh-new-repo.png\n", - ":alt: GitHub create a new repository\n", - ":align: center\n", - "GitHub create a new repository\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "8da4c0cc-048d-40eb-bf16-523a7ad42a41", - "metadata": {}, - "source": [ - "After creating the repository, you can simply click __upload existing files__ to upload your files manually, or if you are comfortable with git, do it through the command line interface:\n", - "\n", - "```bash\n", - "# Navigate to the Item Catalog we want to upload\n", - "cd supraglacial-lakes-item-catalog\n", - "\n", - "# Initialise it as a git repository\n", - "git init\n", - "\n", - "# Add the URL to your newly created GitHub repository as the remote version of your local files\n", - "git remote add origin https://github.com//supraglacial-lakes-item-catalog.git\n", - "\n", - "# Add and commit your files\n", - "git add --all\n", - "git commit -m \"Useful commit message\"\n", - "\n", - "# Set the remote main as the upstream version of your local main, and push your changes\n", - "git push --set-upstream origin main\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "f0c8e14e-84a8-4be6-9e02-952a72e75c2f", - "metadata": {}, - "source": [ - "When you refresh the GitHub page, you should see your STAC catalog.\n", - "\n", - ":::{figure} ./images/gh-repo-complete.png\n", - ":alt: New GitHub repository\n", - ":align: center\n", - "New GitHub repository\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "5cc8eff5-62d4-401b-9bb0-bc64021c7565", - "metadata": {}, - "source": [ - "__That's it!__\n", - "\n", - "In the next stages we will explain how to create and add your product to the Open Science Catalog, linking to the Items we just created. 
" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.11" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/OSC/index.md b/OSC/index.md index 4a67ab0d..6776f80d 100644 --- a/OSC/index.md +++ b/OSC/index.md @@ -1,16 +1,18 @@ # Open Science Catalog -# Uploading data to the Open Science Catalog -Here you can find guides and example notebooks on how to use the Open Science Catalog. +The [Open Science Catalog (OSC)](https://opensciencedata.esa.int/) is a key component of the ESA EO Open Science framework. It is a publicly available web-based application designed to provide easy access to scientific resources including geoscience products, workflows, experiments and documentation from activities and projects funded by ESA under the EO Programme. -## Uploading Data -If you are looking to upload your data please refer to the following guides. Choose the method that best suits your use case! -### 1: Creating and Uploading an Item Catalog -- [Creating an Item Catalog](./creating_an_item_catalog.ipynb) - A notebook explaining how Item Catalogs should be created, uses raster data. +There are three ways to add information to the OSC: + +### 1: Using a Visual GUI -### 2: Adding a Product entry to the Open Science Catalog -- [PySTAC notebook](./manual_example.md) - A guide for manually creating Product entries using PySTAC. Requires some knowledge of git. - [Git Clerk](./git_clerk_example.md) - A guide for using the Git Clerk tool which is a user interface for automatically creating product entries and creating a Pull Request in the OSC GitHub Repo. 
-- [DeepCode](./deepcode_example.md) - An example using DeepCode: a library for automatically generating product entries for DeepESDL datasets. +### 2: Manually opening a PR +- [Directly editing the json files](./osc_pr_manual.ipynb) - A guide for manually creating Product entries. Requires knowledge of git. + +- [Generating OSC files using pystac](./osc_pr_pystac.ipynb) - A guide for creating Product entries using pystac. Requires knowledge of git and Python. + +### 3: Using one of the platform tools +- [DeepCode](./deepcode_example.md) - An example using DeepCode: a library for automatically generating product entries for DeepESDL datasets. diff --git a/OSC/manual_example.md b/OSC/manual_example.md deleted file mode 100644 index 866f5cab..00000000 --- a/OSC/manual_example.md +++ /dev/null @@ -1,2 +0,0 @@ -# Manual Example (PySTAC) -A tutorial on how to manually (and painstakingly) add a product to the OSC. \ No newline at end of file diff --git a/OSC/osc_pr_manual.ipynb b/OSC/osc_pr_manual.ipynb new file mode 100644 index 00000000..dc0261b7 --- /dev/null +++ b/OSC/osc_pr_manual.ipynb @@ -0,0 +1,455 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67eeae76", + "metadata": {}, + "source": [ + "# Adding new content to Open Science Catalogue with Pull Request (PR)\n", + "\n", + "The [Open Science Catalog (OSC)](https://opensciencedata.esa.int/) is a key component of the ESA EO Open Science framework. It is a publicly available web-based application designed to provide easy access to scientific resources including geoscience products, workflows, experiments and documentation from activities and projects funded by ESA under the EO Programme. \n", + "\n", + "The Open Science Catalog is built on the Spatio Temporal Asset Catalog (STAC), which is a standardised format for describing geospatial data. 
Through the open source STAC browser, the catalog allows users to browse and explore interlinked elements such as themes, variables, EO missions, projects, products, workflows, and experiments, all described using STAC-compliant JSON files. This schema ensures that these can be easily reused by other scientists and automated workflows and correctly displayed in the web browser. Data, workflows, and experiments are documented in the catalogue primarily through enriched metadata and direct links to the corresponding research outcomes. The physical location of these resources is typically indicated via the Project Results Repository or other secure external repositories. Further details on the OSC format can be found [here](https://github.com/ESA-EarthCODE/open-science-catalog-metadata/wiki/System-Design-Document-%E2%80%90-v1.0.0). \n", + "\n", + "## Adding information to the OSC\n", + "\n", + "There are three ways to add information to the OSC.\n", + "- **Manually opening a pull request (this tutorial)**\n", + "- Using the GUI editor.\n", + "- Using one of the platform specific tools\n", + "\n", + "**This notebook describes how you can add information to the OSC by manually creating and editing JSON files that describe STAC Collections and Catalogs.** The steps to add information in this way are:\n", + "1. Fork the repository\n", + "2. Add the information about project/product/workflow/variables in STAC JSON format.\n", + "3. Open a PR to merge the new information into the OSC.\n", + "\n", + "In general, most of the information you need is already in your data or project documentation, so you will NOT need to generate anything new. **All information that you provide will be automatically validated and manually verified by an EarthCODE team member.** Therefore, you can use the automatic validation from the CI to make the appropriate changes to the format or information you provide. \n", + "\n", + "### 1. 
Forking the repository\n", + "\n", + "Since the OSC metadata is fully hosted on GitHub, use your personal GitHub account to contribute to the catalog. If you do not have an account, you need to set up a new GitHub account: https://docs.github.com/en/get-started/start-your-journey/creating-an-account-on-github. \n", + "\n", + "To contribute your research outputs, you need to create valid STAC objects and commit them to the [open-science-catalog-metadata repository on GitHub](https://github.com/ESA-EarthCODE/open-science-catalog-metadata). \n", + "The first step in this process is to fork the `open-science-catalog-metadata` repository, which will create your own copy of the Open Science Catalog. (More information about how to do this in GitHub is available here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo )\n", + "\n", + "\n", + "Once you have an OSC copy, you should have a look at the folder structure and information for an existing Item of the same type as the one you want to add - product, project, workflow, variable etc. This will give you an idea of the required information for a valid STAC Object. These STAC objects, stored as JSON files, are automatically processed and rendered in the catalog viewer. " + ] + }, + { + "cell_type": "markdown", + "id": "d8b2d1da", + "metadata": {}, + "source": [ + "\n", + "### 2. Add the information about project/product/workflow/experiments/variables.\n", + "\n", + "After you have forked the repository, you can start adding the required information. [This document](https://github.com/stac-extensions/osc?tab=readme-ov-file) explains the Open Science Catalog Extension to the SpatioTemporal Asset Catalog (STAC) specification. There are different requirements depending on the catalog entry you are trying to add.\n", + "\n", + "```TIP. 
\n", + "Sometimes it's easier to copy the folder of an existing project/product/workflow, rename it and start changing its information.\n", + "For example, copy the contents of the folder products/sentinel3-ampli-ice-sheet-elevation/, rename it to products/New_Project_Name/ and edit its values.\n", + "\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "1596d488", + "metadata": {}, + "source": [ + "\n", + "#### 2.1 Add new Project\n", + "\n", + "Projects are the containers that hold the top level information about your work. They are the first type of information you should provide. Typically an OSC project corresponds to a project financed by the European Space Agency - Earth Observation programme. Before creating a new project, check whether your project is already on the [list of onboarded projects](https://opensciencedata.esa.int/projects/catalog). If it is, use the existing project entry and update it only where needed.\n", + "\n", + "\n", + "| **Field** | **Description** | **STAC representation** |\n", + "|--------------------|--------------------------|------------------------------------|\n", + "| Project_ID | Numeric identifier | |\n", + "| Status | “ongoing” or “completed” | osc:status property |\n", + "| Project_Name | Name | title property |\n", + "| Short_Description | | description property |\n", + "| Website | | link |\n", + "| Eo4Society_link | | link |\n", + "| Consortium | | contacts[].name property |\n", + "| Start_Date_Project | | extent.temporal[] property |\n", + "| End_Date_Project | | extent.temporal[] property |\n", + "| TO | | contacts[].name property |\n", + "| TO_E-mail | | contacts[].emails[].value property |\n", + "| Theme1 - Theme6 | Theme identifiers | osc:themes property |\n", + "\n", + "\n", + "The metadata of each project is stored in a folder named after its unique id `(collectionid)`. Each folder has one file, collection.json, that holds all the project information (metadata). 
Have a look at the structure of the Project entry below (with example values filled in):\n", + "\n", + "```json\n", + "{\n", + " 'type': 'Collection', // This is the STAC type specification. You don't need to change this\n", + " 'stac_version': '1.1.0', // This is the STAC version specification. You don't need to change this\n", + " 'id': 'worldcereal2', // This is your project id. Please make sure to use a unique id for your project! The parent folder of the collection.json should have the same name as this id (not displayed in the browser).\n", + " 'title': 'WorldCereal2', // Title of your project. The official acronym of the project may be used as well (this will be displayed to the public)\n", + " 'description': 'WorldCereal is an ESA initiative that provides global '\n", + " 'cropland and crop type maps at 10-meter resolution, offering '\n", + " 'seasonally updated data on temporary crops, croptypes (maize, '\n", + " 'winter cereals and spring cereals), and irrigation.', // A short, but meaningful description of your project.\n", + " 'links': [ // links to related elements of the catalog. The first two links should always be present and are always the same.\n", + " {'href': '../../catalog.json',\n", + " 'rel': 'root',\n", + " 'title': 'Open Science Catalog',\n", + " 'type': 'application/json'},\n", + " {'href': '../catalog.json',\n", + " 'rel': 'parent',\n", + " 'title': 'Projects',\n", + " 'type': 'application/json'},\n", + " // The next two links are external links to project websites. 
These are mandatory and you have to adapt them to your project.\n", + " {'href': 'https://esa-worldcereal.org/en', // your dedicated project page\n", + " 'rel': 'via',\n", + " 'title': 'Website'},\n", + " {'href': 'https://eo4society.esa.int/projects/worldcereal-global-crop-monitoring-at-field-scale/', // link to the project page on the EO4Society website\n", + " 'rel': 'via',\n", + " 'title': 'EO4Society Link'},\n", + " // The next link is a link to the themes specified in the themes field below. It is mandatory to have a link to all themes specified in the themes array\n", + " {'href': '../../themes/land/catalog.json', // related theme of the project\n", + " 'rel': 'related',\n", + " 'title': 'Theme: Land',\n", + " 'type': 'application/json'}\n", + " ],\n", + "\n", + " 'themes': [ // this is an array of the ESA themes the project relates to. The fields are restricted to the themes available in the OSC. The format of the array is id:theme and having at least one theme is mandatory. Check available themes here: https://opensciencedata.esa.int/themes/catalog\n", + " {'concepts': [{'id': 'land'}], \n", + " 'scheme': 'https://github.com/stac-extensions/osc#theme'}\n", + " ],\n", + "\n", + " 'stac_extensions': [ // the schemas the project information is validated against. 
Typically you would not change these.\n", + " 'https://stac-extensions.github.io/osc/v1.0.0/schema.json', \n", + " 'https://stac-extensions.github.io/themes/v1.0.0/schema.json',\n", + " 'https://stac-extensions.github.io/contacts/v0.1.1/schema.json'\n", + " ],\n", + " 'osc:status': 'completed', // status of the project - Select from: completed, ongoing, scheduled\n", + " 'osc:type': 'project', // Type of OSC STAC collection, for projects should always be project\n", + " 'updated': '2025-07-14T17:03:29Z', // when the last update was made\n", + " 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]}, // The study area of the project and its planned duration.\n", + " 'temporal': {'interval': [['2021-01-01T00:00:00Z',\n", + " '2021-12-31T23:59:59Z']]}},\n", + " 'license': 'proprietary', // Top level license of project outcomes. Should be one of https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json\n", + "\n", + " // list of consortium members working on the project and the contact of the ESA TO following the project. This field is required.\n", + " 'contacts': [{'emails': [{'value': 'Zoltan.Szantoi@esa.int'}],\n", + " 'name': 'Zoltan Szantoi',\n", + " 'roles': ['technical_officer']},\n", + " {'name': 'VITO Remote Sensing', 'roles': ['consortium_member']}\n", + " ]\n", + " }\n", + "\n", + "```\n", + "\n", + "In addition to specifying the links within the project collection.json entry (created above), you must also add an entry to the parent catalog, which lists all projects, so that your project is rendered correctly in the STAC Browser. **It is required** to add the following link to ```projects/catalog.json```.
\n", + "Add this link to the links array of projects/catalog.json, just after the last project entry. Edit catalog.json directly by copying and pasting the following link (updated with the values from your collection.json)\n", + "\n", + "```json\n", + "{\n", + " 'rel': 'child', \n", + " 'href': './{project_id}/collection.json', // use the collectionid of the project\n", + " 'type': 'application/json',\n", + " 'title': '{project_title}' // title of the project as described in the collection.json file created before. \n", + "}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "a3012337", + "metadata": {}, + "source": [ + "\n", + "#### 2.2 Add new Product\n", + "\n", + "Products represent the outputs of your projects and typically reference datasets. Like Projects, they are STAC Collections and follow a similar structure, with some additional fields that improve their findability.\n", + "\n", + "| **Field** | **Description** | **STAC representation** |\n", + "|---------------------|---------------------------------------|---------------------------------------|\n", + "| **ID** | Numeric identifier | |\n", + "| **Status** | “ongoing” or “completed” | osc:status property |\n", + "| **Project** | The project identifier | osc:project property, collection link |\n", + "| **Website** | | link |\n", + "| **Product** | Name | link |\n", + "| **Short_Name** | | identifier |\n", + "| **Description** | | description property |\n", + "| **Access** | URL | link |\n", + "| **Documentation** | URL | link |\n", + "| **Version** | | version property |\n", + "| **DOI** | Digital Object Identifier | sci:doi property and cite-as link |\n", + "| **Variable** | Variable identifier | collection link |\n", + "| **Start** | | extent.temporal[] |\n", + "| **End** | | extent.temporal[] |\n", + "| **Region** | | osc:region property |\n", + "| **Polygon** | | geometry |\n", + "| **Released** | | created property |\n", + "| **Theme1 - Theme6** | Theme identifiers | osc:themes property 
|\n", + "| **EO_Missions** | Semi-colon separated list of missions | osc:missions property |\n", + "| **Standard_Name** | | cf:parameter.name property |\n", + "\n", + "\n", + "```json\n", + "{\n", + " 'type': 'Collection', // This is the STAC type specification. You don't need to change this\n", + " 'id': 'worldcereal-crop-extent-belgium2', // This is the unique id of the product. Typically contains the dataset title+project name (or acronym)\n", + " 'stac_version': '1.0.0', // This is the STAC version specification. You don't need to change this\n", + " 'stac_extensions': [ // the schemas the product information is validated against. Typically you would not change these.\n", + " 'https://stac-extensions.github.io/osc/v1.0.0/schema.json',\n", + " 'https://stac-extensions.github.io/themes/v1.0.0/schema.json',\n", + " 'https://stac-extensions.github.io/cf/v0.2.0/schema.json'\n", + " ],\n", + " 'created': '2025-07-14T17:37:16Z', // initial creation date\n", + " 'updated': '2025-07-14T17:37:16Z', // date of the last update\n", + " 'title': 'WorldCereal Crop Extent - Belgium2', // product title\n", + " 'description': 'WorldCereal is an ESA initiative that provides global ' // Short, but meaningful product description. It should give external users enough information about the specific product.\n", + " 'cropland and crop type maps at 10-meter resolution, offering '\n", + " 'seasonally updated data on temporary crops, croptypes (maize, '\n", + " 'winter cereals and spring cereals), and irrigation. This '\n", + " 'dataset provides the outputs for Belgium.',\n", + "\n", + " 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]}, // the temporal and spatial extent of the product\n", + " 'temporal': {'interval': [['2021-01-01T00:00:00Z',\n", + " '2021-12-31T23:59:59Z']]}},\n", + " 'keywords': ['Crops', 'Cereal'], // list of keywords associated with the product. 
These are expected to be inline with the description.\n", + "\n", + " 'osc:project': 'worldcereal2', //unique id of the OSC project, this product is associated with. It must be the id provided in the ./project/(collectionid)\n", + " 'osc:region': 'Belgium', //text description of the study area\n", + " 'osc:status': 'ongoing', //product status\n", + " 'osc:type': 'product', // Type of OSC STAC collection, for products should always be product\n", + " \n", + " 'links': [ // links to different elements of the catalog. The first two links should always be present and are always the same.\n", + " \n", + " {'href': '../../catalog.json',\n", + " 'rel': 'root',\n", + " 'title': 'Open Science Catalog',\n", + " 'type': 'application/json'},\n", + " {'href': '../catalog.json',\n", + " 'rel': 'parent',\n", + " 'title': 'Products',\n", + " 'type': 'application/json'},\n", + " {'href': '../../projects/worldcereal2/collection.json', // link to parent project (associated project)\n", + " 'rel': 'related',\n", + " 'title': 'Project: WorldCereal2',\n", + " 'type': 'application/json'},\n", + "\n", + " {'href': '../../themes/land/catalog.json', // link to the theme (scientific domain) this product is associated with.\n", + " 'rel': 'related',\n", + " 'title': 'Theme: Land',\n", + " 'type': 'application/json'},\n", + " {'href': '../../eo-missions/sentinel-2/catalog.json', // link to eo-missions used to produce the outcomes\n", + " 'rel': 'related',\n", + " 'title': 'EO Mission: Sentinel-2',\n", + " 'type': 'application/json'},\n", + " {'href': '../../variables/crop-yield-forecast/catalog.json', // link to variables specified below.\n", + " 'rel': 'related',\n", + " 'title': 'Variable: Crop Yield Forecast',\n", + " 'type': 'application/json'},\n", + "\n", + " {'href': 'https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/ESA_WORLDCEREAL_SPRINGCEREALS', // link to dataset hosted in ESA Project Results Repository (PRR). 
\n", + " 'rel': 'child',\n", + " 'title': 'ESA WorldCereal Spring Cereals'},\n", + "\n", + " {'href': 'https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/ESA_WORLDCEREAL_SPRINGCEREALS',\n", + " 'rel': 'via',\n", + " 'title': 'Access'}, // external link to the actual data\n", + " {'href': 'https://worldcereal.github.io/worldcereal-documentation/',\n", + " 'rel': 'via',\n", + " 'title': 'Documentation'} // external link to data documentation\n", + "],\n", + " 'osc:missions': ['sentinel-2'], // array of ESA missions related to the product. This array of values is mandatory and limited to missions already existing in the OSC. If you would like to associate your product with a mission that is not on the list, create the eo-mission entry first. \n", + " 'osc:variables': ['crop-yield-forecast'], // array of variables related to the product. This array of values is mandatory and limited to variables already existing in the OSC. If you would like to associate your product with a variable that is not on the list, create the variable entry first. \n", + " 'cf:parameter': [{'name': 'crop-yield-forecast'}], // optional parameters following cf conventions\n", + " \n", + " 'sci:doi': 'https://doi.org/10.57780/s3d-83ad619', // DOI, if already assigned\n", + " \n", + " 'themes': [ // this is an array of the ESA themes the product relates to. The fields are restricted to the themes available in the OSC. The format of the array is id:theme and having at least one theme is mandatory.\n", + " {'concepts': [{'id': 'land'}],\n", + " 'scheme': 'https://github.com/stac-extensions/osc#theme'}],\n", + " \n", + " 'license': 'proprietary', // License of the product. 
Should be one of https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json\n", + "\n", + "}\n", + "\n", + "```\n", + "\n", + "\n", + "In addition to specifying the links from the product to other parts of the catalog, **it is required** to add the reverse links, as in the case of the Project, from the following elements: \n", + "- From products/catalog.json (which lists all products in the OSC) to the Product collection.json\n", + "- From the associated Project to the Product\n", + "- From the associated EO-Missions catalog to the Product\n", + "- From the associated Variables Catalog to the Product\n", + "- From the associated Themes Catalog to the Product\n", + "\n", + "1. Add the Product link to products/catalog.json by pasting the following in the links array: \n", + "\n", + "```json\n", + "{\n", + " 'rel': 'child', \n", + " 'href': './worldcereal-crop-extent-belgium2/collection.json', // use the collectionid of the product\n", + " 'type': 'application/json',\n", + " 'title': 'WorldCereal Crop Extent - Belgium2' // title of the product as described in the collection.json file created before. \n", + "}\n", + "```\n", + "2. Add a link to the product in the links array of each associated OSC element. For example, add the following link to the parent project:\n", + "\n", + "```json\n", + "{\n", + " \"rel\": \"related\",\n", + " \"href\": \"../../products/worldcereal-crop-extent-belgium2/collection.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": \"Product: WorldCereal Crop Extent - Belgium2\"\n", + "}\n", + "```\n", + "Similarly, add links to other OSC elements like eo-missions, variables, themes etc. " + ] + }, + { + "cell_type": "markdown", + "id": "f66d8ed0", + "metadata": {}, + "source": [ + "#### 2.3 Add new Workflow\n", + "\n", + "Workflows describe the code associated with a project that has been used to generate a specific product. Workflows follow `OGC record specifications` in contrast to OSC Projects and Products entries. 
However, the metadata of a workflow is also expressed in JSON format.\n", + "\n", + "\n", + "```json\n", + "{\n", + " 'conformsTo': [ // OGC spec, does not need to change\n", + " 'http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core' \n", + " ],\n", + " 'type': 'Feature', // OGC spec requirement, does not need to change\n", + " 'geometry': null, // OGC spec requirement, does not need to change\n", + " 'linkTemplates': [], // OGC spec, does not need to change\n", + " 'id': 'worldcereal-workflow2', // unique workflow id\n", + "\n", + " 'links': [ // links to different parts of the catalog. The first two links should always be present and are always the same.\n", + " \n", + " {'href': '../../catalog.json',\n", + " 'rel': 'root',\n", + " 'title': 'Open Science Catalog',\n", + " 'type': 'application/json'},\n", + " {'href': '../catalog.json',\n", + " 'rel': 'parent',\n", + " 'title': 'Workflows',\n", + " 'type': 'application/json'},\n", + " {'href': '../../projects/worldcereal2/collection.json', // link to associated project\n", + " 'rel': 'related',\n", + " 'title': 'Project: WorldCereal2',\n", + " 'type': 'application/json'},\n", + " {'href': '../../themes/land/catalog.json', // link to associated themes in the themes array specified below\n", + " 'rel': 'related',\n", + " 'title': 'Theme: Land',\n", + " 'type': 'application/json'},\n", + " { // link to the openeo-process process graph that describes the workflow\n", + " 'href': 'https://raw.githubusercontent.com/WorldCereal/worldcereal-classification/refs/tags/worldcereal_crop_extent_v1.0.1/src/worldcereal/udp/worldcereal_crop_extent.json',\n", + " 'rel': 'openeo-process',\n", + " 'title': 'openEO Process Definition',\n", + " 'type': 'application/json'},\n", + " { // external link to the full workflow codebase\n", + " 'href': 'https://github.com/WorldCereal/worldcereal-classification.git',\n", + " 'rel': 'git',\n", + " 'title': 'Git source repository',\n", + " 'type': 'application/json'},\n", + " { // 
external link to the service used to run the workflow\n", + " 'href': 'https://openeofed.dataspace.copernicus.eu',\n", + " 'rel': 'service',\n", + " 'title': 'CDSE openEO federation',\n", + " 'type': 'application/json'}\n", + " ],\n", + " // OGC spec requirement to have a properties field, which contains most of the workflow metadata\n", + "\n", + " 'properties': {\n", + " \n", + " 'contacts': [{'emails': [{'value': 'marie-helene.rio@esa.int'}],\n", + " 'name': 'Marie-Helene Rio',\n", + " 'roles': ['technical_officer']},\n", + " {'name': 'CNR-INSTITUTE OF MARINE SCIENCES-ISMAR (IT)',\n", + " 'roles': ['consortium_member']},\n", + " {'name': '+ATLANTIC – Association for an Atla (PT)',\n", + " 'roles': ['consortium_member']}],\n", + " 'created': '2025-07-14T18:02:13Z', // date of workflow creation\n", + " 'updated': '2025-07-14T18:02:13Z', // date of workflow last update\n", + " 'version': '1', // workflow version\n", + " 'title': 'ESA worldcereal global crop extent detector2', // Short and meaningful title of the workflow\n", + " 'description': 'Detects crop land at 10m resolution, trained for global use. Based on Sentinel-1 and 2 data...', // Short and meaningful workflow description. Should specify how the workflow can be executed and what it does.\n", + " 'keywords': ['agriculture', 'crops'], // workflow keywords (to enhance the findability of the workflow)\n", + " 'themes': [{'concepts': [{'id': 'land'}], // array of the ESA themes the workflow relates to. The values are restricted to the themes available in the OSC. 
The format of the array is id:theme and having at least one theme is mandatory.\n", + " 'scheme': 'https://github.com/stac-extensions/osc#theme'\n", + " }],\n", + " 'formats': [{'name': 'GeoTIFF'}], // format of workflow output\n", + " 'osc:project': 'worldcereal2', // workflow related project\n", + " 'osc:status': 'completed', // workflow status\n", + " 'osc:type': 'workflow', // OSC type, for workflows should always be workflow\n", + " 'license': 'various' // workflow license\n", + " \n", + " }\n", + "}\n", + "```\n", + "\n", + "\n", + "In addition to specifying the links from the workflow to other parts of the catalog, **it is required** to add the reverse links:\n", + "\n", + "- From workflows/catalog.json (listing all workflows in the OSC) to the Workflow record.json\n", + "- From the associated Project to the Workflow\n", + "- From the associated Themes to the Workflow" + ] + }, + { + "cell_type": "markdown", + "id": "72cee07e", + "metadata": {}, + "source": [ + "\n", + "\n", + "### 3. Open a PR to merge the new information into the OSC.\n", + "\n", + "After you have added all the information, commit and push your changes to the forked repository and open a pull request against the OSC. See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request for details.\n", + "\n", + "\n", + "**Once you open the PR, an automatic validation is run against the information you added.** If it fails, you will have to correct the added information. You can check whether the validation succeeded in the CI run of the PR, as shown in the screenshot below. If you click on the red X, the validator will give you the specific reason for the failure.
\n", + "Please be advised that once a pull request (PR) is submitted to the open-science-catalog-metadata repository, it will undergo a review process conducted by members of the EarthCODE team. During this process, the content will be evaluated for completeness and accuracy. Should any additional information or modifications be required, you may be asked to update your PR accordingly. All communication related to the review will be provided through comments within the PR. \n" + ] + }, + { + "cell_type": "markdown", + "id": "302291e0", + "metadata": {}, + "source": [ + "## Alternatives\n", + "\n", + "- EarthCODE provides a [GUI editor](http://workspace.earthcode.earthcode.eox.at/osc-editor) to automatically create links and open a PR for you.\n", + "- If you are using one of the EarthCODE platforms, they provide specialised tools for automating this work.\n", + "- You can use libraries like pystac to automate some of the required work. This tutorial shows how." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pangeo", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/OSC/osc_pr_pystac.ipynb b/OSC/osc_pr_pystac.ipynb new file mode 100644 index 00000000..15a7e580 --- /dev/null +++ b/OSC/osc_pr_pystac.ipynb @@ -0,0 +1,808 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c4f71528", + "metadata": {}, + "source": [ + "# Generating OSC information using pystac\n", + "\n", + "This notebook shows how to generate OSC Projects, Products and Workflows using pystac. EarthCODE provides a [GUI editor](http://workspace.earthcode.earthcode.eox.at/osc-editor) that offers this and more functionality, including a user interface. 
However, if you decide to manually create items, using a library like pystac can save some time. \n", + "The code described here does not carry out all the steps required to pass the automated OSC validation. For example, you still have to add all reverse links as described in the manual PR tutorial. You'll also have to open the PR manually at the end.\n", + "\n", + "> NOTE: Before you run the notebook you'll need a fork of the open-science-catalog-metadata repository. See the Manual PR Tutorial for how to create one.\n", + "\n", + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0191237", + "metadata": {}, + "outputs": [], + "source": [ + "import pystac\n", + "from datetime import datetime\n", + "from pystac.extensions.projection import ProjectionExtension" + ] + }, + { + "cell_type": "markdown", + "id": "ad1eceac", + "metadata": {}, + "source": [ + "### Get all entries from the Open Science Catalog" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "918b2a64", + "metadata": {}, + "outputs": [], + "source": [ + "# read the catalog root from the local clone of your fork (adjust the path if needed)\n", + "catalog = pystac.Catalog.from_file('../open-science-catalog-metadata-staging/catalog.json')\n", + "\n", + "# access the list of themes in the Open Science Catalog\n", + "themes = catalog.get_child('themes')\n", + "allowed_themes = [child.id for child in themes.get_children()]\n", + "\n", + "\n", + "# access the list of available ESA missions\n", + "missions = catalog.get_child('eo-missions')\n", + "allowed_missions = [child.id for child in missions.get_children()]\n", + "\n", + "# access the list of available variables\n", + "variables = catalog.get_child('variables')\n", + "allowed_variables = [child.id for child in variables.get_children()]\n", + "\n", + "# access the list of existing projects, products and workflows\n", + "products = catalog.get_child('products')\n", + "projects = catalog.get_child('projects')\n", + "workflows = 
catalog.get_child('workflows')" + ] + }, + { + "cell_type": "markdown", + "id": "5430bed7", + "metadata": {}, + "source": [ + "### Define helper functions | Add new variables, themes and EO missions " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fb6a0c55", + "metadata": {}, + "outputs": [], + "source": [ + "def add_product_variables(collection, variables_to_add):\n", + "    '''Add variables to the collection custom fields and add links to the variables catalog.'''\n", + "\n", + "    for variable in variables_to_add:\n", + "\n", + "        assert variable in allowed_variables, f'unknown variable: {variable}'\n", + "\n", + "        # add a link to the variable entry in the catalog\n", + "        collection.add_link(\n", + "            pystac.Link(rel=\"related\",\n", + "                        target=variables.get_child(variable).get_links('self')[0].href,\n", + "                        media_type=\"application/json\",\n", + "                        title=f\"Variable: {variables.get_child(variable).title}\")\n", + "        )\n", + "\n", + "    # Add variables to the custom fields\n", + "    collection.extra_fields.update({\n", + "        \"osc:variables\": variables_to_add\n", + "    })\n", + "\n", + "def add_themes(collection, themes_to_add):\n", + "    '''Add themes to the collection custom fields and add links to the themes catalog.'''\n", + "\n", + "    themes_list = []\n", + "    for theme in themes_to_add:\n", + "\n", + "        assert theme in allowed_themes, f'unknown theme: {theme}'\n", + "\n", + "        # add a link to the theme entry in the catalog\n", + "        collection.add_link(\n", + "            pystac.Link(rel=\"related\",\n", + "                        target=themes.get_child(theme).get_links('self')[0].href,\n", + "                        media_type=\"application/json\",\n", + "                        title=f\"Theme: {themes.get_child(theme).title}\")\n", + "        )\n", + "\n", + "        themes_list.append(\n", + "            {\n", + "                \"scheme\": \"https://github.com/stac-extensions/osc#theme\",\n", + "                \"concepts\": [{\"id\": theme}]\n", + "            }\n", + "        )\n", + "\n", + "    # Add themes to the custom fields\n", + "    collection.extra_fields.update({\n", + "        \"themes\": themes_list\n", + "    })\n", + "\n", + "\n", + "def add_links(collection, relations, targets, 
titles):\n", + "\n", + "    '''Add links from the collection to outside websites.'''\n", + "    links = []\n", + "\n", + "    for rel, target, title in zip(relations, targets, titles):\n", + "        links.append(pystac.Link(rel=rel, target=target, title=title))\n", + "\n", + "    collection.add_links(links)\n", + "\n", + "\n", + "def create_contract(name, roles, emails):\n", + "    '''Create a contact entry (name, roles and optional emails).'''\n", + "    contact = {\n", + "        \"name\": name,\n", + "        \"roles\": list(roles)\n", + "    }\n", + "    if emails:\n", + "        contact['emails'] = [{\"value\": email} for email in emails]\n", + "    return contact\n", + "\n", + "def add_product_missions(collection, missions_to_add):\n", + "    '''Add missions to the collection custom fields and add links to the missions catalog.'''\n", + "\n", + "    for mission in missions_to_add:\n", + "\n", + "        assert mission in allowed_missions, f'unknown mission: {mission}'\n", + "\n", + "        # add a link to the mission entry in the catalog\n", + "        collection.add_link(\n", + "            pystac.Link(rel=\"related\",\n", + "                        target=missions.get_child(mission).get_links('self')[0].href,\n", + "                        media_type=\"application/json\",\n", + "                        title=f\"EO Mission: {missions.get_child(mission).title}\"\n", + "            )\n", + "        )\n", + "\n", + "    # Add missions to the custom fields\n", + "    collection.extra_fields.update({\n", + "        \"osc:missions\": missions_to_add\n", + "    })\n" + ] + }, + { + "cell_type": "markdown", + "id": "ccb1fbac", + "metadata": {}, + "source": [ + "### Define helper functions | Create new project collection " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55c5c1d8", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "def create_project_collection(project_id, project_title, project_description,\n", + "                              project_status, extent, project_license):\n", + "\n", + "    '''Create project collection template from the provided information.'''\n", + "\n", + "    # Create the collection\n", + "    collection = pystac.Collection(\n", + "        id=project_id,\n", + "        description=project_description,\n", 
+ "        extent=extent,\n", + "        license=project_license,\n", + "        title=project_title,\n", + "        extra_fields={\n", + "            \"osc:status\": project_status,\n", + "            \"osc:type\": \"project\",\n", + "            # NOTE: datetime.now() is local time; the trailing Z assumes the machine clock is set to UTC\n", + "            \"updated\": datetime.now().strftime(\"%Y-%m-%dT%H:%M:%SZ\")\n", + "        },\n", + "        stac_extensions=[\n", + "            \"https://stac-extensions.github.io/osc/v1.0.0/schema.json\",\n", + "            \"https://stac-extensions.github.io/themes/v1.0.0/schema.json\",\n", + "            \"https://stac-extensions.github.io/contacts/v0.1.1/schema.json\"\n", + "        ]\n", + "\n", + "    )\n", + "\n", + "    # Add pre-determined links\n", + "    collection.add_links([\n", + "        pystac.Link(rel=\"root\", target=\"../../catalog.json\", media_type=\"application/json\", title=\"Open Science Catalog\"),\n", + "        pystac.Link(rel=\"parent\", target=\"../catalog.json\", media_type=\"application/json\", title=\"Projects\"),\n", + "        # pystac.Link(rel=\"self\", target=f\"https://esa-earthcode.github.io/open-science-catalog-metadata/projects/{project_id}/collection.json\", media_type=\"application/json\"),\n", + "    ])\n", + "\n", + "    return collection\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "47339180", + "metadata": {}, + "source": [ + "### Define helper functions | Create new product collection " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74f7046e", + "metadata": {}, + "outputs": [], + "source": [ + "def create_product_collection(product_id, product_title, product_description, product_extent, product_license,\n", + "                              product_keywords, product_status, product_region, product_project_id,\n", + "                              product_parameters=None, product_doi=None):\n", + "\n", + "    collection = pystac.Collection(\n", + "        id=product_id,\n", + "        title=product_title,\n", + "        description=product_description,\n", + "        extent=product_extent,\n", + "        license=product_license,\n", + "        keywords=product_keywords,\n", + "        stac_extensions=[\n", + "            \"https://stac-extensions.github.io/osc/v1.0.0/schema.json\",\n", + "            
\"https://stac-extensions.github.io/themes/v1.0.0/schema.json\",\n", + " \"https://stac-extensions.github.io/cf/v0.2.0/schema.json\"\n", + " ],\n", + " )\n", + " \n", + " # Add pre-determined links \n", + " collection.add_links([\n", + " pystac.Link(rel=\"root\", target=\"../../catalog.json\", media_type=\"application/json\", title=\"Open Science Catalog\"),\n", + " pystac.Link(rel=\"parent\", target=\"../catalog.json\", media_type=\"application/json\", title=\"Products\"),\n", + " # pystac.Link(rel=\"self\", target=f\"https://esa-earthcode.github.io/open-science-catalog-metadata/products/{project_id}/collection.json\", media_type=\"application/json\"),\n", + " pystac.Link(rel=\"related\", target=f\"../../projects/{product_project_id}/collection.json\", media_type=\"application/json\", title=f\"Project: {project_title}\"),\n", + "\n", + " ])\n", + "\n", + " # Add extra properties\n", + " collection.extra_fields.update({\n", + " \"osc:project\": product_project_id,\n", + " \"osc:status\": product_status,\n", + " \"osc:region\": product_region,\n", + " \"osc:type\": \"product\",\n", + " \"created\": datetime.now().strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n", + " \"updated\": datetime.now().strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n", + " })\n", + "\n", + " if product_doi is not None:\n", + " collection.extra_fields[\"sci:doi\"] = product_doi\n", + "\n", + "\n", + " if product_parameters:\n", + " collection.extra_fields[\"cf:parameter\"] = [{\"name\": p} for p in product_parameters]\n", + " \n", + " return collection" + ] + }, + { + "cell_type": "markdown", + "id": "2c21cee1", + "metadata": {}, + "source": [ + "### Define helper functions | Create new workflow record " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25e60580", + "metadata": {}, + "outputs": [], + "source": [ + "def create_workflow_collection(workflow_id, workflow_title, \n", + " workflow_description, workflow_license, workflow_extent,\n", + " workflow_keywords, workflow_formats, 
workflow_project):\n", + "\n", + "    '''Create a workflow record (OGC API Records) template from the provided information.'''\n", + "\n", + "    # Create the record as a plain dict; workflow_extent is accepted but not used here.\n", + "    # NOTE: the project link title reads the notebook-level variable project_title.\n", + "\n", + "    collection = {\n", + "        'id': workflow_id,\n", + "        'type': 'Feature',\n", + "        'geometry': None,\n", + "        \"conformsTo\": [\"http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core\"],\n", + "        \"properties\": {\n", + "            \"title\": workflow_title,\n", + "            \"description\": workflow_description,\n", + "            \"osc:type\": \"workflow\",\n", + "            \"osc:project\": workflow_project,\n", + "            \"osc:status\": \"completed\",\n", + "            \"formats\": [{\"name\": f} for f in workflow_formats],\n", + "            \"updated\": datetime.now().strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n", + "            \"created\": datetime.now().strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n", + "            \"keywords\": workflow_keywords,\n", + "            \"license\": workflow_license,\n", + "            \"version\": \"1\"\n", + "        },\n", + "        \"linkTemplates\": [],\n", + "        \"links\": [\n", + "\n", + "            {\n", + "                \"rel\": \"root\",\n", + "                \"href\": \"../../catalog.json\",\n", + "                \"type\": \"application/json\",\n", + "                \"title\": \"Open Science Catalog\"\n", + "            },\n", + "            {\n", + "                \"rel\": \"parent\",\n", + "                \"href\": \"../catalog.json\",\n", + "                \"type\": \"application/json\",\n", + "                \"title\": \"Workflows\"\n", + "            },\n", + "\n", + "            {\n", + "                \"rel\": \"related\",\n", + "                \"href\": f\"../../projects/{workflow_project}/collection.json\",\n", + "                \"type\": \"application/json\",\n", + "                \"title\": f\"Project: {project_title}\"\n", + "            },\n", + "\n", + "        ]\n", + "\n", + "    }\n", + "\n", + "    return collection\n" + ] + }, + { + "cell_type": "markdown", + "id": "0805ca97", + "metadata": {}, + "source": [ + "## Create a metadata collection for new project" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f89f7674", + "metadata": {}, + "outputs": [], + "source": [ + "# Define id, title, description, project status, license\n", + "project_id = \"worldcereal2\"\n", + "project_title = 
\"WorldCereal2\"\n", + "project_description = \"WorldCereal is an ESA initiative that provides global cropland and crop type maps at 10-meter resolution, offering seasonally updated data on temporary crops, croptypes (maize, winter cereals and spring cereals), and irrigation.\"\n", + "project_status = \"completed\"\n", + "project_license = 'proprietary'\n", + "\n", + "# Define spatial and temporal extent\n", + "spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])\n", + "temporal_extent = pystac.TemporalExtent([[datetime(2021, 1, 1), datetime(2021, 12, 31, 23, 59, 59)]])\n", + "extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)\n", + "\n", + "# Define links and link titles\n", + "project_link_targets = [\"https://esa-worldcereal.org/en\", \n", + " \"https://eo4society.esa.int/projects/worldcereal-global-crop-monitoring-at-field-scale/\"]\n", + "project_link_relations = [\"via\", \"via\"]\n", + "project_link_titles = [\"Website\", \"EO4Society Link\"]\n", + "\n", + "# Define project themes\n", + "project_themes = [\"land\"]\n", + "\n", + "# contacts\n", + "project_contracts_info = [\n", + " (\"Zoltan Szantoi\", [\"technical_officer\"], [\"Zoltan.Szantoi@esa.int\"]),\n", + " (\"VITO Remote Sensing\", [\"consortium_member\"], None)\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "990bdf9d", + "metadata": {}, + "outputs": [], + "source": [ + "collection = create_project_collection(project_id, project_title, project_description, \n", + " project_status, extent, project_license)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29d52b22", + "metadata": {}, + "outputs": [], + "source": [ + "# add links\n", + "add_links(collection, project_link_relations, project_link_targets, project_link_titles)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2d20bfd9", + "metadata": {}, + "outputs": [], + "source": [ + "## add themes\n", + "add_themes(collection, 
project_themes)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b02cf9a0", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# Add contacts\n", + "collection.extra_fields.update({\n", + "\n", + " \"contacts\": [create_contract(*info) for info in project_contracts_info]\n", + " \n", + "})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7d45c051", + "metadata": {}, + "outputs": [], + "source": [ + "collection.validate()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e46fbe2e", + "metadata": {}, + "outputs": [], + "source": [ + "collection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3aa1ecae", + "metadata": {}, + "outputs": [], + "source": [ + "# save this file and copy it to the catalog/projects/{project}/collection.json\n", + "collection.save_object(dest_href='project_collection.json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "384e7a40", + "metadata": {}, + "outputs": [], + "source": [ + "# optionally run this code to transfer the generated file to the OSC folder, ready to be commited.\n", + "!mkdir -p ../open-science-catalog-metadata-staging/projects/worldcereal2/\n", + "!cp project_collection.json ../open-science-catalog-metadata-staging/projects/worldcereal2/collection.json" + ] + }, + { + "cell_type": "markdown", + "id": "05c72f72", + "metadata": {}, + "source": [ + "## Create a metadata collection for new product" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72642f9f", + "metadata": {}, + "outputs": [], + "source": [ + "product_id = \"worldcereal-crop-extent-belgium2\"\n", + "product_title = \"WorldCereal Crop Extent - Belgium2\"\n", + "product_description = \"WorldCereal is an ESA initiative that provides global cropland and crop type maps at 10-meter resolution, offering seasonally updated data on temporary crops, croptypes (maize, winter cereals and spring cereals), and irrigation. 
This dataset provides the outputs for Belgium.\"\n", + "product_keywords = [\n", + "    \"Crops\",\n", + "    \"Cereal\"\n", + "]\n", + "product_status = \"ongoing\"\n", + "product_license = \"proprietary\"\n", + "\n", + "# Define spatial and temporal extent\n", + "product_spatial_extent = pystac.SpatialExtent([[2.5135, 49.529, 6.156, 51.475]])\n", + "product_temporal_extent = pystac.TemporalExtent([[datetime(2021, 1, 1), datetime(2021, 12, 31, 23, 59, 59)]])\n", + "# use the product-specific extents defined above, not the project-level ones\n", + "product_extent = pystac.Extent(spatial=product_spatial_extent, temporal=product_temporal_extent)\n", + "product_region = \"Belgium\"\n", + "product_themes = [\"land\"]\n", + "product_missions = [\"sentinel-2\"]\n", + "product_variables = [\"crop-yield-forecast\"]\n", + "product_parameters = [\"crop-yield-forecast\"]\n", + "\n", + "product_project_id = \"worldcereal2\"\n", + "\n", + "product_doi = \"https://doi.org/10.57780/s3d-83ad619\"\n", + "\n", + "\n", + "# define links to add\n", + "\n", + "product_target_relations = ['child', 'via', 'via']\n", + "product_target_links = ['https://eoresults.esa.int/stac/collections/sentinel3-ampli-ice-sheet-elevation',\n", + "                        'https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/sentinel3-ampli-ice-sheet-elevation',\n", + "                        'https://eoresults.esa.int/d/sentinel3-ampli-ice-sheet-elevation/2025/05/07/sentinel-3-ampli-user-handbook/S3_AMPLI_User_Handbook.pdf']\n", + "product_target_titles = ['PRR link', 'Access', 'Documentation']\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3644abb4", + "metadata": {}, + "outputs": [], + "source": [ + "product_collection = create_product_collection(product_id, product_title, product_description, product_extent, product_license,\n", + "                                               product_keywords, product_status, product_region, product_project_id,\n", + "                                               product_parameters, product_doi)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fa6fcdc", + "metadata": {}, + "outputs": [], + "source": [ + "# add themes\n", 
+ "add_themes(product_collection, product_themes)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37155798", + "metadata": {}, + "outputs": [], + "source": [ + "add_product_missions(product_collection, product_missions)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2cfaf55f", + "metadata": {}, + "outputs": [], + "source": [ + "add_product_variables(product_collection, product_variables)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cebf804a", + "metadata": {}, + "outputs": [], + "source": [ + "# add links\n", + "add_links(product_collection,\n", + " product_target_relations,\n", + " product_target_links,\n", + " product_target_titles\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0227fb4b", + "metadata": {}, + "outputs": [], + "source": [ + "product_collection.validate()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e5300ae", + "metadata": {}, + "outputs": [], + "source": [ + "product_collection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a645384", + "metadata": {}, + "outputs": [], + "source": [ + "# save this file and copy it to the catalog/products/{product_id}/collection.json\n", + "product_collection.save_object(dest_href='product_collection.json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "931c340c", + "metadata": {}, + "outputs": [], + "source": [ + "# optionally run this code to transfer the generated file to the OSC folder, ready to be commited.\n", + "!mkdir -p ../open-science-catalog-metadata-staging/products/worldcereal-crop-extent-belgium2/\n", + "!cp product_collection.json ../open-science-catalog-metadata-staging/products/worldcereal-crop-extent-belgium2/collection.json" + ] + }, + { + "cell_type": "markdown", + "id": "b04b6ec5", + "metadata": {}, + "source": [ + "## Create a metadata collection for new workflow" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "1e9bce8c", + "metadata": {}, + "outputs": [], + "source": [ + "workflow_id = \"worldcereal-workflow2\"\n", + "workflow_title=\"ESA worldcereal global crop extent detector2\"\n", + "workflow_description=\"Detects crop land at 10m resolution, trained for global use. Based on Sentinel-1 and 2 data...\"\n", + "workflow_license = \"proprietary\"\n", + "workflow_keywords= [\"agriculture\", \"crops\"]\n", + "workflow_formats = [\"GeoTIFF\"]\n", + "workflow_project = \"worldcereal2\"\n", + "workflow_themes = ['land']\n", + "\n", + "# Define spatial and temporal extent\n", + "spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])\n", + "temporal_extent = pystac.TemporalExtent([[datetime(2022, 2, 1), datetime(2026, 1, 31, 23, 59, 59)]])\n", + "workflow_extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)\n", + "\n", + "\n", + "# add custom theme schemas\n", + "\n", + "workflow_contracts_info = [\n", + " (\"Marie-Helene Rio\", [\"technical_officer\"], [\"marie-helene.rio@esa.int\"]),\n", + " (\"CNR-INSTITUTE OF MARINE SCIENCES-ISMAR (IT)\", [\"consortium_member\"], None),\n", + " (\"+ATLANTIC – Association for an Atla (PT)\", [\"consortium_member\"], None),\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32ae9134", + "metadata": {}, + "outputs": [], + "source": [ + "workflow_collection = create_workflow_collection(workflow_id, workflow_title, \n", + " workflow_description, workflow_license, workflow_extent,\n", + " workflow_keywords, workflow_formats, workflow_project)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce842c95", + "metadata": {}, + "outputs": [], + "source": [ + "# add contacts\n", + "workflow_collection['properties'].update({\n", + "\n", + " \"contacts\": [create_contract(*info) for info in workflow_contracts_info]\n", + " \n", + "})\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3a8a642", + 
"metadata": {}, + "outputs": [], + "source": [ + "workflow_collection['properties']['themes'] = [\n", + " {\n", + " \"scheme\": \"https://github.com/stac-extensions/osc#theme\",\n", + " \"concepts\": [{\"id\": t} for t in workflow_themes]\n", + " }\n", + "]\n", + "\n", + "for t in workflow_themes:\n", + " workflow_collection['links'].append(\n", + " {\n", + " \"rel\": 'related',\n", + " \"href\": f\"../../{t}/land/catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": f'Theme: {t.capitalize()}'\n", + " }\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bd0b0451", + "metadata": {}, + "outputs": [], + "source": [ + "workflow_target_relations = ['openeo-process', 'git', 'service']\n", + "workflow_target_links = ['https://raw.githubusercontent.com/WorldCereal/worldcereal-classification/refs/tags/worldcereal_crop_extent_v1.0.1/src/worldcereal/udp/worldcereal_crop_extent.json',\n", + " 'https://github.com/WorldCereal/worldcereal-classification.git',\n", + " 'https://openeofed.dataspace.copernicus.eu']\n", + "workflow_target_titles = ['openEO Process Definition', 'Git source repository', 'CDSE openEO federation']\n", + "\n", + "for rel, link, title in zip(workflow_target_relations, workflow_target_links, workflow_target_titles):\n", + " workflow_collection['links'].append(\n", + " {\n", + " \"rel\": rel,\n", + " \"href\": link,\n", + " \"type\": \"application/json\",\n", + " \"title\": title\n", + " }\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e593c92d", + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "with open('record.json', 'w') as f:\n", + " json.dump(workflow_collection, f)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5378915", + "metadata": {}, + "outputs": [], + "source": [ + "# optionally run this code to transfer the generated file to the OSC folder, ready to be commited.\n", + "!mkdir -p 
../open-science-catalog-metadata-staging/workflows/worldcereal-workflow2/\n", + "!cp record.json ../open-science-catalog-metadata-staging/workflows/worldcereal-workflow2/record.json" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pangeo", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/PRR/Creating STAC Catalog_from_PRR_example.ipynb b/PRR/Creating STAC Catalog_from_PRR_example.ipynb new file mode 100644 index 00000000..5a022f87 --- /dev/null +++ b/PRR/Creating STAC Catalog_from_PRR_example.ipynb @@ -0,0 +1,544 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "bead50f2-482d-4bd6-a892-8ec1ff6d705c", + "metadata": {}, + "source": [ + "# Creating STAC Catalog from the PRR - Example from SRAL Processing over Land Ice Dataset\n", + "\n", + "**This is an example notebook for creating the STAC Items uploaded to the ESA Project Results Repository and made available at**: https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/sentinel3-ampli-ice-sheet-elevation\n", + "\n", + "The dataset is also discoverable via the Open Science Catalog, which links to the collection created in this tutorial and stored in the ESA Project Results Repository (PRR). 
\n", + "https://opensciencedata.esa.int/products/sentinel3-ampli-ice-sheet-elevation/collection \n", + "\n", + "It focuses on generating metadata for a project with hundreds of items, each of which has hundreds of `netcdf` assets.\n", + "\n", + "Check the [EarthCODE documentation](https://earthcode.esa.int/) and the [PRR STAC introduction example](https://esa-earthcode.github.io/examples/prr-stac-introduction) for a more general introduction to STAC and the ESA PRR.\n", + "\n", + "\n", + "\n", + "The code below demonstrates how to perform the necessary steps using real data from the ESA project **SRAL Processing over Land Ice**. The project focuses on improving Sentinel-3 altimetry performance over land ice.\n", + "\n", + "🔗 Check the [User handbook](https://eoresults.esa.int/d/sentinel3-ampli-ice-sheet-elevation/2025/05/07/sentinel-3-ampli-user-handbook/S3_AMPLI_User_Handbook.pdf)\n", + "\n", + "🔗 Check the [Scientific publication](https://doi.org/10.57780/s3d-83ad619)\n", + "\n", + "#### Acknowledgment \n", + "We gratefully acknowledge the **SRAL Processing over Land Ice team** for providing access to the data used in this example, as well as support in creating it.\n", + "\n", + "\n", + "### Steps described in this notebook\n", + "This notebook presents the workflow for generating a PRR Collection for the entire dataset coming from the project. To create valid STAC Items and a valid Collection, follow the steps below:\n", + "1. Generate a root STAC Collection\n", + "2. Group your dataset files into STAC Items and STAC Assets\n", + "3. Add the Items to the collection\n", + "4. Save the normalised collection \n", + "\n", + "Due to the complexity of the project and the time it takes to process the data, the STAC Items are generated first and stored locally. They are added to the collection afterwards.\n", + "Furthermore, since we are working with thousands of files, we are using the links from the PRR directly. 
When the notebook was originally created, all the files were available locally.\n", + "\n", + "This notebook can be used as an example for the following scenarios: \n", + "1. Creating the STAC Items from files stored locally\n", + "2. Creating the STAC Items from files stored in an S3 bucket or another cloud repository \n", + "3. Creating the STAC Items from files already ingested into the PRR\n", + "\n", + "If your files are stored locally or in a different S3 bucket, the access to them (`root_url` and the item paths) should be adapted to your dataset location. \n", + "\n", + "> Note: Because the original dataset is about 100 GB, running this notebook end to end may take hours. We therefore advise trying it on your own datasets, changing the file paths so that you produce a valid STAC Collection and STAC Items. " + ] + }, + { + "cell_type": "markdown", + "id": "48e929f8-1160-4f98-8be6-b5f78f88d003", + "metadata": {}, + "source": [ + "## Loading Libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5b477ae-e0a5-49a4-9544-1d7bf0fce337", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import json\n", + "import time\n", + "import pystac\n", + "import rasterio\n", + "from shapely import box\n", + "import pandas as pd\n", + "import xarray as xr\n", + "from datetime import datetime\n", + "from dateutil.parser import isoparse\n", + "from dateutil import parser\n", + "from dateutil.parser import parse" + ] + }, + { + "cell_type": "markdown", + "id": "124cdd42-d703-461f-85fd-7b21a1b0c387", + "metadata": {}, + "source": [ + "## 2. 
Load Product files stored in ESA Project Results Repository" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3cd796", + "metadata": {}, + "outputs": [], + "source": [ + "root_url = 'https://eoresults.esa.int' # provide a root url for the datasets items " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09af6b92", + "metadata": {}, + "outputs": [], + "source": [ + "# get all items for the S3 AMPLI collection from the PRR STAC API\n", + "items = pystac.ItemCollection.from_file('https://eoresults.esa.int/stac/collections/sentinel3-ampli-ice-sheet-elevation/items?limit=10_000')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "621c47eb", + "metadata": {}, + "outputs": [], + "source": [ + "# get the paths to all the data\n", + "\n", + "# using a dictionary is faster than using pystac\n", + "items_dict = items.to_dict()\n", + "all_item_paths = []\n", + "for item in items_dict['features']:\n", + " assets = item['assets']\n", + " for asset_name, asset_dict in assets.items():\n", + " if asset_dict['roles'] == ['data']:\n", + " all_item_paths.append(asset_dict['href'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27435f9f", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a list of EO Missions and instruments as well as region of the dataset and cycles\n", + "instruments = ['sentinel-3a', 'sentinel-3b']\n", + "regions = ['antarctica', 'greenland']\n", + "cycles = [f\"cycle{str(i).zfill(3)}\" for i in range(5, 112)] # Cycle005 to Cycle111" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7259ec9", + "metadata": {}, + "outputs": [], + "source": [ + "# Assign the instrument name based on the acronym used in the file name\n", + "renaming = {\n", + " 'S3A': 'sentinel-3a',\n", + " 'S3B': 'sentinel-3b',\n", + " 'ANT': 'antarctica',\n", + " 'GRE': 'greenland'\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "59f908cb", + "metadata": {}, + 
"source": [ + "Define geometries, which are the same for all items within the same region. If they are not, these have to be extracted from the assets inside the item." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34a9a318", + "metadata": {}, + "outputs": [], + "source": [ + "# Define the spatial extent (bbox) for each region of interest\n", + "greenland_bbox = [-74.0, 59.0, -10.0, 84.0]\n", + "greenland_geometry = json.loads(json.dumps(box(*greenland_bbox).__geo_interface__))\n", + "\n", + "antarctica_bbox = [-180.0, -90.0, 180.0, -60.0]\n", + "antarctica_geometry = json.loads(json.dumps(box(*antarctica_bbox).__geo_interface__))\n" + ] + }, + { + "cell_type": "markdown", + "id": "4cde8425", + "metadata": {}, + "source": [ + "### 2.1 Group the files by the instruments, region and cycle of the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e293aeb9", + "metadata": {}, + "outputs": [], + "source": [ + "data = []\n", + "\n", + "for ipath in all_item_paths:\n", + " splitname = ipath.split('/')[-1].split('_')\n", + " instrument = splitname[0]\n", + " cycle = splitname[9]\n", + " region = splitname[-2]\n", + "\n", + " data.append((renaming[instrument], renaming[region], cycle, ipath))\n", + "\n", + "\n", + "filedata = pd.DataFrame(data, columns=['instrument', 'region', 'cycle', 'path'])" + ] + }, + { + "cell_type": "markdown", + "id": "5686090d", + "metadata": {}, + "source": [ + "## 3. 
Create the STAC Items with the metadata from the original files loaded from the PRR" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d79789c7", + "metadata": {}, + "outputs": [], + "source": [ + "# group all files into items from the same instrument, region and cycle\n", + "for (instrument, region, cycle), links in filedata.groupby(['instrument', 'region', 'cycle']):\n", + " \n", + " # open the metadata attributes for each file in the group\n", + " datasets = [xr.open_dataset(root_url + link + '#mode=bytes') for link in links['path']]\n", + "\n", + "\n", + " # Define the Temporal extent\n", + " first_item = datasets[0]\n", + " last_item = datasets[-1]\n", + " props = first_item.attrs\n", + " props2 = last_item.attrs\n", + "\n", + " start_datetime = props.get(\"first_meas_time\")\n", + " end_datetime = props2.get(\"last_meas_time\")\n", + "\n", + " # Define the geometry\n", + " if props['zone'] == 'Antarctica':\n", + " bbox = antarctica_bbox\n", + " geometry = antarctica_geometry\n", + " elif props['zone'] == 'Greenland':\n", + " bbox = greenland_bbox\n", + " geometry = greenland_geometry\n", + "\n", + "\n", + " # Shared properties\n", + " properties = {\n", + " \"start_datetime\": start_datetime,\n", + " \"end_datetime\": end_datetime,\n", + " \"created\": props.get(\"processing_date\"),\n", + " \"description\": f\"Sentinel-3 AMPLI Land Ice Level-2 product acquired by {instrument.capitalize()} platform derived from the SRAL altimeter in Earth Observation mode over {region} region.\",\n", + " \"conventions\": props.get(\"Conventions\"),\n", + " \"platform_name\": props.get(\"platform_name\"),\n", + " \"platform_serial_identifier\": props.get(\"platform_serial_identifier\"),\n", + " \"altimeter_sensor_name\": props.get(\"altimeter_sensor_name\"),\n", + " \"operational_mode\": props.get(\"operational_mode\"),\n", + " \"cycle_number\": props.get(\"cycle_number\"),\n", + " \"netcdf_version\": props.get(\"netcdf_version\"),\n", + " 
\"product_type\": props.get(\"product_type\"),\n", + " \"timeliness\": props.get(\"timeliness\"),\n", + " \"institution\": props.get(\"institution\"),\n", + " \"processing_level\": props.get(\"processing_level\"),\n", + " \"processor_name\": props.get(\"processor_name\"),\n", + " \"processor_version\": props.get(\"processor_version\"),\n", + " \"references\": props.get(\"references\"),\n", + " \"zone\": props.get(\"zone\"),\n", + " }\n", + "\n", + "\n", + " # Create STAC item for the cycle\n", + " item = pystac.Item(\n", + " id=f\"sentinel-3{props.get(\"platform_serial_identifier\").lower()}-{props.get(\"zone\").lower()}-{cycle.lower()}\",\n", + " geometry=geometry,\n", + " bbox=bbox,\n", + " datetime=isoparse(start_datetime),\n", + " properties=properties\n", + " )\n", + "\n", + " item.stac_version = \"1.1.0\"\n", + " item.stac_extensions = [\n", + " \"https://stac-extensions.github.io/projection/v1.1.0/schema.json\",\n", + " \"https://stac-extensions.github.io/raster/v1.1.0/schema.json\",\n", + " \"https://stac-extensions.github.io/eo/v1.1.0/schema.json\"\n", + " ]\n", + "\n", + " item.assets = {}\n", + "\n", + " # Add assets from that cycle\n", + " for nc_href, ds in zip(links['path'], datasets):\n", + "\n", + " asset_title = ds.attrs['product_name']\n", + " extra_fields = {\n", + " \"cycle_number\": str(ds.attrs.get(\"cycle_number\")),\n", + " \"orbit_number\": str(ds.attrs.get(\"orbit_number\")),\n", + " \"relative_orbit_number\": str(ds.attrs.get(\"relative_orbit_number\")),\n", + " \"orbit_direction\": ds.attrs.get(\"orbit_direction\"),\n", + " }\n", + "\n", + " item.add_asset(\n", + " key=asset_title,\n", + " asset=pystac.Asset(\n", + " href=nc_href,\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " extra_fields=extra_fields\n", + " )\n", + " )\n", + "\n", + " # Save STAC item per cycle\n", + " json_filename = 
f\"sentinel-3{props.get(\"platform_serial_identifier\").lower()}-{props.get(\"zone\").lower()}-{cycle.lower()}.json\"\n", + " item.save_object(dest_href='examples/' + json_filename, include_self_link=False)\n", + " print(f\" Saved {json_filename}\")" + ] + }, + { + "cell_type": "markdown", + "id": "25640a5e-5e12-4604-b09a-009263a3ce67", + "metadata": {}, + "source": [ + "### 3.1 Import documentation " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f50b311-f345-4862-866e-394606e29512", + "metadata": {}, + "outputs": [], + "source": [ + "import pystac\n", + "from datetime import datetime\n", + "import os\n", + "from datetime import datetime, timezone\n", + "\n", + "date_str = \"07/05/2025\"\n", + "\n", + "# Convert to ISO format string (YYYY-MM-DD)\n", + "iso_like_str = datetime.strptime(date_str, \"%d/%m/%Y\").strftime(\"%Y-%m-%d\")\n", + "\n", + "# Parse with isoparse and attach UTC timezone\n", + "dt_utc = isoparse(iso_like_str).replace(tzinfo=timezone.utc)\n", + "\n", + "print(dt_utc.isoformat())" + ] + }, + { + "cell_type": "markdown", + "id": "127296bc", + "metadata": {}, + "source": [ + "### 3.2 Create STAC Item for the documentation associated to the dataset " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a04b5330-2380-488e-8a44-4dc09eea18e5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Basic metadata\n", + "doc_href = \"/d/S3_AMPLI_User_Handbook.pdf\" # Relative or absolute href\n", + "doc_title = \"Sentinel-3 Altimetry over Land Ice: AMPLI level-2 Products\"\n", + "doc_description = \"User Handbook for Sentinel-3 Altimetry over Land Ice: AMPLI level-2 Products\"\n", + "\n", + "# Create STAC item\n", + "item = pystac.Item(\n", + " id=\"sentinel-3-ampli-user-handbook\",\n", + " geometry=None,\n", + " bbox=None,\n", + " datetime=dt_utc,\n", + " properties={\n", + " \"title\": doc_title,\n", + " \"description\": doc_description,\n", + " \"reference\": \"CLS-ENV-MU-24-0389\",\n", + " 
\"issue_n\": dt_utc.isoformat()\n", + " }\n", + ")\n", + "\n", + "# Add asset for the PDF\n", + "item.add_asset(\n", + " key=\"documentation\",\n", + " asset=pystac.Asset(\n", + " href=doc_href,\n", + " media_type=\"application/pdf\",\n", + " roles=[\"documentation\"],\n", + " title=doc_title\n", + " )\n", + ")\n", + "\n", + "# Save to file\n", + "item.set_self_href(\"examples/sentinel-3-ampli-user-handbook.json\")\n", + "item.save_object(include_self_link=False)\n", + "\n", + "print(\"📄 STAC Item for documentation created: sentinel-3-ampli-user-handbook.json\")" + ] + }, + { + "cell_type": "markdown", + "id": "29107c8d", + "metadata": {}, + "source": [ + "## 4. Generate valid STAC collection\n", + "\n", + "Once all the assets are processed, create the parent collection for all Items created in the previous step." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8279ea13", + "metadata": {}, + "outputs": [], + "source": [ + "collection = pystac.Collection.from_dict(\n", + "\n", + "{\n", + " \"id\": \"sentinel3-ampli-ice-sheet-elevation\",\n", + " \"type\": \"Collection\",\n", + " \"links\": [\n", + " ],\n", + " \"title\": \"Sentinel-3 AMPLI Ice Sheet Elevation\",\n", + " \"extent\": {\n", + " \"spatial\": {\n", + " \"bbox\": [\n", + " [-180, -90, 180, 90]\n", + " ]\n", + " },\n", + " \"temporal\": {\n", + " \"interval\": [\n", + " [\n", + " \"2016-06-01T00:00:00Z\",\n", + " \"2024-05-09T00:00:00Z\"\n", + " ]\n", + " ]\n", + " }\n", + " },\n", + " \"license\": \"CC-BY-4.0\",\n", + " \"summaries\": {\n", + " \"references\": [\n", + " \"https://doi.org/10.5194/egusphere-2024-1323\"\n", + " ],\n", + " \"institution\": [\n", + " \"CNES\"\n", + " ],\n", + " \"platform_name\": [\n", + " \"SENTINEL-3\"\n", + " ],\n", + " \"processor_name\": [\n", + " \"Altimeter data Modelling and Processing for Land Ice (AMPLI)\"\n", + " ],\n", + " \"operational_mode\": [\n", + " \"Earth Observation\"\n", + " ],\n", + " \"processing_level\": [\n", + " \"2\"\n", + " 
],\n", + " \"processor_version\": [\n", + " \"v1.0\"\n", + " ],\n", + " \"altimeter_sensor_name\": [\n", + " \"SRAL\"\n", + " ]\n", + " },\n", + " \"description\": \"Ice sheet elevation estimated along the Sentinel-3 satellite track, as retrieved with the Altimeter data Modelling and Processing for Land Ice (AMPLI). The products cover Antarctica and Greenland.\",\n", + " \"stac_version\": \"1.1.0\"\n", + "}\n", + ")\n", + "collection" + ] + }, + { + "cell_type": "markdown", + "id": "9e299244", + "metadata": {}, + "source": [ + "### 4.1. Add items to collection\n", + "Once the collection is created, read all the items from disk and add the necessary links." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03cf5c90", + "metadata": {}, + "outputs": [], + "source": [ + "import glob\n", + "for fpath in glob.glob('examples/*'):\n", + "    collection.add_item(pystac.Item.from_file(fpath))" + ] + }, + { + "cell_type": "markdown", + "id": "b8db4163", + "metadata": {}, + "source": [ + "### 4.2 Save the normalised collection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f1156ff", + "metadata": {}, + "outputs": [], + "source": [ + "# save the full self-contained collection\n", + "collection.normalize_and_save(\n", + "    root_href='../data/example_catalog_ampli/',\n", + "    catalog_type=pystac.CatalogType.SELF_CONTAINED\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pangeo", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/PRR/PRR_STAC_download_example.ipynb b/PRR/PRR_STAC_download_example.ipynb new file mode 100644 index 00000000..97cd8c8e --- /dev/null +++ b/PRR/PRR_STAC_download_example.ipynb @@ 
-0,0 +1,601 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "13408f04-ac61-4d5a-8711-28367f0cc50d", + "metadata": {}, + "source": [ + "# ESA Project Results Repository (PRR) Data Access and Collections Preview" + ] + }, + { + "cell_type": "markdown", + "id": "2ecaefdc-773f-44f2-84f0-0d73f291dc66", + "metadata": {}, + "source": [ + "This notebook supports users of EarthCODE and APEx who would like to exploit the products and project results stored in the [ESA Project Results Repository (PRR)](https://eoresults.esa.int/). The PRR provides access to data, workflows, experiments and documentation from ESA EOP-S Projects, organised into Collections and accessible via [OGC Records](https://ogcapi.ogc.org/records) and the [STAC API](https://github.com/radiantearth/stac-api-spec).\n", + "\n", + "Each collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related assets stored within the PRR storage.\n", + "\n", + "Scientists and commercial companies can access the PRR via the [EarthCODE](https://earthcode.esa.int/) and [APEx](https://esa-apex.github.io/apex_documentation/) projects.\n", + "\n", + "Use the following notebook cells to preview the content of the ESA PRR and request the download of selected products. 
" + ] + }, + { + "cell_type": "markdown", + "id": "522ba326", + "metadata": {}, + "source": [ + "### Loading Libraries and set up logging level" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8defa779", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "import logging\n", + "import pprint\n", + "import shutil\n", + "from urllib.parse import urljoin\n", + "from urllib.request import urlretrieve\n", + "\n", + "#Make sure you have installed pystac_client before running this\n", + "from pystac_client import Client\n", + "\n", + "# set pystac_client logger to DEBUG to see API calls\n", + "logging.basicConfig()\n", + "logger = logging.getLogger(\"pystac_client\")\n", + "logger.setLevel(logging.DEBUG)\n" + ] + }, + { + "cell_type": "markdown", + "id": "0cb5268b", + "metadata": {}, + "source": [ + "### Connect to ESA PRR Catalog and display the list of collections available" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2ecfa12", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# URL of the STAC Catalog to query\n", + "catalog_url = \"https://eoresults.esa.int/stac\"\n", + "\n", + "# custom headers\n", + "headers = []\n", + "\n", + "cat = Client.open(catalog_url, headers=headers)\n", + "cat # display the basic informaiton about PRR Catalog in STAC Format" + ] + }, + { + "cell_type": "markdown", + "id": "0438c280-1c01-472e-80d3-802ef508b32b", + "metadata": {}, + "source": [ + "
\n", + "\n", + "Use the cell below to access the entire **list of collections available in the ESA PRR.**
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6db1da45-bca2-421d-bf13-6e4790231554", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "collection_search = cat.collection_search(limit=150)\n", + "print(f\"Total number of collections found in ESA PRR is {collection_search.matched()}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17ac8cab-daa3-4cff-93a4-8935c1d737c0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Display the collection ids, which are used to select the collection of interest\n", + "for collection in collection_search.collections_as_dicts():\n", + "    print(collection.get(\"id\", \"Unnamed Collection\"))" + ] + }, + { + "cell_type": "markdown", + "id": "17692c1e-9e7f-4f2e-b015-839309fb72c8", + "metadata": {}, + "source": [ + "
\n", + "
\n", + "Alternatively, you can display the metadata of all STAC Collections available
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c1a9baf-ab62-43aa-b3ac-7545b7713e1c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Or they can be displayed with their full metadata\n", + "collection_search = cat.collection_search(\n", + "    datetime='2023-04-02T00:00:00Z/2024-08-10T23:59:59Z', # optional filter: restrict the collections to a date range\n", + "    limit=10\n", + ")\n", + "print(f\"{collection_search.matched()} collections found\")\n", + "print(\"PRR available Collections\\n\")\n", + "\n", + "# depth=4 truncates deeply nested metadata so the printout stays readable\n", + "pp = pprint.PrettyPrinter(depth=4)\n", + "for results in collection_search.collections_as_dicts():\n", + "    pp.pprint(results)" + ] + }, + { + "cell_type": "markdown", + "id": "824264f1", + "metadata": { + "tags": [] + }, + "source": [ + "### Open Sentinel-3 AMPLI Ice Sheet Elevation collection" + ] + }, + { + "cell_type": "markdown", + "id": "90b7aaef-d656-4398-b642-49ca9aad3acc", + "metadata": {}, + "source": [ + "To access a specific collection, we use the *collection id* from the cell above. Pass `sentinel3-ampli-ice-sheet-elevation` to connect to the selected collection and display its metadata. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9543e09", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "collection = cat.get_collection(\"sentinel3-ampli-ice-sheet-elevation\") # place here the id of the selected collection\n", + "#collection # or use simply json metadata to display the information \n", + "print(\"PRR Sentinel-3 AMPLI Collection\\n\")\n", + "pp = pprint.PrettyPrinter(depth=4)\n", + "pp.pprint(collection.to_dict())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eaa9f1f3-be46-4a3a-8454-42c43610b33b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#Or display it in the STAC file format to better visualise the attributes and properties \n", + "collection" + ] + }, + { + "cell_type": "markdown", + "id": "b0a0e61d-27b2-405c-af2c-3d26ae450d96", + "metadata": {}, + "source": [ + "
\n", + "
\n", + "\n", + "In the cell below, we retrieve and explore the **queryable fields** of the **STAC API**, which tell us what parameters we can use to filter our searches.
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7181ff09-5884-41d0-ade8-dd73e9f3ec58", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "queryable = collection.get_queryables()\n", + "\n", + "pp = pprint.PrettyPrinter(depth=4)\n", + "pp.pprint(queryable)" + ] + }, + { + "cell_type": "markdown", + "id": "d1153415-6360-4a40-877c-2c7322581f6f", + "metadata": { + "tags": [] + }, + "source": [ + "### Display STAC Items from Sentinel-3 AMPLI Ice Sheet Elevation collection " + ] + }, + { + "cell_type": "markdown", + "id": "b1bf8c6a", + "metadata": {}, + "source": [ + "By executing the cell below you will get the ids of items that can be found in the specific collection (requested above).
\n", + "First five items from the list are printed out. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad0b4c7b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "items = collection.get_items()\n", + "\n", + "# flush stdout so we can see the exact order that things happen\n", + "def get_five_items(items):\n", + " for i, item in enumerate(items):\n", + " print(f\"{i}: {item}\", flush=True)\n", + " if i == 4:\n", + " return\n", + " \n", + "print(\"First page\", flush=True)\n", + "get_five_items(items)\n", + "\n", + "print(\"Second page\", flush=True)\n", + "get_five_items(items)" + ] + }, + { + "cell_type": "markdown", + "id": "2a51b4c7", + "metadata": {}, + "source": [ + "Now execute a **search with a set of parameters**. In this case it returns just one item because **we filter on one queryable parameter** `(id)`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fb8689e", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "#Search for items based on spatio-temporal properties\n", + "\n", + "# AOI entire world\n", + "geom = {\n", + " \"type\": \"Polygon\",\n", + " \"coordinates\": [\n", + " [\n", + " [-180, -90],\n", + " [-180, 90],\n", + " [180 , 90],\n", + " [180, -90],\n", + " [-180, -90],\n", + " ]\n", + " ],\n", + "}\n", + "\n", + "# limit sets the # of items per page so we can see multiple pages getting fetched\n", + "#In this search we apply also filtering on ID that is one of the searchable parameters for the colletion\n", + "search = cat.search(\n", + " max_items=7,\n", + " limit=5,\n", + " collections=\"sentinel3-ampli-ice-sheet-elevation\", # specify collection id\n", + " intersects=geom,\n", + " query={\"id\": {\"eq\": \"sentinel-3a-antarctica-cycle107\"}}, # search for the specific Item in the collection \n", + " datetime=\"2023-04-02T00:00:00Z/2024-08-10T23:59:59Z\", # specify the start and end date of 
the time frame to perform the search \n", + ")\n", + "\n", + "items = list(search.items())\n", + "\n", + "print(len(items))\n", + "\n", + "pp = pprint.PrettyPrinter(depth=4)\n", + "pp.pprint([i.to_dict() for i in items])" + ] + }, + { + "cell_type": "markdown", + "id": "15757ed2-f144-488c-bdcc-93a454fe8f98", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "
\n", + "\n", + "If you do not know the item id, search through available satellite instrument name, region, number of the cycle and the datetime range of the products of interest.

\n", + "**You can specify them by filtering based on following possible values:**
\n", + "* missions: `3a` or `3b`\n", + "* regions: `antarctica` or `greenland`\n", + "* cycle range: for sentinel-3a the possible cycle range is `005 to 112`, while for sentinel-3b it is `011 to 093`\n", + "* datetime: specify the time frame of the products within the range `2016-06-01 00:00:00 UTC – 2024-05-09 00:00:00 UTC`
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d311ba44-9051-4e8b-928f-9c37c74bb79f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#Search for items from specific mission and type of the instrument (based on the id) and the region as well as cycle number \n", + "# Define your cycle range and mission types\n", + "cycle_range = [f\"{i:03d}\" for i in range(90, 111)] #005 to 111   # for sentinel-3a possible cycle range is from 005 to 111; while s3b has range from 011-092\n", + "missions = [\"3b\"] # select the mission and sensor type from:\"sentinel-3a\" or \"sentinel-3b\"] \n", + "regions = [\"antarctica\"] # specify the region from: \"antarctica\" or \"greenland\"\n", + "\n", + "# AOI entire world\n", + "geom = {\n", + " \"type\": \"Polygon\",\n", + " \"coordinates\": [\n", + " [\n", + " [-180, -90],\n", + " [-180, 90],\n", + " [180 , 90],\n", + " [180, -90],\n", + " [-180, -90],\n", + " ]\n", + " ],\n", + "}\n", + "\n", + "# limit sets the # of items per page so we can see multiple pages getting fetched\n", + "#In this search we apply also filtering on ID that is one of the searchable parameters for the colletion\n", + "search = cat.search(\n", + " max_items=7,\n", + " limit=5,\n", + " collections=\"sentinel3-ampli-ice-sheet-elevation\",\n", + " intersects=geom, # search for the specific Item in the collection \n", + " datetime=\"2021-04-02T00:00:00Z/2024-08-10T23:59:59Z\", # specify the start and end date of the time frame to perform the search which are: 2016-06-01 00:00:00 UTC – 2024-05-09 00:00:00 UTC\n", + ")\n", + "items = list(search.items())\n", + "print(f\"Number of items found: {len(items)}\")\n", + "print(items)\n", + "\n", + "pp = pprint.PrettyPrinter(depth=4)\n", + "\n", + "filtered = [\n", + " item for item in items\n", + " if any(m in item.id.lower() for m in missions)\n", + " and any(r in item.id.lower() for r in regions)\n", + " and any(f\"cycle{c}\" in item.id.lower() for c in cycle_range)\n", + 
"]\n", + "\n", + "\n", + "#for i, item in enumerate(filtered, 2):\n", + " # print(f\"{i}. {item.id} @ {item.datetime}\")\n", + "\n", + "## Print number of filtered items\n", + "print(f\"Number of filtered items: {len(filtered)}\")\n", + "for i, item in enumerate(filtered, 2):\n", + " print(f\"{i}. {item.id} @ {item.datetime}\")" + ] + }, + { + "cell_type": "markdown", + "id": "a4d6035b", + "metadata": {}, + "source": [ + "## Download all assets from the selected item
\n", + "Based on the selection done in the previous cell, download the products to the `downloads` folder in your workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b57bffd-ad05-451d-9eb4-e8c5abc3eb72", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "base_url = \"https://eoresults.esa.int\"\n", + "\n", + "item_to_be_downloaded = 3\n", + "target = items[item_to_be_downloaded]\n", + "\n", + "output_dir = f\"downloads/{target.id}\"\n", + "os.makedirs(output_dir, exist_ok=True)\n", + "\n", + "assets_total=len(target.assets.items())\n", + "assets_current=0\n", + "for asset_key, asset in target.assets.items():\n", + " filename = os.path.basename(asset.href)\n", + " full_href = urljoin(base_url, asset.href)\n", + " local_path = os.path.join(output_dir, filename)\n", + " assets_current+=1\n", + " print(f\"[{assets_current}/{assets_total}] Downloading {filename}...\")\n", + " try:\n", + " urlretrieve(full_href, local_path)\n", + " except Exception as e:\n", + " print(f\"Failed to download {full_href}. {e}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "4de12173-7b7c-4f0c-90f4-c2f93cb66159", + "metadata": {}, + "source": [ + "## Download filtered items
\n", + "Based on the selection done in the previous cell, download the products to the `downloads` folder in your workspace. Here you download the items that result from the additional filtering (by mission type, cycle number, region, etc.)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21e508b5-b896-45ba-97e7-cfdf29b261a1", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Fail early with a clear message instead of an AttributeError on None\n", + "if not filtered:\n", + "    raise ValueError(\"No items matched the filters; adjust missions, regions or cycle_range.\")\n", + "target = filtered[0]\n", + "\n", + "output_dir = f\"downloads/{target.id}\"\n", + "os.makedirs(output_dir, exist_ok=True)\n", + "\n", + "assets_total = len(target.assets)\n", + "assets_current = 0\n", + "for asset_key, asset in target.assets.items():\n", + "    filename = os.path.basename(asset.href)\n", + "    full_href = urljoin(base_url, asset.href)\n", + "    local_path = os.path.join(output_dir, filename)\n", + "    assets_current += 1\n", + "    print(f\"[{assets_current}/{assets_total}] Downloading {filename}...\")\n", + "    try:\n", + "        urlretrieve(full_href, local_path)\n", + "    except Exception as e:\n", + "        print(f\"Failed to download {full_href}. 
{e}\") " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7e21db4-6afb-48f3-8130-e2663ae98d1f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "base_url = \"https://eoresults.esa.int\"\n", + "# number the items from 1 so the progress label matches the item count\n", + "for index, item in enumerate(filtered, 1):\n", + "    output_dir = f\"filtered/{item.id}\"\n", + "    os.makedirs(output_dir, exist_ok=True)\n", + "\n", + "    assets_total = len(item.assets.items())\n", + "    assets_current = 0\n", + "\n", + "    for asset_key, asset in item.assets.items():\n", + "        filename = os.path.basename(asset.href)\n", + "        full_href = urljoin(base_url, asset.href)\n", + "        local_path = os.path.join(output_dir, filename)\n", + "\n", + "        assets_current += 1\n", + "        print(f\"[{index}] [{assets_current}/{assets_total}] Downloading {filename} for item {item.id}...\")\n", + "\n", + "        try:\n", + "            urlretrieve(full_href, local_path)\n", + "        except Exception as e:\n", + "            print(f\"Failed to download {full_href}. {e}\")\n", + "\n", + "print(f\"Downloaded assets for {len(filtered)} items.\")" + ] + }, + { + "cell_type": "markdown", + "id": "34c7870f", + "metadata": {}, + "source": [ + "## (Optional) Read some data to ensure all items are downloaded properly\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a868c6c9", + "metadata": {}, + "outputs": [], + "source": [ + "import xarray as xr\n", + "import numpy as np\n", + "\n", + "# change this to a downloaded file\n", + "example_filepath = f'./downloads/{target.id}/S3A_SR_2_TDP_LI_20240403T201315_20240403T201615_20250416T191921_0180_111_014______CNE_GRE_V001.nc'\n", + "\n", + "# Open the selected product and check the values\n", + "# Note: you can select another group of values to read: satellite_and_altimeter or ESA_L2_processing\n", + "ds = xr.open_dataset(example_filepath, group='AMPLI_processing')\n", + "values = ds['elevation_radar_ampli'].values\n", + "values[~np.isnan(values)]" + ] + }, + { + "cell_type": "markdown", + "id": 
"b23edcf7-b5f6-4d82-bbe3-87839ffd5e28", + "metadata": {}, + "source": [ + "## (Optional) Create an archive of products downloaded " + ] + }, + { + "cell_type": "markdown", + "id": "289de21c-e98d-4113-9c24-79117841e222", + "metadata": {}, + "source": [ + "Create an archive of the products downloaded to your workspace and save them in .zip format to make them compressed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "912e38df-65f2-46d4-be4e-7751dbe6837f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Create an archive of downloaded products \n", + "zip_path = shutil.make_archive(output_dir, 'zip', root_dir=output_dir)\n", + "print(f\"Created ZIP archive: {zip_path}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pangeo", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/OSC/PRR_STAC_introduction.ipynb b/PRR/PRR_STAC_introduction.ipynb similarity index 100% rename from OSC/PRR_STAC_introduction.ipynb rename to PRR/PRR_STAC_introduction.ipynb diff --git a/PRR/example_tccas.ipynb b/PRR/example_tccas.ipynb new file mode 100644 index 00000000..3b38fdd1 --- /dev/null +++ b/PRR/example_tccas.ipynb @@ -0,0 +1,1836 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1f92ff5a", + "metadata": {}, + "source": [ + "# ESA Project Results Repository: Generating STAC collections with multiple assets\n", + "\n", + "This notebook shows how to generate a valid STAC collection, which is a requirement to upload research outcomes to the [ESA Project Results Repository (PRR)](https://eoresults.esa.int/). It focuses on generating metadata for a project with a multiple data files of different types. 
\n", + "\n", + "Check the [EarthCODE documentation](https://earthcode.esa.int/), and [PRR STAC introduction example](https://esa-earthcode.github.io/examples/prr-stac-introduction) for a more general introduction to STAC and the ESA PRR.\n", + "\n", + "\n", + "The code below demonstrates how to perform the necessary steps using real data from the ESA project **Terrestrial Carbon Community Assimilation System (TCCAS)**. The focus of TCCAS is the combination of a diverse array of observational data streams with the D&B terrestrial biosphere model into a consistent picture of the terrestrial carbon, water, and energy cycles.\n", + "\n", + "\n", + "🔗 Check the project website: [Terrestrial Carbon Community Assimilation System (TCCAS) – Website](https://tccas.inversion-lab.com/index.html)\n", + "\n", + "🛢️ TCCAS Dataset: [Terrestrial Carbon Community Assimilation System (TCCAS) – Data base: Sodankylä and Lapland region](https://tccas.inversion-lab.com/database/sodankylae.html)\n", + "\n", + "#### Acknowledgment \n", + "We gratefully acknowledge the **Terrestrial Carbon Community Assimilation System (TCCAS) team** for providing access to the data used in this example, as well as support in creating it." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "3333ec9c", + "metadata": {}, + "outputs": [], + "source": [ + "# import libraries\n", + "import xarray as xr\n", + "from pystac import Item, Collection\n", + "import pystac\n", + "from datetime import datetime\n", + "from shapely.geometry import box, mapping\n", + "import glob\n", + "import json\n", + "import shapely\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "id": "6161e2c3", + "metadata": {}, + "source": [ + "## 1. 
Generate the parent collection\n", + "\n", + "The root STAC Collection provides a general description of all project outputs which will be stored on the PRR.\n", + "The PRR STAC Collection template requires several fields that you must provide to build a valid description. Most of these metadata fields should already be available and can be extracted from your data.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "869e46f3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
    \n", + " \n", + " \n", + " \n", + "
  • \n", + " type\n", + " \"Collection\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " id\n", + " \"tccas-sodankylae\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " stac_version\n", + " \"1.1.0\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " description\n", + " \"The Terrestrial Carbon Community Assimilation System (TCCAS) is built around the coupled D&B terrestrial biosphere model. D&B has been newly developed based on the well-established DALEC and BETHY models and builds on the strengths of each component model. In particular, D&B combines the dynamic simulation of the carbon pools and canopy phenology of DALEC with the dynamic simulation of water pools, and the canopy model of photosynthesis and energy balance of BETHY. D&B includes a set of observation operators for optical as well as active and passive microwave observations. The focus of TCCAS is the combination of this diverse array of observational data streams with the D&B model into a consistent picture of the terrestrial carbon, water, and energy cycles. TCCAS applies a variational assimilation approach that adjusts a combination of initial pool sizes and process parameters to match the observational data streams. This dataset includes Satellite, Field and model forcing data sets for Sodankylä and Lapland region.\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " links[] 0 items\n", + " \n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " title\n", + " \"Terrestrial Carbon Community Assimilation System: Database for Lapland and Sodankyla region\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " extent\n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " spatial\n", + "
        \n", + " \n", + " \n", + "
      • \n", + " bbox[] 1 items\n", + " \n", + "
          \n", + " \n", + " \n", + "
        • \n", + " 0[] 4 items\n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 0\n", + " 18.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 1\n", + " 65.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 2\n", + " 32.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 3\n", + " 69.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
        • \n", + " \n", + " \n", + "
        \n", + " \n", + "
      • \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    • \n", + " temporal\n", + "
        \n", + " \n", + " \n", + "
      • \n", + " interval[] 1 items\n", + " \n", + "
          \n", + " \n", + " \n", + "
        • \n", + " 0[] 2 items\n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 0\n", + " \"2011-01-01T00:00:00Z\"\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 1\n", + " \"2021-12-31T00:00:00Z\"\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
        • \n", + " \n", + " \n", + "
        \n", + " \n", + "
      • \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " license\n", + " \"various\"\n", + "
  • \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# create the parent collection\n", + "collectionid = \"tccas-sodankylae\"\n", + "\n", + "\n", + "collection = Collection.from_dict(\n", + " \n", + "{\n", + " \"type\": \"Collection\",\n", + " \"id\": collectionid,\n", + " \"stac_version\": \"1.1.0\",\n", + " \"title\": \"Terrestrial Carbon Community Assimilation System: Database for Lapland and Sodankyla region\",\n", + " \"description\": \"The Terrestrial Carbon Community Assimilation System (TCCAS) is built around the coupled D&B terrestrial biosphere model. D&B has been newly developed based on the well-established DALEC and BETHY models and builds on the strengths of each component model. In particular, D&B combines the dynamic simulation of the carbon pools and canopy phenology of DALEC with the dynamic simulation of water pools, and the canopy model of photosynthesis and energy balance of BETHY. D&B includes a set of observation operators for optical as well as active and passive microwave observations. The focus of TCCAS is the combination of this diverse array of observational data streams with the D&B model into a consistent picture of the terrestrial carbon, water, and energy cycles. TCCAS applies a variational assimilation approach that adjusts a combination of initial pool sizes and process parameters to match the observational data streams. 
This dataset includes Satellite, Field and model forcing data sets for Sodankylä and Lapland region.\",\n", + " \"extent\": {\n", + " \"spatial\": {\n", + " \"bbox\": [\n", + " [\n", + " 18.00,\n", + " 65.00,\n", + " 32.00,\n", + " 69.00\n", + " ]\n", + " ]\n", + " },\n", + " \"temporal\": {\n", + " \"interval\": [\n", + " [\n", + " \"2011-01-01T00:00:00Z\",\n", + " \"2021-12-31T00:00:00Z\"\n", + " ]\n", + " ]\n", + " }\n", + " },\n", + " \"license\": \"various\",\n", + " \"links\": []\n", + "\n", + "}\n", + "\n", + ")\n", + "\n", + "collection # visualise the metadata of your collection " + ] + }, + { + "cell_type": "markdown", + "id": "2ab07efc", + "metadata": {}, + "source": [ + "## 2. Create STAC Items and STAC Assets from the original dataset\n", + "\n", + "The second step is to describe the different files as STAC Items and Assets. Take your time deciding how to categorise your data so that it is easy to use and the items in the collection are intuitive to navigate. There are several strategies for doing this, and this tutorial demonstrates one of them. Examples of how other ESA projects do this are available in the [EarthCODE documentation](https://esa-earthcode.github.io/examples/prr-stac-introduction)." + ] + }, + { + "cell_type": "markdown", + "id": "eb7701f9-d671-40f5-8bd0-4f85311ff72d", + "metadata": {}, + "source": [ + "#### 2.1 Create STAC Item from Satellite Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60ad7781", + "metadata": {}, + "outputs": [], + "source": [ + "# define dataset names and base url. 
If the data is stored locally, adjust these paths.\n", + "\n", + "\n", + "root_url = 'https://lcc.inversion-lab.com/data/eo/'\n", + "data_files = {\n", + " \"Fraction of absorbed Photosynthetic Active Radiation Leaf Area Index (JRC-TIP)\": \"jrc-tip/jrctip_fapar-lai_sodankyla_20110101-20220105.nc\",\n", + " \"Brightness temperature (SMOS TB)\": \"smos/smos_l3tb/SMOS_L3TB__sodankyla.nc\",\n", + " \"Soil moisture and Vegetation Optical Depth (SMOS SM and SMOS L-VOD)\": \"smos/smosL2/smosL2_1D_v700_sodankyla_trans.nc\",\n", + " \"Solar Induced Chlorophyll Fluorescence (Sentinel 5P)\": \"sif/tropomi/Sodankyla_SIF_TROPOMI_final.nc4\",\n", + " \"Slope (ASCAT Slope)\": \"ascat/local_slope.final/ASCAT_slope_so.nc\",\n", + " \"Photochemical Reflectance Index (MODIS PRI)\": \"modis/final/PRI_ESTIMATE_SODANKYLA_SINUSOIDAL.nc\",\n", + " \"Land Surface Temperature (MODIS LST)\": \"modis/final/LST_ESTIMATE_SODANKYLA_SINUSOIDAL.nc\",\n", + " \"Solar Induced Chlorophyll Fluorescence (OCO-2 SIF)\": \"sif/oco2/Sodankyla_SIF_OCO2_final.nc4\",\n", + " \"Vegetation Optical Depth (AMSR-2 VOD)\": \"amsr2/final/AMSR2_so.nc\"\n", + "} " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "30dfc770", + "metadata": {}, + "outputs": [], + "source": [ + "# fix the same bbox and geometry for all items in the region\n", + "bbox = [18.00, 65.00, 32.00, 69.00]\n", + "geometry = json.loads(json.dumps(shapely.box(*bbox).__geo_interface__))" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "e37e5a77", + "metadata": {}, + "outputs": [], + "source": [ + "# some attributes extracted from xarray are not json serialisable and have to be cast to other types.\n", + "def convert_to_json_serialisable(attrs):\n", + " attrs = attrs.copy()\n", + " for attr in attrs.keys():\n", + " if isinstance(attrs[attr], np.ndarray):\n", + " attrs[attr] = attrs[attr].tolist()\n", + " elif isinstance(attrs[attr], np.integer):\n", + " attrs[attr] = 
int(attrs[attr])\n", + " return attrs" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5bedc1dd", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_12458/2301298067.py:9: UserWarning: no explicit representation of timezones available for np.datetime64\n", + " ts = (start_time - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')\n", + "/tmp/ipykernel_12458/2301298067.py:5: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.\n", + " ds = xr.open_dataset(root_url + dataset_filepath + '#mode=bytes')\n", + "/tmp/ipykernel_12458/2301298067.py:9: UserWarning: no explicit representation of timezones available for np.datetime64\n", + " ts = (start_time - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')\n" + ] + } + ], + "source": [ + "# for each dataset create an item\n", + "for dataset_name, dataset_filepath in data_files.items():\n", + "\n", + " # 1. Open the NetCDF file\n", + " ds = xr.open_dataset(root_url + dataset_filepath + '#mode=bytes')\n", + " # 2. Extract the start time of the dataset\n", + " if 'time' in ds.coords:\n", + " start_time = ds['time'][0].values\n", + " ts = (start_time - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')\n", + " start_time = datetime.fromtimestamp(ts)\n", + " elif 'yymmddHH' in ds.variables:\n", + " string_date = '-'.join(ds['yymmddHH'][0].values.astype(str)[:3])\n", + " start_time = datetime.strptime(string_date, '%Y-%m-%d')\n", + " else:\n", + " string_date = '-'.join(ds['yymmddHHMMSS'][0].values.astype(int).astype(str)[:3])\n", + " start_time = datetime.strptime(string_date, '%Y-%m-%d')\n", + "\n", + " # 3. 
Create a STAC item with the extracted properties\n", + " item = Item(\n", + " id=f\"{collection.id}-{dataset_name.lower().replace(' ', '_')}\",\n", + " geometry=geometry,\n", + " datetime=start_time,\n", + " bbox=bbox,\n", + " properties= {\n", + " \"license\": ds.attrs['license'],\n", + " \"description\": f'Dataset with variables related to {dataset_name}.',\n", + " }\n", + " )\n", + "\n", + " if len(item.properties['license']) > 20:\n", + " item.properties['license'] = 'TIP-FAPAR-1.7'\n", + "\n", + " # 4. Add an asset (the actual link to the file)\n", + " item.add_asset(\n", + " key=f'Dataset with variables related to {dataset_name}.', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f\"/d/{collectionid}/{dataset_filepath.split('/')[-1]}\",\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " )\n", + " )\n", + "\n", + " # 5. Extract variable information\n", + " for v in ds.variables:\n", + " item.properties[f\"variable_{v}\"] = convert_to_json_serialisable(ds.variables[v].attrs)\n", + "\n", + " item.validate()\n", + "\n", + " # 6. Add the item to the collection\n", + " collection.add_item(item)\n", + " " + ] + }, + { + "cell_type": "markdown", + "id": "aed954c0-39cc-4833-a30b-8d64f4fe38b5", + "metadata": {}, + "source": [ + "#### 2.2 Create STAC Item from In situ Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "00e4d3b5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
    \n", + " \n", + " \n", + " \n", + "
  • \n", + " rel\n", + " \"item\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " href\n", + " None\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " type\n", + " \"application/geo+json\"\n", + "
  • \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + ">" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# add a single item with all the in-situ data, since it comes in a single .tgz file\n", + "item = Item(\n", + " id=f\"{collection.id}-insitu_package\",\n", + " geometry=geometry,\n", + " datetime=start_time,\n", + " bbox=bbox,\n", + " properties= {\n", + " \"license\": \"CC-BY-4.0\",\n", + " \"description\": 'Insitu package with FloX, VOD and Miscellaneous field datasets related to the TCCAS project. ',\n", + " }\n", + ")\n", + "\n", + "# 3. add an asset (the actual link to the file)\n", + "item.add_asset(\n", + " key=f'Insitu package', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/sodankyla-insitu-package.tgz',\n", + " media_type=\"application/tar+gzip\",\n", + " roles=[\"data\"],\n", + " )\n", + ")\n", + "\n", + "item.validate()\n", + "collection.add_item(item)" + ] + }, + { + "cell_type": "markdown", + "id": "76ae5696-ef72-4006-92fa-6259ed27687d", + "metadata": {}, + "source": [ + "#### 2.3 Create STAC Item from Model based Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "60f80689", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
    \n", + " \n", + " \n", + " \n", + "
  • \n", + " rel\n", + " \"item\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " href\n", + " None\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " type\n", + " \"application/geo+json\"\n", + "
  • \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + ">" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# add an item with multiple model forcing assets\n", + "item = Item(\n", + " id=f\"{collectionid}-model_forcing\",\n", + " geometry=geometry,\n", + " datetime=start_time,\n", + " bbox=bbox,\n", + " properties= {\n", + " \"license\": \"CC-BY-4.0\",\n", + " \"description\": ' Regional and Site-level model forcing Data Sets for Sodankylä and Lapland region, part of the TCCAS project.',\n", + " }\n", + ")\n", + "\n", + "\n", + "# 3. add an asset (the actual link to the file)\n", + "item.add_asset(\n", + " key=f'static-site-level', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/FI-Sod_staticforcing.nc',\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " )\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=f'time-dependent (ERA5) - site level', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/FI-Sod_dynforcing-era5_20090101-20211231_with-lwdown.nc',\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " )\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=f'time-dependent (in-situ) - site level', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/FI-Sod_dynforcing-insitu_20090101-20211231_with-insitu-lwdown.nc',\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " )\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=f'static-regional', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/sodankyla-region_cgls-pft-crops-redistributed_staticforcing.nc',\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " )\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=f'time-dependent (ERA5) - regional', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " 
href=f'/d/{collectionid}/sodankyla-region_dynforcing_era5_2009-2021.nc',\n", + " media_type=\"application/x-netcdf\",\n", + " roles=[\"data\"],\n", + " )\n", + ")\n", + "\n", + "item.validate()\n", + "collection.add_item(item)" + ] + }, + { + "cell_type": "markdown", + "id": "f3f3ff6e-7c18-499b-960f-3ec82e763a88", + "metadata": {}, + "source": [ + "#### 2.4 Create STAC Item for the documentation and add to the Collection" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "ead22923", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
    \n", + " \n", + " \n", + " \n", + "
  • \n", + " rel\n", + " \"item\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " href\n", + " None\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " type\n", + " \"application/geo+json\"\n", + "
  • \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + ">" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# add all the documentation under a single item\n", + "item = Item(\n", + " id=f\"{collectionid}-documentation\",\n", + " geometry=geometry,\n", + " datetime=start_time,\n", + " bbox=bbox,\n", + " properties= {\n", + " \"license\": \"CC-BY-4.0\",\n", + " \"description\": 'Documentation for the TCCAS project datasets.',\n", + " }\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=f'TCCAS user manual.', # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/TCCAS_manual.pdf',\n", + " media_type=\"application/pdf\",\n", + " roles=[\"documentation\"],\n", + " )\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=\"Satellite Data Uncertainty analysis Scientific Report\", # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/D7.pdf',\n", + " media_type=\"application/pdf\",\n", + " roles=[\"documentation\"],\n", + " )\n", + ")\n", + "\n", + "item.add_asset(\n", + " key=\"Campaign Data User Manual\", # title can be arbitrary\n", + " asset=pystac.Asset(\n", + " href=f'/d/{collectionid}/D11_CDUM-all_sites.pdf',\n", + " media_type=\"application/pdf\",\n", + " roles=[\"documentation\"],\n", + " )\n", + ")\n", + "\n", + "collection.add_item(item)" + ] + }, + { + "cell_type": "markdown", + "id": "33d4ffee", + "metadata": {}, + "source": [ + "## 4. 
Save the metadata as a self-contained collection" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "52756c30", + "metadata": {}, + "outputs": [], + "source": [ + "# save the full self-contained collection\n", + "collection.normalize_and_save(\n", + " root_href='./data/example_catalog/',\n", + " catalog_type=pystac.CatalogType.SELF_CONTAINED\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "4a19d066", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
    \n", + " \n", + " \n", + " \n", + "
  • \n", + " type\n", + " \"Collection\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " id\n", + " \"tccas-sodankylae\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " stac_version\n", + " \"1.1.0\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " description\n", + " \"The Terrestrial Carbon Community Assimilation System (TCCAS) is built around the coupled D&B terrestrial biosphere model. D&B has been newly developed based on the well-established DALEC and BETHY models and builds on the strengths of each component model. In particular, D&B combines the dynamic simulation of the carbon pools and canopy phenology of DALEC with the dynamic simulation of water pools, and the canopy model of photosynthesis and energy balance of BETHY. D&B includes a set of observation operators for optical as well as active and passive microwave observations. The focus of TCCAS is the combination of this diverse array of observational data streams with the D&B model into a consistent picture of the terrestrial carbon, water, and energy cycles. TCCAS applies a variational assimilation approach that adjusts a combination of initial pool sizes and process parameters to match the observational data streams. This dataset includes Satellite, Field and model forcing data sets for Sodankylä and Lapland region.\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " links[] 14 items\n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 0\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"root\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/collection.json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " title\n", + " \"Terrestrial Carbon Community Assimilation System: Database for Lapland and Sodankyla region\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 1\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-fraction_of_absorbed_photosynthetic_active_radiation_leaf_area_index_(jrc-tip)/tccas-sodankylae-fraction_of_absorbed_photosynthetic_active_radiation_leaf_area_index_(jrc-tip).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 2\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-brightness_temperature_(smos_tb)/tccas-sodankylae-brightness_temperature_(smos_tb).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 3\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-soil_moisture_and_vegetation_optical_depth_(smos_sm_and_smos_l-vod)/tccas-sodankylae-soil_moisture_and_vegetation_optical_depth_(smos_sm_and_smos_l-vod).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 4\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-solar_induced_chlorophyll_fluorescence_(sentinel_5p)/tccas-sodankylae-solar_induced_chlorophyll_fluorescence_(sentinel_5p).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 5\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-slope_(ascat_slope)/tccas-sodankylae-slope_(ascat_slope).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 6\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-photochemical_reflectance_index_(modis_pri)/tccas-sodankylae-photochemical_reflectance_index_(modis_pri).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 7\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-land_surface_temperature_(modis_lst)/tccas-sodankylae-land_surface_temperature_(modis_lst).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 8\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-solar_induced_chlorophyll_fluorescence_(oco-2_sif)/tccas-sodankylae-solar_induced_chlorophyll_fluorescence_(oco-2_sif).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 9\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-vegetation_optical_depth_(amsr-2_vod)/tccas-sodankylae-vegetation_optical_depth_(amsr-2_vod).json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 10\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-insitu_package/tccas-sodankylae-insitu_package.json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 11\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-model_forcing/tccas-sodankylae-model_forcing.json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 12\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"item\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/tccas-sodankylae-documentation/tccas-sodankylae-documentation.json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/geo+json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " 13\n", + "
        \n", + " \n", + " \n", + " \n", + "
      • \n", + " rel\n", + " \"self\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " href\n", + " \"/home/krasen/oneones/data/example_catalog/collection.json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      • \n", + " type\n", + " \"application/json\"\n", + "
      • \n", + " \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + " \n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " title\n", + " \"Terrestrial Carbon Community Assimilation System: Database for Lapland and Sodankyla region\"\n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " extent\n", + "
      \n", + " \n", + " \n", + " \n", + "
    • \n", + " spatial\n", + "
        \n", + " \n", + " \n", + "
      • \n", + " bbox[] 1 items\n", + " \n", + "
          \n", + " \n", + " \n", + "
        • \n", + " 0[] 4 items\n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 0\n", + " 18.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 1\n", + " 65.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 2\n", + " 32.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 3\n", + " 69.0\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
        • \n", + " \n", + " \n", + "
        \n", + " \n", + "
      • \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    • \n", + " temporal\n", + "
        \n", + " \n", + " \n", + "
      • \n", + " interval[] 1 items\n", + " \n", + "
          \n", + " \n", + " \n", + "
        • \n", + " 0[] 2 items\n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 0\n", + " \"2011-01-01T00:00:00Z\"\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
            \n", + " \n", + " \n", + " \n", + "
          • \n", + " 1\n", + " \"2021-12-31T00:00:00Z\"\n", + "
          • \n", + " \n", + " \n", + " \n", + "
          \n", + " \n", + "
        • \n", + " \n", + " \n", + "
        \n", + " \n", + "
      • \n", + " \n", + " \n", + "
      \n", + "
    • \n", + " \n", + " \n", + " \n", + "
    \n", + "
  • \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  • \n", + " license\n", + " \"various\"\n", + "
  • \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "collection" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pangeo", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/PRR/index.md b/PRR/index.md new file mode 100644 index 00000000..8456ea96 --- /dev/null +++ b/PRR/index.md @@ -0,0 +1,22 @@ +# ESA Project Results Repository + +The [ESA Project Results Repository (PRR)](https://eoresults.esa.int/) provides long-term storage for research outcomes. It provides access to data, workflows, experiments and documentation from ESA Projects organised across Collections, accessible via the [STAC API](https://github.com/radiantearth/stac-api-spec). Each Collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related Assets stored within the PRR storage. Scientists and commercial companies can access the PRR via the [EarthCODE](https://earthcode.esa.int/) and [APEx](https://esa-apex.github.io/apex_documentation/) projects. + + +# Uploading data to the PRR +To upload data to the ESA Project Results Repository (PRR), you have to generate a STAC Collection that is associated with your files. The STAC Collection provides metadata about your files and makes them searchable and machine-readable. The process is organised in five steps: + +1. Generate a root STAC Collection +2. Group your dataset files into STAC Items and STAC Assets +3. Add the Items to the Collection +4. Save the normalised Collection +5. 
      Send the data, metadata and some extra information to the EarthCODE team. + +Below you will find guides to the whole process. We recommend starting with the introductory notebook. + +- [Generating a STAC Collection for the PRR (Introduction)](./PRR_STAC_introduction.ipynb) - A notebook explaining how to create the required PRR metadata. It describes the steps in detail and uses a relatively simple example, with a single .nc raster data file. +- [Generating a STAC Collection for the PRR (Multiple file types)](./example_tccas.ipynb) - An example of how to generate metadata for a more complicated dataset which has multiple types of data and different file formats. +- [Generating a STAC Collection for the PRR (Large dataset for multiple regions)](./Creating%20STAC%20Catalog_from_PRR_example.ipynb) - An example of how to generate metadata for a large dataset that has multiple disjoint regions. + +If you are interested in exploring or downloading PRR data, you can use this notebook as a guide: +- [ESA Project Results Repository (PRR) Data Access and Collections Preview](./PRR_STAC_download_example.ipynb) - A notebook explaining how Item Catalogs should be created, uses raster data. diff --git a/README.md b/README.md index 00c550e8..b0eedcc3 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ myst start ``` docker run -it --rm -v ./:/d/ -p 127.0.0.1:3000:3000 -p 127.0.0.1:3100:3100 node:22 bash -cd /d/examples/ +cd /d/ npm install -g mystmd myst start ``` diff --git a/index.md b/index.md index bf3032c2..c09dbca9 100644 --- a/index.md +++ b/index.md @@ -7,9 +7,11 @@ date: 2025-02-24 # affiliations: # - TBD --- -👋 Welcome to the EarthCODE examples book! + +Welcome to the EarthCODE examples book! Here you will find guides and examples on how to use the various EarthCODE resources. -If you are looking to upload data to the Open Science Catalog, check out our [Open Science Catalog Examples](OSC/index.md). 
      +If you are looking to upload data to the ESA Project Results Repository (PRR), check out our [PRR Examples](PRR/index.md). +If you are looking to add information to the Open Science Catalog, check out our [Open Science Catalog Examples](OSC/index.md). \ No newline at end of file diff --git a/myst.yml b/myst.yml index a9934653..26f93449 100644 --- a/myst.yml +++ b/myst.yml @@ -12,16 +12,26 @@ project: code: MIT content: CC-BY-4.0 toc: + - file: index.md - - title: Open Science Catalog + + - title: ESA Project Results Repository (PRR) + file: PRR/index.md + children: + - file: PRR/PRR_STAC_introduction.ipynb + - file: PRR/example_tccas.ipynb + - file: PRR/Creating STAC Catalog_from_PRR_example.ipynb + - file: PRR/PRR_STAC_download_example.ipynb + + - title: Open Science Catalog (OSC) file: OSC/index.md children: - - file: OSC/creating_an_item_catalog.ipynb - - file: OSC/manual_example.md - file: OSC/git_clerk_example.md + - file: OSC/osc_pr_manual.ipynb + - file: OSC/osc_pr_pystac.ipynb - file: OSC/deepcode_example.md - - file: OSC/stactools_old_example.md - - file: OSC/PRR_STAC_introduction.ipynb + + # plugins: # - directives.mjs # - picsum.mjs