From 5a5e800ec53ff16f942587438bb763fbf362b96b Mon Sep 17 00:00:00 2001 From: Krasen Samardzhiev Date: Fri, 8 Aug 2025 14:28:27 +0100 Subject: [PATCH 1/2] documentation --- .../Contributing to the EarthCODE Catalog.md | 285 ------------------ .../index.md | 23 ++ ...ontributing to the Open Science Catalog.md | 96 ++++++ ...ring Resources in The EarthCODE Catalog.md | 0 .../Open Science Catalog.md | 154 ++++++++++ .../Using Data in Workflows.md | 0 .../index.md | 9 +- .../Workflows/index.md | 28 -- 8 files changed, 279 insertions(+), 316 deletions(-) delete mode 100644 pages/Technical Documentation/Data/Contributing to the EarthCODE Catalog.md create mode 100644 pages/Technical Documentation/ESA Project Results Repository (PRR)/index.md create mode 100644 pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md rename pages/Technical Documentation/{Data => Open Science Catalog (OSC)}/Discovering Resources in The EarthCODE Catalog.md (100%) create mode 100644 pages/Technical Documentation/Open Science Catalog (OSC)/Open Science Catalog.md rename pages/Technical Documentation/{Data => Open Science Catalog (OSC)}/Using Data in Workflows.md (100%) rename pages/Technical Documentation/{Data => Open Science Catalog (OSC)}/index.md (50%) delete mode 100644 pages/Technical Documentation/Workflows/index.md diff --git a/pages/Technical Documentation/Data/Contributing to the EarthCODE Catalog.md b/pages/Technical Documentation/Data/Contributing to the EarthCODE Catalog.md deleted file mode 100644 index 84a767ba..00000000 --- a/pages/Technical Documentation/Data/Contributing to the EarthCODE Catalog.md +++ /dev/null @@ -1,285 +0,0 @@ ---- -order: 1 ---- -# Publishing Science Results - -The catalog functionalities described in the previous sections are granted to any user (registered/and non-registered users). Catalog exploration and content discovery and access to the products is fully open and transparent. 
In this section we will describe how to make new data accessible through the catalog and make them available for the broad scientific community. - -## Who can contribute? -Contributions to the Open Science Catalog are vital for advancing FAIR Open Science Principles across ESA-funded Earth Science activities. -We would like to specifically encourage contributions from: - -- Principal Investigators of ESA EO (Earth Observation Programme) funded Projects, -- Researchers, Scientists, Data Owners working on ESA-funded EO Projects, -- Principal Investigators, Researchers and Scientists from ESA Science Cluster Projects, -- ESA Technical Officers leading ESA EO Projects -- ESA-ESRIN Science Hub Members (e.g. ESA post-doctoral Research Fellows, ESA Living Planet Fellowship, ESA Visiting Scientists) -- Wider EO Research and Science Community: contact the EarthCODE team at [earth-code@esa.int](mailto:earth-code@esa.int) for more details! - -### You can enrich EarthCODE in several impactful ways: - -- **Publish Research Products:** Add new content to the Open Science Catalog. -- **Update content:** Keep the descriptions and metadata of products, projects, and more up-to-date. -- **Request removals:** Ensure the catalog remains accurate by requesting the removal of outdated or incorrect entries. - -To contribute, you only need to have an active GitHub account. If you don't have one yet, please [create an account](https://github.com/signup) to get started. - - -## How to Publish Results - -To publish your scientific results to the Open Science Catalog, you must: - -- Create valid **STAC** and/or **OGC API Record** objects (in JSON format). -- Submit a **Pull Request** to the [open-science-catalog-metadata-**staging**](https://github.com/ESA-EarthCODE/open-science-catalog-metadata-staging/tree/main) repository with your new entry. 
- -::: details The Open Science Catalog -The [**Open Science Catalog**](https://opensciencedata.esa.int/) is a publicly accessible platform that enables anyone—whether or not they have a GitHub account—to **discover and access Earth Observation research**. It provides a transparent and structured way to explore the latest results from EO projects by organizing metadata in a consistent and harmonized format. - -Built on the open-source **STAC Browser**, the catalog allows users to browse and explore interlinked elements such as **themes, variables, EO missions, projects, products, workflows, and experiments**, all described using **STAC-compliant JSON files**. -::: - -### Preparing Your Research for Publication - -To make your research outcomes accessible and discoverable by the broader scientific community, follow these steps: - -1. **Prepare your Product Package (Research Experiment)**, by uploading **dataset files**, **code** and **documentation** to appropriate, accessible locations. - -2. **Generate a Self-Contained STAC Catalog** - - Use tools like [`stactools`](https://stactools.readthedocs.io/en/stable/), [`rio-stac`](https://github.com/developmentseed/rio-stac), or [`PySTAC`](https://pystac.readthedocs.io/en/stable/) to generate a STAC Catalog. - - Host the resulting JSON files (Catalog + Items) in a **public GitHub repository** (or institutional equivalent). - ::: warning Important - Make sure the Catalog uses **relative paths** and points to remote asset URLs! - ::: - - -3. **Describe Your Research in the Open Science Catalog** - - Create entries that describe your **dataset, workflow, and experiment**. - - Link them to relevant **projects, variables, themes, and EO missions**. - - Include a `related` link to your external STAC Catalog to ensure it is discoverable from the Open Science Catalog. - -By following these steps, your research becomes part of a broader ecosystem of reusable, discoverable, and connected scientific outputs. 
- -## Step 1: Make Your Data Accessible - -To contribute to the Open Science Catalog, your research data must be openly accessible and persistent. Begin by preparing your **Research Experiments**, which includes: -- **Data files**, which will be added to the Item Catalog. -- **Workflow** (e.g., Jupyter Notebooks, Python scripts, CWL records) -- **Documentation** (e.g., links to peer-reviewed publications or public product descriptions) - -All of these should be *accessible*, meaning they are stored on **remote, persistent storage** that allows discovery and access. Examples include: -- ESA’s Project Results Repository (PRR) -- S3-compatible object storage (e.g. ESA S3 Bucket) -- Zenodo, CEDA, Dataverse, or other persistent archives - -If your data is already hosted on a reliable cloud storage provider you can use those links directly. - -If your data is not yet in the cloud or its persistence is uncertain, we recommend uploading it to the official **ESA Project Results Repository (PRR)**. To do this: -1. Request a data provider account. -2. Then, request a PRR collection, which will be used in later steps to define your STAC Item Catalog. - -::: details Requesting PRR Storage -At the moment, requests to store data on ESA PRR is done by the ESA PLES engineering team. If you need to request permanent storage, contact the team at [earth-code@esa.int](mailto:earth-code@esa.int) -::: - -## Step 2: Creating and uploading a STAC Item Catalog - -### Description - -The purpose of the STAC Item Catalog is to collect metadata and references to your assets in a format that can be easily reused by other scientists and automated workflows, and displayed correctly in the Open Science Catalog. The STAC Items should be created for all assets in your dataset (single files), gathered in dedicated STAC Catalogs to become available to EarthCODE users. - -The STAC structure helps organize and describe your data in a consistent and machine-readable way. 
Here’s how the hierarchy works: - -1. **STAC Catalog** - A STAC Catalog is the top-level container that groups related data files (Items + Assets). It behaves much like a folder in a traditional file system and can include other catalogs or items to help organize your data logically. -2. **STAC Item** - A STAC Item represents a single observation (with a given spatial and temporal extent) and is defined using a GeoJSON-like structure enriched with additional metadata—such as spatial and temporal extent, projection information, geophysical variables, and more. - Each Item contains one or more **Assets**, which are direct links to the actual data files. Assets may also describe specific bands, file types, or related resources associated with the item. - -__Example folder structure__ -``` -my-item-catalog -├── catalog.json -├── item_1 -│   └── item_1.json -└── item_2 - └── item_2.json -``` - -::: details Example `catalog.json` -```json{15,20} -{ - "type": "Catalog", - "id": "my-item-catalog-id", - "stac_version": "1.1.0", - "description": "Provide a meaningful description of the dataset here.", - "links": [ - { - "rel": "root", - "href": "./catalog.json", - "type": "application/json", - "title": "Title of the Dataset" - }, - { - "rel": "item", - "href": "./item_1/item_1.json", // relative link to the item.json describing a single file in the dataset - "type": "application/geo+json" - }, - { - "rel": "item", - "href": "./item_2/item_2.json", // relative link to the item.json describing a single file in the dataset - "type": "application/geo+json" - }, - ], - "title": "Tile of the Dataset" -} -``` -::: - -::: details Example `item.json` -```json{60} -{ - "type": "Feature", - "stac_version": "1.1.0", - "stac_extensions": [ - "https://stac-extensions.github.io/eo/v1.1.0/schema.json" - ], - "id": "item_1", - "geometry": { - "type": "Polygon", - "coordinates": [ - [ - [ - -50.17968937855544, - 66.77834561360399 - ], - [ - -47.9894188361956, - 66.83503196763441 - ], - [ - 
-48.056356656894216, - 67.37093506267574 - ], - [ - -50.295235368856346, - 67.31275872920898 - ], - [ - -50.17968937855544, - 66.77834561360399 - ] - ] - ] - }, - "bbox": [ - -50.295235368856346, - 66.77834561360399, - -47.9894188361956, - 67.37093506267574 - ], - "properties": { - "datetime": "2025-03-18T09:47:57.377671Z" - }, - "links": [ - { - "rel": "root", - "href": "../catalog.json", - "type": "application/json", - "title": "Title of the Dataset" - }, - { - "rel": "parent", - "href": "../catalog.json", - "type": "application/json", - "title": "Title of the Dataset" - } - ], - "assets": { - "data": { - "href": "https://EarthCODE/OSCAcssets/MY_PROJECT/MY_PRODUCT/item_1.tif", // link to remote asset - "type": "image/tiff; application=geotiff", - "eo:bands": [ - { - "name": "b1", - "description": "gray" - } - ], - "roles": [] - } - } -} -``` -::: - -### Creating the Item Catalog - -One way to create an Item Catalog is to copy an existing catalog and edit it manually in a text editor to fit your data. If you're new to STAC and only have a few data assets, this approach can work, but it is prone to errors. - -Manually editing STAC Items can be tedious, and extracting all the required metadata correctly can be challenging. For most Item Catalogs, we recommend using automated tools, for example: - -- The [`stactools`](https://stactools.readthedocs.io/en/stable/cli.html#stac-create-item) CLI provides a simple command-line interface for generating STAC Items. With the [`stactools-datacube`](https://github.com/stactools-packages/datacube) extension even following the STAC datacube extension. -- A combination of [`PySTAC`](https://pystac.readthedocs.io/en/stable/) to create the Catalog and [`rio-stac`](https://github.com/developmentseed/rio-stac) for automatically generating valid STAC Items with all required metadata. - -Typically, this workflow starts by defining individual STAC objects (a Catalog and its Items). 
Once created, these objects are linked together using STAC relationships. - -In the final step, the Catalog is __saved and normalized__ to a specified root directory. At this stage, you can choose to set the Catalog type to __"self-contained"__. When enabled, this ensures that all internal links are automatically resolved and adjusted to be relative, making the Catalog portable and independent of absolute file paths. - -The process is straightforward, and we highly recommend checking out [this notebook](https://esa-earthcode.github.io/examples/creating-an-item-catalog). - -::: warning IMPORTANT -Regardless of how you create the catalog, it must be **self-contained**. This means: -- Internal links should use **relative paths** (e.g., `"../catalog.json"` instead of `"/Users/name/catalog/catalog.json"`). -- Data asset paths should point to **remote storage**, not local files on your system. -::: - -### Uploading the Item Catalog - -The **Item Catalog must be hosted separately** from the Open Science Catalog, and like the data files, it should be **persistent and publicly accessible**. - -Since the Item Catalog only contains metadata (JSON) files rather than actual datasets, a simple and effective solution is to store it in a public repository, such as **GitHub**. This approach is demonstrated in the [Creating an Item Catalog notebook](https://esa-earthcode.github.io/examples/creating-an-item-catalog). - -Alternatively, if you have access to a **reliable cloud storage service**—such as the EarthCODE object storage bucket—you can host your Item Catalog there. - -If you choose to use the ESA PRR the generated STAC Items have to be sent via `POST` requests to the [PRR Registration Gateway](https://eoresults.esa.int/reg-api/docs#/) for registration using the previously requested collection. 
- -The only requirement is that __other users and the STAC browser must be able to find and read your repository!__ - -:::info -In the next step where you will be uploading metadata to the Open Science Catalog, EarthCODE administrators will review your Item Catalog and assist you with any necessary adjustments. -::: - - -## Step 3: Creating a Product entry in the OSC - -### How to publish new data to the catalog? - -Data ingestion to the catalog can be performed in different ways, depending on **where the products are originally stored** , but also depending on **the number of products to be ingested** and therefore size. - -All Themes, Variables, EO Missions, Projects, Products, Workflows, and Experiments are hosted as a metadata repository placed on the GitHub platform: Git and [GitHub API](https://docs.github.com/en/rest). Each update to metadata is handled via a [Pull Request (PR)](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests). This Pull Request allows for reviewers to see the changes to be applied in advance, to check for validity of the requested changes (via an automated validation script) and to provide reviews as comments. If appropriate, the changes can be merged with the main branch of the repository. When a Pull Request is merged, the updated STAC catalog is deployed as Static Catalog. - -![ingest-data-scheme](https://github.com/EOEPCA/open-science-catalog-metadata/assets/120453810/5d6297e7-5d66-4564-9538-bb6eaeb92598) - -At the moment Open Science Catalog supports ingestion of new products via **GitHub only**. This allows data providers like projects PI to apply **multiple changes or adding several new products at once**. - -Work is underway to provide a CMS-like GUI based editing functionality using [git-clerk](https://github.com/EOX-A/git-clerk) via the EarthCODE Portal. 
Although this GUI approach is still relying on GitHub the complexity is abstracted and hidden to the user. - -In this section the contribution procedure is described in an example for a Product. Please note that in the same way new Variables, Projects, Themes, EO Missions, Workflows, and Experiments can be added. We will be using three Use Case scenarios to better describe the procedure of product ingestion to the catalog: - -1. Adding metadata of a single product (item) to the catalog manually. -2. Ingesting metadata of assets with STAC Catalog: - 1. Adding multiple or single product(s) stored in external server (open-access storage) - 2. Adding multiple or single product(s) stored locally - -### Which information is needed before I start importing new data? - -Before making any changes to the catalog's content please make sure you have already prepared the following information about your product: - -- The Product should be related to a result of an ESA-funded project. Check if the Project's page is already existing within the ESA Open Science Catalog: [https://opensciencedata.esa.int/](https://opensciencedata.esa.int/). If not **create a Project page first.** -- **Complete metadata available** (to correctly describe the Product) -- The Product should be stored in an external database that is approved and a **stable data repository** (e.g. ESA PRR, CEDA Data Archive: [https://catalogue.ceda.ac.uk/](https://catalogue.ceda.ac.uk/); Zenodo repository: [https://zenodo.org/](https://zenodo.org/), etc.) -- If the product you would like to ingest is stored elsewhere, see other data ingestion scenarios described in the section TBD. -- Data provided in formats acceptable by GDAL and rasterio library. 
- -Please refer to the graphic below to check which metadata are required for your product **before starting the Product upload.** - -![metadata-stac](https://github.com/EOEPCA/open-science-catalog-metadata/assets/120453810/71b8e8a7-9a86-491b-ae54-1fb4de9ccf32) diff --git a/pages/Technical Documentation/ESA Project Results Repository (PRR)/index.md b/pages/Technical Documentation/ESA Project Results Repository (PRR)/index.md new file mode 100644 index 00000000..6bdc918c --- /dev/null +++ b/pages/Technical Documentation/ESA Project Results Repository (PRR)/index.md @@ -0,0 +1,23 @@ +--- +order: 1 +--- +# ESA Project Results Repository + + +# ESA Project Results Repository + +The [ESA Project Results Repository (PRR)](https://eoresults.esa.int/) provides long-term storage for research outcomes. It provides access to data, workflows, experiments and documentation from ESA Projects organised across Collections, accessible via the [STAC API](https://github.com/radiantearth/stac-api-spec). Each Collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related Assets stored within the PRR storage. Scientists/commercial companies can access the PRR via the [EarthCODE](https://earthcode.esa.int/) and [APEx](https://esa-apex.github.io/apex_documentation/) projects. + + +# Uploading data to the PRR +In order to upload data to the ESA Project Results Repository (PRR) you have to generate a STAC Collection that is associated with your files. The STAC Collection provides metadata about your files and makes them searchable and machine-readable. The process is organised in five steps: + +1. Generate a root STAC Collection +2. Group your dataset files into STAC Items and STAC Assets +3. Add the Items to the Collection +4. Save the normalised Collection +5. Send the data, metadata and some extra information to the EarthCODE team. 
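Steps 1 to 4 above can be sketched with nothing but the Python standard library. This is a minimal illustration of the resulting file layout, not the PRR's required schema: the function names, the ids, the `https://example.com/...` asset URLs, the world-wide bounding box and the `other` license value are all placeholders chosen for this sketch, and in practice a STAC library such as PySTAC would normally generate and validate these files.

```python
import json
import tempfile
from pathlib import Path

WORLD_BBOX = [-180.0, -90.0, 180.0, 90.0]  # placeholder extent, replace with your own


def bbox_to_polygon(bbox):
    """Return a GeoJSON Polygon covering a [west, south, east, north] bbox."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def build_collection(collection_id, description, asset_urls, datetime_utc):
    """Steps 1-3: create the Collection, one Item per file, and link them."""
    collection = {
        "type": "Collection",
        "stac_version": "1.1.0",
        "id": collection_id,
        "description": description,
        "license": "other",  # placeholder, use the real license of your data
        "extent": {
            "spatial": {"bbox": [WORLD_BBOX]},
            "temporal": {"interval": [[datetime_utc, None]]},
        },
        "links": [{"rel": "root", "href": "./collection.json", "type": "application/json"}],
    }
    items = []
    for item_id, url in asset_urls.items():
        items.append({
            "type": "Feature",
            "stac_version": "1.1.0",
            "id": item_id,
            "collection": collection_id,
            "geometry": bbox_to_polygon(WORLD_BBOX),
            "bbox": WORLD_BBOX,
            "properties": {"datetime": datetime_utc},
            # Relative links keep the saved files portable.
            "links": [
                {"rel": "root", "href": "../collection.json", "type": "application/json"},
                {"rel": "parent", "href": "../collection.json", "type": "application/json"},
                {"rel": "collection", "href": "../collection.json", "type": "application/json"},
            ],
            # The asset href points at the remote data file, never a local path.
            "assets": {"data": {"href": url, "roles": ["data"]}},
        })
        collection["links"].append(
            {"rel": "item", "href": f"./{item_id}/{item_id}.json", "type": "application/geo+json"}
        )
    return collection, items


def save_collection(collection, items, root_dir):
    """Step 4: write collection.json plus one sub-folder per Item."""
    root = Path(root_dir)
    root.mkdir(parents=True, exist_ok=True)
    (root / "collection.json").write_text(json.dumps(collection, indent=2))
    for item in items:
        item_dir = root / item["id"]
        item_dir.mkdir(exist_ok=True)
        (item_dir / f"{item['id']}.json").write_text(json.dumps(item, indent=2))
    return root / "collection.json"


if __name__ == "__main__":
    files = {"item_1": "https://example.com/my-project/item_1.tif"}  # hypothetical URL
    coll, its = build_collection("my-collection", "Demo dataset", files, "2025-01-01T00:00:00Z")
    with tempfile.TemporaryDirectory() as tmp:
        print(save_collection(coll, its, tmp))
```

Keeping every internal `href` relative is what lets the folder be moved or hosted anywhere without rewriting links; only the asset URLs are absolute.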
+ +**In the examples you will find guides to the whole process, code samples from other ESA projects, as well as instructions on how to access data from and traverse the PRR**. We recommend starting with the introductory notebook. + +- [PRR Examples](https://esa-earthcode.github.io/examples/index-1/) diff --git a/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md b/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md new file mode 100644 index 00000000..19a8122a --- /dev/null +++ b/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md @@ -0,0 +1,96 @@ +--- +order: 1 +--- +# Publishing Science Results + +This section describes how to add entries - data, workflows, products and projects - to the [Open Science Catalog](https://opensciencedata.esa.int/). + +## Who can contribute? +Contributions to the Open Science Catalog are vital for advancing FAIR Open Science Principles across ESA-funded Earth Science activities. +We would like to specifically encourage contributions from: + +- Principal Investigators of ESA EO (Earth Observation Programme) funded Projects, +- Researchers, Scientists, Data Owners working on ESA-funded EO Projects, +- Principal Investigators, Researchers and Scientists from ESA Science Cluster Projects, +- ESA Technical Officers leading ESA EO Projects +- ESA-ESRIN Science Hub Members (e.g. ESA post-doctoral Research Fellows, ESA Living Planet Fellowship, ESA Visiting Scientists) +- Wider EO Research and Science Community: contact the EarthCODE team at [earth-code@esa.int](mailto:earth-code@esa.int) for more details! + +### You can enrich EarthCODE in several impactful ways: + +- **Publish Research Products:** Add new content to the Open Science Catalog. +- **Update content:** Keep the descriptions and metadata of products, projects, and more up-to-date. 
+- **Request removals:** Ensure the catalog remains accurate by requesting the removal of outdated or incorrect entries. + +To contribute, you only need to have an active GitHub account. If you don't have one yet, please [create an account](https://github.com/signup) to get started. + + +## How to Publish Results + +To publish your scientific results to the Open Science Catalog, you must: + +1. Host your **datasets**, **code** and **documentation** online. + +::: details Proprietary data +Sometimes parts of the data and workflows are protected or private. Although not open, these experiments can still become FAIR and be added to the catalog. The process for adding the entries is the same until the review, when the EarthCODE team will reach out with specific questions regarding your data. +::: + +2. Create entries (STAC Collections) that describe the **dataset files**, **code** and their relationships to existing items in the catalog. + +3. Request to add them to the PRR. + +By following these steps, your research becomes part of a broader ecosystem of reusable, discoverable, and connected scientific outputs. + + + +## Step 1: Hosting your **datasets**, **code** and **documentation** online + +To contribute to the Open Science Catalog, your research data and workflows/code must be hosted on remote, persistent storage that allows discovery. Examples include: +- ESA’s Project Results Repository (PRR) +- S3-compatible object storage +- GitHub for code +- Zenodo, CEDA, Dataverse, or other persistent archives + +If your data is already hosted on a reliable storage provider you do **not** need to make changes. + +If your data is not yet in the cloud or its persistence is uncertain, we recommend uploading it to the **ESA Project Results Repository (PRR)**. The PRR provides access to data, workflows, experiments and documentation from ESA Projects organised across Collections, accessible via the [STAC API](https://github.com/radiantearth/stac-api-spec). 
Each Collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related Assets stored within the PRR storage. **Therefore, to upload your data to the PRR you have to generate a STAC Collection that describes your data, code and documentation.** + + +See the [PRR introduction example](https://esa-earthcode.github.io/examples/prr-stac-introduction/) for a detailed, interactive introduction to how to do this, or the [bank of examples](https://esa-earthcode.github.io/examples/index-1/) to see how different ESA projects have generated their collections. + +::: details Requesting PRR Storage +If you have any questions or require support, please email the EarthCODE support team: [earth-code@esa.int](mailto:earth-code@esa.int). +::: + + + +## Step 2: Creating OSC Entries + +### How to publish new data to the catalog? + +The Open Science Catalog is built on the SpatioTemporal Asset Catalog (STAC), which is a standardised format for describing geospatial data. Therefore new entries must conform to its specification. There are three ways to add information to the OSC: + +### 1: Using a Visual GUI + +- [Open Science Catalog Editor](https://workspace.earthcode.eox.at/) - A graphical user interface for automatically creating OSC entries and review requests. + +### 2: Manual creation +- [Directly creating/editing STAC files](https://esa-earthcode.github.io/examples/osc-pr-manual/) - A guide for manually creating OSC entries. Requires knowledge of git. + +- [Generating OSC files using pystac](https://esa-earthcode.github.io/examples/osc-pr-pystac/) - A guide for creating OSC entries using pystac. Requires knowledge of git and Python. + +### 3: Using one of the platform tools +- [DeepCode](https://github.com/deepesdl/deep-code) - An example using DeepCode: a library for automatically generating product entries for DeepESDL datasets. +- Additionally, you can contact your platform supplier for support. 
+ + + +## Step 3: Review & Publishing + +Regardless of which option for creating OSC Entries you choose, the generated data will be reviewed by the EarthCODE team before it is accepted into the PRR. The review process will take place on GitHub via its [pull request functionality](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests). The EarthCODE team will: +- check the accuracy and completeness of descriptions, links and information +- ask for a code snippet that shows how to read the data (if applicable) +- ask for a code snippet that demonstrates how to run the code (if applicable) +After any required changes are made, the OSC entries are ingested in the catalog. + +When a new product or workflow is ingested in the OSC, the team will encourage you to promote it on the [EarthCODE forum](https://discourse-earthcode.eox.at/). \ No newline at end of file diff --git a/pages/Technical Documentation/Data/Discovering Resources in The EarthCODE Catalog.md b/pages/Technical Documentation/Open Science Catalog (OSC)/Discovering Resources in The EarthCODE Catalog.md similarity index 100% rename from pages/Technical Documentation/Data/Discovering Resources in The EarthCODE Catalog.md rename to pages/Technical Documentation/Open Science Catalog (OSC)/Discovering Resources in The EarthCODE Catalog.md diff --git a/pages/Technical Documentation/Open Science Catalog (OSC)/Open Science Catalog.md b/pages/Technical Documentation/Open Science Catalog (OSC)/Open Science Catalog.md new file mode 100644 index 00000000..061fa8c9 --- /dev/null +++ b/pages/Technical Documentation/Open Science Catalog (OSC)/Open Science Catalog.md @@ -0,0 +1,154 @@ +--- +order: 1 +--- +# Open Science Catalog (OSC) + +## Introduction + +The Open Science Catalog (OSC) is a key component of the ESA EO Open Science framework. 
It is built on the SpatioTemporal Asset Catalog (STAC), which is a standardised format for describing geospatial data. The catalog captures information about Projects, Products, Workflows, and Experiments, and their relationships to ESA Themes, Variables, and EO Missions. These elements contain **information and direct links** to the corresponding research outcomes, which are themselves located in external storage providers. + +Users can browse and explore these interlinked elements through the web browser, the API or directly through the data itself. See [Data Discovery and Access](Discovering%20Resources%20in%20The%20EarthCODE%20Catalog) for more information. + + +## Adding / updating entries + +The different ways to add/update/remove entries from the catalog are described in the [Uploading and Managing Your Data section](Contributing%20to%20the%20Open%20Science%20Catalog). + +At a core level, each update to metadata is handled via a [Pull Request (PR)](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests). + +This Pull Request allows reviewers to see the changes to be applied in advance, to check the validity of the requested changes (via an automated validation script) and to provide reviews as comments. + + +## Structure + +The Open Science Catalog is a deployment of several EOEPCA components, in combination with additional supplementary components. In this section we focus on the metadata and its structure, as that is what most users will need to work with. You can see the full technical architecture of the Open Science Catalog [here](https://github.com/ESA-EarthCODE/open-science-catalog-metadata/wiki/System-Design-Document-%E2%80%90-v1.0.0). + +The Open Science Catalog metadata is a STAC catalog composed of JSON files, with specific attributes and structure that together describe its elements - Themes, Variables, EO Missions, Projects, Products, Workflows, and Experiments. 
All files are stored directly on GitHub [here](https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main), as they only contain metadata and links and not the actual data in the products, or the code in the workflows. + +Detailed information about Projects, Products, Workflows, and Experiments is available in the tutorial which shows how to manually create files for the OSC - [here](https://esa-earthcode.github.io/examples/osc-pr-manual/). + + +### Projects + +Projects are the containers that hold the top-level information about your work. It is the first type of information you should provide. Typically an OSC project corresponds to a project financed by the European Space Agency - Earth Observation programme. Before creating a new project, check whether it is already on the [list of onboarded projects](https://opensciencedata.esa.int/projects/catalog). If it is, you can use the existing project entry and only update it where needed. + + +| **Field** | **Description** | **STAC representation** | +|--------------------|--------------------------|------------------------------------| +| Project_ID | Numeric identifier | | +| Status | “ongoing” or “completed” | osc:status property | +| Project_Name | Name | title property | +| Short_Description | | description property | +| Website | | link | +| Eo4Society_link | | link | +| Consortium | | contacts[].name property | +| Start_Date_Project | | extent.temporal[] property | +| End_Date_Project | | extent.temporal[] property | +| TO | | contacts[].name property | +| TO_E-mail | | contacts[].emails[].value property | +| Theme1 - Theme6 | Theme identifiers | osc:themes property | + + +The metadata of each project is stored in a folder named after its unique id (`collectionid`). Each folder has one file, `collection.json`, that holds all the project information (metadata). 
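As a rough illustration of how the table maps onto a project `collection.json`, the sketch below assembles the dictionary with the Python standard library. It is an assumption-laden example rather than the OSC schema: the function name, the `osc:type` value of `project` (by analogy with the workflow records), the `via` link relation for the website, the contact roles, the placeholder bounding box and the `other` license are all choices made for this sketch, and the automated validation in the metadata repository remains the authority on which fields are required.

```python
import json


def project_collection(project_id, title, description, status, start, end,
                       themes, technical_officer, to_email, consortium, website):
    """Build a project entry following the field mapping in the table above."""
    if status not in ("ongoing", "completed"):
        raise ValueError("status must be 'ongoing' or 'completed'")

    # TO and TO_E-mail map to contacts[].name and contacts[].emails[].value.
    contacts = [{
        "name": technical_officer,
        "emails": [{"value": to_email}],
        "roles": ["technical_officer"],  # role names assumed for this sketch
    }]
    # Consortium members map to further contacts[].name entries.
    contacts += [{"name": member, "roles": ["consortium_member"]} for member in consortium]

    return {
        "type": "Collection",
        "stac_version": "1.1.0",
        "id": project_id,
        "title": title,                  # Project_Name
        "description": description,      # Short_Description
        "license": "other",              # placeholder, not part of the table
        "osc:type": "project",           # assumed by analogy with workflow records
        "osc:status": status,            # Status
        "osc:themes": list(themes),      # Theme1 - Theme6
        "contacts": contacts,
        "extent": {
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},  # placeholder
            "temporal": {"interval": [[start, end]]},  # Start/End_Date_Project
        },
        # The link relation used for the website is an assumption.
        "links": [{"rel": "via", "href": website, "title": "Website"}],
    }


if __name__ == "__main__":
    entry = project_collection(
        project_id="my-project",                      # hypothetical id
        title="My Project",
        description="Short description of the project.",
        status="ongoing",
        start="2024-01-01T00:00:00Z",
        end="2026-01-01T00:00:00Z",
        themes=["land"],
        technical_officer="Jane Doe",                 # hypothetical contact
        to_email="jane.doe@example.org",
        consortium=["Example Institute"],
        website="https://example.org/my-project",
    )
    print(json.dumps(entry, indent=2))
```

The printed JSON would be saved as `collection.json` inside a folder named after the project id.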
+ +In addition to specifying the links within the project collection.json entry (created above), you should also add an entry to the parent catalog, which lists all projects, so that the project is correctly rendered in the STAC Browser. + +### Products + +Products represent the outputs of your projects and typically reference datasets. Like Projects, they are STAC Collections and follow a similar structure, with some additional fields that improve their findability. + +| **Field** | **Description** | **STAC representation** | +|---------------------|---------------------------------------|---------------------------------------| +| **ID** | Numeric identifier | | +| **Status** | “ongoing” or “completed” | osc:status property | +| **Project** | The project identifier | osc:project property, collection link | +| **Website** | | link | +| **Product** | Name | link | +| **Short_Name** | | identifier | +| **Description** | | description property | +| **Access** | URL | link | +| **Documentation** | URL | link | +| **Version** | | version property | +| **DOI** | Digital Object Identifier | sci:doi property and cite-as link | +| **Variable** | Variable identifier | collection link | +| **Start** | | extent.temporal[] | +| **End** | | extent.temporal[] | +| **Region** | | osc:region property | +| **Polygon** | | geometry | +| **Released** | | created property | +| **Theme1 - Theme6** | Theme identifiers | osc:themes property | +| **EO_Missions** | Semi-colon separated list of missions | osc:missions property | +| **Standard_Name** | | cf:parameter.name property | + + +In addition to specifying the links from the product to other parts of the catalog, **it is required** to add the reverse links, as in the case of the Project, for the following elements: +- From the Product Collection.json to the Catalog.json (listing all products in the OSC) +- From the associated Project to the Product +- From the associated EO-Missions catalog to the Product +- From the associated Variables Catalog to the Product +- From
the associated Themes Catalog to the Product + + +## Workflows + +Workflows are the code and workflows associated with a project, that have been used to generate a specific product. Workflows follow `OGC record specifications` in contrast to OSC Projects and Products entries. However, the metadata of a workflow is also expressed in JSON format. + +| Field Name | Description | +| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `conformsTo` | An array of URIs indicating which OGC API Records specifications this record conforms to. | +| `type` | Indicates the GeoJSON object type. Required to be `"Feature"` for OGC compliance. | +| `geometry` | Spatial representation of the item. Set to `None` here, as it may not be spatially explicit. | +| `linkTemplates` | An array of link templates as per the OGC API. Used for dynamic link generation. | +| `id` | Unique identifier for the workflow STAC item (`'worldcereal-workflow2'`). | +| `links` | List of external and internal references including catalog navigation, project association, theme association, process graph, source code, and service endpoint. | +| `properties.contacts` | List of individuals or organizations associated with the workflow. Each contact may include name, email, and roles such as `technical_officer` or `consortium_member`. | +| `properties.created` | Timestamp representing when the workflow was first created (`2025-07-14T18:02:13Z`). | +| `properties.updated` | Timestamp of the most recent update to the workflow (`2025-07-14T18:02:13Z`). | +| `properties.version` | The version number of the workflow (`1`). | +| `properties.title` | A concise, descriptive title of the workflow: *"ESA worldcereal global crop extent detector2"*. 
| +| `properties.description` | A summary of what the workflow does: *"Detects crop land at 10m resolution, trained for global use..."*. | +| `properties.keywords` | Array of keywords to support discoverability (e.g., `agriculture`, `crops`). | +| `properties.themes` | Array of themes the workflow relates to. Each entry includes a `concepts` array with IDs (e.g., `'land'`) and a `scheme` URL. | +| `properties.formats` | Output formats of the workflow (e.g., `GeoTIFF`). | +| `properties.osc:project` | Project ID associated with the workflow (`worldcereal2`). | +| `properties.osc:status` | Current status of the workflow (e.g., `completed`). | +| `properties.osc:type` | Type of OSC object, expected to be `workflow`. | +| `properties.license` | License for the workflow (e.g., `'varuious'` – likely a typo for `various`). | + + + +In addition to specifying the links from the workflow to other parts of the catalog, **it is required** to add the reverse links: + +- From the Workflow record.json to the workflows/catalog.json (listing all workflows in the OSC) +- From the associated Project to the Workflow +- From the associated Themes to the Workflow + + +### Themes + +Themes describe the Earth Science topics linked to the grand science challenges set in the ESA strategy. This is a fixed list. + +Field | Description | STAC representation +-- | -- | -- +theme | Theme name | id +description | Theme description | description +link | Link to further resources | link + +### Variables + +The variables field describes the Geoscience, climate and environmental variables that products and workflows model. There is a fixed list of variables, however if your variable is missing from it you can add it in your Pull Request. 
+
+Field | Description | STAC representation
+-- | -- | --
+theme | The associated theme name | osc:theme property
+theme_description |   |  
+link | Link to further resources | link
+variable | The variable name | id
+domain | The variable's domain |  
+variable description |   | description
+
+
+
+
+
+
diff --git a/pages/Technical Documentation/Data/Using Data in Workflows.md b/pages/Technical Documentation/Open Science Catalog (OSC)/Using Data in Workflows.md
similarity index 100%
rename from pages/Technical Documentation/Data/Using Data in Workflows.md
rename to pages/Technical Documentation/Open Science Catalog (OSC)/Using Data in Workflows.md
diff --git a/pages/Technical Documentation/Data/index.md b/pages/Technical Documentation/Open Science Catalog (OSC)/index.md
similarity index 50%
rename from pages/Technical Documentation/Data/index.md
rename to pages/Technical Documentation/Open Science Catalog (OSC)/index.md
index 8e6c3a2a..fd7279ca 100644
--- a/pages/Technical Documentation/Data/index.md
+++ b/pages/Technical Documentation/Open Science Catalog (OSC)/index.md
@@ -1,15 +1,18 @@
 ---
 order: 1
 ---
-# Working with Data
+# Open Science Catalog
-The Working with Data section provides detailed instructions on how to efficiently access, manage, and use data within the EarthCODE environment. This includes discovering and accessing datasets through the Open Science Catalog, uploading and managing your own data, and incorporating datasets into your workflows for analysis and processing. These subsections will guide you through the steps needed to ensure seamless integration of data into your research projects.
+The Open Science Catalog section provides detailed instructions on how to efficiently access, manage, and use data from EarthCODE. This includes discovering and accessing datasets through the Open Science Catalog, uploading and managing your own data, and incorporating datasets into your workflows for analysis and processing.
+
+## [Open Science Catalog Overview](Open%20Science%20Catalog)
+This subsection gives an overview of the Open Science Catalog, its core entries (products, workflows, projects and experiments), and how they are structured and stored.
 ## [Data Discovery and Access](Discovering%20Resources%20in%20The%20EarthCODE%20Catalog)
 You will learn how to search for and access data through the Open Science Catalog. It covers instructions on using metadata and dependencies, helping you locate relevant datasets for your research. Additionally, you will gain insight into how to utilize APIs for automated access to data.
 ## [Uploading and Managing Your Data](Contributing%20to%20the%20EarthCODE%20Catalog)
-Here, you will find guidelines on how to upload your own datasets to EarthCODE. This includes managing data storage, handling file formats, and applying version control to ensure the integrity and traceability of your datasets over time. The section also covers best practices for organizing and categorizing your data for easier retrieval.
+Here, you will find guidelines on how to incorporate your own datasets, code and documentation into the Open Science Catalog.
 ## [Using Data in Workflows](Using%20Data%20in%20Workflows)
 This subsection explains how to incorporate data into your workflows and experiments within EarthCODE. It provides step-by-step instructions on integrating external datasets into your research workflows, enabling you to perform analysis, processing, and visualization tasks in an efficient and reproducible manner.
diff --git a/pages/Technical Documentation/Workflows/index.md b/pages/Technical Documentation/Workflows/index.md
deleted file mode 100644
index df2089c8..00000000
--- a/pages/Technical Documentation/Workflows/index.md
+++ /dev/null
@@ -1,28 +0,0 @@
----
-order: 1
----
-# Working with Workflows
-:::warning 🛠️ Page Under Development
-Content is being actively developed and updated for this page. EarthCODE's documentation is a living document and will be continuously updated with detailed reviews.
-:::
-## Workflow Management
-
-Guidance on using EarthCODE's workflow tools, such as interactive notebooks, containers, and pipeline orchestration for reproducible research.
-
-!(cross platform)[/img/terms/cross-platform-reusability.png]
-
-## Creating Workflows
-How to develop research workflows using EarthCODE’s interactive graphical tooling, including workflow design and validation.
-
-## Running Workflows
-How to execute workflows using the platform’s cloud-based resources, including how to monitor progress and manage execution.
-
-## Automation and Continuous Integration
-
-Setting up automated processes for testing, deployment, and data processing using continuous integration/continuous deployment (CI/CD) pipelines.
-
-
-## Using Machine Learning Tools
-Instructions on setting up and running machine learning models within EarthCODE’s scalable environment.
-
-
From a989aeabd91ba3b209b3416ef58e557680e6d93a Mon Sep 17 00:00:00 2001
From: Krasen Samardzhiev
Date: Mon, 11 Aug 2025 11:57:58 +0100
Subject: [PATCH 2/2] changes
---
 ...ontributing to the Open Science Catalog.md | 20 +++++++++----------
 pages/Technical Documentation/index.md | 4 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md b/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md
index 19a8122a..f7440ae3 100644
--- a/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md
+++ b/pages/Technical Documentation/Open Science Catalog (OSC)/Contributing to the Open Science Catalog.md
@@ -3,7 +3,7 @@ order: 1
 ---
 # Publishing Science Results
-This section describes how to add entries - data, workflows, products and projects - to the (Open Science Catalog)[https://opensciencedata.esa.int/].
+This section describes how to publish entries - data, workflows, products and projects - to the [Open Science Catalog](https://opensciencedata.esa.int/).
 ## Who can contribute?
 Contributions to the Open Science Catalog are vital for advancing FAIR Open Science Principles across ESA-funded Earth Science activities.
@@ -35,9 +35,9 @@ To publish your scientific results to the Open Science Catalog, you must:
 Sometimes parts of the data and workflows are protected or private. Although not open, these experiments can still become FAIR and added to the catalogue. The process for adding the entries is the same, until the review, when the EarthCODE team will reach out with specific questions regarding your data.
 :::
-2. Create entries (STAC Collections) that describe the **dataset files**, **code** and their relationships to existing items in the catalog.
+2. Create entries (STAC Collections) that describe the **dataset files**, **code** and their relationships to existing items in the catalog. These entries follow the [OSC STAC specification](https://github.com/stac-extensions/osc).
-3. Request to add them to the PRR.
+3. Request to add them to the PRR via one of the three options described below.
 By following these steps, your research becomes part of a broader ecosystem of reusable, discoverable, and connected scientific outputs.
@@ -53,10 +53,10 @@ To contribute to the Open Science Catalog, your research data and workflows/code
 If your data is already hosted on a reliable storage provider you do **not** need to make changes.
-If your data is not yet in the cloud or its persistence is uncertain, we recommend uploading it to the **ESA Project Results Repository (PRR)**. The PRR provides access to data, workflows, experiments and documentation from ESA Projects organised across Collections, accessible via the [STAC API](https://github.com/radiantearth/stac-api-spec). Each Collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related Assets stored within the PRR storage. **Therefore, to upload your data to the PRR you have to generate a STAC collection that describes your data, code and documentation.**
+If your data is not yet hosted online or its persistence is uncertain, we recommend uploading it to the **ESA Project Results Repository (PRR)**. The PRR provides access to data, workflows, experiments and documentation from ESA Projects organised across Collections, accessible via the [STAC API](https://github.com/radiantearth/stac-api-spec). Each Collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related Assets stored within the PRR storage. **Therefore, to upload your data to the PRR you have to generate a STAC collection that describes your data, code and documentation.**
-See the [PRR introduction example](https://esa-earthcode.github.io/examples/prr-stac-introduction/) for a detailed, interactive introduction about how to do this, or the [bank of examples](https://esa-earthcode.github.io/examples/index-1/) to see how different ESA projects have generated their collections.
+See the [PRR page](../ESA%20Project%20Results%20Repository%20(PRR)/) for a detailed, interactive introduction and a bank of examples of how different ESA projects have generated their collections.
 ::: details Requesting PRR Storage
 If you have any questions or require suppport please email the EarthCODE support team: [earth-code@esa.int](mailto:earth-code@esa.int) .
@@ -68,18 +68,18 @@ If you have any questions or require suppport please email the EarthCODE support
 ### How to publish new data to the catalog?
-The Open Science Catalog is built on the Spatio Temporal Asset Catalog (STAC), which is a standardised format for describing geospatial data. Therefore new entries must conform to its specification. There are three ways to add information to the OSC:
+The Open Science Catalog is built on the Spatio Temporal Asset Catalog (STAC), which is a standardised format for describing geospatial data. Therefore new entries must conform to its specification. There are three ways to create the entries:
-### 1: Using a Visual GUI
+### 1: Using the Visual OSC Editor
-- [Open Science Catalog Editor](https://workspace.earthcode.eox.at/) - A graphical user interface for automatically creating OSC entries and review requests.
+- The [Open Science Catalog Editor](https://workspace.earthcode.eox.at/) is a graphical user interface for automatically creating OSC entries and review requests.
 ### 2: Manual creation
 - [Directly creating/editing STAC files](https://esa-earthcode.github.io/examples/osc-pr-manual/) - A guide for manually creating OSC entries. Requires knowledge of git.
 - [Generating OSC files using pystac](https://esa-earthcode.github.io/examples/osc-pr-pystac/) - A guide for creating OSC entries using pystac. Requires knowledge of git and Python.
-### 3: Using one of the platform tools
+### 3: Using platform tools and support
 - [DeepCode](https://github.com/deepesdl/deep-code) - An example using DeepCode: a library for automatically generating product entries for DeepESDL datasets.
 - Additionally, you can contact your platform supplier for support.
@@ -87,7 +87,7 @@ The Open Science Catalog is built on the Spatio Temporal Asset Catalog (STAC), w
 ## Step 3: Review & Publishing
-Regardless of what option for creating OSC Entries you choose, the generated data will be reviewed by EarthCODE team before it is accepted into the PRR. The review process will take place on GitHub via its [pull request functionality](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests). The EarthCODE team will:
+Regardless of which option for creating OSC entries you choose, the generated data will be reviewed by the EarthCODE team before it is accepted into the PRR. The review process will take place on GitHub via its [pull request functionality](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests). During the review the EarthCODE team will:
 - check the accuracy and completeness of descriptions, links and information
 - ask for a code snippet that shows how to read the data (if applicable)
 - ask for a code snipptet that demonstrates how to run the code (if applicable)
diff --git a/pages/Technical Documentation/index.md b/pages/Technical Documentation/index.md
index 2c8f2b08..e5bdf051 100644
--- a/pages/Technical Documentation/index.md
+++ b/pages/Technical Documentation/index.md
@@ -10,5 +10,5 @@ Content is being actively developed and updated for this page. EarthCODE's docum
 The **Technical Documentation** page offers comprehensive guidance on how to configure and personalize your EarthCODE environment to meet your specific research needs. This section provides step-by-step instructions on accessing and integrating the tools, data, and services required for your projects. Whether you're setting up your workspace, connecting to cloud platforms, or ensuring access to the Open Science Catalog, this page will equip you with everything you need to create an efficient and effective research environment. It also covers best practices for managing data, code, and workflows, helping you establish a seamless and collaborative research process.
 - [Working with Platforms](./Platforms/)
-- [Working with Data](./Data/)
-- [Working with Workflows](./Workflows/)
+- [Working with the ESA Project Results Repository (PRR)](./ESA%20Project%20Results%20Repository%20(PRR)/)
+- [Working with the Open Science Catalog (OSC)](./Open%20Science%20Catalog%20(OSC)/)
\ No newline at end of file