diff --git a/OSC/publishing_to_osc.ipynb b/OSC/publishing_to_osc.ipynb new file mode 100644 index 00000000..9c456e25 --- /dev/null +++ b/OSC/publishing_to_osc.ipynb @@ -0,0 +1,663 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8a23d4a6-e0a5-4ea7-986b-88595635f928", + "metadata": {}, + "source": [ + "# Publishing a Product to the Open Science Catalog\n", + "## Purpose\n", + "This tutorial shows how to publish your product to the Open Science Catalog (OSC). This is the last step in the publishing pipeline. You will fill in some fields describing your data product, and by running the rest of the notebook you should be able to generate an appropriate `product` entry in the OSC.\n", + "\n", + "We will do the following:\n", + "- Define our descriptive fields, such as id, title, description, extent, and more.\n", + "- Determine the relevant pre-existing metadata objects in the OSC, such as Project, Variables, Themes and EO Missions\n", + "- Generate a valid product JSON object containing all this information (later stored as `collection.json`)\n", + "- Store this JSON as a valid STAC object in the `open-science-catalog-metadata-staging` repository\n", + "- Update relevant pre-existing metadata objects to link to our new object\n", + "- Explain how to use Git to create a Pull Request with our new OSC entry\n", + "\n", + "## Prerequisites\n", + "This notebook assumes that you have already prepared your Item Catalog / Data Package as a self-contained STAC catalog in some other, persistent repository. You should have a link to a `catalog.json` file stored remotely.\n", + "\n", + "If you haven't, please refer to the tutorials and guides on creating your Item Catalog."
+ ] + }, + { + "cell_type": "markdown", + "id": "631af09c-2926-441c-a619-4377d065cc75", + "metadata": {}, + "source": [ + "# Importing dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac5b5fab-acc2-4df7-b31d-8c43623dff4e", + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "import pystac\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "id": "4da88598-dbf7-4c94-9b1d-a8cdd9198c70", + "metadata": {}, + "source": [ + "# Describing our Product\n", + "Please make the appropriate edits to accurately describe your product here. All these cells should be adjusted for your product." + ] + }, + { + "cell_type": "markdown", + "id": "1e6727c7-a00c-4b21-9e04-af29eac41ff3", + "metadata": {}, + "source": [ + "## General Metadata" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf32e6f1-dbd8-4b4f-82ac-2eddfe1977a2", + "metadata": {}, + "outputs": [], + "source": [ + "PRODUCT_ID: str = \"my-product-id\"\n", + "PRODUCT_TITLE: str = \"My Product Title\"\n", + "PRODUCT_DESCRIPTION: str = \"\"\"A detailed description of my dataset\"\"\"\n", + "\n", + "KEYWORDS: list[str] = [\"Keyword1\", \"Keyword2\"]\n", + "REGION: str = \"The region of the data\" # e.g. 
Antarctica, Europe, America\n", + "PRODUCT_STATUS = \"ongoing\" # planned | ongoing | completed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6b932f3-2796-4819-99c1-a3bb332417a9", + "metadata": {}, + "outputs": [], + "source": [ + "time_format = \"%Y-%m-%dT%H:%M:%SZ\" # write your own temporal extent in this format\n", + "TEMPORAL_EXTENT: list[str] = [\n", + " datetime.strftime(datetime.now() - timedelta(weeks=52), time_format), \n", + " datetime.strftime(datetime.now(), time_format),\n", + "]\n", + "\n", + "SPATIAL_EXTENT: list[float] = [-180.0, -90.0, 180.0, 90.0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4535dec1-2157-41e5-bdbb-b80666c15422", + "metadata": {}, + "outputs": [], + "source": [ + "# link to pre-existing Item Collection root catalog.json\n", + "ITEM_COLLECTION: str = \"https://raw.githubusercontent.com/anders0204/supraglacial-lakes-item-catalog/refs/heads/main/catalog.json\"" + ] + }, + { + "cell_type": "markdown", + "id": "72ec8d7f-8893-4f74-8bc7-46ee1652bd2b", + "metadata": {}, + "source": [ + "## Pre-existing OSC collections\n", + "Visit the `open-science-catalog-metadata-staging` repository on GitHub for links to the existing collections.\n", + "\n", + "**Remember to use the _raw_ file links!**" + ] + }, + { + "cell_type": "markdown", + "id": "5eb6a22f-bf88-4226-a677-896449b12b13", + "metadata": {}, + "source": [ + "### Project\n", + "If the project associated with the product already exists in the OSC, provide a link to its `collection.json` file on the OSC GitHub.\n", + "\n", + "If not, leave this variable as `None` and we will generate a Project entry based on the metadata for the product. You can edit that file manually later to add contacts, websites, and more."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da1a78f7-0f2f-4f80-9c23-1202c0a44302", + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT: str | None = None" + ] + }, + { + "cell_type": "markdown", + "id": "82d431f3-9bce-4417-927b-558f2343e0f3", + "metadata": {}, + "source": [ + "### Variables" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "efdcdb45-5f07-4fa1-8f19-87989f57263c", + "metadata": {}, + "outputs": [], + "source": [ + "VARIABLES: list[str] = [\n", + " # river ice\n", + " \"https://raw.githubusercontent.com/ESA-EarthCODE/open-science-catalog-metadata-staging/refs/heads/main/variables/river-ice/catalog.json\",\n", + " # h2o\n", + " \"https://raw.githubusercontent.com/ESA-EarthCODE/open-science-catalog-metadata-staging/refs/heads/main/variables/h2o/catalog.json\"\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "0546fa4d-efdb-4e03-b69d-6b8f7a423923", + "metadata": {}, + "source": [ + "### Themes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f07a5a9e-624c-4cec-b79e-2ad4e25e67da", + "metadata": {}, + "outputs": [], + "source": [ + "THEMES: list[str] = [\n", + " # land\n", + " \"https://raw.githubusercontent.com/ESA-EarthCODE/open-science-catalog-metadata/refs/heads/main/themes/land/catalog.json\",\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "c4e12ea9-9e6f-43c2-bccb-28bf984e9afc", + "metadata": {}, + "source": [ + "### EO Missions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "694dc789-ea3b-4a6c-b393-91168f0c7bac", + "metadata": {}, + "outputs": [], + "source": [ + "EO_MISSIONS: list[str] = [\n", + " # cryosat\n", + " \"https://raw.githubusercontent.com/ESA-EarthCODE/open-science-catalog-metadata/refs/heads/main/eo-missions/cryosat/catalog.json\",\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "1d7a4cae-04c6-45ba-9972-dbbb2f6d1e64", + "metadata": {}, + "source": [ + "# Generating the product JSON\n", + "Here we will 
generate the product manually as a Python dictionary. This part is not intended to be edited; simply run the cells to generate a product based on the values you defined above." + ] + }, + { + "cell_type": "markdown", + "id": "3767cd50-e299-435f-9a54-063193776024", + "metadata": {}, + "source": [ + "## Creating Base" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6db3f71-63c2-49c9-bd24-cd13ab65951d", + "metadata": {}, + "outputs": [], + "source": [ + "time_now = datetime.strftime(datetime.now(), time_format)\n", + "\n", + "product = {\n", + " \"type\": \"Collection\",\n", + " \"id\": PRODUCT_ID,\n", + " \"stac_version\": \"1.0.0\",\n", + " \"description\": PRODUCT_DESCRIPTION,\n", + " \"updated\": time_now,\n", + " \"title\": PRODUCT_TITLE,\n", + " \"license\": \"proprietary\",\n", + " \"keywords\": KEYWORDS,\n", + " \"extent\": {\n", + " \"spatial\": {\n", + " \"bbox\": [\n", + " SPATIAL_EXTENT\n", + " ]\n", + " },\n", + " \"temporal\": {\n", + " \"interval\": [\n", + " TEMPORAL_EXTENT\n", + " ]\n", + " }\n", + " },\n", + " \"stac_extensions\": [\n", + " \"https://stac-extensions.github.io/osc/v1.0.0/schema.json\",\n", + " \"https://stac-extensions.github.io/themes/v1.0.0/schema.json\",\n", + " \"https://stac-extensions.github.io/cf/v0.2.0/schema.json\"\n", + " ],\n", + " \"osc:project\": PRODUCT_TITLE,\n", + " \"osc:status\": PRODUCT_STATUS,\n", + " \"osc:region\": REGION,\n", + " \"osc:type\": \"product\",\n", + " \"created\": time_now,\n", + " \"version\": \"1.0\",\n", + " \n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "12ee7a44-4beb-4523-9e05-1089e26a3e3a", + "metadata": {}, + "source": [ + "## Adding Links" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db390953-ff37-47a8-871c-461120e329a3", + "metadata": {}, + "outputs": [], + "source": [ + "root_link = {\n", + " \"rel\": \"root\",\n", + " \"href\": \"../../catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": \"Open
Science Catalog\"\n", + "}\n", + "parent_link = {\n", + " \"rel\": \"parent\",\n", + " \"href\": \"../catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": \"Products\"\n", + "}\n", + "\n", + "child_link = {\n", + " \"rel\": \"child\",\n", + " \"href\": ITEM_COLLECTION,\n", + " \"type\": \"application/json\",\n", + " \"title\": \"Items\"\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28bdf0f1-38b8-47bf-9d0e-1bf500b1ac2d", + "metadata": {}, + "outputs": [], + "source": [ + "# Variables\n", + "variables_stac = []\n", + "variable_links = []\n", + "for file_name in VARIABLES:\n", + " stac_catalog = pystac.Catalog.from_file(file_name)\n", + " variables_stac.append(stac_catalog)\n", + "\n", + " variable_links.append(\n", + " {\n", + " \"rel\": \"related\",\n", + " \"href\": f\"../../variables/{stac_catalog.id}/catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": f\"Variable: {stac_catalog.title}\"\n", + " })\n", + "\n", + "product[\"osc:variables\"] = [var.id for var in variables_stac]\n", + "\n", + "# Themes\n", + "themes_stac = []\n", + "theme_links = []\n", + "for file_name in THEMES:\n", + " stac_catalog = pystac.Catalog.from_file(file_name)\n", + " themes_stac.append(stac_catalog)\n", + "\n", + " theme_links.append(\n", + " {\n", + " \"rel\": \"related\",\n", + " \"href\": f\"../../themes/{stac_catalog.id}/catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": f\"Theme: {stac_catalog.title}\"\n", + " })\n", + "\n", + "theme_ids = [{\"id\": theme.id} for theme in themes_stac]\n", + "\n", + "product[\"themes\"] = [\n", + " {\n", + " \"scheme\": \"https://github.com/stac-extensions/osc#theme\",\n", + " \"concepts\": theme_ids\n", + " }\n", + " ]\n", + "\n", + "# EO missions\n", + "eo_stac = []\n", + "eo_links = []\n", + "for file_name in EO_MISSIONS:\n", + " stac_catalog = pystac.Catalog.from_file(file_name)\n", + " eo_stac.append(stac_catalog)\n", + "\n", + " 
eo_links.append(\n", + " {\n", + " \"rel\": \"related\",\n", + " \"href\": f\"../../eo-missions/{stac_catalog.id}/catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": f\"EO Mission: {stac_catalog.title}\"\n", + " })\n", + "\n", + "product[\"osc:missions\"] = [eo.id for eo in eo_stac]" + ] + }, + { + "cell_type": "markdown", + "id": "9a0473b6-7af0-4147-bc11-c5f269a2341a", + "metadata": {}, + "source": [ + "## Creating a project link" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e8f3c67-99d9-41b4-b411-f5069cff9319", + "metadata": {}, + "outputs": [], + "source": [ + "if isinstance(PROJECT, str):\n", + "    project_stac = pystac.Collection.from_file(PROJECT)\n", + "    # No link is appended in this cell: product[\"links\"] is only created\n", + "    # in the \"Finishing linking to our product\" section, which also adds\n", + "    # the \"related\" link from the product to this project.\n", + "    #\n", + "    # Here we only record the project title on the product and keep the\n", + "    # project as a dictionary, so that it can be inspected and saved in\n", + "    # the same way as a newly generated project (the PROJECT is None\n", + "    # case handled in the next cell).\n", + "    product[\"osc:project\"] = project_stac.title\n", + "    project = project_stac.to_dict()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b7de2708-a519-4a8e-aa60-3e1c2bee1be5", + "metadata": {}, + "outputs": [], + "source": [ + "if PROJECT is None:\n", + " project = {\n", + " \"type\": \"Collection\",\n", + " \"id\": PRODUCT_ID,\n", + " \"stac_version\": \"1.0.0\",\n", + " \"description\": PRODUCT_DESCRIPTION,\n", + " \"updated\": time_now,\n", + " \"title\": PRODUCT_TITLE,\n", + " \"license\": \"proprietary\",\n", + " \"keywords\": KEYWORDS,\n", + " \"extent\": {\n", + " \"spatial\": {\"bbox\": [SPATIAL_EXTENT]},\n", + " \"temporal\": {\"interval\": [TEMPORAL_EXTENT]},\n", + " },\n", + " \"stac_extensions\": [\n", + " \"https://stac-extensions.github.io/osc/v1.0.0/schema.json\",\n", + " \"https://stac-extensions.github.io/themes/v1.0.0/schema.json\",\n", + " \"https://stac-extensions.github.io/contacts/v0.1.1/schema.json\",\n", + " ],\n", + " \"osc:status\": PRODUCT_STATUS,\n", + " \"themes\":
[\n", + " {\n", + " \"scheme\": \"https://github.com/stac-extensions/osc#theme\",\n", + " \"concepts\": theme_ids,\n", + " }\n", + " ],\n", + " \"osc:type\": \"project\",\n", + " \"contacts\": [ # Add all affiliations and contact points\n", + " {\n", + " \"name\": \"Your Name\",\n", + " \"emails\": [{\"value\": \"your.email@institution.org\"}],\n", + " \"roles\": [\"technical_officer\"],\n", + " },\n", + " {\n", + " \"name\": \"Name of an affiliated institution, organisation, etc.\",\n", + " \"roles\": [\"consortium_member\"],\n", + " },\n", + " {\n", + " \"name\": \"Name of another institution, organisation, etc.\",\n", + " \"roles\": [\"consortium_member\"],\n", + " },\n", + " ],\n", + " }\n", + "\n", + " project_links = [\n", + " {\n", + " \"rel\": \"root\",\n", + " \"href\": \"../../catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": \"Open Science Catalog\",\n", + " },\n", + " {\n", + " \"rel\": \"via\", # Add all relevant websites, documentation, etc., with \"via\" links\n", + " \"href\": \"https://www..org/\",\n", + " \"title\": \"Website\",\n", + " },\n", + " {\n", + " \"rel\": \"child\",\n", + " \"href\": f\"../../products/{PRODUCT_ID}/collection.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": PRODUCT_TITLE,\n", + " },\n", + " {\n", + " \"rel\": \"parent\",\n", + " \"href\": \"../catalog.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": \"Projects\",\n", + " },\n", + " {\n", + " \"rel\": \"self\",\n", + " \"href\": f\"https://esa-earthcode.github.io/open-science-catalog-metadata/projects/{PRODUCT_ID}/collection.json\",\n", + " \"type\": \"application/json\",\n", + " },\n", + " ]\n", + "\n", + " for links in (variable_links, eo_links, theme_links):\n", + "     for link in links:\n", + "         project_links.append(link)\n", + "\n", + " project[\"links\"] = project_links" + ] + }, + { + "cell_type": "markdown", + "id": "ad00adb8-b439-4281-bfac-86e4454be13a", + "metadata": {}, + "source": [ + "### Finishing
linking to our product" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "750d95a5-c89b-4097-8bab-1db14e6c6f0f", + "metadata": {}, + "outputs": [], + "source": [ + "product[\"links\"] = [link for links in (variable_links, eo_links, theme_links) for link in links] + [root_link, parent_link, child_link]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8a25ee8-12f6-46ff-8dcc-1058be98c148", + "metadata": {}, + "outputs": [], + "source": [ + "link_to_project = {\n", + " \"rel\": \"related\",\n", + " \"href\": f\"../../projects/{project['id']}/collection.json\",\n", + " \"type\": \"application/json\",\n", + " \"title\": f\"Project: {project['title']}\"\n", + "}\n", + "product[\"links\"].append(link_to_project)" + ] + }, + { + "cell_type": "markdown", + "id": "b205b012-0b36-4241-b957-956bb3fa594b", + "metadata": {}, + "source": [ + "### **Done!**\n", + "We can now inspect the results:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "52aeb0ae-4ee3-4087-83d7-485d2ed2bb56", + "metadata": {}, + "outputs": [], + "source": [ + "print(json.dumps(product, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "776f1b05-1005-4ced-952b-dde89a8a0928", + "metadata": {}, + "outputs": [], + "source": [ + "print(json.dumps(project, indent=2))" + ] + }, + { + "cell_type": "markdown", + "id": "5af14a5c-8d59-41af-ab55-bf057c072e5b", + "metadata": {}, + "source": [ + "# Saving dictionary as JSON\n", + "Now that we have the product represented as a dictionary in python, it's trivial to store it as a JSON object. The only thing you need to keep in mind is the location.\n", + "\n", + "To add a product to the Open Science Catalog, you should store the product under the `products/` folder in your local fork of the [`open-science-catalog-metadata-staging`](https://github.com/ESA-EarthCODE/open-science-catalog-metadata-staging) repository." 
+ ] + }, + { + "cell_type": "markdown", + "id": "364f4da2-d363-4b19-874a-41162b471c84", + "metadata": {}, + "source": [ + "Change the following `OSC_ROOT` to the local path of your fork and run the cells to save the product and project files." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f14ecb0-cce2-4a6f-be0e-78c69aebfbd7", + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "OSC_ROOT = Path(\"/open-science-catalog-metadata-staging/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "617fb3cc-8388-489c-8a47-81a0f77cb378", + "metadata": {}, + "outputs": [], + "source": [ + "def save_json(obj: dict, location: Path) -> None:\n", + "    if not location.parent.is_dir():\n", + "        location.parent.mkdir(parents=True, exist_ok=True)\n", + "    with open(location, \"w\") as f:\n", + "        json.dump(obj, f)" + ] + }, + { + "cell_type": "markdown", + "id": "73c1b179-1fcd-4ba6-bd5f-02e65760a62a", + "metadata": {}, + "source": [ + "## Saving Product" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7145aa2d-99f0-4574-9c91-64bae021bc85", + "metadata": {}, + "outputs": [], + "source": [ + "product_path = OSC_ROOT / \"products\" / PRODUCT_ID / \"collection.json\"\n", + "save_json(product, product_path)" + ] + }, + { + "cell_type": "markdown", + "id": "103393f5-54d4-4cc2-9fbd-92606e77e81e", + "metadata": {}, + "source": [ + "## Saving Project" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b338f68d-4098-4276-9fd1-d808585620ab", + "metadata": {}, + "outputs": [], + "source": [ + "project_path = OSC_ROOT / \"projects\" / project[\"id\"] / \"collection.json\"\n", + "save_json(project, project_path)" + ] + }, + { + "cell_type": "markdown", + "id": "3d34804a-ba02-48c6-86f6-1a600540a141", + "metadata": {}, + "source": [ + "::: important\n", + "Before making a pull request, make sure that you add a link to your new product in all the associated metadata catalogs for Variables, Themes and EO Missions.\n", +
":::" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}