Merged
72 changes: 60 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,16 @@
and Python API providing utilities that aid the integration of DeepESDL datasets
and experiments with EarthCODE.

The first release focuses on publishing DeepESDL experiments/workflows as OGC API
Records and datasets as OSC STAC collections.

## Setup

## Install
`deep-code` will be available on PyPI and conda-forge. Until the stable release,
developers/contributors can follow the steps below to install deep-code.

## Installing from the repository for Developer
## Installing from the repository for Developers/Contributors

To install deep-code directly from the git repository, clone the repository and execute the steps below:

Expand Down Expand Up @@ -72,16 +75,61 @@ github-token: personal access token
#### dataset-config.yaml example

```
dataset-id: hydrology-1D-0.009deg-100x60x60-3.0.2.zarr
collection-id: hydrology

#non-mandatory
documentation-link: https://deepesdl.readthedocs.io/en/latest/datasets/hydrology-1D-0-009deg-100x60x60-3-0-2-zarr/
access-link: s3://test
dataset-status: completed
dataset-region: global
dataset-theme: ["ocean", "environment"]
cf-parameter: [{"Name" : "hydrology"}]
dataset_id: hydrology-1D-0.009deg-100x60x60-3.0.2.zarr
collection_id: hydrology
osc_themes:
- Land
- Oceans
# non-mandatory
documentation_link: https://deepesdl.readthedocs.io/en/latest/datasets/hydrology-1D-0.009deg-100x60x60-3.0.2.zarr/
access_link: s3://test
dataset_status: completed
osc_region: global
cf_parameter:
- name: hydrology
```

`dataset_id` has to be a valid dataset ID from the `deep-esdl-public` S3 bucket or your team bucket.
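A minimal sketch of how such a config could be checked before publishing. The key names mirror the example above; the validation function and its behaviour are assumptions for illustration, not deep-code's actual implementation (the error message is modelled on the one exercised in the test suite):

```python
# Hypothetical validation sketch for a parsed dataset-config.yaml mapping.
REQUIRED_KEYS = ("dataset_id", "collection_id")


def validate_dataset_config(config: dict) -> dict:
    """Check that the mandatory keys are present and non-empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        # Modelled on the error deep-code raises for incomplete configs.
        raise ValueError("Dataset ID or Collection ID missing in the config.")
    return config


config = validate_dataset_config({
    "dataset_id": "hydrology-1D-0.009deg-100x60x60-3.0.2.zarr",
    "collection_id": "hydrology",
    "osc_region": "global",
})
```

A config missing either mandatory key would raise `ValueError` instead of being published.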

### deep-code publish-workflow

Publish a workflow/experiment to the EarthCODE Open Science Catalog.

```commandline
deep-code publish-workflow /path/to/workflow-config.yaml
```
#### workflow-config.yaml example

```
workflow_id: "4D Med hydrology cube generation"
properties:
title: "Hydrology cube generation recipe"
description: "4D Med cube generation"
keywords:
- Earth Science
themes:
- Atmosphere
- Ocean
- Evaporation
license: proprietary
jupyter_kernel_info:
name: deepesdl-xcube-1.7.1
python_version: 3.11
env_file: https://git/env.yml
links:
- rel: "documentation"
type: "application/json"
title: "4DMed Hydrology Cube Generation Recipe"
href: "https://github.com/deepesdl/cube-gen/tree/main/hydrology/README.md"
- rel: "jupyter-notebook"
type: "application/json"
title: "Workflow Jupyter Notebook"
href: "https://github.com/deepesdl/cube-gen/blob/main/hydrology/notebooks/reading_hydrology.ipynb"
contact:
- name: Tejas Morbagal Harish
organization: Brockmann Consult GmbH
links:
- rel: "about"
type: "text/html"
href: "https://www.brockmann-consult.de/"
```
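The workflow config above is ultimately published as an OGC API Record. The sketch below shows, under assumed field names, how such a parsed config mapping could be projected onto a minimal record-core GeoJSON feature; it is an illustration of the mapping, not deep-code's actual generator:

```python
# Conformance URI as defined in deep_code/constants.py.
OGC_API_RECORD_SPEC = (
    "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core"
)


def build_record(workflow_config: dict) -> dict:
    """Map a parsed workflow-config mapping onto a minimal OGC API Record."""
    props = workflow_config.get("properties", {})
    return {
        "id": workflow_config["workflow_id"],
        "type": "Feature",
        "conformsTo": [OGC_API_RECORD_SPEC],
        "properties": {
            "title": props.get("title"),
            "description": props.get("description"),
            "keywords": props.get("keywords", []),
        },
        "links": workflow_config.get("links", []),
    }


record = build_record({
    "workflow_id": "4D Med hydrology cube generation",
    "properties": {
        "title": "Hydrology cube generation recipe",
        "description": "4D Med cube generation",
    },
    "links": [],
})
```

Fields like `themes`, `jupyter_kernel_info`, and `contact` from the example would be mapped into the record's properties in the same way.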
4 changes: 3 additions & 1 deletion deep_code/cli/main.py
Expand Up @@ -6,7 +6,7 @@

import click

from deep_code.cli.publish import publish_dataset
from deep_code.cli.publish import publish_dataset, publish_workflow


@click.group()
Expand All @@ -16,5 +16,7 @@


main.add_command(publish_dataset)
main.add_command(publish_workflow)


if __name__ == "__main__":
main()
15 changes: 10 additions & 5 deletions deep_code/cli/publish.py
Expand Up @@ -6,16 +6,21 @@

import click

from deep_code.tools.publish import DatasetPublisher
from deep_code.tools.publish import DatasetPublisher, WorkflowPublisher



@click.command(name="publish-dataset")
@click.argument(
"dataset_config",
type=click.Path(exists=True)
)
@click.argument("dataset_config", type=click.Path(exists=True))

def publish_dataset(dataset_config):
"""Request publishing a dataset to the open science catalogue.
"""
publisher = DatasetPublisher()
publisher.publish_dataset(dataset_config_path=dataset_config)


@click.command(name="publish-workflow")
@click.argument("workflow_metadata", type=click.Path(exists=True))
def publish_workflow(workflow_metadata):


workflow_publisher = WorkflowPublisher()
workflow_publisher.publish_workflow(workflow_config_path=workflow_metadata)

5 changes: 5 additions & 0 deletions deep_code/constants.py
Expand Up @@ -9,3 +9,8 @@
OSC_REPO_OWNER = "ESA-EarthCODE"
OSC_REPO_NAME = "open-science-catalog-metadata-testing"
OSC_BRANCH_NAME = "add-new-collection"
DEFAULT_THEME_SCHEME = (
"https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"
)
OGC_API_RECORD_SPEC = "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core"
WF_BRANCH_NAME = "add-new-workflow-from-deepesdl"
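The new `DEFAULT_THEME_SCHEME` constant points at the GCMD science-keywords concept scheme. A sketch of how a themes entry pairing concepts with that scheme might look (the `make_theme` helper and the entry's exact structure are assumptions for illustration):

```python
# Concept scheme URI as defined in deep_code/constants.py.
DEFAULT_THEME_SCHEME = (
    "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"
)


def make_theme(concept_ids: list[str]) -> dict:
    """Build a themes entry tying concept IDs to the default GCMD scheme."""
    return {
        "scheme": DEFAULT_THEME_SCHEME,
        "concepts": [{"id": concept_id} for concept_id in concept_ids],
    }


# Matches the osc_themes values from the dataset-config.yaml example.
theme = make_theme(["Land", "Oceans"])
```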
38 changes: 18 additions & 20 deletions deep_code/tests/tools/test_publish.py
@@ -1,5 +1,6 @@
from unittest.mock import MagicMock, mock_open, patch

import pytest
from unittest.mock import patch, MagicMock, mock_open

from deep_code.tools.publish import DatasetPublisher

Expand Down Expand Up @@ -33,9 +34,7 @@ def test_publish_dataset_missing_ids(self, mock_fsspec_open):
publisher = DatasetPublisher()

with pytest.raises(
ValueError,
match="Dataset ID or Collection ID is missing in the "
"dataset-config.yaml file.",
ValueError, match="Dataset ID or Collection ID missing in the config."
):
publisher.publish_dataset("/path/to/dataset-config.yaml")

Expand All @@ -54,22 +53,21 @@ def test_publish_dataset_success(
mock_subprocess_run,
mock_chdir,
):

# Mock the YAML reads
git_yaml_content = """
github-username: test-user
github-token: test-token
"""
github-username: test-user
github-token: test-token
"""
dataset_yaml_content = """
dataset-id: test-dataset
collection-id: test-collection
documentation-link: http://example.com/doc
access-link: http://example.com/access
dataset-status: ongoing
dataset-region: Global
dataset-theme: ["climate"]
cf-parameter: []
"""
dataset_id: test-dataset
collection_id: test-collection
documentation_link: http://example.com/doc
access_link: http://example.com/access
dataset_status: ongoing
dataset_region: Global
osc_theme: ["climate"]
cf_parameter: []
"""
mock_fsspec_open.side_effect = [
mock_open(read_data=git_yaml_content)(),
mock_open(read_data=dataset_yaml_content)(),
Expand Down Expand Up @@ -102,16 +100,16 @@ def test_publish_dataset_success(
"links": [],
"stac_version": "1.0.0",
}
with patch("deep_code.tools.publish.OSCProductSTACGenerator") as mock_generator:
mock_generator.return_value.build_stac_collection.return_value = (
with patch("deep_code.tools.publish.OscDatasetStacGenerator") as mock_generator:
mock_generator.return_value.build_dataset_stac_collection.return_value = (
mock_collection
)

# Instantiate & publish
publisher = DatasetPublisher()
publisher.publish_dataset("/fake/path/to/dataset-config.yaml")

# 6Assert that we called git clone with /tmp/temp_repo
# Assert that we called git clone with /tmp/temp_repo
# Because expanduser("~") is now patched to /tmp, the actual path is /tmp/temp_repo
auth_url = "https://test-user:test-token@github.com/test-user/open-science-catalog-metadata-testing.git"
mock_subprocess_run.assert_any_call(
Expand Down
44 changes: 29 additions & 15 deletions deep_code/tests/utils/test_dataset_stac_generator.py
@@ -1,13 +1,13 @@
import os
import unittest
from datetime import datetime
from unittest.mock import MagicMock, patch

import numpy as np
from pystac import Collection
import unittest
from unittest.mock import patch, MagicMock
from xarray import Dataset

from deep_code.utils.dataset_stac_generator import OSCProductSTACGenerator
from deep_code.utils.dataset_stac_generator import OscDatasetStacGenerator


class TestOSCProductSTACGenerator(unittest.TestCase):
Expand All @@ -28,15 +28,31 @@ def setUp(self, mock_data_store):
},
attrs={"description": "Mock dataset for testing.", "title": "Mock Dataset"},
data_vars={
"var1": (("time", "lat", "lon"), np.random.rand(2, 5, 10)),
"var2": (("time", "lat", "lon"), np.random.rand(2, 5, 10)),
"var1": (
("time", "lat", "lon"),
np.random.rand(2, 5, 10),
{
"description": "dummy",
"standard_name": "var1",
"gcmd_keyword_url": "https://dummy",
},
),
"var2": (
("time", "lat", "lon"),
np.random.rand(2, 5, 10),
{
"description": "dummy",
"standard_name": "var2",
"gcmd_keyword_url": "https://dummy",
},
),
},
)
mock_store = MagicMock()
mock_store.open_data.return_value = self.mock_dataset
mock_data_store.return_value = mock_store

self.generator = OSCProductSTACGenerator(
self.generator = OscDatasetStacGenerator(
dataset_id="mock-dataset-id",
collection_id="mock-collection-id",
access_link="s3://mock-bucket/mock-dataset",
Expand Down Expand Up @@ -66,7 +82,7 @@ def test_get_temporal_extent(self):

def test_get_variables(self):
"""Test variable extraction."""
variables = self.generator._get_variables()
variables = self.generator.get_variable_ids()
self.assertEqual(variables, ["var1", "var2"])

def test_get_general_metadata(self):
Expand All @@ -78,7 +94,7 @@ def test_get_general_metadata(self):
@patch("pystac.Collection.set_self_href")
def test_build_stac_collection(self, mock_set_self_href, mock_add_link):
"""Test STAC collection creation."""
collection = self.generator.build_stac_collection()
collection = self.generator.build_dataset_stac_collection()
self.assertIsInstance(collection, Collection)
self.assertEqual(collection.id, "mock-collection-id")
self.assertEqual(collection.description, "Mock dataset for testing.")
Expand All @@ -104,19 +120,17 @@ def test_invalid_temporal_extent(self):
with self.assertRaises(ValueError):
self.generator._get_temporal_extent()


class TestOpenDataset(unittest.TestCase):
@patch("deep_code.utils.dataset_stac_generator.new_data_store")
@patch("deep_code.utils.dataset_stac_generator.logging.getLogger")
def test_open_dataset_success_public_store(self, mock_logger, mock_new_data_store):
"""Test dataset opening with the public store configuration."""
# Create a mock store and mock its `open_data` method
mock_store = MagicMock()
mock_new_data_store.return_value = mock_store
mock_store.open_data.return_value = "mock_dataset"
mock_store.open_data.return_value = self.mock_dataset

# Instantiate the generator (this will implicitly call _open_dataset)
generator = OSCProductSTACGenerator("mock-dataset-id", "mock-collection-id")
generator = OscDatasetStacGenerator("mock-dataset-id", "mock-collection-id")

# Validate that the dataset is assigned correctly
self.assertEqual(generator.dataset, "mock_dataset")
Expand Down Expand Up @@ -151,13 +165,13 @@ def test_open_dataset_success_authenticated_store(
mock_store,
# Second call (authenticated store) returns a mock store
]
mock_store.open_data.return_value = "mock_dataset"
mock_store.open_data.return_value = self.mock_dataset

os.environ["S3_USER_STORAGE_BUCKET"] = "mock-bucket"
os.environ["S3_USER_STORAGE_KEY"] = "mock-key"
os.environ["S3_USER_STORAGE_SECRET"] = "mock-secret"

generator = OSCProductSTACGenerator("mock-dataset-id", "mock-collection-id")
generator = OscDatasetStacGenerator("mock-dataset-id", "mock-collection-id")

# Validate that the dataset was successfully opened with the authenticated store
self.assertEqual(generator.dataset, "mock_dataset")
Expand Down Expand Up @@ -195,7 +209,7 @@ def test_open_dataset_failure(self, mock_logger, mock_new_data_store):
os.environ["S3_USER_STORAGE_SECRET"] = "mock-secret"

with self.assertRaises(ValueError) as context:
OSCProductSTACGenerator("mock-dataset-id", "mock-collection-id")
OscDatasetStacGenerator("mock-dataset-id", "mock-collection-id")

self.assertIn(
"Failed to open Zarr dataset with ID mock-dataset-id",
Expand Down
5 changes: 3 additions & 2 deletions deep_code/tests/utils/test_github_automation.py
@@ -1,7 +1,8 @@
import json
import unittest
from unittest.mock import patch, MagicMock
from pathlib import Path
import json
from unittest.mock import MagicMock, patch

from deep_code.utils.github_automation import GitHubAutomation


Expand Down