Skip to content

Conversation

@GermanHydrogen
Copy link
Collaborator

This PR adds a mechanism for pipelines to set the iFDO "image-set-header" section based on the second solution proposed in #22:

def _package(
        self,
        data_dir: Path,
        config: dict[str, Any],
        **kwargs: dict[str, Any],
    ) -> (
        dict[Path, tuple[Path, list[BaseMetadata] | None, dict[str, Any] | None]]
        | tuple[
            dict[Path, tuple[Path, list[BaseMetadata] | None, dict[str, Any] | None]],
            dict[type[BaseMetadata], MetadataHeader[BaseModel]],
        ]
    ):

Changes:

  • Added MetadataHeader class for wrapping different header data
  • Changed BasePipeline._package signature to enable pipelines to set header data
  • Changed iFDOMetdata to use the MetadataHeader supplied from the pipelines

This change should be backwards compatible.


To discuss @cjackett

@GermanHydrogen GermanHydrogen requested a review from cjackett July 16, 2025 13:51
Copy link
Contributor

@cjackett cjackett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @GermanHydrogen,

Thank you for this PR and for taking the time to implement a solution for #22. I appreciate the thought you put into the design and the consideration for backward compatibility.

After reviewing this implementation, I wanted to let you know that I've recently completed work on this feature using a different approach that I think better fits Marimba's architecture. Please see PR #32

The key architectural differences include:

Ownership and Responsibility Model: Having pipelines directly construct and return MetadataHeader[ImageSetHeader] objects tightly couples pipelines to specific metadata schemas (iFDO in particular). In the new implementation, pipelines instead return generic dictionaries via a get_metadata_header() method, and the metadata schema classes (iFDOMetadata, GenericMetadata) are responsible for interpreting those generic dictionaries according to their own schema requirements. This separation provides several benefits: pipelines remain schema-agnostic and don't need to import iFDO-specific classes, the same pipeline can work with any metadata schema without changes, and there's a clearer separation of concerns where pipelines know about their data and metadata classes know about metadata formats.

Backward Compatibility: Changing the _package() return type signature is a major breaking change that would require updates to all existing pipeline implementations. The new implemented approach avoids this by adding a new optional method with a default implementation that returns an empty dictionary from the base class, allowing existing pipelines to work without any changes.

Context Differentiation: Because Marimba can produce output metadata files at multiple levels (dataset, pipeline, and collection), the feature needs to generate different metadata for each granularity level. Without context differentiation, all three levels receive the same header information. The implemented solution provides a context parameter so pipelines can return different metadata depending on the granularity level, enabling level specific header attributes for dataset, pipeline and collection levels.

Automatic Optimization: I've included your deduplication implementation from the iFDO compressor work - it's integrated into the metadata generation pipeline and operates automatically without requiring pipelines to manage it manually.

I have just submitted the redesigned implementation in PR #32 if you want to review the details. I believe this alternative implementation better serves Marimba's long-term maintainability and aligns with the framework's design principles of separation of concerns and schema-agnostic design.

I appreciate your contribution to the project and it would be great to get your approval on the resultant structure of the output iFDO files.

-Chris

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants