Capture and elevate cloud executor injected metadata, specifically the Docker image digest #6923

@tfallon-ionis

Description

New feature

Hi, we're using Nextflow on AWS Batch.

We find that all our tasks have the environment variable ECS_CONTAINER_METADATA_URI_V4 injected. It holds the URL of an HTTP endpoint that returns useful task metadata as JSON:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html

We can then call the endpoint and get a JSON document of really handy metadata:
## ECS_CONTAINER_METADATA_URI_V4 is injected into all AWS Batch jobs that run on ECS-backed compute environments
## (the \$ escape is for use inside a Nextflow script block; in a plain shell, write $ECS_CONTAINER_METADATA_URI_V4)
curl -sS --fail --connect-timeout 5 --max-time 10 "\$ECS_CONTAINER_METADATA_URI_V4" > ecs_metadata.json || true

The most important field is the ImageID (i.e. the content-based digest) of the running Docker container, which I haven't seen exposed anywhere else in Nextflow:
cat ecs_metadata.json | jq '.ImageID'
"sha256:47d74d2f1d360a3167ea062129a4af229af095ef0fd23b842f62647e3ad29c6c"
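
For downstream use it may help to strip the JSON quotes and check the value before recording it. The following is a minimal sketch, not anything Nextflow provides: the helper names `is_sha256_digest` and `image_digest_from_file` are hypothetical, and it assumes the digest always has the `sha256:` plus 64 hex characters shape shown above.

```shell
# Sketch only: these helper names are hypothetical, not part of Nextflow or AWS tooling.

# Succeed if the argument looks like "sha256:" followed by 64 lowercase hex characters.
is_sha256_digest() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Print the ImageID field of a saved metadata file, without JSON quotes.
# Prints nothing (and still succeeds) if the file is missing, empty,
# unparsable, or does not contain a digest of the expected shape.
image_digest_from_file() {
    metadata_file="$1"
    [ -s "$metadata_file" ] || return 0
    # -r emits the raw string; "// empty" yields no output when the key is absent.
    digest="$(jq -r '.ImageID // empty' "$metadata_file" 2>/dev/null)" || return 0
    if is_sha256_digest "$digest"; then
        printf '%s\n' "$digest"
    fi
    return 0
}
```

For example, `image_digest_from_file ecs_metadata.json > image_digest.txt` leaves an empty file when the task did not run on ECS, so the step never fails the task itself.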

The new feature would be to bake in fetching and parsing this metadata as appropriate for Nextflow cloud executors like AWS Batch.

Use case

Any time you run Docker containers in the cloud and want traceability back to the exact source container image. (Other metadata is presumably useful as well, but ImageID is our use case.)

Suggested implementation

For each cloud executor, research whether it has a pattern similar to this AWS Batch one, and then add a bespoke step that runs before script execution. Then ingest the result and make it available via trace.csv, report.html, etc.
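
As a rough illustration of the per-executor step, here is a sketch of a shell function that a task wrapper could call before the user script. The function name `capture_executor_metadata` and its two arguments are hypothetical. Only the `awsbatch` branch is based on the endpoint described in this issue; the other executors are left empty because their equivalents have not been researched here.

```shell
# Sketch only: capture_executor_metadata and its arguments are hypothetical.
# $1 = executor name (e.g. "awsbatch"), $2 = file to write raw metadata to.
capture_executor_metadata() {
    executor="$1"
    outfile="$2"
    case "$executor" in
        awsbatch)
            # The variable only exists on ECS-backed compute environments;
            # skip quietly when it is absent.
            [ -n "${ECS_CONTAINER_METADATA_URI_V4:-}" ] || return 0
            # On any curl failure, remove the partial file instead of failing.
            curl -sS --fail --connect-timeout 5 --max-time 10 \
                "$ECS_CONTAINER_METADATA_URI_V4" > "$outfile" 2>/dev/null \
                || rm -f "$outfile"
            ;;
        *)
            # Other executors: an equivalent metadata source still needs research.
            ;;
    esac
    # Metadata capture is best-effort and must never fail the task.
    return 0
}
```

A caller would then treat a missing output file as "no metadata available" when filling in the trace and report fields.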
