Description
New feature
Hi, we're using Nextflow on AWS Batch.
We find that all our tasks have the environment variable ECS_CONTAINER_METADATA_URI_V4 injected, which holds the URL of an HTTP+JSON endpoint that serves useful metadata via GET:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html
We can then call the endpoint and get a JSON document of really handy metadata:
curl -sS --fail --connect-timeout 5 --max-time 10 "\$ECS_CONTAINER_METADATA_URI_V4" > ecs_metadata.json || true ## variable is an endpoint injected into all AWS Batch jobs that run on ECS-backed compute environments
Most important is the ImageID (i.e. the content-based digest) of the Docker container that is running, which I haven't seen exposed anywhere else in Nextflow:
cat ecs_metadata.json | jq '.ImageID'
"sha256:47d74d2f1d360a3167ea062129a4af229af095ef0fd23b842f62647e3ad29c6c"
The new feature would be to bake in fetching and parsing this metadata as appropriate for Nextflow cloud executors like AWS Batch.
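To make the "fetch and parse" step concrete, here is a minimal sketch in Python of what such a helper could look like. It is not Nextflow code, and the function names fetch_ecs_metadata and image_digest are hypothetical. The only things taken from the above are the ECS_CONTAINER_METADATA_URI_V4 variable and the ImageID field. Like the `|| true` in the curl call, it returns nothing rather than failing the task when the endpoint is missing or unreachable.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional


def fetch_ecs_metadata(env: Mapping[str, str] = os.environ,
                       timeout: float = 10.0) -> Optional[dict]:
    """Return the container metadata as a dict, or None if unavailable.

    Hypothetical helper: reads the endpoint URL from the environment
    variable that ECS injects and issues a plain HTTP GET against it.
    """
    uri = env.get("ECS_CONTAINER_METADATA_URI_V4")
    if not uri:
        # Variable not set, so we are presumably not on ECS-backed compute.
        return None
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Network errors (URLError is an OSError) and malformed JSON are
        # swallowed so that a metadata failure never fails the task itself.
        return None


def image_digest(metadata: Optional[dict]) -> Optional[str]:
    """Extract the content-based image digest, if the metadata has one."""
    if not metadata:
        return None
    return metadata.get("ImageID")


if __name__ == "__main__":
    # Prints the digest when run inside a task, or None elsewhere.
    print(image_digest(fetch_ecs_metadata()))
```

Passing the environment in as a parameter is only there to keep the sketch easy to exercise outside of AWS Batch. A real implementation would live in the executor's task wrapper.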
Use case
This helps whenever you're running Docker containers in the cloud and want traceability back to the source container image. (Other metadata is presumably useful as well, but ImageID is our use case.)
Suggested implementation
For each cloud executor, research whether it offers a pattern similar to this AWS Batch one, and then add a bespoke step that runs before script execution. Then ingest the result and make it available via trace.csv, report.html, etc.