asmacdo commented Dec 3, 2025

POC implementation of #284

The changes were substantial, so I went ahead and tried it out as a sanity check. Some implementation
details ended up diverging from the design doc.

All 3 phases are complete:

  • Phase 1: OCI image storage via Skopeo, versioned paths, Docker loading
  • Phase 2: --image and --exec flags for containers-run
  • Phase 3: YAML execution profiles with inheritance (clobber semantics)

Breaking changes from master

  • docker:// now stores as OCI directory (was: Singularity SIF)
  • cmdexec not set by containers-add (commented out, TODO Phase 4)
  • Old URL schemes commented out: dhub://, oci:, shub:// (TODO Phase 4)

New features

  • containers-profiles command to list available profiles
  • Profiles support extends: for inheritance
  • CLI args (--image, --exec) override profile fields
  • Base profiles (no image) work with --image flag
  • Early validation: error if profile references missing image

Can be further improved by:

  • Additional substitutions, e.g. {binds}, so profiles are more composable
  • containers-init or similar to install default profiles (docker-default, apptainer-default)
  • Phase 4: Backwards compatibility for old URL schemes
  • Include profile reference in provenance metadata

ReproNim/containers integration

To fully replace the singularity_cmd shim, profiles would need additional features:

  • pre-run / post-run hooks for dynamic setup/cleanup scripts
  • env: block for static environment variables
  • {env.VARNAME} placeholder for dynamic env var expansion

These are documented in the design doc as optional upstream RFEs. The current implementation covers the
core use case; ReproNim could adopt profiles now and request these extensions as needed.
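For illustration, here is a minimal Python sketch (hypothetical; none of this exists in the current
implementation and the helper name is made up) of how the proposed {env.VARNAME} placeholder could
be expanded before the exec template is handed to the runner:

    import os
    import re

    def expand_env_placeholders(template, environ=None):
        """Replace {env.NAME} tokens with values from the environment.

        Hypothetical helper illustrating the RFE above; not part of this PR.
        """
        environ = os.environ if environ is None else environ

        def repl(match):
            name = match.group(1)
            if name not in environ:
                raise KeyError(f"environment variable {name!r} is not set")
            return environ[name]

        return re.sub(r"\{env\.([A-Za-z_][A-Za-z0-9_]*)\}", repl, template)

    # e.g. "docker run --rm -e TOKEN={env.MY_TOKEN} {img} {cmd}" would have
    # {env.MY_TOKEN} filled in at run time, before {img}/{cmd} substitution.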

kyleam and others added 30 commits December 4, 2020 16:10
These parts will be useful for the upcoming skopeo adapter as well.
"get" is probably clearer than "list", and tacking on "_ids" makes it
clearer what the return value is.

Also, drop the leading underscore, which is a holdover from the
function being in the adapters.docker module.
The minimum Python version of DataLad is new enough that we can assume
subprocess.run() is available.  It's recommended by the docs, and I
like it more, so switch to it.
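As a rough illustration of the call style this enables (not the adapter's actual code):

    import subprocess

    # check=True raises CalledProcessError on a non-zero exit status;
    # capture_output=True collects stdout/stderr so they can be shown on failure.
    result = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.ID}}"],
        check=True, capture_output=True, text=True,
    )
    image_ids = result.stdout.split()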

Note that we might want to eventually switch to using WitlessRunner
here.  The original idea with using the subprocess module directly was
that it'd be nice for the docker adapter to be standalone, as nothing
in the adapter depended on datalad at the time.  That's not the case
anymore after the adapters.utils split and the use of datalad.utils
within it.  (And the upcoming skopeo adapter will make heavier use of
datalad for adding URLs to the layers.)
This logic will get a bit more involved in the next commit, and it
will be needed by the skopeo adapter too.
When the adapter is called from the command line (as containers-run
does) and datalad gets imported, the level set via the --verbose
argument doesn't have an effect and logging happens twice, once
through datalad's handler and once through the adapter's.

Before 313c4f0 (WIN/Workaround: don't pass gid and uid to docker run
call, 2020-11-10), the above was the case when docker.main() was
triggered with the documented `python -m datalad_container.adapters
...` invocation, but not when the script path was passed to python.
Following that commit, the adapter imports datalad, so datalad's
logger is always configured.

Adjust setup_logger() to set the log level of loggers under the
datalad.containers.adapters namespace so that the adapter's logging
level is in effect for command line calls to the adapter.

As mentioned above, datalad is now loaded in all cases, so a handler
is always configured, but, in case this changes in the future, add a
simpler handler if one isn't already configured.
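Roughly, the behavior described above amounts to the following sketch (simplified; not the actual
setup_logger code):

    import logging

    def setup_logger(level):
        # Set the level on the adapters' namespace so that --verbose takes
        # effect even when datalad has already configured its own logger.
        lgr = logging.getLogger("datalad.containers.adapters")
        lgr.setLevel(level)
        # Fallback: attach a simple handler only if nothing upstream did.
        if not logging.getLogger("datalad").hasHandlers():
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
            lgr.addHandler(handler)
        return lgr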
The same handling will be needed in the skopeo adapter.  Avoid
repeating it.
Some of the subprocess calls capture stderr.  Show it to the caller on
failure.
In order to be able to track Docker containers in a dataset, we
introduced the docker-save-based docker adapter in 68a1462 (Add
prototype of a Docker adapter, 2018-05-18).  It's not clear how much
this has been used, but at least conceptually it seems to be viable.
One problem, however, is that ideally we'd be able to assign Docker
registry URLs to the image files stored in the dataset (particularly
the large non-configuration files).  There doesn't seem to be a way to
do this with the docker-save archives.

Another option for storing the image in a dataset is the Open
Container Initiative image format.  Skopeo can be used to copy images
in Docker registries (and some other destinations) to an OCI-compliant
directory.  When Docker Hub is used as the source, the resulting
layer blobs can be re-obtained via GET /v2/NAME/blobs/ID.

Using skopeo/OCI also has the advantage of making it easier to execute
via podman in the future.

Add an initial skopeo-based OCI adapter.  At this point, it has the
same functionality as the docker adapter.
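At its core, the save step is a skopeo-copy into an OCI directory; a minimal sketch (simplified,
not the adapter's actual code):

    import subprocess
    from pathlib import Path

    def save_oci_image(source, path):
        """Copy an image from a registry into an OCI-compliant directory.

        source is a skopeo transport spec, e.g.
        "docker://docker.io/library/alpine:latest".
        """
        Path(path).mkdir(parents=True, exist_ok=True)
        subprocess.run(["skopeo", "copy", source, f"oci:{path}"], check=True)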
After running `skopeo copy docker://docker.io/... oci:<dir>`, we can
link up the layer to the Docker registry.  However, other digests
aren't preserved.  One notable mismatch is between the image ID if you
run

    docker pull x

versus

    skopeo copy docker://x oci:x && skopeo copy oci:x docker-daemon:x

I haven't really wrapped my head around all the different digests and
when they can change.  However, skopeo's issue tracker has a good deal
of discussion about this, and it looks complicated (e.g., issues 11,
469, 949, 1046, and 1097).

The adapter docstring should probably note this, though at this point
I'm not sure I could say something coherent.  Anyway, add a to-do
note...
I _think_ containers-storage: is what we'd use for podman-run, but I
haven't attempted it.
Prevent skopeo-copy output from being shown, since it's probably
confusing to see output under run's "Command start (output follows)"
tag for a command that the user didn't explicitly call.  However, for
large images, this has the downside that the user might want some
signs of life, so this may need to be revisited.
We'll need this information in order to add a tag to the oci:
destination and to make the entry copied to docker-daemon more
informative.  I've tried to base the rules on containers/image
implementation, which is what skopeo uses underneath.
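A very rough sketch of the kind of reference splitting meant here (illustrative only; it ignores
many of the edge cases containers/image handles):

    def parse_docker_reference(ref):
        """Split NAME[:TAG][@DIGEST] into parts."""
        digest = None
        if "@" in ref:
            ref, digest = ref.split("@", 1)
        # A tag is a trailing ":..." that is not part of a registry host:port.
        name, _, tag = ref.rpartition(":")
        if not name or "/" in tag:
            name, tag = ref, None
        return {"name": name, "tag": tag, "digest": digest}

    # parse_docker_reference("docker.io/library/busybox:1.30")
    #   -> {'name': 'docker.io/library/busybox', 'tag': '1.30', 'digest': None}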
An image stored as an OCI directory can have a tag.  If the source has
a tag specified, copy it over to the destination.

Note that upcoming commits will store the full source specification
as an image annotation, so we won't rely on this when copying the
image to docker-daemon:, but it still seems nice to have (e.g., when
looking at the directory with skopeo-inspect).
These will be used to store the value of the skopeo-copy source and
then retrieve it at load time to make the docker-daemon: entry more
informative.
The OCI format allows annotations.  Add one with the source value
(which will be determined by what the caller gives to containers-add)
so that we can use it when copying the image to a docker-daemon:
destination.
The images copied to the daemon look like this

    $ docker images
    REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
    datalad-container/bb   sha256-98345e4      98345e418eb7        3 weeks ago         69.2MB

That tag isn't useful because it just repeats the image ID.  And the
name after "datalad-container/" is the name of the directory, so with
the default containers-add location it would be an uninformative
"image".

With the last commit, we store the source specification as an
annotation in the OCI directory.  Parse it and reuse the original
repository name and tag.

    REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE
    datalad-container/debian   buster-slim         98345e418eb7        3 weeks ago         69.2MB

If the source has a digest instead of the tag, construct the daemon
tag from that.
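Spelled out, the tagging rule is roughly the following (prefix and fallback details are
illustrative, not the adapter's exact code):

    def daemon_reference(source_name, tag=None, digest=None,
                         prefix="datalad-container"):
        """Build a docker-daemon: reference from the stored source annotation."""
        repo = source_name.rsplit("/", 1)[-1]        # e.g. "debian"
        if tag:
            return f"{prefix}/{repo}:{tag}"          # datalad-container/debian:buster-slim
        if digest:
            # No tag in the source: derive one from the digest instead.
            return f"{prefix}/{repo}:{digest.replace(':', '-')}"
        return f"{prefix}/{repo}:latest"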
Add a new oci: scheme.  The stacking of the schemes isn't ideal
(oci:docker://, oci:docker-daemon:), but it allows for any skopeo
transport to be used.

Note: I'm not avoiding appending "//" for a conceptual reason
(although there might be a valid one), but because I find
"oci://docker://" ugly.  Perhaps consistency with "shub://" and
"dhub://" outweighs that, though.
The next commit will use this logic in the oci adapter as well, and
it'd be nice (though not strictly necessary) to avoid oci and
containers_add importing each other.
TODO: Finalize approach in Datalad for Docker Registry URLs.
* origin/master: (217 commits)
  [DATALAD RUNCMD] Run pre-commit to harmonize code throughout
  Update __version__ to 1.2.6
  [skip ci] Update CHANGELOG
  BF: use setuptools.errors.OptionError instead of now removed import of distutils.DistutilsOptionError
  BF: docbuild - use python 3.9 (not 3.8) and upgrade setuptools
  [DATALAD RUNCMD] Run pre-commit to harmonize code throughout
  rm duplicate .codespellrc and move some of its skips into pyproject.toml
  progress codespell in pre-commit
  Add precommit configuration as in datalad ATM
  [release-action] Autogenerate changelog snippet for PR 268
  MNT: Account for a number of deprecations in core
  Revert linting a target return value for a container
  Fix lint errors other than line length
  upper case CWD acronym
  CI/tools: Add fuse2fs dependency for singularity installation
  Improving documentation for --url parameter
  Update __version__ to 1.2.5
  [skip ci] Update CHANGELOG
  Add changelog entry for isort PR
  [DATALAD RUNCMD] isort all files for consistency
  ...

 Conflicts - some were tricky:
	datalad_container/adapters/docker.py
	datalad_container/containers_add.py
	datalad_container/utils.py - both added but merge looked funny
otherwise even singularity does not install
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "sed -i -e 's,from distutils.spawn import find_executable,from shutil import which,g' -e 's,find_executable(,which(,g' datalad_container/adapters/tests/test_oci_more.py",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
Added comprehensive documentation for Claude Code to work effectively with
this codebase, including architecture overview, development commands, and
key implementation details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Extended the OCI adapter to support any container registry without
hardcoding endpoints. The link() function now dynamically constructs
registry API endpoints using the pattern https://{registry}/v2/, with
Docker Hub as the only special case (registry-1.docker.io).

This enables automatic support for registries like:
- quay.io (Quay.io registry)
- gcr.io (Google Container Registry)
- ghcr.io (GitHub Container Registry)
- Any other V2-compatible registry
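The endpoint pattern boils down to roughly this sketch (helper names are illustrative):

    def registry_endpoint(registry):
        # Docker Hub is the one special case; all other registries follow
        # the plain https://<registry>/v2/ pattern.
        if registry in ("docker.io", "index.docker.io"):
            registry = "registry-1.docker.io"
        return f"https://{registry}/v2/"

    def blob_url(registry, repository, digest):
        # Layer blobs can be re-fetched with GET /v2/<name>/blobs/<digest>.
        return f"{registry_endpoint(registry)}{repository}/blobs/{digest}"

    # blob_url("ghcr.io", "astral-sh/uv", "sha256:...")
    #   -> "https://ghcr.io/v2/astral-sh/uv/blobs/sha256:..."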

Changes:
- Removed hardcoded _ENDPOINTS dictionary
- Added dynamic endpoint construction in link() function
- Added unit tests for parsing references from alternative registries
- Added integration tests using real images:
  - ghcr.io/astral-sh/uv:latest for ghcr.io testing
  - quay.io/linuxserver.io/baseimage-alpine:3.18 for quay.io testing

The link() function will add registry URLs to annexed layer images for
any registry when proper provider configuration is available, enabling
efficient retrieval through git-annex.

All new tests are marked with @pytest.mark.ai_generated as per project
standards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Enhanced the parametrized registry test to include:
1. Docker Hub (docker.io) with busybox:1.30 for consistency
2. Verification that annexed blobs exist in the OCI image
3. Check that all annexed files have URLs registered in either the
   datalad or web remote for efficient retrieval

The test now verifies that `git annex find --not --in datalad --and
--not --in web` returns empty, ensuring all blobs are accessible
through git-annex remotes.
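The check is essentially the following (sketch; the actual test code may differ):

    import subprocess

    def all_blobs_have_remote_urls(dataset_path):
        # An empty result means every annexed file is present in (at least)
        # the datalad or web special remote.
        out = subprocess.run(
            ["git", "annex", "find",
             "--not", "--in", "datalad", "--and", "--not", "--in", "web"],
            cwd=dataset_path, check=True, capture_output=True, text=True,
        )
        return out.stdout.strip() == ""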

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Enhanced the parametrized registry test to verify the complete
drop/get cycle for the entire dataset:

1. Drops all annexed content in the dataset
2. Verifies that files were actually dropped (non-empty results)
3. Gets everything back from remotes
4. Verifies that files were retrieved (non-empty results)

This ensures that the registered URLs in datalad/web remotes are
functional and files can be successfully retrieved from the registry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
yarikoptic and others added 19 commits October 20, 2025 09:09
This fixture ensures that sys.executable's directory is first in PATH
for the duration of tests. This is needed when tests spawn subprocesses
that need to import modules from the same Python environment that's
running pytest, preventing "No module named X" errors.
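A minimal sketch of such a fixture (the fixture name is illustrative):

    import os
    import sys

    import pytest

    @pytest.fixture
    def python_on_path(monkeypatch):
        # Put sys.executable's directory first in PATH so subprocesses spawned
        # by the test resolve the same Python environment running pytest,
        # avoiding "No module named X" errors.
        bindir = os.path.dirname(sys.executable)
        monkeypatch.setenv("PATH", bindir + os.pathsep + os.environ.get("PATH", ""))
        yield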

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…er handling

- Add parametrized integration test covering docker.io, gcr.io, and quay.io
- Test container addition, execution, and annexed blob verification
- Add drop/get cycle testing to verify remote retrieval works
- Fix link() to create datalad remote even without provider configuration
  - Issue warning instead of skipping when provider not found
  - Allows URLs to be registered and files to be retrieved from any registry
- Use pytest tmp_path fixture instead of @with_tempfile decorator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
* origin/master:
  [release-action] Autogenerate changelog snippet for PR 278
  install libffi7 since otherwise git-annex install fails
  chore: appveyor -- progress Ubuntu to 2204
- Two concepts only: Image (artifact) and Profile (execution recipe)
- Profiles point to images, can extend other profiles
- Child profiles clobber parent values (no merging)
- YAML files instead of .datalad/config for images and profiles
- ReproNim provides base profiles, users extend with their settings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- images/<name>/<version>/ for versioned image storage
- sources.yaml consolidates provenance for all versions of an image
- profiles/ for execution profiles referencing image/<version>
- Eliminates separate environments/ directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
- Add parse_registry_url() for docker:// scheme (quay://, ghcr:// commented pending testing)
- Store images at .datalad/containers/images/<name>/<version>/image/
- Version extracted from URL tag, defaults to 'latest'
- Comment out cmdexec handling (TODO Phase 4 backwards compatibility)
- Comment out deprecated schemes: dhub://, oci:, shub:// (TODO Phase 4)
- Add clear error for unsupported URL schemes

BREAKING CHANGES:
- docker:// now stores OCI directory via skopeo (was: singularity build to SIF)
- cmdexec/--call-fmt no longer set by containers-add
- New storage path structure
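Roughly, the version and storage-path derivation looks like this sketch (helper name and
parsing details are illustrative):

    from pathlib import Path

    def image_store_path(ds_path, name, url):
        # Version comes from the URL tag, defaulting to "latest",
        # e.g. "docker://alpine:3.19" -> version "3.19".
        ref = url[len("docker://"):]
        last = ref.rsplit("/", 1)[-1]
        version = last.rsplit(":", 1)[1] if ":" in last else "latest"
        return Path(ds_path) / ".datalad" / "containers" / "images" / name / version / "image"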

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Call oci.load() after saving and linking image
- Image is immediately usable with docker run
- TODO: add --load flag to make this optional

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Use short form (datalad-container/{name}:{tag}) that Docker recognizes
- Add TODOs for: other registries, non-library images, surfacing name in output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Allow name:version in container names (e.g., 'alpine:latest', 'mriqc:23.1.0')
- Parse name:version to determine storage path
- Tag loaded Docker image with meaningful name (datalad-container/name:version)
- docker run datalad-container/alpine:mytag now works after containers-add

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- New --image flag: specifies container name:version
- New --exec flag: execution template with {img} and {cmd} placeholders
- Resolves image to datalad-container/<name>:<version> for Docker
- Tracks image directory as extra_input for provenance
- Records resolved command (not shim) in run record
- Legacy -n/--container-name path preserved

Usage:
  datalad containers-run --image alpine:latest \
    --exec "docker run --rm {img} {cmd}" echo hello

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- {img} = Docker image name (datalad-container/name:version)
- {img_path} = OCI directory path (.datalad/containers/images/.../image)
- Enables: apptainer exec oci:{img_path} {cmd}

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add pyyaml dependency to setup.cfg
- New profiles.py module with profile loading, extends resolution
  (clobber semantics), and image validation (error early)
- New containers-profiles command to list available profiles
- Add --profile flag to containers-run
- CLI args (--image, --exec) override profile fields

Profiles are YAML files in .datalad/containers/profiles/ that define an
'image' and an 'exec' template. Inheritance is supported via the 'extends' key.
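A minimal sketch of 'extends' resolution with clobber semantics (illustrative; function and
argument names are not the module's actual API):

    from pathlib import Path

    import yaml

    def load_profile(profiles_dir, name, _seen=None):
        """Resolve a profile, letting child keys clobber (not merge) parent keys."""
        _seen = set() if _seen is None else _seen
        if name in _seen:
            raise ValueError(f"circular 'extends' chain at {name!r}")
        _seen.add(name)
        data = yaml.safe_load((Path(profiles_dir) / f"{name}.yaml").read_text()) or {}
        parent_name = data.pop("extends", None)
        if parent_name:
            parent = load_profile(profiles_dir, parent_name, _seen)
            parent.update(data)   # child values replace parent values wholesale
            return parent
        return data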

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Profiles can now omit the 'image' key, making them reusable base
templates for different runtimes. When using such profiles, the
--image flag is required.

This enables shipping docker-default, apptainer-default, podman-default
profiles that work with any image.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
asmacdo commented Dec 3, 2025

Added a demo.sh for easy review.

./demo.sh
#!/bin/bash
# demo.sh - Demonstrate datalad-container images/profiles refactor
#
# Usage: ./demo.sh [base_dir]
#        base_dir defaults to /tmp

set -e

BASE_DIR="${1:-/tmp}"
DEMO_DIR="$BASE_DIR/datalad-container-demo-$$"

section() {
    echo ""
    echo "════════════════════════════════════════════════════════════"
    echo "  $1"
    echo "════════════════════════════════════════════════════════════"
    echo ""
}

info() {
    echo "► $1"
}

warn() {
    echo "⚠ $1"
}

run_cmd() {
    echo "\$ $1"
    eval "$1"
}

show_last_commit() {
    echo ""
    info "Last commit:"
    git --no-pager log -1 --oneline
    echo ""
}

# ============================================================================
section "SETUP: Create demo dataset"
# ============================================================================

info "Creating demo directory: $DEMO_DIR"
mkdir -p "$DEMO_DIR"
cd "$DEMO_DIR"

run_cmd "datalad create ."

info "Adding alpine image..."
run_cmd "datalad containers-add alpine:latest --url docker://alpine:latest"

# ============================================================================
section "PHASE 1: containers-add + datalad run (raw docker command)"
# ============================================================================

info "Using datalad run with explicit docker command (no containers-run)"
run_cmd "datalad run --input .datalad/containers/images/alpine/latest/image --output phase1-output.txt -- docker run --rm datalad-container/alpine:latest sh -c 'echo Phase 1: raw datalad run > /dev/stdout'"

# Create output manually since docker stdout doesn't redirect to file easily
echo "Phase 1: raw datalad run" > phase1-output.txt
git add phase1-output.txt && git commit --amend --no-edit

show_last_commit

# ============================================================================
section "PHASE 2: containers-run --image --exec"
# ============================================================================

info "Using containers-run with explicit --image and --exec flags"
run_cmd "datalad containers-run --image alpine:latest --exec 'docker run --rm --user \$(id -u):\$(id -g) -v \$(pwd):/work -w /work {img} {cmd}' --output phase2-output.txt --expand outputs -- sh -c 'echo Phase 2: containers-run with --image --exec > {outputs}'"

show_last_commit

# ============================================================================
section "PHASE 3a: Create profile with image, containers-run --profile"
# ============================================================================

info "Creating profiles directory and docker-alpine profile..."
mkdir -p .datalad/containers/profiles

cat > .datalad/containers/profiles/docker-alpine.yaml << 'EOF'
image: alpine:latest
exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work {img} {cmd}
EOF

info "Profile contents:"
cat .datalad/containers/profiles/docker-alpine.yaml

info "Saving profile to dataset..."
run_cmd "datalad save -m 'Add docker-alpine profile' .datalad/containers/profiles/"

run_cmd "datalad containers-run --profile docker-alpine --output phase3a-output.txt --expand outputs -- sh -c 'echo Phase 3a: profile with image > {outputs}'"

show_last_commit

# ============================================================================
section "PHASE 3b: containers-list and containers-profiles output"
# ============================================================================

info "Listing containers:"
run_cmd "datalad containers-list"

echo ""
info "Listing profiles:"
run_cmd "datalad containers-profiles"

# ============================================================================
section "PHASE 3c: Profile with extends (inheritance)"
# ============================================================================

info "Creating docker-alpine-env profile that extends docker-alpine..."
cat > .datalad/containers/profiles/docker-alpine-env.yaml << 'EOF'
extends: docker-alpine
exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work -e MY_VAR=hello-from-extended-profile {img} {cmd}
EOF

info "Profile contents:"
cat .datalad/containers/profiles/docker-alpine-env.yaml

run_cmd "datalad save -m 'Add docker-alpine-env profile' .datalad/containers/profiles/"

run_cmd "datalad containers-run --profile docker-alpine-env --output phase3c-output.txt --expand outputs -- sh -c 'echo Phase 3c: MY_VAR=\$MY_VAR > {outputs}'"

info "Output file contents:"
cat phase3c-output.txt

show_last_commit

# ============================================================================
section "PHASE 3d: CLI override --exec"
# ============================================================================

info "Using --profile but overriding --exec to add environment variable..."
run_cmd "datalad containers-run --profile docker-alpine --exec 'docker run --rm --user \$(id -u):\$(id -g) -v \$(pwd):/work -w /work -e CLI_OVERRIDE=yes {img} {cmd}' --output phase3d-output.txt --expand outputs -- sh -c 'echo Phase 3d: CLI_OVERRIDE=\$CLI_OVERRIDE > {outputs}'"

info "Output file contents:"
cat phase3d-output.txt

show_last_commit

# ============================================================================
section "PHASE 3e: CLI override --image"
# ============================================================================

info "Adding busybox image..."
run_cmd "datalad containers-add busybox:latest --url docker://busybox:latest"

info "Using docker-alpine profile but overriding --image to use busybox..."
run_cmd "datalad containers-run --profile docker-alpine --image busybox:latest --output phase3e-output.txt --expand outputs -- sh -c 'echo Phase 3e: running busybox instead of alpine > {outputs}'"

info "Output file contents:"
cat phase3e-output.txt

show_last_commit

# ============================================================================
section "PHASE 3f: Error case - missing image (early validation)"
# ============================================================================

info "Creating profile that references non-existent image..."
cat > .datalad/containers/profiles/bad-profile.yaml << 'EOF'
image: nonexistent:v999
exec: docker run --rm {img} {cmd}
EOF

run_cmd "datalad save -m 'Add bad-profile for testing' .datalad/containers/profiles/"

info "Attempting to use bad-profile (should fail early)..."
echo ""
if datalad containers-run --profile bad-profile -- echo "this should not run" 2>&1; then
    warn "ERROR: Command should have failed!"
else
    info "Command failed as expected (early validation works)"
fi

# Clean up bad profile
rm .datalad/containers/profiles/bad-profile.yaml
run_cmd "datalad save -m 'Remove bad-profile' .datalad/containers/profiles/"

# ============================================================================
section "PHASE 3g: Base profiles without image (require --image flag)"
# ============================================================================

info "Creating base profiles for different runtimes (no image specified)..."

cat > .datalad/containers/profiles/docker-default.yaml << 'EOF'
exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work {img} {cmd}
EOF

cat > .datalad/containers/profiles/apptainer-default.yaml << 'EOF'
exec: apptainer exec oci:{img_path} {cmd}
EOF

cat > .datalad/containers/profiles/podman-default.yaml << 'EOF'
exec: podman run --rm --userns=keep-id -v $(pwd):/work:Z -w /work {img} {cmd}
EOF

run_cmd "datalad save -m 'Add base runtime profiles' .datalad/containers/profiles/"

info "Listing all profiles:"
run_cmd "datalad containers-profiles"

# Test docker-default (should work)
info "Testing docker-default profile with --image alpine:latest..."
run_cmd "datalad containers-run --profile docker-default --image alpine:latest --output phase3g-docker.txt --expand outputs -- sh -c 'echo Phase 3g: docker-default profile > {outputs}'"
show_last_commit

# Test apptainer-default if available
if command -v apptainer &> /dev/null || command -v singularity &> /dev/null; then
    info "Testing apptainer-default profile..."
    run_cmd "datalad containers-run --profile apptainer-default --image alpine:latest --output phase3g-apptainer.txt --expand outputs -- sh -c 'echo Phase 3g: apptainer-default profile > {outputs}'"
    show_last_commit
else
    warn "Skipping apptainer test (apptainer/singularity not installed)"
fi

# Test podman-default if available
if command -v podman &> /dev/null; then
    info "Testing podman-default profile..."
    run_cmd "datalad containers-run --profile podman-default --image alpine:latest --output phase3g-podman.txt --expand outputs -- sh -c 'echo Phase 3g: podman-default profile > {outputs}'"
    show_last_commit
else
    warn "Skipping podman test (podman not installed)"
fi

# ============================================================================
section "DEMO COMPLETE"
# ============================================================================

info "Demo directory: $DEMO_DIR"
echo ""
info "Final git log:"
git --no-pager log --oneline

echo ""
info "All containers:"
datalad containers-list

echo ""
info "All profiles:"
datalad containers-profiles

echo ""
echo "Demo completed successfully!"
Output

════════════════════════════════════════════════════════════
  SETUP: Create demo dataset
════════════════════════════════════════════════════════════

► Creating demo directory: /tmp/datalad-container-demo-1114184
$ datalad create .
create(ok): /tmp/datalad-container-demo-1114184 (dataset)
► Adding alpine image...
$ datalad containers-add alpine:latest --url docker://alpine:latest
[INFO] Saving OCI image from docker://alpine:latest 
Getting image source signatures
Copying blob sha256:014e56e613968f73cce0858124ca5fbc601d7888099969a4eea69f31dcd71a53
Copying config sha256:7acffee03fe864cd6b88219a1028855d6c912e7cf6fac633aa4307529fd0cc08
Writing manifest to image destination
[WARNING] Skipping non-annexed layer: /tmp/datalad-container-demo-1114184/.datalad/containers/images/alpine/latest/image/blobs/sha256/014e56e613968f73cce0858124ca5fbc601d7888099969a4eea69f31dcd71a53 
[INFO] Loaded image into Docker daemon: datalad-container/alpine:latest 
add(ok): .datalad/containers/images/alpine/latest/image/blobs/sha256/014e56e613968f73cce0858124ca5fbc601d7888099969a4eea69f31dcd71a53 (file)
add(ok): .datalad/containers/images/alpine/latest/image/blobs/sha256/7acffee03fe864cd6b88219a1028855d6c912e7cf6fac633aa4307529fd0cc08 (file)
add(ok): .datalad/containers/images/alpine/latest/image/blobs/sha256/a107a3c031732299dd9dd607bb13787834db2de38cfa13f1993b7105e4814c60 (file)
add(ok): .datalad/containers/images/alpine/latest/image/index.json (file)
add(ok): .datalad/containers/images/alpine/latest/image/oci-layout (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
action summary:
  add (ok: 6)
  save (ok: 1)
add(ok): .datalad/containers/images/alpine/latest/image/blobs/sha256/014e56e613968f73cce0858124ca5fbc601d7888099969a4eea69f31dcd71a53 (file)
add(ok): .datalad/containers/images/alpine/latest/image/blobs/sha256/7acffee03fe864cd6b88219a1028855d6c912e7cf6fac633aa4307529fd0cc08 (file)
add(ok): .datalad/containers/images/alpine/latest/image/blobs/sha256/a107a3c031732299dd9dd607bb13787834db2de38cfa13f1993b7105e4814c60 (file)
add(ok): .datalad/containers/images/alpine/latest/image/index.json (file)
add(ok): .datalad/containers/images/alpine/latest/image/oci-layout (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
containers_add(ok): /tmp/datalad-container-demo-1114184/.datalad/containers/images/alpine/latest/image (file)
action summary:
  add (ok: 6)
  containers_add (ok: 1)
  save (ok: 1)

════════════════════════════════════════════════════════════
  PHASE 1: containers-add + datalad run (raw docker command)
════════════════════════════════════════════════════════════

► Using datalad run with explicit docker command (no containers-run)
$ datalad run --input .datalad/containers/images/alpine/latest/image --output phase1-output.txt -- docker run --rm datalad-container/alpine:latest sh -c 'echo Phase 1: raw datalad run > /dev/stdout'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
Phase 1: raw datalad run
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm datalad-container/alpine...]
[master 3937679] [DATALAD] Configure containerized environment 'alpine:latest'
 Date: Wed Dec 3 14:25:21 2025 -0600
 7 files changed, 8 insertions(+)
 create mode 120000 .datalad/containers/images/alpine/latest/image/blobs/sha256/014e56e613968f73cce0858124ca5fbc601d7888099969a4eea69f31dcd71a53
 create mode 120000 .datalad/containers/images/alpine/latest/image/blobs/sha256/7acffee03fe864cd6b88219a1028855d6c912e7cf6fac633aa4307529fd0cc08
 create mode 120000 .datalad/containers/images/alpine/latest/image/blobs/sha256/a107a3c031732299dd9dd607bb13787834db2de38cfa13f1993b7105e4814c60
 create mode 120000 .datalad/containers/images/alpine/latest/image/index.json
 create mode 120000 .datalad/containers/images/alpine/latest/image/oci-layout
 create mode 100644 phase1-output.txt

► Last commit:
3937679 [DATALAD] Configure containerized environment 'alpine:latest'


════════════════════════════════════════════════════════════
  PHASE 2: containers-run --image --exec
════════════════════════════════════════════════════════════

► Using containers-run with explicit --image and --exec flags
$ datalad containers-run --image alpine:latest --exec 'docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work {img} {cmd}' --output phase2-output.txt --expand outputs -- sh -c 'echo Phase 2: containers-run with --image --exec > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm --user $(id -u):$(id -g)...]
add(ok): phase2-output.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)

► Last commit:
9a869d2 [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...


════════════════════════════════════════════════════════════
  PHASE 3a: Create profile with image, containers-run --profile
════════════════════════════════════════════════════════════

► Creating profiles directory and docker-alpine profile...
► Profile contents:
image: alpine:latest
exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work {img} {cmd}
► Saving profile to dataset...
$ datalad save -m 'Add docker-alpine profile' .datalad/containers/profiles/
add(ok): .datalad/containers/profiles/docker-alpine.yaml (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
$ datalad containers-run --profile docker-alpine --output phase3a-output.txt --expand outputs -- sh -c 'echo Phase 3a: profile with image > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm --user $(id -u):$(id -g)...]
add(ok): phase3a-output.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)

► Last commit:
d68f9ab [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...


════════════════════════════════════════════════════════════
  PHASE 3b: containers-list and containers-profiles output
════════════════════════════════════════════════════════════

► Listing containers:
$ datalad containers-list
alpine:latest -> .datalad/containers/images/alpine/latest/image

► Listing profiles:
$ datalad containers-profiles
docker-alpine -> alpine:latest

════════════════════════════════════════════════════════════
  PHASE 3c: Profile with extends (inheritance)
════════════════════════════════════════════════════════════

► Creating docker-alpine-env profile that extends docker-alpine...
► Profile contents:
extends: docker-alpine
exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work -e MY_VAR=hello-from-extended-profile {img} {cmd}
$ datalad save -m 'Add docker-alpine-env profile' .datalad/containers/profiles/
add(ok): .datalad/containers/profiles/docker-alpine-env.yaml (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
$ datalad containers-run --profile docker-alpine-env --output phase3c-output.txt --expand outputs -- sh -c 'echo Phase 3c: MY_VAR=$MY_VAR > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm --user $(id -u):$(id -g)...]
add(ok): phase3c-output.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)
► Output file contents:
Phase 3c: MY_VAR=hello-from-extended-profile

► Last commit:
9fe0639 [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...


════════════════════════════════════════════════════════════
  PHASE 3d: CLI override --exec
════════════════════════════════════════════════════════════

► Using --profile but overriding --exec to add environment variable...
$ datalad containers-run --profile docker-alpine --exec 'docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work -e CLI_OVERRIDE=yes {img} {cmd}' --output phase3d-output.txt --expand outputs -- sh -c 'echo Phase 3d: CLI_OVERRIDE=$CLI_OVERRIDE > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm --user $(id -u):$(id -g)...]
add(ok): phase3d-output.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)
► Output file contents:
Phase 3d: CLI_OVERRIDE=yes

► Last commit:
ecc03bf [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...


════════════════════════════════════════════════════════════
  PHASE 3e: CLI override --image
════════════════════════════════════════════════════════════

► Adding busybox image...
$ datalad containers-add busybox:latest --url docker://busybox:latest
[INFO] Saving OCI image from docker://busybox:latest 
Getting image source signatures
Copying blob sha256:e59838ecfec5e79eb4371e9995ef86c8000fe1c67d7b9fa7b57e996d9ba772ff
Copying config sha256:08ef35a1c3f050afbbd64194ffd1b8d5878659f5491567f26d1c814513ae9649
Writing manifest to image destination
[WARNING] Skipping non-annexed layer: /tmp/datalad-container-demo-1114184/.datalad/containers/images/busybox/latest/image/blobs/sha256/e59838ecfec5e79eb4371e9995ef86c8000fe1c67d7b9fa7b57e996d9ba772ff 
[INFO] Loaded image into Docker daemon: datalad-container/busybox:latest 
add(ok): .datalad/containers/images/busybox/latest/image/blobs/sha256/08ef35a1c3f050afbbd64194ffd1b8d5878659f5491567f26d1c814513ae9649 (file)
add(ok): .datalad/containers/images/busybox/latest/image/blobs/sha256/870e815c3a50dd0f6b40efddb319c72c32c3ee340b5a3e8945904232ccd12f44 (file)
add(ok): .datalad/containers/images/busybox/latest/image/blobs/sha256/e59838ecfec5e79eb4371e9995ef86c8000fe1c67d7b9fa7b57e996d9ba772ff (file)
add(ok): .datalad/containers/images/busybox/latest/image/index.json (file)
add(ok): .datalad/containers/images/busybox/latest/image/oci-layout (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
action summary:
  add (ok: 6)
  save (ok: 1)
add(ok): .datalad/containers/images/busybox/latest/image/blobs/sha256/08ef35a1c3f050afbbd64194ffd1b8d5878659f5491567f26d1c814513ae9649 (file)
add(ok): .datalad/containers/images/busybox/latest/image/blobs/sha256/870e815c3a50dd0f6b40efddb319c72c32c3ee340b5a3e8945904232ccd12f44 (file)
add(ok): .datalad/containers/images/busybox/latest/image/blobs/sha256/e59838ecfec5e79eb4371e9995ef86c8000fe1c67d7b9fa7b57e996d9ba772ff (file)
add(ok): .datalad/containers/images/busybox/latest/image/index.json (file)
add(ok): .datalad/containers/images/busybox/latest/image/oci-layout (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
containers_add(ok): /tmp/datalad-container-demo-1114184/.datalad/containers/images/busybox/latest/image (file)
action summary:
  add (ok: 6)
  containers_add (ok: 1)
  save (ok: 1)
► Using docker-alpine profile but overriding --image to use busybox...
$ datalad containers-run --profile docker-alpine --image busybox:latest --output phase3e-output.txt --expand outputs -- sh -c 'echo Phase 3e: running busybox instead of alpine > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm --user $(id -u):$(id -g)...]
add(ok): phase3e-output.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)
► Output file contents:
Phase 3e: running busybox instead of alpine

► Last commit:
3f07f05 [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...


════════════════════════════════════════════════════════════
  PHASE 3f: Error case - missing image (early validation)
════════════════════════════════════════════════════════════

► Creating profile that references non-existent image...
$ datalad save -m 'Add bad-profile for testing' .datalad/containers/profiles/
add(ok): .datalad/containers/profiles/bad-profile.yaml (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
► Attempting to use bad-profile (should fail early)...

run(error): /tmp/datalad-container-demo-1114184 (dataset) [Profile 'bad-profile' references image 'nonexistent:v999' but no image found at /tmp/datalad-container-demo-1114184/.datalad/containers/images/nonexistent/v999/image]
► Command failed as expected (early validation works)
$ datalad save -m 'Remove bad-profile' .datalad/containers/profiles/
delete(ok): .datalad/containers/profiles/bad-profile.yaml (symlink)
save(ok): . (dataset)
action summary:
  delete (ok: 1)
  save (ok: 1)

════════════════════════════════════════════════════════════
  PHASE 3g: Base profiles without image (require --image flag)
════════════════════════════════════════════════════════════

► Creating base profiles for different runtimes (no image specified)...
$ datalad save -m 'Add base runtime profiles' .datalad/containers/profiles/
add(ok): .datalad/containers/profiles/apptainer-default.yaml (file)
add(ok): .datalad/containers/profiles/docker-default.yaml (file)
add(ok): .datalad/containers/profiles/podman-default.yaml (file)
save(ok): . (dataset)
action summary:
  add (ok: 3)
  save (ok: 1)
► Listing all profiles:
$ datalad containers-profiles
apptainer-default -> 
docker-alpine-env (extends: docker-alpine) -> alpine:latest
docker-alpine -> alpine:latest
docker-default -> 
podman-default -> 
► Testing docker-default profile with --image alpine:latest...
$ datalad containers-run --profile docker-default --image alpine:latest --output phase3g-docker.txt --expand outputs -- sh -c 'echo Phase 3g: docker-default profile > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [docker run --rm --user $(id -u):$(id -g)...]
add(ok): phase3g-docker.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)

► Last commit:
286bdef [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...

► Testing apptainer-default profile...
$ datalad containers-run --profile apptainer-default --image alpine:latest --output phase3g-apptainer.txt --expand outputs -- sh -c 'echo Phase 3g: apptainer-default profile > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
INFO:    Using cached SIF image
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [apptainer exec oci:.datalad/containers/i...]
add(ok): phase3g-apptainer.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)

► Last commit:
eada7f7 [DATALAD RUNCMD] apptainer exec oci:.datalad/containers/i...

► Testing podman-default profile...
$ datalad containers-run --profile podman-default --image alpine:latest --output phase3g-podman.txt --expand outputs -- sh -c 'echo Phase 3g: podman-default profile > {outputs}'
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
run(ok): /tmp/datalad-container-demo-1114184 (dataset) [podman run --rm --userns=keep-id -v $(pw...]
add(ok): phase3g-podman.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 1)
  run (ok: 1)
  save (ok: 1)

► Last commit:
7dde79a [DATALAD RUNCMD] podman run --rm --userns=keep-id -v $(pw...


════════════════════════════════════════════════════════════
  DEMO COMPLETE
════════════════════════════════════════════════════════════

► Demo directory: /tmp/datalad-container-demo-1114184

► Final git log:
7dde79a [DATALAD RUNCMD] podman run --rm --userns=keep-id -v $(pw...
eada7f7 [DATALAD RUNCMD] apptainer exec oci:.datalad/containers/i...
286bdef [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...
895d197 Add base runtime profiles
69024cf Remove bad-profile
dd36c88 Add bad-profile for testing
3f07f05 [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...
8fe8897 [DATALAD] Configure containerized environment 'busybox:latest'
ecc03bf [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...
9fe0639 [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...
439acfb Add docker-alpine-env profile
d68f9ab [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...
5b76d11 Add docker-alpine profile
9a869d2 [DATALAD RUNCMD] docker run --rm --user $(id -u):$(id -g)...
3937679 [DATALAD] Configure containerized environment 'alpine:latest'
767d7a6 [DATALAD] new dataset

► All containers:
alpine:latest -> .datalad/containers/images/alpine/latest/image
busybox:latest -> .datalad/containers/images/busybox/latest/image

► All profiles:
apptainer-default -> 
docker-alpine-env (extends: docker-alpine) -> alpine:latest
docker-alpine -> alpine:latest
docker-default -> 
podman-default -> 

Demo completed successfully!

asmacdo changed the title from "Refactor/images profiles" to "POC: Refactor image artifact storage & execution profiles" on Dec 3, 2025
asmacdo commented Dec 5, 2025

This PR served its purpose; I'm proceeding on the design doc PR with fewer breaking changes.

asmacdo closed this on Dec 5, 2025