If a CI configuration (i.e. a "GitHub workflow") requires a docker image to run, it downloads such images for each CI run. These repeated downloads of docker images, which are often hundreds of megabytes to gigabytes large, significantly slow down each CI run and consume vast amounts of network bandwidth.
Specifically, using the Sailfish-SDK images provided by Coderus for a CI run results in downloading a Docker image between 1 GB and 3.5 GB in size (depending on the SDK / SailfishOS version to build for) up to three times (once for each of the supported architectures: aarch64, armv7hl and i486) from an external "docker registry" (here: Docker Hub). This affects the simple variant of using these images (by directly using the `coderus/github-sfos-build` "action") and the more sophisticated one alike.
As these images are downloaded by all users of Coderus' SailfishOS Platform SDK Docker images hosted at Docker Hub (the Docker "registry"), and Docker imposes successively stricter "rate limiting" (i.e. limits on download volume and / or frequency, beyond which access is severely throttled unless someone pays for it), this may prevent the use of these images for CI runs in the future.
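How close a runner is to Docker Hub's pull limit can be queried without consuming a pull. The following is a minimal sketch of the procedure documented by Docker (an anonymous token for the `ratelimitpreview/test` repository, followed by a HEAD request). The function names are made up for this example, and `curl` and `jq` are assumed to be installed.

```shell
#!/usr/bin/env bash
# Sketch: query Docker Hub's pull rate limit for anonymous pulls.
# The function names are hypothetical; requires curl and jq.

# Extract the number of remaining pulls from HTTP response headers on stdin,
# e.g. "ratelimit-remaining: 76;w=21600" yields "76".
parse_remaining() {
  tr -d '\r' |
    awk -F': *' 'tolower($1) == "ratelimit-remaining" { split($2, a, ";"); print a[1] }'
}

# Fetch the response headers; a HEAD request does not count as a pull.
fetch_ratelimit_headers() {
  local token
  token=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
  curl -fsS --head -H "Authorization: Bearer ${token}" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"
}
```

In a `run:` step this could be called as `fetch_ratelimit_headers | parse_remaining`. If no number is printed, the registry did not report a limit for the request.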
Caching "locally" means caching with the means provided by GitHub, e.g. GitHub "actions". Ultimately all these solutions use GitHub's `actions/cache`, which provides (as of 2023) 10 GB of cache, evicting cached items on an LRU basis or when an item has not been accessed for a week. But as some research shows, there are many variants of and indirections for utilising GitHub's `actions/cache`.
Other "solutions", such as an external caching proxy server, are inherently not very effective.
Reducing the size of docker images is always a valid approach and has some potential (many docker images carry large amounts of unnecessary cruft), but is time-consuming and ultimately futile, as the creation and distribution of such images invite a "quick & dirty" approach (i.e. they are much quicker and easier to create and distribute than to optimise).
The only real alternative solution is to host container images "locally" at GitHub, i.e. at GitHub's container registry. For an introduction, see GitHub's documentation for creating, managing and distributing "GitHub packages".
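Copying an image from Docker Hub to GitHub's container registry (`ghcr.io`) could look roughly like the sketch below. It assumes a `GITHUB_TOKEN` which is allowed to write packages and a working `docker` CLI. The function names are hypothetical, and the image name and tag in the usage note are only examples.

```shell
#!/usr/bin/env bash
# Sketch: mirror an image from Docker Hub to ghcr.io.
# Function names are hypothetical; GITHUB_TOKEN needs write access to packages.

# Build the ghcr.io reference for an image from Docker Hub.
# Image references must be lowercase, hence the owner is lowercased.
ghcr_ref() {  # usage: ghcr_ref <owner> <dockerhub-image[:tag]>
  local owner="$1" image="$2"
  local name="${image##*/}"   # strip the Docker Hub namespace, e.g. "coderus/"
  printf 'ghcr.io/%s/%s\n' \
    "$(printf '%s' "$owner" | tr '[:upper:]' '[:lower:]')" "$name"
}

# Pull the image once from Docker Hub, then re-tag and push it to ghcr.io.
mirror_to_ghcr() {  # usage: mirror_to_ghcr <owner> <dockerhub-image[:tag]>
  local target
  target=$(ghcr_ref "$1" "$2")
  printf '%s' "$GITHUB_TOKEN" |
    docker login ghcr.io -u "${GITHUB_ACTOR:-$1}" --password-stdin
  docker pull "$2"
  docker tag "$2" "$target"
  docker push "$target"
}
```

For example, `mirror_to_ghcr my-org coderus/sailfishos-platform-sdk:<tag>` would push to `ghcr.io/my-org/sailfishos-platform-sdk:<tag>`. Subsequent CI runs can then pull from `ghcr.io` instead of Docker Hub.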
- The `actions/cache` seems to be implicitly run in the context of the user `runner`. While a `sudo su` executed as part of a `run:` statement is effective for subsequent shell commands (tested with the Ubuntu Linux runner environment provided by GitHub in 2023), I have not found a way to let an "action" run in a different user context.
- The `actions/cache` only accepts download targets (i.e. local paths) to be configured as items to cache, not download sources.
- These first two properties of GitHub's `actions/cache` prevent simply caching the images downloaded by the local Docker instance, usually (in 2023) in `/var/lib/docker/<Docker Storage Driver, e.g. overlay2>/` on Linux (utilising overlayfs, as recommended by Docker Inc. and preinstalled on Ubuntu 2x.yz), because `/var/lib/docker` and all its sub-directories are assigned to the user and group `root` and provide no access for others. Adding the user `runner` to the group `root` does not help, because this only provides search permission in directories (i.e. the `x` bit is set for directories), but still no access to the files in [`/var/lib/docker/<storage-driver>/`](https://docs.docker.com/storage/storagedriver/overlayfs-driver/).
- The `actions/cache` only caches items used in a successful CI job run. Sometimes it makes sense to always cache items which are known to be independent of the outcome of a CI run, e.g. classic prerequisites for it; this is exactly what the Sailfish-SDK images constitute for building software for SailfishOS at GitHub.
  Others have also noticed that long ago and trivially patched the original `actions/cache` (e.g. [1], [2]), but very often this ultimately results in stale forks. Hence applying this trivial change by "live patching" is the only maintainable solution, which resulted in an improved version of the "live patching" approach.
  Unfortunately GitHub has not provided a way to adjust this behaviour by a CI configuration, despite issue #92 (and subsequent issues #165, #334 etc.) having been filed for GitHub's `actions/cache` long ago.
  Edit: Mostly solved by the initial release of `actions/cache/save` and `actions/cache/restore` in December 2022, although this extension of the original `actions/cache` still provides a larger feature set and is structurally analogous to GitHub's new `actions/cache/save` and `actions/cache/restore`. These two are now the recommended way of storing items in a cache regardless of whether the whole job succeeds or fails. Still, "live patching" GitHub's original `actions/cache` to also cache when the job fails has some appeal due to the simpler usage of `actions/cache` compared to the new `actions/cache/save` and `actions/cache/restore`; all three are now and continue to be maintained by GitHub. As their basic properties are the same (except for this point), the remainder of this document can stay unchanged.
  Plan: Enhance and release a "live patching" action, which downloads (actually: checks out), patches and transparently maps to the locally patched version of the original `actions/cache`, ultimately also to the GitHub Marketplace.
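The ownership and permissions of Docker's storage directory mentioned in the list above can be inspected on a runner with a few lines of shell. This is only a diagnostic sketch with made-up function names; it relies on GNU `stat`, as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: inspect who may access a directory (GNU stat assumed).
# Function names are hypothetical.

# Print owner, group, octal mode and name of each given path.
show_access() {
  stat -c '%U:%G %a %n' "$@"
}

# Succeed if the "others" permission digit of a path is non-zero,
# i.e. if users who are neither owner nor group member get any access.
others_have_access() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  [ "$((mode % 10))" -ne 0 ]
}
```

On a runner, `show_access /var/lib/docker` shows the owner, group and mode of the directory, and `others_have_access /var/lib/docker || echo "not cacheable as user runner"` turns the check into a readable message.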
The most trivial way to cope with the access limitations of `actions/cache` is to pre-download images explicitly. For this, one creates a download directory by issuing `mkdir -p $GITHUB_WORKSPACE/<image-name>` (the `-p` is only used to prevent an error when the directory already exists; `$GITHUB_WORKSPACE` resolves to `/home/runner/work/<repository-name>/<repository-name>` on Linux (yes, twice `<repository-name>`); GitHub calls this location the "runner workspace", and it is naturally also the initial working directory), downloads the image with some third-party tool (the `docker` CLI commands do not allow for setting the download location), then executes a `docker image load` (or `docker image import`) and ultimately continues as before (e.g. instantiating and starting a Docker container by `docker run`).
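The steps described above can be sketched as a small shell function. It uses `download-frozen-image-v2.sh` from the Moby Project as the third-party download tool. The raw download URL, the function names and the directory naming scheme are assumptions of this sketch, and the way the script is invoked and its result is loaded should be checked against the usage message of the pinned script version. The script itself relies on tools such as `curl` and `jq`.

```shell
#!/usr/bin/env bash
# Sketch: pre-download an image into the runner workspace and load it.
# Function names, URL and directory naming are assumptions of this example.

# Derive a directory name from an image reference by replacing "/", ":" and "@".
image_dir_name() {
  printf '%s\n' "$1" | tr '/:@' '___'
}

predownload_image() {  # usage: predownload_image <image[:tag]>
  local image="$1"
  local dir="${GITHUB_WORKSPACE:-$PWD}/$(image_dir_name "$image")"
  local script="${RUNNER_TEMP:-/tmp}/download-frozen-image-v2.sh"
  local url="https://raw.githubusercontent.com/moby/moby/v20.10.22/contrib/download-frozen-image-v2.sh"

  mkdir -p "$dir"
  # Download only if the directory is empty, i.e. was not restored from a cache.
  if [ -z "$(ls -A "$dir")" ]; then
    curl -fsSL "$url" -o "$script"
    bash "$script" "$dir" "$image"
  fi
  # Feed the downloaded image to the local Docker daemon.
  tar -cC "$dir" . | docker image load
}
```

The directory printed by `image_dir_name` is the path one would configure as the item to cache, so that a later run finds it populated and skips the download.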
Unfortunately this approach does not work for large images (e.g. > 1 GB) due to space constraints GitHub imposes for the runner home directory. I have not followed the idea of alleviating this by raising the quota, because that requires analysis (is it imposed by a classic `quota` and can it be raised by `sudo`ing?) and might be seen by GitHub as circumventing their constraints.
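Whether a download has a chance to fit can be checked beforehand. The sketch below only looks at the free space `df` reports for the file system holding a path; a per-user quota, if one is in place, would not show up here. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: check the free space before downloading a large image.
# Function names are hypothetical.

# Print the free space in KiB on the file system holding the given path.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least <needed-KiB> are free at <path>.
has_room_for() {  # usage: has_room_for <path> <needed-KiB>
  [ "$(free_kib "$1")" -ge "$2" ]
}
```

For an image of about 3.5 GB one might call `has_room_for "$GITHUB_WORKSPACE" $((4 * 1024 * 1024))` and skip the pre-download if it fails.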
Mind that the git repository is also checked out to the "runner workspace" (`$GITHUB_WORKSPACE`) as its root directory, so do pay attention not to clobber any files or directories of your source repository.
● download-frozen-image-v2.sh by the Moby Project
- Its source code is hosted at GitHub and uses the Apache-2.0 license.
- Created and maintained as a by-product of a lively project.
- Provides tagged, stable releases, e.g. v20.10.22 (latest as of 2023-01-07).
- Is a simple and small shell-script (< 400 sloc, ~ 13 KBytes), which implicitly documents how to call it and how to utilise it.
- My favorite "external" tool for this approach.
● Skopeo by the "Containers" project
- Its source code is hosted at GitHub and uses the Apache-2.0 license.
- Created and maintained by a lively project.
- Provides tagged, stable releases.
- Is a capable container image management utility written in Go, hence first needs to be compiled.
● storage also by the "Containers" project
- Its source code is hosted at GitHub and uses the Apache-2.0 license.
- Created and maintained by a lively project.
- Provides tagged, stable releases.
- Is a capable container storage management library written in Go, hence first needs to be compiled.
- Provides the `containers-storage` CLI wrapper for manual and scripting use.
● docker-drag by NotGlop
- Its source code is hosted at GitHub and carries no license.
- Apparently unmaintained.
- Does not provide releases or git tags.
- Is a simple and small Python script (187 sloc, 7.3 KBytes), called `docker_pull.py`.
● docker_pull by ahdrr
- Its source code is hosted at GitHub and carries no license.
- Created in 2022.
- Does provide two releases (as of 2023-01-07) and git tags.
- Written in Go; pre-compiled versions are 11.6 MBytes large.
- Inspired by / an implementation in Go of docker-drag, the tool discussed one bullet point above.
http
only?
- Its source code is hosted at GitHub and uses the MIT license.
- Does provide two releases (as of 2023-01-07) and two corresponding git tags.
- Written in Go.
- Not much used.
- Initially appeared to be an easy and elegant solution, but …
http
only?
- Its source code is hosted at GitHub and uses the MIT license.
- Does provide stable releases and git tags (lots!).
- Written in bash, heavily uses bash specific features.
- Small, the two bash scripts summarised are < 600 sloc, < 15 KBytes.
- Aimed at a different purpose: To cache docker images which are needed for building an own image.
- Initially appeared to be (ab)usable for solely caching the download of docker images, but a little analysis shows that one would have to dissect the main bash script and adapt it for this purpose: Currently a `docker build` call is unavoidable.
- Its source code is hosted at GitHub and uses the Unlicense license.
- Does provide two releases (as of 2023-01-07) and git tags.
- Written in JavaScript.
- Small, summarised < 700 sloc, < 25 KBytes.
- Appears to be unmaintained.
- Nobody seems to use it.
- Appears to be easier to (ab)use for only caching the downloaded docker images than Build docker images using cache (discussed one bullet point above).
- Its source code is hosted at GitHub and uses the MIT license.
- Does provide a single git tag.
- Written in TypeScript (Microsoft's superset of JavaScript).
- Smallish, ca. 275 KiB comprising compiled JavaScript (three files), a bash script and an action.yaml.
- Appears to be unmaintained.
- Appears to be a generic caching solution for pulling external dependencies.
- States to be adaptable, includes cache configurations for `pip`, `npm` and `yarn`.
- Despite extensive documentation, I fail to quickly comprehend:
  - How to configure a different source (Docker Hub).
  - If it is also limited to downloads in the runner's "workspace".
- Pulled (?) from the "GitHub marketplace" on 2023-01-08, see github.com/marketplace/actions/cached-dependencies. On 2023-01-07 it was still there, and it is still found via the search!?!
● Docker Cache by ScribeMD
- Its source code is hosted at GitHub and uses the MIT license.
- Does provide stable releases and git tags (lots!).
- Comprises a few TypeScript scripts (Microsoft's superset of JavaScript), which are compiled into two JavaScript scripts (main/index.js and post/index.js), each 1.17 MiB large (!), plus a tiny action.yaml file which calls these.
- Appears to be well maintained.
- Appears to be a generic caching solution for Docker images.
- Explicitly denotes the use case "pull images from Docker Hub"!
- Works technically fine, but uses `docker save --output ~/.docker-images.tar`, which results in `write /home/runner/.docker_temp_XYZ: no space left on device` even with the smallest SailfishOS Platform SDK images by Coderus (ca. 1 GB, but these pull in a few additional layers).
● Rootless Docker also by ScribeMD
- Its source code is hosted at GitHub and uses the MIT license.
- Does provide stable releases and git tags (lots!).
- A small, well readable action.yaml file.
- Tiny: 2.65 KBytes
- But it downloads and directly executes the `https://get.docker.com/rootless` shell script (some 10 KBytes), which in turn downloads and unpacks (i.e. "installs") TAR archives of the required Docker components (some MBytes).
- Appears to be well maintained.
- States to provide a set of advantages over running docker conventionally in root mode.
- Renders any specific caching moot, as GitHub's `actions/cache` should suffice.
- But I have not yet determined in which directories pulled images / layers are stored (Rootless Docker's default is `~/.local/share/docker`, likely in the subdirectory `<Storage Driver, e.g. overlay2>`), i.e. those which are to be cached by GitHub's `actions/cache`.
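The open question of where a (rootless) Docker daemon stores pulled images can likely be answered by asking the daemon itself. The sketch below uses the `DockerRootDir` and `Driver` fields of `docker info`; the function names are made up, and whether the resulting path is the right item to cache with `actions/cache` is untested.

```shell
#!/usr/bin/env bash
# Sketch: ask the Docker daemon where it keeps its data.
# Function names are hypothetical.

# Print the data directory of the daemon the docker CLI talks to.
docker_root_dir() {
  docker info --format '{{.DockerRootDir}}'
}

# Print the storage driver in use, e.g. "overlay2".
docker_storage_driver() {
  docker info --format '{{.Driver}}'
}
```

A candidate path for caching would then be `"$(docker_root_dir)/$(docker_storage_driver)"`.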
● Use Podman instead; it is preinstalled on GitHub's Ubuntu 22.04 runner image, too.
  When started by a non-root user, it uses `$HOME/.local/share/containers/storage/` to store images, layers and their metadata, specifically the subdirectory `<Storage Driver>-layers` for the downloaded layers. This configuration can easily be adapted. But not all files are necessarily readable by the user, despite the user being their owner, because they have no permissions set (e.g. an `/etc/shadow` in a container image). Consequently the GitHub Actions `cache` and `cache/save` fail.
● Rootless Docker: https://github.com/ScribeMD/rootless-docker
  Very likely it exhibits the same issue as rootless Podman, which is described in the prior point.
● Docker Cache: https://github.com/ScribeMD/docker-cache
  Easily runs out of space on a GitHub runner, see details in its section.
● `download-frozen-image-v2.sh`: https://github.com/moby/moby/tree/master/contrib#readme
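The readability problem described for rootless Podman can be made visible by listing the files whose owner has no read permission. The following is a diagnostic sketch only, with a made-up function name; it does not change any permissions, as doing so would alter the stored image content.

```shell
#!/usr/bin/env bash
# Sketch: find files which even their owner is not permitted to read.
# The function name is hypothetical.

# List regular files below <dir> lacking the owner read bit.
list_owner_unreadable() {
  find "$1" -type f ! -perm -400 -print
}
```

Running `list_owner_unreadable "$HOME/.local/share/containers/storage" | wc -l` after pulling an image with Podman shows how many files a caching action running as the same user would trip over.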