Add ability to stop containers asynchronously #579
Motivation
We've used Kamal to move some long-running jobs (ffmpeg transcoding) from AWS to Hetzner. The savings in $$-to-big-tech are amazing, thanks a lot for this project 🙌
The only problem we have is that we cannot find a way to prevent our jobs from being stopped midway while also having quick deployments as part of our Monolith's CI/CD system.
The simplest solution I could think of is that Kamal runs the docker stop command with a generous grace period and forgets about my workers.
I gave a bit more context in this discussion: #491
Implementation
stop_asynchronously
New configuration flag that can be applied per role: stop_asynchronously. This is more or less how we are using it:
Running docker stop in the background
The command that is run is something like:
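For illustration only, and not necessarily the exact command from this PR, here is a minimal Ruby sketch of how a backgrounded stop could be assembled. The helper name `background_stop_command`, the container name, the output redirection, and the one-hour default grace period are all assumptions; `docker stop -t` and `nohup` are the standard tools.

```ruby
require "shellwords"

# Hypothetical helper (not part of Kamal): builds a shell command that asks
# Docker to stop a container with a long grace period and returns right away,
# so the deploy does not wait for the worker to finish its current job.
def background_stop_command(container, grace_period_seconds: 3600)
  stop = ["docker", "stop", "-t", grace_period_seconds.to_s, container].shelljoin
  # nohup plus a trailing "&" detaches the stop from the session that started it.
  "nohup #{stop} > /dev/null 2>&1 &"
end

# The container name below is made up for the example.
puts background_stop_command("app-workers-abc123")
```

The grace period is what gives a long-running job time to finish: Docker sends the stop signal immediately and only kills the container once the timeout expires.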
Keeping track of stopped containers
We persist the records in a simple plain text file that looks like this:
If the container is still up in a subsequent deploy that happens after the recorded stop time, then Kamal stops it synchronously.
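The rule above can be sketched in Ruby as follows. The record format (one container name and one Unix timestamp per line) and every name in the block are assumptions for illustration; the file layout actually used by this PR is not reproduced in this description.

```ruby
# Hypothetical record: a container name and the time by which its
# background stop should have completed.
StopRecord = Struct.new(:container, :deadline)

# Parses lines such as "app-workers-abc123 1700003600" (assumed format),
# skipping blank lines.
def parse_stop_records(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    container, deadline = line.split(" ", 2)
    StopRecord.new(container, Time.at(Integer(deadline)))
  end
end

# Returns the containers that should now be stopped synchronously: those
# whose recorded stop time has passed but which are still running.
def overdue_containers(records, running:, now: Time.now)
  records
    .select { |record| now > record.deadline && running.include?(record.container) }
    .map(&:container)
end

records = parse_stop_records(<<~FILE)
  app-workers-abc123 1700000000
  app-workers-def456 1700007200
FILE

still_running = ["app-workers-abc123", "app-workers-def456"]
p overdue_containers(records, running: still_running, now: Time.at(1700003600))
```

In this example only the first container is overdue, because the second one's recorded stop time is still in the future at the moment of the check.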
Closing thoughts
I understand long-running jobs may not be the focus of this tool, and therefore the extra complexity in this PR might not be justified. The reason I built this is that I'd rather maintain a fork of Kamal than deploy our own Kubernetes cluster. Guidance on reducing this complexity would be greatly appreciated if you think this can be a good fit for Kamal.
Also: I am not sure that the way I encapsulated this behavior is in line with Kamal's architecture. I am more than open to feedback and happy to make any modifications that would align it better with the project.