Replies: 1 comment
I've submitted a PR: |
-
Context
At Happy Scribe, we are using Kamal to orchestrate a bunch of workers that run FFmpeg to do video transcoding. FFmpeg is not great at distributed computing, and some video transcoding jobs can take more than 30 minutes.
Kamal has been extremely useful to us because it lets us run those jobs on cheap hardware at Hetzner, while we keep the rest of our application running on Heroku. Thanks a lot for this project 🙌
The only problem we have is that we cannot find a way to prevent our jobs from being stopped midway while also keeping deployments quick as part of our monolith's CI/CD system.
Configuration that almost works
Right now, we have Kamal configured with a `stop_timeout` of 10 minutes, which calls `docker stop -t 600`. This first sends a SIGTERM to the worker and then waits 10 minutes before sending a SIGKILL. In most queuing systems, upon receiving the SIGTERM, workers stop picking up new jobs but still finish those that are in progress. While the old worker container is finishing its jobs, Kamal has already started a new one with the new code. We then have our queuing system configured to re-enqueue a job if it is still running close to 10 minutes after the SIGTERM.
This works reasonably well. However, we would like the stop timeout to be something more like 30 minutes or 1 hour, while at the same time not blocking subsequent deployments.
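For readers unfamiliar with this pattern, here is a minimal sketch of the worker-side behaviour described above. It is not our actual queuing system: the `GracefulWorker` class, the `safety_margin` parameter, and the idea of a job as a list of checkpointable steps are all assumptions made for illustration.

```python
import signal
import time
from collections import deque


class GracefulWorker:
    """Hypothetical worker: after SIGTERM it takes no new jobs, and it
    re-enqueues the current job if the grace period is about to run out."""

    def __init__(self, queue, grace_seconds=600, safety_margin=30,
                 clock=time.monotonic):
        self.queue = queue                  # deque of pending jobs
        self.grace_seconds = grace_seconds  # matches `docker stop -t 600`
        self.safety_margin = safety_margin  # assumed buffer before SIGKILL
        self.clock = clock
        self.term_received_at = None

    def install(self):
        # Register the handler; must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def handle_sigterm(self, signum=None, frame=None):
        # Only record the first SIGTERM; the main loop reacts to it.
        if self.term_received_at is None:
            self.term_received_at = self.clock()

    def shutting_down(self):
        return self.term_received_at is not None

    def deadline_near(self):
        if self.term_received_at is None:
            return False
        elapsed = self.clock() - self.term_received_at
        return elapsed >= self.grace_seconds - self.safety_margin

    def run_job(self, job):
        """Run a job step by step. Returns True if it finished,
        False if it was re-enqueued because SIGKILL is imminent."""
        for step in job["steps"]:
            if self.deadline_near():
                self.queue.append(job)
                return False
            step()
        return True

    def run(self):
        # Stop picking up new jobs once SIGTERM has been received;
        # a job already in progress is allowed to continue.
        while self.queue and not self.shutting_down():
            self.run_job(self.queue.popleft())
```

A single FFmpeg invocation is not naturally divisible into steps like this, so in practice the deadline check would more likely live in whatever supervises the FFmpeg process.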
Proposal
What I am envisioning is Kamal just calling `docker stop -t stop_timeout` in the background and "forgetting" about the container. To have a bit of persistence and be able to monitor this, we could also create some files in the host's file system to keep track of which PID is responsible for killing which container, and then check that things are in good shape in subsequent deployments.
Is this something you'd be willing to explore (perhaps under a configuration flag)? If so, I'll work on a PR and submit it.
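To make the idea more concrete, here is a rough sketch of what "stop in the background and track it with a file" could look like on the host. This is not how Kamal works today and not a proposed implementation: the function names, the JSON record layout, and the state directory are all hypothetical. The `command` parameter exists only so the sketch can be exercised without Docker.

```python
import json
import os
import subprocess
import time
from pathlib import Path


def stop_in_background(container, stop_timeout, state_dir, command=None):
    """Start `docker stop -t <stop_timeout> <container>` detached from the
    caller and write a small record of which PID is stopping which container."""
    if command is None:
        command = ["docker", "stop", "-t", str(stop_timeout), container]
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    proc = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # survive the end of the deploy session (POSIX)
    )
    record = {
        "container": container,
        "pid": proc.pid,
        "stop_timeout": stop_timeout,
        "started_at": time.time(),
    }
    path = state_dir / f"{container}.json"  # hypothetical file layout
    path.write_text(json.dumps(record))
    return path


def load_records(state_dir):
    """Read back every tracking record, e.g. at the next deployment."""
    return [json.loads(p.read_text())
            for p in sorted(Path(state_dir).glob("*.json"))]


def pid_alive(pid):
    """Best-effort liveness check. PIDs can be reused, so a real check
    should also confirm the process is the expected `docker stop`."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True
```

A subsequent deployment could call `load_records` and `pid_alive` to see which old containers are still draining, and clean up records whose stopper process has exited.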
If not, is there any alternative approach you'd recommend?
Note: I know this would be "easy" with Kubernetes, but honestly I'd rather maintain my own fork of Kamal than our own Kubernetes cluster.