When using Docker's live-restore feature, Nomad currently respawns containers (creates new allocations) unnecessarily during Docker daemon restarts. This behavior counteracts the benefits of using live-restore and might cause disruptions in environments where Docker daemon updates or restarts are necessary.
Current Behavior
The Docker daemon is configured with live-restore enabled
When the Docker daemon restarts, containers continue running as expected
However, Nomad creates new allocations for these containers, effectively respawning them
This results in unnecessary disruption and resource usage
Expected Behavior
Nomad should be aware of Docker's live-restore feature
When the Docker daemon restarts, Nomad should:
Detect that the daemon is unavailable
Wait for a configurable timeout period
Once the daemon is available again, check the actual state of containers
Only create new allocations if the containers are genuinely not running
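A minimal sketch of the per-task decision described above follows. It is written in Python for brevity; Nomad itself is written in Go. Every name in it (`decide_after_daemon_outage`, `Decision`, `daemon_available`, `container_running`) is hypothetical and is not part of Nomad's actual driver API. The timeout corresponds to the proposed `docker_live_restore_timeout` option.

```python
import time
from enum import Enum
from typing import Callable


class Decision(Enum):
    KEEP = "keep"        # container survived the restart: re-attach to it
    REPLACE = "replace"  # container is genuinely gone: create a new allocation


def decide_after_daemon_outage(
    daemon_available: Callable[[], bool],
    container_running: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Decision:
    """Decide what to do with one task after losing contact with the Docker daemon."""
    deadline = clock() + timeout_s
    # Poll until the daemon is reachable again, but give up after timeout_s
    # (the hypothetical docker_live_restore_timeout).
    while not daemon_available():
        if clock() >= deadline:
            # The daemon did not come back in time, so fall back to replacing the task.
            return Decision.REPLACE
        sleep(poll_interval_s)
    # The daemon is back: check the container's real state and only
    # replace the allocation if the container is not running.
    return Decision.KEEP if container_running() else Decision.REPLACE
```

In a real driver, `daemon_available` might wrap a ping against the Docker Engine API, and `container_running` might wrap a container inspect call. The clock and sleep functions are injectable only so that the waiting logic can be tested without real delays.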
Proposed Solution
Add a new configuration option for the Docker driver in Nomad, such as:
```hcl
config {
  docker_live_restore_timeout = "5m"
}
```
This would allow Nomad to wait for the specified duration before deciding to create new allocations when it loses connection to the Docker daemon.
Additional Context
This feature would be particularly useful for environments where Docker daemon updates or restarts are necessary, such as for security patches or version upgrades
It would allow for more seamless operations and reduce unnecessary container churn
Possible Implementation
Add a new configuration option to the Docker driver
Modify the Docker driver's health check mechanism to be aware of this timeout
Implement a reconciliation process that checks the actual state of containers with Docker after a daemon restart
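The reconciliation step could look roughly like the Python sketch below. It is a hypothetical illustration and not Nomad code: `TrackedTask` and `reconcile` are invented names. `actual_state` is assumed to map container IDs to the Docker status strings (e.g. `running`, `exited`) that the daemon reports after it restarts. For simplicity, only `running` counts as alive here, and any container the daemon no longer knows about is treated as gone.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class TrackedTask:
    alloc_id: str       # allocation Nomad believes is running this task
    container_id: str   # container Nomad started for it before the restart


def reconcile(tracked: List[TrackedTask], actual_state: Mapping[str, str]) -> Dict[str, List[str]]:
    """Split tracked tasks into allocations to re-attach and allocations to replace."""
    result: Dict[str, List[str]] = {"reattach": [], "replace": []}
    for task in tracked:
        # None means the daemon no longer reports this container at all.
        status = actual_state.get(task.container_id)
        if status == "running":
            result["reattach"].append(task.alloc_id)
        else:
            result["replace"].append(task.alloc_id)
    return result
```

For example, given three tracked tasks whose containers are reported as `running`, `exited`, and missing, only the first allocation is re-attached and the other two are replaced.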
Impact
This feature would improve Nomad's behavior in environments using Docker's live-restore, reducing unnecessary allocation churn and making Docker daemon maintenance less disruptive.
Hello @eduardolmedeiros, thanks for suggesting this. However, in the latest version, when the daemon goes down, Nomad is unable to determine whether the containers are running, so the allocations are classified as pending. Once the daemon is back up, it reports the running containers to Nomad again and the agent picks them up. No new allocations are spawned. If you have a concrete example where the containers are redeployed, we would love to see it and learn whether something needs fixing. Feel free to reach out again if you keep running into problems; we are always looking to make Nomad better.