The following issues were identified with the ACR webhook approach, so it will be removed as part of #316.
Failure modes for ACR webhook & Web App CD
The scenarios below assume the following:
“A” is a webapp. In each scenario it starts as the production slot.
“B” is a webapp. In each scenario it starts as the staging slot.
Scenario 1 – Failure to roll back application deployments via swap:
“A” is running image with image hash “foo”
“B” is running image with image hash “bar”
“A” and “B” are swapped
“A” (now staging) unexpectedly picks up the latest image (“bar”). I’m not sure why.
Image hash “bar” is determined to have a bug; a rollback is needed and is done by re-swapping the slots
“A” (now production) is incorrectly running “bar”
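The sequence in Scenario 1 can be replayed with a toy model. This is a minimal sketch, not App Service code: the `Slot` class and the `swap` and `repull_latest` helpers are hypothetical names introduced here for illustration, and the re-pull in step 4 is modeled simply as "the webapp starts running whatever the latest image is", since the actual cause is not known.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy stand-in for a webapp; not an App Service API."""
    name: str      # "A" or "B"
    role: str      # "production" or "staging"
    running: str   # hash of the image currently running


def swap(x: Slot, y: Slot) -> None:
    # A swap exchanges the roles; each webapp keeps the image it is running.
    x.role, y.role = y.role, x.role


def repull_latest(slot: Slot, latest: str) -> None:
    # Assumption: the unexpected behavior is the slot re-pulling the latest image.
    slot.running = latest


a = Slot("A", "production", "foo")
b = Slot("B", "staging", "bar")
latest = "bar"

swap(a, b)                # "A" and "B" are swapped; "B" now serves "bar" in production
repull_latest(a, latest)  # "A" (now staging) unexpectedly picks up "bar"
swap(a, b)                # rollback attempt: re-swap the slots

production = a if a.role == "production" else b
print(production.name, production.running)
```

In this model the re-swap puts “A” back in production, but it no longer holds “foo”, so the rollback has no effect.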
Scenario 2 – Production is already down in this case:
“A” is running a misconfigured container that fails to start (e.g., the process dies on startup due to a bug in the application code). App Service will retry (indefinitely), pulling and deploying the image until it starts successfully.
ACR push occurs; “B” picks up new image
On a retry attempt, “A” picks up the latest image and it is deployed unexpectedly to production
Scenario 3 – Possibly a very delayed impact:
“A” is running image with image hash “foo”
ACR push occurs; “B” is running image with image hash “bar”
“A” (re)starts for some reason (you can do this via portal, or perhaps a backend server dies, or perhaps the service plan scales out); “A” picks up the latest image and it is deployed unexpectedly to production.
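Scenario 3 can be sketched the same way. The model below assumes that both slots reference the same floating tag and that every (re)start resolves that tag again; both are assumptions made for illustration, and `Registry` and `WebApp` are hypothetical classes, not ACR or App Service APIs.

```python
class Registry:
    """Toy registry mapping a tag to an image hash (hypothetical)."""

    def __init__(self) -> None:
        self.tags: dict[str, str] = {}

    def push(self, tag: str, image_hash: str) -> None:
        self.tags[tag] = image_hash

    def resolve(self, tag: str) -> str:
        return self.tags[tag]


class WebApp:
    """Toy webapp that re-resolves its tag on every start (assumed behavior)."""

    def __init__(self, name: str, registry: Registry, tag: str) -> None:
        self.name = name
        self.registry = registry
        self.tag = tag
        self.running: str | None = None

    def start(self) -> None:
        self.running = self.registry.resolve(self.tag)


registry = Registry()
registry.push("latest", "foo")

a = WebApp("A", registry, "latest")  # production
b = WebApp("B", registry, "latest")  # staging
a.start()                            # "A" is running "foo"

registry.push("latest", "bar")       # ACR push occurs
b.start()                            # webhook fires for "B" only
before_restart = a.running

a.start()                            # "A" restarts (portal, host failure, scale-out)
after_restart = a.running
print(before_restart, after_restart)
```

The gap between the push and the restart can be arbitrarily long, which is why the impact may be very delayed.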
Scenario 4 (low risk) – The first deployment:
“A” and “B” are newly deployed; No image is running in either slot
“A” and “B” repeatedly try to start (indefinitely?)
ACR push occurs; webhook for “B” fires
On a retry attempt, “A” picks up the latest image and it is deployed unexpectedly to production (note: similar to Scenario 1)
Description
As an application developer, I'd like to deploy new instances of my application to a staging slot so that I can validate them prior to releasing to production and quickly roll back in the case of a bad deployment.
Acceptance Criteria
Reference: [Done-Done Checklist](https://github.com/Microsoft/code-with-engineering-playbook/blob/master/Engineering/BestPractices/DoneDone.md)