Remove ACR Pull in favor of managing container settings in App Dev CD #316

nmiodice · 2019-09-24T18:31:23Z

Description

As an application developer, I don't want to rely on ACR webhooks to initiate a deployment, because doing so can result in unexpected deployments to production.

Acceptance Criteria

ASE template no longer relies on webhook to deploy
SR template no longer relies on webhook to deploy
All existing tests pass
New tests are written if necessary to assert the correct behavior

Failure modes for ACR webhook & Web App CD

The scenarios below assume the following:

“A” is a webapp. In each scenario it starts as the production slot
“B” is a webapp. In each scenario it starts as the staging slot

Scenario 1 – Failure to roll back application deployments via swap:

“A” is running image with image hash “foo”
“B” is running image with image hash “bar”
“A” and “B” are swapped
“A” (now staging) unexpectedly picks up the latest image (“bar”). I’m not sure why.
Image hash “bar” is determined to have a bug; a rollback is needed and is done by re-swapping the slots
“A” (now production) is incorrectly running “bar”
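The sequence in Scenario 1 can be traced with a small simulation. This is only an illustrative sketch: the `WebApp` class and `simulate_failed_rollback` function are hypothetical names, and the unexpected re-pull on the staging slot is modeled as a plain assignment because the issue does not identify its cause.

```python
from dataclasses import dataclass


@dataclass
class WebApp:
    name: str
    running: str  # image hash the web app is currently running


def simulate_failed_rollback(latest: str = "bar") -> dict:
    """Replay Scenario 1 and report what production runs after the rollback."""
    a = WebApp("A", "foo")  # starts as production
    b = WebApp("B", "bar")  # starts as staging
    production, staging = a, b

    # "A" and "B" are swapped: "B" is now production, "A" is now staging.
    production, staging = staging, production

    # Assumption: the staging slot re-pulls the latest image for an unknown reason.
    staging.running = latest

    # Rollback attempt: swap again, expecting "foo" to return to production.
    production, staging = staging, production

    return {"production": production.name, "image": production.running}


result = simulate_failed_rollback()
print(result)
```

Under these assumptions the rollback puts "A" back in production, but "A" is running “bar” rather than “foo”, so the swap no longer restores the previous image.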

Scenario 2 – Production is already down in this case:

“A” is running a misconfigured container that fails to start (i.e., the process dies on startup due to a bug in application code). App Service will retry (indefinitely) by pulling and deploying until the container starts up successfully.
ACR push occurs; “B” picks up new image
On a retry attempt, “A” picks up the latest image and it is deployed unexpectedly to production

Scenario 3 – Possibly a very delayed impact:

“A” is running image with image hash “foo”
ACR push occurs; “B” is running image with image hash “bar”
“A” (re)starts for some reason (you can do this via portal, or perhaps a backend server dies, or perhaps the service plan scales out); “A” picks up the latest image and it is deployed unexpectedly to production.
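Scenario 3 can be illustrated with a toy model in which a slot re-resolves its image reference every time it restarts. This is a sketch under stated assumptions, not a description of how App Service or ACR work internally: `Registry`, `Slot`, and `simulate_restart` are hypothetical names, the `latest` tag is an assumed example of a reference that moves on every push, and the pinned-hash case is included only for contrast.

```python
class Registry:
    """Toy registry mapping a mutable tag to the most recently pushed image hash."""

    def __init__(self):
        self.tags = {}

    def push(self, tag, image_hash):
        self.tags[tag] = image_hash

    def resolve(self, reference):
        # A known tag resolves to whatever was pushed last; any other
        # reference is treated as an already-pinned image hash.
        return self.tags.get(reference, reference)


class Slot:
    """Toy deployment slot that pulls its configured reference on every start."""

    def __init__(self, name, registry, reference):
        self.name = name
        self.registry = registry
        self.reference = reference
        self.running = registry.resolve(reference)

    def restart(self):
        # Assumption: a restart re-pulls the configured reference.
        self.running = self.registry.resolve(self.reference)
        return self.running


def simulate_restart(reference_for_a):
    registry = Registry()
    registry.push("latest", "foo")
    a = Slot("A", registry, reference_for_a)  # production, running "foo"
    registry.push("latest", "bar")            # push intended for staging slot "B"
    return a.restart()                         # "A" restarts some time later


print(simulate_restart("latest"))
print(simulate_restart("foo"))
```

In this model, a slot that follows the moving tag ends up on “bar” after the restart, while a slot whose reference is the hash itself stays on “foo”. That difference is the motivation for having the CD pipeline set the container image explicitly instead of relying on whatever is latest in the registry.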

Scenario 4 (low risk) – The first deployment:

“A” and “B” are newly deployed; no image is running in either slot
“A” and “B” repeatedly try to start (indefinitely?)
ACR push occurs; webhook for “B” fires
On a retry attempt, “A” picks up the latest image and it is deployed unexpectedly to production (note: similar to Scenario 1)

TechnicallyWilliams · 2019-09-24T18:50:03Z

Once this is completed, a suggested next step for us may be user story #313 ([Docs] Document Steps for Container Validation).

nmiodice added the pri-High (high priority issue) label Sep 24, 2019

nmiodice self-assigned this Sep 24, 2019

This was referenced Sep 24, 2019

Web hook should be specific to a deployment target, not all deployment targets #299

Closed

ACR push happens to production slot, not staging slot #276

Closed

nmiodice mentioned this issue Sep 25, 2019

Remove ACR Webhooks; Do not manage container settings via Terraform #324

Merged

nmiodice closed this as completed Sep 30, 2019
