Remove ACR Pull in favor of managing container settings in App Dev CD #316

Closed
4 tasks done
nmiodice opened this issue Sep 24, 2019 · 1 comment
Labels
pri-High High priority issue

Comments

@nmiodice

nmiodice commented Sep 24, 2019

Description

As an application developer, I don't want to rely on ACR webhooks to initiate a deployment, because doing so can result in unexpected deployments to production.

Acceptance Criteria

  • The ASE template no longer relies on a webhook to deploy
  • The SR template no longer relies on a webhook to deploy
  • All existing tests pass
  • New tests are written if necessary to assert the correct behavior

Failure modes for ACR webhook & Web App CD

The scenarios below assume that:

  • “A” is a web app. In each scenario, it starts as the production slot
  • “B” is a web app. In each scenario, it starts as the staging slot

Scenario 1 – Failure to rollback application deployments via swap:

  • “A” is running the image with hash “foo”
  • “B” is running the image with hash “bar”
  • “A” and “B” are swapped
  • “A” (now staging) unexpectedly picks up the latest image (“bar”). I’m not sure why.
  • Image hash “bar” is determined to have a bug; a rollback is needed and is done by re-swapping the slots
  • “A” (now production) is incorrectly running “bar”
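To make the Scenario 1 mechanism concrete, here is a toy Python model. It is not Azure's actual behavior or API. It assumes, as the scenario suggests, that a slot re-resolves its configured image reference whenever it restarts, and it contrasts a mutable tag (`latest`) with a pinned image digest. `App`, `scenario_1`, and the `sha256:foo` / `sha256:bar` digests are hypothetical names for illustration. Pinning each slot to an explicit digest from CD is one possible way to manage container settings; it is an assumption here, not necessarily what this issue's fix implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class App:
    """Hypothetical model of a web app ("A" or "B") and the image it is configured with."""
    name: str
    image_ref: str                 # a mutable tag such as "latest", or a pinned "sha256:..." digest
    running: Optional[str] = None

    def restart(self, registry: Dict[str, str]) -> None:
        # Assumption: every (re)start re-resolves image_ref against the registry.
        if self.image_ref.startswith("sha256:"):
            self.running = self.image_ref            # a digest always resolves to itself
        else:
            self.running = registry[self.image_ref]  # a tag resolves to whatever it points at now


def scenario_1(pin_digests: bool) -> str:
    """Replays Scenario 1 and returns the image production runs after the rollback swap."""
    registry = {"latest": "sha256:foo"}
    a = App("A", "sha256:foo" if pin_digests else "latest")
    a.restart(registry)                              # A (production) runs "foo"

    registry["latest"] = "sha256:bar"                # ACR push of the new image
    b = App("B", "sha256:bar" if pin_digests else "latest")
    b.restart(registry)                              # B (staging) runs "bar"

    slots = {"production": a, "staging": b}
    slots["production"], slots["staging"] = slots["staging"], slots["production"]  # swap

    slots["staging"].restart(registry)               # A, now staging, restarts for any reason

    # "bar" has a bug: roll back by swapping the slots again.
    slots["production"], slots["staging"] = slots["staging"], slots["production"]
    return slots["production"].running


print(scenario_1(pin_digests=False))  # sha256:bar (the rollback does not restore "foo")
print(scenario_1(pin_digests=True))   # sha256:foo (the rollback restores the old image)
```

In this model, the rollback only works when each slot is tied to an immutable digest, because a restart can no longer silently move a slot to whatever `latest` currently points at.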

Scenario 2 – Production is already down in this case:

  • “A” is running a misconfigured container that fails to start (i.e., the process dies on startup due to a bug in the application code). App Service will retry indefinitely, pulling and deploying the image until it starts up successfully.
  • ACR push occurs; “B” picks up the new image
  • On a retry attempt, “A” picks up the latest image, and it is deployed unexpectedly to production

Scenario 3 – Possibly a very delayed impact:

  • “A” is running the image with hash “foo”
  • ACR push occurs; “B” is now running the image with hash “bar”
  • “A” (re)starts for some reason (e.g., a manual restart via the portal, a backend server dying, or the App Service plan scaling out); “A” picks up the latest image, and it is deployed unexpectedly to production.

Scenario 4 (low risk) – The first deployment:

  • “A” and “B” are newly deployed; no image is running in either slot
  • “A” and “B” repeatedly try to start (indefinitely?)
  • ACR push occurs; webhook for “B” fires
  • On a retry attempt, “A” picks up the latest image, and it is deployed unexpectedly to production (note: similar to Scenario 2)
@TechnicallyWilliams
Contributor

TechnicallyWilliams commented Sep 24, 2019

Once this is completed, a suggested next step for us may be user story #313 ([Docs] Document Steps for Container Validation).
