Description
Lately, our team has been experiencing issues with ZIP deployments to App Service via GitHub Actions workflows. However, GitHub Actions itself does not appear to be the cause of this particular issue.
Over the past several weeks, we've sporadically seen the following error during deployments to App Service:
Deployment has been stopped due to SCM container restart. The restart can happen due to a management operation on site. Do not perform a management operation and a deployment operation in quick succession. Adding a small delay can help avoid any conflicts.
As I mentioned, our team uses GitHub Actions to deploy our applications to Azure App Service, using the ZIP deploy method via the azure/webapps-deploy action. Under the hood, this should use the Kudu APIs to deploy, much as the Azure CLI az webapp deploy command does.
Before the past few weeks, we had not seen these errors in the several months that we've been deploying our apps to App Service. It's also worth noting that the errors started occurring without any changes being made to the workflow.
In researching this issue, I've come to understand that Kudu uses a separate SCM container that provides the functionality needed for ZIP deployments, and that this container is restarted any time a "management operation" (e.g. start, stop) is performed on a web app. Judging from the error message, if that SCM container is restarting while a deployment is in progress, Kudu fails the deployment with the error message I shared above.
In our case, we do stop our web apps prior to our standard deployments, because we apply database migrations, and we need to make sure that the applications are not accessing the database while we apply migrations. Once the DB migrations have been applied, we then deploy and start the web apps.
To mitigate the issue for the time being, we've added a delay between stopping the apps and deploying the latest build. We started with 30 seconds, which initially seemed to mitigate the issue. However, the same errors began appearing again this week, so we've now increased the delay to 60 seconds, hoping this will avoid the problem.
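For anyone wanting to reproduce the workaround outside of a workflow file, here is a rough Python sketch of the sequence described above: stop, migrate, wait, deploy, start. It is not our actual workflow. It assumes the Azure CLI (az webapp stop / deploy / start) in place of the azure/webapps-deploy action, and the function names, the migrate callback, and where exactly the delay is placed are all illustrative assumptions.

```python
import subprocess
import time
from typing import Callable, Dict, List


def az_commands(app: str, group: str, zip_path: str) -> Dict[str, List[str]]:
    """Build the Azure CLI commands for one web app (names are placeholders)."""
    target = ["--name", app, "--resource-group", group]
    return {
        "stop": ["az", "webapp", "stop", *target],
        "deploy": ["az", "webapp", "deploy", *target,
                   "--src-path", zip_path, "--type", "zip"],
        "start": ["az", "webapp", "start", *target],
    }


def _run_checked(cmd: List[str]) -> None:
    # Raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(cmd, check=True)


def deploy_with_delay(
    app: str,
    group: str,
    zip_path: str,
    migrate: Callable[[], None],
    delay_seconds: float = 60.0,
    run: Callable[[List[str]], None] = _run_checked,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop the app, migrate, wait for the SCM container to settle, deploy, start."""
    cmds = az_commands(app, group, zip_path)
    run(cmds["stop"])        # management operation: triggers the SCM restart
    migrate()                # hypothetical hook for applying DB migrations
    sleep(delay_seconds)     # arbitrary delay; 30 seconds was not always enough for us
    run(cmds["deploy"])
    run(cmds["start"])
```

The run and sleep parameters are only there so the ordering can be exercised without touching Azure. The delay itself is still a guess, which is the core of the problem.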
My ultimate question is this...
Rather than put the burden on consumers to add arbitrary delays, can the underlying deployment functionality that is part of Kudu/App Service be updated to wait, or retry, if the SCM container is in the process of restarting?
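To illustrate the kind of behavior I'm asking for, here is a minimal client-side sketch of a retry that only kicks in when the failure message mentions the SCM container restart. This is not how Kudu or the deploy action works today. DeploymentError, deploy_with_retry, and the backoff values are hypothetical names and numbers, and the deploy callable stands in for whatever actually performs the ZIP deployment.

```python
import time
from typing import Callable

# Substring taken from the error message quoted earlier in this issue.
SCM_RESTART_MARKER = "SCM container restart"


class DeploymentError(Exception):
    """Hypothetical error type raised by the deploy callable on failure."""


def deploy_with_retry(
    deploy: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run deploy(), retrying with exponential backoff on SCM restart errors.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            deploy()
            return attempt
        except DeploymentError as exc:
            # Re-raise unrelated failures, and give up after the last attempt.
            if SCM_RESTART_MARKER not in str(exc) or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 10s, 20s, 40s, ...
    raise ValueError("attempts must be at least 1")
```

Something along these lines inside the deployment tooling would remove the need for every consumer to guess at a safe delay.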
If we continue to see this issue, our team will likely have to consider a switch to FTPS for deployments, to avoid the issues related to the SCM container. That would be a shame, however, because it would prevent us from authenticating with managed identities, which can be restricted to only accessing resources from GitHub Actions workflows. Instead, we'd have to use basic FTP credentials that could be used from "anywhere".