walinuxagent fails to start when host is using EC keys [BUG] #1550
Comments
Btw, I would very much appreciate some guidance to patch our setup ASAP, e.g. "use sed to replace /usr/bin/openssl rsa with /usr/bin/openssl ec in file XXX".
This PR provides a fix for the issue: c5c9c39. My sticking point now is: How can I update the /usr/sbin/waagent2.0 in an Azure VM instance? If I could patch it, even manually, I would be able to unblock myself. But when I make this fix to the file, it doesn't seem to get picked up, and I keep getting the error message showing the use of Any pointers on patching
For anyone else looking for a way to patch walinuxagent on VMs, I found a way using cloud-init, which (I think/appears to) runs as one of the first steps in the walinuxagent sequence, before the certs are installed. The initial sticking point is that even if you patch the This is the section of my bootcmd:
# Fix a bug in walinuxagent RE ECC certs: https://github.com/Azure/WALinuxAgent/issues/1550
- cloud-init-per once update-waagent-python sed -i -e 's/RunGetOutput(Openssl + " rsa -in "/RunGetOutput(Openssl + " pkey -in "/' /usr/sbin/waagent2.0
- cloud-init-per once update-waagent-service sed -i -e 's:^ExecStart=/usr/bin/python3 -u /usr/sbin/waagent -daemon$:ExecStart=/usr/bin/python2 -u /usr/sbin/waagent2.0 -daemon:' /lib/systemd/system/walinuxagent.service
- cloud-init-per once reload-systemd systemctl daemon-reload && systemctl restart walinuxagent
This PR is ready to go, IMO - it's a one-line fix. It works in our environment (though it was difficult to figure out how to install). Please consider merging it for the next version of waagent. If there's any way to get and test a hotfix version of waagent, I would be happy to help test it.
Fixed in #1552. Closing the issue.
Thank you for your help on this, @vrdmr. Do you know about how long it will be before this fix works its way to new Azure VMs? I'm trying to evaluate whether I should try to automate the patch process in our VMs, or just wait.
This would be released as part of the next agent release (hopefully 2.2.42), but the issue you pointed out is in the agent embedded in the distro images themselves, and that has a slower cadence (it could take a couple of months to be generalized in all the Azure images).
@johncrim @vrdmr - I am re-opening this issue. Some of the client VMs are running older versions of OpenSSL, and in the past we've had issues when making changes in this area. I found a couple of reports that the pkey argument is missing in openssl, for example: outroll/vesta#1825. I think we need to make this change in 2 steps: first we'll collect telemetry to see if there are any clients not supporting this option, and then we'll make the change based on that. In the meantime, I will revert the change in #1552. @johncrim - I assume you are patching on your side in the meantime? Thanks
UGGGH. @narrieta - I have a patch, but it's pretty poor. It involves using the approach described above, which requires switching to waagent2.0 from waagent (when I created the patch, I wasn't aware that waagent2.0 is old code), which causes major bugs with OmsAgent. This is still a major issue for us in our journey to launch on Azure. Right now, our choice is between using my current patch (which prevents OmsAgent from working), or not using WALinuxAgent (which means no other extensions get deployed to Linux VMs in VMSets, which basically means we can't use Service Fabric in Azure or Azure monitoring). We've been waiting for 3 months for the patch to get deployed so that both of these issues go away. Is this currently deployed to Azure? I haven't seen it, so am still using my workaround, which causes the OmsAgent errors. I'm wondering how the issue you referenced popped up. Note that Azure keyvault now supports ECDSA certs (though UI support isn't there last I checked, the API support is), so the standard approach for copying certs to new VMs will break walinuxagent whenever an EC cert is used, if this isn't addressed. EC certs are widely considered superior to RSA certs, so this bug is holding the platform back. As opposed to rolling this back, I would recommend checking the openssl version, and only using
Yes, that is the idea, but first we need to add telemetry to understand which old openssl versions are in use.
Yes, this is old code and you shouldn't use it to replace waagent. Your patch could go here instead: https://github.com/Azure/WALinuxAgent/blob/master/azurelinuxagent/common/utils/cryptutil.py#L56
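For readers following the version-check idea discussed above, here is a minimal sketch of what such a check might look like. It is not code from WALinuxAgent: the function names `supports_pkey` and `key_subcommand` are hypothetical, the input is assumed to be the text printed by `openssl version`, and the 1.0.0 threshold is taken from the statement in this thread that pkey was added in 1.0.0. Unrecognized version strings are assumed to mean "keep the old rsa behavior".

```python
import re


def supports_pkey(version_output):
    """Return True if the `openssl version` text indicates OpenSSL >= 1.0.0."""
    match = re.match(r"OpenSSL\s+(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        # Assumption: an unknown vendor or format keeps the old behavior.
        return False
    version = tuple(int(part) for part in match.groups())
    return version >= (1, 0, 0)


def key_subcommand(version_output):
    """Pick the openssl subcommand (hypothetical helper) used to read a private key."""
    return "pkey" if supports_pkey(version_output) else "rsa"
```

A caller would pass the output of `openssl version` to `key_subcommand` and use the result in place of the hard-coded `rsa` argument.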
@narrieta: Thank you for the response. I understand the usefulness of knowing what openssl versions are in use, but for this specific issue, all that matters is whether pkey is supported or not. Since pkey was added in 1.0.0, we know that at least one waagent user uses openssl < 1.0.0. RE patching waagent instead of waagent2.0 - I understand that this is desired, but I don't see a reasonable way of completing the patch on the waagent codebase instead of waagent2.0 when provisioning a VMSet in Azure. waagent2.0 has the virtue of having raw .py files which can be edited and then run, but waagent uses pre-compiled python modules which can't be easily patched (I've tried, and the changes didn't have any effect - admittedly, I don't program in Python). Could you provide the commands, which can be run in cloud-init (early on in the first boot), to patch waagent as needed, and then continue the normal flow of waagent so the other extensions can be run? If I had a better patch, I'd be happier about waiting...
@johncrim Not only do we want to know the openssl versions, but more importantly we want to know the best strategy to fall back to rsa. RE patching: have you tried patching the py file directly? When I patch live VMs I simply ignore the precompiled file and change the py file directly.
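One possible fallback strategy, sketched here purely for illustration, is to try `openssl pkey -in` first and retry with `openssl rsa -in` only if that fails. This is an assumption about how a fallback could work, not the approach the agent adopted. The names `read_private_key` and `_run_openssl` are hypothetical, and the runner is injectable so the logic can be exercised without a real openssl binary.

```python
import subprocess


def _run_openssl(args):
    """Run openssl and return its stdout; stderr is discarded so status text does not mix with the key."""
    return subprocess.check_output(args, stderr=subprocess.DEVNULL)


def read_private_key(key_path, openssl="openssl", run=_run_openssl):
    """Return the key bytes, trying `pkey` first and falling back to `rsa`."""
    last_error = None
    for subcommand in ("pkey", "rsa"):
        try:
            return run([openssl, subcommand, "-in", key_path])
        except (subprocess.CalledProcessError, OSError) as error:
            # Either the subcommand is unsupported or the key type is not handled.
            last_error = error
    # Both attempts failed: surface the error from the last attempt.
    raise last_error
```

With this shape, an openssl build that lacks pkey still works for RSA keys, while a build that has pkey also handles EC keys. An EC key on a build without pkey still fails, but only after both attempts.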
@narrieta: Yes, I have tried patching the py file directly, but my changes weren't run. I also tried clearing out the .pyc modules, which also didn't work. That's why I ended up changing the waagent2.0 files, because that was the only change that was picked up.
@narrieta: Please consider wrapping the Right now, the behavior is to abort the rest of waagent if the openssl call fails, and it's difficult to troubleshoot. This basically means that waagent (and by extension Azure VMs) are unusable if you reference an EC cert in your arm template. If the remainder of waagent were run (after logging an error), it would be reasonable to make the
@narrieta Agree, the error handling should be improved.
@johncrim Agree, the error handling should be improved.
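To illustrate the error-handling request above (log the failure and let the rest of the agent continue), here is a small sketch. It does not reflect the agent's actual structure: `convert_all`, the `convert` callback, and the logger name are all hypothetical, and the broad `except` is a deliberate assumption that any single certificate failure should be non-fatal.

```python
import logging

logger = logging.getLogger("cert-sketch")  # hypothetical logger name


def convert_all(cert_paths, convert):
    """Apply `convert` to each path, logging failures instead of aborting."""
    converted, failed = [], []
    for path in cert_paths:
        try:
            converted.append(convert(path))
        except Exception as error:  # broad on purpose: one bad cert must not stop the rest
            logger.error("Could not process %s: %s", path, error)
            failed.append(path)
    return converted, failed
```

Returning the failed paths alongside the successes lets the caller report which certificates were skipped, which addresses the troubleshooting difficulty mentioned in the thread.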
Support for ECDSA certs has not been added to the Agent yet. They can be deployed using the Keyvault extension: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/key-vault-linux
We're seeing the following error messages every 10 seconds on new Ubuntu hosts. Needless to say, none of the VM extensions are running, so we're unable to create new hosts.
The change is that the new hosts have ECDSA certs in their cert store, configured via an ARM template.
The use of (non-walinuxagent related) elliptic curve certs appears relevant because the listed command fails:
but this command succeeds:
The elliptic curve certs are referenced in the VM Scaleset portion of the ARM templates like so:
Where cert1 and cert2 were previously RSA certs.
Distro and WALinuxAgent details:
Note that this scenario should be completely supported, b/c using such ARM templates with Azure KeyVault references is a standard way of deploying certs to VMs, and KeyVault supports ECC certificates (though they still don't have support in the UI, they work via direct API access). Info here: https://docs.microsoft.com/en-us/azure/key-vault/about-keys-secrets-and-certificates