Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

worker auth: load correct records and ability to remove unused #4822

Closed
wants to merge 2 commits into from

Conversation

irenarindos
Copy link
Collaborator

Note: this will fail until the corresponding Nodeenrollment change is released

Copy link
Collaborator

@johanbrandhorst johanbrandhorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix the subsequent node rotation attempt, but will it fix the dialing errors we were seeing too? This relies on us having at least two active controller connections when the split brain error happens, since one will be lost during the EOF error, and the other will be required to try to rotate again. In a perfect world, I think that means the second rotation should go through and the split brain scenario should be resolved, but what if there's only a single controller and this happens? Since the worker can't tell the controller to delete unused credentials until it's able to complete dialing, I think we'll need a fix on the controller side of the dialing too.

@irenarindos
Copy link
Collaborator Author

This should fix the subsequent node rotation attempt, but will it fix the dialing errors we were seeing too? This relies on us having at least two active controller connections when the split brain error happens, since one will be lost during the EOF error, and the other will be required to try to rotate again. In a perfect world, I think that means the second rotation should go through and the split brain scenario should be resolved, but what if there's only a single controller and this happens? Since the worker can't tell the controller to delete unused credentials until it's able to complete dialing, I think we'll need a fix on the controller side of the dialing too.

It will fix the dialing errors; those were a result of GenerateServerCertificates calling LoadNodeInformation for a key id and the old logic there. We incorrectly assumed that the key id passed in was current and assigned the current auth record creds to the returned node information.

@irenarindos irenarindos added this to the 0.16.x milestone May 24, 2024
@irenarindos irenarindos closed this Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants