scaling policy created with wrong namespace if namespace is not defined in job file #24039
Comments
Hi @eduardolmedeiros and thanks for raising this issue. The component in question is Nomad itself, since the Nomad Autoscaler doesn't create scaling policies; that happens during the Nomad job registration phase. I'll move this into the Nomad repository.
Hi @eduardolmedeiros! I was able to reproduce this. It looks like the scaling policy is being created, but the target prefix doesn't include the correct namespace:
The policy gets created in the same Raft entry as the job. But at a quick glance it doesn't look like we're including the job's namespace when inserting the policies into the state store. I'll investigate further and report back.
Hi @tgross, thank you for investigating this issue. Please let me know if you need any additional information or assistance from me.
Ok, I've got the bug root-caused, but I don't yet have a fix that I can be sure won't break something else. Normally when a piece of data is incomplete in an object we get from the jobspec, we mutate the object either in the RPC handler or in the state store code, both of which have access to the namespace we need. But for some reason, when scaling policies were written, this mutation to add the missing namespace was put in the conversion from an HTTP API struct to an RPC struct. That's not how we normally do this kind of thing, and those are supposed to be pretty dumb conversions. The fix is to move that into the state store code, but I need to make sure that nothing is consuming the mutated policy object along the way so we don't introduce any new bugs in the process, particularly around upgrades.
I've got a quick hack here that fixes the bug in my testing: #24065. But I do want to move this code into the state store anyway to avoid any similar lurking problems.
When jobs are submitted with a scaling policy, the scaling policy's target only includes the job's namespace if the `namespace` field is set in the jobspec and not from the request.
Normally jobs are canonicalized in the RPC handler before being written to Raft. But the scaling policy targets are instead written during the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job` namespace from the request here as well, but only after the conversion has occurred. Swap the order of these operations so that the conversion is always happening with a correct namespace.
Fixes: #24039
…24065)
When jobs are submitted with a scaling policy, the scaling policy's target only includes the job's namespace if the `namespace` field is set in the jobspec and not from the request.
Normally jobs are canonicalized in the RPC handler before being written to Raft. But the scaling policy targets are instead written during the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job` namespace from the request here as well, but only after the conversion has occurred. Swap the order of these operations so that the conversion is always happening with a correct namespace.
Long-term we should not be making mutations during conversion either. But we can't remove it immediately because API requests may come from any agent across upgrades. Move the scaling target creation into the `Canonicalize` method and mark it for future removal in the API conversion code path.
Fixes: #24039
…in jobspec (#24065) (#24096)
When jobs are submitted with a scaling policy, the scaling policy's target only includes the job's namespace if the `namespace` field is set in the jobspec and not from the request.
Normally jobs are canonicalized in the RPC handler before being written to Raft. But the scaling policy targets are instead written during the conversion from `api.Job` to `structs.Job`. We populate the `structs.Job` namespace from the request here as well, but only after the conversion has occurred. Swap the order of these operations so that the conversion is always happening with a correct namespace.
Long-term we should not be making mutations during conversion either. But we can't remove it immediately because API requests may come from any agent across upgrades. Move the scaling target creation into the `Canonicalize` method and mark it for future removal in the API conversion code path.
Fixes: #24039
Co-authored-by: Tim Gross <tgross@hashicorp.com>
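The ordering problem described in these commit messages can be illustrated with a minimal, self-contained Go sketch. This is not Nomad's code: `apiJob`, `job`, `convert`, `registerBuggy`, `registerFixed`, and the lowercase `canonicalize` are hypothetical stand-ins for `api.Job`, `structs.Job`, the conversion step, the two orderings, and the `Canonicalize` method. The sketch only assumes that the scaling target copies the namespace at conversion time.

```go
package main

import "fmt"

// apiJob is a hypothetical stand-in for the HTTP API job struct.
// Namespace is empty when the jobspec omits the `namespace` field.
type apiJob struct {
	ID         string
	Namespace  string
	HasScaling bool
}

// job is a hypothetical stand-in for the internal (RPC) job struct.
type job struct {
	ID            string
	Namespace     string
	ScalingTarget map[string]string // nil when the job has no scaling policy
}

// convert mirrors the problematic pattern: the conversion also builds the
// scaling target, copying whatever namespace the API job has at that moment.
func convert(in apiJob) job {
	out := job{ID: in.ID, Namespace: in.Namespace}
	if in.HasScaling {
		out.ScalingTarget = map[string]string{"Namespace": in.Namespace, "Job": in.ID}
	}
	return out
}

// registerBuggy converts first and applies the request namespace afterwards,
// so the target keeps the empty namespace taken from the jobspec.
func registerBuggy(in apiJob, requestNS string) job {
	out := convert(in)
	if out.Namespace == "" {
		out.Namespace = requestNS
	}
	return out
}

// registerFixed swaps the order: the namespace is set before converting.
func registerFixed(in apiJob, requestNS string) job {
	if in.Namespace == "" {
		in.Namespace = requestNS
	}
	return convert(in)
}

// canonicalize sketches the longer-term approach: derive the target from the
// job's final namespace instead of relying on the conversion step.
func (j *job) canonicalize() {
	if j.ScalingTarget != nil {
		j.ScalingTarget["Namespace"] = j.Namespace
	}
}

func main() {
	in := apiJob{ID: "http-echo", HasScaling: true} // no namespace in jobspec

	buggy := registerBuggy(in, "test")
	fmt.Printf("buggy: job ns=%q target ns=%q\n", buggy.Namespace, buggy.ScalingTarget["Namespace"])

	fixed := registerFixed(in, "test")
	fmt.Printf("fixed: job ns=%q target ns=%q\n", fixed.Namespace, fixed.ScalingTarget["Namespace"])

	buggy.canonicalize()
	fmt.Printf("canonicalized: target ns=%q\n", buggy.ScalingTarget["Namespace"])
}
```

In the buggy ordering the job itself ends up in the `test` namespace while its scaling target still carries an empty namespace, which matches the symptom reported here: a lookup scoped to the `test` namespace finds no policy.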
The nomad-autoscaler appears to have an issue with creating scaling policies when the namespace is not hardcoded in the job file. Even when specifying the namespace using the `-namespace` flag during job deployment, the autoscaler fails to create the corresponding scaling policy.
Steps to Reproduce
A job file (`http-echo.nomad.hcl`) without a hardcoded namespace.
Expected Behavior
The nomad-autoscaler should create a scaling policy for the job in the specified namespace.
Actual Behavior
No scaling policies are created. The command `nomad scaling policy list -namespace=test` returns "No policies found".
Environment
Additional Information
`-namespace` flag during deployment.
Possible Solutions