Terraform fails to notice when a Nomad job has changed #1
@paddycarver / @grubernaut / @radeksimko, is this a verified issue/bug? I'm seeing a slightly different issue, but I'm not sure whether it's the same as this one. If a job has been stopped or is not running (but the spec has not changed), the Terraform Nomad provider should still update Nomad. Might it be best to always send the job to Nomad and let Nomad work out any differences in the spec? This issue makes the provider unusable for the most basic use cases. |
The underlying issue here also produces the following behavior:
|
Hi @ketzacoatl! I can't say for sure whether this is a verified bug, nor can I explain the behaviour. I'll try to look into this soon and come back with a bit more information, and a fix if necessary. Apologies for the delays on this. |
Hi @paddycarver, thanks for taking a look! Were you able to confirm the behavior we see in practice? |
My observation is the same as @ketzacoatl's. It would just be more operator-friendly if users could interact via Terraform, rather than My versions are |
Any updates on this @paddycarver? Also, a question: Does the Terraform provider always submit the job to nomad, or does it decide whether or not it should? |
@paddycarver I took a quick look inside the codebase, and it seems the culprit is:
#15 by @apparentlymart adds more metadata, but Solution 1: add
|
@paddycarver, I'd love to help resolve this problem, as IMO it prevents serious use of Terraform to manage nomad, do you have any guidance or recommendations on how you would like to address the problem? cc @katbyte / @radeksimko |
@mitchellh I'd be very grateful for your feedback/guidance here. |
@cgbaker, With renewed development efforts here, what are your thoughts on this issue? |
Hi @ketzacoatl, I wholeheartedly agree that this is something that must be addressed. This shortcoming makes the Nomad provider just about useless for Day 2 operations. As noted in #15, there are a few different options for addressing this. The solution partially hinges on whether we want to try to find a general solution that doesn't require modifying and re-releasing the provider every time Nomad releases a new version. On the other hand, with the Nomad product team taking ownership of this TF provider, we can potentially address such a workflow a little better than before. And it may be prudent to find a temporary solution to this issue while we work on a generic/unversioned provider. The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x). Thank you for your patience and persistence on this issue. |
That would be great, WRT being able to continue using the provider while future improvements get worked out. |
Nomad 0.8.x API support is available in v1.3.0 of the Nomad provider, released today. I will look at finding a longer-term solution for this in upcoming versions; having said that, even if we continue versioning the Nomad provider, we pledge to be much more responsive in updating it going forward. |
Hello, just checking back up on this - have there been any improvements related to this? |
No update as of now; we've been focusing on core Nomad work, finishing the 0.9.0 release.
|
Fair enough, thanks for the update! |
Update: this was not resolved in the 1.4.0 release of the provider, but we're still tracking this. |
I am stoked to see this on the 1.5 milestone, rock on! |
It has been a year since the last comment. Any update on this? |
@cgbaker WRT the roadmap, are there refactors or internal changes that block fixing this properly? |
We were waiting for the updated 2.0 plugin SDK to see if it helped deal with this. We're actively looking at it now; we want this issue dealt with in the next few months as part of the Nomad 1.0 milestone. |
Resolved hashicorp#1. Terraform fails to notice when a Nomad job has changed. To have this, you need to maximize the use of Job Meta (e.g. the Docker image). This patch only notices when the number of running instances of a group in the job has changed. Tested with Nomad 0.11.3. Tested with a `Service` job. Batch jobs are assumed to be safe to run multiple times, so there would be no problem.
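The comparison that patch describes (running instances of a group versus what Terraform expects) can be sketched roughly as below. This is a minimal illustration, not the patch itself: the `GroupStatus` type and `driftedGroups` function are hypothetical, and it assumes the desired count comes from Terraform state while the running count comes from the cluster's allocations.

```go
package main

import (
	"fmt"
	"sort"
)

// GroupStatus is a hypothetical, simplified view of one task group:
// Desired is the count recorded in Terraform state, Running is the
// number of allocations currently running in the Nomad cluster.
type GroupStatus struct {
	Name    string
	Desired int
	Running int
}

// driftedGroups returns, sorted by name, the task groups whose running
// allocation count differs from the count Terraform expects.
func driftedGroups(groups []GroupStatus) []string {
	var drifted []string
	for _, g := range groups {
		if g.Running != g.Desired {
			drifted = append(drifted, g.Name)
		}
	}
	sort.Strings(drifted)
	return drifted
}

func main() {
	// Mirrors the reproduction in this issue: Terraform applied count = 1
	// for group "grp", then the count was raised to 2 outside Terraform.
	groups := []GroupStatus{{Name: "grp", Desired: 1, Running: 2}}
	fmt.Println(driftedGroups(groups)) // [grp]
}
```

As the commit message notes, a count check like this only catches one kind of drift; changes to other fields (such as the Docker image) go unnoticed unless they are surfaced some other way, for example through Job Meta.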
Is there any plan to release v1.5 soon? Is there more work to finish out this feature? #149 hasn't been touched in almost 2 years. |
Hi @eliburke 👋 No plans to get this fixed unfortunately. The work on #149 was a brave attempt to try and map all possible jobspec fields into a Terraform resource schema, but that was not a sustainable approach. As Nomad evolves it becomes almost impossible to manually keep up. We think that a better approach would be to leverage the Nomad OpenAPI project that is able to auto-generate a Nomad job spec, and the Terraform Provider Framework which is a new and more flexible way to create providers. But this will take a significant effort that is not in our roadmap yet. |
Suggestion: as a quick workaround for this problem, what we have been doing for quite some time is rendering out the entire Nomad job file and using it via the Terraform Nomad provider (in completely rendered form). What this does is, when you suspect something and want to check what is up with the job, you can try out In my opinion this provides an easy escape hatch mechanism. |
Can the Submission field in the API now be used? |
Hi @tristanmorgan, Unfortunately that's not enough. The key issue here is describing the Nomad job specification as a Terraform resource schema and detecting changes on individual fields. The resource already has the raw jobspec stored, which would be the same as the value returned by the new job submission endpoint. |
Just made a comment on #238 (comment) about this. |
If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1.
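A minimal sketch of that idea: prefer the source Nomad recorded for the job when it exists, and otherwise fall back to the jobspec stored in Terraform state. The function names and the normalization rule (ignore trailing whitespace and trailing newlines) are assumptions made for illustration, not the provider's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize ignores purely cosmetic differences: CRLF line endings,
// trailing spaces/tabs on each line, and trailing newlines.
// This rule is an assumption for the sketch.
func normalize(src string) string {
	lines := strings.Split(strings.ReplaceAll(src, "\r\n", "\n"), "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.TrimRight(strings.Join(lines, "\n"), "\n")
}

// jobspecChanged (hypothetical) reports whether the configured jobspec
// differs from what Nomad is running. submitted is the job submission
// source if Nomad has one for this job, or nil otherwise, in which case
// the comparison falls back to the jobspec stored in Terraform state.
func jobspecChanged(configured string, submitted *string, stored string) bool {
	if submitted != nil {
		return normalize(configured) != normalize(*submitted)
	}
	return normalize(configured) != normalize(stored)
}

func main() {
	configured := "job \"example\" {\n  group \"grp\" { count = 1 }\n}\n"
	live := "job \"example\" {\n  group \"grp\" { count = 2 }\n}\n"

	// The job was re-submitted outside Terraform with count = 2.
	fmt.Println(jobspecChanged(configured, &live, configured)) // true
	// Without submission data, state still matches config and the drift is missed.
	fmt.Println(jobspecChanged(configured, nil, configured)) // false
}
```

The fallback branch shows why this only mitigates #1: when no submission is available, the comparison is against Terraform's own stored copy, which cannot reflect changes made outside Terraform.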
Reading @jorgemarey's comment again, I noticed this part:
While not quite related to this issue, it made me realize that we can use the job submission data (if available) to detect changes to |
The default behaviour of the Terraform SDK is to copy the plan result into state, which could result in partial state updates, where Terraform state is updated but the actual resource state is not, in case of an error during the apply. This is normally not an issue because resources are expected to undo these changes on state refresh: any partial update is reconciled with the actual resource state. But due to #1, the `nomad_job` resource is not able to properly reconcile on refresh, causing the partial update to prevent further applies unless the configuration is also changed. This commit uses the `d.Partial()` method to signal to Terraform that any state changes should be rolled back in case of an error.
If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1. Read HCL2 variables from the job submission even if the `nomad_job` resource does not specify an `hcl2` block.
Is this improved at all with v2 of the provider? I need to take over control of previously hand-generated Nomad job templates with Terraform. Usually when I do this, I create a resource in the Terraform code, import the existing resource, and reconcile the differences. However, the fact that this provider doesn't detect differences completely breaks that workflow. I briefly tried upgrading to the v2 provider but still got an unexpectedly large diff; I didn't determine how much of that was this bug and how much was v1-to-v2 incompatibilities. I dropped back to v1 so as not to burn time upgrading to a new version that might have the same major bug. |
Hey folks! @gulducat and I did a quick re-assessment of this bug and here's what the current situation is:
After some internal discussion I'm marking this for further roadmapping. We'll update again once we know more. |
@tgross awesome update, thank you for all that info. Even with a big lift in the future, smaller/incremental improvements are very welcome! |
This issue was originally opened by @blalor as hashicorp/terraform#14038. It was migrated here as part of the provider split. The original body of the issue is below.
Terraform Version
Terraform v0.9.3
Affected Resource(s)
Terraform Configuration Files
Debug Output
Log for 2nd `terraform apply`: apply.log.txt
Console output:
Expected Behavior
Terraform should have noticed that the job had changed in the cluster and updated it.
Actual Behavior
Nuttin', honey.
Steps to Reproduce
1. `nomad agent -dev`
2. `terraform apply`
3. `nomad status example` shows one allocation for task group `grp`
4. Change the task group count:
5. `nomad status example` shows two allocations for `example.grp`
6. `terraform plan` shows no change
7. `terraform apply` makes no change
8. `nomad status example` still shows two allocations