Since updating to Nomad 1.9.0, I am observing a weird behavior I cannot explain. Below is a bug report, since I would classify the behavior as a release regression from Nomad 1.9.0. The same code works flawlessly in Nomad 1.8.4 and lower.
Potentially, this issue is caused by switching to the native Docker library as done with #23966.
The issue is that in some circumstances (with a relatively short input buffer), an allocation Exec doesn't close stdin, so that the command within the allocation keeps waiting. The attached job file and the Go code were extracted from our production setup. They demonstrate the erroneous behavior, but might still be too large. For now, I didn't manage to reduce the example further.
I tried many different steps to nail down the problem, but couldn't. Any help would be greatly appreciated!
When a command that waits for stdin is executed in an allocation, the stream might not be closed as expected. This prevents the command from completing successfully.
What matters is the Nomad server / agent version, not the version of the Nomad library used by the Go script.
There are two "TODO" notes in the reproduction example:
The first one is at line 40, where the actual file size is controlled. A file with 8100 bytes will not be processed correctly, but a file with 8200 bytes will.
The second one is at line 71, where the actual buffer size that is used for the Exec command is measured. This length is what triggers the issue: a buffer smaller than roughly 10,000 bytes does not work, whereas a larger buffer does.
The same code works without any changes and with either file / buffer size in Nomad 1.8.4. It doesn't work with Nomad 1.9.0 any longer.
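The relation between the two file sizes and the 10,000-byte buffer threshold can be checked with a short sketch. It assumes that the reproduction example packs a single file into an in-memory tar archive with Go's archive/tar package (the report mentions a tar archive and fixed_filename.txt, but the actual code is only in the attachment); the helper name tarArchiveSize is hypothetical.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
)

// tarArchiveSize returns the size in bytes of an in-memory tar archive
// that holds a single regular file of fileSize bytes.
func tarArchiveSize(name string, fileSize int) (int, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)

	hdr := &tar.Header{
		Typeflag: tar.TypeReg,
		Name:     name,
		Mode:     0o644,
		Size:     int64(fileSize),
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return 0, err
	}
	// The file content itself is irrelevant here, only its length matters.
	if _, err := tw.Write(bytes.Repeat([]byte("a"), fileSize)); err != nil {
		return 0, err
	}
	// Close pads the last data block and appends the end-of-archive marker.
	if err := tw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	for _, size := range []int{8100, 8200} {
		n, err := tarArchiveSize("fixed_filename.txt", size)
		if err != nil {
			panic(err)
		}
		fmt.Printf("file of %d bytes -> tar archive of %d bytes\n", size, n)
	}
}
```

Under these assumptions, the 8100-byte file yields a 9728-byte archive (a 512-byte header, 8192 bytes of padded data, and a 1024-byte end marker), which matches the size in the reported output and stays below 10,000 bytes. The 8200-byte file yields 10240 bytes, which is above the threshold.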
Expected Result
The reproduction example succeeds:
Tar archive created with 9728 bytes
fixed_filename.txt
Command executed with exit code: 0
Thanks for the detailed report @MrSerth! I was able to create a fix and even turn your reproduction example into an e2e test to help make sure something like this doesn't break again, in #24202
Awesome, thanks @shoenig for your work on this issue and the e2e test. I can confirm that your changes solve the issue reported. Furthermore, our test suite is now completing successfully given a Nomad binary from the current main branch. Looking forward to the next release of Nomad! 👍
Nomad version
Operating system and Environment details
Ubuntu 24.04.1 LTS and macOS 15.0.1
Reproduction steps
Download the following reproduction_example.zip and execute it with go run example.go. Please note the remarks above about the relevant Nomad version and the two "TODO" notes.
Actual Result
The reproduction example hangs: