test/0100-api.bats fail on recent systems #76
Comments
My guess would be that the |
Release mode helps: the test always passes. The trace log from the test with demos compiled in debug mode is below. Do you see anything suspicious there? Or is it that ASAN makes everything much slower?
|
Not sure about ASAN, that imo shouldn't make that much difference for such a short program, but looking at |
Try commenting out `-Db_coverage=true` in `make debug` and see if it still happens.
It happens less often, but it still happens. I think I'll work around it by
requiring the output to only contain our snippet, instead of requiring
exact equivalence. The test can still fail if the initial lines get
mixed up, but that is the downside of time-dependent tests.
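
The difference between exact equivalence and "contains our snippet" can be sketched in plain shell. This is only an illustration of the idea, not the actual change to `test/0100-api.bats`: the helper name `output_contains` and the sample output lines are hypothetical.

```shell
# Hypothetical helper, not taken from the project's test suite.
# Succeeds when the expected snippet ($2) occurs as a contiguous
# substring anywhere in the actual output ($1).
output_contains() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative data only: an unrelated line precedes the snippet.
actual=$(printf 'startup noise\ninit start\ninit done\n')
expected=$(printf 'init start\ninit done')

# An exact comparison ([ "$actual" = "$expected" ]) would fail here,
# while the containment check tolerates the extra leading line.
if output_contains "$actual" "$expected"; then
    echo "snippet found"
else
    echo "snippet missing"
fi
```

A check like this still fails when other output lands between the snippet's own lines, which is the remaining failure mode mentioned above.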
|
This patch should also fix the issue by adding a second, longer window, which gives the process enough time to finish, while still correctly testing that the initialization occurred.
|
Why do we need a second window? If I make the first window longer (I use 10 seconds), it is sometimes sufficient. I mean that sometimes demos ends immediately when both processes finish, and sometimes it waits for the end of the window, which is annoying. It seems like there is some bug in the termination logic. |
It seems that the termination of the second process is detected only after the window ends. I'm not sure whether the process really ends that late, but it seems unlikely. Maybe, we have a bug in |
Because by making the first window longer, the test no longer actually tests that the process has time to complete initialization. The original logic behind the test case is that during initialization, Currently, the test would fail anyway, because the process library detects an attempt to call |
I bisected master, and all these weird issues with slow startup, delays,... started happening after this commit, so I'd say that we have the suspect:
|
For the whole duration where the process is unresponsive, I see that it's in the It's possible that on your kernel version, this is compounded by some other effect, but I cannot reproduce the issue after a clean rebuild ( |
So my conclusion is that when compiled with code coverage tracking, this
test simply cannot work reliably, even with the additional 1000 ms
window. I'll keep this open as a reminder for anybody encountering the
problem.
To test these things reliably, we would need to eliminate time from the
tests. This is IMO only possible by using some kind of virtual time
instead of real time. Perhaps some ptrace magic similar to what we do in
OSY. And that's too much work for this project :-)
|
When I run the tests in current nixos-unstable with Linux 6.0.7, the test sometimes fails as follows:
I was not able to reproduce the failure on nixos-22.05 with the same kernel.