-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bounce #1550
Merged
Merged
Fix bounce #1550
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
vext01
commented
Jan 16, 2025
hwtracer/src/perf/collect.c
Outdated
@@ -765,6 +742,11 @@ bool hwt_perf_free_collector(struct hwt_perf_ctx *tr_ctx, | |||
if (tr_ctx->stop_fds[0] != -1) { | |||
close(tr_ctx->stop_fds[0]); | |||
} | |||
if (tr_ctx->perf_fd >= 0) { | |||
/* fprintf(stderr, "close perf\n"); */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gah.
ltratt
reviewed
Jan 16, 2025
Please force push an update. |
We've been experiencing non-deterministic segfaults, assertion failures and Lua-level errors in bounce.lua for a while: ykjit#1544 After a couple of days of systematically narrowing down the cause, it appears to be something to do with the re-use of the PT resources in collect.c. In theory reusing the PT stuff for subsequent tracing sessions on the same thread *should* be just fine -- I've even found examples on the internet where people do this -- however, something is wrong with our implementation, so I'm disabling the caching for now. This commit does several things related to this: - Fixes a bug in the aux buffer read code in the case that the ring buffer wraps around. We were incorrectly bumping the length of the recorded trace, which would cause trace corruption in the form of uninitialised reads when we come to decode the trace: ```diff - trace->len += size + head; + trace->len += head; ``` Later changes (not re-using the aux buffer) will actually mean that the wraparound case can't happen (since the aux buffer is the same size as the allocated copyout buffer), but I'm leaving it in in case we ever decide to re-use perf stuff again. Sadly this fix alone isn't enough to fix bounce.lua. - Removes caching of perf-related mmap buffers and the perf file descriptor itself. This basically reverts bedafd9. This means each tracing session now allocates fresh memory and opens a fresh perf fd. As part of this change we re-introduce `PacketParserState::Init` and add a couple of other permissible packets that I've seen come up when using try_repeat on the test suite. - Ensures that the collector's resources are freed by calling `hwt_perf_free_collector()`, even if collecting a trace is aborted by (e.g.) "trace too long". We were failing to do this, meaning that resources would potentially leak. Since we no longer cache the perf fd, and this is closed in `hwt_perf_free_collector()`, if we don't call that, then the thread is effectively blocked from ever tracing again: trying to open perf agaiin would give "resource busy".
Force pushed the fix. Sorry about that! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We've been experiencing non-deterministic segfaults, assertion failures
and Lua-level errors in bounce.lua for a while:
#1544
After a couple of days of systematically narrowing down the cause, it
appears to be something to do with the re-use of the PT resources in
collect.c.
In theory reusing the PT stuff for subsequent tracing sessions on the
same thread should be just fine -- I've even found examples on the
internet where people do this -- however, something is wrong with our
implementation, so I'm disabling the caching for now.
This commit does several things related to this:
Fixes a bug in the aux buffer read code in the case that the ring
buffer wraps around. We were incorrectly bumping the length of the
recorded trace, which would cause trace corruption in the form of
uninitialised reads when we come to decode the trace:
Later changes (not re-using the aux buffer) will actually mean that
the wraparound case can't happen (since the aux buffer is the same
size as the allocated copyout buffer), but I'm leaving it in in case
we ever decide to re-use perf stuff again.
Sadly this fix alone isn't enough to fix bounce.lua.
Removes caching of perf-related mmap buffers and the perf file
descriptor itself. This basically reverts bedafd9. This means each
tracing session now allocates fresh memory and opens a fresh perf fd.
As part of this change we re-introduce
PacketParserState::Init
andadd a couple of other permissible packets that I've seen come up when
using try_repeat on the test suite.
Ensures that the collector's resources are freed by calling
hwt_perf_free_collector()
, even if collecting a trace is aborted by(e.g.) "trace too long". We were failing to do this, meaning that
resources would potentially leak. Since we no longer cache the perf
fd, and this is closed in
hwt_perf_free_collector()
, if we don'tcall that, then the thread is effectively blocked from ever tracing
again: trying to open perf agaiin would give "resource busy".
This appears to fix the original
bounce.lua
and the reduced one that me an Lukas devised.I did 3 try_ci runs of this and all suceeded! Fingers crossed.