Gather full per-process statistics for a make job with current bad slowdown #37
Timings when building glibc-2.27:
Getting a very detailed per-process, per-patch-site, per-syscall log is a bit complicated. I plan to send the events over a unix domain socket to a systrace server; what do you think? FYI, the counter tool has a
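For concreteness, here is a minimal sketch of what the sending side of that could look like (the socket path, the `SyscallEvent` fields, and the line-oriented wire format are all assumptions for illustration, not systrace's actual interface): each per-process, per-patch-site, per-syscall event is serialized and written to a unix domain socket, and a separate systrace server would drain the socket into a log.

```rust
use std::io::Write;
use std::os::unix::net::UnixStream;

/// One record per (pid, patch site, syscall) event; field names are hypothetical.
struct SyscallEvent {
    pid: u32,
    patch_site: u64, // address of the patched syscall instruction
    syscall_no: i64,
    patched: bool, // true if the call went through the patched path, false if handled by ptrace
}

impl SyscallEvent {
    /// Encode as one text line; a real implementation might prefer a compact binary format.
    fn to_line(&self) -> String {
        format!(
            "{},{:#x},{},{}\n",
            self.pid, self.patch_site, self.syscall_no, self.patched
        )
    }
}

fn main() -> std::io::Result<()> {
    // The systrace server (not shown) would bind this path and append incoming events to a log.
    let mut sock = UnixStream::connect("/tmp/systrace-events.sock")?;
    let ev = SyscallEvent {
        pid: 12345,
        patch_site: 0x7f00_dead_beef,
        syscall_no: 1, // write(2) on x86_64
        patched: true,
    };
    sock.write_all(ev.to_line().as_bytes())?;
    Ok(())
}
```

A binary encoding or per-process buffering could replace the text lines if the event rate turns out to matter.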
Logging doesn't have to be high performance -- can't you just dump it all out to a file or stderr or whatever? For documenting perf overhead, it seems like there should be three variants/modes for systrace:
So does the 100% overhead quoted above represent a fully safe multithreaded version, or an unsafe one? Because systrace has more of a "global" view of the execution than
The document can be found at: https://github.com/iu-parfunc/systrace/blob/master/docs/syscall-patch.md
There are a couple things I don't understand about the output of
These don't add to 100%!
Does that refer to the number of patch sites in the code? That seems high! But this is across 42,271 total processes, so perhaps not. Still, this is a totally excessive amount of patching that we need to reduce with any of those techniques we've discussed [1].

These statistics are good, but they lose a lot of information by summing across all processes. It would be good to get a "row per process", with the rows summing to the quoted numbers. It would also be good to start breaking things down per-syscall-type and per-patch-site, to dig into where the highest overheads are coming from empirically and start hacking away at them.

[1] prepatching, LD_PRELOAD, in-process patching w/ SIGSYS, skipping some syscalls, etc...
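As a hedged sketch of what that post-processing could look like (the input format is the hypothetical line format from the socket sketch above, and the column names are made up, not systrace's actual output): aggregate events by (pid, syscall number, patch site) and emit one CSV row per key, so the rows sum back to the global totals and can be grouped per process, per syscall, or per patch site.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

fn main() {
    // Key: (pid, syscall number, patch-site address). Value: number of events seen.
    let mut counts: HashMap<(u32, i64, String), u64> = HashMap::new();

    // Input: one event per line, "pid,patch_site,syscall_no,patched" (the hypothetical
    // format from the socket sketch above); in practice this would be systrace's own log.
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.expect("read error");
        let mut f = line.split(',');
        let (pid, site, no) = match (f.next(), f.next(), f.next()) {
            (Some(p), Some(s), Some(n)) => (p, s, n),
            _ => continue, // skip malformed lines
        };
        let pid: u32 = match pid.parse() { Ok(v) => v, Err(_) => continue };
        let no: i64 = match no.parse() { Ok(v) => v, Err(_) => continue };
        *counts.entry((pid, no, site.to_string())).or_insert(0) += 1;
    }

    // One row per (process, syscall, patch site); per-process totals are a GROUP BY away.
    println!("pid,syscall_no,patch_site,count");
    for ((pid, no, site), n) in &counts {
        println!("{},{},{},{}", pid, no, site, n);
    }
}
```

Summing the count column over all rows should reproduce the global totals quoted above, and grouping by pid gives the "row per process" view.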
These are un-patched syscalls; they are handled by ptrace.
Captured syscalls where the patch was applied.
Captured syscalls where the patch had already been applied; these numbers mean the syscalls are transparent to the tracer.
The patched count can be lower than the number of calls made after a site is patched, i.e.:
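As a hypothetical illustration of that last point (numbers made up, not measured from the glibc build): if a single call site executes `mmap` 10,000 times, the patch is applied on the first trap only; the remaining 9,999 calls run through the already-patched sequence, transparent to the tracer, so the patched count for that site stays at 1 even though the raw call count keeps growing.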
@wangbj - which make job was it that currently has the 4X "count tool" slowdown?
Let's focus on that job for the moment and build up a couple CSV datasets representing:
Of course, a really detailed log could be post-processed to generate whatever counts we want too. A variant of (1) would be count information at the level of libc calls, augmenting the syscall-/instruction-level information. This would help answer questions of the form:
Or:
A conversation with @mikerainey could help shed light on algorithm design for amortizing these overheads in a provably efficient way (that's what he specializes in, but in the different context of parallelism overheads).