Wall-time/all tasks profiler #55889
An additional note: this kind of wall-time/all-tasks profiler also exists for Go (where it is described as a goroutine profiler), so there is some precedent for this in other languages as well: https://github.com/felixge/fgprof
@nickrobinson251 I can't assign you as reviewer... Feel free to assign yourself or post review comments otherwise.
I think this is related to #55103. Could the metrics here be useful in that too?
For diagnosing excessive scheduling time? I can't immediately see how this PR would be useful for that.
#55103 seems like a much more direct approach for doing so, at least.
A few more workloads suggested by @NHDaly.

Workload 3: compute_heavy.jl

using Base.Threads
using Profile
using PProf

ch = Channel(1)

const MAX_ITERS = (1 << 22)
const N_TASKS = (1 << 12)

function spawn_a_task_waiting_on_channel()
    Threads.@spawn begin
        take!(ch)
    end
end

function sum_of_sqrt()
    sum_of_sqrt = 0.0
    for i in 1:MAX_ITERS
        sum_of_sqrt += sqrt(i)
    end
    return sum_of_sqrt
end

function spawn_a_bunch_of_compute_heavy_tasks()
    Threads.@sync begin
        for i in 1:N_TASKS
            Threads.@spawn begin
                sum_of_sqrt()
            end
        end
    end
end

function main()
    spawn_a_task_waiting_on_channel()
    spawn_a_bunch_of_compute_heavy_tasks()
end

Profile.@profile_walltime main()

Expected results

We have a lot more compute-heavy tasks than sleeping tasks. We expect to see a lot of samples in
Cool, thanks! 🎉 🤔 Is it expected that the currently-scheduled tasks seem to have their stacks starting at a different frame than the waiting tasks? It looks like the executing tasks start right with the function in the Task. I can't decide whether this is helpful or not. On the one hand, it's maybe nice to visually divide the scheduled vs. sleeping tasks, but on the other hand I think it would make more sense to integrate the stacks together if most of their content is shared.
My instinct is that this is not desirable, and we should figure out why they're different, and correct that.
Good question, I don't know. Will investigate this.
One limitation of sampling CPU/thread profiles, as is currently done in Julia, is that they primarily capture samples from CPU-intensive tasks.
If many tasks are performing IO or contending for concurrency primitives like semaphores, these tasks won’t appear in the profile, as they aren't scheduled on OS threads sampled by the profiler.
A wall-time profiler, like the one implemented in this PR, samples tasks regardless of OS thread scheduling. This enables profiling of IO-heavy tasks and detecting areas of heavy contention in the system.
Co-developed with @nickrobinson251.
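
To make the difference between the two profiling modes concrete, here is a minimal sketch. It is not taken from the PR: the function `blocked_vs_busy` and its workload are hypothetical, and the `Profile.@profile_walltime` macro is assumed to be available only on builds that include this change, so the sketch guards its use. The idea is that the task parked in `take!` would be expected to be largely absent from the CPU/thread profile, while a wall-time profile can sample it as well.

```julia
using Profile

# Hypothetical workload (not from the PR): one task parked on a channel,
# one task doing CPU-bound work.
function blocked_vs_busy()
    ch = Channel{Int}(1)
    waiter = Threads.@spawn take!(ch)               # blocks until a value arrives
    worker = Threads.@spawn sum(sqrt, 1:(1 << 22))  # CPU-bound
    total = fetch(worker)
    put!(ch, 1)                                     # release the waiter
    return total + fetch(waiter)
end

blocked_vs_busy()  # warm up so compilation is not profiled

# CPU/thread profile: samples whatever is running on each OS thread.
Profile.clear()
Profile.@profile blocked_vs_busy()
Profile.print(maxdepth = 12)

# Wall-time profile: assumed to exist only on builds that include this PR,
# hence the guard. It samples tasks regardless of OS thread scheduling.
@static if isdefined(Profile, Symbol("@profile_walltime"))
    Profile.clear()
    Profile.@profile_walltime blocked_vs_busy()
    Profile.print(maxdepth = 12)
end
```

Comparing the two printed trees for frames under `take!` is one way to check whether blocked tasks are being sampled; the workload is short, so the number of samples in either profile may be small.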