
Add count of lock conflicts to @time etc. Add @lock_conflicts #52883

Merged: 3 commits, Feb 2, 2024

Conversation

@IanButterworth (Member) commented Jan 12, 2024

Updated:

$ julia -t6 -q
julia> function foo()
           l = ReentrantLock()
           Threads.@threads for i in 1:Threads.nthreads()
               lock(l) do
                   sleep(1)
               end
           end
       end

julia> @time foo()
  6.055423 seconds (60.99 k allocations: 3.081 MiB, 0.16% gc time, 5 lock conflicts, 1.71% compilation time)

julia> @lock_conflicts foo()
5
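
The count of 5 follows from the threading structure: with six threads, the first to reach `lock` acquires it uncontended, and each of the other five hits the slow path once. The same measurement can be approximated in user code without this PR's instrumentation; a hedged sketch (`counted_lock` and the `conflicts` counter are illustrative helpers, not Base API):

```julia
using Base.Threads: Atomic, atomic_add!, @threads, nthreads

conflicts = Atomic{Int}(0)

function counted_lock(f, l::ReentrantLock, conflicts)
    if !trylock(l)                 # fast path failed: count one conflict
        atomic_add!(conflicts, 1)
        lock(l)                    # fall back to a blocking acquire
    end
    try
        f()
    finally
        unlock(l)
    end
end

l = ReentrantLock()
@threads for i in 1:nthreads()
    counted_lock(l, conflicts) do
        sleep(0.1)
    end
end
println(conflicts[], " conflicts")  # typically nthreads() - 1 under julia -t N
```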

@ericphanson (Contributor)

Maybe it could work like @timed, where it returns the result and the number? And then have a different macro that prints like @time. I could imagine it being useful to monitor/log such counts in some scenarios, which is hard to do without API access to the value.

(review comment on base/atomics.jl — outdated, resolved)
@IanButterworth (Member, Author)

Maybe this should just be added to @time etc.? And like other categories, if there are 0 conflicts, don't show that part?

@ericphanson (Contributor)

That sounds like a great idea to me!

@IanButterworth (Member, Author) commented Jan 13, 2024

In @time now

julia> @time Threads.@threads for i in 1:Threads.nthreads()
           lock(l) do
               sleep(1)
           end
       end
  6.062353 seconds (73.97 k allocations: 3.708 MiB, 5 lock conflicts, 2.61% compilation time)

@IanButterworth IanButterworth changed the title Profile: Add way to count lock conflicts Add count of lock conflicts to @time Jan 13, 2024
@IanButterworth IanButterworth force-pushed the ib/profile_locks branch 2 times, most recently from 06b0112 to 6e605d6 Compare January 13, 2024 05:33
@IanButterworth IanButterworth changed the title Add count of lock conflicts to @time Add count of lock conflicts to @time etc. Add @lock_conflicts Jan 13, 2024
@KristofferC (Member)

How often does this show up? The @time output is already getting pretty long.

@IanButterworth (Member, Author)

I've not field-tested it much, but Pkg.update with registry updates, package updates, installs, and precompilation didn't show any just now. What would be a good thing to test it on?

@ericphanson (Contributor) commented Jan 13, 2024

I tried this branch on the LRUCache.jl tests (since they are fast to run and test some concurrency stuff), and got:

(LRUCache) pkg> instantiate
Precompiling project...
  1 dependency successfully precompiled in 1 seconds

julia> @time include("test/runtests.jl")
Test Summary:                             | Pass  Total  Time
Single-threaded insertion and reinsertion |   44     44  0.6s
Test Summary:                            | Pass  Total  Time
Multi-threaded insertion and reinsertion |    2      2  0.3s
Test Summary:                       | Pass  Total  Time
Single-threaded getting and setting |  930    930  0.4s
Test Summary:                      | Pass  Total  Time
Multi-threaded getting and setting |    7      7  0.2s
Test Summary:            |Time
Recursive lock in get(!) | None  0.1s
Test Summary:     | Pass  Total  Time
Eviction callback |   11     11  0.4s
Test Summary:    | Pass  Total  Time
Reverse iterator |   10     10  0.3s
Test Summary: | Pass  Total  Time
Large entries |   13     13  0.2s
Test Summary: | Pass  Total  Time
cache_info    |   15     15  0.3s
Test Summary:  | Pass  Total  Time
Original tests |  115    115  0.3s
  4.307419 seconds (6.51 M allocations: 324.627 MiB, 1.54% gc time, 2 lock conflicts, 108.86% compilation time: 5% of which was recompilation)

Running again in the same session, I got

  1.710402 seconds (1.58 M allocations: 77.543 MiB, 0.31% gc time, 1 lock conflict, 157.12% compilation time)

The > 100% compilation time is weird, and I see it on the v1.10.0 release as well. (Or actually, is this a known thing when compilation happens on multiple threads?)

So a couple conflicts there.

@IanButterworth (Member, Author)

How many threads did you run with?

@ericphanson (Contributor)

julia> versioninfo()
Julia Version 1.11.0-DEV.1269
Commit 2f60a38ca0 (2024-01-13 15:16 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.0.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 4 default, 0 interactive, 2 GC (on 4 virtual cores)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_PKG_SERVER_REGISTRY_PREFERENCE = eager

@IanButterworth (Member, Author)

LRUCache uses SpinLock rather than ReentrantLock, and this tooling currently only instruments ReentrantLock.
I'm not sure how to include SpinLocks without double-counting.


I've already been using this tool to reduce lock contention in JuliaImages/ImageFiltering.jl#271

Before

5.483911 seconds (2.14 M allocations: 10.671 GiB, 16.62% gc time, 34255 lock conflicts)

After

2.440659 seconds (478.12 k allocations: 7.888 GiB, 21.53% gc time, 8 lock conflicts)

@IanButterworth (Member, Author) commented Jan 14, 2024

I tried moving the counter increment to SpinLock as in the diff below, but with that change the OP example gives counts of 3, 4, and 7 conflicts across repeat runs, where previously it was always 5, as expected.

diff --git a/base/lock.jl b/base/lock.jl
index b593ebd10d..e3a8463578 100644
--- a/base/lock.jl
+++ b/base/lock.jl
@@ -145,7 +145,6 @@ Each `lock` must be matched by an [`unlock`](@ref).
 """
 @inline function lock(rl::ReentrantLock)
     trylock(rl) || (@noinline function slowlock(rl::ReentrantLock)
-        Threads.lock_profiling() && Threads.inc_lock_conflict_count()
         c = rl.cond_wait
         lock(c.lock)
         try
diff --git a/base/locks-mt.jl b/base/locks-mt.jl
index 5d355b9ed2..00bbabfed4 100644
--- a/base/locks-mt.jl
+++ b/base/locks-mt.jl
@@ -39,10 +39,15 @@ end
 Base.assert_havelock(l::SpinLock) = islocked(l) ? nothing : Base.concurrency_violation()
 
 function lock(l::SpinLock)
+    first = true
     while true
         if @inline trylock(l)
             return
         end
+        if first
+            first = false
+            lock_profiling() && inc_lock_conflict_count()
+        end
         ccall(:jl_cpu_suspend, Cvoid, ())
         # Temporary solution before we have gc transition support in codegen.
         ccall(:jl_gc_safepoint, Cvoid, ())

@ericphanson With that change, the LRUCache tests with -t6 give

3.156468 seconds (5.71 M allocations: 282.114 MiB, 0.82% gc time, 417 lock conflicts, 180.90% compilation time)

@vchuravy (Member)

Hm, I don't know if adding this to @time is the best strategy.

I would rather do something like Go, which has profiling support for mutexes: https://go.dev/doc/diagnostics

@pchintalapudi started to add something like this for the low-level locks.

@IanButterworth (Member, Author)

Doing that too would be great. I like that @time flags when something is happening that you may want to investigate with the more capable profiling tools. If there are no lock conflicts, it doesn't mention them.

@ericphanson (Contributor)

If the concern is that it's too noisy, maybe there's a heuristic: if the number of conflicts is more than x% of the total number of lock calls, then show it? That would require a second counter though, which would trigger even on the fast path.
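
A hedged sketch of that heuristic (the counters, `record_acquire`, and `conflict_summary` are illustrative names, not Base internals; note the unconditional `LOCK_TOTAL` increment is exactly the fast-path cost mentioned above):

```julia
using Base.Threads: Atomic, atomic_add!

# Illustrative global counters, not Base internals.
const LOCK_TOTAL     = Atomic{Int}(0)
const LOCK_CONFLICTS = Atomic{Int}(0)

function record_acquire(conflicted::Bool)
    atomic_add!(LOCK_TOTAL, 1)        # runs on every lock call, even uncontended ones
    conflicted && atomic_add!(LOCK_CONFLICTS, 1)
    return nothing
end

# Report conflicts only when they exceed `threshold` of all acquisitions.
function conflict_summary(; threshold = 0.01)
    total, conflicts = LOCK_TOTAL[], LOCK_CONFLICTS[]
    total == 0 && return ""
    ratio = conflicts / total
    ratio > threshold || return ""
    return "$(conflicts) lock conflicts ($(round(100ratio; digits=1))%)"
end
```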

@IanButterworth (Member, Author)

It could count time spent waiting at locks, and show it only if that's greater than some percentage of total CPU time?
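
A hedged sketch of that time-based variant (`timed_lock` and `LOCK_WAIT_NS` are illustrative, not Base API): only the slow path pays for a clock read, so the fast path stays untouched.

```julia
using Base.Threads: Atomic, atomic_add!

# Illustrative accumulator for nanoseconds spent blocked on locks.
const LOCK_WAIT_NS = Atomic{Int}(0)

function timed_lock(f, l::ReentrantLock)
    if !trylock(l)
        t0 = time_ns()
        lock(l)                                   # blocking acquire; time it
        atomic_add!(LOCK_WAIT_NS, Int(time_ns() - t0))
    end
    try
        f()
    finally
        unlock(l)
    end
end
```

A display heuristic could then compare the accumulated wait time against total CPU time for the timed region and print only when the ratio is significant.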

@ericphanson (Contributor)

That sounds even better. And perhaps what gets shown is that time or percentage, since it's likely more interpretable than a raw count. (Though the CPU-time vs. wall-time distinction may be hard to communicate briefly.)

@IanButterworth (Member, Author)

I tried drafting some example messages with timing, but I think a count might be sufficient: it points to the mechanics of the code. If you just got a time, it would be hard to relate it to the threading structure. At least the count was helpful in the PR I linked above, because it could be compared to the number of calls in my code that had deeply nested locks.

For the more detailed profiling, a count, a time, and more would be great.

@gbaraldi (Member)

Maybe we should do something like @time verbose=true?

@KristofferC (Member)

It feels like there is a very large number of timers/counters that could be put into @time. At some point, maybe this shouldn't go into @time and should instead get its own macro or some more structured form.

@ericphanson (Contributor)

@timed is more structured, and with #52889, @time is just display.
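
For reference, @timed already returns a NamedTuple, so a conflict count could ride along as an extra field. A hedged sketch of how monitoring code could consume it (the `lock_conflicts` field is hypothetical at the point of this discussion; only the long-standing fields are asserted below):

```julia
# @timed returns a NamedTuple with fields such as value, time, bytes, gctime.
stats = @timed sum(rand(10^6))
@show stats.time stats.bytes

# With the PR's counter exposed structurally, monitoring code could do
# something like `log_metric("lock_conflicts", stats.lock_conflicts)`
# (hypothetical field name, not guaranteed by the API discussed here).
```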

@IanButterworth (Member, Author)

My impression from using this a bit is that it's not particularly noisy, except where it's helpful to highlight something.

If the main concern with adding things to @time is noise from reporting too frequently, could this be merged so more people can test it in the wild, with the intention of deciding whether it's suitable before 1.11.0 lands? We could add an issue to the milestone etc.

Or if people want me to try more examples first, I'd be happy to try and report.

@IanButterworth IanButterworth marked this pull request as ready for review January 21, 2024 23:33
@ericphanson (Contributor)

I think the docs should be clear that it only affects ReentrantLock currently.

@quinnj (Member) left a comment

I think this is great and I've wanted something like this before; I think it's worth trying out.

@IanButterworth IanButterworth added needs docs Documentation for this change is required triage This should be discussed on a triage call labels Jan 31, 2024
@IanButterworth (Member, Author)

I'm marking this for triage to review the proposal in #52883 (comment).

Also added the needs docs label as a reminder to mention in the docs that this is specific to ReentrantLock.

@LilithHafner (Member)

From a "is this too noisy" perspective, I think this is good to merge. If it turns out to really be too noisy, we can back it out or use a different heuristic.

@IanButterworth IanButterworth removed needs docs Documentation for this change is required triage This should be discussed on a triage call labels Feb 1, 2024
@IanButterworth (Member, Author)

We discussed this in the triage call. While there wasn't a quorum to make triage decisions, @LilithHafner and I agreed that the plan above can be tried without triage approval.

@IanButterworth IanButterworth added the merge me PR is reviewed. Merge when all tests are passing label Feb 1, 2024
(two review comments on base/timing.jl — outdated, resolved)
@IanButterworth IanButterworth removed the merge me PR is reviewed. Merge when all tests are passing label Feb 2, 2024
Commits pushed: "fix side effects in allocation tests", "suggestions"
@IanButterworth IanButterworth added the merge me PR is reviewed. Merge when all tests are passing label Feb 2, 2024
(review comment on test/misc.jl — outdated, resolved)
@IanButterworth IanButterworth merged commit 12c5d2d into JuliaLang:master Feb 2, 2024
7 checks passed
@IanButterworth IanButterworth deleted the ib/profile_locks branch February 2, 2024 23:48
@LilithHafner LilithHafner removed the merge me PR is reviewed. Merge when all tests are passing label Feb 2, 2024
8 participants