Use System.Threading.Lock in Kestrel #57236

Open · wants to merge 1 commit into base: main
Conversation

ladeak
Contributor

@ladeak ladeak commented Aug 8, 2024

Use System.Threading.Lock in Kestrel

Replacing lock objects with System.Threading.Lock

Description

  • Addresses the Servers/Kestrel module
  • HTTP3 has a few dictionaries/lists that are locked directly; these are not addressed in this PR
    • Question: should these objects have a corresponding Lock field to lock on, instead of locking the object itself?
  • In my personal measurements (not attached to this PR), the Lock type performed marginally better than the regular Monitor lock when there is no contention.
  • Is there a way to run perf tests with the /benchmark plaintext aspnet-citrine-lin kestrel command, or is there a particular benchmark that might be interesting to run? I am not sure if there is anything designed for locking and contention.
  • The objects changed do not seem to be passed around elsewhere (where other types might also lock on them), so I think these changes are safe. (Passing a Lock to a method that accepts object and locking on that object results in problematic behavior, as far as I can see.)
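
The pitfall in the last bullet can be sketched as follows (a minimal illustration, assuming .NET 9's System.Threading.Lock; the class and method names are hypothetical). Converting a Lock to object and locking on it falls back to Monitor-based locking on the boxed reference, which does not cooperate with lock statements using the Lock fast path; the compiler flags this with warning CS9216:

```csharp
using System.Threading;

class Example
{
    private readonly Lock _lock = new();

    public void Correct()
    {
        // Lowered by the compiler to _lock.EnterScope()/Dispose() - the fast path.
        lock (_lock)
        {
            // critical section
        }
    }

    public void Problematic()
    {
        // The implicit Lock -> object conversion triggers warning CS9216:
        // the callee locks via Monitor on the reference, not the Lock's own
        // state, so it does NOT provide mutual exclusion with Correct() above.
        LockOnObject(_lock);
    }

    private static void LockOnObject(object o)
    {
        lock (o) { /* Monitor-based locking */ }
    }
}
```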

Linked #56794

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Aug 8, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 8, 2024
@martincostello
Member

Is there a way to run perf tests with the /benchmark plaintext aspnet-citrine-lin kestrel

The team members can run those for you - it needs write permission on the repo.

@BrennanConroy
Member

  • Is there a way to run perf tests with the /benchmark plaintext aspnet-citrine-lin kestrel or is there a particular benchmark that might be interesting to run? Not sure if there is anything designed for locking and contention.

In theory, using Lock can be more performant than locking with object; I don't know how much the CLR has done to actually implement the theoretical improvements yet. But I would be very surprised if we saw any measurable differences in E2E benchmarks (considering normal benchmark noise). In the past we've seen Http2 affected by contention and did a big refactor to reduce that, so maybe it would have been a nice test, but less so now 😄

@BrennanConroy
Member

/benchmark plaintext aspnet-citrine-lin kestrel

@ladeak
Contributor Author

ladeak commented Aug 8, 2024

Yes that is my impression as well.


pr-benchmarks bot commented Aug 8, 2024

Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link


pr-benchmarks bot commented Aug 8, 2024

plaintext - aspnet-citrine-lin

| application | plaintext.base | plaintext.pr | Change |
|--- |---: |---: |---: |
| Max CPU Usage (%) | 100 | 99 | -1.00% |
| Max Cores usage (%) | 2,788 | 2,785 | -0.11% |
| Max Working Set (MB) | 91 | 92 | +1.10% |
| Max Private Memory (MB) | 592 | 592 | 0.00% |
| Build Time (ms) | 3,608 | 3,795 | +5.18% |
| Start Time (ms) | 228 | 225 | -1.32% |
| Published Size (KB) | 106,590 | 106,590 | 0.00% |
| Symbols Size (KB) | 53 | 53 | 0.00% |
| .NET Core SDK Version | 9.0.100-rc.1.24408.10 | 9.0.100-rc.1.24408.10 | |
| Max CPU Usage (%) | 99 | 100 | +0.44% |
| Max Working Set (MB) | 95 | 96 | +0.55% |
| Max GC Heap Size (MB) | 25 | 25 | -0.79% |
| Size of committed memory by the GC (MB) | 16 | 17 | +5.14% |
| Max Number of Gen 0 GCs / sec | 4.00 | 3.00 | -25.00% |
| Max Number of Gen 1 GCs / sec | 3.00 | 2.00 | -33.33% |
| Max Number of Gen 2 GCs / sec | 3.00 | 2.00 | -33.33% |
| Max Gen 0 GC Budget (MB) | 26 | 26 | 0.00% |
| Max Time in GC (%) | 8.00 | 6.00 | -25.00% |
| Max Gen 0 Size (B) | 1,569,560 | 1,731,808 | +10.34% |
| Max Gen 1 Size (B) | 1,954,968 | 2,312,136 | +18.27% |
| Max Gen 2 Size (B) | 2,479,352 | 2,397,416 | -3.30% |
| Max LOH Size (B) | 95,248 | 95,248 | 0.00% |
| Max POH Size (B) | 9,463,448 | 9,500,528 | +0.39% |
| Max Allocation Rate (B/sec) | 14,922,296 | 14,312,544 | -4.09% |
| Max GC Heap Fragmentation (%) | 586% | 694% | +18.45% |
| # of Assemblies Loaded | 62 | 62 | 0.00% |
| Max Exceptions (#/s) | 1,302 | 1,304 | +0.15% |
| Max Lock Contention (#/s) | 72 | 309 | +329.17% |
| Max ThreadPool Threads Count | 48 | 48 | 0.00% |
| Max ThreadPool Queue Length | 586 | 528 | -9.90% |
| Max ThreadPool Items (#/s) | 781,916 | 784,181 | +0.29% |
| Max Active Timers | 1 | 1 | 0.00% |
| IL Jitted (B) | 285,669 | 287,410 | +0.61% |
| Methods Jitted | 3,504 | 3,519 | +0.43% |
| load | plaintext.base | plaintext.pr | Change |
|--- |---: |---: |---: |
| Max CPU Usage (%) | 98 | 98 | 0.00% |
| Max Cores usage (%) | 2,754 | 2,749 | -0.18% |
| Max Working Set (MB) | 46 | 46 | 0.00% |
| Max Private Memory (MB) | 370 | 370 | 0.00% |
| Build Time (ms) | 3,369 | 3,513 | +4.27% |
| Start Time (ms) | 0 | 0 | |
| Published Size (KB) | 72,283 | 72,283 | 0.00% |
| Symbols Size (KB) | 0 | 0 | |
| .NET Core SDK Version | 8.0.303 | 8.0.303 | |
| First Request (ms) | 113 | 114 | +0.88% |
| Requests/sec | 11,815,358 | 11,814,488 | -0.01% |
| Requests | 178,307,264 | 178,285,096 | -0.01% |
| Mean latency (ms) | 1.31 | 1.35 | +3.05% |
| Max latency (ms) | 77.08 | 70.29 | -8.81% |
| Bad responses | 0 | 0 | |
| Socket errors | 0 | 0 | |
| Read throughput (MB/s) | 1,423.36 | 1,423.36 | 0.00% |
| Latency 50th (ms) | 0.67 | 0.68 | +0.74% |
| Latency 75th (ms) | 1.01 | 1.03 | +1.98% |
| Latency 90th (ms) | 2.02 | 2.18 | +7.92% |
| Latency 99th (ms) | 14.09 | 14.68 | +4.19% |

@amcasey
Member

amcasey commented Aug 12, 2024

CI failure seems likely to be addressed by #57269.

@ladeak
Contributor Author

ladeak commented Aug 12, 2024

My concern is that under lock contention the new Lock does not seem to perform that well in my tests on x64. It looks nice on the first benchmark, but when I repeat the tests I consistently get 30% slower results for all following benchmarks.

@halter73
Member

I wouldn't expect these locks to be too contended under normal usage with the exception of Http2FrameWriter._windowUpdateLock, so maybe hold off on that one if we're worried about contended performance.

Has Lock seen much usage in runtime yet? I see that it is now used for TimeQueue. If we can get good contended performance (I'm not sure why it would be worse), System.IO.Pipelines and System.Threading.Channels seem like good candidates. It would require an extra allocation for channels though.

@kouvel Do you have any idea why @ladeak might be seeing a slowdown with Lock when he repeats tests? And do we have a good rule of thumb for when to use Lock vs a plain object?

@kouvel
Member

kouvel commented Aug 12, 2024

@ladeak, can you share your tests? Do the tests do some warmup to ensure that methods get tiered up?

There can be slight differences in performance due to different code gen and TLS access, but when using the lock keyword, Lock typically performs equal to or better than Monitor locks. Using Enter/Exit directly may be a bit slower in some cases when using Lock, due to a slightly more expensive TLS lookup.
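
As a minimal sketch of the two usage patterns mentioned above (hypothetical Counter class, assuming .NET 9's Lock API):

```csharp
using System.Threading;

class Counter
{
    private readonly Lock _lock = new();
    private int _value;

    // The lock statement on a Lock is lowered by the compiler to
    // EnterScope()/Dispose(), which is the faster path described above.
    public void IncrementWithLockKeyword()
    {
        lock (_lock)
            _value++;
    }

    // Calling Enter/Exit directly is also valid, but may pay an extra
    // thread-local lookup per Enter/Exit pair in some cases.
    public void IncrementWithEnterExit()
    {
        _lock.Enter();
        try
        {
            _value++;
        }
        finally
        {
            _lock.Exit();
        }
    }
}
```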

@ladeak
Contributor Author

ladeak commented Aug 13, 2024

@kouvel let me attach the benchmarks and the results:

using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[SimpleJob]
public class Benchmarks
{
    private int _value = 0;
    private const int N = 10000;
    private readonly object _regularLock = new();
    private readonly Lock _newLock = new();

    [Benchmark]
    public async Task RegularLock()
    {
        var t1 = Task.Run(Update);
        var t2 = Task.Run(Update);
        var t3 = Task.Run(Update);
        var t4 = Task.Run(Update);
        await Task.WhenAll(t1, t2, t3, t4);
        void Update()
        {
            for (int i = 0; i < N; i++)
            {
                lock (_regularLock)
                    _value++;
            }
        }
    }

    [Benchmark]
    public async Task NewLock()
    {
        var t1 = Task.Run(Update);
        var t2 = Task.Run(Update);
        var t3 = Task.Run(Update);
        var t4 = Task.Run(Update);
        await Task.WhenAll(t1, t2, t3, t4);
        void Update()
        {
            for (int i = 0; i < N; i++)
            {
                lock (_newLock)
                    _value++;
            }
        }
    }
}

This is the result I get most of the times:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.3958/23H2/2023Update/SunValley3)
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.100-preview.7.24371.4
  [Host]     : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2


| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| RegularLock | 1.661 ms | 0.0331 ms | 0.0719 ms |
| NewLock     | 3.109 ms | 0.0376 ms | 0.0351 ms |

The numbers slightly vary for RegularLock 1.6-2.9ms and NewLock 2.4-3.1ms, for example another run:

| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| RegularLock | 2.091 ms | 0.0611 ms | 0.1802 ms |
| NewLock     | 2.836 ms | 0.0712 ms | 0.2099 ms |

I expect larger variance due to the nature of the tests. I tried different values of N, e.g. 1000, but the results are similar (unless I go with a very small N).

Very rarely I get results as follows (or even with NewLock being slightly faster):

| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| RegularLock | 2.869 ms | 0.0569 ms | 0.1519 ms |
| NewLock     | 2.944 ms | 0.1004 ms | 0.2961 ms |

// * Warnings *
MultimodalDistribution
  Benchmarks.NewLock: Default -> It seems that the distribution can have several modes (mValue = 3)

Single Task

When I remove the parallel tasks and only have an await t1;, the results are different, and the Lock type is consistently faster:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 184.3 us | 1.62 us | 1.51 us |
| NewLock     | 170.1 us | 1.15 us | 1.02 us |

3 Tasks

With 3 parallel tasks (instead of 4), I get results that are pretty close, but still consistently in favor of the old lock:


| Method      | Mean       | Error    | StdDev   |
|------------ |-----------:|---------:|---------:|
| RegularLock |   920.3 us | 11.25 us | 10.52 us |
| NewLock     | 1,098.0 us | 21.86 us | 58.72 us |

2 Tasks

Looks similar to 3 tasks:

| Method      | Mean     | Error    | StdDev   |
|------------ |---------:|---------:|---------:|
| RegularLock | 549.1 us | 10.67 us | 17.22 us |
| NewLock     | 581.5 us | 11.05 us | 11.34 us |

@kouvel
Member

kouvel commented Aug 14, 2024

let me attach the benchmarks and the results:

I'm seeing opposite results on my machine. In any case, maybe an issue is that 10000 is small enough that one started task progresses significantly before the next one is started; since uncontended lock acquisitions are much faster, that task could progress a lot, which I imagine introduces a lot of variability into the results. So it's hard to say how much contention is actually happening during each iteration of the test. I typically test this differently: multiple threads continuously do the contentious work, and after a warmup I measure changes to some count over time from another thread. Maybe with BDN a setup/cleanup could achieve something like that. With this kind of lock usage I would not expect Lock to be slower; I'm seeing that it's noticeably faster on my machine.

@kouvel
Member

kouvel commented Aug 14, 2024

let me attach the benchmarks and the results:

What do you see with very large N, like 1,000,000? Too large an N may cause the method to not be called enough to get tiered up, but anyway just curious.

@ladeak
Contributor Author

ladeak commented Aug 14, 2024

@kouvel with N = 1_000_000;

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3)
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.100-preview.7.24371.4
  [Host]     : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2


| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 318.3 ms | 1.84 ms | 1.63 ms |
| NewLock     | 360.3 ms | 6.96 ms | 8.80 ms |

and after a second run:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 263.7 ms | 3.40 ms | 3.01 ms |
| NewLock     | 373.7 ms | 7.01 ms | 7.20 ms |

and a 3rd run:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 234.8 ms | 3.62 ms | 3.39 ms |
| NewLock     | 339.3 ms | 6.43 ms | 6.31 ms |

@kouvel
Member

kouvel commented Aug 14, 2024

with N = 1_000_000;

Ok, maybe there's a processor difference here. Based on the spec it seems like a low-wattage mobile processor; maybe there are differences there that I'm not aware of. I'll try to post another test case that's similar to yours but without BDN, curious what you would see there. It may be a bit before I get back, though.

@ladeak
Contributor Author

ladeak commented Aug 14, 2024

Take your time, I am going to sleep in the next few hours.

@kouvel
Member

kouvel commented Aug 15, 2024

@ladeak, can you try this on your machine:

using System.Diagnostics;

static class Program
{
    private static void Main()
    {
        TestMonitor();
        TestLock();
    }

    private static void TestMonitor()
    {
        object lockObj = new();
        int value = 0;
        Test(
            "Monitor",
            ref value,
            n =>
            {
                for (int i = 0; i < n; i++)
                {
                    lock (lockObj)
                        value++;
                }
            });
    }

    private static void TestLock()
    {
        Lock lockObj = new();
        int value = 0;
        Test(
            "Lock",
            ref value,
            n =>
            {
                for (int i = 0; i < n; i++)
                {
                    lock (lockObj)
                        value++;
                }
            });
    }

    private static void Test(string name, ref int value, Action<int> update)
    {
        bool stop = false;
        ThreadStart threadMain = () =>
        {
            while (!stop)
                update(100);
        };

        const int ThreadCount = 4;
        var threads = new Thread[ThreadCount];
        for (int i = 0; i < ThreadCount; i++)
        {
            var t = new Thread(threadMain);
            t.IsBackground = true;
            t.Start();
            threads[i] = t;
        }

        var sw = new Stopwatch();
        Thread.Sleep(500); // warmup
        for (int i = 0; i < 8; i++)
        {
            sw.Restart();
            int startValue = value;
            Thread.Sleep(500);
            sw.Stop();
            int changes = value - startValue;

            Console.WriteLine($"{name,10}: {sw.Elapsed.TotalNanoseconds / changes,10:0.000} ns/change");
        }

        stop = true;
        foreach (var t in threads)
            t.Join();
    }
}

On my machine (AMD Ryzen 9 5950X), I'm seeing numbers similar to these:

   Monitor:     50.974 ns/change
   Monitor:     49.492 ns/change
   Monitor:     49.922 ns/change
   Monitor:     50.590 ns/change
   Monitor:     49.990 ns/change
   Monitor:     50.124 ns/change
   Monitor:     49.841 ns/change
   Monitor:     49.883 ns/change
      Lock:     44.997 ns/change
      Lock:     44.390 ns/change
      Lock:     44.946 ns/change
      Lock:     44.924 ns/change
      Lock:     44.703 ns/change
      Lock:     44.684 ns/change
      Lock:     44.857 ns/change
      Lock:     44.880 ns/change

@ladeak
Contributor Author

ladeak commented Aug 15, 2024

@kouvel

   Monitor:     86,511 ns/change
   Monitor:     84,948 ns/change
   Monitor:     86,056 ns/change
   Monitor:     82,722 ns/change
   Monitor:     85,529 ns/change
   Monitor:     86,259 ns/change
   Monitor:     85,450 ns/change
   Monitor:     84,780 ns/change
      Lock:     78,784 ns/change
      Lock:     78,250 ns/change
      Lock:     77,387 ns/change
      Lock:     76,139 ns/change
      Lock:     79,439 ns/change
      Lock:     77,637 ns/change
      Lock:     80,203 ns/change
      Lock:     79,250 ns/change

and then I run it again:

   Monitor:     73,323 ns/change
   Monitor:     72,383 ns/change
   Monitor:     71,390 ns/change
   Monitor:     72,732 ns/change
   Monitor:     73,548 ns/change
   Monitor:     73,232 ns/change
   Monitor:     73,744 ns/change
   Monitor:     73,970 ns/change
      Lock:     81,881 ns/change
      Lock:     83,385 ns/change
      Lock:     84,567 ns/change
      Lock:     84,398 ns/change
      Lock:     83,790 ns/change
      Lock:     83,837 ns/change
      Lock:     84,742 ns/change
      Lock:     84,603 ns/change

and again:

   Monitor:     86,886 ns/change
   Monitor:     85,706 ns/change
   Monitor:     85,577 ns/change
   Monitor:     87,071 ns/change
   Monitor:     88,068 ns/change
   Monitor:     88,368 ns/change
   Monitor:     87,037 ns/change
   Monitor:     87,302 ns/change
      Lock:     80,370 ns/change
      Lock:     80,248 ns/change
      Lock:     80,148 ns/change
      Lock:     79,105 ns/change
      Lock:     75,010 ns/change
      Lock:     80,219 ns/change
      Lock:     78,221 ns/change
      Lock:     79,416 ns/change

and a 4th time:

   Monitor:     74,073 ns/change
   Monitor:     74,139 ns/change
   Monitor:     74,123 ns/change
   Monitor:     74,676 ns/change
   Monitor:     73,848 ns/change
   Monitor:     74,512 ns/change
   Monitor:     74,441 ns/change
   Monitor:     75,091 ns/change
      Lock:     71,967 ns/change
      Lock:     71,727 ns/change
      Lock:     71,873 ns/change
      Lock:     70,318 ns/change
      Lock:     71,796 ns/change
      Lock:     71,635 ns/change
      Lock:     71,204 ns/change
      Lock:     72,196 ns/change

I think the results of this test look better, in the sense that in BDN the new Lock had generally worse results, while it looks much better in this test.

@kouvel
Member

kouvel commented Aug 15, 2024

Ok maybe just a processor difference there, my results are varying a bit too between runs, but mostly similar. There are more things that can be improved for Lock, such as dotnet/runtime#79485. I'm not too worried about the difference there.

@ladeak
Contributor Author

ladeak commented Aug 15, 2024

@kouvel in your suggested measurement code I updated the ThreadCount const from 4 to 2, and I get the following results:

   Monitor:     23,446 ns/change
   Monitor:     22,299 ns/change
   Monitor:     23,097 ns/change
   Monitor:     22,636 ns/change
   Monitor:     23,446 ns/change
   Monitor:     22,707 ns/change
   Monitor:     22,448 ns/change
   Monitor:     22,749 ns/change
      Lock:     34,962 ns/change
      Lock:     33,850 ns/change
      Lock:     35,289 ns/change
      Lock:     35,144 ns/change
      Lock:     34,545 ns/change
      Lock:     34,966 ns/change
      Lock:     33,423 ns/change
      Lock:     32,656 ns/change

2nd run

   Monitor:     23,121 ns/change
   Monitor:     22,298 ns/change
   Monitor:     22,972 ns/change
   Monitor:     23,040 ns/change
   Monitor:     22,283 ns/change
   Monitor:     22,390 ns/change
   Monitor:     22,106 ns/change
   Monitor:     22,301 ns/change
      Lock:     26,932 ns/change
      Lock:     26,819 ns/change
      Lock:     26,672 ns/change
      Lock:     26,820 ns/change
      Lock:     26,858 ns/change
      Lock:     26,787 ns/change
      Lock:     26,837 ns/change
      Lock:     26,911 ns/change

3rd run:

   Monitor:     26,833 ns/change
   Monitor:     26,722 ns/change
   Monitor:     25,793 ns/change
   Monitor:     26,854 ns/change
   Monitor:     26,866 ns/change
   Monitor:     26,718 ns/change
   Monitor:     26,735 ns/change
   Monitor:     26,741 ns/change
      Lock:     41,130 ns/change
      Lock:     41,578 ns/change
      Lock:     40,824 ns/change
      Lock:     40,474 ns/change
      Lock:     36,992 ns/change
      Lock:     39,834 ns/change
      Lock:     36,275 ns/change
      Lock:     35,051 ns/change

I repeated the tests a few times with different thread counts, and it seems that with ThreadCount=4 Lock is consistently better, while with 3 or 2 Monitor is consistently better. With 1 thread, Lock is ~1 ns better.

@kouvel
Member

kouvel commented Aug 15, 2024

Yea, it is actually possible. In the test, the more times one thread can reacquire the lock in sequence, the better the result will be. For instance, using a Sleep(1) upon contention would make the test show very good results, but lock acquisition latency would get very high. Slight differences in timing could affect the result; benchmarking performance under contention is probably more nuanced than this. I typically randomize how long the lock is held to help stabilize the results. There are also some small spin-waiting differences in Lock that yield the thread more quickly upon contention, which could affect the timing, and which help to avoid wasting time spinning when other threads could do more useful work.
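
The randomized-hold-time idea could be sketched like this (a hypothetical variation of the earlier test, not code from this PR; names and constants are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class RandomizedHoldTest
{
    private static readonly Lock _lock = new();
    private static long _value;

    // Each thread gets its own Random so the benchmark does not contend on it.
    [ThreadStatic] private static Random? _random;

    private static void Update(int n)
    {
        _random ??= new Random(Environment.CurrentManagedThreadId);
        for (int i = 0; i < n; i++)
        {
            int holdIterations = _random.Next(1, 64);
            lock (_lock)
            {
                // Randomize how long the lock is held, so that no thread can
                // settle into a rhythm of cheap sequential reacquisitions.
                for (int j = 0; j < holdIterations; j++)
                    _value++;
            }
        }
    }

    private static void Main()
    {
        var tasks = new Task[4];
        for (int i = 0; i < tasks.Length; i++)
            tasks[i] = Task.Run(() => Update(10_000));
        Task.WaitAll(tasks);
        Console.WriteLine($"Total increments: {Interlocked.Read(ref _value)}");
    }
}
```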

@ladeak
Contributor Author

ladeak commented Aug 16, 2024

@BrennanConroy and @halter73 I think all questions are clarified.

@halter73
Member

If we're worried about regressing contended performance, should we continue to use object for Http2FrameWriter._windowUpdateLock and FlowControl._flowLock? I imagine Http2FrameWriter.WriteToOutputPipe will usually acquire the lock more than Http2Connection.ProcessWindowUpdateFrameAsync when writing a bunch of response bodies, but both methods will acquire the lock a lot from different threads. It seems most similar to the ThreadCount=2 scenario.

Similar logic goes for FlowControl._flowLock. That only affects request body uploads which are rarer, but could be worse because each concurrent stream tries to independently acquire the connection-level _flowLock on their own threadpool thread. It doesn't have logic to consolidate the input reading for all streams into a single loop that's analogous to what WriteToOutputPipe does for output writing.

The other locks seem like a purer win, since they will almost always be accessed from the application thread, and the locks are mostly there to handle early aborts.

@ladeak
Contributor Author

ladeak commented Aug 17, 2024

I have been trying a few other things out, e.g. WSL (Ubuntu 22.04.3) on the same machine.

4 threads (the variance is huge between executions):

   Monitor:    150.381 ns/change
   Monitor:    148.997 ns/change
   Monitor:    152.141 ns/change
   Monitor:    154.373 ns/change
   Monitor:    148.490 ns/change
   Monitor:    153.407 ns/change
   Monitor:    147.473 ns/change
   Monitor:    133.847 ns/change
      Lock:    126.061 ns/change
      Lock:    126.451 ns/change
      Lock:    125.610 ns/change
      Lock:    126.589 ns/change
      Lock:    125.851 ns/change
      Lock:    120.473 ns/change
      Lock:    105.485 ns/change
      Lock:    105.622 ns/change
   Monitor:     70.997 ns/change
   Monitor:     70.977 ns/change
   Monitor:     71.349 ns/change
   Monitor:     71.223 ns/change
   Monitor:     71.148 ns/change
   Monitor:     71.697 ns/change
   Monitor:     70.972 ns/change
   Monitor:     70.906 ns/change
      Lock:    114.761 ns/change
      Lock:    113.196 ns/change
      Lock:    114.455 ns/change
      Lock:    113.534 ns/change
      Lock:    114.359 ns/change
      Lock:    113.941 ns/change
      Lock:    114.224 ns/change
      Lock:    114.851 ns/change

With 2 threads the results are pretty similar to those on Windows, but with somewhat more variance between runs:

   Monitor:     25.994 ns/change
   Monitor:     26.257 ns/change
   Monitor:     26.150 ns/change
   Monitor:     26.087 ns/change
   Monitor:     25.803 ns/change
   Monitor:     26.018 ns/change
   Monitor:     25.840 ns/change
   Monitor:     25.803 ns/change
      Lock:     25.146 ns/change
      Lock:     24.732 ns/change
      Lock:     24.764 ns/change
      Lock:     25.223 ns/change
      Lock:     25.227 ns/change
      Lock:     25.133 ns/change
      Lock:     25.582 ns/change
      Lock:     25.067 ns/change

@ladeak
Contributor Author

ladeak commented Sep 4, 2024

Some updates: with the latest preview, the Lock type seems to be faster in all the above cases (1-4 threads) compared to locking a plain object.

- Replacing lock objects with System.Threading.Lock
- HTTP3 has a few dictionaries/lists that are locked directly; not addressed in this PR
@ladeak ladeak force-pushed the ladeak-56794 branch 3 times, most recently from fcab2b3 to 8878dad Compare September 8, 2024 12:16
@ladeak
Contributor Author

ladeak commented Sep 17, 2024

@halter73 would you have further suggestions?

@halter73
Member

halter73 commented Sep 19, 2024

Thanks for all the work you've put into this! I think it would be better to merge the System.Threading.Lock changes first and separate the lock-free changes into another PR. And I'd keep the _flowLock as a plain object, at least for now.

The lock-free changes look good to me too, but I would appreciate someone like @BrennanConroy, @amcasey, or maybe even @benaadams looking at them. Putting them in a separate PR will make it easier for others to review them on their own, and easier to revert if need be.

@ladeak
Contributor Author

ladeak commented Sep 19, 2024

Reverted this PR to the scope of using Lock everywhere, incorporating the latest measurements (which seemed to be faster in all cases), so InputFlowControl and Http2FrameWriter._windowUpdateLock use it. I will soon open a new PR with the interlocked changes.

@ladeak
Contributor Author

ladeak commented Sep 19, 2024

@halter73 I have opened a separate PR: #57968 for the interlocked changes in InputFlowControl
