Use System.Threading.Lock in Kestrel #57236

Open · wants to merge 1 commit into base: main
Conversation

ladeak
Contributor

@ladeak ladeak commented Aug 8, 2024

Use System.Threading.Lock in Kestrel

Replacing lock objects with System.Threading.Lock

Description

  • Addresses the Servers/Kestrel module
  • HTTP3 has a few dictionaries/lists that are locked directly; these are not addressed in this PR
    • Question: should these objects have a corresponding Lock field to lock on, instead of locking the object itself?
  • In my personal measurements (not attached to this PR), the Lock type performed marginally better than the regular Monitor lock when there is no contention.
  • Is there a way to run perf tests with the /benchmark plaintext aspnet-citrine-lin kestrel command, or is there a particular benchmark that might be interesting to run? I am not sure if there is anything designed for locking and contention.
  • The objects changed do not seem to be passed around elsewhere (where other types might also lock on them), so I think these changes are safe. (Passing a Lock to a method that accepts object and locking on that object results in problematic behavior, as far as I can see.)
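
The pitfall in the last bullet can be sketched as follows (a minimal illustration, assuming .NET 9's System.Threading.Lock; the class and method names are hypothetical). Converting a Lock to object and locking on it falls back to Monitor-based locking on the boxed reference, which does not cooperate with lock statements using the Lock fast path; the compiler flags this with warning CS9216:

```csharp
using System.Threading;

class Example
{
    private readonly Lock _lock = new();

    public void Correct()
    {
        // Lowered by the compiler to _lock.EnterScope()/Dispose() - the fast path.
        lock (_lock)
        {
            // critical section
        }
    }

    public void Problematic()
    {
        // The implicit Lock -> object conversion triggers warning CS9216:
        // the callee locks via Monitor on the reference, not the Lock's own
        // state, so it does NOT provide mutual exclusion with Correct() above.
        LockOnObject(_lock);
    }

    private static void LockOnObject(object o)
    {
        lock (o) { /* Monitor-based locking */ }
    }
}
```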

Linked #56794

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Aug 8, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 8, 2024
@martincostello
Member

Is there a way to run perf tests with the /benchmark plaintext aspnet-citrine-lin kestrel

The team members can run those for you - it needs write permission on the repo.

@BrennanConroy
Member

  • Is there a way to run perf tests with the /benchmark plaintext aspnet-citrine-lin kestrel or is there a particular benchmark that might be interesting to run? Not sure if there is anything designed for locking and contention.

In theory, using Lock can be more performant than locking with object; I don't know how much the CLR has done to actually implement the theoretical improvements yet. But I would be very surprised if we saw any measurable differences in E2E benchmarks (considering normal benchmark noise). In the past we've seen Http2 affected by contention and did a big refactor to reduce that, so maybe it would have been a nice test, but less so now 😄

@BrennanConroy
Member

/benchmark plaintext aspnet-citrine-lin kestrel

@ladeak
Contributor Author

ladeak commented Aug 8, 2024

Yes that is my impression as well.


pr-benchmarks bot commented Aug 8, 2024

Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link


pr-benchmarks bot commented Aug 8, 2024

plaintext - aspnet-citrine-lin

| application | plaintext.base | plaintext.pr | Change |
|--- |---: |---: |---: |
| Max CPU Usage (%) | 100 | 99 | -1.00% |
| Max Cores usage (%) | 2,788 | 2,785 | -0.11% |
| Max Working Set (MB) | 91 | 92 | +1.10% |
| Max Private Memory (MB) | 592 | 592 | 0.00% |
| Build Time (ms) | 3,608 | 3,795 | +5.18% |
| Start Time (ms) | 228 | 225 | -1.32% |
| Published Size (KB) | 106,590 | 106,590 | 0.00% |
| Symbols Size (KB) | 53 | 53 | 0.00% |
| .NET Core SDK Version | 9.0.100-rc.1.24408.10 | 9.0.100-rc.1.24408.10 | |
| Max CPU Usage (%) | 99 | 100 | +0.44% |
| Max Working Set (MB) | 95 | 96 | +0.55% |
| Max GC Heap Size (MB) | 25 | 25 | -0.79% |
| Size of committed memory by the GC (MB) | 16 | 17 | +5.14% |
| Max Number of Gen 0 GCs / sec | 4.00 | 3.00 | -25.00% |
| Max Number of Gen 1 GCs / sec | 3.00 | 2.00 | -33.33% |
| Max Number of Gen 2 GCs / sec | 3.00 | 2.00 | -33.33% |
| Max Gen 0 GC Budget (MB) | 26 | 26 | 0.00% |
| Max Time in GC (%) | 8.00 | 6.00 | -25.00% |
| Max Gen 0 Size (B) | 1,569,560 | 1,731,808 | +10.34% |
| Max Gen 1 Size (B) | 1,954,968 | 2,312,136 | +18.27% |
| Max Gen 2 Size (B) | 2,479,352 | 2,397,416 | -3.30% |
| Max LOH Size (B) | 95,248 | 95,248 | 0.00% |
| Max POH Size (B) | 9,463,448 | 9,500,528 | +0.39% |
| Max Allocation Rate (B/sec) | 14,922,296 | 14,312,544 | -4.09% |
| Max GC Heap Fragmentation (%) | 586% | 694% | +18.45% |
| # of Assemblies Loaded | 62 | 62 | 0.00% |
| Max Exceptions (#/s) | 1,302 | 1,304 | +0.15% |
| Max Lock Contention (#/s) | 72 | 309 | +329.17% |
| Max ThreadPool Threads Count | 48 | 48 | 0.00% |
| Max ThreadPool Queue Length | 586 | 528 | -9.90% |
| Max ThreadPool Items (#/s) | 781,916 | 784,181 | +0.29% |
| Max Active Timers | 1 | 1 | 0.00% |
| IL Jitted (B) | 285,669 | 287,410 | +0.61% |
| Methods Jitted | 3,504 | 3,519 | +0.43% |
| load | plaintext.base | plaintext.pr | Change |
|--- |---: |---: |---: |
| Max CPU Usage (%) | 98 | 98 | 0.00% |
| Max Cores usage (%) | 2,754 | 2,749 | -0.18% |
| Max Working Set (MB) | 46 | 46 | 0.00% |
| Max Private Memory (MB) | 370 | 370 | 0.00% |
| Build Time (ms) | 3,369 | 3,513 | +4.27% |
| Start Time (ms) | 0 | 0 | |
| Published Size (KB) | 72,283 | 72,283 | 0.00% |
| Symbols Size (KB) | 0 | 0 | |
| .NET Core SDK Version | 8.0.303 | 8.0.303 | |
| First Request (ms) | 113 | 114 | +0.88% |
| Requests/sec | 11,815,358 | 11,814,488 | -0.01% |
| Requests | 178,307,264 | 178,285,096 | -0.01% |
| Mean latency (ms) | 1.31 | 1.35 | +3.05% |
| Max latency (ms) | 77.08 | 70.29 | -8.81% |
| Bad responses | 0 | 0 | |
| Socket errors | 0 | 0 | |
| Read throughput (MB/s) | 1,423.36 | 1,423.36 | 0.00% |
| Latency 50th (ms) | 0.67 | 0.68 | +0.74% |
| Latency 75th (ms) | 1.01 | 1.03 | +1.98% |
| Latency 90th (ms) | 2.02 | 2.18 | +7.92% |
| Latency 99th (ms) | 14.09 | 14.68 | +4.19% |

@amcasey
Member

amcasey commented Aug 12, 2024

CI failure seems likely to be addressed by #57269.

@ladeak
Contributor Author

ladeak commented Aug 12, 2024

My concern is that under lock contention the new Lock does not seem to perform that well in my tests on x64. It looks nice on the first benchmark, but when I repeat the tests I consistently get 30% slower results for all following benchmarks.

@halter73
Member

I wouldn't expect these locks to be too contended under normal usage with the exception of Http2FrameWriter._windowUpdateLock, so maybe hold off on that one if we're worried about contended performance.

Has Lock seen much usage in runtime yet? I see that it is now used for TimeQueue. If we can get good contended performance (I'm not sure why it would be worse), System.IO.Pipelines and System.Threading.Channels seem like good candidates. It would require an extra allocation for channels though.

@kouvel Do you have any idea why @ladeak might be seeing a slowdown with Lock when he repeats tests? And do we have a good rule of thumb for when to use Lock vs a plain object?

@kouvel
Member

kouvel commented Aug 12, 2024

@ladeak, can you share your tests? Do the tests do some warmup to ensure that methods get tiered up?

There can be slight differences in performance due to different code gen and TLS access, but when using the lock keyword, Lock typically performs equal to or better than Monitor locks. Using Enter/Exit directly may be a bit slower in some cases when using Lock, due to a slightly more expensive TLS lookup.
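
As a minimal sketch of the two usage patterns mentioned above (hypothetical Counter class, assuming .NET 9's Lock API):

```csharp
using System.Threading;

class Counter
{
    private readonly Lock _lock = new();
    private int _value;

    // The lock statement on a Lock is lowered by the compiler to
    // EnterScope()/Dispose(), which is the faster path described above.
    public void IncrementWithLockKeyword()
    {
        lock (_lock)
            _value++;
    }

    // Calling Enter/Exit directly is also valid, but may pay an extra
    // thread-local lookup per Enter/Exit pair in some cases.
    public void IncrementWithEnterExit()
    {
        _lock.Enter();
        try
        {
            _value++;
        }
        finally
        {
            _lock.Exit();
        }
    }
}
```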

@ladeak
Contributor Author

ladeak commented Aug 13, 2024

@kouvel let me attach the benchmarks and the results:

using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[SimpleJob]
public class Benchmarks
{
    private int _value = 0;
    private const int N = 10000;
    private readonly object _regularLock = new();
    private readonly Lock _newLock = new();

    [Benchmark]
    public async Task RegularLock()
    {
        var t1 = Task.Run(Update);
        var t2 = Task.Run(Update);
        var t3 = Task.Run(Update);
        var t4 = Task.Run(Update);
        await Task.WhenAll(t1, t2, t3, t4);
        void Update()
        {
            for (int i = 0; i < N; i++)
            {
                lock (_regularLock)
                    _value++;
            }
        }
    }

    [Benchmark]
    public async Task NewLock()
    {
        var t1 = Task.Run(Update);
        var t2 = Task.Run(Update);
        var t3 = Task.Run(Update);
        var t4 = Task.Run(Update);
        await Task.WhenAll(t1, t2, t3, t4);
        void Update()
        {
            for (int i = 0; i < N; i++)
            {
                lock (_newLock)
                    _value++;
            }
        }
    }
}

This is the result I get most of the times:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.3958/23H2/2023Update/SunValley3)
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.100-preview.7.24371.4
  [Host]     : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2


| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| RegularLock | 1.661 ms | 0.0331 ms | 0.0719 ms |
| NewLock     | 3.109 ms | 0.0376 ms | 0.0351 ms |

The numbers slightly vary for RegularLock 1.6-2.9ms and NewLock 2.4-3.1ms, for example another run:

| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| RegularLock | 2.091 ms | 0.0611 ms | 0.1802 ms |
| NewLock     | 2.836 ms | 0.0712 ms | 0.2099 ms |

I expect larger variance due to the nature of the tests. I tried different values of N, e.g. 1000, but the results are similar (unless I go with a very small N).

Very rarely I get results as follows (or even with NewLock being slightly faster):

| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| RegularLock | 2.869 ms | 0.0569 ms | 0.1519 ms |
| NewLock     | 2.944 ms | 0.1004 ms | 0.2961 ms |

// * Warnings *
MultimodalDistribution
  Benchmarks.NewLock: Default -> It seems that the distribution can have several modes (mValue = 3)

Single Task

When I remove the parallel tasks and only have an await t1;, the results are different, and the Lock type is consistently faster:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 184.3 us | 1.62 us | 1.51 us |
| NewLock     | 170.1 us | 1.15 us | 1.02 us |

3 Tasks

With 3 parallel tasks (instead of 4), I get results that are pretty close, but still consistently in favor of the old lock:


| Method      | Mean       | Error    | StdDev   |
|------------ |-----------:|---------:|---------:|
| RegularLock |   920.3 us | 11.25 us | 10.52 us |
| NewLock     | 1,098.0 us | 21.86 us | 58.72 us |

2 Tasks

Looks similar to 3 tasks:

| Method      | Mean     | Error    | StdDev   |
|------------ |---------:|---------:|---------:|
| RegularLock | 549.1 us | 10.67 us | 17.22 us |
| NewLock     | 581.5 us | 11.05 us | 11.34 us |

@kouvel
Member

kouvel commented Aug 14, 2024

let me attach the benchmarks and the results:

I'm seeing opposite results on my machine. In any case, maybe an issue is that 10000 is small enough that one started task progresses significantly before the next one is started; since uncontended lock acquisitions are much faster, that task could progress a lot, which I imagine introduces a lot of variability into the results. So it's hard to say how much contention is actually happening during each iteration of the test. I typically test this differently: multiple threads continuously do the contentious work, and after a warmup I measure changes to some count over time from another thread. Maybe with BDN a setup/cleanup could achieve something like that. With this kind of lock usage I would not expect Lock to be slower; I'm seeing that it's noticeably faster on my machine.

@kouvel
Member

kouvel commented Aug 14, 2024

let me attach the benchmarks and the results:

What do you see with very large N, like 1,000,000? Too large an N may cause the method to not be called enough to get tiered up, but anyway just curious.

@ladeak
Contributor Author

ladeak commented Aug 14, 2024

@kouvel with N = 1_000_000;

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3)
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.100-preview.7.24371.4
  [Host]     : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.36618), X64 RyuJIT AVX2


| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 318.3 ms | 1.84 ms | 1.63 ms |
| NewLock     | 360.3 ms | 6.96 ms | 8.80 ms |

and after a second run:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 263.7 ms | 3.40 ms | 3.01 ms |
| NewLock     | 373.7 ms | 7.01 ms | 7.20 ms |

and a 3rd run:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| RegularLock | 234.8 ms | 3.62 ms | 3.39 ms |
| NewLock     | 339.3 ms | 6.43 ms | 6.31 ms |

@kouvel
Member

kouvel commented Aug 14, 2024

with N = 1_000_000;

Ok, maybe there's a processor difference here. Based on the spec it seems like a low-wattage mobile processor; maybe there are differences there that I'm not aware of. I'll try to post another test case that's similar to yours but without BDN, curious what you would see there. It may be a bit before I get back, though.

@ladeak
Contributor Author

ladeak commented Aug 14, 2024

Take your time, I am going to sleep in the next few hours.

@kouvel
Member

kouvel commented Aug 15, 2024

@ladeak, can you try this on your machine:

using System.Diagnostics;

static class Program
{
    private static void Main()
    {
        TestMonitor();
        TestLock();
    }

    private static void TestMonitor()
    {
        object lockObj = new();
        int value = 0;
        Test(
            "Monitor",
            ref value,
            n =>
            {
                for (int i = 0; i < n; i++)
                {
                    lock (lockObj)
                        value++;
                }
            });
    }

    private static void TestLock()
    {
        Lock lockObj = new();
        int value = 0;
        Test(
            "Lock",
            ref value,
            n =>
            {
                for (int i = 0; i < n; i++)
                {
                    lock (lockObj)
                        value++;
                }
            });
    }

    private static void Test(string name, ref int value, Action<int> update)
    {
        bool stop = false;
        ThreadStart threadMain = () =>
        {
            while (!stop)
                update(100);
        };

        const int ThreadCount = 4;
        var threads = new Thread[ThreadCount];
        for (int i = 0; i < ThreadCount; i++)
        {
            var t = new Thread(threadMain);
            t.IsBackground = true;
            t.Start();
            threads[i] = t;
        }

        var sw = new Stopwatch();
        Thread.Sleep(500); // warmup
        for (int i = 0; i < 8; i++)
        {
            sw.Restart();
            int startValue = value;
            Thread.Sleep(500);
            sw.Stop();
            int changes = value - startValue;

            Console.WriteLine($"{name,10}: {sw.Elapsed.TotalNanoseconds / changes,10:0.000} ns/change");
        }

        stop = true;
        foreach (var t in threads)
            t.Join();
    }
}

On my machine (AMD Ryzen 9 5950X), I'm seeing numbers similar to these:

   Monitor:     50.974 ns/change
   Monitor:     49.492 ns/change
   Monitor:     49.922 ns/change
   Monitor:     50.590 ns/change
   Monitor:     49.990 ns/change
   Monitor:     50.124 ns/change
   Monitor:     49.841 ns/change
   Monitor:     49.883 ns/change
      Lock:     44.997 ns/change
      Lock:     44.390 ns/change
      Lock:     44.946 ns/change
      Lock:     44.924 ns/change
      Lock:     44.703 ns/change
      Lock:     44.684 ns/change
      Lock:     44.857 ns/change
      Lock:     44.880 ns/change

@ladeak
Contributor Author

ladeak commented Aug 15, 2024

@kouvel

   Monitor:     86,511 ns/change
   Monitor:     84,948 ns/change
   Monitor:     86,056 ns/change
   Monitor:     82,722 ns/change
   Monitor:     85,529 ns/change
   Monitor:     86,259 ns/change
   Monitor:     85,450 ns/change
   Monitor:     84,780 ns/change
      Lock:     78,784 ns/change
      Lock:     78,250 ns/change
      Lock:     77,387 ns/change
      Lock:     76,139 ns/change
      Lock:     79,439 ns/change
      Lock:     77,637 ns/change
      Lock:     80,203 ns/change
      Lock:     79,250 ns/change

and then I run it again:

   Monitor:     73,323 ns/change
   Monitor:     72,383 ns/change
   Monitor:     71,390 ns/change
   Monitor:     72,732 ns/change
   Monitor:     73,548 ns/change
   Monitor:     73,232 ns/change
   Monitor:     73,744 ns/change
   Monitor:     73,970 ns/change
      Lock:     81,881 ns/change
      Lock:     83,385 ns/change
      Lock:     84,567 ns/change
      Lock:     84,398 ns/change
      Lock:     83,790 ns/change
      Lock:     83,837 ns/change
      Lock:     84,742 ns/change
      Lock:     84,603 ns/change

and again:

   Monitor:     86,886 ns/change
   Monitor:     85,706 ns/change
   Monitor:     85,577 ns/change
   Monitor:     87,071 ns/change
   Monitor:     88,068 ns/change
   Monitor:     88,368 ns/change
   Monitor:     87,037 ns/change
   Monitor:     87,302 ns/change
      Lock:     80,370 ns/change
      Lock:     80,248 ns/change
      Lock:     80,148 ns/change
      Lock:     79,105 ns/change
      Lock:     75,010 ns/change
      Lock:     80,219 ns/change
      Lock:     78,221 ns/change
      Lock:     79,416 ns/change

and a 4th time:

   Monitor:     74,073 ns/change
   Monitor:     74,139 ns/change
   Monitor:     74,123 ns/change
   Monitor:     74,676 ns/change
   Monitor:     73,848 ns/change
   Monitor:     74,512 ns/change
   Monitor:     74,441 ns/change
   Monitor:     75,091 ns/change
      Lock:     71,967 ns/change
      Lock:     71,727 ns/change
      Lock:     71,873 ns/change
      Lock:     70,318 ns/change
      Lock:     71,796 ns/change
      Lock:     71,635 ns/change
      Lock:     71,204 ns/change
      Lock:     72,196 ns/change

I think the results of this test look better, in the sense that in BDN the new Lock had generally worse results, while it looks much better in this test.

@kouvel
Member

kouvel commented Aug 15, 2024

Ok maybe just a processor difference there, my results are varying a bit too between runs, but mostly similar. There are more things that can be improved for Lock, such as dotnet/runtime#79485. I'm not too worried about the difference there.

@ladeak
Contributor Author

ladeak commented Aug 15, 2024

@kouvel in your suggested measurement code I updated the ThreadCount const from 4 to 2, and I get the following results:

   Monitor:     23,446 ns/change
   Monitor:     22,299 ns/change
   Monitor:     23,097 ns/change
   Monitor:     22,636 ns/change
   Monitor:     23,446 ns/change
   Monitor:     22,707 ns/change
   Monitor:     22,448 ns/change
   Monitor:     22,749 ns/change
      Lock:     34,962 ns/change
      Lock:     33,850 ns/change
      Lock:     35,289 ns/change
      Lock:     35,144 ns/change
      Lock:     34,545 ns/change
      Lock:     34,966 ns/change
      Lock:     33,423 ns/change
      Lock:     32,656 ns/change

2nd run

   Monitor:     23,121 ns/change
   Monitor:     22,298 ns/change
   Monitor:     22,972 ns/change
   Monitor:     23,040 ns/change
   Monitor:     22,283 ns/change
   Monitor:     22,390 ns/change
   Monitor:     22,106 ns/change
   Monitor:     22,301 ns/change
      Lock:     26,932 ns/change
      Lock:     26,819 ns/change
      Lock:     26,672 ns/change
      Lock:     26,820 ns/change
      Lock:     26,858 ns/change
      Lock:     26,787 ns/change
      Lock:     26,837 ns/change
      Lock:     26,911 ns/change

3rd run:

   Monitor:     26,833 ns/change
   Monitor:     26,722 ns/change
   Monitor:     25,793 ns/change
   Monitor:     26,854 ns/change
   Monitor:     26,866 ns/change
   Monitor:     26,718 ns/change
   Monitor:     26,735 ns/change
   Monitor:     26,741 ns/change
      Lock:     41,130 ns/change
      Lock:     41,578 ns/change
      Lock:     40,824 ns/change
      Lock:     40,474 ns/change
      Lock:     36,992 ns/change
      Lock:     39,834 ns/change
      Lock:     36,275 ns/change
      Lock:     35,051 ns/change

I repeated the tests a few times with different thread counts, and it seems that with ThreadCount=4 Lock is consistently better, while with 3 or 2 Monitor is consistently better. With 1 thread, Lock is ~1 ns better.

@kouvel
Member

kouvel commented Aug 15, 2024

Yea, it is actually possible. In the test, the more times one thread can reacquire the lock in sequence, the better the result will be. For instance, using a Sleep(1) upon contention would make the test show very good results, but lock acquisition latency would get very high. Slight differences in timing could affect the result; benchmarking performance under contention is probably more nuanced than this. I typically randomize how long the lock is held to help stabilize the results. There are also some small spin-waiting differences in Lock that yield the thread more quickly upon contention, which could affect the timing, and which help to avoid wasting time spinning when other threads could do more useful work.
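
The randomized-hold-time idea could be sketched like this (a hypothetical variation of the earlier test, not code from this PR; names and constants are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class RandomizedHoldTest
{
    private static readonly Lock _lock = new();
    private static long _value;

    // Each thread gets its own Random so the benchmark does not contend on it.
    [ThreadStatic] private static Random? _random;

    private static void Update(int n)
    {
        _random ??= new Random(Environment.CurrentManagedThreadId);
        for (int i = 0; i < n; i++)
        {
            int holdIterations = _random.Next(1, 64);
            lock (_lock)
            {
                // Randomize how long the lock is held, so that no thread can
                // settle into a rhythm of cheap sequential reacquisitions.
                for (int j = 0; j < holdIterations; j++)
                    _value++;
            }
        }
    }

    private static void Main()
    {
        var tasks = new Task[4];
        for (int i = 0; i < tasks.Length; i++)
            tasks[i] = Task.Run(() => Update(10_000));
        Task.WaitAll(tasks);
        Console.WriteLine($"Total increments: {Interlocked.Read(ref _value)}");
    }
}
```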

@ladeak
Contributor Author

ladeak commented Aug 16, 2024

@BrennanConroy and @halter73 I think all questions are clarified.

@halter73
Member

If we're worried about regressing contended performance, should we continue to use object for Http2FrameWriter._windowUpdateLock and FlowControl._flowLock? I imagine Http2FrameWriter.WriteToOutputPipe will usually acquire the lock more than Http2Connection.ProcessWindowUpdateFrameAsync when writing a bunch of response bodies, but both methods will acquire the lock a lot from different threads. It seems most similar to the ThreadCount=2 scenario.

Similar logic goes for FlowControl._flowLock. That only affects request body uploads which are rarer, but could be worse because each concurrent stream tries to independently acquire the connection-level _flowLock on their own threadpool thread. It doesn't have logic to consolidate the input reading for all streams into a single loop that's analogous to what WriteToOutputPipe does for output writing.

The other locks seem like a purer win, since they will almost always be accessed from the application thread, and the locks are mostly there to handle early aborts.

@ladeak
Contributor Author

ladeak commented Aug 17, 2024

I have been trying a few other things out, e.g. WSL (Ubuntu 22.04.3) on the same machine.

4 threads (the variance is huge between executions):

   Monitor:    150.381 ns/change
   Monitor:    148.997 ns/change
   Monitor:    152.141 ns/change
   Monitor:    154.373 ns/change
   Monitor:    148.490 ns/change
   Monitor:    153.407 ns/change
   Monitor:    147.473 ns/change
   Monitor:    133.847 ns/change
      Lock:    126.061 ns/change
      Lock:    126.451 ns/change
      Lock:    125.610 ns/change
      Lock:    126.589 ns/change
      Lock:    125.851 ns/change
      Lock:    120.473 ns/change
      Lock:    105.485 ns/change
      Lock:    105.622 ns/change
   Monitor:     70.997 ns/change
   Monitor:     70.977 ns/change
   Monitor:     71.349 ns/change
   Monitor:     71.223 ns/change
   Monitor:     71.148 ns/change
   Monitor:     71.697 ns/change
   Monitor:     70.972 ns/change
   Monitor:     70.906 ns/change
      Lock:    114.761 ns/change
      Lock:    113.196 ns/change
      Lock:    114.455 ns/change
      Lock:    113.534 ns/change
      Lock:    114.359 ns/change
      Lock:    113.941 ns/change
      Lock:    114.224 ns/change
      Lock:    114.851 ns/change

With 2 threads the results are pretty similar to those on Windows, but with somewhat more variance between runs:

   Monitor:     25.994 ns/change
   Monitor:     26.257 ns/change
   Monitor:     26.150 ns/change
   Monitor:     26.087 ns/change
   Monitor:     25.803 ns/change
   Monitor:     26.018 ns/change
   Monitor:     25.840 ns/change
   Monitor:     25.803 ns/change
      Lock:     25.146 ns/change
      Lock:     24.732 ns/change
      Lock:     24.764 ns/change
      Lock:     25.223 ns/change
      Lock:     25.227 ns/change
      Lock:     25.133 ns/change
      Lock:     25.582 ns/change
      Lock:     25.067 ns/change

@ladeak
Contributor Author

ladeak commented Sep 4, 2024

Some updates: with the latest preview, the Lock type seems to be faster in all the above cases (1-4 threads) compared to locking a plain object.

- Replacing lock objects with System.Threading.Lock
- HTTP3 has a few dictionaries/lists that are locked directly; not addressed in this PR
@ladeak ladeak force-pushed the ladeak-56794 branch 3 times, most recently from fcab2b3 to 8878dad Compare September 8, 2024 12:16
@ladeak
Contributor Author

ladeak commented Sep 17, 2024

@halter73 would you have further suggestions?

@halter73
Member

halter73 commented Sep 19, 2024

Thanks for all the work you've put into this! I think it would be better to merge the System.Threading.Lock changes first and separate the lock-free changes into another PR. And I'd keep the _flowLock as a plain object, at least for now.

The lock-free changes look good to me too, but I would appreciate someone like @BrennanConroy, @amcasey, or maybe even @benaadams looking at them. Putting them in a separate PR will make it easier for others to review them on their own, and easier to revert if need be.

@ladeak
Contributor Author

ladeak commented Sep 19, 2024

Reverted this PR to the scope of using Lock everywhere, incorporating the latest measurements (which seemed to be faster in all cases), so InputFlowControl and Http2FrameWriter._windowUpdateLock use it. I will soon open a new PR with the interlocked changes.

@ladeak
Contributor Author

ladeak commented Sep 19, 2024

@halter73 I have opened a separate PR: #57968 for the interlocked changes in InputFlowControl
