test: Add perf test to measure the "cost" of chain wakeup #2599

xemul · 2024-12-24T11:52:11Z

test: Add perf test to measure the "cost" of chain wakeup

The test creates a chain of future-promise pairs and then wakes up the last one, causing a cascade of resolutions. There are 2x2 tests: the chain is resolved with either a value or an exception, and is built from either co_await-ed coroutines or .then()-ed continuations.

Results (depth of 32, normalized per continuation):

test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
chain.then_value         29944640    33.158ns     0.026ns    33.059ns    33.740ns       1.063       1.094       314.9         0.0
chain.await_value        18963648    52.668ns     0.242ns    52.426ns    54.334ns       1.094       1.125       414.3         0.0
chain.then_exception     29095808    33.633ns     0.093ns    33.541ns    35.308ns       1.094       1.094       316.3         0.0
chain.await_exception      325344     3.112us    17.301ns     3.083us     3.192us       2.156       1.125     23489.0         0.0

Waking up a co_await-ed chain with an exception is extremely expensive: about 3.1us per continuation, versus roughly 33-53ns in the other three cases.

The results with a depth of 8 are the same, which means that exception propagation via co_await is expensive on every co_await: the cost "scales" linearly with the chain depth.
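A minimal standalone model of the four cases may help show where the difference comes from. It assumes, as suggested later in this thread and not verified against Seastar's sources, that each co_await on a failed future rethrows the stored exception and the awaiting coroutine catches it again, while a .then() chain skips its continuation and simply forwards the stored exception. The names `result`, `then_link`, `await_link` and `wake_chain` are hypothetical and are not Seastar's API; the model only counts throws and does not measure time.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for a resolved future: either a value or an exception.
struct result {
    std::optional<int> value;
    std::exception_ptr error;
};

// Counts how many throws happen while the wakeup cascades through the chain.
struct chain_stats {
    std::size_t throws = 0;
};

// .then()-style link (assumed behavior): a failed input is forwarded as-is,
// without running the continuation body and without throwing.
result then_link(const result& in, chain_stats&) {
    if (in.error) {
        return in;
    }
    assert(in.value.has_value());
    return result{*in.value + 1, nullptr};
}

// co_await-style link (assumed behavior): awaiting a failed input rethrows
// the stored exception, and the coroutine catches it to fail its own result.
result await_link(const result& in, chain_stats& st) {
    try {
        if (in.error) {
            ++st.throws;
            std::rethrow_exception(in.error);
        }
        assert(in.value.has_value());
        return result{*in.value + 1, nullptr};
    } catch (...) {
        return result{std::nullopt, std::current_exception()};
    }
}

// Resolves the innermost link with a value (0) or an exception and lets the
// resolution cascade outward through `depth` links.
result wake_chain(std::size_t depth, bool fail, bool use_await, chain_stats& st) {
    result r = fail
        ? result{std::nullopt, std::make_exception_ptr(std::runtime_error("boom"))}
        : result{0, nullptr};
    for (std::size_t i = 0; i < depth; ++i) {
        r = use_await ? await_link(r, st) : then_link(r, st);
    }
    return r;
}
```

For example, `wake_chain(32, true, true, stats)` leaves `stats.throws == 32`, one throw per link, while the same failed chain built from `then_link` throws zero times. Under this model, that per-link throw is what separates the chain.await_exception row from the other three.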

avikivity · 2024-12-24T15:36:37Z

> The test creates a chain of future-promise-s and then wakes up the last one thus causing the cascade of resolutions. There are 2x2 tests -- resolve with value or exception vs co_await-ed or .then()-d chains.
>
> The result is (depth of 32)
>
> test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
> chain.then_value           992498     1.007us     1.715ns     1.001us     1.008us      32.000      33.000      9623.7         0.0
> chain.await_value          669230     1.466us     2.124ns     1.457us     1.512us      33.000      34.000     12804.5         0.0
> chain.then_exception       955761     1.041us     0.731ns     1.039us     1.044us      34.000      34.000      9906.2         0.0
> chain.await_exception        9980    96.767us    57.374ns    96.399us    96.850us      68.000      35.000    747836.2         0.0
>
> Waking up co-await-ed chain with exceptions is extremely expensive.
>
> Worse, if comparing the result with depth of 8
>
> test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
> chain.then_value          3516439   269.729ns     0.459ns   269.243ns   273.017ns       8.000       9.000      2582.4         0.0
> chain.await_value         2511108   397.325ns     0.919ns   396.407ns   409.861ns       9.000      10.000      3623.0         0.0
> chain.then_exception      2909952   344.783ns     0.559ns   341.926ns   345.653ns      10.000      10.000      3081.4         0.0
> chain.await_exception       37663    26.700us   153.423ns    26.422us    27.405us      20.000      11.000    204489.5         0.0
>
> it's clear that exception propagation via co-awaits is expensive on every co_await, as it "scales" linearly with the chain depth.

Please normalize by the depth, or at least add the depth to the test name. It's not a single then_value that takes 1 usec; it's 32 of them.
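The linear-scaling claim and the requested normalization can both be checked with simple arithmetic on the medians quoted above. The sketch below uses hypothetical helper names (`fit_two_depths`, `per_continuation_ns`); it fits t(depth) = fixed + per_level * depth through two (depth, median) points and also divides a median by the depth.

```cpp
#include <cassert>

// Two-point linear model of a chain wakeup: t(depth) = fixed + per_level * depth.
struct linear_cost {
    double fixed_ns;
    double per_level_ns;
};

// Fits the line through (d1, t1_ns) and (d2, t2_ns); d1 and d2 must differ.
linear_cost fit_two_depths(double d1, double t1_ns, double d2, double t2_ns) {
    assert(d1 != d2);
    const double per_level = (t2_ns - t1_ns) / (d2 - d1);
    return linear_cost{t1_ns - per_level * d1, per_level};
}

// The normalization requested in review: median time per continuation.
double per_continuation_ns(double median_ns, double depth) {
    assert(depth > 0);
    return median_ns / depth;
}
```

Plugging in the chain.await_exception medians (26.700us at depth 8, 96.767us at depth 32) gives roughly 2.92us per level and about 3.3us of fixed cost, so the per-level term dominates. The same fit for chain.then_value (269.729ns and 1.007us) gives roughly 31ns per level. With only two points taken from separate runs, this is a rough sanity check of the linear-scaling claim, not a measurement.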

bhalevy · 2024-12-29T08:39:26Z

@xemul thanks. This is very important.
BTW, why is the cycles/op stat missing?

travisdowns · 2025-01-03T15:03:33Z

> BTW, why is the cycles/op stat missing?

This is caused by a bug where cycles may always be zero for some benchmarks (issue: #2587, fix: #2588).

Also, the cycles column is entirely missing from the .md output (fix: #2590).

travisdowns · 2025-01-03T15:07:16Z

> it's clear that exception propagation via co-awaits is expensive on every co_await, as it "scales" linearly with the chain depth.

Makes sense, I guess, as every co_await in the chain involves a throw/catch, and those are the expensive part (a high fixed cost, plus a cost per frame unwound, though I guess very few frames are unwound for each throw/catch here).
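To look at this explanation in isolation, a standalone timing sketch (plain C++, not Seastar; all names are hypothetical) can compare passing a `std::exception_ptr` through N levels by rethrowing and catching it at each level against merely copying it along. It assumes, as suggested above, that this is roughly what a failed co_await and a failed .then() respectively reduce to. Absolute numbers depend on the compiler, the C++ runtime and the CPU, so no expected output is given.

```cpp
#include <chrono>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <utility>

// Propagates `ep` through `depth` levels by rethrowing and catching it at
// each level (assumed model of a co_await on a failed future).
std::exception_ptr propagate_by_rethrow(std::exception_ptr ep, std::size_t depth) {
    for (std::size_t i = 0; i < depth; ++i) {
        try {
            std::rethrow_exception(ep);
        } catch (...) {
            ep = std::current_exception();
        }
    }
    return ep;
}

// Propagates `ep` through `depth` levels by copying the pointer
// (assumed model of a .then() chain skipping its continuations).
std::exception_ptr propagate_by_forward(std::exception_ptr ep, std::size_t depth) {
    for (std::size_t i = 0; i < depth; ++i) {
        std::exception_ptr next = ep;  // copying only adjusts a reference count
        ep = std::move(next);
    }
    return ep;
}

// Average nanoseconds per level over `rounds` wakeups of a chain of `depth`.
// Both `depth` and `rounds` must be non-zero.
template <typename Fn>
double ns_per_level(Fn propagate, std::size_t depth, std::size_t rounds) {
    const auto ep = std::make_exception_ptr(std::runtime_error("boom"));
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < rounds; ++r) {
        if (!propagate(ep, depth)) {  // keep the result observable
            return -1.0;
        }
    }
    const std::chrono::duration<double, std::nano> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / static_cast<double>(depth * rounds);
}
```

Usage: compare `ns_per_level(propagate_by_rethrow, 32, 10000)` with `ns_per_level(propagate_by_forward, 32, 10000)`. Because each level in the rethrow variant performs a full throw/catch, its per-level cost stays roughly constant as the depth grows, which is the linear scaling seen in chain.await_exception.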

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

xemul · 2025-01-09T07:53:29Z

upd:

- fixed ~~(I hope)~~ CI compilation
- normalized the results by the scale (chain depth) to show the cost of a single continuation

upd-upd:

CI is green now 🎉

xemul · 2025-01-09T07:54:13Z

> it's clear that exception propagation via co-awaits is expensive on every co_await, as it "scales" linearly with the chain depth.

> Makes sense I guess as every co_await in the chain involves a throw-catch and those are the expensive part (high fixed cost, and also a cost per frame unwound though I guess there are very few frames unwound for each throw/catch here).

That's exactly my guess too (Cc @regevran @tchaikov)

xemul · 2025-01-13T08:56:52Z

@avikivity , please consider merging

regevran · 2025-01-14T07:37:32Z

> That's exactly my guess too

It is a possible explanation, but a closer look at the implementation is required.
I will try to take a look when I hopefully have the time.

xemul · 2025-01-20T08:51:32Z

@avikivity , please consider merging

xemul requested a review from avikivity December 24, 2024 11:52

xemul force-pushed the br-perf-test-wakeup branch from 9303523 to 5a2414b January 9, 2025 07:51

tchaikov mentioned this pull request Jan 16, 2025: perf_tests: Right-align numeric metrics in markdown tables #2619 (merged)

avikivity merged commit 2312b7a into scylladb:master Jan 20, 2025
15 checks passed
