test: Add perf test to measure the "cost" of chain wakeup #2599

Merged
merged 1 commit into from
Jan 20, 2025

@xemul xemul commented Dec 24, 2024

test: Add perf test to measure the "cost" of chain wakeup

The test creates a chain of future/promise pairs and then wakes up the last one, causing a cascade of resolutions. There are 2x2 tests -- resolving with a value or an exception vs co_await-ed or .then()-ed chains.

The result is (depth of 32)

test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
chain.then_value         29944640    33.158ns     0.026ns    33.059ns    33.740ns       1.063       1.094       314.9         0.0
chain.await_value        18963648    52.668ns     0.242ns    52.426ns    54.334ns       1.094       1.125       414.3         0.0
chain.then_exception     29095808    33.633ns     0.093ns    33.541ns    35.308ns       1.094       1.094       316.3         0.0
chain.await_exception      325344     3.112us    17.301ns     3.083us     3.192us       2.156       1.125     23489.0         0.0

Waking up a co_await-ed chain with an exception is extremely expensive.

The per-link results with a depth of 8 are the same, which means that exception propagation via co_await is expensive on every co_await: the total cost "scales" linearly with the chain depth.

@xemul xemul requested a review from avikivity December 24, 2024 11:52
@avikivity
Member

The test creates a chain of future-promise-s and then wakes up the last one thus causing the cascade of resolutions. There are 2x2 tests -- resolve with value or exception vs co_await-ed or .then()-d chains.

The result is (depth of 32)

test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
chain.then_value           992498     1.007us     1.715ns     1.001us     1.008us      32.000      33.000      9623.7         0.0
chain.await_value          669230     1.466us     2.124ns     1.457us     1.512us      33.000      34.000     12804.5         0.0
chain.then_exception       955761     1.041us     0.731ns     1.039us     1.044us      34.000      34.000      9906.2         0.0
chain.await_exception        9980    96.767us    57.374ns    96.399us    96.850us      68.000      35.000    747836.2         0.0

Waking up co-await-ed chain with exceptions is extremely expensive.

Worse, if comparing the result with depth of 8

test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
chain.then_value          3516439   269.729ns     0.459ns   269.243ns   273.017ns       8.000       9.000      2582.4         0.0
chain.await_value         2511108   397.325ns     0.919ns   396.407ns   409.861ns       9.000      10.000      3623.0         0.0
chain.then_exception      2909952   344.783ns     0.559ns   341.926ns   345.653ns      10.000      10.000      3081.4         0.0
chain.await_exception       37663    26.700us   153.423ns    26.422us    27.405us      20.000      11.000    204489.5         0.0

it's clear that exception propagation via co-awaits is expensive on every co_await, as it "scales" linearly with the chain depth.

Please normalize by the depth, or at least add the depth to the test name. It's not then_value that takes 1 usec, it's 32 of them.

@bhalevy
Member

bhalevy commented Dec 29, 2024

@xemul thanks. This is very important.
BTW, why is the cycles/op stat missing?

@travisdowns
Contributor

travisdowns commented Jan 3, 2025

BTW, why is the cycles/op stat missing?

This is caused by a bug where cycles may always be zero for some benchmarks (issue: #2587, fix: #2588).

Also, the cycles column is entirely missing in the .md output, fix: #2590.

@travisdowns
Contributor

it's clear that exception propagation via co-awaits is expensive on every co_await, as it "scales" linearly with the chain depth.

Makes sense I guess as every co_await in the chain involves a throw-catch and those are the expensive part (high fixed cost, and also a cost per frame unwound though I guess there are very few frames unwound for each throw/catch here).

The test creates a chain of future/promise pairs and then wakes up the
last one, causing a cascade of resolutions. There are 2x2 tests --
resolving with a value or an exception vs co_await-ed or .then()-ed chains.

The result is (depth of 32)

test                   iterations      median         mad         min         max      allocs       tasks        inst      cycles
chain.then_value         29944640    33.158ns     0.026ns    33.059ns    33.740ns       1.063       1.094       314.9         0.0
chain.await_value        18963648    52.668ns     0.242ns    52.426ns    54.334ns       1.094       1.125       414.3         0.0
chain.then_exception     29095808    33.633ns     0.093ns    33.541ns    35.308ns       1.094       1.094       316.3         0.0
chain.await_exception      325344     3.112us    17.301ns     3.083us     3.192us       2.156       1.125     23489.0         0.0

Waking up a co_await-ed chain with an exception is extremely expensive.

The per-link results with a depth of 8 are the same, which means that
exception propagation via co_await is expensive on _every_ co_await:
the total cost "scales" linearly with the chain depth.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
@xemul xemul force-pushed the br-perf-test-wakeup branch from 9303523 to 5a2414b Compare January 9, 2025 07:51
@xemul
Contributor Author

xemul commented Jan 9, 2025

upd:

  • fixed (I hope) CI compilation
  • normalized results by scale to see the cost of a single continuation

upd-upd:

  • CI is green now 🎉

@xemul
Contributor Author

xemul commented Jan 9, 2025

it's clear that exception propagation via co-awaits is expensive on every co_await, as it "scales" linearly with the chain depth.

Makes sense I guess as every co_await in the chain involves a throw-catch and those are the expensive part (high fixed cost, and also a cost per frame unwound though I guess there are very few frames unwound for each throw/catch here).

That's exactly my guess too (Cc @regevran @tchaikov )

@xemul
Contributor Author

xemul commented Jan 13, 2025

@avikivity , please consider merging

@regevran

That's exactly my guess too

It is a possible explanation, but a closer look at the implementation is required.
I will try to take a look when I have the time.

@xemul
Contributor Author

xemul commented Jan 20, 2025

@avikivity , please consider merging

@avikivity avikivity merged commit 2312b7a into scylladb:master Jan 20, 2025
15 checks passed

5 participants