Performance Alert message on latest commits #6730

Open
khsrali opened this issue Jan 24, 2025 · 8 comments

khsrali commented Jan 24, 2025

The performance of this particular test is getting slower and slower:

/tests/benchmark/test_json_contains.py::test_wide_json

first warning:
d0c9572#commitcomment-151682807
second warning:
eba6954#commitcomment-151687906
third warning:
208d6a9#commitcomment-151735963

@rabbull should we ignore this? Any idea why it performed super fast on your PR? 😅

pinging @unkcpz, who noticed this

rabbull commented Jan 24, 2025

Sorry, I didn't expect that the benchmarks would be run on every PR.

The fact is this benchmark is always slow, including in my PR. Its purpose is to compare, at extreme data sizes, the performance of SQLite's implementation of JSON contains against PostgreSQL's built-in implementation; it finds that the former is about 2x slower than the latter, which is acceptable.

Is it possible to exclude this benchmark from being run so frequently? It would also be reasonable to simply remove it, since its mission is already accomplished and it doesn't need to stay in the test suite.

EDIT: Let me just remove this benchmark. It doesn't make much sense to keep it in the test suite.

EDIT2: It still makes some sense. I will look into this problem.

rabbull commented Jan 25, 2025

TL;DR: The test is flaky because the data is randomized and the database is sensitive to fluctuations. Fixing the random seed helps but doesn't fully solve the issue; adopting a more robust metric such as the median instead of OPS is the better option.

I ran the benchmark a few times locally and found that the test is indeed flaky -- the time consumed by a single round can vary significantly, ranging from 5 ms to 37 ms, a ~7x difference. I plotted a histogram of the rounds for this case, and it shows a long-tailed distribution: most rounds are quite fast, but a few outliers are significantly slower. The median for this case is 5 ms and the average is 6 ms, both close to the minimum. Below is the histogram (x-axis: time consumption, y-axis: number of occurrences):

[Histogram of round durations: a long-tailed distribution with most rounds near 5 ms and a few slow outliers]

I identified two possible factors contributing to this issue:

  1. The benchmark relies on randomized data generation, so every round sees different test data. This introduces significant variability in performance across rounds.
  2. The operation involves a heavy database query (finding rows whose JSON column satisfies a containment condition). Queries like this are inherently sensitive to fluctuations, even after warmup, due to factors such as caching and CPU/IO load.

The first problem can be addressed by fixing the seed before generating the randomized data. I have implemented this in a local branch and it reduces the variability; however, the maximum remains high and the stddev is still unacceptable, which leads us to the second issue.
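
For reference, a minimal sketch of what pinning the seed could look like; the fixture name and data shape below are made up for illustration and are not the actual ones in tests/benchmark/test_json_contains.py:

```python
import random

import pytest


@pytest.fixture
def wide_json_rows():
    """Generate the randomized JSON payloads from a fixed seed.

    Pinning the seed means every benchmark run sees identical data,
    removing one source of run-to-run variability.
    """
    rng = random.Random(42)  # dedicated, seeded RNG instead of the global one
    return [
        {f"key_{i}": rng.randint(0, 1_000_000) for i in range(100)}
        for _ in range(1_000)
    ]
```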

The second problem is harder to eliminate entirely. Since database operations are inherently variable, the best we can do is adopt better metrics for monitoring. Promising alternatives include using the median or the average instead of relying on OPS, which is overly sensitive to such fluctuations.
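
To make the metric argument concrete: OPS is derived from the mean round time (roughly its reciprocal in pytest-benchmark, as far as I know), so a handful of slow outliers drags it down, while the median barely moves. A small self-contained illustration with synthetic numbers in the same range as the measurements above:

```python
import statistics

# Synthetic round times (seconds) mimicking the long tail observed above:
# most rounds near 5 ms, a few outliers up to ~37 ms.
typical = [0.005] * 95 + [0.006] * 3
clean_run = typical + [0.007, 0.008]
noisy_run = typical + [0.030, 0.037]

for label, rounds in [("clean run", clean_run), ("run with outliers", noisy_run)]:
    mean = statistics.mean(rounds)
    ops = 1.0 / mean  # OPS treated as the reciprocal of the mean round time
    median_ms = statistics.median(rounds) * 1000
    print(f"{label}: ops={ops:.0f}/s, median={median_ms:.1f} ms")

# The two outliers cut OPS by roughly 10% while the median stays at 5.0 ms,
# which is why alerting on the median gives a more stable signal.
```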

My conclusion is that this warning can be ignored for now, as it does not directly impact the functionality. However, I recommend replacing OPS with a better metric such as median to improve monitoring reliability in the future. In the meantime, I can submit a PR to add a fixed seed for randomized data generation to reduce flakiness.

khsrali commented Jan 29, 2025

Wow, thanks a lot @rabbull for the detailed analysis!

I see the problem now. Since GitHub's compute resources are not that reliable (runs can be slower or faster depending on availability, load, etc.), I suggest

introducing a nightly test that executes this particular test (or the entire test suite?) several times (5-10), takes the smallest number among those runs, and compares that with the benchmark baseline.
How about this?

rabbull commented Jan 30, 2025

By

introducing a nightly test that executes this particular test (or the entire test suite?) several times (5-10), takes the smallest number among those runs, and compares that with the benchmark baseline

did you mean selecting the run with the highest OPS among the 5-10 runs, or selecting the fastest iteration across all iterations of the several full runs?

khsrali commented Jan 30, 2025

Yes, the second!
Selecting the fastest iter/sec across the several runs.

rabbull commented Jan 30, 2025

Yes, the second! Selecting the fastest iter/sec across the several runs.

This looks promising to me. However, do we have proper tooling for such a setup?

khsrali commented Jan 30, 2025

Nope :) we don't.
We have nightly tests in place, but they will require some modifications to support this.
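
In case it helps, here is a rough sketch of what such a "best of several runs" nightly check could look like, assuming pytest-benchmark's --benchmark-json report is available (the run count, file names, baseline, and tolerance below are placeholders):

```python
import json
import subprocess

N_RUNS = 5
BASELINE_SECONDS = 0.005  # placeholder: expected fastest round time
TOLERANCE = 1.5           # placeholder: allowed slowdown factor before alerting

fastest_round = float("inf")
for i in range(N_RUNS):
    report = f"bench_run_{i}.json"
    # Each run writes a pytest-benchmark JSON report that we read back.
    subprocess.run(
        [
            "pytest",
            "tests/benchmark/test_json_contains.py::test_wide_json",
            f"--benchmark-json={report}",
        ],
        check=True,
    )
    with open(report) as handle:
        stats = json.load(handle)["benchmarks"][0]["stats"]
    # Keep the single fastest round observed across all runs.
    fastest_round = min(fastest_round, stats["min"])

if fastest_round > BASELINE_SECONDS * TOLERANCE:
    raise SystemExit(
        f"Performance regression: fastest round {fastest_round * 1000:.1f} ms "
        f"is above the allowed {BASELINE_SECONDS * TOLERANCE * 1000:.1f} ms"
    )
```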

rabbull self-assigned this Jan 30, 2025

khsrali commented Feb 17, 2025

@rabbull the performance message is back:
ce9dcf4

Just checking: did we decide to switch it off in the end? I don't remember, can you remind me? :)
