Performance Alert message on latest commits #6730
Sorry, I didn't expect that benchmarks would be run on every PR. The fact is this benchmark is always slow, including in my PR. Its purpose is to compare, at extreme data sizes, SQLite's implementation of json contains with PostgreSQL's built-in implementation; it finds that the former is about ~2x slower than the latter, which is acceptable. Is it possible to exclude this benchmark from being run frequently? Alternatively, it would also be reasonable to just remove this benchmark, since its mission is already accomplished and it doesn't need to exist in the test suite.

EDIT: Let me just remove this benchmark. It doesn't make much sense to keep it in the test suite.

EDIT2: It still makes some sense. I will look into this problem.
TLDR: The test is flaky due to randomized data and the database being sensitive to fluctuations. Fixing the random seed helps, but doesn't fully solve the issue. Adopting a better metric, such as the median instead of OPS, is a better option.

I performed the benchmark a few times locally and found that the test is indeed flaky: the time consumption of a single case can vary significantly, ranging from 5 ms to 37 ms, a ~7x difference. I plotted a histogram of the rounds for this case, and it shows a long-tailed distribution. Most of the rounds are quite fast, but a few outliers are significantly slower. The median of this case is 5 ms and the average is 6 ms, both close to the minimum. Below is the histogram (x-axis: time consumption, y-axis: number of occurrences):

[histogram image]

I identified two possible factors contributing to this issue:

1. The test data is randomized, so each run benchmarks a different payload.
2. Database operations are inherently sensitive to environmental fluctuations.
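To illustrate why the median is a more robust summary than OPS for a long-tailed distribution like the one described above, here is a minimal, self-contained sketch. The timing values are simulated (not taken from the actual benchmark), and it assumes OPS is derived from the mean round time, which is why a few slow outliers drag it down:

```python
import random
import statistics

# Simulate a long-tailed timing distribution: most rounds around ~5 ms,
# with a few slow outliers up to ~37 ms (illustrative values only).
random.seed(0)
timings_ms = [random.gauss(5.0, 0.3) for _ in range(95)]       # fast bulk
timings_ms += [random.uniform(20.0, 37.0) for _ in range(5)]   # slow tail

mean = statistics.mean(timings_ms)
median = statistics.median(timings_ms)
# OPS here is computed from the mean round time, so the tail pulls it
# down even though the vast majority of rounds are fast.
ops = 1000.0 / mean

print(f"median={median:.1f} ms  mean={mean:.1f} ms  ops={ops:.0f}/s")
```

The median stays pinned near the fast bulk of the distribution, while the mean (and therefore OPS) shifts with every outlier, which is exactly the sensitivity the thread is complaining about.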
The first problem can be addressed by fixing the seed before generating randomized data. I have implemented this in a local branch, and it has reduced the variability. However, while the variability is lower, the maximum remains high and the stddev is still unacceptable, which leads us to the second issue.

The second problem is harder to eliminate entirely. Since database operations can be inherently variable, the best we can do is adopt better metrics for monitoring. Promising alternatives include using the median or average instead of relying on OPS, which is overly sensitive to such fluctuations.

My conclusion is that this warning can be ignored for now, as it does not directly impact functionality. However, I recommend replacing OPS with a better metric, such as the median, to improve monitoring reliability in the future. In the meantime, I can submit a PR to add a fixed seed for randomized data generation to reduce flakiness.
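The seed fix might look something like this. Note that `gen_wide_json` is a hypothetical stand-in for the benchmark's actual data generator; the key point is seeding a private `random.Random` instance so every run sees identical data without disturbing global RNG state:

```python
import random
import string

def gen_wide_json(num_keys: int, seed: int = 42) -> dict:
    """Generate randomized test data deterministically.

    Hypothetical stand-in for the benchmark's data generator; the seed
    makes successive runs benchmark the exact same payload.
    """
    rng = random.Random(seed)  # local RNG: global random state untouched
    return {
        "".join(rng.choices(string.ascii_lowercase, k=8)): rng.randint(0, 10**6)
        for _ in range(num_keys)
    }

# Two independent calls produce identical payloads, so run-to-run timing
# differences can no longer come from the data itself.
assert gen_wide_json(100) == gen_wide_json(100)
```

Using a dedicated `random.Random(seed)` instance rather than `random.seed()` avoids side effects on any other code that draws from the module-level generator.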
Wow, thanks a lot @rabbull for the detailed analysis! I see the problem now. Since GitHub's compute resources are not that reliable (sometimes slower or faster depending on availability, load, etc.), I suggest introducing a nightly test that executes this particular test (or the entire test suite?) several times (5-10), then simply takes the smallest number among those runs and compares it with the benchmark number.
By that, did you mean selecting the run with the highest OPS among the 5-10 runs? Or selecting the fastest iteration across all iterations of the several full runs?
yes, the second!
This looks promising to me. However, do we have proper tools for such cases?
nope :) we don't.
The performance of this particular test is getting slower and slower:
/tests/benchmark/test_json_contains.py::test_wide_json
first warning:
d0c9572#commitcomment-151682807
second warning:
eba6954#commitcomment-151687906
third warning:
208d6a9#commitcomment-151735963
@rabbull should we ignore this? Any idea why it performed super fast on your PR 😅
Pinging @unkcpz, who noticed this.