Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the number of times CI runs benchmarks #49092

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rosstimothy
Copy link
Contributor

This is in an attempt to provide more reliable test results. 6 was chosen as that's the minimum number benchstat requires for a 95% confidence interval.

Example of warning emitted from benchstat in CI: https://github.com/gravitational/teleport/actions/runs/11862270592

@rosstimothy rosstimothy added the no-changelog Indicates that a PR does not require a changelog entry label Nov 15, 2024
@rosstimothy rosstimothy force-pushed the tross/increase_benchmark_count branch from 622b657 to 3ffcce1 Compare November 15, 2024 19:44
@rosstimothy
Copy link
Contributor Author

@r0mant @doggydogworld what are your thoughts on this change? The attempt is to provide more reliable information by running the benchmark tests a few times, though that comes at the expense of the tests themselves taking longer to complete. The benckmark tests are still however faster than unit and integration tests.

@doggydogworld
Copy link
Contributor

@r0mant @doggydogworld what are your thoughts on this change? The attempt is to provide more reliable information by running the benchmark tests a few times, though that comes at the expense of the tests themselves taking longer to complete. The benckmark tests are still however faster than unit and integration tests.

Yeah I originally chose the 1x to ensure that it finishes ASAP to not be a blocker and to prove that changes won't break the benchmark.

Running 6 times minimum would be necessary to test for performance regressions so this eventually was going to make it in. Since we've been able to observe so far that 6 times probably won't slow CI down too much I think it's good to do it now.

The eventual goal would then be to create a baseline (which would have more than 6 runs) to compare with these results to determine regressions.

Makefile Outdated
@@ -943,14 +943,14 @@ endif
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_SKIP_PATTERN = "^BenchmarkRoot"
test-go-bench: | $(TEST_LOG_DIR)
go test -run ^$$ -bench . -skip $(BENCHMARK_SKIP_PATTERN) -benchtime 1x $(PACKAGES) \
go test -run ^$$ -bench . -skip $(BENCHMARK_SKIP_PATTERN) -count=6 -benchtime 1x $(PACKAGES) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we take advantage of the increased runs in any way? I can see this being part of a larger change that makes use of the bench results, but on its own I'm not sure it adds much. (Even so I'm a bit skeptical that every benchmark we have was built with this in mind.)

Also notice that we are still running 1x, so even though the counts are higher the benchmark itself is still minimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd like to use our benchmarks as a signal that nothing has broken or regressed between releases. However, right now output between two consecutive tags on a release branch can vary quite dramatically. I'm hoping this might have some impact on stabilizing things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we increase the benchtime too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against it, I was trying to limit the changes here to keep test times from ballooning. Do you have a recommendation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's hard to pitch a generic number, my 2c is that the flags should be on a per-benchmark basis. It also has some to do with how long we want the jobs to take.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it seems we have too many benchmarks that adding 6s for each one really balloons it. I think this is a good first step at least to get some data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we'll want to revert to -benchtime=1x but I'll leave my stamp regardless. Thanks for the all the explanations here, folks.

Copy link
Contributor

@codingllama codingllama Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one last suggestion: what about -benchtime=6x instead of -benchtime=1x -count=6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give that a try. I don't really want to merge this as is and require all PRs to wait >30 minutes just so the benchmark tests can finish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With -benchtime=1x -count=6 we are back to need >= 6 samples for confidence interval at level 0.95

This is in an attempt to provide more reliable test results. 6 was
chosen as that's the minimum number benchstat requires for a 95%
confidence interval.
@rosstimothy rosstimothy force-pushed the tross/increase_benchmark_count branch from 1dcfe4f to ede6752 Compare December 5, 2024 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/branch/v16 backport/branch/v17 no-changelog Indicates that a PR does not require a changelog entry size/sm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants