tests: Mark some tests as slow and schedule them at the beginning #15950
base: master
Conversation
(For what it's worth, I highly recommend using nextest rather than `cargo test` for local development!)
Right! Did not think about using nextest. 😄 Certainly has a better interface, but unfortunately tests take a bit longer to execute :( I added nextest benchmarks to the table in the description. At first glance, the performance improvements are similar to `cargo test`.
You could try the profile dedicated to CI - it's a bit more optimized: `cargo nextest run --cargo-profile ci`
Although, I do like the idea of somehow trying to sort the test cases in decreasing real time duration. But I wish it was somehow ✨automagical✨, with no need to manually annotate anything. That sort (pun may or may not be intended) of thing is bound to get outdated.
This PR is just a suggestion; I tried speeding tests up somehow and noticed that at the end my CPU was sitting idle.

I also thought about automating test annotation somehow. The idea of no annotations seems eerily close to the halting problem though ;) I was thinking about how to annotate the tests in order to schedule them properly. It could be the slow flag, which divides tests into two categories, but it could also be a non-discrete value (e.g. estimated execution time) which constitutes a total order. The latter seems to be overkill, because (1) these values would be more dependent on the system, and (2) the time improvement would be minimal compared to the former, which is a much simpler approach.

Regarding automation, I thought about giving feedback to the user about slow tests which are fast and non-slow tests which are slow. That would require two time values in order to create hysteresis. They could be set manually (e.g. 1s, 10s) or calculated at the end based on the probability distribution (e.g. σ, 2σ). That of course is a bit complicated, so I did not go in this direction, as I wasn't sure whether we would want this approach at all. It also sounds like a problem that could possibly be solved at a toolchain/library level.
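A minimal sketch of what that hysteresis-based feedback could look like; the thresholds, type names, and messages (`SLOW_ENTER`, `SLOW_EXIT`, `Finished`) are illustrative assumptions, not anything from the PR:

```rust
use std::time::Duration;

// Two thresholds form the hysteresis band: a test must exceed SLOW_ENTER
// to be reported as "should be marked slow", and fall below SLOW_EXIT to
// be reported as "should be unmarked"; anything in between stays quiet,
// so a test hovering around one threshold doesn't flap.
const SLOW_ENTER: Duration = Duration::from_secs(10);
const SLOW_EXIT: Duration = Duration::from_secs(1);

struct Finished<'a> {
    name: &'a str,
    marked_slow: bool,
    duration: Duration,
}

fn annotation_feedback(results: &[Finished]) {
    for r in results {
        if !r.marked_slow && r.duration >= SLOW_ENTER {
            eprintln!("note: `{}` took {:?}, consider marking it slow", r.name, r.duration);
        } else if r.marked_slow && r.duration <= SLOW_EXIT {
            eprintln!("note: `{}` took only {:?}, consider unmarking it", r.name, r.duration);
        }
    }
}

fn main() {
    annotation_feedback(&[
        Finished { name: "swf::huge_movie", marked_slow: false, duration: Duration::from_secs(12) },
        Finished { name: "swf::tiny_case", marked_slow: true, duration: Duration::from_millis(300) },
    ]);
}
```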
Of course, and thanks! :)
Bah, it could also be just a little system-local cache of "the last 5-10 times this test case was run, it took xx milliseconds". Then to schedule, look up these records, average per case, and sort. The averaging should even out any discrepancies caused by P-core vs. E-core, powersave/performance, dynamic clocks, overall system load, thermal throttling, etc... And how a given test case is identified isn't that crucial either, as long as it works "reasonably enough", since it only determines the order of test executions, which shouldn't affect anything ™️, other than load balancing at the end, of course. This could even be a generic nextest feature.
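A rough sketch of that cache-and-sort idea, assuming a hypothetical `target/.test-durations` file with tab-separated `name<TAB>millis` records appended after each run (the path and format are made up for illustration):

```rust
use std::collections::HashMap;
use std::fs;

// Load the per-test duration history recorded by previous runs.
fn load_history(path: &str) -> HashMap<String, Vec<u64>> {
    let mut hist: HashMap<String, Vec<u64>> = HashMap::new();
    if let Ok(text) = fs::read_to_string(path) {
        for line in text.lines() {
            if let Some((name, ms)) = line.split_once('\t') {
                if let Ok(ms) = ms.parse() {
                    let runs = hist.entry(name.to_string()).or_default();
                    runs.push(ms);
                    // Keep only the last 10 recorded runs per test.
                    if runs.len() > 10 {
                        runs.remove(0);
                    }
                }
            }
        }
    }
    hist
}

// Schedule slowest-first by average recorded duration; tests with no
// history end up last here (one could just as well slot them elsewhere).
fn schedule(mut tests: Vec<String>, hist: &HashMap<String, Vec<u64>>) -> Vec<String> {
    let avg = |name: &String| -> u64 {
        hist.get(name)
            .map(|runs| runs.iter().sum::<u64>() / runs.len() as u64)
            .unwrap_or(0)
    };
    tests.sort_by_key(|t| std::cmp::Reverse(avg(t)));
    tests
}

fn main() {
    let hist = load_history("target/.test-durations");
    for name in schedule(vec!["fast_test".into(), "slow_test".into()], &hist) {
        println!("{name}");
    }
}
```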
Ahh, see: nextest-rs/nextest#905 |
Okay, I think I found out why nextest does not run tests marked as slow; I also found out why tests on nextest are slower compared to `cargo test`. Turns out nextest lists the tests and then invokes each one individually, in its own process. That strategy of executing tests also explains why it's significantly slower than `cargo test`. Removing the full walk on exact match makes nextest about 15s faster than before, which is closer to being as fast as `cargo test`.
(Branch force-pushed from 1a196a2 to 12713e7.)
While I'm still not entirely sold on the manual `is_slow` annotations, the nextest fixes seem worth submitting on their own.
Created a new PR with these two commits: #16031
I know I'm a bit late to the thread, but regarding scheduling slow tests at the beginning: I'm not sure why I'd ever want that tbh. And I haven't really seen other projects do this kind of thing before either?
Because it minimizes the chance that a many-core CPU grinds through most tests in X seconds, while one or two really long tests happen to get started at, let's say, X*0.8, extending the overall time to wait by another large fraction of X while most cores are already idle. If the short and quick tests are left to the end, they can maximally utilize all of the cores all the way until all tests are done, therefore being quicker overall. It's a classic "packing problem" kind of thing...
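A toy simulation of that packing argument (the numbers and the greedy worker model are assumptions for illustration, not measurements from this PR): 30 one-second tests plus one 10-second test on 4 workers finish in about 17s when the slow test is submitted last, but in about 10s when it goes first:

```rust
// Greedy simulation: each test goes to the worker that frees up first,
// in the given submission order (roughly what a runner's thread pool does).
fn makespan(order: &[u64], workers: usize) -> u64 {
    let mut free_at = vec![0u64; workers];
    for &dur in order {
        let next = free_at.iter_mut().min().unwrap();
        *next += dur;
    }
    free_at.into_iter().max().unwrap()
}

fn main() {
    // 30 one-second tests plus one 10-second test, on 4 workers.
    let mut tests = vec![1; 30];
    tests.push(10);

    let slow_last = makespan(&tests, 4);
    tests.rotate_right(1); // move the slow test to the front
    let slow_first = makespan(&tests, 4);

    // Prints: slow last = 17s, slow first = 10s.
    println!("slow last = {slow_last}s, slow first = {slow_first}s");
}
```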
That is better when we expect tests to fail instead of pass, but generally in CI it's way more probable for the tests to pass. When running locally we might have different expectations, although I usually have a positive attitude and expect them to pass ;)
Refreshed this PR a bit to see how it performs as of today. I've removed the `fast` feature.
Just a random idea on the concern of "manual annotations getting out of sync with reality over time": is it feasible to at least output a little note at the end of a test "campaign" when a test's measured duration doesn't match its annotation? Or is this perhaps also blocked on the upstream issues mentioned earlier?
I think that would be easy to implement in `libtest_mimic`. I would argue that this "is_slow" annotation is only a heuristic: having outdated annotations still improves test times (not as much as having current annotations, but still). The outdating issue also mainly concerns new tests, i.e. tests not marked as slow when they should be, as it's far less likely for a slow test to become fast out of the blue. If having a warning about wrongly marked tests after a run is something we want, I can look into it.
All fair points, and this isn't worth much effort even in my opinion - only if it were trivial, which apparently it isn't.
It isn't trivial for nextest, but for `cargo test` it should be, since there the harness runs all tests in a single process and sees every duration.
@@ -15,6 +15,26 @@ pub struct Font {
     pub italic: bool,
 }

+#[derive(Clone, Copy)]
+pub enum TestKind {
+    Slow,
What I meant before is, why is this "kind" type even needed?
Do you have any other "kinds" of tests you plan to add in the near future that are mutually exclusive with them being "slow"? If not, I feel like this is just premature over-engineering - YAGNI.
Or is this somehow a technical requirement to make test ordering with this attribute possible/easier?
Oh sure, using test kinds was actually the easiest way. It's not like "I created test kinds to possibly support more than one kind", but rather "libtest_mimic supports test kinds and I can utilize that for slow tests and test ordering". I think I could create a wrapper for a test trial that contains the information about the test ordering, but wouldn't that be over-engineering?
Compare that approach also to other frameworks, for instance:
- JUnit 4 test categories: https://github.com/junit-team/junit4/wiki/Categories (the example mentions a `SlowTests.class` category),
- JUnit 5 tags: https://junit.org/junit5/docs/current/user-guide/#writing-tests-tagging-and-filtering (the example mentions a `fast` tag),
- pytest markers: https://docs.pytest.org/en/7.1.x/example/markers.html (the docs mention a `slow` test mark).
The big disadvantage of libtest_mimic test kinds is that a test cannot have more than one kind, but it is possible to store multiple kinds by joining them e.g. with a comma (`kind1,kind2`). Using test kinds also means that tests with a kind are shown as such in the output (i.e. slow tests are prefixed with `[slow]`).
If you're referring to the enum and not test kinds in general, well... that's only a high-level representation of a kind. It's better IMO than comparing strings, and it also encapsulates ordering logic. If you have any suggestions on improving it, I'm all ears!
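For reference, a minimal sketch of how kinds can drive slow-first ordering with libtest_mimic. The test names are made up, and a plain string comparison stands in for the `TestKind` enum the PR uses to encapsulate the ordering logic:

```rust
use libtest_mimic::{Arguments, Trial};

fn main() {
    let args = Arguments::from_args();

    let mut trials = vec![
        Trial::test("quick_case", || Ok(())),
        // The kind shows up in the output as a `[slow]` prefix.
        Trial::test("huge_movie_case", || Ok(())).with_kind("slow"),
    ];

    // `false < true`, so trials whose kind is "slow" sort to the front
    // and get scheduled before everything else.
    trials.sort_by_key(|t| t.kind() != "slow");

    libtest_mimic::run(&args, trials).exit();
}
```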
> but rather "libtest_mimic supports test kinds and I can utilize that for slow tests and test ordering".

Ooh, right, pardon my ignorance... It's been a while since we discussed this... ^^'

Is this still the case? If not, why is this still a draft?
Updated slow tests, removed draft status. All nextest issues should be fixed already |
Maybe just randomizing them also achieves something like that? 😶 Or, we could start with randomizing, then sort the slow ones into the first places... Just to be extra sure...?
By setting `is_slow = true`, a test may be marked as slow, which means that the duration of its execution is exceptionally long compared to other tests.
In order to minimize the duration of `cargo test`, slow tests should be executed at the beginning. This maximizes load on multithreaded CPUs by fully utilizing all threads and preventing late slow tests from stalling the whole suite.
The option `sleep_to_meet_frame_rate` artificially increases the duration of tests so that they run at realtime speed.
This patch marks all tests which execute significantly longer than an average test with `is_slow = true`.
As @torokati44 noticed... this PR works totally by accident.
By setting `is_slow = true`, a test may be marked as slow, which means that the duration of its execution is exceptionally long compared to other tests.

In order to minimize the duration of `cargo test`, slow tests should be executed at the beginning. This maximizes load on multithreaded CPUs by fully utilizing all threads and preventing late slow tests from stalling the whole suite.

By enabling the `fast` feature, slow tests may be skipped. This is useful e.g. when someone wants to quickly run the test suite without waiting for a long time, in order to roughly assess whether their changes broke something.

Benchmarked configurations (timings from the original table not recovered): `cargo test`, `cargo test -F imgtests`, `cargo nextest run`, `cargo nextest run -F imgtests`, `cargo nextest run --cargo-profile ci`.