tests: Mark some tests as slow and schedule them at the beginning #15950
base: master
Conversation
(For what it's worth, I highly recommend using nextest rather than `cargo test` for local development!)
Right! Did not think about using nextest. 😄 Certainly has a better interface, but unfortunately tests take a bit longer to execute :( I added nextest benchmarks to the table in the description. At first glance, the performance improvements are similar to `cargo test`.
You could try the profile dedicated to CI - it's a bit more optimized: `cargo nextest run --cargo-profile ci`
Although, I do like the idea of somehow trying to sort the test cases in decreasing real time duration. But I wish it was somehow ✨automagical✨, with no need to manually annotate anything. That sort (pun may or may not be intended) of thing is bound to get outdated.
This PR is just a suggestion; I tried speeding tests up somehow and noticed that at the end my CPU was sitting idle.

I also thought about automating test annotation somehow. The idea of no annotations seems eerily close to the halting problem though ;) I was thinking about how to annotate the tests in order to schedule them properly. It could be the slow flag, which divides tests into two categories, but it could also be a non-discrete value (e.g. estimated execution time) which constitutes a total order. The latter seems to be overkill, because (1) these values would be more dependent on the system, and (2) the time improvement would be minimal compared to the former, which is a much simpler approach.

Regarding automation, I thought about giving feedback to the user about slow tests which are fast and non-slow tests which are slow. That would require two time values in order to create hysteresis. They could be set manually (e.g. 1s, 10s) or calculated at the end based on the probability distribution (e.g. σ, 2σ). That of course is a bit complicated, so I did not go in this direction, as I wasn't sure whether we would want this approach at all. It also sounds like a problem that could possibly be solved at a toolchain/library level.
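A minimal sketch of what that hysteresis-based feedback could look like; the thresholds, type names, and messages (`SLOW_ENTER`, `SLOW_EXIT`, `Finished`) are illustrative assumptions, not anything from the PR:

```rust
use std::time::Duration;

// Two thresholds form the hysteresis band: a test must exceed SLOW_ENTER
// to be reported as "should be marked slow", and fall below SLOW_EXIT to
// be reported as "should be unmarked"; anything in between stays quiet,
// so a test hovering around one threshold doesn't flap.
const SLOW_ENTER: Duration = Duration::from_secs(10);
const SLOW_EXIT: Duration = Duration::from_secs(1);

struct Finished<'a> {
    name: &'a str,
    marked_slow: bool,
    duration: Duration,
}

fn annotation_feedback(results: &[Finished]) {
    for r in results {
        if !r.marked_slow && r.duration >= SLOW_ENTER {
            eprintln!("note: `{}` took {:?}, consider marking it slow", r.name, r.duration);
        } else if r.marked_slow && r.duration <= SLOW_EXIT {
            eprintln!("note: `{}` took only {:?}, consider unmarking it", r.name, r.duration);
        }
    }
}

fn main() {
    annotation_feedback(&[
        Finished { name: "swf::huge_movie", marked_slow: false, duration: Duration::from_secs(12) },
        Finished { name: "swf::tiny_case", marked_slow: true, duration: Duration::from_millis(300) },
    ]);
}
```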
Of course, and thanks! :)
Bah, it could also be just a little system-local cache of "the last 5-10 times this test case was run, it took xx milliseconds". Then to schedule, look up these records, average per case, and sort. The averaging should even out any discrepancies caused by P-core vs. E-core, powersave/performance, dynamic clocks, overall system load, thermal throttling, etc... And how a given test case is identified isn't that crucial either, as long as it works "reasonably enough", since it only determines the order of test executions, which shouldn't affect anything ™️, other than load balancing at the end, of course. This could even be a generic nextest feature.
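A rough sketch of that cache-and-sort idea, assuming a hypothetical `target/.test-durations` file with tab-separated `name<TAB>millis` records appended after each run (the path and format are made up for illustration):

```rust
use std::collections::HashMap;
use std::fs;

// Load the per-test duration history recorded by previous runs.
fn load_history(path: &str) -> HashMap<String, Vec<u64>> {
    let mut hist: HashMap<String, Vec<u64>> = HashMap::new();
    if let Ok(text) = fs::read_to_string(path) {
        for line in text.lines() {
            if let Some((name, ms)) = line.split_once('\t') {
                if let Ok(ms) = ms.parse() {
                    let runs = hist.entry(name.to_string()).or_default();
                    runs.push(ms);
                    // Keep only the last 10 recorded runs per test.
                    if runs.len() > 10 {
                        runs.remove(0);
                    }
                }
            }
        }
    }
    hist
}

// Schedule slowest-first by average recorded duration; tests with no
// history end up last here (one could just as well slot them elsewhere).
fn schedule(mut tests: Vec<String>, hist: &HashMap<String, Vec<u64>>) -> Vec<String> {
    let avg = |name: &String| -> u64 {
        hist.get(name)
            .map(|runs| runs.iter().sum::<u64>() / runs.len() as u64)
            .unwrap_or(0)
    };
    tests.sort_by_key(|t| std::cmp::Reverse(avg(t)));
    tests
}

fn main() {
    let hist = load_history("target/.test-durations");
    for name in schedule(vec!["fast_test".into(), "slow_test".into()], &hist) {
        println!("{name}");
    }
}
```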
Ahh, see: nextest-rs/nextest#905 |
Okay, I think I found out why nextest does not run tests marked as slow; I also found out why tests on nextest are slower compared to `cargo test`. Turns out nextest lists the tests and then invokes each one individually, in its own process. That strategy of executing tests also explains why it's significantly slower than `cargo test`. Removing the full walk on exact match makes nextest about 15s faster than before, which is closer to being as fast as `cargo test`.
(Branch force-pushed from 1a196a2 to 12713e7.)
While I'm still not entirely sold on the manual `is_slow` annotations, the nextest fixes seem worth submitting on their own.
Created a new PR with these two commits: #16031
I know I'm a bit late to the thread, but regarding scheduling slow tests at the beginning: I'm not sure why I'd ever want that tbh. And I haven't really seen other projects do this kind of thing before either?
Because it minimizes the chance that a many-core CPU grinds through most tests in X seconds, while one or two really long tests happen to get started at, let's say, X*0.8, extending the overall time to wait by another large fraction of X while most cores are already idle. If the short and quick tests are left to the end, they can maximally utilize all of the cores all the way until all tests are done, therefore being quicker overall. It's a classic "packing problem" kind of thing...
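A toy simulation of that packing argument (the numbers and the greedy worker model are assumptions for illustration, not measurements from this PR): 30 one-second tests plus one 10-second test on 4 workers finish in about 17s when the slow test is submitted last, but in about 10s when it goes first:

```rust
// Greedy simulation: each test goes to the worker that frees up first,
// in the given submission order (roughly what a runner's thread pool does).
fn makespan(order: &[u64], workers: usize) -> u64 {
    let mut free_at = vec![0u64; workers];
    for &dur in order {
        let next = free_at.iter_mut().min().unwrap();
        *next += dur;
    }
    free_at.into_iter().max().unwrap()
}

fn main() {
    // 30 one-second tests plus one 10-second test, on 4 workers.
    let mut tests = vec![1; 30];
    tests.push(10);

    let slow_last = makespan(&tests, 4);
    tests.rotate_right(1); // move the slow test to the front
    let slow_first = makespan(&tests, 4);

    // Prints: slow last = 17s, slow first = 10s.
    println!("slow last = {slow_last}s, slow first = {slow_first}s");
}
```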
That is better when we expect tests to fail instead of pass, but generally in CI it's way more probable for the tests to pass. When running locally we might have different expectations, although I usually have a positive attitude and expect them to pass ;)
Refreshed this PR a bit to see how it performs as of today. I've removed the `fast` feature.
Just a random idea on the concern of "manual annotations getting out of sync with reality over time": is it feasible to at least output a little note at the end of a test "campaign" when a test's measured duration doesn't match its annotation? Or is this perhaps also blocked on the upstream issues mentioned earlier?
I think that would be easy to implement in `libtest_mimic`. I would argue that this "is_slow" annotation is only a heuristic: having outdated annotations still improves test times (not as much as having current annotations, but still). The outdating issue also mainly concerns new tests, i.e. tests not marked as slow when they should be, as it's far less likely for a slow test to become fast out of the blue. If having a warning about wrongly marked tests after a run is something we want, I can look into it.
All fair points, and this isn't worth much effort even in my opinion - only if it were trivial, which apparently it isn't.
It isn't trivial for nextest, but for `cargo test` it should be, since there the harness runs all tests in a single process and sees every duration.
@@ -15,6 +15,26 @@ pub struct Font {
     pub italic: bool,
 }

+#[derive(Clone, Copy)]
+pub enum TestKind {
+    Slow,
What I meant before is, why is this "kind" type even needed?
Do you have any other "kinds" of tests you plan to add in the near future that are mutually exclusive with them being "slow"? If not, I feel like this is just premature over-engineering - YAGNI.
Or is this somehow a technical requirement to make test ordering with this attribute possible/easier?
Oh sure, using test kinds was actually the easiest way. It's not like "I created test kinds to possibly support more than one kind", but rather "libtest_mimic supports test kinds and I can utilize that for slow tests and test ordering". I think I could create a wrapper for a test trial that contains the information about the test ordering, but wouldn't that be over-engineering?
Compare that approach also to other frameworks, for instance:
- JUnit 4 test categories: https://github.com/junit-team/junit4/wiki/Categories (the example mentions a `SlowTests.class` category),
- JUnit 5 tags: https://junit.org/junit5/docs/current/user-guide/#writing-tests-tagging-and-filtering (the example mentions a `fast` tag),
- pytest markers: https://docs.pytest.org/en/7.1.x/example/markers.html (the docs mention a `slow` test mark).
The big disadvantage of libtest_mimic test kinds is that a test cannot have more than one kind, but it is possible to store multiple kinds by joining them e.g. with a comma (`kind1,kind2`). Using test kinds also means that tests with a kind are shown as such in the output (i.e. slow tests are prefixed with `[slow]`).
If you're referring to the enum and not test kinds in general, well... that's only a high-level representation of a kind. It's better IMO than comparing strings, and it also encapsulates ordering logic. If you have any suggestions on improving it, I'm all ears!
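For reference, a minimal sketch of how kinds can drive slow-first ordering with libtest_mimic. The test names are made up, and a plain string comparison stands in for the `TestKind` enum the PR uses to encapsulate the ordering logic:

```rust
use libtest_mimic::{Arguments, Trial};

fn main() {
    let args = Arguments::from_args();

    let mut trials = vec![
        Trial::test("quick_case", || Ok(())),
        // The kind shows up in the output as a `[slow]` prefix.
        Trial::test("huge_movie_case", || Ok(())).with_kind("slow"),
    ];

    // `false < true`, so trials whose kind is "slow" sort to the front
    // and get scheduled before everything else.
    trials.sort_by_key(|t| t.kind() != "slow");

    libtest_mimic::run(&args, trials).exit();
}
```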
> but rather "libtest_mimic supports test kinds and I can utilize that for slow tests and test ordering".

Ooh, right, pardon my ignorance... It's been a while since we discussed this... ^^'

Is this still the case? If not, why is this still a draft?
Updated slow tests, removed draft status. All nextest issues should be fixed already |
Maybe just randomizing them also achieves something like that? 😶 Or, we could start with randomizing, then sort the slow ones into the first places... Just to be extra sure...?
By setting `is_slow = true`, a test may be marked as slow, which means that the duration of its execution is exceptionally long compared to other tests.
In order to minimize the duration of `cargo test`, slow tests should be executed at the beginning. This maximizes load on multithreaded CPUs by fully utilizing all threads and preventing late slow tests from stalling the whole suite.
The option `sleep_to_meet_frame_rate` artificially increases the duration of tests so that they run at realtime speed.
This patch marks all tests which execute significantly longer than an average test with `is_slow = true`.
As @torokati44 noticed... this PR works totally by accident.
By setting `is_slow = true`, a test may be marked as slow, which means that the duration of its execution is exceptionally long compared to other tests.

In order to minimize the duration of `cargo test`, slow tests should be executed at the beginning. This maximizes load on multithreaded CPUs by fully utilizing all threads and preventing late slow tests from stalling the whole suite.

By enabling the `fast` feature, slow tests may be skipped. This is useful e.g. when someone wants to quickly run the test suite without waiting for a long time, in order to roughly assess whether their changes broke something.

Benchmarked configurations (timings from the original table not recovered): `cargo test`, `cargo test -F imgtests`, `cargo nextest run`, `cargo nextest run -F imgtests`, `cargo nextest run --cargo-profile ci`.