
ci: Add test retry logic for flaky tests #9218

Merged
25 commits merged into parse-community:alpha on Aug 11, 2024

Conversation


@dplewis dplewis commented Jul 17, 2024

Pull Request

Issue

Flaky tests make merging PRs take a lot more effort. Tests have to be re-run until the flaky tests pass, or failures can be ignored if you know which tests are flaky. Ideally every flaky test would be fixed as it is found, which hasn't been the case.

Closes: #8654

Approach

  • Add the ability to retry well-known flaky tests (tests that randomly fail)
  • Identify tests by name (see the sketch below for the list format)
  • Retry up to 3 times if a flaky test fails
  • Remove function-scoped this from the RegexVulnerabilities.spec test suite
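
For illustration, the flaky-test list (the flakyTest.json file discussed later in this thread) could be as simple as a JSON array of full test names; this is a hypothetical shape, and the actual file in the PR may differ:

[
  "ParseLiveQuery handle invalid websocket payload length"
]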


@dplewis changed the title from "feat: Add test retry logic for flaky tests" to "ci: Add test retry logic for flaky tests" on Jul 17, 2024

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.76%. Comparing base (453a987) to head (e6b6fd1).
Report is 5 commits behind head on alpha.

Additional details and impacted files
@@           Coverage Diff           @@
##            alpha    #9218   +/-   ##
=======================================
  Coverage   93.76%   93.76%           
=======================================
  Files         184      184           
  Lines       14715    14715           
=======================================
  Hits        13797    13797           
  Misses        918      918           


@dplewis dplewis requested a review from a team July 17, 2024 21:41

@mtrezza mtrezza left a comment


Could you explain how this works?


dplewis commented Jul 17, 2024

It overrides the Jasmine Spec class and replaces the original test function with a wrapper that retries the original test function on failure.
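
A minimal sketch of that idea, assuming Jasmine's internal Spec constructor receives an attrs object that carries the spec body in queueableFn.fn; these internals vary between Jasmine versions, so the actual PR code will differ:

// spec/helper.js - a sketch only, not the exact code from this PR.
const flakyTests = require('./flakyTest.json');

// Wrap a spec body so a failure triggers up to `retries` runs in total.
// Simplification: this handles async/promise specs only, not done() callbacks.
function retry(fn, retries = 3) {
  return async function () {
    let lastError;
    for (let attempt = 1; attempt <= retries; attempt++) {
      try {
        return await fn.call(this);
      } catch (e) {
        lastError = e;
      }
    }
    throw lastError; // all retries exhausted: fail the spec for real
  };
}

const OriginalSpec = jasmine.Spec;
jasmine.Spec = function (attrs) {
  const spec = new OriginalSpec(attrs);
  // Spec#getFullName() returns the human-readable name used in flakyTest.json.
  if (flakyTests.includes(spec.getFullName())) {
    attrs.queueableFn.fn = retry(attrs.queueableFn.fn);
  }
  return spec;
};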


mtrezza commented Jul 17, 2024

How are flaky tests identified, so they get fixed at some point? What's the purpose of flakyTest.json?

What's the implication of "Identify test by name"? What happens if 2 tests have the same name?


dplewis commented Jul 17, 2024

How are flaky tests identified, so they get fixed at some point? What's the purpose of flakyTest.json?

From this point forward, any test that randomly fails for no apparent reason is a flaky test. If you find such a test, add it to flakyTest.json.

What's the implication of "Identify test by name"? What happens if 2 tests have the same name?

Any failing test with that name will get retried. The only implication is if the name gets changed, and if the name gets changed, hopefully whoever changed it is also fixing the flaky test.

I could set it up to auto-retry any failed test. The only problem with that is local development, but I can set it up to retry on CI only.

@mtrezza You are the main reviewer, so what's easier for you? Do you currently keep track of randomly failing tests, or just rerun every time?


mtrezza commented Jul 18, 2024

If we undertake a targeted effort to fix flaky tests we track them via an issue, like we did in #7180. There is currently no such open issue where we actively track them, but I don't think that's necessary as we have significantly fewer flaky tests today than we had back then - also thanks to your efforts.

Why don't we use test IDs to identify flaky tests? If we see a test is flaky, we assign an ID via it_id and then add it to the flaky list, similar to how the test exclusion list works.


dplewis commented Jul 18, 2024

Why don't we use test IDs to identify flaky tests? If we see a test is flaky, we assign an ID via it_id and then add it to the flaky list, similar to how the #8714 works.

I tried that already. I can't get the ID, as it happens before the test spec is run and there is no way to pass it through.


mtrezza commented Jul 18, 2024

For example, in https://github.com/parse-community/parse-server/actions/runs/9983595053/job/27591447103 I would identify this flaky test:

  1. ParseLiveQuery handle invalid websocket payload length
  • Error: Timeout - Async function did not complete within 200000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL)

So I'll assign ID 4ccc9508-ae6a-46ec-932a-9f5e49ab3b9e to it in #9205.

And I would add that ID to the flaky-list JSON, right?

-- Edit, just read your previous comment; that's too bad. Maybe we can modify the it_id in the future so that the ID becomes accessible. And we use the names for now. I just don't think we will / want / can enforce that names won't change, because of typos, rephrasing, reorganizing, etc, so that's a weak spot we'd accept for now.


dplewis commented Jul 18, 2024

You would add ParseLiveQuery handle invalid websocket payload length to the list. Also, why is that timeout so long? 200000ms?


mtrezza commented Jul 18, 2024

It's idempotency where we simulate a TTL index:

/** Enable TTL expiration simulated by removing entry instead of waiting for MongoDB TTL monitor which
runs only every 60s, so it can take up to 119s until entry removal - ain't nobody got time for that */

Haven't looked at this for a while but I vaguely remember that I've set it to that value on purpose; it has something to do with the comment above.


mtrezza commented Jul 18, 2024

At what point should we turn on the test randomizer again?


dplewis commented Jul 18, 2024

At what point should we turn on the test randomizer again?

Jasmine 5.0.0 released parallel spec execution, which would require the randomizer to be turned on. To upgrade Jasmine we would have to give the test suite some TLC, as we are mixing async and done().
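
For context, newer Jasmine versions fail any spec that both returns a promise and takes a done callback, so hypothetical specs like this one would need cleanup before an upgrade:

// The async/done() mix that blocks a Jasmine upgrade (hypothetical spec).
it('saves an object', async done => {
  const obj = new Parse.Object('TestObject');
  await obj.save();
  done();
});

// Cleaned-up version: rely on the returned promise alone.
it('saves an object', async () => {
  const obj = new Parse.Object('TestObject');
  await obj.save();
});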

mtrezza previously approved these changes Jul 18, 2024

@mtrezza mtrezza left a comment


Looks good


mtrezza commented Jul 18, 2024

But we had the randomizer turned off for flakiness investigation, right? So if we now fixed a lot of these flaky tests, plus we have this retry logic, shouldn't we turn it back on?


dplewis commented Jul 18, 2024

Yeah we can turn it back on in a separate PR.


mtrezza commented Jul 18, 2024

Isn't turning on the randomizer part of this retry strategy? It would also be interesting to see how this tool behaves with the randomizer turned on.

Before we merge this we should find a process for how to use this tool: what the criteria are for adding a test to the list, and how to deal with flaky tests once they are obscured by this tool. For example, there is flakiness that isn't related to a specific test but is a result of previous tests. So we may end up adding more and more tests to the flaky list without really solving anything.


mtrezza commented Jul 18, 2024

Is there any way of knowing how many retries are required for the flaky tests to pass? Some stats would be interesting to gauge how efficient the approach is.


dplewis commented Jul 19, 2024

@mtrezza It now outputs how many times a flaky test was retried before passing.
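
For illustration, the retry wrapper sketched earlier could surface that stat roughly like this (hypothetical; the PR's actual output format may differ):

// Extend the earlier retry() sketch to log how many attempts a flaky
// test needed before it passed.
function retry(fn, retries = 3) {
  return async function () {
    let lastError;
    for (let attempt = 1; attempt <= retries; attempt++) {
      try {
        const result = await fn.call(this);
        if (attempt > 1) {
          console.log(`Flaky test passed after ${attempt} attempts`);
        }
        return result;
      } catch (e) {
        lastError = e;
      }
    }
    throw lastError;
  };
}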


mtrezza commented Jul 20, 2024

Looks good to me. As I mentioned earlier, I'm still somewhat unsure how this tool is intended to be used. A flaky test may well indicate a bug and not just be a test issue. So we'd need to define when to add a test to the list.

Adding a test to the list for retrying only makes sense if we work on fixing them once they are added, otherwise we are effectively weakening our CI quality. The annoyance of a failing CI is what drove fixing flaky tests in the past. This tool creates the convenience of hiding the flakiness, but that may undermine the drive to fix it.

So roughly, the rules could be:

  • A test is flaky if it sometimes fails on our (pre-)release branches, not in a PR branch, as the flakiness may be a result of the changes in the PR that is still under development.
  • After adding a test to the flaky list, a GH issue needs to be opened to fix the test so that it gets removed from the list as soon as possible.


mtrezza commented Jul 21, 2024

@dplewis how should we treat this test issue?

https://github.com/parse-community/parse-server/actions/runs/10028582673/job/27715676644#step:8:1

It seems that after LiveQuery was shut down, the subsequent tests all failed. This looks like an issue with the test logic, so we wouldn't add anything to the flaky list, right?

@mtrezza mtrezza merged commit 9fd7070 into parse-community:alpha Aug 11, 2024
27 checks passed
@parseplatformorg
Contributor

🎉 This change has been released in version 7.3.0-alpha.7

@parseplatformorg added the state:released-alpha (Released as alpha version) label on Aug 27, 2024