Reduce TestInitDatabaseService flakiness #49477

gabrielcorado

Related to #42963.

The flakiness is likely caused by the test cleanup (which removes the temp dir) running before the Teleport process has fully shut down.

This PR updates the test to call WaitForSignals and wait for it to return before letting the test clean up. This ensures the Teleport process is closed in both cases (with and without errors). (WaitForSignals is the same function used when the Teleport process is started from the CLI.)

In addition, the assertion about the server shutdown has been moved to the test body function (instead of the t.Cleanup function).
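
For readers skimming the PR, the ordering described above can be sketched with the standard library alone. This is a rough illustration, not the test's actual code: `fakeProcess` and `runThenCleanUp` are hypothetical names, and a plain channel stands in for the errgroup and WaitForSignals used in the real test. The point is only that the directory is removed after the background goroutine has returned.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// fakeProcess is a hypothetical stand-in for a long-running process that
// keeps writing into dataDir until its context is cancelled.
func fakeProcess(ctx context.Context, dataDir string) error {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			path := filepath.Join(dataDir, "state")
			if err := os.WriteFile(path, []byte("x"), 0o600); err != nil {
				return err
			}
		}
	}
}

// runThenCleanUp starts the process, asks it to stop, waits for it to return,
// and only then removes the data directory.
func runThenCleanUp() error {
	dir, err := os.MkdirTemp("", "sketch-")
	if err != nil {
		return err
	}

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan error, 1)
	go func() { done <- fakeProcess(ctx, dir) }()

	time.Sleep(5 * time.Millisecond) // let the process do some work
	cancel()

	// Wait for the goroutine to finish BEFORE touching the directory.
	// Removing the directory first would race with the writes above.
	procErr := <-done
	rmErr := os.RemoveAll(dir)
	return errors.Join(procErr, rmErr)
}

func main() {
	fmt.Println("result:", runThenCleanUp())
}
```

In a real test the same ordering can come from cleanup registration order, since functions registered with t.Cleanup run last-added first.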

@gabrielcorado gabrielcorado added the no-changelog Indicates that a PR does not require a changelog entry label Nov 26, 2024

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-49477.d3pp5qlev8mo18.amplifyapp.com


@codingllama codingllama left a comment

Thanks for the flaky fix!

@@ -1842,6 +1847,9 @@ func TestInitDatabaseService(t *testing.T) {
exitPayload, ok := event.Payload.(ExitEventPayload)
require.True(t, ok, "expected ExitEventPayload but got %T", event.Payload)
require.Equal(t, "db.init", exitPayload.Service.Name())
// Database service init is a critical service, meaning failures on
// it should cause the process to exit with error.
require.Error(t, eg.Wait())

If the requires above fail, what will make the process stop? Would this line block for up to 20s in that case?

(If you take my comment about the t.Cleanup I think this gets solved too.)


if !test.expectErr {
_, err := process.WaitForEvent(ctx, TeleportReadyEvent)
require.NoError(t, err)
require.NoError(t, process.Close())
// Expect Teleport to shutdown without reporting any issue.
require.NoError(t, eg.Wait())

Should this be a t.Cleanup so it runs even if the assertions before it fail?

(Same for the Wait in the "error" code branch.)

I.e.:

			t.Cleanup(func() {
				cancel()
				assert.NoError(t, eg.Wait(), "eg.Wait errored")
			})
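
As background for the suggestion above: a failing require.* assertion stops the test goroutine through t.FailNow, which calls runtime.Goexit, so any statement placed after it in the test body is skipped while registered cleanups still run. The sketch below imitates that with the standard library only. `runBody` is a hypothetical name, and a deferred function is used here as a stand-in for t.Cleanup, which is an assumption made to keep the example runnable outside `go test`.

```go
package main

import (
	"fmt"
	"runtime"
)

// runBody simulates a test body running in its own goroutine. When failEarly
// is true, the body exits through runtime.Goexit, as testing.T.FailNow does.
func runBody(failEarly bool) (inlineRan, cleanupRan bool) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Stand-in for t.Cleanup: still runs when the body exits early.
		defer func() { cleanupRan = true }()

		if failEarly {
			runtime.Goexit() // mimics a failed require.* assertion
		}

		// Stand-in for an inline eg.Wait(): skipped on early exit.
		inlineRan = true
	}()
	<-done
	return inlineRan, cleanupRan
}

func main() {
	fmt.Println(runBody(false)) // true true
	fmt.Println(runBody(true))  // false true
}
```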


@gabrielcorado gabrielcorado Nov 26, 2024

I agree that we should always call eg.Wait() to verify the state of the Teleport process, regardless of the assertions' results.

Some brief context on this change: the main reason for moving this away from t.Cleanup was that the assertion on WaitForSignals (which runs on the errgroup) differs depending on the value of test.expectErr. But I guess having two t.Cleanup definitions, one per "test branch", is okay. I'll update the code, thanks!


The Cleanup doesn't need to care about the WaitForSignals, right? Just the cancel/Wait?


Yes. What I mean is that WaitForSignals runs in the errgroup goroutine, so its error is the one reported by eg.Wait.


@codingllama codingllama left a comment

Thanks, Corado!

@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from rosstimothy December 2, 2024 20:11
@rosstimothy

@gabrielcorado is this good to go?

@gabrielcorado gabrielcorado added this pull request to the merge queue Dec 11, 2024
Merged via the queue into master with commit 12f0d12 Dec 11, 2024
41 of 43 checks passed
@gabrielcorado gabrielcorado deleted the gabrielcorado/improve-test-init-database-service-assertions branch December 11, 2024 19:22
@public-teleport-github-review-bot

@gabrielcorado See the table below for backport results.

| Branch | Result |
| --- | --- |
| branch/v16 | Failed |
| branch/v17 | Create PR |

Labels
backport/branch/v16 backport/branch/v17 no-changelog Indicates that a PR does not require a changelog entry size/sm