Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(server): Fix flakiness in TestServer_ReloadWorkerTags #3810

Closed
wants to merge 5 commits into from

Conversation

moduli
Copy link
Collaborator

@moduli moduli commented Oct 6, 2023

This PR attempts to address flakiness in TestServer_ReloadWorkerTags by replacing hard-coded sleeps with some smarter polling similarly used in TestServer_ReloadInitialUpstreams. This test would occasionally fail for the following reason...

--- FAIL: TestServer_ReloadWorkerTags (121.34s)
                                               
    worker_tags_reload_test.go:126: 
        	Error Trace:	/home/runner/actions-runner/_work/boundary-enterprise/boundary-enterprise/internal/cmd/commands/server/worker_tags_reload_test.go:108
        	            				/home/runner/actions-runner/_work/boundary-enterprise/boundary-enterprise/internal/cmd/commands/server/worker_tags_reload_test.go:126
        	Error:      	Should be true
        	Test:       	TestServer_ReloadWorkerTags

The test is expecting the controller to get updated worker tags, but sometimes, it's not getting the expected values. The initial theory is that the hard-coded sleeps didn't wait long enough for the changes to propagate to the controller.

This attempts to address some flakiness by replacing hard-coded sleeps with some smarter polling similarly used in TestServer_ReloadInitialUpstreams.
@moduli moduli added the pr/no-milestone Ignores the Milestone Check label Oct 6, 2023
@github-actions github-actions bot added the core label Oct 6, 2023
require.True(ok)
tags := w.CanonicalTags()
v, ok := tags[key]
require.True(ok, fmt.Sprintf("EXPECTED: map[%s:%s], ACTUAL: %+v", key, values, tags))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated logging to get more information

Comment on lines +122 to +149
timeout := time.NewTimer(15 * time.Second)
poll := time.NewTimer(0)
var w *server.Worker
var lastStatusTime time.Time
serversRepo, err := testController.Controller().ServersRepoFn()
require.NoError(err)
pollController:
for {
select {
case <-timeout.C:
t.Fatalf("timeout waiting for worker to connect to controller")
case <-poll.C:
w, err = serversRepo.LookupWorkerByName(testController.Context(), "test")
require.NoError(err)
if w != nil {
switch {
case lastStatusTime.IsZero():
lastStatusTime = w.GetLastStatusTime().AsTime().Round(time.Second)
default:
if !lastStatusTime.Equal(w.GetLastStatusTime().AsTime().Round(time.Second)) {
timeout.Stop()
break pollController
}
}
}
poll.Reset(time.Second)
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from TestServer_ReloadInitialUpstreams

timeout.Reset(15 * time.Second)
poll.Reset(time.Second)
lastStatusTime = time.Time{}
startTime := time.Now()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slightly modified version of the loop. TestServer_ReloadInitialUpstreams is different in that it uses a second controller and can set lastStatusTime to zero. In this case, the idea is to wait for 2 status updates starting from this point in time.

@moduli
Copy link
Collaborator Author

moduli commented Oct 6, 2023

Just got a failure when testing this locally... back to the drawing board...

@moduli
Copy link
Collaborator Author

moduli commented Dec 14, 2023

Looks like there might be actually something in the boundary application causing this. More investigation is needed. Closing this PR.

@moduli moduli closed this Dec 14, 2023
@moduli moduli deleted the moduli-flaky-worktagsreloadtest-ICU-11163 branch February 13, 2024 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core pr/no-milestone Ignores the Milestone Check
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant