Skip to content

Conversation

@michal-shalev
Copy link
Contributor

@michal-shalev michal-shalev commented Oct 7, 2025

What?

Fix wireup issue in test_ucp_device initialization.

Why?

CI is failing on #10921 which adds error logging to device APIs. The test's init() function was waiting for endpoint wireup by calling ucp_device_mem_list_create() with NULL params in a loop. This caused multiple error logs ("ep didn't complete wireup" and "check parameters failed") to be printed during normal test initialization. Without error logging these were silent, but PR #10921's improved logging now exposes these errors, causing test failures.

How?

Used flush_ep() to ensure endpoint wireup is fully complete before tests create device memory lists.

@michal-shalev michal-shalev self-assigned this Oct 7, 2025
while (ucp_device_mem_list_create(sender().ep(), NULL, &handle) ==
UCS_ERR_NOT_CONNECTED) {
/* Wait for endpoint to complete wireup by checking the REMOTE_CONNECTED flag */
while (!(sender().ep()->flags & UCP_EP_FLAG_REMOTE_CONNECTED)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if logging prevents using invalid mem list create call, maybe we can try to create a valid one/destroy, to avoid using internal flags. also that is the call being needed in the tests so it could make sense to check if it works in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used flush_ep() instead

@michal-shalev michal-shalev requested a review from tvegas1 October 7, 2025 13:40
@michal-shalev michal-shalev force-pushed the fix-test-ucp-device-wireup-sync branch from 6d229a2 to 191803f Compare October 7, 2025 13:44
progress();
}
/* Flush to ensure endpoint wireup is complete before device API usage */
flush_ep(sender());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think flush can return ok even when connection has not happened, so if we use flush, we should probably send first a dummy request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants