Listener: Fix error handling for metadata exchange over sockets #854
base: main
Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>
/build
👋 Hi ovidiusm! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution then trigger the CI to test your changes. 🚀
Maybe move the repeated try { sendCommMessage() } catch () {} into a separate function for reuse, and use it like this:
if (sendCommMessageChecked(client->second, "LOAD" + myID)) {
    NIXL_ERR << "Error..";
    break;
}
What?
The socket metadata exchange helper functions throw exceptions, and some of them are not caught, which crashes the application when a peer closes the connection.
This differs from the ETCD metadata exchange, which handles errors gracefully: they are logged without throwing exceptions to the application.
This PR adds handling code for the thrown exceptions.
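As a rough illustration of the kind of handling described here, the sketch below wraps a throwing send helper so that a failure is logged and reported through a return value instead of propagating to the application. It is not the code from this PR: `sendCommMessage` is a simplified stand-in with an assumed signature, and `trySendCommMessage` is a hypothetical name that returns true on success.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Stand-in for the socket helper (assumed signature, not the real one):
// throws when the peer is no longer reachable.
void sendCommMessage(int fd, const std::string &msg) {
    if (fd < 0) {
        throw std::runtime_error("peer closed connection");
    }
    (void)msg; // a real helper would write msg to the socket here
}

// Hypothetical wrapper: converts the exception into a logged error.
// Returns true if the message was sent, false otherwise.
bool trySendCommMessage(int fd, const std::string &msg) {
    try {
        sendCommMessage(fd, msg);
        return true;
    } catch (const std::exception &e) {
        std::cerr << "sendCommMessage failed: " << e.what() << '\n';
        return false;
    }
}
```

With a wrapper like this, a caller can test the result, log the error, and leave its loop rather than letting the exception terminate the process.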
Why?
A crash was seen in CI in a flaky test: