Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stuck call to Dial when calling Stop on the Initiator #654

Merged

Conversation

abronan
Copy link
Contributor

@abronan abronan commented Jul 15, 2024

This commit fixes an issue when calling Start() and then Stop() on the initiator while the connection is likely to fail and timeout. Sending a SIGTERM and calling Stop() will block since Dial will attempt to connect until it times out and returns on the waitForReconnectInterval call.

We mitigate this problem by using a proxy.ContextDialer and allowing to pass a context with cancellation method to the dialer.DialContext method on handleConnection.

We need to start a routine listening for the stopChan in order to call cancel() explicitly and thus exit the DialContext method.

Note: there are scenarios where cancel() will be called twice, this choice was made in order to avoid a larger refactor of the reconnect logic, but since the call to cancel() is idempotent, this doesn't lead to any adverse effect.

fixes #653

@abronan
Copy link
Contributor Author

abronan commented Jul 15, 2024

I have tested this locally using a program sending a SIGTERM and calling initiator.Stop() on a connection that fails, but would love to hear about an approach to test this properly within the repository. There didn't seem to be existing tests for initiator.go (unless I'm mistaken). As of now, it doesn't break any of the tests in dialer_test.go from what I could see.

This commit fixes an issue when calling Start() and then
Stop() on the initiator while the connection is likely
to fail and timeout. Calling initiator.Stop() will block since
Dial will attempt to connect until it times out and returns
on the 'waitForReconnectInterval' call.

We mitigate this problem by using a proxy.ContextDialer and
allowing to pass a context with cancellation method to the
dialer.DialContext method on 'handleConnection'.

We need to start a routine listening for the stopChan in
order to call cancel() explicitly and thus exit the DialContext
method.

Note: there are scenarios where cancel() will be called twice,
this choice was made in order to avoid a larger refactor of the
reconnect logic, but since the call to cancel() is idempotent,
this doesn't lead to any adverse effect.

fixes quickfixgo#653

Signed-off-by: Alexandre Beslic <ab21c8b9f7@abronan.com>
@abronan abronan force-pushed the mitigate-stuck-dial-on-initiator-stop branch from 5121af8 to 3939268 Compare July 18, 2024 19:07
@ackleymi ackleymi merged commit e92fa68 into quickfixgo:main Aug 9, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calling Stop() on Initiator blocks on Dial method
2 participants