Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 1272][connection] Attempt to avoid deadlock during reconnection #1273

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

Gilthoniel
Copy link
Contributor

@Gilthoniel Gilthoniel commented Aug 28, 2024

Fixes #1272

Motivation

Producers and consumers register themselves to the connection to be notified when it gets closed. Even though the callback push the events in a channel, it can get stuck and the connection pool is locked which prevents any other caller to get a connection.

Modifications

This PR makes sure that the callback never blocks.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is already covered by existing tests.

Does this pull request potentially affect one of the following parts:

none

nodece
nodece previously approved these changes Sep 4, 2024
@nodece
Copy link
Member

nodece commented Sep 4, 2024

We can use a goroutine to call the ConnectionClosed method in the connection pool.

What do you think of that?

@Gilthoniel
Copy link
Contributor Author

It does not really change the issue that multiple reconnects will be attempted relentlessly until all callbacks have been processed. As mentioned in the issue, we can receive several dozens or hundreds of callback. It does not seem right to reconnect hundred of times then.

What is your concern with this fix ?

@nodece
Copy link
Member

nodece commented Sep 4, 2024

If the broker is not ready, the client will retry to connect to the broker. Once the broker is ready, the client can have an available connection to create the producer/consumer, when the producer/consumer is created, and then the connection pool listens to the producer/consumer.

I don't know why you can receive several dozens or hundreds of callback.

If your network environment is unstable? or if the broker is changing its status?

What is your concern with this fix ?

You are dropping the reconnect request.

@Gilthoniel
Copy link
Contributor Author

It happens during a restart of everything so it's very unstable yes, brokers are probably changing their status.

I understand your worry but here it ensures that all reconnects will be handled and it only drops duplicate. Also in normal situation, as you said, you get only one reconnect.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Sep 10, 2024

Great contributions @Gilthoniel! Thank you

@RobertIndie RobertIndie merged commit 98dc8d4 into apache:master Sep 11, 2024
7 checks passed
@RobertIndie RobertIndie added this to the v0.14.0 milestone Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock in connection pool
4 participants