
Config parameter 'coordinator_not_ready_retry_timeout_ms' #1209

Open
wants to merge 7 commits into master

Conversation
zembunia
@zembunia commented Sep 11, 2017

Rebased all former changes and resolved conflicts


@dpkp
Owner

dpkp commented Sep 16, 2017

I appreciate the motivation here, but I think a better solution would be to pass a timeout parameter to ensure_coordinator_known. That is the approach taken in the Java client (see KAFKA-4426).

@zembunia
Author

zembunia commented Oct 2, 2017

Yes, I can add a timeout parameter to ensure_coordinator_known and call it with the desired timeout value. None can be the default so as not to change the existing behaviour, and the desired timeout value can be taken from the configuration.
Isn't that what I have implemented? I implemented this to handle only NodeNotReadyError, because that is the only error I could reproduce. That is why I named the timeout node_not_ready_retry_timeout_ms.
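For reference, a minimal, self-contained sketch of what such an optional timeout could look like. This is not the actual kafka-python implementation; the function signature and the error name below are only illustrative, with a None default preserving today's retry-forever behaviour:

```python
import time


class CoordinatorNotReadyError(Exception):
    """Illustrative error raised when no coordinator appears in time."""


def ensure_coordinator_known(find_coordinator, timeout_ms=None,
                             retry_backoff_ms=100):
    """Retry find_coordinator() until it succeeds.

    With timeout_ms=None (the default) this retries forever, matching
    the existing behaviour; with a number it raises
    CoordinatorNotReadyError once the deadline passes.
    """
    deadline = (None if timeout_ms is None
                else time.monotonic() + timeout_ms / 1000.0)
    while not find_coordinator():
        if deadline is not None and time.monotonic() >= deadline:
            raise CoordinatorNotReadyError(
                'no coordinator found within %s ms' % timeout_ms)
        time.sleep(retry_backoff_ms / 1000.0)
```

Callers that pass no timeout see no behavioural change; only callers that opt in via the configuration get the new error.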

@zembunia changed the title Config parameter 'node_not_ready_retry_timeout_ms' Config parameter 'coordinator_not_ready_retry_timeout_ms' Mar 29, 2018
@zembunia
Author

Can this be merged now?

@zembunia
Author

zembunia commented Apr 3, 2018

@dpkp I have checked the changes made for KAFKA-4426, but I think the problem I mentioned here does not fit that situation. KAFKA-4426 handles closing the KafkaConsumer in different scenarios, including an unavailable coordinator. In that case the consumer is polling without knowing whether the coordinator is available, and the host code using the KafkaConsumer may decide to close the KafkaConsumer if it gets notified that the coordinator is unavailable.
IMO Kafka is designed for resiliency on the assumption that there will always be a coordinator, so the current implementation is correct for most cases. But in my case I want to handle the situation where the only coordinator is unavailable, and let the hosting code deal with it once it becomes aware of that. Since the timeout passed to poll is not carried through to the coordinator poll, it polls infinitely, i.e. until a coordinator is available (in my case no coordinator will ever become available on the same connection string).
This is kind of a design issue, which is why I added another configuration parameter. I don't want to change the resiliency behaviour, but I want to add an option to get notified about an unavailable coordinator.
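To make the point concrete, here is a toy model of the proposed plumbing. The config key name follows this PR's proposal; the class and everything else below are hypothetical and not kafka-python code:

```python
import time


class CoordinatorNotReadyError(Exception):
    """Illustrative error surfaced to the hosting code."""


class ConsumerSketch:
    """Toy consumer showing how a configured timeout could bound the
    otherwise-infinite wait for a group coordinator."""

    def __init__(self, coordinator_available, **config):
        self._coordinator_available = coordinator_available
        # None keeps the current behaviour: wait forever.
        self._timeout_ms = config.get(
            'coordinator_not_ready_retry_timeout_ms')

    def poll(self):
        deadline = (None if self._timeout_ms is None
                    else time.monotonic() + self._timeout_ms / 1000.0)
        while not self._coordinator_available():
            if deadline is not None and time.monotonic() >= deadline:
                # The hosting code can catch this and decide to close
                # the consumer instead of blocking indefinitely.
                raise CoordinatorNotReadyError()
            time.sleep(0.001)
        return {}  # no records in this sketch
```

The hosting code would wrap poll() in a try/except and choose whether to retry, reconnect with a different connection string, or shut down.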

I faced the same problem in the Java client. But the Java client differs from the Python client in ConsumerCoordinator.poll: if the partitions are manually assigned, the Java client skips the coordinator readiness check.
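A simplified sketch of that gate as described above (the function below is illustrative, not actual client code from either project): only group-managed (subscribed) consumers block on coordinator readiness, while manually assigned consumers bypass the check.

```python
def coordinator_poll(manually_assigned, ensure_coordinator_ready):
    """Mimic the described Java-client behaviour in
    ConsumerCoordinator.poll (simplified, hypothetical)."""
    if manually_assigned:
        # assign()-style consumers: no group membership, no readiness check
        return 'skipped'
    # subscribe()-style consumers: block until the coordinator is known
    ensure_coordinator_ready()
    return 'checked'
```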

@micwoj92
Contributor

This branch has conflicts that must be resolved

@dpkp force-pushed the master branch 2 times, most recently from 9c8c8af to 8ebb14c on February 12, 2025