
retry resolving on errors #43

Merged
fho merged 2 commits into main from retries on Oct 21, 2024
Conversation

fho (Collaborator) commented on Oct 18, 2024

prevent calling ReportError with the same error in a row

The resolver already prevents UpdateState from being called with the same
addresses multiple times in a row; do the same when reporting errors.
Only call ReportError() if the same error was not already reported in the
previous call and UpdateState was not the last call.

This prevents unnecessarily reporting the same state multiple times and
triggering the same events for the connection.

The code is refactored to store the last reported state in the
consulResolver struct, and reporting addresses and errors is moved to
separate methods.
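
For illustration, here is a minimal Go sketch of that de-duplication logic; the field and method names (lastReportedError, lastReportWasAddrs, reportError, reportAddresses) are assumptions for this sketch and not the identifiers used in the actual diff:

```go
package consul

import "google.golang.org/grpc/resolver"

type consulResolver struct {
	cc resolver.ClientConn

	// lastReportedError is the error passed to the most recent
	// ReportError() call, or nil if no error was reported yet.
	lastReportedError error
	// lastReportWasAddrs is true if the most recent report to the
	// ClientConn was a successful UpdateState() call.
	lastReportWasAddrs bool
}

// reportError forwards err to the ClientConn, unless the same error was
// already reported in the previous call or UpdateState was called last.
func (c *consulResolver) reportError(err error) {
	if c.lastReportWasAddrs {
		return
	}
	if c.lastReportedError != nil && c.lastReportedError.Error() == err.Error() {
		return
	}

	c.lastReportedError = err
	c.cc.ReportError(err)
}

// reportAddresses forwards a new address list to the ClientConn and resets
// the recorded error state. (The existing de-duplication of identical
// address lists is omitted here for brevity.)
func (c *consulResolver) reportAddresses(addrs []resolver.Address) {
	c.lastReportedError = nil
	c.lastReportWasAddrs = true
	// The error returned by UpdateState is ignored in this sketch.
	_ = c.cc.UpdateState(resolver.State{Addresses: addrs})
}
```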


retry resolving on errors

Some grpc-go load-balancing policies, including round_robin, do not take
care of retrying name resolution after an error is reported[^1].

As a result, when querying Consul fails, the gRPC connection can hang in a
state without addresses until the connection idle timeout expires, even
after Consul becomes reachable again.

To prevent this, querying Consul is retried periodically when it fails.
This is done by creating a timer that calls ResolveNow() when it expires.
The backoff implementation from commit f1f1a15 is restored and modified to
randomly add or subtract the jitter from the delay, use smaller pauses,
and return the default implementation from a constructor instead of
defining it as a package variable.
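
For illustration, a minimal Go sketch of such a jittered backoff follows; the function name, base delay, cap, and jitter factor are assumptions for this sketch and not taken from the PR's diff:

```go
package consul

import (
	"math/rand"
	"time"
)

// retryBackoff returns the pause before retry number "attempt" of a failed
// Consul query: an exponentially growing, capped delay, with a jitter that
// is randomly added to or subtracted from it.
func retryBackoff(attempt int) time.Duration {
	const (
		baseDelay = 100 * time.Millisecond
		maxDelay  = 10 * time.Second
	)

	d := baseDelay << uint(attempt)
	if d <= 0 || d > maxDelay { // d <= 0 guards against shift overflow
		d = maxDelay
	}

	// Jitter of up to 25% of the delay, added or subtracted at random.
	jitter := time.Duration(rand.Int63n(int64(d)/4 + 1))
	if rand.Intn(2) == 0 {
		return d - jitter
	}
	return d + jitter
}
```

After a failed Consul query, the resolver would then arm a timer along the lines of `time.AfterFunc(retryBackoff(attempt), func() { r.ResolveNow(resolver.ResolveNowOptions{}) })`, so that resolution is retried even when the load-balancing policy never calls ResolveNow itself.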


Closes: #41

Footnotes

[^1]: https://github.com/grpc/grpc-go/issues/7729

@fho self-assigned this on Oct 18, 2024
@fho changed the title consul/backoff.go: retry resolving on errors on Oct 18, 2024
consul/resolver.go: review thread (outdated, resolved)
@fho marked this pull request as ready for review on October 18, 2024 16:19
@fho requested a review from FranciscoKurpiel on October 18, 2024 16:19
@fho marked this pull request as draft on October 21, 2024 08:26
@fho force-pushed the retries branch 3 times, most recently from f02af87 to 3681fbc on October 21, 2024 11:18
fho added 2 commits on October 21, 2024: "retry resolving on errors" and "prevent calling ReportError with the same error in a row".
@fho marked this pull request as ready for review on October 21, 2024 11:22
@fho merged commit 07f509f into main on Oct 21, 2024
5 checks passed
@fho deleted the retries branch on October 21, 2024 12:44
Successfully merging this pull request may close these issues.

implement retries on resolver errors