-
Notifications
You must be signed in to change notification settings - Fork 42
csiaddonsnode: Add retry with exponential backoff for connections #924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
csiaddonsnode: Add retry with exponential backoff for connections #924
Conversation
b007a75 to
b5acabf
Compare
|
|
||
| const ( | ||
| CSIAddonsNodeConnectionMaxRetries = 3 | ||
| CSIAddonsNodeConnectionSleepInterval = 2 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a little short, maybe? If there is a temporary network issue, the object may get deleted too early?
Consider doing an exponential backoff like other reties in Kubernetes do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| var connErr error | ||
| for i := range CSIAddonsNodeConnectionMaxRetries { | ||
| logger.Info("Connecting to sidecar", "attempt", i) | ||
| newConn, connErr = connection.NewConnection(ctx, endPoint, nodeID, driverName, csiAddonsNode.Namespace, csiAddonsNode.Name, r.EnableAuth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered doing the retry in the connection itself? That would make the controller a little simpler, it should not need to care about reconnecting too much. Only if there is a particular error returned, the CSIAddonsNode CR could be deleted by the controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds great. Components other than this reconciler would also benefit from the said change. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further consideration, retries inside connection package will be blocking and might starve the work queue.
I implemented the backoff using RequeueAfter and am tracking state in annotation and status.
b5acabf to
714a22d
Compare
This patch adds the functionality to retry for a maximum of `maxRetries` to connect to the sidecar. If the connection attempt is not successful, the object is considered obsolete and is deleted. The retry is tracked inside an annotation(`connRetryAnnotation`) and also reflected in object's status. These transient artifacts are cleaned up once a connection is established. Signed-off-by: Niraj Yadav <niryadav@redhat.com>
714a22d to
31bdf4c
Compare
| backoff := baseRetryDelay * time.Duration(math.Pow(2, float64(currentRetries))) | ||
| logger.Info("Requeuing request for attempting the connection again", "backoff", backoff) | ||
|
|
||
| return ctrl.Result{RequeueAfter: backoff}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1st retry is after 0 seconds? That probably isn't what you want. Should getRetryCountFromAnnotation() really return 0 if the annotation is not set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first retry is baseRetryDelay * 2 ^ valFromAnnotation.
Hence, 3 * 2^0 leads to 3 * 1, as intended.
|
What is the process to get a deleted CSIAddonsNode back in case of a longer network interruption? Can that be automated too? |
This patch adds the functionality to retry for a maximum of
maxRetriesto connect to the sidecar.If the connection attempt is not successful, the object is considered
obsolete and is deleted.
The retry is tracked inside an annotation(
connRetryAnnotation) and alsoreflected in object's status.
These transient artifacts are cleaned up once a connection is
established.