-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Abort pending deletion on IndicesService stop #123569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion stucks for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retry after IndicesService is closed. Resolves: elastic#121717, elastic#121716, elastic#122119
Hi @ywangd, I've created a changelog YAML for you. |
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah no I think we need a little more than this ideally, we need to abort the Thread.sleep(sleepTime);
promptly too cos that could be many seconds of waiting. I'd suggest making it a timed wait on a CountDownLatch(1)
instead of a bare Thread.sleep()
.
server/src/main/java/org/elasticsearch/indices/IndicesService.java
Outdated
Show resolved
Hide resolved
Thanks for the review, David. I pushed 2228f89 based on your suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a couple of comments on logging only.
@@ -1436,11 +1438,13 @@ public void processPendingDeletes(Index index, IndexSettings indexSettings, Time | |||
} | |||
if (remove.isEmpty() == false) { | |||
logger.warn("{} still pending deletes present for shards {} - retrying", index, remove.toString()); | |||
Thread.sleep(sleepTime); | |||
if (stopLatch.await(sleepTime, TimeUnit.MILLISECONDS)) { | |||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe log here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logging in 71146a2
sleepTime = Math.min(maxSleepTimeMs, sleepTime * 2); // increase the sleep time gradually | ||
logger.debug("{} schedule pending delete retry after {} ms", index, sleepTime); | ||
} | ||
} while ((System.nanoTime() - startTimeNS) < timeout.nanos()); | ||
} while ((System.nanoTime() - startTimeNS) < timeout.nanos() && lifecycle.started()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this change now? Not really massively important, it just means that the logging is a little incorrect if we're stopped exactly between the stopLatch.await()
timing out and getting to this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes with logging this becomes awkward. I deleted it in 71146a2
It's not really important either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion stucks for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retry after IndicesService is stopped. Resolves: elastic#121717 Resolves: elastic#121716 Resolves: elastic#122119 (cherry picked from commit c7e7dbe) # Conflicts: # muted-tests.yml
When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion stucks for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retry after IndicesService is stopped. Resolves: elastic#121717 Resolves: elastic#121716 Resolves: elastic#122119 (cherry picked from commit c7e7dbe) # Conflicts: # muted-tests.yml
When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion stucks for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retry after IndicesService is stopped. Resolves: #121717 Resolves: #121716 Resolves: #122119 (cherry picked from commit c7e7dbe) # Conflicts: # muted-tests.yml
When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion stucks for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retry after IndicesService is stopped. Resolves: #121717 Resolves: #121716 Resolves: #122119 (cherry picked from commit c7e7dbe) # Conflicts: # muted-tests.yml
When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion stucks for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retry after IndicesService is stopped.
Resolves: #121717
Resolves: #121716
Resolves: #122119