Abort pending deletion on IndicesService stop #123569

ywangd · 2025-02-27T07:19:00Z

When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion gets stuck for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retrying after IndicesService is stopped.

Resolves: #121717
Resolves: #121716
Resolves: #122119


elasticsearchmachine · 2025-02-27T07:19:24Z

Hi @ywangd, I've created a changelog YAML for you.

elasticsearchmachine · 2025-02-27T07:19:24Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

DaveCTurner

Ah no I think we need a little more than this ideally, we need to abort the Thread.sleep(sleepTime); promptly too cos that could be many seconds of waiting. I'd suggest making it a timed wait on a CountDownLatch(1) instead of a bare Thread.sleep().

server/src/main/java/org/elasticsearch/indices/IndicesService.java

ywangd · 2025-02-27T09:41:02Z

Thanks for the review, David. I pushed 2228f89 based on your suggestion.

DaveCTurner

Looks good, a couple of comments on logging only.

DaveCTurner · 2025-02-27T10:06:12Z

server/src/main/java/org/elasticsearch/indices/IndicesService.java

@@ -1436,11 +1438,13 @@ public void processPendingDeletes(Index index, IndexSettings indexSettings, Time
                    }
                    if (remove.isEmpty() == false) {
                        logger.warn("{} still pending deletes present for shards {} - retrying", index, remove.toString());
-                        Thread.sleep(sleepTime);
+                        if (stopLatch.await(sleepTime, TimeUnit.MILLISECONDS)) {
+                            break;


Maybe log here too?

Added logging in 71146a2

DaveCTurner · 2025-02-27T10:07:11Z

server/src/main/java/org/elasticsearch/indices/IndicesService.java

                        sleepTime = Math.min(maxSleepTimeMs, sleepTime * 2); // increase the sleep time gradually
                        logger.debug("{} schedule pending delete retry after {} ms", index, sleepTime);
                    }
-                } while ((System.nanoTime() - startTimeNS) < timeout.nanos());
+                } while ((System.nanoTime() - startTimeNS) < timeout.nanos() && lifecycle.started());


Do we need this change now? Not really massively important, it just means that the logging is a little incorrect if we're stopped exactly between the stopLatch.await() timing out and getting to this check.

Yes with logging this becomes awkward. I deleted it in 71146a2
It's not really important either way.
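
For illustration, the latch-based wait discussed in this thread can be sketched outside Elasticsearch roughly as follows. This is a minimal standalone sketch, not the code in IndicesService: the class AbortableRetry, its method names, and its parameters are hypothetical. Only the behaviour of CountDownLatch.await(timeout, unit), which returns true when the count reaches zero before the timeout, is standard Java.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Elasticsearch: a retry loop whose backoff
// wait can be cut short by a stop signal instead of a bare Thread.sleep().
final class AbortableRetry {
    // Counted down exactly once when the owner is stopped.
    private final CountDownLatch stopLatch = new CountDownLatch(1);

    // Signals stop; any thread waiting in retryUntilDone wakes up promptly.
    void stop() {
        stopLatch.countDown();
    }

    // Retries the attempt with exponential backoff until it succeeds, the
    // timeout elapses, or stop() is called. Returns true only on success.
    boolean retryUntilDone(BooleanSupplier attempt, long initialSleepMs, long maxSleepMs, long timeoutMs) {
        final long startNs = System.nanoTime();
        final long timeoutNs = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        long sleepMs = initialSleepMs;
        do {
            if (attempt.getAsBoolean()) {
                return true;
            }
            try {
                // await() returns true if stop() was called before the wait timed out.
                if (stopLatch.await(sleepMs, TimeUnit.MILLISECONDS)) {
                    return false; // stopped: give up without further retries
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
            sleepMs = Math.min(maxSleepMs, sleepMs * 2); // increase the wait gradually
        } while (System.nanoTime() - startNs < timeoutNs);
        return false;
    }
}
```

With this shape, a caller that invokes stop() while another thread is in the backoff wait sees that thread return within roughly the latch wake-up latency rather than after the full sleep interval.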

DaveCTurner

LGTM

elasticsearchmachine · 2025-02-27T12:45:28Z

💔 Backport failed

Status	Branch	Result
❌	8.18	Commit could not be cherrypicked due to conflicts
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 123569

ywangd · 2025-02-28T00:59:44Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x
✅	8.18

Questions?

Please refer to the Backport tool documentation

When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion gets stuck for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retrying after IndicesService is stopped.

Resolves: elastic#121717
Resolves: elastic#121716
Resolves: elastic#122119

(cherry picked from commit c7e7dbe)

# Conflicts:
#	muted-tests.yml


ywangd added the >enhancement, :Distributed Indexing/Store, auto-backport, v8.18.1, v8.19.0, and v9.1.0 labels Feb 27, 2025

ywangd requested a review from DaveCTurner February 27, 2025 07:19

elasticsearchmachine added the Team:Distributed Indexing label Feb 27, 2025

Update docs/changelog/123569.yaml

a1900c4

This was referenced Feb 27, 2025

[CI] DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex failing #121717

Closed

Revert changes to DistributedArchitectureGuide.md #123565

Merged

unmute

aea3695

DaveCTurner reviewed Feb 27, 2025


server/src/main/java/org/elasticsearch/indices/IndicesService.java (outdated)

review comment

2228f89

ywangd requested a review from DaveCTurner February 27, 2025 09:41

DaveCTurner reviewed Feb 27, 2025


ywangd added 3 commits February 27, 2025 21:17

Merge remote-tracking branch 'origin/main' into es-121717-fix

e108381

logging

71146a2

Merge remote-tracking branch 'origin/main' into es-121717-fix

1e79bb8

ywangd requested a review from DaveCTurner February 27, 2025 10:32

DaveCTurner approved these changes Feb 27, 2025


ywangd added the auto-merge-without-approval label Feb 27, 2025

ywangd changed the title ~~Abort pending deletion on IndicesService close~~ Abort pending deletion on IndicesService stop Feb 27, 2025

elasticsearchmachine merged commit c7e7dbe into elastic:main Feb 27, 2025
17 checks passed

ywangd deleted the es-121717-fix branch February 27, 2025 12:44

elasticsearchmachine added the backport pending label Feb 27, 2025

This was referenced Feb 28, 2025

[8.x] Abort pending deletion on IndicesService stop (#123569) #123668

Merged

[8.18] Abort pending deletion on IndicesService stop (#123569) #123669

Merged
