
Abort pending deletion on IndicesService stop #123569

Merged
7 commits merged on Feb 27, 2025

ywangd
Member

@ywangd ywangd commented Feb 27, 2025

When IndicesService is closed, pending deletions may still be in progress for indices that were removed before IndicesService was closed. If a deletion gets stuck for some reason, it can stall the node shutdown. This PR aborts pending deletions more promptly by not retrying after IndicesService is stopped.

Resolves: #121717
Resolves: #121716
Resolves: #122119

@ywangd ywangd added the >enhancement, :Distributed Indexing/Store, auto-backport, v8.18.1, v8.19.0, v9.1.0 labels Feb 27, 2025
@ywangd ywangd requested a review from DaveCTurner February 27, 2025 07:19
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing label Feb 27, 2025
@elasticsearchmachine
Collaborator

Hi @ywangd, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

Contributor

@DaveCTurner DaveCTurner left a comment

Ah no, I think we need a little more than this. Ideally we need to abort the Thread.sleep(sleepTime); promptly too, because that could be many seconds of waiting. I'd suggest making it a timed wait on a CountDownLatch(1) instead of a bare Thread.sleep().
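To illustrate the suggestion, here is a minimal sketch (not the actual IndicesService code) of replacing a bare sleep with a timed wait on a latch. The class name `LatchSleepSketch`, the method `sleepUnlessStopped`, and the field name `stopLatch` are hypothetical and chosen only for this example. The key property is that `CountDownLatch.await(timeout, unit)` returns `true` as soon as the latch reaches zero, so a waiter wakes up promptly when the service stops rather than waiting out the whole interval.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSleepSketch {
    // Hypothetical latch, counted down exactly once when the owning service stops.
    private final CountDownLatch stopLatch = new CountDownLatch(1);

    /** Signals every current and future waiter to return immediately. */
    public void stop() {
        stopLatch.countDown();
    }

    /**
     * Waits up to sleepTimeMs milliseconds.
     * Returns true if stop() was called before or during the wait,
     * false if the full wait elapsed without a stop.
     */
    public boolean sleepUnlessStopped(long sleepTimeMs) throws InterruptedException {
        return stopLatch.await(sleepTimeMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        LatchSleepSketch sketch = new LatchSleepSketch();

        // Stop the "service" from another thread shortly after the wait begins.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sketch.stop();
        });
        stopper.start();

        long startNs = System.nanoTime();
        boolean stopped = sketch.sleepUnlessStopped(10_000); // would be 10 s with Thread.sleep
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
        System.out.println("stopped=" + stopped + ", waited about " + elapsedMs + " ms");
        stopper.join();
    }
}
```

With `Thread.sleep(10_000)` the waiter would block for the full ten seconds unless interrupted; with the latch it returns shortly after `stop()` is called.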

@ywangd
Member Author

ywangd commented Feb 27, 2025

Thanks for the review, David. I pushed 2228f89 based on your suggestion.

@ywangd ywangd requested a review from DaveCTurner February 27, 2025 09:41
Contributor

@DaveCTurner DaveCTurner left a comment

Looks good, a couple of comments on logging only.

@@ -1436,11 +1438,13 @@ public void processPendingDeletes(Index index, IndexSettings indexSettings, Time
                 }
                 if (remove.isEmpty() == false) {
                     logger.warn("{} still pending deletes present for shards {} - retrying", index, remove.toString());
-                    Thread.sleep(sleepTime);
+                    if (stopLatch.await(sleepTime, TimeUnit.MILLISECONDS)) {
+                        break;
Contributor

Maybe log here too?

Member Author

Added logging in 71146a2

                     sleepTime = Math.min(maxSleepTimeMs, sleepTime * 2); // increase the sleep time gradually
                     logger.debug("{} schedule pending delete retry after {} ms", index, sleepTime);
                 }
-            } while ((System.nanoTime() - startTimeNS) < timeout.nanos());
+            } while ((System.nanoTime() - startTimeNS) < timeout.nanos() && lifecycle.started());
Contributor

Do we need this change now? Not really massively important, it just means that the logging is a little incorrect if we're stopped exactly between the stopLatch.await() timing out and getting to this check.

Member Author

Yes, with logging this becomes awkward. I deleted it in 71146a2.
It's not really important either way.
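The shape of the loop discussed in this thread can be sketched as follows. This is an illustrative, self-contained approximation rather than the merged Elasticsearch code: the class `PendingDeleteRetrySketch`, the `Outcome` enum, and the `retryUntilDone` method with its parameters are all hypothetical. It shows a retry loop with doubling back-off where the stop signal is observed only through the latch, which logs and aborts on wake-up, so the loop condition needs no separate lifecycle check.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class PendingDeleteRetrySketch {
    public enum Outcome { COMPLETED, ABORTED, TIMED_OUT }

    // Hypothetical latch, counted down once when the service stops.
    private final CountDownLatch stopLatch = new CountDownLatch(1);

    public void stop() {
        stopLatch.countDown();
    }

    /**
     * Calls attempt until it returns true, the timeout elapses, or stop() is called.
     * The wait between attempts doubles each time, capped at maxSleepMs.
     */
    public Outcome retryUntilDone(BooleanSupplier attempt, long timeoutMs, long initialSleepMs, long maxSleepMs)
        throws InterruptedException {
        final long startNs = System.nanoTime();
        final long timeoutNs = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        long sleepMs = initialSleepMs;
        do {
            if (attempt.getAsBoolean()) {
                return Outcome.COMPLETED;
            }
            // Timed wait instead of Thread.sleep: returns true promptly once stopped.
            if (stopLatch.await(sleepMs, TimeUnit.MILLISECONDS)) {
                System.out.println("service stopped, aborting pending delete retries");
                return Outcome.ABORTED;
            }
            sleepMs = Math.min(maxSleepMs, sleepMs * 2); // increase the wait gradually
        } while (System.nanoTime() - startNs < timeoutNs);
        return Outcome.TIMED_OUT;
    }

    public static void main(String[] args) throws Exception {
        PendingDeleteRetrySketch sketch = new PendingDeleteRetrySketch();
        int[] calls = {0};
        // Succeeds on the third attempt.
        System.out.println(sketch.retryUntilDone(() -> ++calls[0] >= 3, 5_000, 1, 10));
        // Once stopped, a never-succeeding attempt is abandoned without waiting out the back-off.
        sketch.stop();
        System.out.println(sketch.retryUntilDone(() -> false, 60_000, 10_000, 10_000));
    }
}
```

Because the abort path returns directly from the latch check, the log message is emitted exactly when the stop is observed, which avoids the slightly misleading logging that a second check in the loop condition could cause.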

@ywangd ywangd requested a review from DaveCTurner February 27, 2025 10:32
Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

@ywangd ywangd added the auto-merge-without-approval label Feb 27, 2025
@ywangd ywangd changed the title from "Abort pending deletion on IndicesService close" to "Abort pending deletion on IndicesService stop" Feb 27, 2025
@elasticsearchmachine elasticsearchmachine merged commit c7e7dbe into elastic:main Feb 27, 2025
17 checks passed
@ywangd ywangd deleted the es-121717-fix branch February 27, 2025 12:44
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 123569

@ywangd
Member Author

ywangd commented Feb 28, 2025

💚 All backports created successfully

Status Branch Result
8.x
8.18


ywangd added a commit to ywangd/elasticsearch that referenced this pull request Feb 28, 2025
When IndicesService is closed, pending deletions may still be in
progress for indices removed before IndicesService was closed. If
a deletion gets stuck for some reason, it can stall the node shutdown.
This PR aborts pending deletions more promptly by not retrying after
IndicesService is stopped.

Resolves: elastic#121717 Resolves: elastic#121716  Resolves: elastic#122119
(cherry picked from commit c7e7dbe)

# Conflicts:
#	muted-tests.yml
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Feb 28, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 28, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 28, 2025
Labels
auto-backport, auto-merge-without-approval, backport pending, :Distributed Indexing/Store, >enhancement, Team:Distributed Indexing, v8.18.1, v8.19.0, v9.1.0