[CI] ReactiveStorageIT testScaleWhileShrinking failing #122119

elasticsearchmachine · 2025-02-08T05:22:14Z

Build Scans:

Reproduction Line:

gradlew ":x-pack:plugin:autoscaling:internalClusterTest" --tests "org.elasticsearch.xpack.autoscaling.storage.ReactiveStorageIT.testScaleWhileShrinking" -Dtests.seed=A8072C4149FB3248 -Dtests.locale=lg-UG -Dtests.timezone=America/Boa_Vista -Druntime.java=23

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.IllegalStateException: Some shards are still open after the threadpool terminated. Something is leaking index readers or store references.

Issue Reasons:

[main] 3 failures in test testScaleWhileShrinking (0.4% fail rate in 761 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.


elasticsearchmachine · 2025-02-08T05:22:37Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

nicktindall · 2025-02-11T03:42:35Z

Marking this as medium because it's suggesting we have a resource leak.

nicktindall · 2025-02-11T03:45:40Z

Updated to low; it appears to be a Windows thing.

elasticsearchmachine · 2025-02-11T03:55:41Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

nicktindall · 2025-02-11T03:56:16Z

Assigned to core-infra off the back of conversation on #121716 (comment)

Please feel free to send back if you think this is not the same root cause

nicktindall · 2025-02-11T04:03:55Z

Also includes

1> java.io.IOException: could not remove the following files (in the order of attempts):
  1>    C:\bk\x-pack\plugin\autoscaling\build\testrun\internalClusterTest\temp\org.elasticsearch.xpack.autoscaling.storage.ReactiveStorageIT_73BD10C01CD081CC-001\tempDir-003\node-1\indices\9XlEdfb6TKGpWBFATp0Knw\0\index\_2.cfs: java.nio.file.AccessDeniedException: C:\bk\x-pack\plugin\autoscaling\build\testrun\internalClusterTest\temp\org.elasticsearch.xpack.autoscaling.storage.ReactiveStorageIT_73BD10C01CD081CC-001\tempDir-003\node-1\indices\9XlEdfb6TKGpWBFATp0Knw\0\index\_2.cfs

in the logs

ldematte · 2025-02-11T13:27:03Z

Actually @nicktindall, I'm sending this back: we think the cleanup failure is a red herring. The root cause is that the node can't close because of reference leaks, which in turn causes test cleanup to fail because the node is still running.
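To illustrate the reasoning above, here is a minimal sketch of how a single leaked reference keeps a reference-counted resource open after its owner has released it. The class `RefCountedStoreSketch` and its methods are hypothetical names chosen for illustration; this is not Elasticsearch's store implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted resource; an illustration only. */
final class RefCountedStoreSketch {
    // Starts at 1: the owner holds the initial reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    /** Acquires a reference; fails once the count has already dropped to zero. */
    boolean tryIncRef() {
        while (true) {
            int current = refs.get();
            if (current <= 0) {
                return false; // already fully released, cannot be revived
            }
            if (refs.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Releases a reference; returns true only when this call released the last one. */
    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more references than were acquired");
        }
        return remaining == 0;
    }

    /** The resource counts as open while any reference is outstanding. */
    boolean isOpen() {
        return refs.get() > 0;
    }

    public static void main(String[] args) {
        RefCountedStoreSketch store = new RefCountedStoreSketch();
        store.tryIncRef();                 // some task borrows the store
        store.decRef();                    // the owner releases its reference at shutdown
        System.out.println(store.isOpen()); // still open: the borrowed reference was never released
    }
}
```

In this sketch the resource only becomes closed when the borrower also calls `decRef()`. If that call never happens, a shutdown check that expects everything to be closed would fail, and on Windows the files that are still open could not be deleted.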

nicktindall · 2025-02-13T03:54:32Z

Bumped this one back up to medium risk as it might indicate a resource leak

ywangd · 2025-02-14T07:29:06Z

This is the same issue as #121717. See here for the analysis for the core-infra label.

elasticsearchmachine · 2025-02-15T21:23:43Z

This has been muted on branch main

Mute Reasons:

[main] 3 failures in test testScaleWhileShrinking (0.4% fail rate in 761 executions)

Build Scans:

…stScaleWhileShrinking #122119

ldematte · 2025-02-18T17:26:58Z

See #121717 (comment) for the reason behind the reassignment

elasticsearchmachine · 2025-02-18T17:28:12Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

arteam · 2025-02-20T11:05:55Z

Following the discussion on #121717 about properly shutting down nodes

When IndicesService is closed, a pending deletion may still be in progress because of indices removed before IndicesService was closed. If the deletion gets stuck for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retrying after IndicesService is closed. Resolves: elastic#121717, elastic#121716, elastic#122119

When IndicesService is closed, a pending deletion may still be in progress because of indices removed before IndicesService was closed. If the deletion gets stuck for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retrying after IndicesService is stopped. Resolves: #121717 Resolves: #121716 Resolves: #122119 (cherry picked from commit c7e7dbe) # Conflicts: # muted-tests.yml
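The idea described in the commit message can be sketched roughly as follows: a retry loop that checks a "stopped" flag before each attempt and gives up as soon as the flag is set. This is a simplified illustration under stated assumptions, not the code from the PR; `PendingDeleteSketch`, `processPendingDeletes`, and the `tryDelete` callback are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical retry loop that stops retrying once the owning service is stopped. */
final class PendingDeleteSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    /** Called during shutdown; any in-flight retry loop sees the flag on its next iteration. */
    void stop() {
        stopped.set(true);
    }

    /**
     * Calls tryDelete until it succeeds, the attempt budget runs out,
     * or stop() has been called. Returns the number of attempts made.
     */
    int processPendingDeletes(BooleanSupplier tryDelete, int maxAttempts, long sleepMillis) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            if (stopped.get()) {
                break; // do not retry after the service has been stopped
            }
            attempts++;
            if (tryDelete.getAsBoolean()) {
                break; // deletion succeeded
            }
            try {
                Thread.sleep(sleepMillis); // back off before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                break;
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        PendingDeleteSketch service = new PendingDeleteSketch();
        service.stop();
        // With the flag already set, no deletion attempt is made at all.
        System.out.println(service.processPendingDeletes(() -> false, 10, 1L));
    }
}
```

Without the flag check, a deletion that keeps failing would hold up shutdown until the attempt budget is spent; with it, the loop exits on the first iteration after `stop()`.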

nicktindall added medium-risk An open issue or test failure that is a medium risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Feb 11, 2025

nicktindall added low-risk An open issue or test failure that is a low risk to future releases and removed medium-risk An open issue or test failure that is a medium risk to future releases labels Feb 11, 2025

nicktindall mentioned this issue Feb 11, 2025

[CI] DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex failing #121717

Closed

elasticsearchmachine added Team:Core/Infra Meta label for core/infra team and removed Team:Distributed Coordination Meta label for Distributed Coordination team labels Feb 11, 2025

ldematte added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Feb 11, 2025

elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Feb 11, 2025

nicktindall added medium-risk An open issue or test failure that is a medium risk to future releases and removed low-risk An open issue or test failure that is a low risk to future releases labels Feb 13, 2025

ywangd self-assigned this Feb 14, 2025

ywangd added :Core/Infra/Core Core issues without another label and removed :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Feb 14, 2025

elasticsearchmachine added Team:Core/Infra Meta label for core/infra team and removed Team:Distributed Coordination Meta label for Distributed Coordination team labels Feb 14, 2025

ywangd added low-risk An open issue or test failure that is a low risk to future releases and removed medium-risk An open issue or test failure that is a medium risk to future releases labels Feb 14, 2025

elasticsearchmachine added a commit that referenced this issue Feb 15, 2025

Mute org.elasticsearch.xpack.autoscaling.storage.ReactiveStorageIT te…

a4c645b

…stScaleWhileShrinking #122119

ywangd removed their assignment Feb 16, 2025

ldematte added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Core/Infra/Core Core issues without another label labels Feb 18, 2025

elasticsearchmachine added Team:Distributed Indexing Meta label for Distributed Indexing team and removed Team:Core/Infra Meta label for core/infra team labels Feb 18, 2025

ywangd mentioned this issue Feb 27, 2025

Abort pending deletion on IndicesService stop #123569

Merged

elasticsearchmachine closed this as completed in #123569 Feb 27, 2025

elasticsearchmachine closed this as completed in c7e7dbe Feb 27, 2025
