
vrozov
Member

@vrozov vrozov commented Sep 17, 2025

What changes were proposed in this pull request?

Implement several code improvements in UninterruptibleThreadSuite:

  • run several iterations of stress test
  • log InterruptedException
  • improve assert error message ("false did not equal true" => "hasInterruptedException was false")
  • use await instead of sleep
  • fail test fast

Why are the changes needed?

Improve test coverage of UninterruptibleThread and make test failures easier to troubleshoot.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran the modified test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Sep 17, 2025
@vrozov
Member Author

vrozov commented Sep 17, 2025

@cloud-fan @Ngone51 Please take a look

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-53622][TEST] Improve UninterruptibleThread test" to "[SPARK-53622][CORE][TEST] Improve UninterruptibleThreadSuite" Sep 18, 2025
}

test("stress test") {
for (i <- 0 until 20) {
Member

Is this the major change that tries to improve the test coverage? Since the iterations are executed in sequence, the stress of interruptions on a single UninterruptibleThread doesn't seem to be increased to me.

Member Author

This change does not aim to increase the concurrency of the stress test. It aims to reproduce SPARK-53394.

}

/* Await latch and return true if it's interrupted */
private def await(latch: CountDownLatch, timeout: Long = 10,
Member

What's the major difference after using this await()?

Member Author

Either sleep() or await() can be used to test for InterruptedException, since both throw it when the thread is interrupted. The difference is that sleep() does not return early, while await() exits as soon as the main thread calls interrupt() and counts down the latch.
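A minimal sketch of the latch-based wait described here, using only `java.util.concurrent`. The object `AwaitVsSleep` and the helpers `awaitInterrupted`, `runWorker`, `interruptedWhileWaiting`, and `releasedByCountDown` are hypothetical names chosen for illustration; this is not the code from the PR and it does not involve UninterruptibleThread itself. It only shows that a thread blocked on a latch can be released either by an interrupt or by a count down, and that the two outcomes are distinguishable.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object AwaitVsSleep {
  // Hypothetical helper: true if the wait ended with an InterruptedException.
  def awaitInterrupted(latch: CountDownLatch, timeoutSeconds: Long = 10): Boolean =
    try {
      latch.await(timeoutSeconds, TimeUnit.SECONDS)
      false
    } catch {
      case _: InterruptedException => true
    }

  // Runs awaitInterrupted on a worker thread, applies `action` to that thread
  // once it has started, and reports what the worker observed.
  private def runWorker(latch: CountDownLatch)(action: Thread => Unit): Boolean = {
    val started = new CountDownLatch(1)
    val sawInterrupt = new AtomicBoolean(false)
    val worker = new Thread(() => {
      started.countDown()
      sawInterrupt.set(awaitInterrupted(latch))
    })
    worker.start()
    started.await()   // make sure the worker is running before acting on it
    action(worker)
    worker.join()     // the worker exits promptly in both scenarios below
    sawInterrupt.get()
  }

  // The worker is interrupted while (or just before) it blocks on the latch.
  def interruptedWhileWaiting(): Boolean = {
    val latch = new CountDownLatch(1)
    runWorker(latch)(_.interrupt())
  }

  // The worker is released by counting the latch down, with no interrupt.
  def releasedByCountDown(): Boolean = {
    val latch = new CountDownLatch(1)
    runWorker(latch)(_ => latch.countDown())
  }
}
```

With a plain `Thread.sleep`, the second scenario would have no way to finish before the sleep duration elapses, which is the practical reason a latch lets a test complete quickly.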

timeUnit: TimeUnit = TimeUnit.SECONDS): Boolean = {
try {
if (!latch.await(timeout, timeUnit)) {
log.error("timeout while waiting for the latch")
Contributor

It's a test, and this log doesn't seem useful since we fail right after it with the same error message.

Member Author

When await is used outside the main test thread, fail() does not fail the test directly; it terminates that thread with a TestFailedException (which is logged to stderr). When this happens, the test fails because other conditions are not met. The log.error() call writes to unit-tests.log, so the error message is not duplicated. Note that in this case, the code is similar to:

log.error("message")
throw new Exception("message")
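To illustrate why an exception thrown off the main thread does not surface in the caller, here is a small sketch using plain JVM threads. `WorkerFailure` and `run` are hypothetical names, and `IllegalStateException` is only a stand-in for ScalaTest's `TestFailedException`; the sketch assumes nothing about how the suite itself is structured.

```scala
import java.util.concurrent.atomic.AtomicReference

object WorkerFailure {
  // Throws on a worker thread and returns (seenByCaller, recordedError).
  def run(): (Boolean, Option[Throwable]) = {
    val recorded = new AtomicReference[Throwable](null)
    val worker = new Thread(() => {
      // Stand-in for fail("...") being called off the main test thread.
      throw new IllegalStateException("timeout while waiting for the latch")
    })
    // Capture the error here; without a handler the JVM's default behaviour
    // is to print the stack trace to stderr.
    worker.setUncaughtExceptionHandler((_: Thread, e: Throwable) => recorded.set(e))
    val seenByCaller =
      try {
        worker.start()
        worker.join() // returns normally even though the worker died
        false
      } catch {
        case _: IllegalStateException => true
      }
    (seenByCaller, Option(recorded.get()))
  }
}
```

The caller never sees the exception; it only learns about the failure if it checks some recorded state afterwards, which is why a separate log entry can still carry information.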
