Skip to content

[Improvement-18034][alert] Add configurable timeout for script alert plugin#18050

Open
asadjan4611 wants to merge 2 commits intoapache:devfrom
asadjan4611:feature-18034-script-alert-timeout
Open

[Improvement-18034][alert] Add configurable timeout for script alert plugin#18050
asadjan4611 wants to merge 2 commits intoapache:devfrom
asadjan4611:feature-18034-script-alert-timeout

Conversation

@asadjan4611
Copy link
Contributor

Purpose of the pull request

Close: #18034

The Script Alert Plugin executes external scripts to send alerts. Currently, ProcessUtils.executeScript() calls process.waitFor() with no timeout. If a script hangs (network stall, deadlock, etc.), it permanently blocks a thread in the alert sender thread pool, degrading or blocking all alert delivery.

This PR adds a configurable timeout parameter (default: 60 seconds) to the Script Alert Plugin following the same pattern used by the HTTP Alert Plugin.

Brief change log

  • Add SCRIPT_TIMEOUT, NAME_SCRIPT_TIMEOUT, DEFAULT_SCRIPT_TIMEOUT constants to ScriptParamsConstants
  • Add InputNumberParam timeout parameter to ScriptAlertChannelFactory.params()
  • Replace blocking process.waitFor() with process.waitFor(timeout, TimeUnit.SECONDS) in ProcessUtils
  • Destroy process forcibly and interrupt stream gobblers on timeout
  • Parse timeout config in ScriptSender with fallback to default
  • Handle timeout exit code with clear error message in alert result

Verify this pull request

This change added tests and can be verified as follows:

  • Added ProcessUtilsTest.testExecuteScriptTimeout to verify process timeout and forced destruction (returns exit code -2)
  • Added ScriptSenderTest.testDefaultTimeout to verify default timeout works correctly
  • Added ScriptSenderTest.testCustomTimeout to verify custom timeout value is parsed and applied
  • Updated ScriptAlertChannelFactoryTest.testGetParams to verify new param count (3 → 4)
  • Compilation verified locally with mvn clean compile

@asadjan4611 asadjan4611 requested a review from SbloodyS as a code owner March 11, 2026 09:45
@SbloodyS SbloodyS requested a review from Copilot March 11, 2026 10:05
@SbloodyS SbloodyS added this to the 3.4.2 milestone Mar 11, 2026
@SbloodyS SbloodyS added the improvement make more easy to user or prompt friendly label Mar 11, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a configurable execution timeout to the Script Alert Plugin to prevent hung scripts from permanently blocking alert-sender threads, aligning behavior with the HTTP alert plugin’s timeout pattern.

Changes:

  • Introduce script timeout plugin parameter (default 60s) and expose it via the Script plugin channel factory params.
  • Update script execution to use waitFor(timeout, TimeUnit.SECONDS) and return a dedicated timeout exit code.
  • Add/adjust unit tests for the new timeout parameter and timeout behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ProcessUtils.java Adds timed waiting for script processes and timeout handling.
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptSender.java Parses timeout config and maps timeout exit code to a clearer alert result message.
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptParamsConstants.java Adds timeout-related constants and default value.
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptAlertChannelFactory.java Exposes timeout as an InputNumberParam in the Script plugin UI params.
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/test/java/org/apache/dolphinscheduler/plugin/alert/script/ProcessUtilsTest.java Updates tests to use the new timeout API and adds a timeout test case.
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/test/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptSenderTest.java Adds basic tests for default/custom timeout configuration.
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/test/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptAlertChannelFactoryTest.java Updates expected param count after adding timeout.
Comments suppressed due to low confidence (1)

dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ProcessUtils.java:64

  • The catch block interrupts the current thread for both InterruptedException and IOException. Interrupting the thread on an IOException can cause unrelated higher-level code to behave as if it was interrupted. Split the catch and only call Thread.currentThread().interrupt() when the exception is actually an InterruptedException.
        } catch (IOException | InterruptedException e) {
            log.error("execute alert script error {}", e.getMessage());
            Thread.currentThread().interrupt();
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +51 to +53
timeout = StringUtils.isNotBlank(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT))
? Long.parseLong(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT))
: ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long.parseLong(config.get(NAME_SCRIPT_TIMEOUT)) can throw NumberFormatException if the stored plugin config contains a non-numeric value. To prevent alert sending from failing with an uncaught exception, parse this defensively (e.g., try/catch with fallback to DEFAULT_SCRIPT_TIMEOUT, optionally logging a warning).

Suggested change
timeout = StringUtils.isNotBlank(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT))
? Long.parseLong(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT))
: ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT;
String timeoutConfig = config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT);
if (StringUtils.isNotBlank(timeoutConfig)) {
long parsedTimeout = ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT;
try {
parsedTimeout = Long.parseLong(timeoutConfig);
} catch (NumberFormatException ex) {
log.warn("Invalid script timeout config value: '{}', using default: {}",
timeoutConfig, ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT, ex);
}
timeout = parsedTimeout;
} else {
timeout = ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT;
}

Copilot uses AI. Check for mistakes.
.setValue(ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT)
.addValidate(Validate.newBuilder()
.setType(DataType.NUMBER.getDataType())
.setRequired(false)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new timeout parameter allows any number, including negative values, which can lead to unexpected behavior in ProcessUtils (and may cause immediate timeout). Add a min validation (e.g., setMin(0D) or setMin(1D) depending on whether 0 should disable timeouts) to constrain input in the UI/config.

Suggested change
.setRequired(false)
.setRequired(false)
.setMin(0D)

Copilot uses AI. Check for mistakes.
public void testExecuteScript() {
int code = ProcessUtils.executeScript(cmd);
int code = ProcessUtils.executeScript(60, cmd);
assert code != -1;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses the Java assert keyword, which is disabled by default in Maven/JUnit runs unless -ea is set, so the assertion may never execute. Use JUnit assertions (e.g., Assertions.assertNotEquals(-1, code) or a more specific expected exit code) so the test reliably validates behavior.

Suggested change
assert code != -1;
Assertions.assertNotEquals(-1, code);

Copilot uses AI. Check for mistakes.
Comment on lines +110 to +116
@Test
public void testCustomTimeout() {
scriptConfig.put(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT, "30");
ScriptSender scriptSender = new ScriptSender(scriptConfig);
AlertResult alertResult = scriptSender.sendScriptAlert("test title Kris", "test content");
Assertions.assertTrue(alertResult.isSuccess());
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scriptConfig is a static mutable map and @BeforeEach does not clear it. After testCustomTimeout sets NAME_SCRIPT_TIMEOUT, later tests may unintentionally reuse that value (test order is not guaranteed), making tests flaky. Clear/recreate the map in @BeforeEach or remove the timeout key at the end of this test.

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +116
@Test
public void testDefaultTimeout() {
ScriptSender scriptSender = new ScriptSender(scriptConfig);
AlertResult alertResult = scriptSender.sendScriptAlert("test title Kris", "test content");
Assertions.assertTrue(alertResult.isSuccess());
}

@Test
public void testCustomTimeout() {
scriptConfig.put(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT, "30");
ScriptSender scriptSender = new ScriptSender(scriptConfig);
AlertResult alertResult = scriptSender.sendScriptAlert("test title Kris", "test content");
Assertions.assertTrue(alertResult.isSuccess());
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testDefaultTimeout/testCustomTimeout currently only assert that sending succeeds, but they don’t validate that the configured timeout value is actually parsed/applied. To meaningfully test the new behavior, consider using a script/command that sleeps longer than the configured timeout and assert a timeout failure, or mock/spying ProcessUtils to assert it’s called with the expected timeout.

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +58
boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
if (!finished) {
log.error("script execution timed out after {} seconds, destroying process", timeoutSeconds);
process.destroyForcibly();
inputStreamGobbler.interrupt();
errorStreamGobbler.interrupt();
return -2;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeoutSeconds is documented as "if <= 0 waits indefinitely", but the implementation always calls process.waitFor(timeoutSeconds, TimeUnit.SECONDS). For timeoutSeconds <= 0 this will not wait indefinitely and can immediately return -2. Handle timeoutSeconds <= 0 by calling process.waitFor() (no timeout) and only using timed wait when timeoutSeconds > 0.

Suggested change
boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
if (!finished) {
log.error("script execution timed out after {} seconds, destroying process", timeoutSeconds);
process.destroyForcibly();
inputStreamGobbler.interrupt();
errorStreamGobbler.interrupt();
return -2;
if (timeoutSeconds > 0) {
boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
if (!finished) {
log.error("script execution timed out after {} seconds, destroying process", timeoutSeconds);
process.destroyForcibly();
inputStreamGobbler.interrupt();
errorStreamGobbler.interrupt();
return -2;
}
} else {
// Wait indefinitely when timeoutSeconds <= 0, as documented
process.waitFor();

Copilot uses AI. Check for mistakes.
Comment on lines +56 to +60
inputStreamGobbler.interrupt();
errorStreamGobbler.interrupt();
return -2;
}
return process.exitValue();
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On timeout the code interrupts the StreamGobbler threads, but StreamGobbler blocks on BufferedReader.readLine() and doesn’t check the interrupt flag, so interrupt() alone may not stop these threads promptly. Consider closing the process streams and/or updating the gobblers to exit on interrupt, and then join() them (or mark them as daemon) to avoid leaking threads after a timeout.

Suggested change
inputStreamGobbler.interrupt();
errorStreamGobbler.interrupt();
return -2;
}
return process.exitValue();
try {
process.getInputStream().close();
} catch (IOException ioe) {
log.warn("Failed to close process input stream after timeout", ioe);
}
try {
process.getErrorStream().close();
} catch (IOException ioe) {
log.warn("Failed to close process error stream after timeout", ioe);
}
inputStreamGobbler.interrupt();
errorStreamGobbler.interrupt();
try {
inputStreamGobbler.join(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
}
try {
errorStreamGobbler.join(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
}
return -2;
}
int processExitCode = process.exitValue();
try {
inputStreamGobbler.join(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
}
try {
errorStreamGobbler.join(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
}
return processExitCode;

Copilot uses AI. Check for mistakes.
@asadjan4611
Copy link
Contributor Author

@SbloodyS I have resolved all the copilot issue's please review it again.

@asadjan4611 asadjan4611 force-pushed the feature-18034-script-alert-timeout branch from e902f7e to 124bf56 Compare March 12, 2026 14:07
closeProcessStreams(process);
joinGobbler(inputStreamGobbler);
joinGobbler(errorStreamGobbler);
return -2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use positive integer here.

.build();

return Arrays.asList(scriptUserParam, scriptPathParam, scriptTypeParams);
InputNumberParam scriptTimeoutParam =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is best to tell the user the meaning of the exit code in the document.

? config.get(ScriptParamsConstants.NAME_SCRIPT_USER_PARAMS)
: "";
String timeoutConfig = config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT);
if (StringUtils.isNotBlank(timeoutConfig)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (StringUtils.isNotBlank(timeoutConfig)) {
if (StringUtils.isNotEmpty(timeoutConfig)) {

try {
parsedTimeout = Long.parseLong(timeoutConfig);
} catch (NumberFormatException ex) {
log.warn("Invalid script timeout config value: '{}', using default: {}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return alertResult here.

private String[] cmd = {"/bin/sh", "-c", shellFilPath + " -t 1"};

@Test
@DisabledOnOs(OS.WINDOWS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do this since we've decleared in the document that windows is not supported.

}

@Test
@DisabledOnOs(OS.WINDOWS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend improvement make more easy to user or prompt friendly test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement][alert] Script alert plugin can hang indefinitely (no process timeout) and exhaust alert sender threads

3 participants