[Improvement-18034][alert] Add configurable timeout for script alert plugin#18050
[Improvement-18034][alert] Add configurable timeout for script alert plugin#18050asadjan4611 wants to merge 2 commits intoapache:devfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a configurable execution timeout to the Script Alert Plugin to prevent hung scripts from permanently blocking alert-sender threads, aligning behavior with the HTTP alert plugin’s timeout pattern.
Changes:
- Introduce script timeout plugin parameter (default 60s) and expose it via the Script plugin channel factory params.
- Update script execution to use
waitFor(timeout, TimeUnit.SECONDS)and return a dedicated timeout exit code. - Add/adjust unit tests for the new timeout parameter and timeout behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ProcessUtils.java | Adds timed waiting for script processes and timeout handling. |
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptSender.java | Parses timeout config and maps timeout exit code to a clearer alert result message. |
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptParamsConstants.java | Adds timeout-related constants and default value. |
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptAlertChannelFactory.java | Exposes timeout as an InputNumberParam in the Script plugin UI params. |
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/test/java/org/apache/dolphinscheduler/plugin/alert/script/ProcessUtilsTest.java | Updates tests to use the new timeout API and adds a timeout test case. |
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/test/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptSenderTest.java | Adds basic tests for default/custom timeout configuration. |
| dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/test/java/org/apache/dolphinscheduler/plugin/alert/script/ScriptAlertChannelFactoryTest.java | Updates expected param count after adding timeout. |
Comments suppressed due to low confidence (1)
dolphinscheduler-alert/dolphinscheduler-alert-plugins/dolphinscheduler-alert-script/src/main/java/org/apache/dolphinscheduler/plugin/alert/script/ProcessUtils.java:64
- The catch block interrupts the current thread for both
InterruptedExceptionandIOException. Interrupting the thread on anIOExceptioncan cause unrelated higher-level code to behave as if it was interrupted. Split the catch and only callThread.currentThread().interrupt()when the exception is actually anInterruptedException.
} catch (IOException | InterruptedException e) {
log.error("execute alert script error {}", e.getMessage());
Thread.currentThread().interrupt();
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| timeout = StringUtils.isNotBlank(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT)) | ||
| ? Long.parseLong(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT)) | ||
| : ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT; |
There was a problem hiding this comment.
Long.parseLong(config.get(NAME_SCRIPT_TIMEOUT)) can throw NumberFormatException if the stored plugin config contains a non-numeric value. To prevent alert sending from failing with an uncaught exception, parse this defensively (e.g., try/catch with fallback to DEFAULT_SCRIPT_TIMEOUT, optionally logging a warning).
| timeout = StringUtils.isNotBlank(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT)) | |
| ? Long.parseLong(config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT)) | |
| : ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT; | |
| String timeoutConfig = config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT); | |
| if (StringUtils.isNotBlank(timeoutConfig)) { | |
| long parsedTimeout = ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT; | |
| try { | |
| parsedTimeout = Long.parseLong(timeoutConfig); | |
| } catch (NumberFormatException ex) { | |
| log.warn("Invalid script timeout config value: '{}', using default: {}", | |
| timeoutConfig, ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT, ex); | |
| } | |
| timeout = parsedTimeout; | |
| } else { | |
| timeout = ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT; | |
| } |
| .setValue(ScriptParamsConstants.DEFAULT_SCRIPT_TIMEOUT) | ||
| .addValidate(Validate.newBuilder() | ||
| .setType(DataType.NUMBER.getDataType()) | ||
| .setRequired(false) |
There was a problem hiding this comment.
The new timeout parameter allows any number, including negative values, which can lead to unexpected behavior in ProcessUtils (and may cause immediate timeout). Add a min validation (e.g., setMin(0D) or setMin(1D) depending on whether 0 should disable timeouts) to constrain input in the UI/config.
| .setRequired(false) | |
| .setRequired(false) | |
| .setMin(0D) |
| public void testExecuteScript() { | ||
| int code = ProcessUtils.executeScript(cmd); | ||
| int code = ProcessUtils.executeScript(60, cmd); | ||
| assert code != -1; |
There was a problem hiding this comment.
This test uses the Java assert keyword, which is disabled by default in Maven/JUnit runs unless -ea is set, so the assertion may never execute. Use JUnit assertions (e.g., Assertions.assertNotEquals(-1, code) or a more specific expected exit code) so the test reliably validates behavior.
| assert code != -1; | |
| Assertions.assertNotEquals(-1, code); |
| @Test | ||
| public void testCustomTimeout() { | ||
| scriptConfig.put(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT, "30"); | ||
| ScriptSender scriptSender = new ScriptSender(scriptConfig); | ||
| AlertResult alertResult = scriptSender.sendScriptAlert("test title Kris", "test content"); | ||
| Assertions.assertTrue(alertResult.isSuccess()); | ||
| } |
There was a problem hiding this comment.
scriptConfig is a static mutable map and @BeforeEach does not clear it. After testCustomTimeout sets NAME_SCRIPT_TIMEOUT, later tests may unintentionally reuse that value (test order is not guaranteed), making tests flaky. Clear/recreate the map in @BeforeEach or remove the timeout key at the end of this test.
| @Test | ||
| public void testDefaultTimeout() { | ||
| ScriptSender scriptSender = new ScriptSender(scriptConfig); | ||
| AlertResult alertResult = scriptSender.sendScriptAlert("test title Kris", "test content"); | ||
| Assertions.assertTrue(alertResult.isSuccess()); | ||
| } | ||
|
|
||
| @Test | ||
| public void testCustomTimeout() { | ||
| scriptConfig.put(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT, "30"); | ||
| ScriptSender scriptSender = new ScriptSender(scriptConfig); | ||
| AlertResult alertResult = scriptSender.sendScriptAlert("test title Kris", "test content"); | ||
| Assertions.assertTrue(alertResult.isSuccess()); | ||
| } |
There was a problem hiding this comment.
testDefaultTimeout/testCustomTimeout currently only assert that sending succeeds, but they don’t validate that the configured timeout value is actually parsed/applied. To meaningfully test the new behavior, consider using a script/command that sleeps longer than the configured timeout and assert a timeout failure, or mock/spying ProcessUtils to assert it’s called with the expected timeout.
| boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS); | ||
| if (!finished) { | ||
| log.error("script execution timed out after {} seconds, destroying process", timeoutSeconds); | ||
| process.destroyForcibly(); | ||
| inputStreamGobbler.interrupt(); | ||
| errorStreamGobbler.interrupt(); | ||
| return -2; |
There was a problem hiding this comment.
timeoutSeconds is documented as "if <= 0 waits indefinitely", but the implementation always calls process.waitFor(timeoutSeconds, TimeUnit.SECONDS). For timeoutSeconds <= 0 this will not wait indefinitely and can immediately return -2. Handle timeoutSeconds <= 0 by calling process.waitFor() (no timeout) and only using timed wait when timeoutSeconds > 0.
| boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS); | |
| if (!finished) { | |
| log.error("script execution timed out after {} seconds, destroying process", timeoutSeconds); | |
| process.destroyForcibly(); | |
| inputStreamGobbler.interrupt(); | |
| errorStreamGobbler.interrupt(); | |
| return -2; | |
| if (timeoutSeconds > 0) { | |
| boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS); | |
| if (!finished) { | |
| log.error("script execution timed out after {} seconds, destroying process", timeoutSeconds); | |
| process.destroyForcibly(); | |
| inputStreamGobbler.interrupt(); | |
| errorStreamGobbler.interrupt(); | |
| return -2; | |
| } | |
| } else { | |
| // Wait indefinitely when timeoutSeconds <= 0, as documented | |
| process.waitFor(); |
| inputStreamGobbler.interrupt(); | ||
| errorStreamGobbler.interrupt(); | ||
| return -2; | ||
| } | ||
| return process.exitValue(); |
There was a problem hiding this comment.
On timeout the code interrupts the StreamGobbler threads, but StreamGobbler blocks on BufferedReader.readLine() and doesn’t check the interrupt flag, so interrupt() alone may not stop these threads promptly. Consider closing the process streams and/or updating the gobblers to exit on interrupt, and then join() them (or mark them as daemon) to avoid leaking threads after a timeout.
| inputStreamGobbler.interrupt(); | |
| errorStreamGobbler.interrupt(); | |
| return -2; | |
| } | |
| return process.exitValue(); | |
| try { | |
| process.getInputStream().close(); | |
| } catch (IOException ioe) { | |
| log.warn("Failed to close process input stream after timeout", ioe); | |
| } | |
| try { | |
| process.getErrorStream().close(); | |
| } catch (IOException ioe) { | |
| log.warn("Failed to close process error stream after timeout", ioe); | |
| } | |
| inputStreamGobbler.interrupt(); | |
| errorStreamGobbler.interrupt(); | |
| try { | |
| inputStreamGobbler.join(TimeUnit.SECONDS.toMillis(1)); | |
| } catch (InterruptedException ie) { | |
| Thread.currentThread().interrupt(); | |
| } | |
| try { | |
| errorStreamGobbler.join(TimeUnit.SECONDS.toMillis(1)); | |
| } catch (InterruptedException ie) { | |
| Thread.currentThread().interrupt(); | |
| } | |
| return -2; | |
| } | |
| int processExitCode = process.exitValue(); | |
| try { | |
| inputStreamGobbler.join(TimeUnit.SECONDS.toMillis(1)); | |
| } catch (InterruptedException ie) { | |
| Thread.currentThread().interrupt(); | |
| } | |
| try { | |
| errorStreamGobbler.join(TimeUnit.SECONDS.toMillis(1)); | |
| } catch (InterruptedException ie) { | |
| Thread.currentThread().interrupt(); | |
| } | |
| return processExitCode; |
|
@SbloodyS I have resolved all the copilot issue's please review it again. |
e902f7e to
124bf56
Compare
| closeProcessStreams(process); | ||
| joinGobbler(inputStreamGobbler); | ||
| joinGobbler(errorStreamGobbler); | ||
| return -2; |
There was a problem hiding this comment.
It's better to use positive integer here.
| .build(); | ||
|
|
||
| return Arrays.asList(scriptUserParam, scriptPathParam, scriptTypeParams); | ||
| InputNumberParam scriptTimeoutParam = |
There was a problem hiding this comment.
It is best to tell the user the meaning of the exit code in the document.
| ? config.get(ScriptParamsConstants.NAME_SCRIPT_USER_PARAMS) | ||
| : ""; | ||
| String timeoutConfig = config.get(ScriptParamsConstants.NAME_SCRIPT_TIMEOUT); | ||
| if (StringUtils.isNotBlank(timeoutConfig)) { |
There was a problem hiding this comment.
| if (StringUtils.isNotBlank(timeoutConfig)) { | |
| if (StringUtils.isNotEmpty(timeoutConfig)) { |
| try { | ||
| parsedTimeout = Long.parseLong(timeoutConfig); | ||
| } catch (NumberFormatException ex) { | ||
| log.warn("Invalid script timeout config value: '{}', using default: {}", |
There was a problem hiding this comment.
We should return alertResult here.
| private String[] cmd = {"/bin/sh", "-c", shellFilPath + " -t 1"}; | ||
|
|
||
| @Test | ||
| @DisabledOnOs(OS.WINDOWS) |
There was a problem hiding this comment.
We don't need to do this since we've decleared in the document that windows is not supported.
| } | ||
|
|
||
| @Test | ||
| @DisabledOnOs(OS.WINDOWS) |
|



Purpose of the pull request
Close: #18034
The Script Alert Plugin executes external scripts to send alerts. Currently, ProcessUtils.executeScript() calls process.waitFor() with no timeout. If a script hangs (network stall, deadlock, etc.), it permanently blocks a thread in the alert sender thread pool, degrading or blocking all alert delivery.
This PR adds a configurable timeout parameter (default: 60 seconds) to the Script Alert Plugin following the same pattern used by the HTTP Alert Plugin.
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
mvn clean compile