[Feature-18070][Task] Add Amazon EMR Serverless task plugin #18069
norrishuang wants to merge 10 commits into apache:dev
- New backend module: dolphinscheduler-task-emr-serverless
  - EmrServerlessTask: submit/track/cancel via AWS SDK v1
  - Auth: reuse aws.emr.* config, fall back to DefaultAWSCredentialsProviderChain
  - SPI registration via @AutoService
- Frontend: EMR_SERVERLESS task type with form fields
  - applicationId, executionRoleArn, jobName, startJobRunRequestJson
  - i18n: en_US + zh_CN
- BOM: add aws-java-sdk-emrserverless dependency
11 test cases covering:
- Submit → track → success/failed/cancelled lifecycle
- Full state transition (SUBMITTED→PENDING→SCHEDULED→RUNNING→SUCCESS)
- Submit error handling (SDK exception)
- GetJobRun returns null
- Cancel application (with and without jobRunId)
- Failover recovery via appIds
- Parameter validation (checkParameters)
- Invalid JSON handling
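The lifecycle and failover cases listed above can be pictured with a small, dependency-free sketch. This is not the plugin's code: `JobRunClient`, `JobTracker`, and `ScriptedClient` are hypothetical names standing in for the AWS SDK calls and the test mocks, and only the states named in the list are handled. A recovered jobRunId (as with failover via appIds) is assumed to skip submission and resume polling.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in for the EMR Serverless calls; NOT the AWS SDK API.
interface JobRunClient {
    String startJobRun();                    // returns a jobRunId
    String getJobRunState(String jobRunId);  // returns null if no run is found
}

final class JobTracker {
    enum Outcome { SUCCESS, FAILED, CANCELLED }

    // Submits unless a jobRunId was recovered (failover), then polls until a
    // terminal state is seen. A real tracker would sleep between polls.
    static Outcome run(JobRunClient client, String recoveredJobRunId, int maxPolls) {
        String jobRunId = recoveredJobRunId != null
                ? recoveredJobRunId
                : client.startJobRun();
        for (int i = 0; i < maxPolls; i++) {
            String state = client.getJobRunState(jobRunId);
            if (state == null) {
                throw new IllegalStateException("no job run found for " + jobRunId);
            }
            switch (state) {
                case "SUCCESS":
                    return Outcome.SUCCESS;
                case "FAILED":
                    return Outcome.FAILED;
                case "CANCELLED":
                    return Outcome.CANCELLED;
                default:
                    // SUBMITTED, PENDING, SCHEDULED, RUNNING: keep polling
                    break;
            }
        }
        throw new IllegalStateException(
                "job run " + jobRunId + " did not finish within " + maxPolls + " polls");
    }
}

// Scripted fake that replays a fixed sequence of states, as a unit test might.
final class ScriptedClient implements JobRunClient {
    private final Deque<String> states;
    int submissions = 0;

    ScriptedClient(String... states) {
        this.states = new ArrayDeque<>(Arrays.asList(states));
    }

    @Override
    public String startJobRun() {
        submissions++;
        return "run-1"; // illustrative id only
    }

    @Override
    public String getJobRunState(String jobRunId) {
        return states.poll(); // null once the script is exhausted
    }
}
```

Driving `JobTracker.run` with a scripted sequence needs no AWS credentials, which is the same idea the mocked unit tests rely on.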
- Add maven-shade-plugin to emr-serverless pom.xml so shade jar is included in dist assembly
- Add applicationId, executionRoleArn, startJobRunRequestJson fields to ITaskParams in types.ts to fix TypeScript build
use-task.ts imports TASK_TYPES_MAP from store/project/task-type.ts (not constants/task-type.ts), so EMR_SERVERLESS must be defined there too. The missing entry caused a 'Cannot read properties of undefined (reading taskExecuteType)' error when dragging the node onto the canvas.
EMR Serverless has no local emulator, so the endpoint from aws.emr.* config (which often points to a local MinIO/S3 mock like localhost:9000) should not be used. Always use the standard AWS endpoint resolved by region. Also updated aws.yaml on deploy server to use InstanceProfileCredentialsProvider.
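The endpoint decision described above can be sketched as follows. The types `AwsEmrConfig`, `ClientSettings`, and `EmrServerlessClientSettings` are hypothetical (they are neither DolphinScheduler nor AWS SDK classes), and treating the region as mandatory is an assumption of this sketch. The point it illustrates is that the region is carried over from the shared config while any configured endpoint is deliberately dropped.

```java
// Hypothetical view of the shared aws.emr.* settings; field names are assumptions.
final class AwsEmrConfig {
    final String region;
    final String endpoint; // may point at a local mock such as http://localhost:9000

    AwsEmrConfig(String region, String endpoint) {
        this.region = region;
        this.endpoint = endpoint;
    }
}

// Hypothetical value object describing how a client would be built.
final class ClientSettings {
    final String region;
    final String endpointOverride; // null means: let the SDK resolve the endpoint by region

    ClientSettings(String region, String endpointOverride) {
        this.region = region;
        this.endpointOverride = endpointOverride;
    }
}

final class EmrServerlessClientSettings {
    // Keeps the region, ignores any configured endpoint on purpose.
    static ClientSettings from(AwsEmrConfig config) {
        if (config.region == null || config.region.trim().isEmpty()) {
            throw new IllegalArgumentException("region is required");
        }
        return new ClientSettings(config.region.trim(), null);
    }
}
```

With this shape, a config whose endpoint points at a local S3 mock still yields settings with no endpoint override.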
- Copy EMR icon for EMR_SERVERLESS task type (emr_serverless.png, emr_serverless_hover.png)
- Add Chinese doc: docs/docs/zh/guide/task/emr-serverless.md
- Add English doc: docs/docs/en/guide/task/emr-serverless.md
- Register docs in sidebar config (docsdev.js)
- Docs include: overview, task parameters, Spark/Hive examples, AWS auth config, job state transitions, and notices
- Screenshot placeholders marked with TODO comments
Thanks for opening this pull request! Please check out our contributing guidelines. (https://github.com/apache/dolphinscheduler/blob/dev/docs/docs/en/contribute/join/pull-request.md)
SbloodyS left a comment
Please add api-test or e2e for this. @norrishuang
Comprehensive unit tests have already been included for the EMR Serverless task plugin, covering job submission, state polling, success/failure/cancellation handling, failover recovery, parameter validation, and invalid input scenarios. Since this task plugin depends on AWS EMR Serverless, running api-test or e2e in the CI Docker environment would require AWS credentials and a running EMR Serverless application. I'm happy to add an api-test or e2e if there is a recommended approach for handling AWS authentication in CI. Could you share any guidance on this?
    static final ObjectMapper objectMapper = new ObjectMapper()
            .configure(FAIL_ON_UNKNOWN_PROPERTIES, false)
            .configure(ACCEPT_EMPTY_ARRAY_AS_NULL_OBJECT, true)
            .configure(READ_UNKNOWN_ENUM_VALUES_AS_NULL, true)
            .configure(REQUIRE_SETTERS_FOR_GETTERS, true)
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation (Note)



Was this PR generated or assisted by AI?
YES. The implementation was assisted by AI (Claude) for code generation, with human review, testing and verification on a real AWS EMR Serverless environment.
Purpose of the pull request
Add a new task plugin for Amazon EMR Serverless, enabling users to submit, monitor, and cancel Spark/Hive jobs on EMR Serverless applications directly from DolphinScheduler workflows.
Unlike the existing EMR on EC2 task plugin, which manages EC2-based clusters, EMR Serverless is a serverless runtime that requires no cluster infrastructure management and automatically scales compute resources on demand.
Close 18070
Brief change log
Backend (new module: dolphinscheduler-task-emr-serverless)
- EmrServerlessTask: extends AbstractRemoteTask, implements the submit/track/cancel lifecycle via AWS SDK v1 (StartJobRun, GetJobRun, CancelJobRun)
- EmrServerlessParameters: task parameter model (applicationId, executionRoleArn, jobName, startJobRunRequestJson)
- EmrServerlessTaskChannel / EmrServerlessTaskChannelFactory: SPI registration via @AutoService, registered as EMR_SERVERLESS
- EmrServerlessTaskException: dedicated exception class
- Authentication: reuses aws.emr.* config from aws.yaml, falls back to DefaultAWSCredentialsProviderChain
- Failover recovery via appIds (jobRunId)
Frontend
- use-emr-serverless.ts (fields): form fields for Application Id, Execution Role Arn, Job Name, StartJobRunRequest JSON editor
- use-emr-serverless.ts (tasks): task model definition
Documentation
- docs/docs/zh/guide/task/emr-serverless.md
- docs/docs/en/guide/task/emr-serverless.md
Verify this pull request
This change added tests and can be verified as follows:
- EmrServerlessTaskTest with 11 unit tests covering: success/failed/cancelled lifecycle, full state chain, submit error handling, null GetJobRun response, cancel with/without jobRunId, failover recovery, parameter validation, and invalid JSON handling.
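To illustrate what the parameter-validation and invalid-JSON cases guard against, here is a minimal sketch of a checkParameters-style method. It is not the plugin's EmrServerlessParameters class: `EmrServerlessParamsSketch` is a hypothetical name, the choice of mandatory fields (applicationId, executionRoleArn, startJobRunRequestJson, with jobName optional) is an assumption, and the JSON check is only a shallow shape test because the sketch avoids a JSON library.

```java
// Hypothetical parameter holder; field names follow the PR description,
// but which of them are mandatory is an assumption of this sketch.
final class EmrServerlessParamsSketch {
    final String applicationId;
    final String executionRoleArn;
    final String jobName; // treated as optional here
    final String startJobRunRequestJson;

    EmrServerlessParamsSketch(String applicationId, String executionRoleArn,
                              String jobName, String startJobRunRequestJson) {
        this.applicationId = applicationId;
        this.executionRoleArn = executionRoleArn;
        this.jobName = jobName;
        this.startJobRunRequestJson = startJobRunRequestJson;
    }

    // Returns true only when every assumed-mandatory field is usable.
    boolean checkParameters() {
        return notBlank(applicationId)
                && notBlank(executionRoleArn)
                && looksLikeJsonObject(startJobRunRequestJson);
    }

    private static boolean notBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }

    // Shallow shape check only: a real implementation would parse the text
    // with a JSON library and reject anything that fails to parse.
    private static boolean looksLikeJsonObject(String s) {
        if (!notBlank(s)) {
            return false;
        }
        String t = s.trim();
        return t.startsWith("{") && t.endsWith("}");
    }
}
```

Rejecting bad input before submission keeps a malformed request from ever reaching StartJobRun, so the failure surfaces as a validation error rather than an SDK exception.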