[Feature-18070][Task] Add Amazon EMR Serverless task plugin#18069

Open
norrishuang wants to merge 10 commits into apache:dev from norrishuang:dev

@norrishuang norrishuang commented Mar 14, 2026

Was this PR generated or assisted by AI?

YES. The implementation was assisted by AI (Claude) for code generation, with human review, testing and verification on a real AWS EMR Serverless environment.

Purpose of the pull request

Add a new task plugin for Amazon EMR Serverless, enabling users to submit, monitor, and cancel Spark/Hive jobs on EMR Serverless applications directly from DolphinScheduler workflows.
Unlike the existing EMR on EC2 task plugin, which manages EC2-based clusters, EMR Serverless is a serverless runtime that requires no cluster infrastructure management and automatically scales compute resources on demand.
Closes #18070

Brief change log

Backend (new module: dolphinscheduler-task-emr-serverless)

  • EmrServerlessTask — extends AbstractRemoteTask, implements submit/track/cancel lifecycle via AWS SDK v1 (StartJobRun, GetJobRun, CancelJobRun)
  • EmrServerlessParameters — task parameter model (applicationId, executionRoleArn, jobName, startJobRunRequestJson)
  • EmrServerlessTaskChannel / EmrServerlessTaskChannelFactory — SPI registration via @AutoService, registered as EMR_SERVERLESS
  • EmrServerlessTaskException — dedicated exception class
  • Authentication: reuses aws.emr.* config from aws.yaml, falls back to DefaultAWSCredentialsProviderChain
  • Supports failover recovery via appIds (jobRunId)

Frontend

  • use-emr-serverless.ts (fields) — form fields for Application Id, Execution Role Arn, Job Name, StartJobRunRequest JSON editor
  • use-emr-serverless.ts (tasks) — task model definition
  • Registered in task type constants, store, format-data, i18n (en_US/zh_CN)
  • Task icon (reuses EMR icon)

Documentation

  • Chinese doc: docs/docs/zh/guide/task/emr-serverless.md
  • English doc: docs/docs/en/guide/task/emr-serverless.md
  • Includes: overview, task parameters, Spark/Hive JSON examples, AWS auth config, job state transitions, screenshots
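
The submit/track/cancel lifecycle and the failover recovery described in the backend list could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `JobRunClient` and `EmrServerlessLifecycleSketch` are hypothetical names, and the interface only stands in for the StartJobRun, GetJobRun, and CancelJobRun calls so the sketch has no AWS SDK dependency.

```java
// Hypothetical stand-in for the three AWS calls named in the change log.
interface JobRunClient {
    String startJobRun(String applicationId, String requestJson);   // returns a jobRunId
    String getJobRunState(String applicationId, String jobRunId);   // may return null
    void cancelJobRun(String applicationId, String jobRunId);
}

final class EmrServerlessLifecycleSketch {
    private final JobRunClient client;
    private final String applicationId;
    private String jobRunId; // assumed to be the value persisted as appIds for failover

    EmrServerlessLifecycleSketch(JobRunClient client, String applicationId, String recoveredJobRunId) {
        this.client = client;
        this.applicationId = applicationId;
        this.jobRunId = recoveredJobRunId;
    }

    /** Submits a new job run unless a job run id was recovered after failover. */
    String submit(String requestJson) {
        if (jobRunId == null || jobRunId.isEmpty()) {
            jobRunId = client.startJobRun(applicationId, requestJson);
        }
        return jobRunId;
    }

    /** Polls until the job run reaches SUCCESS, FAILED, or CANCELLED and returns that state. */
    String track(long pollIntervalMillis) {
        while (true) {
            String state = client.getJobRunState(applicationId, jobRunId);
            if (state == null) {
                throw new IllegalStateException("GetJobRun returned no state for " + jobRunId);
            }
            if (state.equals("SUCCESS") || state.equals("FAILED") || state.equals("CANCELLED")) {
                return state;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while tracking " + jobRunId, e);
            }
        }
    }

    /** Cancels the job run only if one was submitted or recovered. */
    void cancel() {
        if (jobRunId != null && !jobRunId.isEmpty()) {
            client.cancelJobRun(applicationId, jobRunId);
        }
    }
}
```

Keeping the job run id as the only recovered state means a restarted worker can resume tracking without resubmitting, which is the behaviour the "failover recovery via appIds" bullet describes.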

Verify this pull request

This change added tests and can be verified as follows:

  • Added EmrServerlessTaskTest with 11 unit tests covering: success/failed/cancelled lifecycle, full state chain, submit error handling, null GetJobRun response, cancel with/without jobRunId, failover recovery, parameter validation, and invalid JSON handling.
  • Manually verified by deploying to an EC2 instance in Standalone mode and successfully submitting a Spark job to a real AWS EMR Serverless application.

- New backend module: dolphinscheduler-task-emr-serverless
  - EmrServerlessTask: submit/track/cancel via AWS SDK v1
  - Auth: reuse aws.emr.* config, fall back to DefaultAWSCredentialsProviderChain
  - SPI registration via @AutoService
- Frontend: EMR_SERVERLESS task type with form fields
  - applicationId, executionRoleArn, jobName, startJobRunRequestJson
  - i18n: en_US + zh_CN
- BOM: add aws-java-sdk-emrserverless dependency
11 test cases covering:
- Submit → track → success/failed/cancelled lifecycle
- Full state transition (SUBMITTED→PENDING→SCHEDULED→RUNNING→SUCCESS)
- Submit error handling (SDK exception)
- GetJobRun returns null
- Cancel application (with and without jobRunId)
- Failover recovery via appIds
- Parameter validation (checkParameters)
- Invalid JSON handling
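
The state chain exercised by these tests can be modelled with a small enum. This is a sketch under assumptions: `JobRunStateSketch` is a hypothetical name, SUBMITTED through SUCCESS come from the list above, FAILED and CANCELLED from the lifecycle cases, and CANCELLING is assumed from the EMR Serverless API's job run states.

```java
// Hypothetical enum mirroring the job run states discussed in this PR.
enum JobRunStateSketch {
    SUBMITTED, PENDING, SCHEDULED, RUNNING, SUCCESS, FAILED, CANCELLING, CANCELLED;

    /** True once polling can stop because the state will not change again. */
    boolean isTerminal() {
        return this == SUCCESS || this == FAILED || this == CANCELLED;
    }

    /** Parses a raw state string; null or unrecognized input yields null so the caller decides. */
    static JobRunStateSketch parse(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return valueOf(raw.trim().toUpperCase(java.util.Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

Returning null for unknown input, rather than throwing, leaves the "GetJobRun returns null" and unexpected-state cases to the caller, which can then fail the task explicitly.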
- Add maven-shade-plugin to emr-serverless pom.xml so shade jar is
  included in dist assembly
- Add applicationId, executionRoleArn, startJobRunRequestJson fields
  to ITaskParams in types.ts to fix TypeScript build
The use-task.ts imports TASK_TYPES_MAP from store/project/task-type.ts
(not constants/task-type.ts), so EMR_SERVERLESS must be defined there
too. Missing entry caused 'Cannot read properties of undefined
(reading taskExecuteType)' error when dragging the node onto canvas.
EMR Serverless has no local emulator, so the endpoint from aws.emr.*
config (which often points to a local MinIO/S3 mock like localhost:9000)
should not be used. Always use the standard AWS endpoint resolved by
region. Also updated aws.yaml on deploy server to use
InstanceProfileCredentialsProvider.
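
The endpoint decision in this commit can be illustrated with a small helper that ignores any configured endpoint and derives the host from the region alone. This is a sketch, not the PR's code: `EmrServerlessEndpointSketch` and `resolve` are hypothetical names, and the hostname pattern is an assumption based on AWS's usual regional endpoint naming. With the AWS SDK itself, the same effect normally comes from setting only the region on the client builder and no endpoint override.

```java
// Hypothetical helper: the configured endpoint is accepted only to show it is ignored.
final class EmrServerlessEndpointSketch {
    private EmrServerlessEndpointSketch() {
    }

    static String resolve(String region, String configuredEndpoint) {
        // configuredEndpoint may point at a local S3 mock such as localhost:9000,
        // which cannot serve EMR Serverless requests, so it is deliberately unused.
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalArgumentException("region is required");
        }
        String normalized = region.trim().toLowerCase(java.util.Locale.ROOT);
        // Assumption: China regions use the amazonaws.com.cn suffix.
        String suffix = normalized.startsWith("cn-") ? "amazonaws.com.cn" : "amazonaws.com";
        return "https://emr-serverless." + normalized + "." + suffix;
    }
}
```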
- Copy EMR icon for EMR_SERVERLESS task type (emr_serverless.png, emr_serverless_hover.png)
- Add Chinese doc: docs/docs/zh/guide/task/emr-serverless.md
- Add English doc: docs/docs/en/guide/task/emr-serverless.md
- Register docs in sidebar config (docsdev.js)
- Docs include: overview, task parameters, Spark/Hive examples,
  AWS auth config, job state transitions, and notices
- Screenshot placeholders marked with TODO comments
boring-cyborg bot commented Mar 14, 2026

Thanks for opening this pull request! Please check out our contributing guidelines. (https://github.com/apache/dolphinscheduler/blob/dev/docs/docs/en/contribute/join/pull-request.md)

@github-actions github-actions bot added the UI, backend, test, and document labels Mar 14, 2026
@norrishuang norrishuang changed the title [Feature][Task] Add Amazon EMR Serverless task plugin [Feature-18070][Task] Add Amazon EMR Serverless task plugin Mar 14, 2026

@SbloodyS SbloodyS (Member) left a comment

Please add api-test or e2e for this. @norrishuang

@norrishuang norrishuang (Author) commented

> Please add api-test or e2e for this. @norrishuang

Comprehensive unit tests have already been included for the EMR Serverless task plugin, covering job submission, state polling, success/failure/cancellation handling, failover recovery, parameter validation, and invalid input scenarios. Since this task plugin depends on AWS EMR Serverless, running api-test or e2e in the CI Docker environment would require AWS credentials and a running EMR Serverless application. I'm happy to add an api-test or e2e if there is a recommended approach for handling AWS authentication in CI. Could you share any guidance on this?

Comment on lines +85 to +89
static final ObjectMapper objectMapper = new ObjectMapper()
.configure(FAIL_ON_UNKNOWN_PROPERTIES, false)
.configure(ACCEPT_EMPTY_ARRAY_AS_NULL_OBJECT, true)
.configure(READ_UNKNOWN_ENUM_VALUES_AS_NULL, true)
.configure(REQUIRE_SETTERS_FOR_GETTERS, true)

Check notice: Code scanning / CodeQL
Deprecated method or constructor invocation (Note)

Invoking ObjectMapper.configure should be avoided because it has been deprecated.
Development

Successfully merging this pull request may close these issues.

[Feature][Task] Support Amazon EMR Serverless task plugin