implement an upper bound limit to the number of tracked executor #2181

Merged

ImpSy
Contributor

@ImpSy ImpSy commented Sep 19, 2024

Purpose of this PR

In some cases, executors can enter a crash-looping state, causing the CR to grow out of control.
This can affect the entire SparkApplication CR reconciliation loop.
It can eventually lead to etcd errors when we try to patch or update the CR.

Proposed changes:

  • Implement an upper bound on the number of tracked executors
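
As a rough illustration of the idea (not the code in this PR), an upper bound on tracked executors could look like the sketch below. The names `ExecutorState`, `trackExecutor`, and `maxTracked` are hypothetical and chosen only for this example. Executors beyond the cap are simply not recorded, so a crash-looping app cannot grow the status map without limit.

```go
package main

import "fmt"

// ExecutorState is a hypothetical stand-in for the per-executor state
// that would be stored in the CR status.
type ExecutorState string

// trackExecutor records state for the named executor unless doing so would
// push the map past maxTracked entries. Updates to executors that are
// already tracked always succeed. It reports whether the state was recorded.
func trackExecutor(states map[string]ExecutorState, name string, state ExecutorState, maxTracked int) bool {
	if _, known := states[name]; !known && len(states) >= maxTracked {
		return false // cap reached: drop the new executor instead of growing the map
	}
	states[name] = state
	return true
}

func main() {
	states := make(map[string]ExecutorState)
	const maxTracked = 2 // illustrative value only

	for _, name := range []string{"exec-1", "exec-2", "exec-3"} {
		ok := trackExecutor(states, name, "FAILED", maxTracked)
		fmt.Printf("%s tracked=%v\n", name, ok)
	}

	// An executor that is already tracked can still be updated at the cap.
	ok := trackExecutor(states, "exec-1", "RUNNING", maxTracked)
	fmt.Println(ok, len(states))
}
```

In this sketch the cap applies only to new entries, so the state of executors that are already tracked keeps being refreshed.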

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Submitting apps and tracking driver pods is the highest priority of the spark-operator.
One app going haywire should not hinder the entire cluster.

Checklist

Before submitting your PR, please review the following:

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

This feature has been live on our fork for the past year (see spotinst#8).

@ImpSy
Contributor Author

ImpSy commented Sep 26, 2024

Hey @andreyvelich, @jacobsalway, @mwielgus

Could I please have a review on this PR ?

Review comment on cmd/operator/controller/start.go (outdated, resolved)
Review comment on pkg/common/spark.go (outdated, resolved)
@ImpSy ImpSy force-pushed the max-tracked-executor-per-app branch from 7de884e to 475113d Compare September 27, 2024 13:16
@ImpSy
Contributor Author

ImpSy commented Sep 27, 2024

@ChenYi015 I've made the change you requested :)
I know you were not tagged on this, so thanks for the review 👍

@ImpSy
Contributor Author

ImpSy commented Sep 28, 2024

@vara-bonthu Following your comment, I've made the value configurable in the chart
Thanks for the review 🙏
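
For readers following along, a chart value like this is typically passed to the operator as a command-line flag. The sketch below is an assumption-laden illustration, not the PR's code: the flag name is borrowed from the branch name, the default of 1000 is made up for the example, and it uses Go's standard `flag` package rather than whatever the operator actually uses.

```go
package main

import (
	"flag"
	"fmt"
)

// parseMaxTrackedExecutors reads a hypothetical
// --max-tracked-executor-per-app flag from args and rejects
// non-positive values. The default of 1000 is an assumption.
func parseMaxTrackedExecutors(args []string) (int, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	limit := fs.Int("max-tracked-executor-per-app", 1000,
		"upper bound on the number of executors tracked per application")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *limit <= 0 {
		return 0, fmt.Errorf("max-tracked-executor-per-app must be positive, got %d", *limit)
	}
	return *limit, nil
}

func main() {
	limit, err := parseMaxTrackedExecutors([]string{"--max-tracked-executor-per-app=500"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("limit:", limit)
}
```

Validating the value at startup keeps a misconfigured chart from silently disabling the cap.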

@ImpSy ImpSy force-pushed the max-tracked-executor-per-app branch from 9ba20ef to 9447c25 Compare September 28, 2024 20:02
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChenYi015
Contributor

LGTM. Will wait for another approval.

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
@ImpSy
Contributor Author

ImpSy commented Oct 3, 2024

@vara-bonthu could you please re-review this? 🙏

@ImpSy
Contributor Author

ImpSy commented Oct 8, 2024

@ChenYi015 It's been more than a week since you approved the PR. Could we merge it now?

@ChenYi015
Contributor

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Oct 11, 2024
@google-oss-prow google-oss-prow bot merged commit a8b5d64 into kubeflow:master Oct 11, 2024
7 checks passed
3 participants