Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job-like workload supported by WorkloadSpread #1838

Merged

Conversation

AiRanthem
Copy link
Member

AI scenario is supported by WorkloadSpread

Ⅰ. Describe what this PR does

a targetFilter is added to WorkloadSpread to make it possible to manage only a part of Pods owned by a target workload. And it also provides support for workloads without replicas.

Ⅱ. Does this pull request fix one issue?

fixes #1818

Ⅲ. Describe how to verify it

Jobs like TFJob are supported, use WorkloadSpread on them.

Ⅳ. Special notes for reviews

Copy link

codecov bot commented Nov 27, 2024

Codecov Report

Attention: Patch coverage is 58.66667% with 93 lines in your changes missing coverage. Please review.

Project coverage is 50.95%. Comparing base (0d0031a) to head (cda1e9b).
Report is 137 commits behind head on master.

Files with missing lines Patch % Lines
...roller/workloadspread/workloadspread_controller.go 54.11% 28 Missing and 11 partials ⚠️
pkg/util/controllerfinder/controller_finder.go 0.00% 32 Missing ⚠️
pkg/util/workloadspread/workloadspread.go 80.39% 7 Missing and 3 partials ⚠️
...loadspread/validating/workloadspread_validation.go 0.00% 4 Missing and 1 partial ⚠️
pkg/util/controllerfinder/pods_finder.go 0.00% 4 Missing ⚠️
pkg/util/workloadspread/utils.go 93.75% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1838      +/-   ##
==========================================
+ Coverage   47.91%   50.95%   +3.04%     
==========================================
  Files         162      194      +32     
  Lines       23491    25288    +1797     
==========================================
+ Hits        11256    12886    +1630     
- Misses      11014    11099      +85     
- Partials     1221     1303      +82     
Flag Coverage Δ
unittests 50.95% <58.66%> (+3.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AiRanthem AiRanthem force-pushed the feature/ws-job-like-support-241118 branch from 02f7b9c to 02df344 Compare November 28, 2024 09:00
pkg/util/workloadspread/workloadspread.go Outdated Show resolved Hide resolved
apis/apps/v1alpha1/workloadspread_types.go Outdated Show resolved Hide resolved
pkg/util/workloadspread/workloadspread.go Show resolved Hide resolved
pkg/util/workloadspread/workloadspread.go Outdated Show resolved Hide resolved
apis/apps/v1alpha1/workloadspread_types.go Outdated Show resolved Hide resolved
@AiRanthem AiRanthem force-pushed the feature/ws-job-like-support-241118 branch 3 times, most recently from 0123426 to c5c9fa3 Compare December 9, 2024 02:09
@AiRanthem AiRanthem force-pushed the feature/ws-job-like-support-241118 branch from c5c9fa3 to 969bbea Compare December 10, 2024 03:54
Copy link
Member

@furykerry furykerry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

pkg/controller/workloadspread/workloadspread_controller.go Outdated Show resolved Hide resolved
pkg/controller/workloadspread/workloadspread_controller.go Outdated Show resolved Hide resolved
pkg/controller/workloadspread/workloadspread_controller.go Outdated Show resolved Hide resolved
pkg/controller/workloadspread/workloadspread_controller.go Outdated Show resolved Hide resolved
pkg/controller/workloadspread/workloadspread_controller.go Outdated Show resolved Hide resolved
pkg/util/controllerfinder/controller_finder.go Outdated Show resolved Hide resolved
@@ -120,6 +121,7 @@ func (r *ControllerFinder) GetPodsForRef(apiVersion, kind, ns, name string, acti
FieldSelector: fields.SelectorFromSet(fields.Set{fieldindex.IndexNameForOwnerRefUID: string(uid)}),
}
pods, err := listPods(&listOption)
klog.V(5).InfoS("result of list pods with owner ref uid", "pods", len(pods), "err", err, "refUid", uid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uid is not very suitable for viewing logs, please update it to be more log-friendly with some key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is very useful to check whether the pods are listed properly while debugging

@@ -91,6 +91,7 @@ func (r *ControllerFinder) GetPodsForRef(apiVersion, kind, ns, name string, acti
labelSelector = obj.Selector
workloadUIDs = append(workloadUIDs, obj.UID)
}
klog.V(5).InfoS("find pods and replicas result", "workloadReplicas", workloadReplicas, "workloadUIDs", workloadUIDs, "labelSelector", labelSelector)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uid is not very suitable for viewing logs, please update it to be more log-friendly with some key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is very useful to check whether the pods are listed properly while debugging

pkg/util/workloadspread/utils.go Outdated Show resolved Hide resolved
@@ -791,11 +800,13 @@ func initializeWorkloadsInWhiteList(c client.Client) {
})
}
}
klog.InfoS("initialized workload list", "workloads", workloads)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and , "workloadSpread", klog.KObj(ws)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effect of this function is global and it will execute only once.

…ge only a part of Pods owned by a target workload to support AI workloads like TFJob. And it also provides support for workloads without replicas.

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
@AiRanthem AiRanthem force-pushed the feature/ws-job-like-support-241118 branch from 5d99bfa to cda1e9b Compare December 25, 2024 09:24
@furykerry
Copy link
Member

/lgtm

@zmberg zmberg closed this Dec 27, 2024
@zmberg zmberg reopened this Dec 27, 2024
@zmberg zmberg merged commit 0f3b58a into openkruise:master Dec 27, 2024
74 of 76 checks passed
furykerry pushed a commit to furykerry/kruise that referenced this pull request Feb 8, 2025
* A TargetFilter is added to WorkloadSpread to make it possible to manage only a part of Pods owned by a target workload to support AI workloads like TFJob. And it also provides support for workloads without replicas.

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>

* fix some logs

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>

---------

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] WorkloadSpread support AI Job
4 participants