-
Notifications
You must be signed in to change notification settings - Fork 40
Add KEP for AddOnPlacementScoreGenerator #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,196 @@ | ||
# New API: AddOnPlacementScoreGenerator | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implemented` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/) | ||
|
||
## Summary | ||
|
||
This proposal will be adding a new OCM API named `AddOnPlacementScoreGenerator` | ||
which helps the administrators to provide custom values to Addon controllers | ||
that generate `AddOnPlacementScores`. | ||
|
||
A valid `AddOnPlacementScoreGenerator` resource should be in a "cluster namespace" and | ||
the associated config resources will be delivered to the associated managed cluster | ||
with that "cluster namespace". | ||
|
||
## Motivation | ||
|
||
### Influence Addon controllers behavior | ||
|
||
Currently, when writing an Addon in order to extend the scheduling capabilities of OCM | ||
there is no way to influence behavior of that addon via an API. The controller will run | ||
and update the status for the `AddOnPlacementScores` object. | ||
|
||
One of the use-cases we want to cover is testing latency from managed clusters to a set | ||
of user-define locations. With the current `AddOnPlacementScores` implementation we would | ||
need to hard code these locations or maybe use something like a `ConfigMap` that gets consumed | ||
by the controller. We don't find these solutions flexible enough, so our proposal would be having | ||
a new API to influence the behavior of such controllers. | ||
|
||
Let's say I want to test latencies to redhat.com and google.com and place my application based on | ||
the managed cluster with the lowest latency to redhat.com. | ||
|
||
Providing I created the required controller with the hardcoded locations (redhat.com and google.com) | ||
an `AddOnPlacementScores` similar to this one would be created on each managed cluster running this addon: | ||
|
||
~~~yaml | ||
apiVersion: cluster.open-cluster-management.io/v1alpha1 | ||
kind: AddOnPlacementScore | ||
metadata: | ||
name: cluster1-generator | ||
namespace: cluster1 | ||
status: | ||
conditions: | ||
- lastTransitionTime: "2021-10-28T08:31:39Z" | ||
message: AddOnPlacementScore updated successfully | ||
reason: AddOnPlacementScoreUpdated | ||
status: "True" | ||
type: AddOnPlacementScoreUpdated | ||
validUntil: "2021-10-29T18:31:39Z" | ||
scores: | ||
- name: "redhat-com-avgLatency" | ||
value: 30 | ||
- name: "google-com-avgLatency" | ||
value: 50 | ||
~~~ | ||
|
||
Now, a `Placement` like this could be used: | ||
|
||
~~~yaml | ||
apiVersion: cluster.open-cluster-management.io/v1beta1 | ||
kind: Placement | ||
metadata: | ||
name: latency-placement | ||
namespace: ns1 | ||
spec: | ||
numberOfClusters: 3 | ||
prioritizerPolicy: | ||
mode: Exact | ||
configurations: | ||
- scoreCoordinate: | ||
type: AddOn | ||
addOn: | ||
resourceName: cluster1-generator | ||
scoreName: redhat-com-avgLatency | ||
weight: -1 | ||
~~~ | ||
|
||
Other applications may use latency to google.com. | ||
|
||
Now, we want to add linux.com to the list of locations to test. With the current implementation we | ||
will need to edit the code of the addon controller and include the test to linux.com + the result to be added | ||
to the `AddOnPlacementScore`. | ||
|
||
To fix above issue, the proposal is to create a new API `AddOnPlacementScoreGenerator`, which could | ||
look like this for the example above: | ||
|
||
~~~yaml | ||
apiVersion: cluster.open-cluster-management.io/v1alpha1 | ||
kind: AddOnPlacementScoreGenerator | ||
metadata: | ||
name: cluster1-generator | ||
namespace: cluster1 | ||
spec: | ||
addOnPlacementSelector: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this actually the name of the score? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not really, it's the name of the |
||
name: latency | ||
namespace: cluster1 | ||
latencies: | ||
- name: redhat-com-avgLatency | ||
url: https://redhat.com | ||
runs: 2 | ||
waitBetweenRuns: 10s | ||
- name: google-com-avgLatency | ||
url: https://google.com | ||
- name: linux-com-avgLatency | ||
url: https://linux.com | ||
~~~ | ||
|
||
Our controller can now read the `AddOnPlacementScoreGenerator` and our specific controller will be interested on | ||
everything below `.spec.latencies`. For example, the redhat-com-avgLatency will have the result of running two latency | ||
tests to https://redhat.com and waiting 10s between runs, then the mean value will be posted to the `AddOnPlacementScore` | ||
status. | ||
|
||
## Goals & Non-Goals | ||
|
||
### Goals | ||
|
||
- Help the administrators to provide custom values to addon controllers that generate `AddOnPlacementScores` | ||
|
||
### Non-Goals | ||
|
||
- Addond developers need to develop their own controller to consume this new API. | ||
|
||
### Future goals | ||
|
||
- It is currently assumed that the user of `AddOnPlacementScoreGenerator` is either a | ||
cluster admin or a user who can create `AddOnPlacementScoreGenerator` in the hub cluster's | ||
managed "cluster namespace". | ||
|
||
## Design | ||
|
||
### Component & API | ||
|
||
We purpose to adding a new custom resource named | ||
`AddOnPlacementScoreGenerator` introduced into OCM by this proposal: | ||
|
||
A sample of the `AddOnPlacementScoreGenerator` will be: | ||
|
||
~~~yaml | ||
apiVersion: cluster.open-cluster-management.io/v1alpha1 | ||
kind: AddOnPlacementScoreGenerator | ||
metadata: | ||
name: cluster1-generator | ||
namespace: cluster1 | ||
spec: | ||
addOnPlacementSelector: | ||
name: latency | ||
namespace: cluster1 | ||
latencies: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My question is, should this field There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hey @haoqing0110 yes, other than more generic to me this would be "any string" and even we could have more than one, something like this for example: apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
metadata:
name: cluster1-generator
namespace: cluster1
spec:
addOnPlacementSelector:
name: latency
namespace: cluster1
latencies:
- name: redhat-com-avgLatency
url: https://redhat.com
runs: 2
waitBetweenRuns: 10s
- name: google-com-avgLatency
url: https://google.com
- name: linux-com-avgLatency
url: https://linux.com
energy:
- name: some-name
someValue: some-value Then, the controller interested in latencies will consume There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the explanation. This There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And I'm wondering could the controller to consume There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cc @qiujian16 , want to know your suggestions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @haoqing0110 the use cases defined by 58-addon-configuration would work for us, I have checked the examples and the example with And answering your second comment, yes, the same controller could use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is more like a cluster scoped resource that applies to all clusters, it is less likely that we want different configs for different clusters. Also how could we parse the result of the http responses? It might be straightforward for latency. But if we try to fetch data from a metrics server or 3rd party services to generate the score, the returned output varies and we might need a way to define how to parse the result. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hey @qiujian16, I think that for latencies global object or per-cluster object would both work. In terms of data, I was thinking of an unstructured structure where people developing custom controllers with the addon framework will consume the data the way they need it. I don't think this should be a controller taking care of multiple data sources, instead a single controller taking care of a data source, so if anything were wrong with the data itself it wouldn't affect the other add-ons. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 on Mario's approach on a single controller taking care of a single data source type. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To my understanding, in this API we should define
|
||
- name: redhat-com-avgLatency | ||
url: https://redhat.com | ||
runs: 2 | ||
waitBetweenRuns: 10s | ||
otherPlugin: | ||
- name: score-name | ||
optionConsumedByPlugin: optionValue | ||
|
||
~~~ | ||
|
||
The `AddOnPlacementScoreGenerator` resource is expected to be | ||
created under the "cluster namespace" which is a namespace with | ||
the same name as the managed cluster, the `AddOnPlacementScoreGenerator` | ||
delivered to the managed cluster will have the same name as the | ||
`AddOnPlacementScoreGenerator` resource. | ||
|
||
The addon controller must setup a watcher and reconcile `AddOnPlacementScoreGenerator`. | ||
|
||
### Test Plan | ||
|
||
- Unit tests | ||
- Integration tests | ||
|
||
### Graduation Criteria | ||
|
||
#### Alpha | ||
|
||
At first, This proposal will be in the alpha stage and needs to meet | ||
|
||
1. The new APIs are reviewed and accepted; | ||
2. Implementation is completed to support the functionalities; | ||
3. Develop test cases to demonstrate this proposal works correctly; | ||
|
||
#### Beta | ||
1. Need to revisit the API shape before upgrading to beta based on user feedback. | ||
|
||
### Upgrade / Downgrade Strategy | ||
TBD | ||
|
||
### Version Skew Strategy | ||
N/A | ||
|
||
## Alternatives |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
title: score-generator | ||
authors: | ||
- "@mvazquezc" | ||
reviewers: | ||
- "@deads2k" | ||
- "@qiujian16" | ||
- "@elgnay" | ||
- "@haoqing0110" | ||
approvers: | ||
- "@deads2k" | ||
- "@qiujian16" | ||
- "@elgnay" | ||
- "@haoqing0110" | ||
creation-date: 2023-04-13 | ||
last-updated: 2023-04-13 | ||
status: provisional | ||
see-also: | ||
- "/enhancements/sig-architecture/32-extensiblescheduling" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe AddOnPlacementScoreDataSource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wfm
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the comment below:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not suggest unstructured type, it will lose the schema validation and make the API unbounded.