@@ -0,0 +1,196 @@
# New API: AddOnPlacementScoreGenerator

## Release Signoff Checklist

- [ ] Enhancement is `implemented`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/)

## Summary

This proposal adds a new OCM API named `AddOnPlacementScoreGenerator`,
which lets administrators provide custom values to the addon controllers
that generate `AddOnPlacementScores`.

A valid `AddOnPlacementScoreGenerator` resource should be in a "cluster namespace", and
its configuration will be delivered to the managed cluster associated
with that "cluster namespace".

## Motivation

### Influence addon controller behavior

Currently, when writing an addon to extend the scheduling capabilities of OCM,
there is no way to influence the behavior of that addon via an API. The controller simply runs
and updates the status of the `AddOnPlacementScore` object.

One of the use cases we want to cover is testing latency from managed clusters to a set
of user-defined locations. With the current `AddOnPlacementScore` implementation we would
need to hardcode these locations or use something like a `ConfigMap` that gets consumed
by the controller. We don't find these solutions flexible enough, so we propose
a new API to influence the behavior of such controllers.

Let's say I want to test latencies to redhat.com and google.com and place my application based on
the managed cluster with the lowest latency to redhat.com.

Provided I created the required controller with the hardcoded locations (redhat.com and google.com),
an `AddOnPlacementScore` similar to this one would be created for each managed cluster running this addon:

~~~yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScore
metadata:
  name: cluster1-generator
  namespace: cluster1
status:
  conditions:
  - lastTransitionTime: "2021-10-28T08:31:39Z"
    message: AddOnPlacementScore updated successfully
    reason: AddOnPlacementScoreUpdated
    status: "True"
    type: AddOnPlacementScoreUpdated
  validUntil: "2021-10-29T18:31:39Z"
  scores:
  - name: "redhat-com-avgLatency"
    value: 30
  - name: "google-com-avgLatency"
    value: 50
~~~

Now, a `Placement` like this could be used:

~~~yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: latency-placement
  namespace: ns1
spec:
  numberOfClusters: 3
  prioritizerPolicy:
    mode: Exact
    configurations:
      - scoreCoordinate:
          type: AddOn
          addOn:
            resourceName: cluster1-generator
            scoreName: redhat-com-avgLatency
        weight: -1
~~~

Other applications may use latency to google.com.
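
To illustrate why the weight in the `Placement` above is negative, here is a minimal Python sketch. It is not the OCM scheduler, which combines several prioritizers and normalizes scores; it only assumes that a cluster's contribution is `weight * score` and that higher totals win. The function name `rank_clusters` and the scores for `cluster2` through `cluster4` are hypothetical.

```python
from typing import Dict, List


def rank_clusters(
    scores_by_cluster: Dict[str, Dict[str, int]],
    score_name: str,
    weight: int,
    number_of_clusters: int,
) -> List[str]:
    """Return the clusters with the highest weighted score.

    Assumption: the weighted score is simply weight * score. With a
    negative weight, a lower raw latency yields a higher weighted score.
    """
    weighted = {
        cluster: weight * scores[score_name]
        for cluster, scores in scores_by_cluster.items()
        if score_name in scores  # skip clusters that do not report this score
    }
    # Sort by weighted score (descending), then by name for a stable order.
    ranked = sorted(weighted, key=lambda cluster: (-weighted[cluster], cluster))
    return ranked[:number_of_clusters]


# Hypothetical scores reported by four managed clusters.
reported = {
    "cluster1": {"redhat-com-avgLatency": 30, "google-com-avgLatency": 50},
    "cluster2": {"redhat-com-avgLatency": 50},
    "cluster3": {"redhat-com-avgLatency": 10},
    "cluster4": {"redhat-com-avgLatency": 70},
}
selected = rank_clusters(reported, "redhat-com-avgLatency", weight=-1, number_of_clusters=3)
```

With `weight: -1`, the three clusters with the lowest latency to redhat.com are selected, and `cluster4` is left out.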

Now, we want to add linux.com to the list of locations to test. With the current implementation we
would need to edit the code of the addon controller to include the test to linux.com and to add its result
to the `AddOnPlacementScore`.

To fix the above issue, the proposal is to create a new API, `AddOnPlacementScoreGenerator`, which could
look like this for the example above:

~~~yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
Member

Maybe AddOnPlacementScoreDataSource?

Author

wfm

Author

@mvazquezc, Jul 5, 2023

To the comment below:

  1. I don't think we want to create types... I was thinking of an unstructured data type where the controller consuming the API should take care of reading the fields it expects to be present in order to generate the required scores.
  2. Controller should take care of that.

Member

I would not suggest an unstructured type; it would lose schema validation and make the API unbounded.

metadata:
  name: cluster1-generator
  namespace: cluster1
spec:
  addOnPlacementSelector:
Member

is this actually the name of the score?

Author

not really, it's the name of the AddOnPlacementScore. This selector would be used by the controller generating the AddOnPlacementScore to get the AddOnPlacementScoreGenerator that applies to this specific AddOnPlacementScore.

    name: latency
    namespace: cluster1
  latencies:
    - name: redhat-com-avgLatency
      url: https://redhat.com
      runs: 2
      waitBetweenRuns: 10s
    - name: google-com-avgLatency
      url: https://google.com
    - name: linux-com-avgLatency
      url: https://linux.com
~~~

Our controller can now read the `AddOnPlacementScoreGenerator`; this specific controller is interested in
everything below `.spec.latencies`. For example, the `redhat-com-avgLatency` score will hold the result of running two latency
tests to https://redhat.com, waiting 10s between runs; the mean value will then be posted to the `AddOnPlacementScore`
status.
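
The behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the proposed controller: the names `generate_scores`, `parse_duration_seconds`, and `probe` are hypothetical, the defaults (`runs` of 1, no wait) are assumed because the proposal does not define them, and the latency measurement itself is injected so the sketch stays independent of any network library.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def parse_duration_seconds(value: str) -> float:
    """Parse a simple duration such as '10s', '500ms', or '2m' into seconds."""
    units = {"ms": 0.001, "s": 1.0, "m": 60.0}
    # Check 'ms' before 's' because '500ms' also ends with 's'.
    for suffix in ("ms", "s", "m"):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * units[suffix]
    raise ValueError(f"unsupported duration: {value!r}")


def generate_scores(
    spec: Dict,
    probe: Callable[[str], float],
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Build the `scores` list for an AddOnPlacementScore status.

    `probe(url)` is a hypothetical callable that returns one latency sample.
    Assumed defaults: runs=1 and waitBetweenRuns=0s when a field is omitted.
    """
    scores = []
    for entry in spec.get("latencies", []):
        runs = int(entry.get("runs", 1))
        wait = parse_duration_seconds(entry.get("waitBetweenRuns", "0s"))
        samples = []
        for run in range(runs):
            samples.append(probe(entry["url"]))
            if run < runs - 1:  # wait only between runs, not after the last one
                sleep(wait)
        # Score values are integers, so the mean is rounded.
        scores.append({"name": entry["name"], "value": round(mean(samples))})
    return scores
```

Injecting `probe` and `sleep` keeps the score calculation testable without real network calls or real delays; a real controller would supply its own measurement function.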

## Goals & Non-Goals

### Goals

- Help the administrators to provide custom values to addon controllers that generate `AddOnPlacementScores`

### Non-Goals

- Providing a controller that consumes this new API. Addon developers need to develop their own controller.

### Future goals

- It is currently assumed that the user of `AddOnPlacementScoreGenerator` is either a
cluster admin or a user who can create `AddOnPlacementScoreGenerator` in the hub cluster's
managed "cluster namespace".

## Design

### Component & API

We propose adding a new custom resource named
`AddOnPlacementScoreGenerator` to OCM.

A sample `AddOnPlacementScoreGenerator` looks like this:

~~~yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
metadata:
  name: cluster1-generator
  namespace: cluster1
spec:
  addOnPlacementSelector:
    name: latency
    namespace: cluster1
  latencies:
Member

My question is: should the `latencies` field be a more general field? And how would a user define a URL here? What are the requirements for the URL and the output?

Author

@mvazquezc, Apr 25, 2023

Hey @haoqing0110, yes. Rather than just more generic, to me this would be "any string", and we could even have more than one, something like this for example:

apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
metadata:
  name: cluster1-generator
  namespace: cluster1
spec:
  addOnPlacementSelector:
    name: latency
    namespace: cluster1
  latencies:
    - name: redhat-com-avgLatency
      url: https://redhat.com
      runs: 2
      waitBetweenRuns: 10s
    - name: google-com-avgLatency
      url: https://google.com
    - name: linux-com-avgLatency
      url: https://linux.com
  energy:
    - name: some-name
      someValue: some-value

Then, the controller interested in latencies will consume `latencies`, and the controller interested in energy will consume `energy`.

Member

@haoqing0110, Apr 28, 2023

Thanks for the explanation. This AddOnPlacementScoreGenerator sounds like a configuration for the addon controller, and to generate different scores, developers still need to develop their own addon controller, correct? It sounds similar to the idea of https://github.com/open-cluster-management-io/enhancements/tree/main/enhancements/sig-architecture/58-addon-configuration. I'm not sure whether that can satisfy the motivation of this proposal; this proposal reads like a new addon configuration called AddOnPlacementScoreGenerator.

Member

I'm also wondering whether the controller that consumes latencies and energy could be a unified controller that can consume different kinds of data sources, e.g. a URL such as https://redhat.com, a host path such as /some/path, or something else. This API would then just define the supported data source types.

Member

cc @qiujian16 , want to know your suggestions.

Author

@haoqing0110 the use cases defined by 58-addon-configuration would work for us, I have checked the examples and the example with ManagedProxyConfiguration is very similar to what we want.

And answering your second comment, yes, the same controller could use latencies and energy data.

Member

I think this is more like a cluster scoped resource that applies to all clusters, it is less likely that we want different configs for different clusters. Also how could we parse the result of the http responses? It might be straightforward for latency. But if we try to fetch data from a metrics server or 3rd party services to generate the score, the returned output varies and we might need a way to define how to parse the result.

Author

Hey @qiujian16,

I think that for latencies global object or per-cluster object would both work. In terms of data, I was thinking of an unstructured structure where people developing custom controllers with the addon framework will consume the data the way they need it. I don't think this should be a controller taking care of multiple data sources, instead a single controller taking care of a data source, so if anything were wrong with the data itself it wouldn't affect the other add-ons.

+1 on Mario's approach on a single controller taking care of a single data source type.

Member

To my understanding, in this API we should define

  1. how to get the raw data. It can not be only latency. It probably could be a field type and latency could be a certain type to get raw data.
  2. how to convert raw data to the score

    - name: redhat-com-avgLatency
      url: https://redhat.com
      runs: 2
      waitBetweenRuns: 10s
  otherPlugin:
    - name: score-name
      optionConsumedByPlugin: optionValue

~~~

The `AddOnPlacementScoreGenerator` resource is expected to be
created under the "cluster namespace", which is a namespace with
the same name as the managed cluster. The `AddOnPlacementScoreGenerator`
delivered to the managed cluster will have the same name as the
resource on the hub.

The addon controller must set up a watcher and reconcile `AddOnPlacementScoreGenerator`.
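
As a rough illustration of the lookup a reconcile loop might perform, the Python sketch below picks the generator that applies to a given `AddOnPlacementScore`. It assumes, following the review discussion, that `spec.addOnPlacementSelector` names the `AddOnPlacementScore` and that the generator lives in the cluster namespace. The function name `find_generator` is hypothetical, and resources are plain dictionaries rather than objects from a real Kubernetes client.

```python
from typing import Dict, List, Optional


def find_generator(
    generators: List[Dict],
    score_name: str,
    cluster_name: str,
) -> Optional[Dict]:
    """Return the generator that targets the given AddOnPlacementScore.

    Assumptions: the cluster namespace has the same name as the managed
    cluster, and the selector identifies the score by name and namespace.
    """
    for generator in generators:
        metadata = generator.get("metadata", {})
        selector = generator.get("spec", {}).get("addOnPlacementSelector", {})
        if metadata.get("namespace") != cluster_name:
            continue  # generator is not in this cluster's namespace
        if (
            selector.get("name") == score_name
            and selector.get("namespace") == cluster_name
        ):
            return generator
    return None


# Hypothetical generators as a controller might list them.
known_generators = [
    {
        "metadata": {"name": "cluster2-generator", "namespace": "cluster2"},
        "spec": {"addOnPlacementSelector": {"name": "latency", "namespace": "cluster2"}},
    },
    {
        "metadata": {"name": "cluster1-generator", "namespace": "cluster1"},
        "spec": {
            "addOnPlacementSelector": {"name": "latency", "namespace": "cluster1"},
            "latencies": [{"name": "redhat-com-avgLatency", "url": "https://redhat.com"}],
        },
    },
]
match = find_generator(known_generators, score_name="latency", cluster_name="cluster1")
```

A real controller would run this kind of lookup on every watch event and then regenerate the scores from the matched generator's spec.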

### Test Plan

- Unit tests
- Integration tests

### Graduation Criteria

#### Alpha

At first, this proposal will be in the alpha stage and needs to meet the following criteria:

1. The new APIs are reviewed and accepted.
2. The implementation is complete and supports the described functionality.
3. Test cases demonstrate that this proposal works correctly.

#### Beta
1. Revisit the API shape, based on user feedback, before upgrading to beta.

### Upgrade / Downgrade Strategy
TBD

### Version Skew Strategy
N/A

## Alternatives
@@ -0,0 +1,18 @@
title: score-generator
authors:
- "@mvazquezc"
reviewers:
- "@deads2k"
- "@qiujian16"
- "@elgnay"
- "@haoqing0110"
approvers:
- "@deads2k"
- "@qiujian16"
- "@elgnay"
- "@haoqing0110"
creation-date: 2023-04-13
last-updated: 2023-04-13
status: provisional
see-also:
- "/enhancements/sig-architecture/32-extensiblescheduling"