Heterogeneous architecture clusters #1014
Skipping CI for Draft Pull Request.
first pass
there's also a fair bit of formatting and typo cleanup needed that i didn't explicitly scrub for
also in general "manifestlist" is written as one word, it's effectively a noun.
### Machine Config Operator

No changes are needed in the MCO for Phase 1 as the machine-os-content image would be a manifest listed image and the machine config daemon would extract the relevant architecture's machine-os-content based on the node it runs on. In the future architecture specific configurations should not be required, instead they need to be templatized and be generalized for certain items, for example kubelet system reserved memory could scale based on page size.
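To make the per-node selection concrete, here is a minimal Python sketch of picking the entry for a node's architecture out of a manifest list. It is not the machine config daemon's code: the `MANIFEST_LIST` data, its placeholder digests, and the `select_manifest` helper are all hypothetical, and only the `manifests[].platform.architecture` / `os` layout follows the general shape of an OCI image index.

```python
from typing import Optional

# Hypothetical, trimmed-down manifest list in the shape of an OCI image index.
# The digests are placeholders, not real image digests.
MANIFEST_LIST = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbbb", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}


def select_manifest(manifest_list: dict, architecture: str, os_name: str = "linux") -> Optional[str]:
    """Return the digest of the entry matching the given platform, or None."""
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == architecture and platform.get("os") == os_name:
            return entry["digest"]
    # No entry for this architecture: the caller has to treat this as an error.
    return None
```

A node reporting `arm64` would resolve to the second entry, while an architecture that is absent from the list yields `None` and would need explicit handling.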
so:
- initially, we'll expect the user to provide any additional args/config in the machineset config
- middle-state, we'll explicitly configure those things based on the arch
- final-state, we'll auto-configure them based on generalized analysis

?
i still have to understand from Colin/Mrunal the changes involved here for MCO.
@mrunalp could you please clarify what specifically would change in MCO when we support heterogeneous clusters as opposed to what we are doing today?
did a fair bit of cleanup on the typos and formatting.
As part of this migration, the ClusterVersion API's spec would need to have an additional "architecture" field to indicate the architecture of the cluster.
This field would indicate whether the cluster is a homogeneous or heterogeneous cluster with the individual arches denoting a homogeneous cluster and
"multi" indicating a heterogeneous cluster. This field would also be part of the ClusterVersionStatus to indicate to consumers of the API who would want to |
is there going to be a metric exposing this so that we can slice our telemetry data based on this information?
Yes, in the future we do want to have this. Today, to collect data on the number of clusters per arch, we have to look at a node's arch label. Going forward, this will make it easier to identify the cluster arch from the ClusterVersionStatus field rather than by looking at a node's arch in that cluster.
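As a rough illustration of the label-based approach described here, the Python sketch below collapses per-node architecture labels into a single value, using "multi" when more than one architecture is present. This is only an assumption about how such a summary could be computed for telemetry purposes; it is not how the ClusterVersion field is populated, and the `cluster_architecture` helper and the example arch values are hypothetical. `kubernetes.io/arch` is the well-known Kubernetes node label.

```python
from typing import Dict, Iterable

ARCH_LABEL = "kubernetes.io/arch"  # well-known Kubernetes node label
MULTI = "multi"  # value the proposal uses for a heterogeneous cluster


def cluster_architecture(node_labels: Iterable[Dict[str, str]]) -> str:
    """Summarize node arch labels as a single arch, or "multi" if they differ."""
    arches = {labels[ARCH_LABEL] for labels in node_labels if ARCH_LABEL in labels}
    if not arches:
        raise ValueError("no node carries an architecture label")
    if len(arches) == 1:
        return next(iter(arches))
    return MULTI
```

With a dedicated status field, a consumer could read the value directly instead of listing and inspecting every node.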
i'm a little unclear on the intent of this EP at this point. Given 4.11 is about to ship, i'd expect this EP to be a lot more concrete about exactly what is/was done for 4.11/phase1, and then reserving the more "design" bits for the work that still needs to be done.
this enhancement was opened in January and since then a lot of changes and implementation details have been worked on and revised. that being said, the deliverables for 4.11 (and other phases) are reflected accurately i believe. I have tried to keep most of the component sections up to date, but i see that there are several sections which are outdated. It would be good to get this merged soon, so incremental changes can be made by the individual component teams themselves as they progress. In the meantime I will work on updating all sections to reflect the latest so we can get it merged by end of next week.
@bparees i've cleaned up some of the sections which did not reflect the current plan. Are there any other concerns from your side as i look to merge this week?
@Prashanth684 no further concerns, the cleanup has significantly improved readability and made it clearer what work is being proposed in what areas/components (and why). Thanks!
- When creating the hosted cluster, the latest x86 release payload is picked by default. This would have to change to pick a heterogeneous payload
  when creating the hosted cluster on mixed architecture workers.
- For the hosted control plane to work on non x86 architecture worker nodes, the following image references would need changing:
  - the release image fixtures referencing x86 imagestreams. these would need to be changed to manifestlist references or there would need
which images is this referring to exactly? we infer everything else from hostedCluster.spec.release.image, so as long as that's heterogeneous, any component should pick the right arch from there.
i'm referring to these: https://github.com/openshift/hypershift/tree/main/support/releaseinfo/fixtures . shouldn't there be a fixture added with manifestlisted pullspecs for heterogeneous?
Introduces support for provisioning and upgrading heterogeneous architecture clusters in phases. Co-authored-by: Ben Parees <bparees@users.noreply.github.com>
/approve i expect this EP will continue to be a living document as we work our way through the implementation phases, but for now it outlines the big picture steps we need to take and the implications.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bparees
@Prashanth684: all tests passed!