[PROPOSAL] no official production deployment method policy #821

thesuperzapper · 2025-02-14T21:30:11Z

Background

We originally created the concept of "Kubeflow Distributions" because it's not sustainable for the community to maintain an official distribution that can be used everywhere. This is because of the extreme variation in the way people build AI platforms (which Kubernetes they use, which tools they choose, how they do auth, how they do networking, etc.).

Given the recent push by some members to create official helm charts for the "Kubeflow Platform", and promote them on the kubeflow/manifests repository, I believe that it is critical we agree on a path forward that ensures the project remains sustainable.

Proposal

Point 1: We agree not to build or promote any specific way of deploying "Kubeflow Platform" as official, or supported directly by the community for production usage.
Point 2: We create a formal conformance program that sets minimum expectations (e.g. included tools, open-source licensing, etc.) to be called a "Kubeflow Platform" and be listed on the Kubeflow Website.
Point 3: We agree that kubeflow/manifests are not intended to be used in production without manual changes, and we will not support vendor-specific integrations beyond ensuring they work on local Kind clusters.

Questions

Why is this related to Helm charts?
- Having official helm charts sets the expectation that they are the supported and production ready way to deploy Kubeflow.
- Problem 1: We are an open source project and don't have the resources to support every possible deployment.
- Problem 2: We would need to be very opinionated to make a Helm chart, and this would exclude certain users.
- Problem 3: Kubeflow is too complex to deploy in a single Helm chart, meaning we would need to develop complex systems to handle state, or introduce dependencies on ArgoCD, which would make us even more opinionated and exclude more users.
Are you saying Kubeflow should not have helm charts?
- I am suggesting only that we don't have an official "Kubeflow Platform" deployment method for production usage.
- This includes all deployment methods, not just helm.
- We already have helm charts for some standalone components like Spark Operator. This is great and should continue.
- PS: I love Helm and maintain one of the most popular helm charts for Apache Airflow.

Definitions

Kubeflow Platform:
- The collection of Kubeflow components (notebooks, pipelines, etc.) when deployed with platform-level features like "multi-tenancy" and a "service mesh", rather than as "Standalone Components".
Standalone Components:
- Components of Kubeflow that can be deployed on their own, without depending on others or the wider platform-level features.

thesuperzapper · 2025-02-14T21:34:16Z

@kubeflow/kubeflow-steering-committee I think this issue is critical for the health of the project.

I would like to formally request your feedback and a formal vote on this topic (after the community has had reasonable time to raise their opinions).

thesuperzapper · 2025-02-14T21:34:27Z

/cc @kubeflow/wg-notebooks-leads @kubeflow/wg-data-leads @kubeflow/wg-automl-leads @kubeflow/wg-pipeline-leads @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads

juliusvonkohout · 2025-02-16T11:37:55Z

My current definition of a distribution differs a lot. For me, something becomes a distribution by deriving or deviating from kubeflow/manifests, in private or in public.

And I am definitely in favour of basic Helm charts as a 1:1 copy of the Kustomize manifests, the opposite of what you require from a Helm chart.

The production usage warning is also something I do not support, and my boundaries are more inclusive. From my point of view, you would need to heavily cater to, for example, AWS to cross the line. I think we are far, far away from that in kubeflow/manifests, and we could tell users about special quirks for popular platforms, although our main target is Kubernetes, including authentication.

Maybe on the "no official distribution" point I could agree, but only partially, because kubeflow/manifests is not a distribution by my definition.

For me it's a balance between commercial and community interests and the more development happens upstream in kubeflow/manifests the better, because everyone can gain from it. I also have a longer text/proposal discussed, but like I said before the KSC has 5 members and is still refining an answer, so you might have to wait a bit.

This is what I have in mind:

"The Kubeflow manifests provide a quick way to get a minimum viable Kubeflow Platform up and running. We also welcome contributions and bug reports very much to improve the experience for everyone. The Kubeflow community support for Kubeflow manifests is best-effort, and not guaranteed for environment-specific issues or custom configurations. If you explicitly need commercial support there are many options. You can use a third-party commercial distribution, hire consultants or build up the knowledge yourself to maintain and extend your Kubeflow installation."

thesuperzapper · 2025-02-18T01:05:21Z

> For me it's a balance between commercial and community interests and the more development happens upstream in kubeflow/manifests the better, because everyone can gain from it. I also have a longer text/proposal discussed, but like I said before the KSC has 5 members and is still refining an answer, so you might have to wait a bit.

@juliusvonkohout It's not a war between commercial and community interests. Almost everyone who works on Kubeflow is paid to be here, whether as consultants, distribution vendors, or end-users of the actual tools.

The reason I am so serious about these 3 points is that it's simply not sustainable for the community to maintain an official distribution that can be used everywhere. There is extreme variation in the way people build AI platforms on Kubernetes:

- deployment method - helm/operator/kustomize/argocd/etc
- Kubernetes distro - aws/local/gcp/etc
- KF tools they want - kfp, notebooks, kserve, etc.
- auth - LDAP/Okta/etc
- networking - Ingress-types/Load-Balancers/etc

indemnifyai · 2025-02-18T16:46:13Z

I am +1 on the community not developing, promoting and supporting an "official production ready" distribution for all environments. I think those requirements are too large and would be very difficult to deliver. That said, having an installation solution that "works" in one or more specific configurations would potentially address a concern from (new) users who just want to try out Kubeflow.

I propose that the term "production" means different things to different people, and not every application/installation needs 99.999% uptime. If the concern is around "official" and "production", then we can use other terms like "community supported" and perhaps "operational" or "functional". I propose that the community Helm project could define the installation and support expectations (and limitations) around a community-supported Helm installation pattern that provides an operational Kubeflow cluster in a specific environment.

My 2 cents, Josh

kromanow94 · 2025-02-19T11:55:43Z

Hello, my suggestion is to focus on a Helm Chart that deploys Kubeflow Components and Kubeflow Components only, provide good templating and parameterization capabilities, and document some use cases for the usage and configuration of dependencies in different environments.

I think going anywhere beyond that puts too much on the community, while this scope still leaves great flexibility for any distribution.

This is exactly what's happening here:
https://github.com/kromanow94/kubeflow-manifests/releases/tag/kubeflow-0.4.0

From my perspective, there is not too much configuration needed for Kubeflow Components to work on AWS/GCP/local. The big differences show up when we consider the dependencies together with Kubeflow Components as the whole Kubeflow Platform. Istio, cert-manager, Argo WF, Dex, oauth2-proxy (and more) can all have different configuration for a specific cloud vendor. Documenting these cases, and showing the differences between a quick install and a cloud-specific configuration so people can build their distributions on official guidelines and an easy-to-parameterize tool, is something maintainable, and I think it would be very much appreciated by the community.
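
To make the "quick install versus cloud-specific configuration" idea concrete, here is a minimal sketch of layering an environment overlay on top of a set of defaults, similar in spirit to how layered values files are usually combined. It is not taken from the linked chart: the keys (`istio.gateway.serviceType`, `dex.enabled`, `oauth2Proxy.enabled`) and the helper `merge_values` are hypothetical and only illustrate the pattern.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; for any other value the
    override wins. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical "quick install" defaults (illustrative keys, not a real chart schema).
QUICK_INSTALL = {
    "istio": {"gateway": {"serviceType": "ClusterIP"}},
    "dex": {"enabled": True},
    "oauth2Proxy": {"enabled": True},
}

# Hypothetical cloud-specific overlay: only the settings that differ are listed.
CLOUD_OVERLAY = {
    "istio": {"gateway": {"serviceType": "LoadBalancer"}},
    "dex": {"enabled": False},  # e.g. an external identity provider is used instead
}

values = merge_values(QUICK_INSTALL, CLOUD_OVERLAY)
print(values)
```

The point of the sketch is that the overlay stays small: a documented environment only needs to state where it departs from the quick install, and everything else is inherited from the defaults.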
