[PROPOSAL] no official production deployment method policy #821

Open
thesuperzapper opened this issue Feb 14, 2025 · 6 comments

@thesuperzapper
Member

Background

We originally created the concept of "Kubeflow Distributions" because it's not sustainable for the community to maintain an official distribution that can be used everywhere. This is because of the extreme variation in the way people build AI platforms (which Kubernetes they use, which tools they choose, how they do auth, how they do networking, etc.).

Given the recent push by some members to create official helm charts for the "Kubeflow Platform", and promote them on the kubeflow/manifests repository, I believe that it is critical we agree on a path forward that ensures the project remains sustainable.

Proposal

  • Point 1: We agree not to build or promote any specific way of deploying "Kubeflow Platform" as official, or supported directly by the community for production usage.

  • Point 2: We create a formal conformance program that sets minimum expectations (e.g. included tools, open-source licensing) that a deployment must meet to be called a "Kubeflow Platform" and be listed on the Kubeflow Website.

  • Point 3: We agree that kubeflow/manifests are not intended to be used in production without manual changes, and we will not support vendor-specific integrations beyond ensuring they work on local Kind clusters.

Questions

  • Why is this related to Helm charts?

    • Having official Helm charts sets the expectation that they are the supported and production-ready way to deploy Kubeflow.
    • Problem 1: We are an open source project and don't have the resources to support every possible deployment.
    • Problem 2: We would need to be very opinionated to make a Helm chart, and this would exclude certain users.
    • Problem 3: Kubeflow is too complex to deploy in a single Helm chart, meaning we would need to develop complex systems to handle state, or introduce dependencies on ArgoCD, which would make us even more opinionated and exclude more users.
  • Are you saying Kubeflow should not have helm charts?

    • I am suggesting only that we don't have an official "Kubeflow Platform" deployment method for production usage.
    • This includes all deployment methods, not just helm.
    • We already have helm charts for some standalone components like Spark Operator. This is great and should continue.
    • PS: I love Helm and maintain one of the most popular helm charts for Apache Airflow.

Definitions

  • Kubeflow Platform:

    • The collection of Kubeflow components (notebooks, pipelines, etc.) when deployed with platform-level features like "multi-tenancy" and a "service mesh", rather than as "Standalone Components".
  • Standalone Components:

    • Components of Kubeflow that can be deployed on their own, without depending on others or the wider platform-level features.
@thesuperzapper
Member Author

@kubeflow/kubeflow-steering-committee I think this issue is critical for the health of the project.

I would like to formally request your feedback and a formal vote on this topic (after the community has had reasonable time to raise their opinions).

@thesuperzapper
Member Author

/cc @kubeflow/wg-notebooks-leads @kubeflow/wg-data-leads @kubeflow/wg-automl-leads @kubeflow/wg-pipeline-leads @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads

@juliusvonkohout
Member

juliusvonkohout commented Feb 16, 2025

My current definition of a distribution differs a lot. For me, something becomes a distribution by deriving or deviating from kubeflow/manifests, in private or in public.

And I am definitely in favour of basic Helm charts as a 1:1 copy of the Kustomize manifests, contrary to what you require from a Helm chart.

The production usage warning is also something I do not support, and my boundaries are more inclusive. From my point of view, you would need to cater heavily to, for example, AWS to cross the line. I think we are very far away from that in kubeflow/manifests, and we could tell users about special quirks for popular platforms, although our main target is Kubernetes, including authentication.

Maybe I could agree on the "no official distribution" point, but only partially, because kubeflow/manifests is not a distribution by my definition.

For me it's a balance between commercial and community interests and the more development happens upstream in kubeflow/manifests the better, because everyone can gain from it. I also have a longer text/proposal discussed, but like I said before the KSC has 5 members and is still refining an answer, so you might have to wait a bit.

This is what I have in mind:

"The Kubeflow manifests provide a quick way to get a minimum viable Kubeflow Platform up and running. We also welcome contributions and bug reports very much to improve the experience for everyone. The Kubeflow community support for Kubeflow manifests is best-effort, and not guaranteed for environment-specific issues or custom configurations. If you explicitly need commercial support there are many options. You can use a third-party commercial distribution, hire consultants or build up the knowledge yourself to maintain and extend your Kubeflow installation."

@thesuperzapper
Member Author

> For me it's a balance between commercial and community interests and the more development happens upstream in kubeflow/manifests the better, because everyone can gain from it. I also have a longer text/proposal discussed, but like I said before the KSC has 5 members and is still refining an answer, so you might have to wait a bit.

@juliusvonkohout It's not a war between commercial and community interests; almost everyone who works on Kubeflow is paid to be here, whether as consultants, distribution vendors, or end-users of the actual tools.

The reason I am so serious about these 3 points is that it's simply not sustainable for the community to maintain an official distribution that can be used everywhere. There is extreme variation in the way people build AI platforms on Kubernetes:

  • deployment method - helm/operator/kustomize/argocd/etc
  • Kubernetes distro - aws/local/gcp/etc
  • KF tools they want - kfp, notebooks, kserve, etc.
  • auth - LDAP/Okta/etc
  • networking - Ingress-types/Load-Balancers/etc
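
As a rough illustration of how quickly these dimensions multiply, here is a minimal Python sketch that counts combinations using only the options named in the list above. Each "etc." is ignored, and a tool selection is treated as any non-empty subset of the named tools. Both are assumptions made for illustration, so the result is a lower bound on the option space, not a measurement of real deployments.

```python
from itertools import combinations, product

# Only the options named explicitly in the list above; every "etc." is ignored,
# so these lists are illustrative assumptions, not a complete inventory.
deployment_methods = ["helm", "operator", "kustomize", "argocd"]
kubernetes_distros = ["aws", "local", "gcp"]
kubeflow_tools = ["kfp", "notebooks", "kserve"]
auth_providers = ["LDAP", "Okta"]
networking_types = ["Ingress", "Load-Balancer"]


def non_empty_subsets(items):
    """Return every non-empty selection of items, as tuples."""
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


# Assumption: a platform may include any non-empty subset of the tools.
tool_selections = non_empty_subsets(kubeflow_tools)

# Every distinct way to pick one option per dimension.
configurations = list(
    product(
        deployment_methods,
        kubernetes_distros,
        tool_selections,
        auth_providers,
        networking_types,
    )
)

print(len(configurations))  # 4 * 3 * 7 * 2 * 2 = 336
```

Even with only the handful of options spelled out here, that is 336 distinct configurations an "official" deployment method would implicitly be expected to work with.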

@indemnifyai

I am +1 on the community not developing, promoting, and supporting an "official production-ready" distribution for all environments. I think those requirements are too large and would be very difficult to deliver.

That said, an installation solution that "works" in one or more specific configurations would potentially address a concern from (new) users who just want to try out Kubeflow. "Production" means different things to different people, and not every application/installation needs 99.999% uptime. If the concern is around "official" and "production", then we can use other terms, like "community supported" and perhaps "operational" or "functional".

I propose that the community Helm project define the installation and support expectations (and limitations) around a community-supported Helm installation pattern that provides an operational Kubeflow cluster in a specific environment.

My 2 cents, Josh

@kromanow94

Hello, my suggestion is to focus on a Helm Chart that deploys Kubeflow Components and Kubeflow Components only, provide nice templating and parameterization capabilities, and document some use cases for the usage and configuration of dependencies in different environments.

I think going anywhere beyond that asks too much of the community, while this scope still leaves great flexibility for any distribution.

This is exactly what's happening here:
https://github.com/kromanow94/kubeflow-manifests/releases/tag/kubeflow-0.4.0

From my perspective, not much configuration is needed for the Kubeflow Components to work on AWS/GCP/local. The big differences show up when we consider the dependencies together with the Kubeflow Components as the whole Kubeflow Platform. Istio, cert-manager, Argo WF, Dex, oauth2-proxy (and more) can each need different configuration for a specific cloud vendor. Documenting these cases, and showing the differences between a quick install and a cloud-specific configuration, would let people build their distributions from official guidelines and an easy-to-parameterize tool. That is something maintainable, and I think it would be very much appreciated by the community.
