KubeFlowRunnerV2 support for InfraValidator #5512

Closed
hanneshapke opened this issue Nov 18, 2022 · 8 comments

Comments

@hanneshapke

Describe the feature and the current behavior/state.
The InfraValidator component seems to be highly valuable for production use cases (e.g. generating warm-up files, testing infra config). Is there any plan from the TFX team to support KubeFlowRunnerV2 in the near future? The runner doesn't seem to be supported at this point. What's the bottleneck? Could we contribute from the OSS side?

Thank you for your reply and consideration.

@rcrowe-google
Contributor

> Could we contribute from the OSS side?

Certainly. If support for KubeFlowRunnerV2 could be added by creating a new component, then developing one in TFX-Addons would be an option. Before doing that, let me explore why it hasn't been done already and whether there are plans to do it in the near future.

@chongkong
Contributor

Sorry, I am a little bit lost here. What is blocking the use of InfraValidator + KubeflowV2DagRunner? Is it only InfraValidator that lacks Kubeflow v2 (i.e. Vertex AI Pipelines) support?

@chongkong
Contributor

Oh, never mind, I think I found it. IIUC, it's the KubernetesConfig in infra_validator.proto not supporting Kubeflow v2 (at least according to the docstring), right?

I was the original author of most of the TFX InfraValidator code. If you ask me what the bottleneck is, it is that I don't understand Vertex AI Pipelines well enough to come up with a reliable implementation of InfraValidator on it. InfraValidator needs to launch the model server job at a reachable endpoint. With KubeflowDagRunner, it uses the same Kubernetes cluster that runs the InfraValidator job, but that is probably infeasible with Vertex's fully managed services... Can you share your ideas on the implementation?
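To make the requirement above concrete, here is a minimal, standard-library-only sketch of the "launch a model server at a reachable endpoint, then check it" pattern. It is not the TFX implementation: `_StandInModelServer`, `wait_until_ready`, and `validate_infra` are hypothetical names, and a local HTTP server stands in for the real model server, which is exactly the part that is hard to provision on a fully managed service.

```python
import http.server
import threading
import time
import urllib.error
import urllib.request


class _StandInModelServer(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a model server; answers every GET with 200."""

    def do_GET(self):
        body = b'{"status": "AVAILABLE"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # Keep the example quiet.


def wait_until_ready(url, max_loading_time_seconds=5.0, poll_interval=0.1):
    """Polls `url` until it returns HTTP 200 or the deadline passes."""
    # Bypass any proxy settings so that local endpoints are reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + max_loading_time_seconds
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=1) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not reachable yet; retry until the deadline.
        time.sleep(poll_interval)
    return False


def validate_infra(max_loading_time_seconds=5.0):
    """Launches the stand-in server, checks that it is reachable, cleans up."""
    # Port 0 asks the OS for any free port on the loopback interface.
    server = http.server.HTTPServer(("127.0.0.1", 0), _StandInModelServer)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return wait_until_ready(f"http://{host}:{port}/", max_loading_time_seconds)
    finally:
        server.shutdown()
        server.server_close()
```

The validator has to learn the address of the server it just launched (here via `server.server_address`). On a shared Kubernetes cluster that address is easy to obtain, whereas on a managed pipeline service it is the open design question raised in this thread.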

> Could we contribute from the OSS side?

Sure, I can review the PR, but I may not be very responsive due to a lack of bandwidth. I assume you're contributing this purely for feature completeness rather than for your immediate needs?

@hanneshapke
Author

Hi @chongkong,
Thank you for your reply and the context. Understandably, it is a heavy lift to get the component to spin up an endpoint in GCP (within the same VPC, with access to GCR, passing the IP address back to the InfraValidator component, etc.).

We do have an immediate need for this component, since we generated our warm-up files with it and we have a use case for testing models before deploying them.

I will reply here with thoughts about the infra tests in the Vertex environment in the coming days.

@EdwardCuiPeacock
Contributor

I am also running into the same issue. I discovered that InfraValidator is not supported on Vertex AI when I tried to build a pipeline and submit it to Vertex, and saw this error message:

NotImplementedError: The componet type "<class 'tfx.components.infra_validator.component.InfraValidator'>" is not supported

As we want to move from a self-hosted Kubeflow pipeline to a Vertex AI pipeline, having InfraValidator supported on Vertex would be great.
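Until that support exists, one way to surface the problem before submitting a pipeline is a small pre-flight check over the component list. The sketch below is an illustration, not part of TFX: `find_unsupported` and its default deny list are assumptions based only on what this thread reports, and the two empty classes are stand-ins so the example runs without TFX installed.

```python
def find_unsupported(components, unsupported_type_names=frozenset({"InfraValidator"})):
    """Returns class names of components the target runner is assumed to reject.

    The default deny list is an assumption taken from this thread; adjust it
    for the runner and TFX version you actually use.
    """
    return [
        type(component).__name__
        for component in components
        if type(component).__name__ in unsupported_type_names
    ]


# Hypothetical stand-ins so that the sketch is runnable without TFX.
class Trainer:
    pass


class InfraValidator:
    pass


if __name__ == "__main__":
    rejected = find_unsupported([Trainer(), InfraValidator()])
    if rejected:
        print("Not supported by the target runner:", ", ".join(rejected))
```

Because the check only looks at class names, it can be pointed at a real component list as well, at the cost of missing subclasses with different names.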

singhniraj08 self-assigned this Oct 12, 2023
@singhniraj08
Contributor

Hi @hanneshapke,

Thank you for opening this issue. Since this issue has been open for a long time, the code and debug information in it may no longer be relevant to the current state of the code base.

The TFX team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TFX version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments in the TFX space.
Thank you!

@github-actions
Contributor

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions bot added the stale label Oct 20, 2023
@hanneshapke
Author

Hi @singhniraj08,
I don't think the issue is resolved, but it extends into GCP land and might be outside the scope of TFX. I think it is OK to close the issue, but it should be noted that GCP Vertex Pipelines doesn't support this component and its functionality.
