Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk/backend): enable parameterization of container images #11404

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

HumairAK
Copy link
Collaborator

Resolves #11391

Description of your changes:
This change allows component base images to be parameterized using runtime pipeline parameters. The container images can be specified within an @pipeline decorated function, and takes precedence over the @component(base_image=..) argument.

This change also adds logic to resolve these runtime parameters in the argo driver logic. It also includes resolution steps for resolving the accelerator type which functions the same way but was missing the resolution logic. The resolution logic is a generic workaround solution for any run time pod spec input parameters that cannot be resolved because they cannot be added dynamically in the argo pod spec container template.

Checklist:

@HumairAK HumairAK force-pushed the set_container_img_2 branch 2 times, most recently from 2e8e4fd to ee6677c Compare November 25, 2024 13:55
@github-actions github-actions bot added the ci-passed All CI tests on a pull request have passed label Nov 25, 2024
backend/src/v2/driver/driver.go Outdated Show resolved Hide resolved
backend/src/v2/driver/driver.go Outdated Show resolved Hide resolved
backend/src/v2/driver/util.go Outdated Show resolved Hide resolved
backend/src/v2/driver/util.go Show resolved Hide resolved
sdk/python/kfp/dsl/pipeline_task.py Show resolved Hide resolved
@jgarciao
Copy link

One question: if the user sets the image with set_container_image, is the cache key using that setting?

@HumairAK HumairAK force-pushed the set_container_img_2 branch 2 times, most recently from 252eb8c to 1c2da91 Compare November 25, 2024 17:40
@HumairAK
Copy link
Collaborator Author

One question: if the user sets the image with set_container_image, is the cache key using that setting?

this is a good point, I think we may want to resolve the container image before creating the fingerprint

@github-actions github-actions bot added ci-passed All CI tests on a pull request have passed and removed ci-passed All CI tests on a pull request have passed labels Nov 25, 2024
Copy link

@javali-google javali-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is the documentation updated as part of different PR?

@HumairAK
Copy link
Collaborator Author

this is a good point, I think we may want to resolve the container image before creating the fingerprint

Actually we don't need to cover this as it's already covered via the inputs field. Since fingerprint is also generated based on input, every image change via runtime parameter input will result in a new cache, which is what we want.

Example:

For this sample pipeline:

@dsl.component(base_image="docker.io/python:3.9.17")
def empty_component(image: str, otherinput: str):
    print("image=" + image + ", otherinput=" + otherinput)

@dsl.pipeline()
def pipeline_accel(img: str, otherinput: str):
    task = empty_component(image=img, otherinput=otherinput)
    task.set_container_image(img)

We have:

image
image

@HumairAK
Copy link
Collaborator Author

HumairAK commented Nov 25, 2024

@javali-google I belive the docs are auto generated and served via readthedocs

@chensun can confirm

my expectation is the docstring here will be auto surfaced like the other task fields here

@HumairAK HumairAK requested a review from chensun November 25, 2024 20:34
@HumairAK HumairAK added this to the KFP SDK 2.11 milestone Nov 25, 2024
@jgarciao
Copy link

Actually we don't need to cover this as it's already covered via the inputs field. Since fingerprint is also generated based on input, every image change via runtime parameter input will result in a new cache, which is what we want ...

Thanks for the detailed analysis, @HumairAK !

@jgarciao
Copy link

lgtm

sdk/python/kfp/compiler/compiler_test.py Show resolved Hide resolved
backend/src/v2/driver/driver.go Outdated Show resolved Hide resolved
backend/src/v2/driver/util.go Outdated Show resolved Hide resolved
backend/src/v2/driver/util.go Outdated Show resolved Hide resolved
@HumairAK HumairAK force-pushed the set_container_img_2 branch from 1c2da91 to 2f493c0 Compare November 27, 2024 14:32
@github-actions github-actions bot added ci-passed All CI tests on a pull request have passed and removed ci-passed All CI tests on a pull request have passed labels Nov 27, 2024
@HumairAK HumairAK force-pushed the set_container_img_2 branch from 2f493c0 to 6a06019 Compare November 27, 2024 15:21
@HumairAK
Copy link
Collaborator Author

cc @chensun PTAL when you get the chance it would be good to be able to get this in for 2.11

@github-actions github-actions bot removed the ci-passed All CI tests on a pull request have passed label Nov 27, 2024
@github-actions github-actions bot added the ci-passed All CI tests on a pull request have passed label Nov 27, 2024
Copy link
Contributor

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the CHANGELOG needs to be updated too?

This change allows component base images to be parameterized using runtime pipeline parameters. The container images can be specified within an @pipeline decorated function, and takes precedence over the @component(base_image=..) argument.

This change also adds logic to resolve these runtime parameters in the argo driver logic. It also includes resolution steps for resolving the accelerator type which functions the same way but was missing the resolution logic. The resolution logic is a generic workaround solution for any run time pod spec input parameters that cannot be resolved because they cannot be added dynamically in the argo pod spec container template.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
@HumairAK HumairAK force-pushed the set_container_img_2 branch from 6a06019 to dd0f46c Compare November 28, 2024 15:25
@HumairAK
Copy link
Collaborator Author

I believe the CHANGELOG needs to be updated too?

good call, updated!

@jgarciao
Copy link

jgarciao commented Dec 3, 2024

/lgtm

Copy link

@jgarciao: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@VaniHaripriya
Copy link
Contributor

VaniHaripriya commented Dec 3, 2024

/lgtm verified the accelerator type and container image by passing them as input parameters. I used the below pipeline to test the functionality -

from kfp import dsl, compiler

@dsl.component(base_image='docker.io/python:3.9.17')
def empty_component():
    pass

@dsl.pipeline()
def simple_pipeline(img: str,atype: str):        
    task = empty_component()    
    task.set_container_image(img)
    task.set_accelerator_type(atype)

if __name__ == '__main__':
    compiler.Compiler().compile(pipeline_accel, __file__.replace('.py', '-run.yaml'))
  • The container image in the launcher pod was updated with the input image parameter I provided.
  • The accelerator type appeared in the resources section of the launcher pod as show below-
- resources:
       limits:
         gpu-vendor.example/example-gpu: '1'
       requests:
         gpu-vendor.example/example-gpu: '1'

@VaniHaripriya
Copy link
Contributor

/lgtm

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: jgarciao, leseb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 22e85de into kubeflow:master Dec 4, 2024
39 checks passed
@zeidsolh
Copy link

Hey this is not full working. When I use a base_image that comes from a previous component I get the following error:
│ Warning InspectFailed 64s (x8 over 2m29s) kubelet Failed to apply default image tag "{{$.inputs.parameters['pipelinechannel--get-image-url-Output']}} │
│ ": couldn't parse image name "{{$.inputs.parameters['pipelinechannel--get-image-url-Output']}}": invalid reference format: repository name (library/{{$.inputs.parameters['pipelinechanne │
│ l--get-image-url-Output']}}) must be lowercase

@HumairAK
Copy link
Collaborator Author

@zeidsolh have you deployed from master ? this requires backend changes that are not in the latest kfp release yet, it will be available in 2.4, or you can deploy from the master branch and check there

@zeidsolh
Copy link

@HumairAK I did a pip install kfp=2.11.0. So how do you recommend I install it instead? Could you please provide the steps

@zeidsolh
Copy link

@zeidsolh have you deployed from master ? this requires backend changes that are not in the latest kfp release yet, it will be available in 2.4, or you can deploy from the master branch and check there

Hey Humair, I just deployed from master and still getting the following issue:
│ Warning InspectFailed 64s (x8 over 2m29s) kubelet Failed to apply default image tag "{{$.inputs.parameters['pipelinechannel--get-image-url-Output']}} │
│ ": couldn't parse image name "{{$.inputs.parameters['pipelinechannel--get-image-url-Output']}}": invalid reference format: repository name (library/{{$.inputs.parameters['pipelinechanne │
│ l--get-image-url-Output']}}) must be lowercase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved ci-passed All CI tests on a pull request have passed lgtm size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Add ability to parameterize container images
6 participants