Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backend] conditions in componet.yaml does not work properly #11009

Open
daro1337 opened this issue Jul 16, 2024 · 13 comments
Open

[backend] conditions in componet.yaml does not work properly #11009

daro1337 opened this issue Jul 16, 2024 · 13 comments

Comments

@daro1337
Copy link

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Kubeflow Platform (EKS)
  • KFP version:
    v2.2.0
  • KFP SDK version:
    v2.8.0

Steps to reproduce

Conditions in the component.yaml file are not working correctly. It appears that the argo-workflow component takes the "IF" condition argument literally. This is something that worked in kubeflow v1.8.0 and KFP 1.8.X (argoworkflow manifest)

here we I wrote dummy pipeline that prints /etc/hosts file. If line_number=true it should add -n argument to cat executable to print file with line numbers.

  1. Compile following using KFP v2.8.0
---
name: kfp-busybox
description: check-conditions

inputs:
- name: file
  type: String
  default: "/etc/hosts"
- name: line_number
  type: Boolean
  optional: true

implementation:
  container:
    args:
      - inputValue: file
      - if:
          cond:
            isPresent: line_number
          then:
          - -n
    image: busybox:latest
    command: [
      "/bin/cat"
    ]
  1. compiled component:
components:
  comp-kfp-busybox:
    executorLabel: exec-kfp-busybox
    inputDefinitions:
      parameters:
        file:
          defaultValue: /etc/hosts
          isOptional: true
          parameterType: STRING
        line_number:
          isOptional: true
          parameterType: BOOLEAN
deploymentSpec:
  executors:
    exec-kfp-busybox:
      container:
        args:
        - '{{$.inputs.parameters[''file'']}}'
        - '{"IfPresent": {"InputName": "line_number", "Then": ["-n"]}}'
        command:
        - /bin/cat
        image: busybox:latest
pipelineInfo:
  name: pipeline
root:
  dag:
    tasks:
      kfp-busybox:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-kfp-busybox
        inputs:
          parameters:
            file:
              componentInputParameter: file
            line_number:
              componentInputParameter: line_number
        taskInfo:
          name: cat-hosts
  inputDefinitions:
    parameters:
      file:
        parameterType: STRING
      line_number:
        parameterType: BOOLEAN
schemaVersion: 2.1.0
sdkVersion: kfp-2.8.0
---
platforms:
  kubernetes:
    deploymentSpec:
      executors:
        exec-kfp-busybox:
          imagePullSecret:
          - secretName: sdk-docker
  1. upload pipeline to kubeflow platform
  2. run pipeline

Output:

time="2024-07-16T11:16:04.035Z" level=info msg="capturing logs" argo=true
time="2024-07-16T11:16:04.153Z" level=info msg="capturing logs" argo=true
I0716 11:16:04.212214      25 launcher_v2.go:90] input ComponentSpec:{
  "inputDefinitions": {
    "parameters": {
      "file": {
        "parameterType": "STRING",
        "defaultValue": "/etc/hosts",
        "isOptional": true
      },
      "line_number": {
        "parameterType": "BOOLEAN",
        "isOptional": true
      }
    }
  },
  "executorLabel": "exec-kfp-busybox"
}
I0716 11:16:04.213720      25 cache.go:139] Cannot detect ml-pipeline in the same namespace, default to ml-pipeline.kubeflow:8887 as KFP endpoint.
I0716 11:16:04.213766      25 cache.go:116] Connecting to cache endpoint ml-pipeline.kubeflow:8887
# Kubernetes-managed hosts file.
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
fe00::0	ip6-mcastprefix
fe00::1	ip6-allnodes
fe00::2	ip6-allrouters
cat: can't open '{"IfPresent": {"InputName": "line_number", "Then": ["-n"]}}': No such file or directory
I0716 11:16:04.443096      25 launcher_v2.go:151] publish success.
F0716 11:16:04.443142      25 main.go:49] failed to execute component: exit status 1
time="2024-07-16T11:16:05.156Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1
time="2024-07-16T11:16:06.037Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1

as you can see here:

cat: can't open '{"IfPresent": {"InputName": "line_number", "Then": ["-n"]}}': No such file or directory

Expected result

Print file with line numbers for instance:

     1	127.0.0.1	localhost
     2	::1	localhost ip6-localhost ip6-loopback
     3	fe00::0	ip6-localnet
     4	ff00::0	ip6-mcastprefix
     5	ff02::1	ip6-allnodes
     6	ff02::2	ip6-allrouters

Materials and Reference


Impacted by this bug? Give it a 👍.

@papagala
Copy link
Contributor

papagala commented Aug 5, 2024

This issue is critical to us since we have hundreds of pipelines that use the conditions to conditionally preprocess datasets. @juliusvonkohout FYI.

@juliusvonkohout
Copy link
Member

@rimolive maybe something for 1.9.1

@daro1337
Copy link
Author

daro1337 commented Aug 9, 2024

@juliusvonkohout FYI It looks like central dashboard does not support Optional field (all are mandatory) also default value does not work in central dashboard :(

@juliusvonkohout
Copy link
Member

@juliusvonkohout FYI It looks like central dashboard does not support Optional field (all are mandatory) also default value does not work in central dashboard :(

@daro1337 you might want to ask @thesuperzapper about this.

@papagala
Copy link
Contributor

I was able to successfully pass a default value given this snipped. @daro1337

@dsl.pipeline(name="pipeline-with-loop-output-v2")
def my_pipeline(size: Optional[int] = 5):
    print_before_parallel = print_op(s="something before parallel")
    args_generator = args_generator_op(size=size)
    # parallelism is being ignored. There's an open issue about this
    with dsl.ParallelFor(args_generator.output, parallelism=10) as item:  
        print_op_sleep(s=item).after(print_before_parallel)

@papagala
Copy link
Contributor

papagala commented Sep 20, 2024

FYI @juliusvonkohout @thesuperzapper 😄 . In other words this below fails

def test_kfp_busybox(compile_pipeline):

    @dsl.pipeline()
    def _pipeline(
        file: str,
        line_number: bool,
    ):

        yamlpath="tests/_fixtures/busybox.yaml"

        test = config_file_step(
            # component_text=kserve_operation_component,
            path=yamlpath,
            params={
                "file": file,
                "line_number": line_number,
            },
            name="cat-hosts",
        )

    compile_pipeline(_pipeline)

But this works

def test_kfp_busybox(compile_pipeline):

    @dsl.pipeline()
    def _pipeline(
        file: str = default_value, # Only default values are accepted here
        line_number: bool = default_value, # Only default value accepted here
    ):
       # Any default values defined in the component below
       # are being ignored. Also optionals are flagged as required.
        yamlpath="tests/_fixtures/busybox.yaml" 

        test = config_file_step(
            # component_text=kserve_operation_component,
            path=yamlpath,
            params={
                "file": file,
                "line_number": line_number,
            },
            name="cat-hosts", 
        )

    compile_pipeline(_pipeline)

However, the first version compiles and even the default values are visible in the Intermediate representation YAML, making this a bug or at the very least some unexpected edge case.

Note: There seem to be two separate issues going on here, first the optional and default values from the components are being ignored. In our example the tests/_fixtures/busybox.yaml would not respect any default or optional values. Additionally, the condition defined in the arguments of the YAML component are ignored too, which forces us to stop using this pattern and start using run-time logic (which used to work)

      - if:
          cond:
            isPresent: line_number
          then:
          - -n

@juliusvonkohout
Copy link
Member

I am still busy with releasing Kubeflow 1.9.1 so for now i can only suggest to retry with the 1.9.1 RC (KFP 2.3.0) and the 2.9.0 SDK.

@papagala
Copy link
Contributor

papagala commented Oct 1, 2024

@daro1337 I guess it's a good try to retest this again. Thanks @juliusvonkohout. We will retest asap. Probably the biggest blocker we have is #10050

Copy link

github-actions bot commented Dec 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Dec 1, 2024
@thesuperzapper
Copy link
Member

I don't believe this issue is fixed?

Can someone confirm?

@github-actions github-actions bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Dec 2, 2024
@daro1337
Copy link
Author

daro1337 commented Dec 4, 2024

@thesuperzapper I conducted a test today and can confirm that the issue exists in Kubeflow version 1.9.1 and KFP SDK version 2.10.1.

@HumairAK HumairAK added this to the KFP 2.5.0 milestone Dec 4, 2024
Copy link

github-actions bot commented Feb 3, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Feb 3, 2025
@daro1337
Copy link
Author

daro1337 commented Feb 3, 2025

This is still an issue.

@github-actions github-actions bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

5 participants