fix(sdk): avoid conflicting component names in DAG when reusing pipelines #11071

stijntratsaertit
Contributor

Description of your changes:
This is an up-to-date copy of PR #9969 that properly follows the contributor's guide.

This pull request addresses the issue of ensuring unique component names when merging component specifications from a sub-pipeline into a main pipeline configuration. The changes ensure that each component in the merged pipeline has a unique name, thus preventing conflicts and collisions that can occur when components from sub-pipelines are integrated into the main pipeline.
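As a rough illustration of the idea described above, the following is a minimal sketch and not the SDK's actual code. `unique_name` and `merge_components` are hypothetical stand-ins, and the component specs are reduced to plain strings for brevity.

```python
from typing import Dict, List


def unique_name(name: str, taken: List[str], delimiter: str = "-") -> str:
    """Hypothetical helper: return `name` if it is free, otherwise
    `name<delimiter><i>` for the first free index i >= 2."""
    if name not in taken:
        return name
    i = 2
    while f"{name}{delimiter}{i}" in taken:
        i += 1
    return f"{name}{delimiter}{i}"


def merge_components(main: Dict[str, str], sub: Dict[str, str]) -> Dict[str, str]:
    """Copy sub-pipeline components into the main mapping without
    overwriting any component that is already there."""
    merged = dict(main)
    for name, spec in sub.items():
        # Check against everything merged so far, not only the main names.
        merged[unique_name(name, list(merged))] = spec
    return merged


main = {"comp-train": "main train"}
sub = {"comp-train": "sub train", "comp-eval": "sub eval"}
merged = merge_components(main, sub)
print(list(merged))
```

In this sketch the sub-pipeline's `comp-train` is stored under `comp-train-2`, so the main pipeline's component of the same name is left untouched.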

Checklist:

Hi @stijntratsaertit. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hbelmiro
Contributor

hbelmiro commented Aug 5, 2024

/ok-to-test
/rerun-all

@stijntratsaertit: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| test-run-all-gcpc-modules | f9c7c40 | link | true | /test test-run-all-gcpc-modules |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

delimiter='-')
old_name_to_new_name[old_component_name] = new_component_name

ordered_names = enumerate(old_name_to_new_name.items())
lifo_ordered_names = sorted(ordered_names, key=lambda x: x[0], reverse=True)
Contributor

LIFO ordering for renaming might not appropriately handle the component references, especially if the pipeline structure doesn't align with this approach.

Contributor Author

Does the pipeline structure have to align with the renaming approach? LIFO seems crucial here as you want the most complex names (the last in order) to be renamed first to avoid renaming/conflicting with the more generic names.
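To make the ordering argument concrete, here is a small sketch. It assumes that renames are applied one after another to the same list of references; that is an assumption made for illustration, not a description of how the SDK applies them. `rename_refs` and the sample names are hypothetical.

```python
from typing import Dict, List


def rename_refs(refs: List[str], old_to_new: Dict[str, str], lifo: bool) -> List[str]:
    """Apply each old -> new rename in turn to a list of component references.

    With lifo=True the mapping is walked from its last entry to its first,
    mirroring the enumerate/sorted(reverse=True) pattern in the diff.
    """
    ordered = sorted(enumerate(old_to_new.items()), key=lambda x: x[0], reverse=lifo)
    result = list(refs)
    for _, (old, new) in ordered:
        result = [new if ref == old else ref for ref in result]
    return result


# Two sub-pipeline components whose names differ only by an index suffix.
refs = ["comp-print", "comp-print-2"]
old_to_new = {"comp-print": "comp-print-2", "comp-print-2": "comp-print-3"}

fifo_result = rename_refs(refs, old_to_new, lifo=False)
lifo_result = rename_refs(refs, old_to_new, lifo=True)
print(fifo_result)
print(lifo_result)
```

Walking the mapping first to last renames `comp-print` to `comp-print-2` and then renames it again together with the original `comp-print-2`, so both references collapse into `comp-print-3`. Walking it last to first moves the more specific name out of the way before the generic one takes its place.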

old_name_to_new_name = {}
for component_name, component_spec in sub_pipeline_spec.components.items():
existing_main_comp_names = list(main_pipeline_spec.components.keys())
Contributor

Contributor Author

Last I heard about this test, this was said. Let me know what you think.

Contributor Author

After inspecting the logs in more depth, it seems logical that this test fails now that the renaming logic has been updated. Do you think it is appropriate to update the .yaml result with the new configuration?

  new_component_name = utils.make_name_unique_by_adding_index(
      name=component_name,
-     collection=list(main_pipeline_spec.components.keys()),
+     collection=existing_main_comp_names + current_comp_name_collection,
Contributor

make_name_unique_by_adding_index may not ensure complete uniqueness of component names when used within nested or reused pipelines. This could result in naming conflicts if the names are not correctly indexed.

Contributor Author

I would expect it does when passing every component to the collection instead of just the components from the main pipeline. Could you elaborate on other cases I'm missing out on?
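The difference between the two collections can be sketched as follows. This is an illustration under assumptions, not the SDK's code: `unique_name` is a hypothetical stand-in for the indexing helper, and "every component" is read here as the main pipeline's names plus the names already assigned during the current merge.

```python
from typing import List


def unique_name(name: str, taken: List[str], delimiter: str = "-") -> str:
    """Hypothetical helper: append the first free index >= 2 if `name` is taken."""
    if name not in taken:
        return name
    i = 2
    while f"{name}{delimiter}{i}" in taken:
        i += 1
    return f"{name}{delimiter}{i}"


def assign_names(main_names: List[str], sub_names: List[str],
                 track_assigned: bool) -> List[str]:
    """Pick a new name for every sub-pipeline component.

    track_assigned=False checks only the main pipeline's names;
    track_assigned=True also checks names handed out earlier in this merge.
    """
    assigned: List[str] = []
    for name in sub_names:
        taken = main_names + assigned if track_assigned else main_names
        assigned.append(unique_name(name, taken))
    return assigned


main_names = ["comp-a"]
sub_names = ["comp-a", "comp-a-2"]

main_only = assign_names(main_names, sub_names, track_assigned=False)
with_assigned = assign_names(main_names, sub_names, track_assigned=True)
print(main_only)
print(with_assigned)
```

Checking only the main pipeline's names gives both sub-pipeline components the name `comp-a-2`, while also tracking the names assigned so far keeps them distinct.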

@stijntratsaertit
Contributor Author

No updates?

@m-dz

m-dz commented Jan 7, 2025

Could this be resurrected somehow, @DharmitD? I encountered the same issue while nesting a pipeline with 2 ParallelFor loops within another pipeline with a ParallelFor loop: #11484

@github-actions github-actions bot removed the Stale label Jan 8, 2025
@chensun
Member

chensun commented Mar 6, 2025

@stijntratsaertit would you be able to rebase and resolve the merge conflicts? At a quick glance, the change makes sense to me. Once you have rebased, we can check the test results.

Signed-off-by: Stijn Tratsaert <stijn.tratsaert.it@gmail.com>
stijntratsaertit force-pushed the avoid-conflicting-component-names-in-dag branch from f9c7c40 to 5ae5617 on March 6, 2025 21:58
Member

@chensun chensun left a comment

/lgtm
/approve

Thanks!

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit d1b15ef into kubeflow:master Mar 18, 2025
32 checks passed
VaniHaripriya pushed a commit to VaniHaripriya/data-science-pipelines that referenced this pull request Mar 20, 2025
…ines (kubeflow#11071)

* make component names in dag more unique

Signed-off-by: Stijn Tratsaert <stijn.tratsaert.it@gmail.com>

* tweak ordering so that renaming is handled in LIFO fashion

Signed-off-by: Stijn Tratsaert <stijn.tratsaert.it@gmail.com>

* add test that reuses a pipeline multiple times

Signed-off-by: Stijn Tratsaert <stijn.tratsaert.it@gmail.com>

---------

Signed-off-by: Stijn Tratsaert <stijn.tratsaert.it@gmail.com>