add hardware parameters, secret parameters, and taxonomy repo authentication #272
f"Error fetching secret: {response.status_code} {response.text}"
)

if judge_secret_name is None:
Side note on this: we keep the previous approach for two reasons:
- We don't want to break standalone.py. This code is used there, so we need to maintain backwards compatibility.
- We will need this again when we use the SDK to mount the secrets, at which point we'll get rid of the new additional calls to fetch_secret.
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
This commit adds logic to parameterize secret names for:
* the judge server, for both eval phases
* the teacher server, for the sdg phase

The same fetch_secret() function is duplicated to ensure that we are not passing secret data around as input/output parameters or artifacts. Doing so would result in user secrets being stored in MLMD or the object store, which we should avoid. In the future this logic will be replaced with KFP's built-in secret mounting once it supports parameterization, and a lot of the duplicated logic will be removed.

We also perform REST requests against the host cluster because access to the kubernetes Python package is not guaranteed.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
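For readers unfamiliar with the REST approach mentioned in the commit message, here is a minimal sketch of reading a Kubernetes Secret without the kubernetes Python package. It is not the PR's fetch_secret(): the names fetch_secret_sketch, secret_url, and decode_secret_data are hypothetical, and it assumes the pod runs with the standard service account mount and has RBAC permission to read the secret.

```python
import base64
import json
import ssl
import urllib.request

# Standard in-cluster service account mount (assumed to be present in the pod).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def secret_url(api_server, namespace, name):
    """Build the core/v1 REST path for a namespaced Secret."""
    return f"{api_server}/api/v1/namespaces/{namespace}/secrets/{name}"


def decode_secret_data(secret_obj):
    """Secret values arrive base64-encoded under the 'data' field."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret_obj.get("data", {}).items()
    }


def fetch_secret_sketch(name, keys, api_server="https://kubernetes.default.svc"):
    """Hypothetical helper: return the decoded values for `keys`, in order."""
    with open(f"{SA_DIR}/token") as f:
        token = f.read().strip()
    with open(f"{SA_DIR}/namespace") as f:
        namespace = f.read().strip()
    # Trust the cluster CA that is mounted next to the token.
    ctx = ssl.create_default_context(cafile=f"{SA_DIR}/ca.crt")
    req = urllib.request.Request(
        secret_url(api_server, namespace, name),
        headers={"Authorization": f"Bearer {token}"},
    )
    # urlopen raises urllib.error.HTTPError on a non-2xx response.
    with urllib.request.urlopen(req, context=ctx) as resp:
        body = json.load(resp)
    data = decode_secret_data(body)
    return [data[key] for key in keys]
```

Because the values are read inside the component at run time, they never become pipeline inputs, outputs, or artifacts, which is the property the commit message is after.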
/cc @mprahl
@@ -392,12 +392,6 @@ def ilab_pipeline(
run_mt_bench_task.set_accelerator_limit(1)
run_mt_bench_task.set_caching_options(False)
run_mt_bench_task.after(training_phase_2)
use_config_map_as_env(
Should this have been removed to maintain backwards compatibility?
Not sure I'm following, backwards compatibility for what?
The code under if judge_secret_name is None: relies on the environment variables JUDGE_ENDPOINT and JUDGE_NAME, does it not?
Oh, I see what you mean. The backwards compatibility comment I left in the PR description refers only to standalone.py, which uses the same component code for sdg/mt_eval/final_eval but mounts the configmaps/secrets as env vars (it's a bit convoluted); for example, for sdg this is done here. We want to maintain compatibility with standalone.py.
From the pipeline's perspective, you provide a secret name and we use it; how we use it is an implementation detail.
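The two code paths discussed in this thread can be sketched roughly as follows. This is an illustration only, not the component code: resolve_judge_config, the fetch callable, and the secret keys "endpoint" and "model" are hypothetical, and the env var names are the ones quoted in the review comment.

```python
import os


def resolve_judge_config(judge_secret_name, fetch=None, env=None):
    """Return (endpoint, model_name) for the judge server.

    - Pipeline path: a secret name is given and `fetch` looks it up.
    - standalone.py path: no secret name, values are read from env vars
      that were mounted from configmaps/secrets.
    """
    env = os.environ if env is None else env
    if judge_secret_name is None:
        # Backwards-compatible path relied on by standalone.py.
        return env["JUDGE_ENDPOINT"], env["JUDGE_NAME"]
    # "endpoint" and "model" are placeholder key names for this sketch.
    endpoint, model_name = fetch(judge_secret_name, ["endpoint", "model"])
    return endpoint, model_name
```

Passing the lookup in as a callable keeps the sketch independent of how the secret is actually retrieved, which matches the point that the retrieval mechanism is an implementation detail.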
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Description
Parameterized the secret names for sdg/judge. These have some duplicated code in the form of a copied function, fetch_secret. Some notes:
- get_pod_template_spec does not expose underlying hardware fields
- ISTIO_SIDECAR_INJECTION in the training pod but I've kept it to be consistent with the underlying get_pod_template_spec implementation.

The PR also parameterizes the secret for the taxonomy repo. This adds auth management for this repo: the user can provide SSH or username/PAT authentication. This has various edge cases for consideration, and I've tested the following:

I have high confidence that the remaining private github.com case works, given the self-hosted private case (it's basically the same). There are other edge cases worth considering, such as the taxonomy repo itself being SSH-based while the qna repos are git based. The code currently doesn't support this, but it should be a simple addition (probably enough to get rid of the conditional on ssh vs username/pat, though there may be other issues with this).

Note that the git clone component is removed entirely. This is because we don't want to be passing tokens/credentials around as inputs/outputs, and if we kept this as a separate component there would be a lot of repeated logic in both the clone and sdg_op components. To avoid this, we manage the auth, clone, and sdg generation in the same component.
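As a rough illustration of the ssh vs username/pat conditional described above, the sketch below builds a git clone command and environment for each case. It is not the code in sdg_op: clone_plan, https_url_with_credentials, and ssh_env are hypothetical names, and embedding the token in the URL is just one possible approach (it leaves the credential in the clone's .git/config, so a credential helper may be preferable).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def https_url_with_credentials(repo_url, username, token):
    """Embed percent-encoded username/PAT credentials in an HTTPS repo URL."""
    parts = urlsplit(repo_url)
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def ssh_env(key_path):
    """GIT_SSH_COMMAND tells git which ssh invocation (and key) to use."""
    return {"GIT_SSH_COMMAND": f"ssh -i {key_path} -o IdentitiesOnly=yes"}


def clone_plan(repo_url, dest, *, ssh_key_path=None, username=None, token=None):
    """Return (argv, extra_env) for cloning, choosing auth by what was provided."""
    if ssh_key_path:
        return ["git", "clone", repo_url, dest], ssh_env(ssh_key_path)
    if username and token:
        url = https_url_with_credentials(repo_url, username, token)
        return ["git", "clone", url, dest], {}
    # Public repo: no credentials needed.
    return ["git", "clone", repo_url, dest], {}
```

The returned argv and environment could then be passed to subprocess.run inside the same component that runs sdg, so the credentials never cross a component boundary.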
How Has This Been Tested?
Ran the ilab pipeline in a non-disconnected environment
I haven't tested it with tolerations & node selectors yet, but the rest works.
I've tested the sdg taxonomy repo auth against the cases mentioned above.
Log Outputs from the different cases outlined above:
Self-Signed cases
self-signed-https-private-repo.log
self-signed-https-public-repo
Github.com Cases (well-known)
github-https-public-repo.log
github-ssh-private-repo.log
github-ssh-public-with-ssh-key.log
Merge criteria: