-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(backend): Fix enable_caching issues when handling PVC creation/deletion #11411
Conversation
backend/src/v2/driver/driver.go
Outdated
err = createCache(ctx, createdExecution, opts, taskStartedTime, fingerPrint, cacheClient) | ||
if err != nil { | ||
return "", createdExecution, pb.Execution_FAILED, fmt.Errorf("failed to create cache entrty for create pvc: %w", err) | ||
if opts.Task.GetCachingOptions().GetEnableCache() && ecfg.CachedMLMDExecutionID != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ecfg.CachedMLMDExecutionID != ""
condition might possibly not be necessary here, as creating a PVC doesn't always depend on a cached execution ID. The cached execution ID might not exist at the time of PVC creation.
Unlike here, as deleting a PVC without a valid cached execution ID could result in errors or unintended behavior (e.g., trying to delete a PVC that doesn’t exist or isn't linked to the cache).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right about createPVC, but it seems to me like this won't have the intended result for deletepvc either
Unlike here, as deleting a PVC without a valid cached execution ID could result in errors or unintended behavior (e.g., trying to delete a PVC that doesn’t exist or isn't linked to the cache).
when would we ever hit this in either case?
this exact condition is checked here, and this results in an early exit.
Regardless, how can ecfg.CachedMLMDExecutionID != ""
ever be true
if we never create a cache outside this condition?
@rimolive did you test to see if deletepvc is even getting cached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HumairAK Yes, tested with the same pipeline but creating/deleting the PVC and with cache enabled. The cache fingerprint is generated in both tasks, so maybe this condition should also apply to deletePVC.
bf69940
to
400d108
Compare
400d108
to
5c5d99a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test?
It seems that samples/v2/pipeline_with_volume.py
already tests it with the cache enabled. It should be simple enough to copy this test but with the cache disabled?
I'm afraid it is not possible because it's a special condition when you create a run directly from the SDK:
@mprahl Sample test does not interact with KFP that way, it only compiles the sample and submit as-is to KFP. https://github.com/kubeflow/pipelines/blob/master/samples/v2/sample_test.py#L92 Personally, I think duplicating this code just to cover one single case is overengineering. I can take a look at other options, but I'm open for other suggestions. |
I disagree that this is "overengineering". Since this used to work before and was broken and not caught (i.e. regression) in a newer release, that signals to me that the test coverage is not sufficient. Submitting the job with diff --git a/samples/v2/pipeline_with_volume.py b/samples/v2/pipeline_with_volume.py
index c1a8b1035..216e0947a 100644
--- a/samples/v2/pipeline_with_volume.py
+++ b/samples/v2/pipeline_with_volume.py
@@ -41,10 +41,10 @@ def pipeline_with_volume():
access_modes=['ReadWriteOnce'],
size='5Mi',
storage_class_name='standard',
- )
+ ).set_caching_options(False)
- task1 = producer()
- task2 = consumer().after(task1)
+ task1 = producer().set_caching_options(False)
+ task2 = consumer().set_caching_options(False).after(task1)
kubernetes.mount_pvc(
task1,
@@ -58,11 +58,11 @@ def pipeline_with_volume():
)
delete_pvc1 = kubernetes.DeletePVC(
- pvc_name=pvc1.outputs['name']).after(task2)
+ pvc_name=pvc1.outputs['name']).set_caching_options(False).after(task2)
if __name__ == '__main__':
# execute only if run as a script
from kfp import compiler
compiler.Compiler().compile(
pipeline_func=pipeline_with_volume,
- package_path='pipeline_with_volume.json')
\ No newline at end of file
+ package_path='pipeline_with_volume.yaml')
\ No newline at end of file
|
5c5d99a
to
7ebe489
Compare
mount_path='/data', | ||
) | ||
|
||
delete_pvc1 = kubernetes.DeletePVC( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to also have caching disabled for the deletePvc
function test coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, as the Driver changes apply to both functions. I'm adding it to the test.
7ebe489
to
0230152
Compare
@@ -16,9 +16,9 @@ | |||
.PHONY: all | |||
all: | |||
@for f in $$(find . -name "*.py"); do \ | |||
compiled="$${f%.*}_pipeline.json"; \ | |||
compiled="$${f%.*}_pipeline.yaml"; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll need to update the clean
Make target as well since it looks for all JSON files still.
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Pipeline with volume creation, mount and deletion in v2 engine pipeline.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Pipeline with volume creation, mount and deletion in v2 engine pipeline.""" | |
"""Pipeline with no caching on volume creation, mount and deletion in v2 engine pipeline.""" |
0230152
to
06e8466
Compare
Signed-off-by: Ricardo M. Oliveira <rmartine@redhat.com>
06e8466
to
ac2c533
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@HumairAK Can you PTAL? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: HumairAK, mprahl The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description of your changes:
Resolves #10188
When setting enable_caching=false, KFP Driver still tries to create the cache entry. Added a condition to ensure cache entry will only get created if enable_caching=true.
Checklist: