Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support WDL call caching #5105

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

adamnovak
Copy link
Member

This adds support for WDL call caching compatible with MiniWDL, controlled by the same MiniWDL config/environment variables as MiniWDL uses. This should fix #4797.

It only caches task calls, and the write_* WDL function results necessary to have calls that depend on workflow-generated files.

It copies all output files from a Toil step into the MiniWDL cache folder, when saving to the cache, since Toil doesn't usually generate a persistent copy of task output files.

It does not cache downloads like MiniWDL can, so tasks that depend on URL files probably can't cache (or might get stuck cached and not update when the URL content changes).

It doesn't do anything special for string to File coercion, but probably should.

If you have this workflow:

version 1.1

workflow TestWorkflow {
    input {
    }
    call TestTask {
    input:
    }
    call WCTask as count_listing {
        input:
            to_count = TestTask.dirlist
    }
    call WCTask as count_dynamic {
        input:
            to_count = TestTask.trouble
    }
    call WCTask as count_workflow_scope {
        input:
            to_count = write_lines(["At", "workflow", "scope"])
    }
    output {
        Array[File] counts = [count_listing.result, count_dynamic.result, count_workflow_scope.result]
    }
}

task TestTask {
    input {
    }
    command <<<
        ls -l
    >>>
    output {
        Int number = 1
        File dirlist = stdout()
        File trouble = write_lines(["Hello", "World"])
    }
    runtime {
        container: "ubuntu:latest"
        cpu: 1
    }
}

task WCTask {
    input {
        File to_count
    }
    command <<<
        wc -l ~{to_count}
    >>>
    output {
        File result = stdout()
    }
    runtime {
        container: "ubuntu:latest"
        cpu: 1
    }
}

Then you can run it with either of:

MINIWDL__CALL_CACHE__GET=true MINIWDL__CALL_CACHE__PUT=true MINIWDL__CALL_CACHE__DIR=`pwd`/miniwdl-cache miniwdl run test.wdl

Or

MINIWDL__CALL_CACHE__GET=true MINIWDL__CALL_CACHE__PUT=true MINIWDL__CALL_CACHE__DIR=`pwd`/miniwdl-cache toil-wdl-runner --logDebug --jobStore ./tree test.wdl

And the second one will get cache hits for all the tasks that the first one ran.

Changelog Entry

To be copied to the draft changelog by merger:

  • Toil now supports reading and writing MiniWDL's call cache.

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak
Copy link
Member Author

I need to make this also support caching calls to whole workflows. If I use the files in the output tasks' caches instead of fetching the cached results of one workflow, I get different paths (that symlink to the same files), and then I can't re-use MiniWDL cached calls that depend on files from the called workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WDL Call Caching
1 participant