This repository has been archived by the owner on May 31, 2024. It is now read-only.

Workflow definitions that include a workflow.zip file fail parsing in CromwellWESAdapter #577

Open
wleepang opened this issue Dec 7, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@wleepang
Contributor

wleepang commented Dec 7, 2022

Describe the Bug

Running a multi-file workflow that includes a workflow.zip file as part of its definition produces the following error with agc workflow run:

2022-12-06T17:47:00-08:00 ✘   error="unable to run workflow: 500 Internal Server Error"
Error: an error occurred invoking 'workflow run'

Steps to Reproduce

  1. Create a multi-file WDL workflow definition
  2. Add a zip file called workflow.zip to the definition folder
  3. Add the workflow to agc-project.yaml
  4. Start a Cromwell context
  5. Run the workflow

Relevant Logs

Context adapter log contains the following:

Tue, 06 Dec 2022 17:19:28 -0800 [ERROR] 2022-12-07T01:19:28.581Z        091259f7-8c63-4b27-9362-72f6f91e9125    Exception on /ga4gh/wes/v1/runs [POST]
Traceback (most recent call last):
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 514, in get_workflow_from_s3
    props = parse_workflow_zip_file(file, workflow_type)
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 555, in parse_workflow_zip_file
    zip.extractall(wd)
  File "/var/lang/lib/python3.9/zipfile.py", line 1642, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "/var/lang/lib/python3.9/zipfile.py", line 1697, in _extract_member
    shutil.copyfileobj(source, target)
  File "/var/lang/lib/python3.9/shutil.py", line 205, in copyfileobj
    buf = fsrc_read(length)
  File "/var/lang/lib/python3.9/zipfile.py", line 924, in read
    data = self._read1(n)
  File "/var/lang/lib/python3.9/zipfile.py", line 992, in _read1
    data += self._read2(n - len(data))
  File "/var/lang/lib/python3.9/zipfile.py", line 1027, in _read2
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 239, in run_workflow
    props = get_workflow_from_s3(workflow_url, tmpdir, workflow_type)
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 516, in get_workflow_from_s3
    raise RuntimeError(f"{s3_uri} is not a valid workflow.zip file: {e}")
RuntimeError: s3://agc-111122223333-us-west-2/project/orca/userid/pwymingJKP3z/context/cromwellCtx/workflow/broad_gtex/workflow.zip is not a valid workflow.zip file: 

Expected Behavior

Workflow definitions that are accompanied by extra modules bundled as a zip file should run regardless of what the module bundle zip is named.

Actual Behavior

Screenshots

Additional Context

Proposed fix:

The workflow definition bundle needs to be extracted to a distinct folder. The following line in parse_workflow_zip_file (see the traceback above):

zip.extractall(wd)

should be replaced with something like:

zip.extractall(path='path/to/tmpdir')

where path/to/tmpdir is different from wd, which is currently set to the parent folder of the downloaded workflow definition bundle.
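
A minimal sketch of that idea, assuming the likely cause is that a bundled file named workflow.zip gets extracted on top of the archive being read. The helper name extract_bundle and the "workflow-" prefix are hypothetical and not part of the adapter; only the standard-library calls are real.

```python
import os
import tempfile
import zipfile


def extract_bundle(zip_path: str) -> str:
    """Extract a workflow bundle into a fresh directory and return that path.

    Hypothetical helper for illustration; not the adapter's actual code.
    """
    parent = os.path.dirname(os.path.abspath(zip_path))
    # Create a new, empty directory next to the bundle so that no extracted
    # member can land on the bundle file itself.
    target = tempfile.mkdtemp(prefix="workflow-", dir=parent)
    with zipfile.ZipFile(zip_path) as bundle:
        bundle.extractall(path=target)
    return target
```

With this layout a member called workflow.zip ends up inside the new directory, and the downloaded bundle is left untouched.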

Operating System: macOS
AGC Version: 1.5.2
Was AGC setup with a custom bucket: No
Was AGC setup with a custom VPC: No

@wleepang wleepang added the bug Something isn't working label Dec 7, 2022
@wleepang
Contributor Author

wleepang commented Dec 7, 2022

Doing the following should be a sufficient fix:

# rest of code ...
wd = path.join(path.dirname(file), 'workflow')
with zipfile.ZipFile(file) as zip:
    # rest of code ...
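
For reference, the same change written as a self-contained function. This is a sketch only: extract_workflow_bundle is a hypothetical name, and the real parse_workflow_zip_file does more than extract the archive.

```python
import zipfile
from os import path


def extract_workflow_bundle(file: str) -> str:
    """Extract the bundle at `file` into a sibling 'workflow' folder.

    Hypothetical stand-in for the extraction step only.
    """
    # wd is a subfolder of the bundle's parent directory, not the parent itself
    wd = path.join(path.dirname(file), "workflow")
    with zipfile.ZipFile(file) as bundle:
        # extractall creates missing directories for each member it writes
        bundle.extractall(wd)
    return wd
```

For a bundle at some/dir/workflow.zip that itself contains a workflow.zip member, the member is written to some/dir/workflow/workflow.zip rather than over the bundle.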
