Merge pull request #189 from ENCODE-DCC/dev
v2.3.1
leepc12 authored May 4, 2023
2 parents 9eff7ca + 4ea8e66 commit 11782bc
Showing 36 changed files with 249 additions and 272 deletions.
9 changes: 4 additions & 5 deletions .circleci/config.yml
@@ -29,11 +29,10 @@ install_python3: &install_python3
install_singularity: &install_singularity
name: Install Singularity (container)
command: |
sudo apt-get install -y alien
sudo wget https://kojipkgs.fedoraproject.org//packages/singularity/3.8.5/2.el8/x86_64/singularity-3.8.5-2.el8.x86_64.rpm
sudo alien -d singularity-3.8.5-2.el8.x86_64.rpm
sudo apt-get install -y ./singularity_3.8.5-3_amd64.deb
sudo apt-get install -y squashfs-tools
sudo apt-get install -y alien squashfs-tools libseccomp-dev
sudo wget https://github.com/sylabs/singularity/releases/download/v3.11.3/singularity-ce-3.11.3-1.el8.x86_64.rpm
sudo alien -d singularity-ce-3.11.3-1.el8.x86_64.rpm
sudo apt-get install -y ./singularity-ce_3.11.3-2_amd64.deb
singularity --version
17 changes: 9 additions & 8 deletions .pre-commit-config.yaml
@@ -1,10 +1,11 @@
---
repos:
- repo: https://github.com/psf/black
rev: 19.3b0
rev: 22.3.0
hooks:
- id: black
args: [--skip-string-normalization]
language_version: python3.6
language_version: python3

- repo: https://github.com/asottile/seed-isort-config
rev: v1.9.2
@@ -15,7 +16,7 @@
rev: v4.3.21
hooks:
- id: isort
language_version: python3.6
language_version: python3

- repo: https://github.com/detailyang/pre-commit-shell
rev: v1.0.6
@@ -33,8 +34,8 @@
- id: debug-statements
- id: check-yaml

- repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
rev: 0.0.10
hooks:
- id: yamlfmt
args: [--mapping, '2', --sequence, '4', --offset, '2']
# - repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
# rev: 0.0.10
# hooks:
# - id: yamlfmt
# args: [--mapping, '2', --sequence, '4', --offset, '2']
10 changes: 3 additions & 7 deletions DETAILS.md
@@ -794,19 +794,15 @@ $ psql -d $DB_NAME -c "create role $DB_USER with superuser login password $DB_PA

## File database

> **WARNING**: Using this type of metadata database is **NOT RECOMMENDED**. It's unstable and fragile.
Define file DB parameters in `~/.caper/default.conf`.
Caper defaults to using a file database to store a workflow's metadata. Such a metadata database is necessary for restarting a workflow from where it left off (Cromwell's call-caching feature). The default database location is `local_out_dir`, defined in the configuration file `~/.caper/default.conf`, or the CWD where you run the Caper run/server command line. Its default filename prefix is `caper-db_[WDL_BASENAME].[INPUT_JSON_BASENAME]`. Therefore,
unless you explicitly define `file-db` in your configuration file, you can simply resume a failed workflow with the same command line used to start a new pipeline.

A file database cannot be accessed by multiple processes, so defining `file-db` in `~/.caper/default.conf` can result in a DB connection timeout error. Define `file-db` in `~/.caper/default.conf` only when you run a Caper server (with `caper server`) and submit workflows to it.
```
db=file
file-db=/YOUR/FILE/DB/PATH/PREFIX
```

This file DB is generated in your working directory by default. Its default filename prefix is `caper_file_db.[INPUT_JSON_BASENAME_WO_EXT]`. A DB consists of multiple files and directories with the same filename prefix.

Unless you explicitly define `file-db` in your configuration file `~/.caper/default.conf`, this file DB name will depend on your input JSON filename. Therefore, you can simply resume a failed workflow with the same command line used to start a new pipeline.
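
As a rough sketch of the recommended server setup above (paths, port number, and file names below are placeholders, not values from this repository): run a single `caper server` that owns the file DB, and submit workflows to it so that only one process ever touches the DB.

```bash
# Hypothetical example: one server process owns the file DB.
$ caper server --db file --file-db /data/caper/my_pipeline_db --port 8000

# In another shell, submit workflows to the running server.
# The client never opens the file DB, so no multi-process DB conflict occurs.
$ caper submit my_pipeline.wdl -i input.json --port 8000
```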


## Profiling/monitoring resources on Google Cloud

25 changes: 22 additions & 3 deletions README.md
@@ -59,8 +59,8 @@ For Conda users, make sure that you have installed pipeline's Conda environments

Take a look at the following examples:
```bash
$ caper run test.wdl --docker # can be used as a flag too, Caper will find a docker image defined in WDL
$ caper run test.wdl --singularity docker://ubuntu:latest
$ caper run test.wdl --docker # can be used as a flag too, Caper will find a default docker image in WDL if defined
$ caper run test.wdl --singularity docker://ubuntu:latest # define default singularity image in the command line
$ caper hpc submit test.wdl --singularity --leader-job-name test1 # submit to job engine and use singularity defined in WDL
$ caper submit test.wdl --conda your_conda_env_name # running caper server is required
```
@@ -98,7 +98,8 @@ $ caper hpc submit [WDL] -i [INPUT_JSON] --singularity --leader-job-name GOOD_NA

# Example with Conda and using call-caching (restarting a workflow from where it left off)
# Use the same --file-db PATH for next re-run then Caper will collect and softlink previous outputs.
$ caper hpc submit [WDL] -i [INPUT_JSON] --conda --leader-job-name GOOD_NAME2 --db file --file-db [METADATA_DB_PATH]
# If you see a DB connection error, replace --file-db with "--db in-memory"; call-caching will then be disabled
$ caper hpc submit [WDL] -i [INPUT_JSON] --conda --leader-job-name GOOD_NAME2 --file-db [METADATA_DB_PATH]

# List all leader jobs.
$ caper hpc list
@@ -116,6 +117,24 @@ $ ls -l cromwell.out*
$ caper hpc abort [JOB_ID]
```

## Restarting a pipeline on a local machine (and HPCs)

Caper uses Cromwell's call-caching to restart a pipeline from where it left off. Such a database is automatically generated in `local_out_dir`, defined in the configuration file `~/.caper/default.conf`. The DB file name simply consists of the WDL's basename and the input JSON file's basename, so you can run the same `caper run` command line in the same working directory to restart a workflow.

```bash
# for standalone/client
$ caper run ... --db in-memory

# for server
$ caper server ... --db in-memory
```


## DB connection timeout

If you see a DB connection timeout error, it means that multiple Caper/Cromwell processes are trying to connect to the same file DB. Check for running Cromwell processes with `ps aux | grep cromwell` and close them with `kill PID`. If that does not fix the problem, use `caper run ... --db in-memory` to disable Cromwell's metadata DB. Note that call-caching will not be available in that mode.
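
A short sketch of that troubleshooting sequence (the PID and file names below are placeholders):

```bash
# Find Cromwell/Caper processes that may be holding the file DB open.
$ ps aux | grep cromwell

# Stop a stale process by its PID (placeholder value).
$ kill 12345

# If the timeout persists, fall back to an in-memory DB.
# Call-caching (resuming a workflow from where it left off) is disabled in this mode.
$ caper run my_pipeline.wdl -i input.json --db in-memory
```
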
## Customize resource parameters on HPCs
If Caper's default settings do not work with your HPC, see [this document](docs/resource_param.md) to manually customize the resource command line (e.g. `sbatch ... [YOUR_CUSTOM_PARAMETER]`) for your chosen backend.
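
For example, on SLURM this usually means appending site-specific flags (account, partition, constraints) to the per-task submit command. The snippet below is illustrative only; the `slurm-resource-param` key and the exact variable names are assumptions here, so confirm them against docs/resource_param.md for your Caper version.

```
# Hypothetical ~/.caper/default.conf snippet: extra flags passed to sbatch for each task.
# Variable names (cpu, memory_mb, time) are assumed; verify against docs/resource_param.md.
slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} --mem=${memory_mb}M --account=my_lab --partition=normal
```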
2 changes: 1 addition & 1 deletion caper/__init__.py
@@ -2,4 +2,4 @@
from .caper_runner import CaperRunner

__all__ = ['CaperClient', 'CaperClientSubmit', 'CaperRunner']
__version__ = '2.2.3'
__version__ = '2.3.1'
1 change: 1 addition & 0 deletions caper/arg_tool.py
@@ -1,6 +1,7 @@
import os
from argparse import ArgumentParser
from configparser import ConfigParser, MissingSectionHeaderError

from distutils.util import strtobool


27 changes: 11 additions & 16 deletions caper/caper_args.py
@@ -23,14 +23,9 @@
CromwellBackendSlurm,
)
from .cromwell_rest_api import CromwellRestAPI
from .hpc import LsfWrapper, PbsWrapper, SgeWrapper, SlurmWrapper
from .resource_analysis import ResourceAnalysis
from .server_heartbeat import ServerHeartbeat
from .hpc import (
SlurmWrapper,
SgeWrapper,
PbsWrapper,
LsfWrapper,
)

DEFAULT_CAPER_CONF = '~/.caper/default.conf'
DEFAULT_LIST_FORMAT = 'id,status,name,str_label,user,parent,submission'
@@ -163,7 +158,7 @@ def get_parser_and_defaults(conf_file=None):
)
group_db.add_argument(
'--db',
default=CromwellBackendDatabase.DEFAULT_DB,
default=CromwellBackendDatabase.DB_FILE,
help='Cromwell metadata database type',
)
group_db.add_argument(
@@ -534,31 +529,31 @@ def get_parser_and_defaults(conf_file=None):
'--leader-job-name',
help='Leader job name for a submitted workflow.'
'This name will be appended to the prefix "CAPER_LEADER_" and then '
'submitted to HPC. Such prefix is used to identify Caper leader jobs.'
'submitted to HPC. Such prefix is used to identify Caper leader jobs.',
)
group_hpc_submit.add_argument(
'--slurm-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to SLURM. '
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(SlurmWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(SlurmWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)
group_hpc_submit.add_argument(
'--sge-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to SGE'
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(SgeWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(SgeWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)
group_hpc_submit.add_argument(
'--pbs-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to PBS'
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(PbsWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(PbsWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)
group_hpc_submit.add_argument(
'--lsf-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to LSF'
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(LsfWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(LsfWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)

group_slurm = parent_submit.add_argument_group('SLURM arguments')
@@ -771,7 +766,7 @@ def get_parser_and_defaults(conf_file=None):
parent_hpc_abort.add_argument(
'job_ids',
nargs='+',
help='Job ID or list of job IDs to abort matching Caper leader jobs.'
help='Job ID or list of job IDs to abort matching Caper leader jobs.',
)

# all subcommands
@@ -864,18 +859,18 @@ def get_parser_and_defaults(conf_file=None):
parents=[parent_all],
)
subparser_hpc = p_hpc.add_subparsers(dest='hpc_action')
p_hpc_submit = subparser_hpc.add_parser(
subparser_hpc.add_parser(
'submit',
help='Submit a single workflow to HPC.',
parents=[parent_all, parent_submit, parent_run, parent_runner, parent_backend],
)

p_hpc_list = subparser_hpc.add_parser(
subparser_hpc.add_parser(
'list',
help='List all workflows submitted to HPC.',
parents=[parent_all, parent_backend],
)
p_hpc_abort = subparser_hpc.add_parser(
subparser_hpc.add_parser(
'abort',
help='Abort a workflow submitted to HPC.',
parents=[parent_all, parent_backend, parent_hpc_abort],
3 changes: 1 addition & 2 deletions caper/caper_base.py
@@ -185,8 +185,7 @@ def create_timestamped_work_dir(self, prefix=''):
return work_dir

def get_loc_dir(self, backend):
"""Get localization directory for a backend.
"""
"""Get localization directory for a backend."""
if backend == BACKEND_GCP:
return self._gcp_loc_dir
elif backend == BACKEND_AWS:
12 changes: 0 additions & 12 deletions caper/caper_init.py
@@ -10,20 +10,8 @@
BACKEND_PBS,
BACKEND_SGE,
BACKEND_SLURM,
CromwellBackendLsf,
CromwellBackendPbs,
CromwellBackendSge,
CromwellBackendSlurm,
)

from .hpc import (
SlurmWrapper,
SgeWrapper,
PbsWrapper,
LsfWrapper,
)


CONF_CONTENTS_TMP_DIR = """
# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
82 changes: 41 additions & 41 deletions caper/caper_runner.py
@@ -495,47 +495,47 @@ def server(
dry_run=False,
):
"""Run a Cromwell server.
default_backend:
Default backend. If backend is not specified for a submitted workflow
then default backend will be used.
Choose among Caper's built-in backends.
(aws, gcp, Local, slurm, sge, pbs, lsf).
Or use a backend defined in your custom backend config file
(above "backend_conf" file).
server_heartbeat:
Server heartbeat to write hostname/port of a server.
server_port:
Server port to run Cromwell server.
Make sure to use different port for multiple Cromwell servers on the same
machine.
server_hostname:
Server hostname. If not defined then socket.gethostname() will be used.
If server_heartbeat is given, then this hostname will be written to
the server heartbeat file defined in server_heartbeat.
custom_backend_conf:
Backend config file (HOCON) to override Caper's auto-generated backend config.
fileobj_stdout:
File-like object to write Cromwell's STDOUT.
embed_subworkflow:
Caper stores/updates metadata.JSON file on
each workflow's root directory whenever there is status change
of workflow (or its tasks).
This flag ensures that any subworkflow's metadata JSON will be
embedded in main (this) workflow's metadata JSON.
This is to mimic behavior of Cromwell run mode's -m parameter.
java_heap_server:
Java heap (java -Xmx) for Cromwell server mode.
auto_write_metadata:
Automatic retrieval/writing of metadata.json upon workflow/task's status change.
work_dir:
Local temporary directory to store all temporary files.
Temporary files mean intermediate files used for running Cromwell.
For example, auto-generated backend config file and workflow options file.
If this is not defined, then cache directory self._local_loc_dir with a timestamp
will be used.
However, Cromwell Java process itself will run on CWD instead of this directory.
dry_run:
Stop before running Java command line for Cromwell.
default_backend:
Default backend. If backend is not specified for a submitted workflow
then default backend will be used.
Choose among Caper's built-in backends.
(aws, gcp, Local, slurm, sge, pbs, lsf).
Or use a backend defined in your custom backend config file
(above "backend_conf" file).
server_heartbeat:
Server heartbeat to write hostname/port of a server.
server_port:
Server port to run Cromwell server.
Make sure to use different port for multiple Cromwell servers on the same
machine.
server_hostname:
Server hostname. If not defined then socket.gethostname() will be used.
If server_heartbeat is given, then this hostname will be written to
the server heartbeat file defined in server_heartbeat.
custom_backend_conf:
Backend config file (HOCON) to override Caper's auto-generated backend config.
fileobj_stdout:
File-like object to write Cromwell's STDOUT.
embed_subworkflow:
Caper stores/updates metadata.JSON file on
each workflow's root directory whenever there is status change
of workflow (or its tasks).
This flag ensures that any subworkflow's metadata JSON will be
embedded in main (this) workflow's metadata JSON.
This is to mimic behavior of Cromwell run mode's -m parameter.
java_heap_server:
Java heap (java -Xmx) for Cromwell server mode.
auto_write_metadata:
Automatic retrieval/writing of metadata.json upon workflow/task's status change.
work_dir:
Local temporary directory to store all temporary files.
Temporary files mean intermediate files used for running Cromwell.
For example, auto-generated backend config file and workflow options file.
If this is not defined, then cache directory self._local_loc_dir with a timestamp
will be used.
However, Cromwell Java process itself will run on CWD instead of this directory.
dry_run:
Stop before running Java command line for Cromwell.
"""
if work_dir is None:
work_dir = self.create_timestamped_work_dir(
12 changes: 4 additions & 8 deletions caper/caper_wdl_parser.py
@@ -6,8 +6,7 @@


class CaperWDLParser(WDLParser):
"""WDL parser for Caper.
"""
"""WDL parser for Caper."""

RE_WDL_COMMENT_DOCKER = r'^\s*\#\s*CAPER\s+docker\s(.+)'
RE_WDL_COMMENT_SINGULARITY = r'^\s*\#\s*CAPER\s+singularity\s(.+)'
@@ -25,8 +24,7 @@ def __init__(self, wdl):

@property
def caper_docker(self):
"""Backward compatibility for property name. See property default_docker.
"""
"""Backward compatibility for property name. See property default_docker."""
return self.default_docker

@property
@@ -48,8 +46,7 @@ def default_docker(self):

@property
def caper_singularity(self):
"""Backward compatibility for property name. See property default_singularity.
"""
"""Backward compatibility for property name. See property default_singularity."""
return self.default_singularity

@property
@@ -71,8 +68,7 @@ def default_singularity(self):

@property
def default_conda(self):
"""Find a default Conda environment name in WDL for Caper.
"""
"""Find a default Conda environment name in WDL for Caper."""
if self.workflow_meta:
for conda_key in CaperWDLParser.WDL_WORKFLOW_META_CONDA_KEYS:
if conda_key in self.workflow_meta: