Merge pull request #189 from ENCODE-DCC/dev
v2.3.1
leepc12 authored May 4, 2023
2 parents 9eff7ca + 4ea8e66 commit 11782bc
Showing 36 changed files with 249 additions and 272 deletions.
9 changes: 4 additions & 5 deletions .circleci/config.yml
@@ -29,11 +29,10 @@ install_python3: &install_python3
install_singularity: &install_singularity
name: Install Singularity (container)
command: |
sudo apt-get install -y alien
sudo wget https://kojipkgs.fedoraproject.org//packages/singularity/3.8.5/2.el8/x86_64/singularity-3.8.5-2.el8.x86_64.rpm
sudo alien -d singularity-3.8.5-2.el8.x86_64.rpm
sudo apt-get install -y ./singularity_3.8.5-3_amd64.deb
sudo apt-get install -y squashfs-tools
sudo apt-get install -y alien squashfs-tools libseccomp-dev
sudo wget https://github.com/sylabs/singularity/releases/download/v3.11.3/singularity-ce-3.11.3-1.el8.x86_64.rpm
sudo alien -d singularity-ce-3.11.3-1.el8.x86_64.rpm
sudo apt-get install -y ./singularity-ce_3.11.3-2_amd64.deb
singularity --version
17 changes: 9 additions & 8 deletions .pre-commit-config.yaml
@@ -1,10 +1,11 @@
---
repos:
- repo: https://github.com/psf/black
rev: 19.3b0
rev: 22.3.0
hooks:
- id: black
args: [--skip-string-normalization]
language_version: python3.6
language_version: python3

- repo: https://github.com/asottile/seed-isort-config
rev: v1.9.2
@@ -15,7 +16,7 @@
rev: v4.3.21
hooks:
- id: isort
language_version: python3.6
language_version: python3

- repo: https://github.com/detailyang/pre-commit-shell
rev: v1.0.6
@@ -33,8 +34,8 @@
- id: debug-statements
- id: check-yaml

- repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
rev: 0.0.10
hooks:
- id: yamlfmt
args: [--mapping, '2', --sequence, '4', --offset, '2']
# - repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
# rev: 0.0.10
# hooks:
# - id: yamlfmt
# args: [--mapping, '2', --sequence, '4', --offset, '2']
10 changes: 3 additions & 7 deletions DETAILS.md
@@ -794,19 +794,15 @@ $ psql -d $DB_NAME -c "create role $DB_USER with superuser login password $DB_PA

## File database

> **WARNING**: Using this type of metadata database is **NOT RECOMMENDED**. It's unstable and fragile.
Define file DB parameters in `~/.caper/default.conf`.
Caper defaults to using a file database to store a workflow's metadata. Such a metadata database is necessary for restarting a workflow from where it left off (Cromwell's call-caching feature). The default database location is `local_out_dir`, defined in the configuration file `~/.caper/default.conf`, or the CWD where you run the Caper run/server command line. Its default filename prefix is `caper-db_[WDL_BASENAME].[INPUT_JSON_BASENAME]`. Therefore,
unless you explicitly define `file-db` in your configuration file, you can simply resume a failed workflow with the same command line used to start a new pipeline.

A file database cannot be accessed by multiple processes, so defining `file-db` in `~/.caper/default.conf` can result in a DB connection timeout error. Define `file-db` in `~/.caper/default.conf` only when you run a Caper server (with `caper server`) and submit workflows to it.
```
db=file
file-db=/YOUR/FILE/DB/PATH/PREFIX
```

This file DB is generated in your working directory by default. Its default filename prefix is `caper_file_db.[INPUT_JSON_BASENAME_WO_EXT]`. A DB consists of multiple files and directories with the same filename prefix.

Unless you explicitly define `file-db` in your configuration file `~/.caper/default.conf`, this file DB name will depend on your input JSON filename. Therefore, you can simply resume a failed workflow with the same command line used to start a new pipeline.
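
As a rough sketch of the recommended server setup above (paths, port number, and file names below are placeholders, not values from this repository): run a single `caper server` that owns the file DB, and submit workflows to it so that only one process ever touches the DB.

```bash
# Hypothetical example: one server process owns the file DB.
$ caper server --db file --file-db /data/caper/my_pipeline_db --port 8000

# In another shell, submit workflows to the running server.
# The client never opens the file DB, so no multi-process DB conflict occurs.
$ caper submit my_pipeline.wdl -i input.json --port 8000
```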


## Profiling/monitoring resources on Google Cloud

25 changes: 22 additions & 3 deletions README.md
@@ -59,8 +59,8 @@ For Conda users, make sure that you have installed pipeline's Conda environments

Take a look at the following examples:
```bash
$ caper run test.wdl --docker # can be used as a flag too, Caper will find a docker image defined in WDL
$ caper run test.wdl --singularity docker://ubuntu:latest
$ caper run test.wdl --docker # can be used as a flag too, Caper will find a default docker image in WDL if defined
$ caper run test.wdl --singularity docker://ubuntu:latest # define default singularity image in the command line
$ caper hpc submit test.wdl --singularity --leader-job-name test1 # submit to job engine and use singularity defined in WDL
$ caper submit test.wdl --conda your_conda_env_name # running caper server is required
```
@@ -98,7 +98,8 @@ $ caper hpc submit [WDL] -i [INPUT_JSON] --singularity --leader-job-name GOOD_NA

# Example with Conda and using call-caching (restarting a workflow from where it left off)
# Use the same --file-db PATH for next re-run then Caper will collect and softlink previous outputs.
$ caper hpc submit [WDL] -i [INPUT_JSON] --conda --leader-job-name GOOD_NAME2 --db file --file-db [METADATA_DB_PATH]
# If you see a DB connection error, replace --file-db with "--db in-memory"; call-caching will then be disabled
$ caper hpc submit [WDL] -i [INPUT_JSON] --conda --leader-job-name GOOD_NAME2 --file-db [METADATA_DB_PATH]

# List all leader jobs.
$ caper hpc list
@@ -116,6 +117,24 @@ $ ls -l cromwell.out*
$ caper hpc abort [JOB_ID]
```

## Restarting a pipeline on a local machine (and HPCs)

Caper uses Cromwell's call-caching to restart a pipeline from where it left off. Such a database is automatically generated in `local_out_dir`, defined in the configuration file `~/.caper/default.conf`. The DB file name simply consists of the WDL's basename and the input JSON file's basename, so you can run the same `caper run` command line in the same working directory to restart a workflow.

```bash
# for standalone/client
$ caper run ... --db in-memory

# for server
$ caper server ... --db in-memory
```


## DB connection timeout

If you see a DB connection timeout error, it means that multiple Caper/Cromwell processes are trying to connect to the same file DB. Check for running Cromwell processes with `ps aux | grep cromwell` and close them with `kill PID`. If that does not fix the problem, use `caper run ... --db in-memory` to disable Cromwell's metadata DB. Note that call-caching will not be available in that mode.
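
A short sketch of that troubleshooting sequence (the PID and file names below are placeholders):

```bash
# Find Cromwell/Caper processes that may be holding the file DB open.
$ ps aux | grep cromwell

# Stop a stale process by its PID (placeholder value).
$ kill 12345

# If the timeout persists, fall back to an in-memory DB.
# Call-caching (resuming a workflow from where it left off) is disabled in this mode.
$ caper run my_pipeline.wdl -i input.json --db in-memory
```
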
## Customize resource parameters on HPCs
If Caper's default settings do not work with your HPC, see [this document](docs/resource_param.md) to manually customize the resource command line (e.g. `sbatch ... [YOUR_CUSTOM_PARAMETER]`) for your chosen backend.
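
For example, on SLURM this usually means appending site-specific flags (account, partition, constraints) to the per-task submit command. The snippet below is illustrative only; the `slurm-resource-param` key and the exact variable names are assumptions here, so confirm them against docs/resource_param.md for your Caper version.

```
# Hypothetical ~/.caper/default.conf snippet: extra flags passed to sbatch for each task.
# Variable names (cpu, memory_mb, time) are assumed; verify against docs/resource_param.md.
slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} --mem=${memory_mb}M --account=my_lab --partition=normal
```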
2 changes: 1 addition & 1 deletion caper/__init__.py
@@ -2,4 +2,4 @@
from .caper_runner import CaperRunner

__all__ = ['CaperClient', 'CaperClientSubmit', 'CaperRunner']
__version__ = '2.2.3'
__version__ = '2.3.1'
1 change: 1 addition & 0 deletions caper/arg_tool.py
@@ -1,6 +1,7 @@
import os
from argparse import ArgumentParser
from configparser import ConfigParser, MissingSectionHeaderError

from distutils.util import strtobool


27 changes: 11 additions & 16 deletions caper/caper_args.py
@@ -23,14 +23,9 @@
CromwellBackendSlurm,
)
from .cromwell_rest_api import CromwellRestAPI
from .hpc import LsfWrapper, PbsWrapper, SgeWrapper, SlurmWrapper
from .resource_analysis import ResourceAnalysis
from .server_heartbeat import ServerHeartbeat
from .hpc import (
SlurmWrapper,
SgeWrapper,
PbsWrapper,
LsfWrapper,
)

DEFAULT_CAPER_CONF = '~/.caper/default.conf'
DEFAULT_LIST_FORMAT = 'id,status,name,str_label,user,parent,submission'
@@ -163,7 +158,7 @@ def get_parser_and_defaults(conf_file=None):
)
group_db.add_argument(
'--db',
default=CromwellBackendDatabase.DEFAULT_DB,
default=CromwellBackendDatabase.DB_FILE,
help='Cromwell metadata database type',
)
group_db.add_argument(
@@ -534,31 +529,31 @@ def get_parser_and_defaults(conf_file=None):
'--leader-job-name',
help='Leader job name for a submitted workflow.'
'This name will be appended to the prefix "CAPER_LEADER_" and then '
'submitted to HPC. Such prefix is used to identify Caper leader jobs.'
'submitted to HPC. Such prefix is used to identify Caper leader jobs.',
)
group_hpc_submit.add_argument(
'--slurm-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to SLURM. '
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(SlurmWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(SlurmWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)
group_hpc_submit.add_argument(
'--sge-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to SGE'
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(SgeWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(SgeWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)
group_hpc_submit.add_argument(
'--pbs-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to PBS'
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(PbsWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(PbsWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)
group_hpc_submit.add_argument(
'--lsf-leader-job-resource-param',
help='Resource parameters to submit a Caper leader job to LSF'
'Make sure to quote if you use it in the command line arguments.',
default=' '.join(LsfWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM)
default=' '.join(LsfWrapper.DEFAULT_LEADER_JOB_RESOURCE_PARAM),
)

group_slurm = parent_submit.add_argument_group('SLURM arguments')
@@ -771,7 +766,7 @@ def get_parser_and_defaults(conf_file=None):
parent_hpc_abort.add_argument(
'job_ids',
nargs='+',
help='Job ID or list of job IDs to abort matching Caper leader jobs.'
help='Job ID or list of job IDs to abort matching Caper leader jobs.',
)

# all subcommands
@@ -864,18 +859,18 @@ def get_parser_and_defaults(conf_file=None):
parents=[parent_all],
)
subparser_hpc = p_hpc.add_subparsers(dest='hpc_action')
p_hpc_submit = subparser_hpc.add_parser(
subparser_hpc.add_parser(
'submit',
help='Submit a single workflow to HPC.',
parents=[parent_all, parent_submit, parent_run, parent_runner, parent_backend],
)

p_hpc_list = subparser_hpc.add_parser(
subparser_hpc.add_parser(
'list',
help='List all workflows submitted to HPC.',
parents=[parent_all, parent_backend],
)
p_hpc_abort = subparser_hpc.add_parser(
subparser_hpc.add_parser(
'abort',
help='Abort a workflow submitted to HPC.',
parents=[parent_all, parent_backend, parent_hpc_abort],
3 changes: 1 addition & 2 deletions caper/caper_base.py
@@ -185,8 +185,7 @@ def create_timestamped_work_dir(self, prefix=''):
return work_dir

def get_loc_dir(self, backend):
"""Get localization directory for a backend.
"""
"""Get localization directory for a backend."""
if backend == BACKEND_GCP:
return self._gcp_loc_dir
elif backend == BACKEND_AWS:
12 changes: 0 additions & 12 deletions caper/caper_init.py
@@ -10,20 +10,8 @@
BACKEND_PBS,
BACKEND_SGE,
BACKEND_SLURM,
CromwellBackendLsf,
CromwellBackendPbs,
CromwellBackendSge,
CromwellBackendSlurm,
)

from .hpc import (
SlurmWrapper,
SgeWrapper,
PbsWrapper,
LsfWrapper,
)


CONF_CONTENTS_TMP_DIR = """
# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
82 changes: 41 additions & 41 deletions caper/caper_runner.py
@@ -495,47 +495,47 @@ def server(
dry_run=False,
):
"""Run a Cromwell server.
default_backend:
Default backend. If backend is not specified for a submitted workflow
then default backend will be used.
Choose among Caper's built-in backends.
(aws, gcp, Local, slurm, sge, pbs, lsf).
Or use a backend defined in your custom backend config file
(above "backend_conf" file).
server_heartbeat:
Server heartbeat to write hostname/port of a server.
server_port:
Server port to run Cromwell server.
Make sure to use different port for multiple Cromwell servers on the same
machine.
server_hostname:
Server hostname. If not defined then socket.gethostname() will be used.
If server_heartbeat is given, then this hostname will be written to
the server heartbeat file defined in server_heartbeat.
custom_backend_conf:
Backend config file (HOCON) to override Caper's auto-generated backend config.
fileobj_stdout:
File-like object to write Cromwell's STDOUT.
embed_subworkflow:
Caper stores/updates metadata.JSON file on
each workflow's root directory whenever there is status change
of workflow (or its tasks).
This flag ensures that any subworkflow's metadata JSON will be
embedded in main (this) workflow's metadata JSON.
This is to mimic behavior of Cromwell run mode's -m parameter.
java_heap_server:
Java heap (java -Xmx) for Cromwell server mode.
auto_write_metadata:
Automatic retrieval/writing of metadata.json upon workflow/task's status change.
work_dir:
Local temporary directory to store all temporary files.
Temporary files mean intermediate files used for running Cromwell.
For example, auto-generated backend config file and workflow options file.
If this is not defined, then cache directory self._local_loc_dir with a timestamp
will be used.
However, Cromwell Java process itself will run on CWD instead of this directory.
dry_run:
Stop before running Java command line for Cromwell.
default_backend:
Default backend. If backend is not specified for a submitted workflow
then default backend will be used.
Choose among Caper's built-in backends.
(aws, gcp, Local, slurm, sge, pbs, lsf).
Or use a backend defined in your custom backend config file
(above "backend_conf" file).
server_heartbeat:
Server heartbeat to write hostname/port of a server.
server_port:
Server port to run Cromwell server.
Make sure to use different port for multiple Cromwell servers on the same
machine.
server_hostname:
Server hostname. If not defined then socket.gethostname() will be used.
If server_heartbeat is given, then this hostname will be written to
the server heartbeat file defined in server_heartbeat.
custom_backend_conf:
Backend config file (HOCON) to override Caper's auto-generated backend config.
fileobj_stdout:
File-like object to write Cromwell's STDOUT.
embed_subworkflow:
Caper stores/updates metadata.JSON file on
each workflow's root directory whenever there is status change
of workflow (or its tasks).
This flag ensures that any subworkflow's metadata JSON will be
embedded in main (this) workflow's metadata JSON.
This is to mimic behavior of Cromwell run mode's -m parameter.
java_heap_server:
Java heap (java -Xmx) for Cromwell server mode.
auto_write_metadata:
Automatic retrieval/writing of metadata.json upon workflow/task's status change.
work_dir:
Local temporary directory to store all temporary files.
Temporary files mean intermediate files used for running Cromwell.
For example, auto-generated backend config file and workflow options file.
If this is not defined, then cache directory self._local_loc_dir with a timestamp
will be used.
However, Cromwell Java process itself will run on CWD instead of this directory.
dry_run:
Stop before running Java command line for Cromwell.
"""
if work_dir is None:
work_dir = self.create_timestamped_work_dir(
12 changes: 4 additions & 8 deletions caper/caper_wdl_parser.py
@@ -6,8 +6,7 @@


class CaperWDLParser(WDLParser):
"""WDL parser for Caper.
"""
"""WDL parser for Caper."""

RE_WDL_COMMENT_DOCKER = r'^\s*\#\s*CAPER\s+docker\s(.+)'
RE_WDL_COMMENT_SINGULARITY = r'^\s*\#\s*CAPER\s+singularity\s(.+)'
@@ -25,8 +24,7 @@ def __init__(self, wdl):

@property
def caper_docker(self):
"""Backward compatibility for property name. See property default_docker.
"""
"""Backward compatibility for property name. See property default_docker."""
return self.default_docker

@property
@@ -48,8 +46,7 @@ def default_docker(self):

@property
def caper_singularity(self):
"""Backward compatibility for property name. See property default_singularity.
"""
"""Backward compatibility for property name. See property default_singularity."""
return self.default_singularity

@property
@@ -71,8 +68,7 @@ def default_singularity(self):

@property
def default_conda(self):
"""Find a default Conda environment name in WDL for Caper.
"""
"""Find a default Conda environment name in WDL for Caper."""
if self.workflow_meta:
for conda_key in CaperWDLParser.WDL_WORKFLOW_META_CONDA_KEYS:
if conda_key in self.workflow_meta: