44 commits
5c56ab4
fix: move deprecated strtobool import to a local function [fixes ENCO…
mihirsamdarshi Mar 28, 2025
f9d9a92
Merge pull request #1 from MoTrPAC/fix/python312-compatibility
mihirsamdarshi Jul 20, 2025
1febe9f
feat: add Google Batch backend support
mihirsamdarshi Apr 8, 2024
608101a
fix: remove fully deprecated Genomics/Life Sciences APIs
mihirsamdarshi Jul 20, 2025
a28ab45
refactor: continue removing Life Sciences API support, update for Bat…
mihirsamdarshi Jul 21, 2025
0879d3e
docs: update documentation and other scripts to remove Life Sciences …
mihirsamdarshi Jul 21, 2025
c7f19c9
feat: add support for customizing Google Batch compute service account
mihirsamdarshi Jul 26, 2025
32e9e5a
feat: add GCP compute service account support in tests and update tes…
mihirsamdarshi Jul 26, 2025
38cbca3
feat: add GCP logging policy support and update Cromwell/Womtool vers…
mihirsamdarshi Jul 26, 2025
02c379f
chore: move pytest configuration file
mihirsamdarshi Jul 26, 2025
b6a2aa6
chore: add `slow` marker to tests and update pytest configuration
mihirsamdarshi Jul 27, 2025
fb8fed6
fix: GCS URL formatting in test and add `slow` marker to test
mihirsamdarshi Jul 27, 2025
4ea7b67
refactor: migrate to pyproject.toml, use uv as the package manager
mihirsamdarshi Jul 21, 2025
b5bdef6
refactor: complete migration to pyproject.toml/uv
mihirsamdarshi Jul 27, 2025
daf226b
docs: update GCP docs to remove Life Sciences API references and refl…
mihirsamdarshi Jul 27, 2025
f82371d
test: add `google_cloud` markers to GCP-related resource analysis tests
mihirsamdarshi Jul 27, 2025
975b7a3
chore: bump required Python to 3.10
mihirsamdarshi Nov 1, 2025
a95ac79
fix: remove incorrect group reassignment, relocate `--cromwell-stdout…
mihirsamdarshi Dec 21, 2025
5a877d9
fix: add missing required service-account config option
mihirsamdarshi Dec 21, 2025
f5e8431
feat: add type annotations and improve error handling
mihirsamdarshi Dec 25, 2025
9801564
chore: update python version and dependencies, update Ruff config
mihirsamdarshi Dec 24, 2025
570cf44
Ignore VS Code workspace files
biodavidjm Dec 16, 2025
a85ae83
chore: ignore additional IDE folders
mihirsamdarshi Dec 22, 2025
795c378
Set 'auth' to 'service-account' for GCP backend
biodavidjm Dec 21, 2025
9c5ec88
refactor: update CromwellMetadata internals for readability
mihirsamdarshi Dec 25, 2025
8c1cf55
feat: modernize type hints and docstrings
mihirsamdarshi Dec 25, 2025
b48ddd4
refactor: improve code organization and documentation
mihirsamdarshi Dec 25, 2025
890a690
refactor: improve type hints and error handling
mihirsamdarshi Dec 24, 2025
ae86ad1
refactor: improve code structure and type hints
mihirsamdarshi Dec 25, 2025
b3537e8
refactor: improve error handling and type annotations
mihirsamdarshi Dec 25, 2025
5049f7d
refactor: improve type hints and documentation
mihirsamdarshi Jan 20, 2026
ef2d3b4
feat: update test configuration for new GCS bucket and project
mihirsamdarshi Jan 20, 2026
8c599f9
refactor: fix lint errors, add type annotations to functions
mihirsamdarshi Jan 20, 2026
f208cf2
chore: update supported Python to 3.12
mihirsamdarshi Jan 19, 2026
44579f4
docs: update documentation and installation requirements
mihirsamdarshi Jan 19, 2026
8786e6a
refactor: update instance creation script
mihirsamdarshi Jan 19, 2026
3d3f4b3
build: add CI workflow for unit and GCP integration tests
mihirsamdarshi Jan 20, 2026
2b30c96
test: fix CI failures with ls return code
mihirsamdarshi Jan 20, 2026
24d831e
chore: add pyright config for type-checking
mihirsamdarshi Jan 20, 2026
6315803
feat: add optional callbacks for server events and status changes
mihirsamdarshi Jan 20, 2026
41a9449
chore: add class variable for Cromwell version, bump default version …
mihirsamdarshi Jan 21, 2026
db473a5
feat: add GCP VPC network and subnetwork configuration support
mihirsamdarshi Jan 28, 2026
58816ed
feat: add Docker Hub mirroring configuration
mihirsamdarshi Jan 28, 2026
9a7d2a4
chore(release): bump version
mihirsamdarshi Jan 20, 2026
141 changes: 141 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,141 @@
name: Tests

on:
push:
branches:
- master
pull_request:
branches:
- master

jobs:
# Unit tests - runs on every push/PR
# Excludes Google Cloud tests
unit-tests:
name: Unit Tests (Python ${{ matrix.python-version }})
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
python-version:
- "3.12"
- "3.13"
- "3.14"

steps:
- name: Checkout repository
uses: actions/checkout@v6

- name: Set up uv
uses: astral-sh/setup-uv@v7
with:
python-version: ${{ matrix.python-version }}
enable-cache: true

- name: Install system dependencies
run: |
sudo apt-get update -y
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
tzdata default-jre

- name: Install Singularity 4
env:
SINGULARITY_VERSION: 4.3.7
run: |
UBUNTU_CODENAME=$(lsb_release -cs)
wget -q "https://github.com/sylabs/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-ce_${SINGULARITY_VERSION}-${UBUNTU_CODENAME}_amd64.deb"
sudo apt-get install -y "./singularity-ce_${SINGULARITY_VERSION}-${UBUNTU_CODENAME}_amd64.deb"
singularity --version

- name: Install dependencies
run: uv sync --all-groups --all-extras

- name: Run unit tests (excluding Google Cloud tests)
run: uv run pytest -m "not google_cloud and not slow" -vv -s --junit-xml=report.xml

- name: Test Report
uses: dorny/test-reporter@v2
if: success() || failure() # run this step even if previous step failed
with:
name: Unit Test Results (Python ${{ matrix.python-version }})
path: report.xml # Path to test results
reporter: java-junit # Format of test results

# Google Cloud integration tests
# Requires maintainer approval via environment
gcp-integration-tests:
name: Google Cloud Integration Tests (Python 3.12)
runs-on: ubuntu-latest
timeout-minutes: 90
needs:
- unit-tests
# This environment requires maintainer approval
# Configured at: Settings > Environments > google-cloud > Required reviewers
environment: Google Cloud

# Required for Workload Identity Federation
permissions:
contents: read
id-token: write

steps:
- name: Checkout repository
uses: actions/checkout@v6

# See the action's README for more details on how/why Workload Identity Federation is configured and used
# The principalSet:// has direct access to the --gcs-root bucket
# https://docs.cloud.google.com/iam/docs/workload-identity-federation#access_management
# access to the service account is configured via the MoTrPAC/motrpac-iac repo
- name: Authenticate to Google Cloud
uses: google-github-actions/auth@v3
with:
project_id: ${{ secrets.GCP_PROJECT_ID }}
service_account: ${{ secrets.GCP_ACTIONS_SERVICE_ACCOUNT }}
workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}

- name: Set up Cloud SDK
uses: google-github-actions/setup-gcloud@v3

- name: Set up uv
uses: astral-sh/setup-uv@v7
with:
python-version: "3.12"
enable-cache: true

- name: Install system dependencies
run: |
sudo apt-get update -y
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
tzdata default-jre

- name: Install Singularity 4
env:
SINGULARITY_VERSION: 4.3.7
run: |
UBUNTU_CODENAME=$(lsb_release -cs)
wget -q "https://github.com/sylabs/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-ce_${SINGULARITY_VERSION}-${UBUNTU_CODENAME}_amd64.deb"
sudo apt-get install -y "./singularity-ce_${SINGULARITY_VERSION}-${UBUNTU_CODENAME}_amd64.deb"
singularity --version

- name: Install dependencies
run: uv sync --all-groups --all-extras

- name: Run Google Cloud integration tests
timeout-minutes: 60
env:
GOOGLE_CLOUD_PROJECT: ${{ secrets.GCP_PROJECT_ID }}
run: |
uv run pytest -m "google_cloud" \
--ci-prefix ${{ github.run_id }} \
--gcs-root gs://motrpac-test-caper \
--debug-caper \
-vv -s --junit-xml=report.xml

- name: Test Report
uses: dorny/test-reporter@v2
if: success() || failure() # run this step even if previous step failed
with:
name: Google Cloud Integration Test Results
path: report.xml # Path to test results
reporter: java-junit # Format of test results
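
The workflow above filters tests with `-m "not google_cloud and not slow"`, which only works if those markers are registered with pytest (the commits "add `slow` marker to tests and update pytest configuration" and "add `google_cloud` markers" do this). A hypothetical sketch of the corresponding `pyproject.toml` fragment — the exact marker descriptions are assumptions, not taken from this PR:

```toml
[tool.pytest.ini_options]
markers = [
    "google_cloud: tests that require Google Cloud credentials and resources",
    "slow: long-running tests excluded from the default CI run",
]
```

Registering markers keeps `pytest --strict-markers` from rejecting them and documents them in `pytest --markers` output.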
4 changes: 4 additions & 0 deletions .gitignore
Expand Up @@ -112,3 +112,7 @@ src/test_caper_uri/
cromwell.out
dev/
tests/hpc/

*.code-workspace
.vscode/
.idea/
10 changes: 0 additions & 10 deletions .isort.cfg

This file was deleted.

14 changes: 6 additions & 8 deletions DETAILS.md
@@ -99,7 +99,7 @@ hpc abort | JOB_ID | Abort a Caper leader job. This will cascade kill all child

> **IMPORTANT**: `--deepcopy` has been deprecated and it's activated by default. You can disable it with `--no-deepcopy`.

Deepcopy allows Caper to **RECURSIVELY** copy files defined in your input JSON into your target backend's temporary storage. For example, Cromwell cannot read directly from URLs in an [input JSON file](https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/examples/caper/ENCSR356KRQ_subsampled.json), but Caper makes copies of these URLs on your backend's temporary directory (e.g. `--local-loc-dir` for `local`, `--gcp-loc-dir` for `gcp`) and pass them to Cromwell.
Deepcopy allows Caper to **RECURSIVELY** copy files defined in your input JSON into your target backend's temporary storage. For example, Cromwell cannot read directly from URLs in an input JSON file, but Caper makes copies of these URLs on your backend's temporary directory (e.g. `--local-loc-dir` for `local`, `--gcp-loc-dir` for `gcp`) and pass them to Cromwell.
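
Deepcopy in practice can be pictured as follows; the workflow name, URL and bucket path here are hypothetical, not taken from this repository. Given an input JSON entry like:

```json
{
  "pipeline.fastq": "https://example.org/sample.fastq.gz"
}
```

on the `gcp` backend, Caper would copy the file under the `--gcp-loc-dir` bucket (e.g. `gs://my-loc-bucket/.caper_tmp/...`) and pass Cromwell a rewritten input JSON pointing at the `gs://` copy.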

## How to manage configuration file per project

@@ -187,8 +187,6 @@ We highly recommend to use a default configuration file described in the section
**Conf. file**|**Cmd. line**|**Description**
:-----|:-----|:-----
gcp-prj|--gcp-prj|Google Cloud project
use-google-cloud-life-sciences|--use-google-cloud-life-sciences|Use Google Cloud Life Sciences API instead of (deprecated) Genomics API
gcp-zones|--gcp-zones|Comma-delimited Google Cloud Platform zones to provision worker instances (e.g. us-central1-c,us-west1-b)
gcp-out-dir, out-gcs-bucket|--gcp-out-dir, --out-gcs-bucket|Output `gs://` directory for GC backend
gcp-loc-dir, tmp-gcs-bucket|--gcp-loc-dir, --tmp-gcs-bucket|Tmp. directory for localization on GC backend
gcp-call-caching-dup-strat|--gcp-call-caching-dup-strat|Call-caching duplication strategy. Choose between `copy` and `reference`. `copy` will make a copy for a new workflow, `reference` will make refer to the call-cached output of a previous workflow in `metadata.json`. Defaults to `reference`
@@ -466,12 +464,12 @@ If Caper's built-in backends don't work as expected on your clusters (e.g. due t
Find this `backend.conf` first by dry-running `caper run [WDL] --dry-run ...`. For example of a `slurm` backend:
```
$ caper run main.wdl --dry-run
2020-07-07 11:18:13,196|caper.caper_runner|INFO| Adding encode-dcc-1016 to env var GOOGLE_CLOUD_PROJECT
2020-07-07 11:18:13,197|caper.caper_base|INFO| Creating a timestamped temporary directory. /mnt/data/scratch/leepc12/test_caper_tmp/main/20200707_111813_197082
2020-07-07 11:18:13,197|caper.caper_runner|INFO| Localizing files on work_dir. /mnt/data/scratch/leepc12/test_caper_tmp/main/20200707_111813_197082
2020-07-07 11:18:13,196|caper.caper_runner|INFO| Adding my-gcp-project to env var GOOGLE_CLOUD_PROJECT
2020-07-07 11:18:13,197|caper.caper_base|INFO| Creating a timestamped temporary directory. /scratch/user/caper_tmp/main/20200707_111813_197082
2020-07-07 11:18:13,197|caper.caper_runner|INFO| Localizing files on work_dir. /scratch/user/caper_tmp/main/20200707_111813_197082
2020-07-07 11:18:13,829|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2020-07-07 11:18:16,034|caper.cromwell|INFO| Womtool validation passed.
2020-07-07 11:18:16,035|caper.caper_runner|INFO| launching run: wdl=/mnt/data2/scratch/leepc12/test_wdl1_sub/main.wdl, inputs=None, backend_conf=/mnt/data/scratch/leepc12/test_caper_tmp/main/20200707_111813_197082/backend.conf
2020-07-07 11:18:16,035|caper.caper_runner|INFO| launching run: wdl=/scratch/user/workflows/main.wdl, inputs=None, backend_conf=/scratch/user/caper_tmp/main/20200707_111813_197082/backend.conf
```

Find `backend_conf`, make a copy of it and edit it.
@@ -554,7 +552,7 @@ until [ $ITER -ge 3 ]; do
sleep 30
done
"""
root = "/mnt/data/scratch/leepc12/caper_out"
root = "/scratch/user/caper_out"
exit-code-timeout-seconds = 360
check-alive = """for ITER in 1 2 3; do
CHK_ALIVE=$(squeue --noheader -j ${job_id} --format=%i | grep ${job_id})
25 changes: 15 additions & 10 deletions README.md
@@ -1,9 +1,11 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![CircleCI](https://circleci.com/gh/ENCODE-DCC/caper.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/caper)
[![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2FMoTrPAC%2Fcaper%2Fmain%2Fpyproject.toml)](./pyproject.toml)


## Introduction

Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for [Cromwell](https://github.com/broadinstitute/cromwell/). Caper wraps Cromwell to run pipelines on multiple platforms like GCP (Google Cloud Platform), AWS (Amazon Web Service) and HPCs like SLURM, SGE, PBS/Torque and LSF. It provides easier way of running Cromwell server/run modes by automatically composing necessary input files for Cromwell. Caper can run each task on a specified environment (Docker, Singularity or Conda). Also, Caper automatically localizes all files (keeping their directory structure) defined in your input JSON and command line according to the specified backend. For example, if your chosen backend is GCP and files in your input JSON are on S3 buckets (or even URLs) then Caper automatically transfers `s3://` and `http(s)://` files to a specified `gs://` bucket directory. Supported URIs are `s3://`, `gs://`, `http(s)://` and local absolute paths. You can use such URIs either in CLI and input JSON. Private URIs are also accessible if you authenticate using cloud platform CLIs like `gcloud auth`, `aws configure` and using `~/.netrc` for URLs.
Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for [Cromwell](https://github.com/broadinstitute/cromwell/). This project is maintained by [MoTrPAC](https://motrpac.org/), forked from the original [ENCODE-DCC caper](https://github.com/ENCODE-DCC/caper) to add support for [Google Cloud Batch API](https://cloud.google.com/batch) and remove deprecated Google Cloud Life Sciences and Google Cloud Genomics APIs.

Caper wraps Cromwell to run pipelines on multiple platforms like GCP (Google Cloud Platform), AWS (Amazon Web Service) and HPCs like SLURM, SGE, PBS/Torque and LSF. It provides an easier way of running Cromwell server/run modes by automatically composing necessary input files for Cromwell. Caper can run each task on a specified environment (Docker, Singularity or Conda). Also, Caper automatically localizes all files (keeping their directory structure) defined in your input JSON and command line according to the specified backend. For example, if your chosen backend is GCP and files in your input JSON are on S3 buckets (or even URLs) then Caper automatically transfers `s3://` and `http(s)://` files to a specified `gs://` bucket directory. Supported URIs are `s3://`, `gs://`, `http(s)://` and local absolute paths. You can use such URIs either in CLI and input JSON. Private URIs are also accessible if you authenticate using cloud platform CLIs like `gcloud auth`, `aws configure` and using `~/.netrc` for URLs.


## Installation for Google Cloud Platform and AWS
@@ -18,19 +20,22 @@ See [this](scripts/aws_caper_server/README.md) for details.

## Installation for local computers and HPCs

1) Make sure that you have Java (>= 11) and Python>=3.6 installed on your system and `pip` to install Caper.
1) Make sure that you have Java (>= 17) and Python >= 3.12 installed on your system.

2) Install Caper from the [MoTrPAC GitHub repository](https://github.com/MoTrPAC/caper) using [uv](https://docs.astral.sh/uv/) (recommended) or pip:

```bash
$ pip install caper
```
# Using uvx (recommended) - runs caper without permanent installation
$ uvx --from git+https://github.com/MoTrPAC/caper caper

2) If you see an error message like `caper: command not found` after installing then add the following line to the bottom of `~/.bashrc` and re-login.
# Or install with uv
$ uv pip install git+https://github.com/MoTrPAC/caper

```bash
export PATH=$PATH:~/.local/bin
# Or install with pip
$ pip install git+https://github.com/MoTrPAC/caper
```

3) Choose a backend from the following table and initialize Caper. This will create a default Caper configuration file `~/.caper/default.conf`, which have only required parameters for each backend. `caper init` will also install Cromwell/Womtool JARs on `~/.caper/`. Downloading those files can take up to 10 minutes. Once they are installed, Caper can completely work offline with local data files.
3) Choose a backend from the following table and initialize Caper. This will create a default Caper configuration file `~/.caper/default.conf`, which has only required parameters for each backend. `caper init` will also install Cromwell/Womtool JARs in `~/.caper/`. Downloading those files can take up to 10 minutes. Once they are installed, Caper can work completely offline with local data files.
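
As a hypothetical illustration of what `caper init slurm` might leave in `~/.caper/default.conf` — the keys shown are examples drawn from options mentioned elsewhere in these docs, not a verified or exhaustive listing:

```ini
# ~/.caper/default.conf (illustrative sketch for the slurm backend)
backend=slurm

# SLURM partition to submit tasks to (site-specific)
slurm-partition=normal

# local directory for intermediate/localized files
local-loc-dir=/path/to/scratch/caper_tmp
```

After editing the required parameters for your site, `caper run` and `caper server` pick this file up automatically.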

**Backend**|**Description**
:--------|:-----
@@ -51,7 +56,7 @@ See [this](scripts/aws_caper_server/README.md) for details.

## Docker, Singularity and Conda

For local backends (`local`, `slurm`, `sge`, `pbs` and `lsf`), you can use `--docker`, `--singularity` or `--conda` to run WDL tasks in a pipeline within one of these environment. For example, `caper run ... --singularity docker://ubuntu:latest` will run each task within a Singularity image built from a docker image `ubuntu:latest`. These parameters can also be used as flags. If used as a flag, Caper will try to find a default docker/singularity/conda in WDL. e.g. All ENCODE pipelines have default docker/singularity images defined within WDL's meta section (under key `caper_docker` or `default_docker`).
For local backends (`local`, `slurm`, `sge`, `pbs` and `lsf`), you can use `--docker`, `--singularity` or `--conda` to run WDL tasks in a pipeline within one of these environments. For example, `caper run ... --singularity docker://ubuntu:latest` will run each task within a Singularity image built from a docker image `ubuntu:latest`. These parameters can also be used as flags. If used as a flag, Caper will try to find a default docker/singularity/conda in WDL. Pipelines can define default docker/singularity images within WDL's meta section (under key `caper_docker` or `default_docker`).
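
For pipelines that define a default image, the WDL meta section mentioned above might look like the following sketch; the workflow name and image tag are hypothetical:

```wdl
workflow my_pipeline {
    meta {
        # picked up by Caper when --docker/--singularity is used as a flag
        caper_docker: "example/pipeline:1.0"
        default_docker: "example/pipeline:1.0"
    }
}
```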

> **IMPORTANT**: Docker/singularity/conda defined in Caper's configuration file or in CLI (`--docker`, `--singularity` and `--conda`) will be overriden by those defined in WDL task's `runtime`. We provide these parameters to define default/base environment for a pipeline, not to override on WDL task's `runtime`.

13 changes: 0 additions & 13 deletions bin/caper

This file was deleted.

4 changes: 3 additions & 1 deletion caper/__init__.py
@@ -1,5 +1,7 @@
"""Caper - Cromwell Assisted Pipeline ExecutoR."""

from .caper_client import CaperClient, CaperClientSubmit
from .caper_runner import CaperRunner

__all__ = ['CaperClient', 'CaperClientSubmit', 'CaperRunner']
__version__ = '2.3.2'
__version__ = '3.0.0'
2 changes: 2 additions & 0 deletions caper/__main__.py
@@ -1,3 +1,5 @@
"""Entry point for running caper as a module."""

from . import cli

if __name__ == '__main__':