This Nextflow workflow is designed to process a sample sheet (samplesheet.csv), retrieve files from Synapse based on entityId, and upload them to an AWS S3 bucket.
The workflow consists of two main steps:
- synapse_get: Downloads the files from Synapse using the entityId from the sample sheet.
- cds_upload: Uploads the downloaded files to a specified AWS S3 bucket.
nextflow run ncihtan/nf-cdstransfer --input samplesheet.csv
Parameter | Type | Description |
---|---|---|
input | str | Path to input samplesheet CSV file containing entityId and aws_uri columns. Required. |
take_n | int | Number of samples to process from the samplesheet. Use -1 to process all samples. Default: -1 |
dryrun | bool | If true, adds --dryrun flag to AWS copy commands for testing without actual file transfer. Default: false |
aws_secret_prefix | str | Prefix for AWS credential environment variables. Used to construct variable names like ${aws_secret_prefix}_AWS_ACCESS_KEY_ID. Useful for managing multiple AWS credential sets. Default: "" |
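The take_n semantics in the table can be illustrated with a small Python sketch. The function name `select_samples` is hypothetical and not part of the workflow. Only the rule that -1 means "all samples" comes from the table; taking the first N rows, and rejecting other negative values, are assumptions made for this example.

```python
def select_samples(rows, take_n=-1):
    """Return the rows to process for a given take_n value.

    -1 (the documented default) keeps every row. Any other value is
    assumed to mean "the first take_n rows" for this illustration.
    """
    rows = list(rows)
    if take_n == -1:
        return rows
    if take_n < 0:
        # Assumption: values below -1 are treated as invalid here.
        raise ValueError("take_n must be -1 or a non-negative integer")
    return rows[:take_n]


if __name__ == "__main__":
    samples = ["syn1", "syn2", "syn3"]
    print(select_samples(samples))      # all samples
    print(select_samples(samples, 2))   # first two samples only
```

Combining a small take_n with dryrun is a cheap way to check a new samplesheet before a full transfer.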
The workflow transfers files from Synapse to CDS (Cloud Data Service) in four main steps:
- Read and parse input samplesheet
- Download files from Synapse
- Upload files to CDS S3 bucket
- Generate transfer report
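The steps listed above can be sketched in plain Python to show how data moves between them. This is not the workflow's implementation, which is written in Nextflow. The function `run_transfer` and its `download` and `upload` callables are hypothetical stand-ins, and the report format is an assumption.

```python
import csv
import io


def run_transfer(samplesheet_text, download, upload):
    """Walk the documented steps: parse, download, upload, report.

    `download` takes an entity ID and returns a local path.
    `upload` takes (local path, S3 URI) and returns True on success.
    Both are injected so the sketch runs without network access.
    """
    report = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        # Map samplesheet columns to the internal names used by the workflow.
        meta = {"entityid": row["entityId"], "aws_uri": row["file_url_in_cds"]}
        local_path = download(meta["entityid"])
        uploaded = upload(local_path, meta["aws_uri"])
        report.append({**meta, "uploaded": uploaded})
    return report


if __name__ == "__main__":
    sheet = "entityId,file_url_in_cds\nsyn123456,s3://mybucket/path/to/file\n"
    # Fake transfer functions: no files are actually moved.
    result = run_transfer(
        sheet,
        download=lambda entity_id: f"/tmp/{entity_id}.dat",
        upload=lambda path, uri: True,
    )
    print(result)
```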
Nextflow secrets are used to ensure that tokens, keys and secrets are not exposed.
The following Nextflow secrets should be set:
- SYNAPSE_AUTH_TOKEN: Synapse authentication token.
- <params.aws_secret_prefix>_AWS_ACCESS_KEY_ID: AWS access key ID, e.g. CDS_AWS_ACCESS_KEY_ID
- <params.aws_secret_prefix>_AWS_SECRET_ACCESS_KEY: AWS secret access key, e.g. CDS_AWS_SECRET_ACCESS_KEY
nextflow secrets set SYNAPSE_AUTH_TOKEN <SUPER_SECRET_THING>
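To make the naming rule concrete, here is a minimal Python sketch that derives the two AWS secret names from a prefix, following the documented `${aws_secret_prefix}_AWS_ACCESS_KEY_ID` pattern. The helper `aws_secret_names` is hypothetical. What the workflow does when the prefix is the empty default is not covered here.

```python
def aws_secret_names(aws_secret_prefix):
    """Build the Nextflow secret names expected for a given prefix."""
    return {
        "access_key_id": f"{aws_secret_prefix}_AWS_ACCESS_KEY_ID",
        "secret_access_key": f"{aws_secret_prefix}_AWS_SECRET_ACCESS_KEY",
    }


if __name__ == "__main__":
    # Names to pass to `nextflow secrets set` for the CDS prefix.
    for name in aws_secret_names("CDS").values():
        print(name)
```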
Field | Required | Pattern | Description | Example |
---|---|---|---|---|
entityId | Yes | ^syn\d+$ | Synapse entity ID starting with 'syn' followed by numbers | syn123456 |
file_url_in_cds | Yes | ^s3://.+ | URL to the file location in AWS S3, must start with 's3://' | s3://mybucket/path/to/file |
Notes:
- Additional columns are allowed but not validated
- Both fields are mapped internally:
  - entityId → entityid
  - file_url_in_cds → aws_uri
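A samplesheet can be checked against these patterns before launching the workflow. The following Python sketch applies the two documented regular expressions and the documented field mapping. The function `parse_samplesheet` and its error messages are illustrative assumptions. The workflow itself validates through nf-schema, which may behave differently in detail.

```python
import csv
import io
import re

# Patterns copied from the samplesheet field table.
ENTITY_ID = re.compile(r"^syn\d+$")
S3_URI = re.compile(r"^s3://.+")


def parse_samplesheet(text):
    """Return (valid rows with internal field names, list of error strings)."""
    rows, errors = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        entity_id = (row.get("entityId") or "").strip()
        uri = (row.get("file_url_in_cds") or "").strip()
        if not ENTITY_ID.match(entity_id):
            errors.append(f"line {line_no}: invalid entityId {entity_id!r}")
        elif not S3_URI.match(uri):
            errors.append(f"line {line_no}: invalid file_url_in_cds {uri!r}")
        else:
            # Additional columns are ignored rather than validated.
            rows.append({"entityid": entity_id, "aws_uri": uri})
    return rows, errors


if __name__ == "__main__":
    sheet = (
        "entityId,file_url_in_cds,note\n"
        "syn123456,s3://mybucket/path/to/file,ok\n"
        "abc,s3://mybucket/other,bad id\n"
    )
    valid, problems = parse_samplesheet(sheet)
    print(valid)
    print(problems)
```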
The workflow uses the following plugins:
- nf-schema: For parameter validation and schema management
- nf-boost: For enhanced functionality and utilities
The included nextflow.config file specifies the following default options. These are used if not overridden by a custom config or profile.
docker.enabled = true
The nextflow.config file defines several profiles to customize the workflow execution. Below are the available profiles and the parameters/settings they configure:
Setting / Profile | test | CDS | local | docker | tower |
---|---|---|---|---|---|
params.input | $projectDir/samplesheet.csv | - | - | - | - |
params.aws_secret_prefix | TEST | CDS | - | - | - |
params.dryrun | true | - | - | - | - |
docker.enabled | true | true | true | true | true |
process.executor | local | - | local | - | - |
process.cpus | - | - | - | 1 * task.attempt | - |
process.memory | - | - | - | 1.GB * task.attempt | - |
process.maxRetries | - | - | - | 3 | - |
process.errorStrategy | - | - | - | - | retry |
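The retry-related settings in the table scale resources with each attempt. The Python sketch below shows the resulting values, assuming the settings are applied together and that task.attempt starts at 1, as it does in Nextflow. The function `resources_for_attempt` is a hypothetical helper, not something the workflow defines.

```python
MAX_RETRIES = 3  # process.maxRetries from the profile table


def resources_for_attempt(attempt):
    """Mirror `1 * task.attempt` CPUs and `1.GB * task.attempt` memory."""
    return {"cpus": 1 * attempt, "memory_gb": 1 * attempt}


if __name__ == "__main__":
    # One initial run plus up to MAX_RETRIES resubmissions.
    for attempt in range(1, MAX_RETRIES + 2):
        print(attempt, resources_for_attempt(attempt))
```

Under these assumptions a task that keeps failing would make its final attempt with 4 CPUs and 4 GB of memory.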
synapse_get: Downloads files from Synapse using entityIds.
- Input: meta, an object containing entityId and aws_uri
- Output: Tuple of (meta, downloaded file path)
- Requires SYNAPSE_AUTH_TOKEN secret
- Uses synapsepythonclient container
cds_upload: Uploads downloaded files to CDS S3 bucket.
- Input: Tuple of (meta, file path) from synapse_get
- Output: Tuple of (meta, upload success boolean)
- Requires AWS credentials:
  - ${aws_secret_prefix}_AWS_ACCESS_KEY_ID
  - ${aws_secret_prefix}_AWS_SECRET_ACCESS_KEY
- Uses AWS CLI container
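The effect of the dryrun parameter on the upload step can be sketched as follows. This example assumes the copy is performed with `aws s3 cp`. The source only says that --dryrun is added to "AWS copy commands", so the exact command the workflow runs may differ. The function `build_upload_command` is hypothetical.

```python
def build_upload_command(file_path, aws_uri, dryrun=False):
    """Assemble an AWS CLI copy command as an argument list."""
    cmd = ["aws", "s3", "cp", file_path, aws_uri]
    if dryrun:
        # With --dryrun the AWS CLI reports what it would do without copying.
        cmd.append("--dryrun")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_upload_command(
        "file.dat", "s3://mybucket/path/to/file", dryrun=True)))
```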
No specific outputs are generated by the workflow.
By default, a trace file is saved to reports/trace.csv.
- Ensure Nextflow is installed.
- Ensure you have access to the necessary containers (synapseclient, awscli).
- Ensure you have the appropriate credentials for Synapse and AWS.
Run the workflow with the following command:
nextflow run ncihtan/nf-cdstransfer --input path/to/samplesheet.csv
The test profile uses the samplesheet.csv stored in your projectDir. Generate your own samplesheet and use the aws_secret_prefix TEST when setting the relevant AWS Nextflow secrets:
nextflow run ncihtan/nf-cdstransfer -profile test
To avoid having to reset secrets when moving between destination accounts, you can set your secrets using a prefix:
nextflow secrets set MYCREDS_AWS_ACCESS_KEY_ID <YOUR_ACCESS_KEY_ID>
nextflow secrets set MYCREDS_AWS_SECRET_ACCESS_KEY <YOUR_SECRET_ACCESS_KEY>
nextflow run ncihtan/nf-cdstransfer --aws_secret_prefix MYCREDS
Alternatively, use a configured profile in which params.aws_secret_prefix is set:
nextflow run ncihtan/nf-cdstransfer -profile CDS --input samplesheet.csv