This project facilitates deploying the AWS Batch infrastructure needed to run Nextflow.
Below is a summary diagram:
- Terraform installed
- AWS CLI installed and configured with the appropriate profile
Note that you'll need AWS CLI version 2. A profile can be configured with:
aws configure sso
Importantly, the profile needs the following access.
Required AWS Managed policies:
- AmazonEC2FullAccess
- AmazonECS_FullAccess
- AmazonS3FullAccess
- AWSBatchFullAccess
- AWSImageBuilderFullAccess
- IAMFullAccess
Note: These permissions are broad and should be refined for production environments. The setup includes a Nextflow-specific user with more restricted permissions.
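As a quick sanity check of the AWS CLI version requirement above, here is a minimal sketch. The helper name `is_aws_cli_v2` is hypothetical (it is not part of this repository), and the check assumes the usual `aws-cli/<version>` prefix printed by `aws --version`.

```shell
# Hypothetical helper: succeeds when the given `aws --version` output
# reports major version 2 (assumes the usual "aws-cli/<version> ..." prefix).
is_aws_cli_v2() {
  case "$1" in
    aws-cli/2.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run the check only when the AWS CLI is actually installed.
if command -v aws >/dev/null 2>&1; then
  if is_aws_cli_v2 "$(aws --version 2>&1)"; then
    echo "AWS CLI v2 detected"
  else
    echo "AWS CLI v2 is required" >&2
  fi
fi
```

To confirm which identity a profile resolves to, `aws sts get-caller-identity --profile <profile>` can be run afterwards.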
git clone https://github.com/nexomis/terraform_nf_awsbatch.git
cd terraform_nf_awsbatch
Manually create an S3 bucket XXXXXXXXXXXXXXXXXXXX in the appropriate region.
Note: The bucket name and region are configurable in backend.tf.
terraform {
  backend "s3" {
    region = "eu-west-3"
    bucket = "XXXXXXXXXXXXXXXXXXXX"
    key    = "YYYYYYY.tfstate"
  }
}
prefix = "nf_awsbatch_jfouret"
new_tmp_bucket_for_env = "nf-awsbatch-jfouret.tmp"
Run the following commands to initialize and apply the Terraform configuration:
terraform init
terraform apply
Name | Description | Type | Default | Required |
---|---|---|---|---|
aws_profile | AWS profile | string | n/a | yes |
aws_region | AWS region | string | "eu-west-3" | no |
batch_instance_type | List of instance types for AWS Batch | list(string) | [ | no |
batch_volume_iops | IOPS for block storage for Batch instances | number | 6000 | no |
batch_volume_size | Volume size for Batch instances; must be larger than the root volume of the base AMI | number | 1000 | no |
batch_volume_throughput | Throughput (MB/s) for block storage for Batch instances | number | 500 | no |
max_cpus | Maximum number of CPUs | number | 128 | no |
new_tmp_bucket_for_env | The name of a bucket that will be created for temporary data | string | n/a | yes |
prefix | The prefix used for naming | string | n/a | yes |
session_instance_type | Instance type to use for the session (c5n is good for network) | string | "c5n.xlarge" | no |
session_volume_iops | IOPS for block storage for the Session instance | number | 3000 | no |
session_volume_size | Volume size for the Session instance; must be larger than the root volume of the base AMI | number | 100 | no |
session_volume_throughput | Throughput (MB/s) for block storage for the Session instance | number | 125 | no |
tower_access_token | The token from the Seqera platform, used for Wave | string | n/a | yes |
use_fusion | Flag to determine whether to use Fusion or not | bool | false | no |
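The four inputs without a default (aws_profile, prefix, new_tmp_bucket_for_env, tower_access_token) can also be supplied through Terraform's `TF_VAR_<name>` environment variables rather than a variables file. The sketch below assumes this approach; all values are hypothetical placeholders, and `TOWER_ACCESS_TOKEN` is an assumed name for a shell variable that already holds the Seqera token.

```shell
# Required inputs passed via TF_VAR_* environment variables.
# All values below are hypothetical placeholders, not project defaults.
export TF_VAR_aws_profile="my-sso-profile"
export TF_VAR_prefix="nf_awsbatch_demo"
export TF_VAR_new_tmp_bucket_for_env="nf-awsbatch-demo.tmp"

# Assumption: the Seqera token is already available in TOWER_ACCESS_TOKEN;
# it falls back to an empty string when that variable is unset.
export TF_VAR_tower_access_token="${TOWER_ACCESS_TOKEN:-}"
```

Passing the token this way keeps it out of any file that might be committed by mistake.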
Name | Description |
---|---|
private_key | Path of the private key used to connect to the session instance |
public_ip | IP of the session instance to connect to in order to start a pipeline |
username | Username to use with SSH |
First, connect to the EC2 instance with SSH.
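The connection step can be sketched from the three outputs listed above (private_key, username, public_ip). This is only an illustration: the helper `ssh_command` is hypothetical, and it assumes the outputs are plain strings readable with `terraform output -raw` from the directory where `terraform apply` was run.

```shell
# Hypothetical helper: print the SSH command for the session instance.
# $1: private key path, $2: username, $3: public IP
ssh_command() {
  printf 'ssh -i %s %s@%s\n' "$1" "$2" "$3"
}

# Query Terraform only when it is installed and the outputs are readable.
if command -v terraform >/dev/null 2>&1 \
   && terraform output -raw public_ip >/dev/null 2>&1; then
  ssh_command "$(terraform output -raw private_key)" \
              "$(terraform output -raw username)" \
              "$(terraform output -raw public_ip)"
fi
```

The printed command can then be copied and run to open the session. Once connected, a pipeline can be launched, for example: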
NXF_VER=23.10.0 nextflow -c ~/nextflow.config nexomis/primary
I relied on the following resources to build this template:
- Seqera Labs Nextflow and AWS Batch Integration Part 1
- Seqera Labs Nextflow and AWS Batch Integration Part 2
- STAPH-B Public Health Bacterial Bioinformatics Portal
- AWS Open Data Genomics Workflows
Of note, the introduction of Wave and Fusion has changed some things, so using a custom image is less necessary. In addition, I tried to rely on role-defined permissions rather than access keys or secrets.
Establishes a VPC with both public and private subnets across availability zones.
- Options
- Private subnets in AWS Batch with a NAT Gateway per availability zone.
- Private subnets with a single NAT Gateway (default).
- Public subnets without NAT, assigning public IPs to each instance.
- Pricing considerations
- NAT Gateway uptime and data transfer processing fees.
- Cross-zone data transfer costs when using a single NAT.
- Elastic IPv4 charges for NATs or instances in public subnet scenarios.
For more details, see https://aws.amazon.com/vpc/pricing/
Configures the AWS Batch infrastructure, tailored for optimal performance with typical Nextflow use cases.
Facilitates setting up a session with an EC2 instance for launching Nextflow runs. This module includes:
- Creation of an S3 bucket for storing Nextflow intermediate files.
- Provisioning an instance in the same region as AWS Batch for efficient interaction with the S3 bucket.
- A basic Nextflow configuration file for AWS Batch.
- Installation of awscli and mount-s3, with instance roles permitting S3 access.
Serves as an example showcasing the integration of all modules.
Outputs the instance IP and provides generated_key.pem for secure access to the batch session.
A cross-region S3 Gateway has not been set up; see https://repost.aws/knowledge-center/vpc-endpoints-cross-region-aws-services