Terraform Template for Nextflow AWS Batch

Introduction

This project facilitates the deployment of the AWS Batch infrastructure needed to run Nextflow.

Below is a summary diagram:

[Figure: summary diagram]

Prerequisites

  • Terraform installed
  • AWS CLI installed and configured with the appropriate profile

Note that AWS CLI version 2 is required. Configure an SSO profile with:

aws configure sso
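Before running Terraform, it can be worth checking that the profile actually resolves to valid credentials. A minimal sketch, assuming you pass your own profile name; the function name is illustrative, while `aws sso login` and `aws sts get-caller-identity` are standard AWS CLI v2 commands:

```shell
# Log in through SSO, then print the identity the profile resolves to.
# verify_profile is a hypothetical helper name, not part of this repository.
verify_profile() {
  local profile="$1"
  aws sso login --profile "$profile" &&
    aws sts get-caller-identity --profile "$profile"
}
```

For example, `verify_profile my-sso-profile` should print the account and role that Terraform will use.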

Importantly, the profile needs the following access:

Required AWS Managed policies:

  • AmazonEC2FullAccess
  • AmazonECS_FullAccess
  • AmazonS3FullAccess
  • AWSBatchFullAccess
  • AWSImageBuilderFullAccess
  • IAMFullAccess

Note: These permissions are broad and should be refined for production environments. The setup includes a Nextflow-specific user with more restricted permissions.

Configuration

1. Clone the repository

git clone https://github.com/nexomis/terraform_nf_awsbatch.git
cd terraform_nf_awsbatch

2. (Optional) Create a backend.tf file if you want to use remote state storage with S3

Manually create an S3 bucket XXXXXXXXXXXXXXXXXXXX in the appropriate region.

Note: Bucket name and region are configurable in backend.tf.

terraform {
  backend "s3" {
    region = "eu-west-3"
    bucket = "XXXXXXXXXXXXXXXXXXXX"
    key    = "YYYYYYY.tfstate"
  }
}
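The manual bucket creation mentioned above can also be done from the AWS CLI. A minimal sketch, assuming you supply the bucket name, region and profile yourself; the function name is illustrative, and enabling versioning is a common precaution for state buckets rather than something this template requires:

```shell
# Create the S3 bucket for the remote state and enable versioning on it,
# so that earlier copies of the state file stay recoverable.
# Note: outside us-east-1 the region must also be passed as a
# LocationConstraint; in us-east-1 that option must be omitted.
create_state_bucket() {
  local bucket="$1" region="$2" profile="$3"
  aws s3api create-bucket \
    --bucket "$bucket" \
    --region "$region" \
    --create-bucket-configuration "LocationConstraint=$region" \
    --profile "$profile" &&
    aws s3api put-bucket-versioning \
      --bucket "$bucket" \
      --versioning-configuration Status=Enabled \
      --profile "$profile"
}
```

For example, `create_state_bucket <your-bucket> eu-west-3 <your-profile>`, using the same bucket name and region as in backend.tf.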

3. Create a main.auto.tfvars file to specify variables with no default values. Here is an example:

prefix = "nf_awsbatch_jfouret"
new_tmp_bucket_for_env = "nf-awsbatch-jfouret.tmp"
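The Inputs table also lists `aws_profile` and `tower_access_token` as required. They can be added to main.auto.tfvars; alternatively, Terraform reads any environment variable named `TF_VAR_<name>`, which keeps the token out of files on disk. A minimal sketch, with a hypothetical helper name and values that you provide:

```shell
# Export the two remaining required inputs as TF_VAR_* environment
# variables, which Terraform maps to the variables of the same name.
# export_required_inputs is an illustrative name, not part of this repository.
export_required_inputs() {
  local profile="$1" token="$2"
  export TF_VAR_aws_profile="$profile"
  export TF_VAR_tower_access_token="$token"
}
```

For example, `export_required_inputs my-sso-profile "$TOWER_ACCESS_TOKEN"`, where both the profile name and the `TOWER_ACCESS_TOKEN` variable are placeholders.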

Usage

Quick start

Run the following commands to initialize and apply the Terraform configuration:

terraform init
terraform apply
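If you prefer to review the changes before anything is provisioned, the plan can be saved to a file and applied in a second step. A minimal sketch; the function names and the plan file name `tfplan` are illustrative choices:

```shell
# Step 1: initialize the working directory and save a plan for review.
plan_deployment() {
  terraform init &&
    terraform plan -out=tfplan
}

# Step 2: apply exactly the plan that was saved and reviewed.
apply_deployment() {
  terraform apply tfplan
}
```

Run `plan_deployment`, read the proposed changes, then run `apply_deployment`.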

Inputs


| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| aws_profile | AWS profile | string | n/a | yes |
| aws_region | AWS region | string | "eu-west-3" | no |
| batch_instance_type | List of instance types for AWS Batch | list(string) | ["r5a.4xlarge", "r5a.8xlarge", "r5.4xlarge", "r5.8xlarge", "m5a.4xlarge", "m5a.8xlarge", "m5.4xlarge", "m5.8xlarge", "c5a.4xlarge", "c5a.8xlarge", "c5.4xlarge", "c5.8xlarge"] | no |
| batch_volume_iops | IOPS for block storage for Batch instances | number | 6000 | no |
| batch_volume_size | Volume size for Batch instances; must be larger than the root volume of the base AMI | number | 1000 | no |
| batch_volume_throughput | Throughput (MB/s) for block storage for Batch instances | number | 500 | no |
| max_cpus | Max number of CPUs | number | 128 | no |
| new_tmp_bucket_for_env | The name of a bucket that will be created for tmp data | string | n/a | yes |
| prefix | The prefix for naming | string | n/a | yes |
| session_instance_type | Instance type to use for the session (c5n is good for networking) | string | "c5n.xlarge" | no |
| session_volume_iops | IOPS for block storage for the Session instance | number | 3000 | no |
| session_volume_size | Volume size for the Session instance; must be larger than the root volume of the base AMI | number | 100 | no |
| session_volume_throughput | Throughput (MB/s) for block storage for the Session instance | number | 125 | no |
| tower_access_token | The token from the Seqera Platform, needed to use Wave | string | n/a | yes |
| use_fusion | Flag to determine whether to use Fusion or not | bool | false | no |

Outputs

| Name | Description |
|------|-------------|
| private_key | Path of the private key to connect to the session instance |
| public_ip | IP of the session instance to connect to in order to start a pipeline |
| username | Username to use with SSH |
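These three outputs are enough to open an SSH session on the session instance. A minimal sketch, assuming the outputs are readable from the directory where `terraform apply` was run; the function name is illustrative, and `terraform output -raw` is the standard way to print a single output value without quotes:

```shell
# Read the key path, IP and username from the Terraform outputs,
# then open an SSH session on the session instance.
# connect_session is a hypothetical helper name, not part of this repository.
connect_session() {
  local key ip user
  key="$(terraform output -raw private_key)" || return 1
  ip="$(terraform output -raw public_ip)" || return 1
  user="$(terraform output -raw username)" || return 1
  ssh -i "$key" "${user}@${ip}"
}
```

If SSH rejects the key because its permissions are too open, restricting the file with `chmod 600` usually resolves it.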

Run a Nextflow pipeline

First, connect to the EC2 instance with SSH, then launch the pipeline:

NXF_VER=23.10.0 nextflow -c ~/nextflow.config run nexomis/primary

Inspirations

I followed these resources to build this template:

Of note, the introduction of Wave and Fusion has changed some things, so using an image is less necessary. In addition, I tried to rely on role-defined permissions rather than access keys or secrets.

Modules

Module: nf_awsbatch_network

Establishes a VPC with both public and private subnets across availability zones.

  • Options
    1. Private subnets in AWS Batch with a NAT Gateway per availability zone.
    2. Private subnets with a single NAT Gateway (default).
    3. Public subnets without NAT, assigning public IPs to each instance.
  • Pricing considerations
    • NAT Gateway uptime and data transfer processing fees.
    • Cross-zone data transfer costs when using a single NAT.
    • Elastic IPv4 charges for NATs or instances in public subnet scenarios.

For more details, see https://aws.amazon.com/vpc/pricing/

Module: nf_awsbatch_batch

Configures the AWS Batch infrastructure, tailored for optimal performance with typical Nextflow use cases.

Module: nf_awsbatch_session

Facilitates setting up a session with an EC2 instance for launching Nextflow runs. This module includes:

  • Creation of an S3 bucket for storing Nextflow intermediate files.
  • Provisioning an instance in the same region as AWS Batch for efficient interaction with the S3 bucket.
  • A basic Nextflow configuration file for AWS Batch.
  • Installation of awscli and mount-s3, with instance roles permitting S3 access.
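Once logged in to the session instance, the two installed tools can be used to check access to the bucket. A minimal sketch, assuming you pass the bucket name (for instance the value of `new_tmp_bucket_for_env`) and a mount directory of your choice; the function name is illustrative, and `mount-s3 <bucket> <directory>` is the basic invocation of Mountpoint for Amazon S3:

```shell
# List the bucket with the AWS CLI, then mount it with mount-s3.
# Both commands take their credentials from the instance role.
# mount_bucket is a hypothetical helper name, not part of this repository.
mount_bucket() {
  local bucket="$1" mount_dir="$2"
  aws s3 ls "s3://${bucket}" &&
    mkdir -p "$mount_dir" &&
    mount-s3 "$bucket" "$mount_dir"
}
```

For example, `mount_bucket <your-tmp-bucket> ~/s3tmp` makes the bucket contents browsable under `~/s3tmp`.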

Template main.tf

Serves as an example showcasing the integration of all modules.

It outputs the instance IP and provides generated_key.pem for a secure connection to the batch session.

Known limitation

Cross-region S3 gateway endpoints have not been set up. See https://repost.aws/knowledge-center/vpc-endpoints-cross-region-aws-services