Template standardization #10

Merged: 101 commits, Nov 17, 2023
Commits
3153c9f
first commit
Dec 2, 2022
b47872f
dummy ex process
Dec 2, 2022
949e701
row split
Dec 2, 2022
ec2fe69
row split
Dec 2, 2022
68c0840
sed g fix
Dec 2, 2022
4a7bf21
hpo list
Dec 5, 2022
88a2af6
vcf path rm ln
Dec 5, 2022
d41c9ac
proband placeholder
Dec 6, 2022
1de6293
proband placeholder
Dec 6, 2022
6190d7c
update exomiser|PED|parser_py
Dec 6, 2022
b1ec0c4
remove dsl1 param
Dec 6, 2022
a8c7625
readlink PED file
Dec 6, 2022
2e16bd0
sed colon
Dec 6, 2022
0767c00
sed colon fix
Dec 6, 2022
a13a382
ped input by cat
Dec 7, 2022
dc895e0
file ped no proband
Dec 7, 2022
9effa15
Ped_parser container added
Dec 7, 2022
7886f44
dockerhub test
Dec 7, 2022
08814b6
fix ped cmd bash error
Dec 7, 2022
ef381cc
fix ped cmd bash error
Dec 7, 2022
cbe4d20
add quay image
Dec 7, 2022
5e5ff75
collect input
Dec 8, 2022
0a13943
mistaken deletion fix
Dec 8, 2022
fc1b22f
add maxforks
Dec 8, 2022
819d08d
ls it
Dec 8, 2022
2f07bda
wildcard input collect
Dec 9, 2022
64ea1d9
rm collect
Dec 9, 2022
eeb3ab2
split channels
Dec 9, 2022
83e87cf
remove exomiser data
Dec 9, 2022
3982aba
add exomiser dat
Dec 9, 2022
9349889
add exomiser dat
Dec 9, 2022
090be19
final
Dec 9, 2022
3971fdc
final2
Dec 9, 2022
4e8d2c9
gunzip basename
Dec 9, 2022
9b72d06
gunzip basename
Dec 9, 2022
93e6eb3
HPO fix again
Dec 9, 2022
8a9ee12
proband id channel
Dec 11, 2022
66d6e98
fix if ch
Dec 11, 2022
e285a8b
add index
Dec 12, 2022
222f065
add retry
Dec 12, 2022
7557f5f
copy file
Dec 12, 2022
8b9bf8e
copy file
Dec 12, 2022
05b2eab
gunzip file
Dec 12, 2022
275934d
tuple input output
Dec 13, 2022
330319a
debug local
Dec 13, 2022
049b25c
fork1
R-Cardenas Dec 13, 2022
fcd92da
rm ln hash
Dec 13, 2022
4accd6e
fork 5 delay submit
Dec 14, 2022
02f3a89
uncomment out exomiser commands - fork =1
R-Cardenas Jan 10, 2023
ef6f2ad
add container options
R-Cardenas Jan 11, 2023
9e13023
change rate
R-Cardenas Jan 11, 2023
e318df6
remove nans from ped
R-Cardenas Jan 11, 2023
afd2672
rate limit 5m
R-Cardenas Jan 12, 2023
ecdcb3c
retries
R-Cardenas Jan 12, 2023
b70b904
rm symbolic link
R-Cardenas Jan 12, 2023
89707a0
force symbolic link
R-Cardenas Jan 12, 2023
e557a63
size vcf symbolic
R-Cardenas Jan 12, 2023
353f1fa
stat symbolic link
R-Cardenas Jan 12, 2023
76fccdb
join channels
Jan 16, 2023
85617be
realign params
Jan 16, 2023
a772ebc
ln svf
Jan 16, 2023
103bd00
update yaml conf
Jan 16, 2023
e475cf8
Update main.nf
sk-sahu May 30, 2023
22ea0c6
batch run fix
sk-sahu Jul 17, 2023
7df8854
added the ped_module script in the bin and removed the hardcoded path
l-mansouri Oct 26, 2023
d4cf4ec
added testing_family_file
l-mansouri Oct 26, 2023
b8d585e
changed paths in the family file
l-mansouri Oct 27, 2023
b87088c
changed the sep of the family file
l-mansouri Oct 27, 2023
199af4a
removed family file as it is in S3
l-mansouri Oct 27, 2023
4895370
fixed pipeline bugs in testing
l-mansouri Oct 27, 2023
8544610
fixed bugs in testing
l-mansouri Oct 27, 2023
7146570
added documentation for the pipeline
l-mansouri Oct 27, 2023
0b7191d
fixed url in documentation for the pipeline
l-mansouri Oct 27, 2023
6b41db0
added default paths to files in S3 bucket
l-mansouri Oct 30, 2023
ba522c6
fixed typo
l-mansouri Oct 30, 2023
2223fa2
changed the docker for bug on platform
l-mansouri Oct 30, 2023
8151ee9
changed bundle data path
l-mansouri Oct 31, 2023
3c05507
commented debug code
l-mansouri Oct 31, 2023
d170d62
Merge branch 'dev' into remove-additional-line
l-mansouri Oct 31, 2023
1c98cfa
removed_js_files
l-mansouri Oct 31, 2023
da6c0fe
Merge branch 'remove-additional-line' of https://github.com/lifebit-a…
l-mansouri Oct 31, 2023
596ac16
change name of the profiles
l-mansouri Oct 31, 2023
5672654
fixed typo in documentation
l-mansouri Oct 31, 2023
39cb1e1
changed channel names
l-mansouri Oct 31, 2023
028b695
outputs MultiQC html into outdir
l-mansouri Oct 31, 2023
6d381ae
removing directive from process and adding them to the config file
l-mansouri Oct 31, 2023
d15b8bd
added usage command line with params and defaults
l-mansouri Oct 31, 2023
668993d
parametrized resources
l-mansouri Oct 31, 2023
7e5ce7e
fix typo
l-mansouri Oct 31, 2023
02de25b
fix typo
l-mansouri Oct 31, 2023
4abc11d
fix typo
l-mansouri Oct 31, 2023
6c77128
fix typo
l-mansouri Nov 2, 2023
1491d41
fix typo
l-mansouri Nov 2, 2023
5ef94b8
moved submitRateLimit back to the process
l-mansouri Nov 2, 2023
b266fee
added testing profile for multi_hpo
l-mansouri Nov 2, 2023
0e617b6
fixed README
l-mansouri Nov 6, 2023
fe46b3f
fix typo
l-mansouri Nov 7, 2023
88b3466
changed configs for template standardization
l-mansouri Nov 17, 2023
84e1fbc
moved test config files
l-mansouri Nov 17, 2023
e2db14b
changed README to reflect changes
l-mansouri Nov 17, 2023
8dfa2e9
fix merge conflict with dev
l-mansouri Nov 17, 2023
23 changes: 16 additions & 7 deletions README.md
@@ -1,5 +1,7 @@
# Exomiser

# Exomiser

## Pipeline documentation

Table of contents
@@ -62,7 +64,7 @@ This is a file needed by exomiser to run. It contains placeholders in the text t

### --exomiser_data

This path refers to the reference data bundle needed by exomiser (~120 GB!). A copy of such files can be found [here](https://lifebit-featured-datasets.s3.eu-west-1.amazonaws.com/pipelines/exomiser-data-bundle/) . The reference dataset has been added as a parameter, allowing flexibility to pull the data from any resource (i.e. cloud, local storage, ftp, ...) and Nextlfow will automatically take care of fetching the data without having to add anything to the pipeline itself.
This path refers to the reference data bundle needed by exomiser (~120 GB!). A copy of such files can be found [here](https://lifebit-featured-datasets.s3.eu-west-1.amazonaws.com/pipelines/exomiser-data-bundle/) . The reference dataset has been added as a parameter, allowing flexibility to pull the data from any resource (i.e. cloud, local storage, ftp, ...) and Nextflow will automatically take care of fetching the data without having to add anything to the pipeline itself.

There are other parameters that can be tweaked to personalize the behaviour of the pipeline. These are referenced in `nextflow.config`
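
As a minimal sketch of the flexibility described above, `--exomiser_data` could be pointed at a local copy of the bundle through a small extra config file. The file name `local_data.config` and the path `/mnt/reference/exomiser-data-bundle` are assumptions made for illustration, not values taken from the pipeline:

```groovy
// local_data.config (hypothetical file name)
params {
    // Hypothetical local copy of the ~120 GB Exomiser data bundle;
    // any location Nextflow can read (S3, FTP, local path) should work here.
    exomiser_data = '/mnt/reference/exomiser-data-bundle'
}
```

Such a file would be passed with Nextflow's `-c` option, for example `nextflow run main.nf -c local_data.config`, so the default in `nextflow.config` stays untouched.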

@@ -98,13 +100,19 @@ To run the pipeline with `docker` (used by default), type the following commands
To test the pipeline on a multi-VCF:

```
nextflow run main.nf -profile family_test
nextflow run main.nf -profile test_full_family
```

or

```
nextflow run main.nf -profile test_full_multi_hpo
```

To test the pipeline on a single-sample VCF:

```
nextflow run main.nf -profile single_vcf_test
nextflow run main.nf -profile test_full_single_vcf
```

Be careful when running this, as the pipeline requires the staging of 120 GB of reference data, required by exomiser, so only that takes a while!
@@ -113,7 +121,8 @@ Be careful when running this, as the pipeline requires the staging of 120 GB of

### Profiles

| profile name | Run locally | Run on CloudOS | description |
| :-------------: | :--------------------------------------------------------------------: | :------------: | :-----------------------------------------------------------------------------: |
| test_family | the data required is so big, it was tested on a c5.4xlarge EC2 machine | Successful | this test is designed to test the pipeline on a multi-VCF with trio information |
| test_single_vcf | the data required is so big, it was tested on a c5.4xlarge EC2 machine | Successful | this test is designed to test the pipeline on a single-sample-VCF |
| profile name | Run locally | Run on CloudOS | description |
| :------------------: | :--------------------------------------------------------------------: | :------------: | :------------------------------------------------------------------------------------------------------: |
| test_full_family | the data required is so big, it was tested on a c5.4xlarge EC2 machine | Successful | this test is designed to test the pipeline on a multi-VCF with trio information |
| test_full_single_vcf | the data required is so big, it was tested on a c5.4xlarge EC2 machine | Successful | this test is designed to test the pipeline on a single-sample-VCF |
| test_full_multi_hpo | the data required is so big, it was tested on a c5.4xlarge EC2 machine | Successful | this test is designed to test the pipeline on a multi-VCF with trio information using multiple HPO terms |
3 changes: 3 additions & 0 deletions conf/containers/dockerhub.config
@@ -0,0 +1,3 @@
// params {
// main_container = 'dockerhub.io/lifebitaiorg/report:latest'
// }
5 changes: 5 additions & 0 deletions conf/containers/ecr.config
@@ -0,0 +1,5 @@
// params {
// main_container = "https://${params.aws_account_id}.dkr.ecr.${params.aws_region}.amazonaws.com/lifebitaiorg/report:latest"
// }
// ECR pattern:
// https://aws_account_id.dkr.ecr.region.amazonaws.com/lifebitaiorg/tool:version
4 changes: 2 additions & 2 deletions conf/containers/quay.config
@@ -1,3 +1,3 @@
params {
main_container = 'quay.io/lifebitai/exomiser:12.1.0'
}
main_container = "quay.io/lifebitai/exomiser:${params.exomiser_container_tag}"
}
37 changes: 37 additions & 0 deletions conf/customised_pipeline_resources.config
@@ -0,0 +1,37 @@
params {

// process resources default
cpus = 1
memory = 2.GB
time = 8.h // do not change

// max resources limits defaults
max_cpus = 8
max_memory = 60.GB
max_time = 300.h // do not change

// process_micro defaults
micro_memory = 2.GB
micro_cpus = 1

// process_small defaults
small_memory = 4.GB
small_cpus = 2

// process_medium defaults
medium_memory = 6.GB
medium_cpus = 4

// process_large defaults
large_memory = 15.GB
large_cpus = 4

// other parameters
echo = false
errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'terminate' }
maxErrors = -1
maxRetries = 3
maxForks = 200
queueSize = 200

}
6 changes: 6 additions & 0 deletions conf/data/data.config
@@ -1 +1,7 @@
// If there is any data that needs to be included in the config, it should be placed here using "${params.reference_data_bucket}/path/to/data"
params {
exomiser_data = "${params.reference_data_bucket}/pipelines/exomiser-data-bundle"
application_properties = "${params.reference_data_bucket}/pipelines/exomiser-nf/application.properties"
auto_config_yml = "${params.reference_data_bucket}/pipelines/exomiser-nf/auto_config_v2.yml"

}
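
Note on quoting: Groovy interpolates `${...}` only inside double-quoted strings, so any value built from `params.reference_data_bucket` needs double quotes to resolve to a real path. The sketch below is a standalone Groovy script, not part of the pipeline; the variable `bucket` is a hypothetical stand-in for `params.reference_data_bucket`:

```groovy
// Hypothetical stand-in for params.reference_data_bucket
def bucket = 's3://lifebit-featured-datasets'

// Double quotes: a GString, so ${bucket} is replaced by its value
def interpolated = "${bucket}/pipelines/exomiser-nf/auto_config_v2.yml"

// Single quotes: a plain String, so the text ${bucket} is kept literally
def literal = '${bucket}/pipelines/exomiser-nf/auto_config_v2.yml'

assert interpolated.toString() == 's3://lifebit-featured-datasets/pipelines/exomiser-nf/auto_config_v2.yml'
assert literal.startsWith('${bucket}')
```

A single-quoted value would therefore reach the pipeline as a path that begins with the literal text `${params.reference_data_bucket}` rather than with the bucket URL.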
20 changes: 20 additions & 0 deletions conf/executors/singularity.config
@@ -0,0 +1,20 @@
/*
* -------------------------------------------------
* Nextflow config file for running pipeline with Singularity locally
* -------------------------------------------------
* Base config needed for running with -profile singularity
*/

params {
singularity_cache = "local_singularity_cache"
}

singularity {
enabled = true
cacheDir = params.singularity_cache
autoMounts = true
}

docker {
enabled = false
}
File renamed without changes.
7 changes: 7 additions & 0 deletions conf/tests/full/test_full_family.config
@@ -0,0 +1,7 @@
params {
families_file = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/fam_file.tsv'
prioritisers = 'hiPhivePrioritiser'
exomiser_data = "s3://lifebit-featured-datasets/pipelines/exomiser-data-bundle"
application_properties = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/application.properties'
auto_config_yml = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/auto_config.yml'
}
7 changes: 7 additions & 0 deletions conf/tests/full/test_full_multi_hpo.config
@@ -0,0 +1,7 @@
params {
families_file = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/fam_file_multi_hpo.tsv'
prioritisers = 'hiPhivePrioritiser'
exomiser_data = "s3://lifebit-featured-datasets/pipelines/exomiser-data-bundle"
application_properties = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/application.properties'
auto_config_yml = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/auto_config.yml'
}
8 changes: 8 additions & 0 deletions conf/tests/full/test_full_single_vcf.config
@@ -0,0 +1,8 @@
params {
families_file = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/single_vcf.tsv'
hpo_terms_file = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/hpo_terms_file.txt'
prioritisers = 'hiPhivePrioritiser'
exomiser_data = "s3://lifebit-featured-datasets/pipelines/exomiser-data-bundle"
application_properties = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/application.properties'
auto_config_yml = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/auto_config.yml'
}
49 changes: 36 additions & 13 deletions nextflow.config
@@ -2,12 +2,16 @@ manifest {
name = 'lifebit-ai/exomiser-nf'
description = 'A pipeline to perform variant prioritisation'
mainScript = 'main.nf'
version = 'v1.0'
version = 'v2.0'
}

includeConfig 'conf/customised_pipeline_resources.config'

docker.enabled = true

params {
raci_owner = "Lifebit"

// Exomiser specific parameters
reference_data_bucket = "s3://lifebit-featured-datasets"
bucket_pattern = "lifebit-featured-datasets"
@@ -23,8 +27,8 @@ params {
exomiser_phenotype_data = 's3://lifebit-featured-datasets/pipelines/exomiser/very_fake/2102_phenotype'
cadd_snvs = 's3://lifebit-featured-datasets/pipelines/exomiser/very_fake/cadd_snvs'
phenix_data = 's3://lifebit-featured-datasets/pipelines/exomiser/very_fake/phenix'
application_properties = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/application.properties'
auto_config_yml = 's3://lifebit-featured-datasets/pipelines/exomiser-nf/auto_config_v2.yml'
application_properties = "${params.reference_data_bucket}/pipelines/exomiser-nf/application.properties"
auto_config_yml = "${params.reference_data_bucket}/pipelines/exomiser-nf/auto_config_v2.yml"
hpo_terms_file = false
modes_of_inheritance = 'AUTOSOMAL_DOMINANT,AUTOSOMAL_RECESSIVE,X_RECESSIVE,UNDEFINED'
prioritisers = 'hiPhivePrioritiser,phivePrioritiser,phenixPrioritiser'
@@ -48,11 +52,16 @@ params {
exomiser_container_tag = '12.1.0'
cloudos_cli_container_tag = '0.0.2'

// awsbatch specific
aws_batch_process_queue = null
aws_batch_cli_path = '/home/ec2-user/miniconda/bin/aws'
aws_batch_fetch_instance_type = true
aws_region = 'ap-east-1'
queueSize = 200
executor = false

// AWS batch
aws_region = 'eu-west-1'
aws_batch_default_queue = "optimal-instance-1tb-ami-on-demand-queue"
aws_batch_cli_path = '/home/ec2-user/miniconda/bin/aws'
aws_batch_fetch_instance_type = true
aws_batch_max_parallel_transfers = 2
aws_batch_volumes = '/home/ec2-user/.aws:/root/.aws'

//process resources
memory = 6.GB
@@ -63,25 +72,33 @@ params {
maxRetries = 3
}

includeConfig 'conf/containers/quay.config'
includeConfig 'conf/data/data.config' // Loads in data


profiles {
standard { includeConfig params.config }
test_family { includeConfig 'conf/family_test.config' }
test_single_vcf { includeConfig 'conf/single_vcf_test.config' }
test_multi_hpo { includeConfig 'conf/multi_hpo_test.config' }
awsbatch { includeConfig 'conf/executors/awsbatch.config' }
eu_west_1 { includeConfig 'conf/cloud-region/eu_west_1.config' }
eu_west_2 { includeConfig 'conf/cloud-region/eu_west_2.config' }
test_full { includeConfig "conf/tests/full/test_full.config" }
ci_test_data { includeConfig "conf/data/ci_test_data.config" }
test_full_family { includeConfig 'conf/tests/full/test_full_family.config' }
test_full_single_vcf { includeConfig 'conf/tests/full/test_full_single_vcf.config' }
test_full_multi_hpo { includeConfig 'conf/tests/full/test_full_multi_hpo.config' }
ci_test_data { includeConfig "conf/tests/ci/ci_test_data.config" }
singularity { includeConfig 'conf/executors/singularity.config' }
dockerhub { includeConfig 'conf/containers/dockerhub.config' }
quay { includeConfig 'conf/containers/quay.config' }
ecr { includeConfig 'conf/containers/ecr.config' }
}

includeConfig 'conf/resources.config'

process {
echo = params.echo
errorStrategy = params.errorStrategy
withName: exomiser {
container = "quay.io/lifebitai/exomiser:${params.exomiser_container_tag}"
container = params.main_container
containerOptions = "--volume ${params.exomiser_data_directory}:/data/"
memory = params.memory
cpus = params.cpus
@@ -90,4 +107,10 @@ process {
errorStrategy = params.errorStrategy
maxRetries = params.maxRetries
}
}


executor {
name = params.executor
queueSize = params.queueSize
}