MetaDenovo: An automated framework to detect de novo mutations from whole-genome trio data using cloud computing technology
MetaDenovo builds a consensus from four current state-of-the-art de novo callers (DeNovoGear, VarScan2, TrioDeNovo and PhaseByTransmission). It is developed using Cromwell (an open-source workflow management system for bioinformatics) and WDL (Workflow Description Language), both developed by the Broad Institute, and is executed in an AWS cloud computing environment.
MetaDenovo requires the following inputs:
o WGS BAM files for the trio (mother, father, child)
o Trio VCF file from a variant calling pipeline
o Genome reference file (.fa format)
o Pedigree file (PED file format; an example appears below)
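For reference, a standard six-column PED file for a trio lists family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female) and phenotype (1 = unaffected, 2 = affected). The sample IDs below are illustrative:
FAM01  FATHER01  0         0         1  1
FAM01  MOTHER01  0         0         2  1
FAM01  CHILD01   FATHER01  MOTHER01  1  2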
See the Cromwell documentation for details - https://cromwell.readthedocs.io/en/stable/
We have tested MetaDenovo using Cromwell on Amazon Web Services.
Establishing Cromwell on AWS consists of three steps:
a. Creation of an Amazon VPC
b. Creation of Genomics Workflow Core
c. Creation of Workflow Orchestrators using Cromwell
Please make sure you have sufficient credits available for utilizing AWS.
a. Access the Cromwell server (EC2 instance) via AWS Session Manager
b. Run the command "sudo su - ec2-user" to switch to the ec2-user account and its home directory.
c. Clone the MetaDenovo repository onto the Cromwell server:
git clone https://github.com/VCCRI/MetaDenovo.git
d. Zip the dependent WDL files into the imports archive:
zip imports.zip *.wdl
e. Copy the directory "code_extend" into your own S3 bucket. The Python files in it are used by the MetaDenovo workflow and need to be accessible by the batch instances created by the MetaDenovo workflow.
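For example, assuming the AWS CLI is configured on the Cromwell server and "your-bucket" is a placeholder for your bucket name:
aws s3 cp code_extend/ s3://your-bucket/code_extend/ --recursive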
f. Update the parameters in UserInputs.json/UserInputs_demo.json:
- "MetaDenovo_workflow.python_file": "s3://your S3 location/code_extend/DenovoGear_numeric_genotype.py",
- "MetaDenovo_workflow.selectDNMGenotype_program": "s3://your S3 location/code_extend/TrioDenovo_select_DNM_genotype.py"
g. Run the MetaDenovo DEMO workflow using the following curl command:
curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" -F "workflowSource=@MetaDenovo.wdl" -F "workflowDependencies=@imports.zip" -F "workflowInputs=@UserInputs_demo.json" -F "workflowOptions=@Options_demo.json"
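A successful submission returns a JSON response containing the workflow ID, e.g.:
{"id":"<workflow-uuid>","status":"Submitted"}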
- Modify UserInputs_demo.json with the S3 path to the "code_extend/" py files.
- Modify Options_demo.json to set your output S3 directory path.
- The output files are stored under the "final_workflow_outputs_dir" specified in Options_demo.json:
  For de novo SNVs: ALL_dnSNP.txt, MetaDenovo_four_callers_SNP.txt, MetaDenovo_three_callers_SNP.txt, MetaDenovo_two_callers_SNP.txt and MetaDenovo_one_callers_SNP.txt
  For de novo INDELs: ALL_dnINDEL.txt, MetaDenovo_three_callers_INDEL.txt, MetaDenovo_two_callers_INDEL.txt and MetaDenovo_one_callers_INDEL.txt
- The workflow log file is located under the wf_logs directory (given by the final_workflow_log_dir parameter in Options_demo.json) on S3.
- Check for the message "Workflow MetaDenovo_workflow complete" in the workflow log file to confirm successful workflow completion (see the example command after this list).
- It should take less than 1 hour to run the DEMO data.
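As a sketch, assuming the AWS CLI is available and "s3://your-bucket/demo_logs" stands in for your final_workflow_log_dir (Cromwell names each workflow log workflow.<workflow-id>.log), completion can be checked with:
aws s3 cp s3://your-bucket/demo_logs/workflow.<workflow-id>.log - | grep "Workflow MetaDenovo_workflow complete"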
h. Run the MetaDenovo workflow for your own data:
Provide the AWS S3 storage service paths for the following input file parameters in MetaDenovo_UserInputs.json (an illustrative sketch of this file follows the list):
- mother_bam = aligned BAM file of the mother
- mother_bam_bai = index file of the mother's aligned BAM file
- father_bam = aligned BAM file of the father
- father_bam_bai = index file of the father's aligned BAM file
- child_bam = aligned BAM file of the offspring
- child_bam_bai = index file of the offspring's aligned BAM file
- chromosome IDs = list of chromosomes to interrogate for DNMs, e.g. [1,2,3]
- reference = reference genome FASTA file
- reference_fai = index file of the reference genome FASTA file
- reference_dict = dictionary file of the reference genome FASTA file
- ped_file = the trio's pedigree file
- gatk_vcf = GATK-processed VCF file for the trio
- python_file = your S3 location/code_extend/DenovoGear_numeric_genotype.py
- selectDNMGenotype_program = your S3 location/code_extend/TrioDenovo_select_DNM_genotype.py
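A minimal sketch of MetaDenovo_UserInputs.json, assuming the parameters are namespaced under MetaDenovo_workflow as in the demo file (the exact key names are in the UserInputs.json template shipped with the repository; bucket paths are placeholders, and the chromosome list parameter is omitted here because its exact key name may differ):
{
  "MetaDenovo_workflow.mother_bam": "s3://your-bucket/trio/mother.bam",
  "MetaDenovo_workflow.mother_bam_bai": "s3://your-bucket/trio/mother.bam.bai",
  "MetaDenovo_workflow.father_bam": "s3://your-bucket/trio/father.bam",
  "MetaDenovo_workflow.father_bam_bai": "s3://your-bucket/trio/father.bam.bai",
  "MetaDenovo_workflow.child_bam": "s3://your-bucket/trio/child.bam",
  "MetaDenovo_workflow.child_bam_bai": "s3://your-bucket/trio/child.bam.bai",
  "MetaDenovo_workflow.reference": "s3://your-bucket/ref/genome.fa",
  "MetaDenovo_workflow.reference_fai": "s3://your-bucket/ref/genome.fa.fai",
  "MetaDenovo_workflow.reference_dict": "s3://your-bucket/ref/genome.dict",
  "MetaDenovo_workflow.ped_file": "s3://your-bucket/trio/trio.ped",
  "MetaDenovo_workflow.gatk_vcf": "s3://your-bucket/trio/trio.gatk.vcf.gz",
  "MetaDenovo_workflow.python_file": "s3://your-bucket/code_extend/DenovoGear_numeric_genotype.py",
  "MetaDenovo_workflow.selectDNMGenotype_program": "s3://your-bucket/code_extend/TrioDenovo_select_DNM_genotype.py"
}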
Set up your MetaDenovo_Options.json to contain the AWS S3 storage service paths where MetaDenovo outputs are to be stored (an illustrative sketch follows the list):
- final_workflow_outputs_dir = location to store the output DNM files from MetaDenovo, which contains the consensus DNM reports mentioned above and the original DNM results from each DNM caller.
- use_relative_output_paths: false
- final_workflow_log_dir = location to store workflow logs; a unique-ID log file keeps track of all steps of the MetaDenovo workflow.
- final_call_logs_dir = location to store call logs
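These four keys are standard Cromwell workflow options; a minimal sketch with placeholder bucket paths:
{
  "final_workflow_outputs_dir": "s3://your-bucket/metadenovo/outputs",
  "use_relative_output_paths": false,
  "final_workflow_log_dir": "s3://your-bucket/metadenovo/wf_logs",
  "final_call_logs_dir": "s3://your-bucket/metadenovo/call_logs"
}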
MetaDenovo is then executed using the following command on the Cromwell server:
curl -X POST "http://localhost:8000/api/workflows/v1" \
  -H "accept: application/json" \
  -F "workflowSource=@MetaDenovo.wdl" \
  -F "workflowDependencies=@imports.zip" \
  -F "workflowInputs=@MetaDenovo_UserInputs.json" \
  -F "workflowOptions=@MetaDenovo_Options.json"
Here, MetaDenovo.wdl is the main WDL file. All sub-workflow WDL files for the four DNM callers are zipped into the imports.zip archive.
Take note of the "id" in the response after submission, as it is needed to check the status of your workflow run:
{"id":"1c2f0368-e876-4699-9049-7a5510f6df2f","status":"Submitted"}
i. Checking the status of your MetaDenovo workflow run:
curl -X GET "http://localhost:8000/api/workflows/v1/1c2f0368-e876-4699-9049-7a5510f6df2f/status"
Response while the workflow is running:
{"status":"Running","id":"1c2f0368-e876-4699-9049-7a5510f6df2f"}
Response once the workflow has completed successfully:
{"status":"Succeeded","id":"1c2f0368-e876-4699-9049-7a5510f6df2f"}