metaGOflow overview

Welcome to the metaGOflow wiki!

metaGOflow supports:

the fast inference of taxonomic profiles from shotgun metagenomics data based on rRNA genes and their mOTUs
the functional annotation of the raw reads
their assembly using the MEGAHIT algorithm

[Figure: metaGOflow workflow diagram]

Input

metaGOflow's main input files are:

forward and reverse .fastq files of shotgun metagenomics data, which can be either local or retrieved through an ENA run accession number, and
the config.yml file, where the user provides all the necessary parameter values for the workflow to run.

metaGOflow arguments

metaGOflow takes only a short list of arguments through the CLI, all of which control how the workflow is run. You must specify the raw data to be used. To fetch private ENA data, add the -p flag. If you are using Singularity, also add the -s flag.

Pipeline parameters:
  -f                  Forward reads fasta file path (mandatory if and only if -e not used).
  -r                  Reverse reads fasta file path (mandatory if and only if -e not used).
  -e                  ENA run accession number. Its raw data will be fetched and then analysed (if used, -f and -r should not be set).
  -d                  Output directory name (mandatory).
  -n                  Name of run and prefix to output files (mandatory).
  -s                  Run workflow using Singularity (Docker is the default container technology). Works as a flag, i.e. adding -s to your command selects Singularity.
  -p                  Use ENA private data. Works as flag.
  -b                  Keep tmp folder. Works as flag. 

Resources:
  -m                  Memory to use with toil --defaultMemory. (optional, default ${MEMORY})
  -c                  Number of cpus to use with toil --defaultCores. (optional, default ${NUM_CORES})
  -l                  Limit number of jobs to schedule. (optional, default ${LIMIT_QUEUE})

Here is an example of running metaGOflow on public ENA data, using Singularity, without keeping the tmp folder:

./run_wf.sh -e ERR599171 -d TARA_OCEANS_SAMPLE -n ERR599171 -s
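
The argument rules above can be sketched in Python as a small helper that assembles the run_wf.sh command line. This is only an illustration and not part of metaGOflow: the function name `build_command` and its parameter names are hypothetical, and it uses only the flags listed in the help text above.

```python
def build_command(out_dir, run_name, forward=None, reverse=None,
                  ena_run=None, singularity=False, private=False,
                  keep_tmp=False):
    """Assemble a run_wf.sh argument list (hypothetical helper).

    Enforces the rule from the help text: use either -e, or both -f and -r.
    """
    local = forward is not None or reverse is not None
    if ena_run is not None and local:
        raise ValueError("use either -e or -f/-r, not both")
    if ena_run is None and (forward is None or reverse is None):
        raise ValueError("without -e, both -f and -r are required")

    cmd = ["./run_wf.sh"]
    if ena_run is not None:
        cmd += ["-e", ena_run]
    else:
        cmd += ["-f", forward, "-r", reverse]
    cmd += ["-d", out_dir, "-n", run_name]
    if singularity:
        cmd.append("-s")   # Singularity instead of the default Docker
    if private:
        cmd.append("-p")   # private ENA data
    if keep_tmp:
        cmd.append("-b")   # keep the tmp folder
    return cmd


if __name__ == "__main__":
    # Rebuilds the example command shown above.
    print(" ".join(build_command("TARA_OCEANS_SAMPLE", "ERR599171",
                                 ena_run="ERR599171", singularity=True)))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting issues.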

The `config.yml` file

This file works as an interface between metaGOflow and the user. In this file, you set which steps you want to perform as well as all the arguments for the tools that will be invoked.

We strongly advise users not to rely on the default arguments without first considering their data. For example, the default min_length_required is 130, but your sequences might be shorter, which would cause metaGOflow to fail. Consider both your data and your computing environment, especially for the functional annotation step, and fill in the config.yml file accordingly.
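
One way to check your data before relying on the default is to look at the read lengths in one of your FASTQ files. The sketch below is an illustration only and is not part of metaGOflow. It assumes an uncompressed FASTQ file with exactly four lines per record, and the function names `read_lengths` and `fraction_shorter_than` are hypothetical. The value 130 is the default min_length_required mentioned above.

```python
def read_lengths(fastq_path):
    """Yield the length of every read in an uncompressed 4-line FASTQ file."""
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def fraction_shorter_than(fastq_path, min_length=130):
    """Return the fraction of reads shorter than min_length (0.0 if no reads)."""
    total = 0
    short = 0
    for length in read_lengths(fastq_path):
        total += 1
        if length < min_length:
            short += 1
    return short / total if total else 0.0
```

If a large fraction of your reads falls below the threshold, that is a sign the default value does not suit your data and min_length_required should be reconsidered in config.yml.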

Output

metaGOflow returns a .zip file that is a compressed RO-Crate. Below is an example of the .zip contents after a complete run of the workflow:


├── config.yml
├── ERR599171.yml
├── results
│   ├── ERR599171_1.fastq.trimmed.fasta
│   ├── ERR599171_1.fastq.trimmed.qc_summary
│   ├── ERR599171_2.fastq.trimmed.fasta
│   ├── ERR599171_2.fastq.trimmed.qc_summary
│   ├── ERR599171.merged_CDS.faa
│   ├── ERR599171.merged_CDS.ffn
│   ├── ERR599171.merged.cmsearch.all.tblout.deoverlapped
│   ├── ERR599171.merged.fasta
│   ├── ERR599171.merged.motus.tsv
│   ├── ERR599171.merged.qc_summary
│   ├── ERR599171.merged.unfiltered_fasta
│   ├── fastp.html
│   ├── final.contigs.fa
│   ├── functional-annotation
│   │   ├── ERR599171.merged_CDS.I5.tsv.chunks
│   │   ├── ERR599171.merged_CDS.I5.tsv.gz
│   │   ├── ERR599171.merged.hmm.tsv.chunks
│   │   ├── ERR599171.merged.hmm.tsv.gz
│   │   ├── ERR599171.merged.summary.go
│   │   ├── ERR599171.merged.summary.go_slim
│   │   ├── ERR599171.merged.summary.ips
│   │   ├── ERR599171.merged.summary.ko
│   │   ├── ERR599171.merged.summary.pfam
│   │   ├── ERR599171.merged.emapper.summary.eggnog
│   │   └── stats
│   │       ├── go.stats
│   │       ├── interproscan.stats
│   │       ├── ko.stats
│   │       ├── orf.stats
│   │       └── pfam.stats
│   ├── RNA-counts
│   ├── sequence-categorisation
│   │   ├── 5_8S.fa.gz
│   │   ├── alpha_tmRNA.RF01849.fasta.gz
│   │   ├── Bacteria_large_SRP.RF01854.fasta.gz
│   │   ├── Bacteria_small_SRP.RF00169.fasta.gz
│   │   ├── cyano_tmRNA.RF01851.fasta.gz
│   │   ├── LSU_rRNA_archaea.RF02540.fa.gz
│   │   ├── LSU_rRNA_bacteria.RF02541.fa.gz
│   │   ├── LSU_rRNA_eukarya.RF02543.fa.gz
│   │   ├── RNaseP_bact_a.RF00010.fasta.gz
│   │   ├── SSU_rRNA_archaea.RF01959.fa.gz
│   │   ├── SSU_rRNA_bacteria.RF00177.fa.gz
│   │   ├── SSU_rRNA_eukarya.RF01960.fa.gz
│   │   ├── tmRNA.RF00023.fasta.gz
│   │   ├── tRNA.RF00005.fasta.gz
│   │   └── tRNA-Sec.RF01852.fasta.gz
│   └── taxonomy-summary
│       ├── LSU
│       │   ├── ERR599171.merged_LSU.fasta.mseq.gz
│       │   ├── ERR599171.merged_LSU.fasta.mseq_hdf5.biom
│       │   ├── ERR599171.merged_LSU.fasta.mseq_json.biom
│       │   ├── ERR599171.merged_LSU.fasta.mseq.tsv
│       │   ├── ERR599171.merged_LSU.fasta.mseq.txt
│       │   └── krona.html
│       └── SSU
│           ├── ERR599171.merged_SSU.fasta.mseq.gz
│           ├── ERR599171.merged_SSU.fasta.mseq_hdf5.biom
│           ├── ERR599171.merged_SSU.fasta.mseq_json.biom
│           ├── ERR599171.merged_SSU.fasta.mseq.tsv
│           ├── ERR599171.merged_SSU.fasta.mseq.txt
│           └── krona.html
└── ro-crate-metadata.json

The ro-crate-metadata.json file includes metadata about the sample (link to its ENA record) and about the metaGOflow version. A copy of the config.yml file is also included so one can reproduce the analysis.
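
To inspect a returned archive without unpacking it, something like the following Python sketch could be used. It is not part of metaGOflow, and the function names `summarize_crate` and `entity_ids` are hypothetical. It assumes that ro-crate-metadata.json sits at the root of the archive, as in the listing above, and relies only on the `@graph` array that RO-Crate metadata files use to hold their entities. The specific fields metaGOflow writes are not documented here, so the sketch only lists entity identifiers.

```python
import json
import zipfile


def summarize_crate(zip_path):
    """Return (file names, metadata) for an RO-Crate .zip archive.

    Assumes ro-crate-metadata.json is at the archive root; adjust the
    member name if your archive wraps everything in a top-level folder.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        with archive.open("ro-crate-metadata.json") as handle:
            metadata = json.load(handle)
    return names, metadata


def entity_ids(metadata):
    """List the @id of every entity in the crate's JSON-LD @graph."""
    return [entity.get("@id") for entity in metadata.get("@graph", [])]
```

Calling `summarize_crate` on the archive and passing the returned metadata to `entity_ids` gives a quick overview of what the crate describes before you extract any of the result files.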

Anything unclear or inaccurate? Please open an issue or email Dr. Haris Zafeiropoulos (haris.zafeiropoulos@kuleuven.be).

For questions about EMO BON protocols, samples, or analyses, you may contact the Observation, Data and Service Development Officer of EMBRC, Dr. Ioulia Santi (ioulia.santi@embrc.eu).
