
metaGOflow overview

Haris Zafeiropoulos edited this page May 10, 2023 · 14 revisions

Welcome to the metaGOflow wiki!

metaGOflow supports:

  • the fast inference of taxonomic profiles from shotgun metagenomics data based on rRNA genes and their mOTUs
  • the functional annotation of the raw reads
  • their assembly using the MEGAHIT algorithm

[Figure: metaGOflow workflow diagram]

Input

metaGOflow's main input files are:

  • forward and reverse .fastq files of shotgun metagenomics data, which can be either local files or retrieved through an ENA run accession number, and
  • the config.yml file, where the user provides all the necessary parameter values for the workflow to run.

metaGOflow arguments

argument  description
-n        (Optional) Defines the prefix of the output files (e.g. a sample ID)
-s        (Optional) Runs metaGOflow using Singularity rather than Docker (the default)
-d        (Required) Defines the output folder name
-f        (Required) The path of the .gz file containing the forward reads
-r        (Required) The path of the .gz file containing the reverse reads
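
As an illustration of how these arguments fit together, the following Python sketch assembles a command line from them. The flags come from the table above; the launcher name `./run_wf.sh`, the helper `build_command`, and the example file names are assumptions made for this sketch and are not taken from this page.

```python
def build_command(forward, reverse, out_dir, prefix=None, singularity=False,
                  entry_point="./run_wf.sh"):
    """Assemble an argument list for a metaGOflow run.

    entry_point is an ASSUMED launcher name; replace it with the script
    shipped in your copy of the repository.
    """
    # -f and -r expect gzip-compressed read files
    for reads in (forward, reverse):
        if not str(reads).endswith(".gz"):
            raise ValueError(f"expected a .gz file, got: {reads}")

    # Required arguments: output folder, forward reads, reverse reads
    cmd = [entry_point, "-d", str(out_dir), "-f", str(forward), "-r", str(reverse)]

    # Optional arguments
    if prefix:
        cmd += ["-n", prefix]
    if singularity:
        cmd.append("-s")  # use Singularity instead of Docker
    return cmd


# Hypothetical file names, for illustration only
print(" ".join(build_command("reads_1.fastq.gz", "reads_2.fastq.gz",
                             "my_run", prefix="sample01", singularity=True)))
```

The returned list can be passed to `subprocess.run` once the launcher name has been checked against your installation.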

The config.yml file

This file works as an interface between metaGOflow and the user. In this file, you set which steps you want to perform as well as all the arguments for the tools that will be invoked.

We strongly advise users not to rely on the default arguments without first considering their data. For example, the default min_length_required is 130, but your sequences might be shorter, which would cause metaGOflow to fail. Consider your data as well as your computing environment, especially for the functional annotation step, and fill in the config.yml file accordingly.
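
One way to check whether the default min_length_required suits your data is to look at the read lengths before launching the workflow. The sketch below is not part of metaGOflow; it assumes standard four-line FASTQ records, samples only the first reads of the file, and uses hypothetical helper names (`read_lengths`, `fraction_passing`).

```python
import gzip


def read_lengths(fastq_gz, limit=10000):
    """Yield the lengths of up to `limit` reads from a gzipped FASTQ file.

    Assumes each record spans exactly four lines (header, sequence, '+', quality).
    """
    with gzip.open(fastq_gz, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= limit:
                break
            if i % 4 == 1:  # second line of each record holds the sequence
                yield len(line.strip())


def fraction_passing(fastq_gz, min_length_required=130, limit=10000):
    """Return the fraction of sampled reads at least min_length_required long."""
    lengths = list(read_lengths(fastq_gz, limit))
    if not lengths:
        raise ValueError("no reads found")
    return sum(n >= min_length_required for n in lengths) / len(lengths)
```

If `fraction_passing("reads_1.fastq.gz")` (a hypothetical file name) returns a value close to zero, most raw reads are already shorter than the threshold, and min_length_required in config.yml likely needs to be lowered.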

Output

metaGOflow returns a .zip file that is a compressed RO-Crate. The following is an example of the .zip contents after a complete run of the workflow:


├── config.yml
├── ERR599171.yml
├── results
│   ├── ERR599171_1.fastq.trimmed.fasta
│   ├── ERR599171_1.fastq.trimmed.qc_summary
│   ├── ERR599171_2.fastq.trimmed.fasta
│   ├── ERR599171_2.fastq.trimmed.qc_summary
│   ├── ERR599171.merged_CDS.faa
│   ├── ERR599171.merged_CDS.ffn
│   ├── ERR599171.merged.cmsearch.all.tblout.deoverlapped
│   ├── ERR599171.merged.fasta
│   ├── ERR599171.merged.motus.tsv
│   ├── ERR599171.merged.qc_summary
│   ├── ERR599171.merged.unfiltered_fasta
│   ├── fastp.html
│   ├── final.contigs.fa
│   ├── functional-annotation
│   │   ├── ERR599171.merged_CDS.I5.tsv.chunks
│   │   ├── ERR599171.merged_CDS.I5.tsv.gz
│   │   ├── ERR599171.merged.hmm.tsv.chunks
│   │   ├── ERR599171.merged.hmm.tsv.gz
│   │   ├── ERR599171.merged.summary.go
│   │   ├── ERR599171.merged.summary.go_slim
│   │   ├── ERR599171.merged.summary.ips
│   │   ├── ERR599171.merged.summary.ko
│   │   ├── ERR599171.merged.summary.pfam
│   │   ├── ERR599171.merged.emapper.summary.eggnog
│   │   └── stats
│   │       ├── go.stats
│   │       ├── interproscan.stats
│   │       ├── ko.stats
│   │       ├── orf.stats
│   │       └── pfam.stats
│   ├── RNA-counts
│   ├── sequence-categorisation
│   │   ├── 5_8S.fa.gz
│   │   ├── alpha_tmRNA.RF01849.fasta.gz
│   │   ├── Bacteria_large_SRP.RF01854.fasta.gz
│   │   ├── Bacteria_small_SRP.RF00169.fasta.gz
│   │   ├── cyano_tmRNA.RF01851.fasta.gz
│   │   ├── LSU_rRNA_archaea.RF02540.fa.gz
│   │   ├── LSU_rRNA_bacteria.RF02541.fa.gz
│   │   ├── LSU_rRNA_eukarya.RF02543.fa.gz
│   │   ├── RNaseP_bact_a.RF00010.fasta.gz
│   │   ├── SSU_rRNA_archaea.RF01959.fa.gz
│   │   ├── SSU_rRNA_bacteria.RF00177.fa.gz
│   │   ├── SSU_rRNA_eukarya.RF01960.fa.gz
│   │   ├── tmRNA.RF00023.fasta.gz
│   │   ├── tRNA.RF00005.fasta.gz
│   │   └── tRNA-Sec.RF01852.fasta.gz
│   └── taxonomy-summary
│       ├── LSU
│       │   ├── ERR599171.merged_LSU.fasta.mseq.gz
│       │   ├── ERR599171.merged_LSU.fasta.mseq_hdf5.biom
│       │   ├── ERR599171.merged_LSU.fasta.mseq_json.biom
│       │   ├── ERR599171.merged_LSU.fasta.mseq.tsv
│       │   ├── ERR599171.merged_LSU.fasta.mseq.txt
│       │   └── krona.html
│       └── SSU
│           ├── ERR599171.merged_SSU.fasta.mseq.gz
│           ├── ERR599171.merged_SSU.fasta.mseq_hdf5.biom
│           ├── ERR599171.merged_SSU.fasta.mseq_json.biom
│           ├── ERR599171.merged_SSU.fasta.mseq.tsv
│           ├── ERR599171.merged_SSU.fasta.mseq.txt
│           └── krona.html
└── ro-crate-metadata.json
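
The archive listed above can be inspected without unpacking it. The following Python sketch uses only the standard library; the helper name `summarize_crate` is hypothetical, and the sketch assumes only that ro-crate-metadata.json is a JSON-LD document with an `@graph` array, as the RO-Crate specification describes.

```python
import json
import zipfile


def summarize_crate(zip_path):
    """List the result files and count the metadata entities in an RO-Crate .zip."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        # Find the metadata file whether or not the archive has a top-level folder
        meta_name = next(
            (n for n in names if n.endswith("ro-crate-metadata.json")), None
        )
        if meta_name is None:
            raise FileNotFoundError("ro-crate-metadata.json not found in archive")
        metadata = json.loads(archive.read(meta_name))

    # Keep files (not directory entries) located under a results/ folder
    results = [n for n in names
               if "/results/" in "/" + n and not n.endswith("/")]
    return {"n_entities": len(metadata.get("@graph", [])), "results": results}
```

Calling `summarize_crate` on the .zip produced by a run returns the paths under results/ together with the number of entities described in the crate metadata.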



Provenance