metaGOflow overview
Welcome to the metaGOflow wiki!
metaGOflow supports:
- the fast inference of taxonomic profiles from shotgun metagenomics data based on rRNA genes and their mOTUs
- the functional annotation of the raw reads
- their assembly using the MEGAHIT algorithm
metaGOflow's main input files are:
- the forward and reverse .fastq files of shotgun metagenomics data, which can be either local or retrieved through an ENA run accession number, and
- the config.yml file, where the user provides all the necessary parameter values for the workflow to run.
metaGOflow takes the following command-line arguments:

| argument | description |
|---|---|
| -n | (Optional) Defines the prefix of the output files (e.g. a sample ID) |
| -s | (Optional) Runs metaGOflow with Singularity instead of Docker (the default) |
| -d | (Required) Defines the output folder name |
| -f | (Required) The path of the .gz file containing the forward reads |
| -r | (Required) The path of the .gz file containing the reverse reads |
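To illustrate how the arguments in the table fit together, here is a minimal Python sketch that checks the required flags and assembles a command line. It is not part of metaGOflow: the helper `build_command` is hypothetical, the launcher name `./run_wf.sh` is an assumption to verify against the repository, and the sketch assumes that -s is a switch that takes no value.

```python
import shlex
from typing import List, Optional

# ASSUMPTION: launcher script name; check the metaGOflow repository for the real one.
DEFAULT_LAUNCHER = "./run_wf.sh"


def build_command(
    output_dir: str,
    forward_reads: str,
    reverse_reads: str,
    prefix: Optional[str] = None,
    use_singularity: bool = False,
    launcher: str = DEFAULT_LAUNCHER,
) -> List[str]:
    """Assemble an argument list from the flags in the table (hypothetical helper)."""
    # -d, -f and -r are marked as required; fail early instead of inside the workflow.
    for flag, value in (("-d", output_dir), ("-f", forward_reads), ("-r", reverse_reads)):
        if not value:
            raise ValueError(f"{flag} is required")
    # The table describes the -f and -r inputs as .gz files.
    for flag, path in (("-f", forward_reads), ("-r", reverse_reads)):
        if not path.endswith(".gz"):
            raise ValueError(f"{flag} expects a .gz file, got {path!r}")

    cmd = [launcher]
    if prefix:
        cmd += ["-n", prefix]  # optional output prefix, e.g. a sample ID
    if use_singularity:
        cmd.append("-s")  # Singularity instead of the default Docker
    cmd += ["-d", output_dir, "-f", forward_reads, "-r", reverse_reads]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "out", "ERR599171_1.fastq.gz", "ERR599171_2.fastq.gz", prefix="ERR599171"
    )
    # Print a shell-safe command line that can be copied into a terminal.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting issues.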
The config.yml file works as an interface between metaGOflow and the user: in it, you set which steps you want to perform as well as all the arguments for the tools that will be invoked.
We strongly advise users not to use the default arguments without first considering their data.
For example, the default min_length_required is 130, but your sequences might be shorter, which would cause metaGOflow to fail.
Consider both your data and your computing environment, especially for the functional annotation step, and fill in the config.yml file accordingly.
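Because a min_length_required value above your read lengths can make the run fail, it can help to look at read lengths before editing config.yml. The sketch below is a rough pre-flight check, not part of metaGOflow: it assumes standard four-line FASTQ records, counts raw reads shorter than a threshold (130 by default, mirroring the default mentioned above), and does not model the workflow's own trimming or filtering, so treat its numbers as an indicator only. The function names are hypothetical.

```python
import gzip
import sys
from typing import Iterable, Tuple

# Mirrors the default min_length_required mentioned above; replace it with
# the value you actually set in config.yml.
MIN_LENGTH_REQUIRED = 130


def count_short_reads(
    lines: Iterable[str], min_length: int = MIN_LENGTH_REQUIRED
) -> Tuple[int, int]:
    """Return (total_reads, reads_shorter_than_min_length) for FASTQ text lines.

    Assumes 4-line records: header, sequence, '+' separator, quality.
    """
    total = short = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            total += 1
            if len(line.rstrip("\r\n")) < min_length:
                short += 1
    return total, short


def scan_fastq_gz(path: str, min_length: int = MIN_LENGTH_REQUIRED) -> Tuple[int, int]:
    """Stream a gzipped FASTQ file in text mode without decompressing it to disk."""
    with gzip.open(path, "rt") as handle:
        return count_short_reads(handle, min_length)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        total, short = scan_fastq_gz(path)
        pct = 100.0 * short / total if total else 0.0
        print(f"{path}: {short}/{total} reads ({pct:.1f}%) shorter than {MIN_LENGTH_REQUIRED} bp")
```

Saved as, say, check_lengths.py (an arbitrary name), it can be run on both input files at once: python check_lengths.py reads_1.fastq.gz reads_2.fastq.gz.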
metaGOflow returns a .zip file that is a compressed RO-Crate.
The following is an example of the .zip content from a complete run of the workflow:
├── config.yml
├── ERR599171.yml
├── results
│   ├── ERR599171_1.fastq.trimmed.fasta
│   ├── ERR599171_1.fastq.trimmed.qc_summary
│   ├── ERR599171_2.fastq.trimmed.fasta
│   ├── ERR599171_2.fastq.trimmed.qc_summary
│   ├── ERR599171.merged_CDS.faa
│   ├── ERR599171.merged_CDS.ffn
│   ├── ERR599171.merged.cmsearch.all.tblout.deoverlapped
│   ├── ERR599171.merged.fasta
│   ├── ERR599171.merged.motus.tsv
│   ├── ERR599171.merged.qc_summary
│   ├── ERR599171.merged.unfiltered_fasta
│   ├── fastp.html
│   ├── final.contigs.fa
│   ├── functional-annotation
│   │   ├── ERR599171.merged_CDS.I5.tsv.chunks
│   │   ├── ERR599171.merged_CDS.I5.tsv.gz
│   │   ├── ERR599171.merged.hmm.tsv.chunks
│   │   ├── ERR599171.merged.hmm.tsv.gz
│   │   ├── ERR599171.merged.summary.go
│   │   ├── ERR599171.merged.summary.go_slim
│   │   ├── ERR599171.merged.summary.ips
│   │   ├── ERR599171.merged.summary.ko
│   │   ├── ERR599171.merged.summary.pfam
│   │   ├── ERR599171.merged.emapper.summary.eggnog
│   │   └── stats
│   │       ├── go.stats
│   │       ├── interproscan.stats
│   │       ├── ko.stats
│   │       ├── orf.stats
│   │       └── pfam.stats
│   ├── RNA-counts
│   ├── sequence-categorisation
│   │   ├── 5_8S.fa.gz
│   │   ├── alpha_tmRNA.RF01849.fasta.gz
│   │   ├── Bacteria_large_SRP.RF01854.fasta.gz
│   │   ├── Bacteria_small_SRP.RF00169.fasta.gz
│   │   ├── cyano_tmRNA.RF01851.fasta.gz
│   │   ├── LSU_rRNA_archaea.RF02540.fa.gz
│   │   ├── LSU_rRNA_bacteria.RF02541.fa.gz
│   │   ├── LSU_rRNA_eukarya.RF02543.fa.gz
│   │   ├── RNaseP_bact_a.RF00010.fasta.gz
│   │   ├── SSU_rRNA_archaea.RF01959.fa.gz
│   │   ├── SSU_rRNA_bacteria.RF00177.fa.gz
│   │   ├── SSU_rRNA_eukarya.RF01960.fa.gz
│   │   ├── tmRNA.RF00023.fasta.gz
│   │   ├── tRNA.RF00005.fasta.gz
│   │   └── tRNA-Sec.RF01852.fasta.gz
│   └── taxonomy-summary
│       ├── LSU
│       │   ├── ERR599171.merged_LSU.fasta.mseq.gz
│       │   ├── ERR599171.merged_LSU.fasta.mseq_hdf5.biom
│       │   ├── ERR599171.merged_LSU.fasta.mseq_json.biom
│       │   ├── ERR599171.merged_LSU.fasta.mseq.tsv
│       │   ├── ERR599171.merged_LSU.fasta.mseq.txt
│       │   └── krona.html
│       └── SSU
│           ├── ERR599171.merged_SSU.fasta.mseq.gz
│           ├── ERR599171.merged_SSU.fasta.mseq_hdf5.biom
│           ├── ERR599171.merged_SSU.fasta.mseq_json.biom
│           ├── ERR599171.merged_SSU.fasta.mseq.tsv
│           ├── ERR599171.merged_SSU.fasta.mseq.txt
│           └── krona.html
└── ro-crate-metadata.json
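Since the output is an RO-Crate packaged as a .zip, its contents can be inspected with Python's standard library alone. The following sketch (hypothetical helper names, not a metaGOflow API) locates ro-crate-metadata.json, reads the entity identifiers from its JSON-LD "@graph" array as defined by the RO-Crate specification, and lists the files under results/. It assumes the archive layout shown in the tree above, with or without an enclosing top-level folder; the small in-memory archive built by `demo_crate` exists only to make the example runnable.

```python
import io
import json
import zipfile
from typing import List, NamedTuple


class CrateSummary(NamedTuple):
    metadata_file: str
    entity_ids: List[str]
    result_files: List[str]


def summarize_crate(source) -> CrateSummary:
    """Summarise a metaGOflow RO-Crate .zip given as a path or binary file object."""
    with zipfile.ZipFile(source) as crate:
        names = crate.namelist()
        # Match on the base name so an enclosing top-level folder is tolerated.
        matches = [n for n in names if n.rsplit("/", 1)[-1] == "ro-crate-metadata.json"]
        if not matches:
            raise ValueError("no ro-crate-metadata.json found in the archive")
        metadata = json.loads(crate.read(matches[0]))
    # RO-Crate metadata is JSON-LD: the described entities live in the "@graph" array.
    entity_ids = [entity.get("@id", "") for entity in metadata.get("@graph", [])]
    # Keep regular files (not directory entries) located somewhere under results/.
    result_files = [
        n for n in names if not n.endswith("/") and "results" in n.split("/")[:-1]
    ]
    return CrateSummary(matches[0], entity_ids, result_files)


def demo_crate() -> io.BytesIO:
    """Build a tiny in-memory archive mimicking the layout above (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("ro-crate-metadata.json", json.dumps({
            "@context": "https://w3id.org/ro/crate/1.1/context",
            "@graph": [{"@id": "ro-crate-metadata.json"}, {"@id": "./"}],
        }))
        zf.writestr("results/ERR599171.merged.motus.tsv", "")
        zf.writestr("results/taxonomy-summary/SSU/krona.html", "")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    summary = summarize_crate(demo_crate())
    print(summary.metadata_file)
    for name in summary.result_files:
        print(name)
```

For a real run, pass the path of the returned .zip file to `summarize_crate` instead of `demo_crate()`.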
Anything unclear or inaccurate? Please open an issue or email Dr. Haris Zafeiropoulos (haris.zafeiropoulos@kuleuven.be).
For questions about EMO BON protocols, samples, or analyses, you may contact the Observation, Data and Service Development Officer of EMBRC, Dr. Ioulia Santi (ioulia.santi@embrc.eu).