VIRGO2

Overview

VIRGO2 is a non-redundant catalog of genes from the human vaginal microbiome that allows for rapid taxonomic and functional analysis of metagenomic/metatranscriptomic reads. VIRGO2 is distributed as a bowtie2 index so that processed/QC'd reads can be mapped using the bowtie2 short-read mapping tool. Annotation tables can be merged with the mapping results to provide taxonomic and functional information.

Dependencies

VIRGO2 requires the following software to be installed and available on the PATH

-git lfs
-Gzip
-Bowtie2
-Samtools
-Python3 >v3.8

and the following python packages

-pandas
-numpy
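Before installing, it can help to confirm that the dependencies listed above are visible from the current environment. The following is a minimal sketch, not part of VIRGO2 itself; the executable names (`git-lfs`, `gzip`, `bowtie2`, `samtools`, `python3`) are assumptions about how these tools are named on a typical system.

```python
import importlib.util
import shutil

# Executable names are assumptions based on the tools listed above.
REQUIRED_TOOLS = ["git-lfs", "gzip", "bowtie2", "samtools", "python3"]
REQUIRED_PACKAGES = ["pandas", "numpy"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the executables that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the Python packages that cannot be imported."""
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]
```

Calling `missing_tools()` and `missing_packages()` returns two lists; both should be empty before running the install command.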

Installation

VIRGO2 is installed by cloning the repository (with git lfs installed and configured), unzipping the annotation and FASTA files, and building the bowtie2 index from the provided FASTA file. Once the repository is cloned, run the VIRGO2.py install command, which unzips the required files and builds the bowtie2 index.

git clone https://github.com/ravel-lab/VIRGO2.git

cd VIRGO2

python3 VIRGO2.py install

NOTE: Installation will fail if 'git lfs' is not installed and configured. If you have already cloned the repository and need to fetch the large files after installing and configuring 'git lfs', run the following command in the repository directory.

git lfs pull
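If it is unclear whether the large files were actually fetched, one way to check is to look for leftover git lfs pointer stubs. A pointer stub is a small text file that begins with the git lfs spec line. The sketch below is an illustration only and is not shipped with VIRGO2; the function names are hypothetical.

```python
from pathlib import Path

# Every git lfs pointer file starts with this spec line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a git lfs pointer stub rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_dir):
    """List files under repo_dir that are still git lfs pointer stubs."""
    pointers = []
    for path in Path(repo_dir).rglob("*"):
        # Skip git's own metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        if is_lfs_pointer(path):
            pointers.append(path)
    return sorted(pointers)
```

If `find_lfs_pointers(".")` returns any paths when run in the repository directory, those files have not been downloaded yet and `git lfs pull` is still needed.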

Alternative source

VIRGO2 can also be obtained from Zenodo under the DOI 10.5281/zenodo.18703182 (https://zenodo.org/records/18703182). If you obtain VIRGO2 from Zenodo, you will need to decompress the archived directories (FastaFiles.tar.gz, AnnotationTables.tar.gz, Index.tar.gz, AccessoryScripts.tar.gz) prior to installation.
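The decompression step can be done with `tar` on the command line or, as sketched here, with Python's standard `tarfile` module. This is an illustrative helper rather than part of VIRGO2; it assumes the four archives were downloaded into a single directory and should be unpacked in place.

```python
import tarfile
from pathlib import Path

# Archive names as listed for the Zenodo record.
ARCHIVES = [
    "FastaFiles.tar.gz",
    "AnnotationTables.tar.gz",
    "Index.tar.gz",
    "AccessoryScripts.tar.gz",
]


def extract_archives(download_dir, archives=ARCHIVES):
    """Extract each archive found in download_dir; return the names extracted."""
    download_dir = Path(download_dir)
    extracted = []
    for name in archives:
        archive = download_dir / name
        if not archive.exists():
            print(f"Skipping {name}: not found in {download_dir}")
            continue
        with tarfile.open(archive, "r:gz") as tar:
            # Unpack next to the archive (assumed layout, not confirmed by the source).
            tar.extractall(path=download_dir)
        extracted.append(name)
    return extracted
```

For example, `extract_archives("VIRGO2_zenodo")` would unpack whichever of the four archives are present in a (hypothetical) `VIRGO2_zenodo` download directory.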

Usage

The main VIRGO2 operations are performed by the script VIRGO2.py, which provides the commands VIRGO2.py map, VIRGO2.py compile, and VIRGO2.py taxonomy. The map command should be run on each sample individually. The compile command combines the results from the individual samples into one table with genes as rows and samples as columns. The taxonomy command operates on the file generated by the compile command and produces tables containing the estimated composition of each sample.

> python VIRGO2.py -h
    usage: VIRGO2.py [-h] {install,map,compile,taxonomy,license} ...

    VIRGO2 is a tool and associated database used to analyze vaginal shotgun metagenomes and metatranscriptomes

    positional arguments:
    {install,map,compile,taxonomy,license}

    optional arguments:
    -h, --help            show this help message and exit

VIRGO2.py map

This module will map the sequencing reads from a single sample to the VIRGO2 bowtie2 index. Suggested usage is with the default settings.

> python VIRGO2.py map -h
    usage: VIRGO2.py map [-h] -r READS [-c {0,1}] [-p THREADS] -o OUTPUTPREFIX [-b {0,1}]

    optional arguments:
    -h, --help              Show this help message and exit
    -r READS, --reads READS
                            Single-End reads file, can be gzipped
    -c {0,1}, --cov {0,1}
                            Assign multi-mapped reads to gene with highest percent covered, 0=No,1=Yes, default:Yes
    -p THREADS, --threads THREADS
                            Number of threads used in mapping default:1
    -o OUTPUTPREFIX, --outputPrefix OUTPUTPREFIX
                            Prefix used in the filename for the mapping output
    -b {0,1}, --bypass {0,1}
                            Sam and coverage files already generated, proceed to coverage correction directly 0=No, 1=Yes, default No
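Because map handles one sample at a time, a small wrapper is a convenient way to loop over a directory of reads. The sketch below uses only the `-r`, `-p`, and `-o` options shown in the help text above; the `*.fastq.gz` naming pattern, the use of the file name as the output prefix, and the assumption that it is run from the repository directory are all illustrative choices, not VIRGO2 requirements.

```python
import subprocess
from pathlib import Path


def build_map_command(reads_path, threads=4):
    """Build the VIRGO2.py map command for one reads file."""
    reads_path = Path(reads_path)
    # Derive a sample name by stripping common read-file suffixes (assumed naming).
    sample = reads_path.name
    for suffix in (".gz", ".fastq", ".fq"):
        if sample.endswith(suffix):
            sample = sample[: -len(suffix)]
    return [
        "python3", "VIRGO2.py", "map",
        "-r", str(reads_path),
        "-p", str(threads),
        "-o", sample,
    ]


def map_all_samples(reads_dir, threads=4):
    """Run VIRGO2.py map once per *.fastq.gz file in reads_dir."""
    for reads_path in sorted(Path(reads_dir).glob("*.fastq.gz")):
        # check=True stops the loop if mapping fails for a sample.
        subprocess.run(build_map_command(reads_path, threads), check=True)
```

Running `map_all_samples("reads", threads=8)` from the repository directory would then map every gzipped FASTQ file in a (hypothetical) `reads` directory.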

VIRGO2.py compile

After all samples have been mapped to VIRGO2, the compile command merges the mapping results from all samples into a single file to be used in downstream analysis.

> python VIRGO2.py compile -h
    usage: virgo2_ic.py compile [-h] [-i INPUT] [-o OUTPUTPREFIX]

    optional arguments:
    -h, --help              show this help message and exit
    -i INPUT, --input INPUT
                            Directory where bowtie2 mapping results are located
    -o OUTPUTPREFIX, --outputPrefix OUTPUTPREFIX
                            Prefix used in the filename for the compiled output

After running compile, there will be a single tab-delimited file with the read counts per gene per sample.
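As a quick sanity check on the compiled table, the total reads and the number of genes detected can be tallied per sample. This is a minimal sketch that assumes the layout described above (a header row of sample names, gene IDs in the first column, genes as rows); the exact header of the real file is not confirmed here.

```python
import csv


def summarize_compiled(handle):
    """Return (total reads, genes detected) per sample from a compiled table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column is assumed to hold the gene ID
    totals = {sample: 0.0 for sample in samples}
    detected = {sample: 0 for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            count = float(value)
            totals[sample] += count
            if count > 0:
                detected[sample] += 1
    return totals, detected
```

It can be used as `with open(path, newline="") as handle: totals, detected = summarize_compiled(handle)`, where `path` points to the compiled output file.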

VIRGO2.py taxonomy

After the compiled output has been produced, the taxonomy command can be used to estimate the taxonomic composition of each sample. Composition can be reported including only bacteria in the calculation and can be output as read counts or relative abundances.

Default settings apply a per-species gene-detection threshold that a species must meet in order to be considered present in a sample. These thresholds are located in the file 2.VIRGO2.taxonThresholds.txt and are set to 20% of the median number of genes detected for the species in the dataset used to build VIRGO2 (e.g. the median number of L. crispatus genes in metagenomes containing L. crispatus was 2809, so the threshold for L. crispatus is set at 561). The detection threshold can be disabled.
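The threshold arithmetic and the masking step can be illustrated as follows. This is a sketch of the idea only, not the code used by VIRGO2.py: rounding down is inferred from the L. crispatus example (20% of 2809 is 561.8, reported as 561), and the dictionary-based inputs are hypothetical.

```python
import math


def detection_threshold(median_genes, fraction=0.20):
    """Gene-count threshold as a fraction of the median, rounded down (assumed)."""
    return math.floor(median_genes * fraction)


def mask_undetected(read_counts, genes_detected, thresholds):
    """Zero the reads of species whose detected gene count is below threshold."""
    masked = {}
    for species, reads in read_counts.items():
        threshold = thresholds.get(species, 0)
        present = genes_detected.get(species, 0) >= threshold
        masked[species] = reads if present else 0
    return masked


# Worked example from the text: median of 2809 genes gives a threshold of 561.
crispatus_threshold = detection_threshold(2809)
```

With this threshold, a sample in which 600 L. crispatus genes were detected keeps its L. crispatus reads, while a sample with only 100 detected genes has them masked.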

> python VIRGO2.py taxonomy -h
    usage: VIRGO2.py taxonomy [-h] [-i INPUT] [-o OUTPUTPREFIX] [-b {0,1}] [-f {0,1}] [-r {0,1}] [-m {0,1}]

    optional arguments:
      -h, --help            show this help message and exit
      -i INPUT, --input INPUT
                            Full path to compiled results file
      -o OUTPUTPREFIX, --outputPrefix OUTPUTPREFIX
                            Prefix used in the filename for the compiled output
      -b {0,1}, --bacteria {0,1}
                            Report composition including only the bacteria, default=1
      -f {0,1}, --filter {0,1}
                            Mask contribution from taxa where number of genes detected is below threshold, default=1
      -r {0,1}, --readCounts {0,1}
                            Report values as read counts instead of relative abundances, default=0
      -m {0,1}, --multigenera {0,1}
                            Report relative abundance of multigenera genes, off by default (0)

After running taxonomy, there will be a single comma-separated file with the taxonomy composition per sample.

Annotation files and additional analyses

VIRGO2 contains the following gene annotation files, which can be merged with the compiled mapping results using standard functions in Python or R. An example script can be found in AccessoryScripts/VIRGO2_add_annotations.py.

-0.VIRGO2.geneLength.txt        :Length of each VIRGO2 gene
-1.VIRGO2.taxon.txt             :Taxonomic annotation of each VIRGO2 gene
-2.VIRGO2.taxonThresholds.txt   :Per taxon gene number thresholds used in estimated relative abundance
-3.VIRGO2.eggNog.txt            :EggNog Annotations per VIRGO2 gene
-4.VIRGO2.PFAM.txt              :PFAM Annotations per VIRGO2 gene
-5.VIRGO2.EC.txt                :Enzyme Commission numbers per VIRGO2 gene
-6.VIRGO2.geneProduct.txt       :Gene product Annotations per VIRGO2 gene
-7.VIRGO2.kegg.txt              :KEGG annotations per VIRGO2 gene
-8.VIRGO2.CAZy.txt              :Carbohydrate-active enzyme annotations per VIRGO2 gene
-9.VIRGO2.AMR.txt               :Antimicrobial resistance annotations per VIRGO2 gene
-10.VIRGO2.phage.txt            :Bacteriophage annotations per VIRGO2 gene
-11.VIRGO2.compound.txt         :Biosynthetic gene cluster annotations per VIRGO2 gene
-VIRGO2_VOGkey.txt              :Per-species orthologous gene cluster assignment for VIRGO2 genes
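Merging an annotation file with the compiled table amounts to a join on the gene ID. The sketch below shows that join with the standard library only and is not the provided VIRGO2_add_annotations.py script. It assumes that both files are tab-delimited, that the gene ID is the first column of each, and that the annotation value is the second column of the annotation file; the actual column layouts should be checked before use.

```python
import csv


def load_annotation(handle):
    """Map gene ID (column 1) to its annotation (column 2); layout is assumed."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate_rows(compiled_handle, annotation, missing="NA"):
    """Yield compiled-table rows with an extra annotation column appended."""
    reader = csv.reader(compiled_handle, delimiter="\t")
    header = next(reader)
    yield header + ["annotation"]
    for row in reader:
        # Genes without an annotation entry get the placeholder value.
        yield row + [annotation.get(row[0], missing)]
```

The rows produced by `annotate_rows` can be written back out with `csv.writer(out, delimiter="\t").writerows(...)`. The same join can be done with a pandas merge on the gene ID column.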