Group abundances of UniRef50 gene families obtained with HUMAnN2 to Gene Ontology (GO) slim terms with relative abundances

Introduction

HUMAnN2 is a pipeline to profile the presence/absence and abundance of microbial pathways in a community from microbiota sequencing data. One of its outputs is a file with UniRef50 gene family abundances. HUMAnN2 provides a script to regroup UniRef50 to GO, but the GO terms it uses are too specific to give a broad overview of the ontology content.

The tool described here contains scripts to group UniRef50 abundances obtained with the main HUMAnN2 script (gene families) into GO slim terms. GO slim is a subset of the terms in the whole GO. This tool uses the metagenomics GO slim terms developed by Jane Lomax and the InterPro group.

The main script in this tool calls, in order:

A script to format the correspondence between UniRef50 and GO available in the HUMAnN2 package
The HUMAnN2 script to regroup UniRef50 to GO using the formatted correspondence
The GoaTools script to map GO terms to GO slim terms
A script to format the output of the previous script
The HUMAnN2 script to regroup GO to GO slim terms
A script to format the generated file
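
The central operation in these steps is regrouping: abundances are summed under the term each item maps to, first from UniRef50 to GO and then from GO to GO slim. The following Python sketch only illustrates that idea. It is not the HUMAnN2 or GoaTools implementation: the `regroup` and `to_relative` helpers, the identifiers, and the values are all made up, and it assumes that an item mapped to several groups contributes its full abundance to each of them and that relative abundances are obtained by dividing by the total.

```python
from collections import defaultdict


def regroup(abundances, mapping):
    """Sum abundances of items under every group they map to.

    Hypothetical helper for illustration only; items without a mapping
    are collected under "UNGROUPED".
    """
    grouped = defaultdict(float)
    for item, value in abundances.items():
        for group in mapping.get(item, ["UNGROUPED"]):
            grouped[group] += value
    return dict(grouped)


def to_relative(abundances):
    """Scale abundances so that they sum to 1 (assumes a non-zero total)."""
    total = sum(abundances.values())
    return {key: value / total for key, value in abundances.items()}


# Made-up example data, not taken from the test files.
uniref50 = {"UniRef50_A": 6.0, "UniRef50_B": 3.0, "UniRef50_C": 1.0}
uniref50_to_go = {
    "UniRef50_A": ["GO_term_1"],
    "UniRef50_B": ["GO_term_2"],
    "UniRef50_C": ["GO_term_2"],
}
go_to_slim = {
    "GO_term_1": ["slim_term_x"],
    "GO_term_2": ["slim_term_x", "slim_term_y"],
}

go = regroup(uniref50, uniref50_to_go)   # UniRef50 -> GO
go_slim = regroup(go, go_to_slim)        # GO -> GO slim
relative = to_relative(go_slim)
print(relative)
```

Using the same helper for both passes mirrors the list above, where the HUMAnN2 regrouping script is called twice with different correspondence files.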

Installation

Using `conda`

$ conda install -c bioconda group_humann2_uniref_abundances_to_GO

conda will handle the installation of all dependencies.

Using the source code

Get the code

Clone the repository:

$ git clone https://github.com/ASaiM/group_humann2_uniref_abundances_to_GO.git
$ cd group_humann2_uniref_abundances_to_GO

Install the requirements

This tool needs:

Git
Mercurial
VirtualEnv
Python with pip

Once these tools are installed, you can run:

$ install_dependencies.sh

This script will launch a virtual environment and install HUMAnN2 and GoaTools (> 0.6.4) with:

$ pip install -r requirements.txt
$ git clone https://github.com/tanghaibao/goatools.git

Using Galaxy

A wrapper was also developed and is available on Galaxy ToolShed. It can be installed on any Galaxy instance.

Usage

$ ./group_humann2_uniref_abundances_to_GO.sh [OPTIONS] \
     -i humann2_gene_families_abundance \
     -m molecular_function_abundance \
     -b biological_process_abundance \
     -c cellular_component_abundance

To get more information about options:

$ ./group_humann2_uniref_abundances_to_GO.sh -h

Tests

This tool is tested at each change of the GitHub repository using Travis CI.

In these tests, the dependencies are installed and group_humann2_uniref_abundances_to_GO.sh is run on the test data available in the test-data directory:

A file with UniRef50 gene family abundances from HUMAnN2 (computed on gut microbiota data of lean women): humann2_gene_families.csv
A file with basic Gene Ontology, downloaded on 02/22/2016: go_02_22_2016.obo
A file with metagenomic slim Gene Ontology, downloaded on 02/22/2016: goslim_metagenomics_02_22_2016.obo
A file with the HUMAnN2 correspondence between UniRef50 and GO, downloaded on 02/22/2016: map_infogo1000_uniref50_02_22_2016.txt

Generated outputs are compared to expected ones:

expected_molecular_function_abundances.txt with expected abundance of GO related to molecular functions
expected_biological_process_abundances.txt with expected abundance of GO related to biological processes
expected_cellular_component_abundances.txt with expected abundance of GO related to cellular components
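
A comparison of this kind could be sketched in Python as follows. This is an assumption about what such a check might look like, not the check actually run in the tests (which is defined in .travis.yml): the `same_content` helper is hypothetical, it ignores trailing whitespace, and the file content is made up.

```python
import tempfile
from pathlib import Path


def same_content(generated, expected):
    """Return True if both text files contain the same lines.

    Hypothetical helper: trailing whitespace on each line is ignored.
    """
    gen_lines = [line.rstrip() for line in Path(generated).read_text().splitlines()]
    exp_lines = [line.rstrip() for line in Path(expected).read_text().splitlines()]
    return gen_lines == exp_lines


# Demonstration on throwaway files with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "molecular_function_abundance.txt"
    expected = Path(tmp) / "expected_molecular_function_abundances.txt"
    generated.write_text("term_a\t0.25\n")
    expected.write_text("term_a\t0.25  \n")
    result = same_content(generated, expected)
    print(result)
```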

See the .travis.yml file for more information.

License

This tool is released under the Apache 2 License. See the LICENSE file for details.

Citation

A DOI is generated for each release using Zenodo so that the tool can be cited.

DOI of the latest release and the corresponding BibTeX export:

@misc{berenice_batut_2016_50086,
  author       = {Bérénice Batut},
  title        = {{Group abundances of UniRef50 gene families 
                   obtained with HUMAnN2 to Gene Ontology (GO) slim
                   terms with relative abundances: release v1.2.0}},
  month        = apr,
  year         = 2016,
  doi          = {10.5281/zenodo.50086},
  url          = {http://dx.doi.org/10.5281/zenodo.50086}
}
