Group abundances of UniRef50 gene families obtained with HUMAnN2 to Gene Ontology (GO) slim terms with relative abundances
HUMAnN2 is a pipeline that profiles the presence/absence and abundance of microbial pathways in a community from microbiota sequencing data. One of its outputs is a file with UniRef50 gene family abundances. HUMAnN2 provides a script to regroup UniRef50 gene families to GO terms, but the GO terms it uses are too specific to give a broad overview of the ontology content.
The tool described here contains scripts to group the UniRef50 abundances obtained with the main HUMAnN2 script (gene families) into GO slim terms. A GO slim is a subset of the terms in the whole GO. This tool uses the metagenomics GO slim terms developed by Jane Lomax and the InterPro group.
The script in this tool calls:
- A script to format the correspondence between UniRef50 and GO available in the HUMAnN2 package
- The HUMAnN2 script to regroup UniRef50 to GO using the formatted correspondence
- The GoaTools script to map GO terms to GO slim terms
- A script to format the output of the previous script
- The HUMAnN2 script to regroup GO to GO slim terms
- A script to format the generated file
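The regrouping idea behind the steps above can be sketched on toy data. This is only an illustration under stated assumptions: the file names, family identifiers, slim term labels, and values are all made up, each family maps to a single slim term, and relative abundance is taken as the share of the mapped total. The actual scripts may handle multiple mappings and normalization differently.

```shell
# Toy illustration only: every name and value below is hypothetical.
workdir=$(mktemp -d)

# Made-up UniRef50 abundances (tab-separated: family, abundance)
printf 'UniRef50_A\t10\nUniRef50_B\t5\nUniRef50_C\t5\n' > "$workdir/abundances.tsv"

# Made-up mapping (tab-separated: slim term, family)
printf 'SLIM_TERM_1\tUniRef50_A\nSLIM_TERM_1\tUniRef50_B\nSLIM_TERM_2\tUniRef50_C\n' > "$workdir/mapping.tsv"

# First file: remember the slim term of each family.
# Second file: sum abundances per slim term, then divide by the mapped total.
regrouped=$(awk -F'\t' '
    NR == FNR { slim[$2] = $1; next }
    ($1 in slim) { sum[slim[$1]] += $2; total += $2 }
    END { for (term in sum) printf "%s\t%.2f\n", term, sum[term] / total }
' "$workdir/mapping.tsv" "$workdir/abundances.tsv" | LC_ALL=C sort)

echo "$regrouped"
rm -rf "$workdir"
```

Here the two families mapped to SLIM_TERM_1 are summed before normalization, which is the same kind of aggregation the tool performs in two stages (UniRef50 to GO, then GO to GO slim).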
$ conda install -c bioconda group_humann2_uniref_abundances_to_GO
This command also installs all dependencies.
Clone the repository:
$ git clone https://github.com/ASaiM/group_humann2_uniref_abundances_to_GO.git
$ cd group_humann2_uniref_abundances_to_GO
This tool needs:
- Git
- Mercurial
- VirtualEnv
- Python with pip
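A quick way to see whether the prerequisites listed above are available is to look for their commands on the PATH. This is a minimal sketch, not part of the tool: the helper name `find_missing_tools` is hypothetical, and the command names (`hg` for Mercurial, `virtualenv`, `python`, `pip`) are the usual ones but may differ on your system.

```shell
# Hypothetical helper: print the names of the given commands
# that cannot be found on the PATH, separated by spaces.
find_missing_tools() {
    local missing="" tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || missing="$missing $tool"
    done
    echo "${missing# }"
}

# Assumed command names for the prerequisites listed above
missing=$(find_missing_tools git hg virtualenv python pip)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing"
else
    echo "All prerequisite tools found"
fi
```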
Once these tools are installed, you can run:
$ install_dependencies.sh
This script sets up a virtual environment and installs the dependencies with:
$ pip install -r requirements.txt
$ git clone https://github.com/tanghaibao/goatools.git
A Galaxy wrapper is also available on the Galaxy ToolShed. It can be installed on any Galaxy instance.
$ ./group_humann2_uniref_abundances_to_GO.sh [OPTIONS] \
-i humann2_gene_families_abundance \
-m molecular_function_abundance \
-b biological_process_abundance \
-c cellular_component_abundance
To get more information about options:
$ ./group_humann2_uniref_abundances_to_GO.sh -h
This tool is tested on every change to the GitHub repository using Travis CI.
In these tests, dependencies are installed and group_humann2_uniref_abundances_to_GO.sh is run on the test data available in the test-data directory:
- A file with UniRef50 gene family abundances from HUMAnN2 (computed on gut microbiota data of lean women): humann2_gene_families.csv
- A file with the basic Gene Ontology, downloaded on 02/22/2016: go_02_22_2016.obo
- A file with the metagenomics slim Gene Ontology, downloaded on 02/22/2016: goslim_metagenomics_02_22_2016.obo
- A file with the HUMAnN2 correspondence between UniRef50 and GO, downloaded on 02/22/2016: map_infogo1000_uniref50_02_22_2016.txt
Generated outputs are compared to expected ones:
- expected_molecular_function_abundances.txt with expected abundance of GO related to molecular functions
- expected_biological_process_abundances.txt with expected abundance of GO related to biological processes
- expected_cellular_component_abundances.txt with expected abundance of GO related to cellular components
You can check the .travis.yml file for more information.
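To run a similar comparison locally, a plain `diff` is one simple option. The sketch below is an assumption about how such a check could look, not a copy of what .travis.yml does; the function name `compare_to_expected` and the generated file path in the usage comment are hypothetical.

```shell
# Hypothetical helper: compare a generated file to its expected counterpart.
# Prints a short verdict and returns non-zero when the files differ.
compare_to_expected() {
    local generated="$1" expected="$2"
    if diff -q "$generated" "$expected" > /dev/null; then
        echo "OK: $generated"
    else
        echo "DIFFERENT: $generated"
        return 1
    fi
}

# Example call, assuming the molecular function output was written to
# ./molecular_function_abundance (a made-up location):
# compare_to_expected molecular_function_abundance \
#     test-data/expected_molecular_function_abundances.txt
```

An exact byte-for-byte comparison is strict; if row order or number formatting varies between runs, sorting both files first or comparing with a tolerance may be more appropriate.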
This tool is released under Apache 2 License. See the LICENSE file for details.
So that this tool can be cited, a DOI is generated for each release using Zenodo.
DOI and corresponding BibTeX export for the latest release:
@misc{berenice_batut_2016_50086,
author = {Bérénice Batut},
title = {{Group abundances of UniRef50 gene families
obtained with HUMAnN2 to Gene Ontology (GO) slim
terms with relative abundances: release v1.2.0}},
month = apr,
year = 2016,
doi = {10.5281/zenodo.50086},
url = {http://dx.doi.org/10.5281/zenodo.50086}
}