-
Notifications
You must be signed in to change notification settings - Fork 4
fangly/AmpliCopyrighter
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
NAME copyrighter - Correct trait bias in microbial profiles SYNOPSIS copyrighter -i otu_table.qiime -o otu_table_copyrighted.generic DESCRIPTION The genome of Bacteria and Archaea often contains several copies of the 16S rRNA gene. This can lead to significant biases when estimating the composition of microbial communities using 16S rRNA amplicons or microarrays or their total abundance using 16S rRNA quantitative PCR, since species with a large number of copies will contribute disproportionally more 16S amplicons than species with a unique copy. Fortunately, it is possible to infer the copy number of unsequenced microbial species, based on that of close relatives that have been fully sequenced. Using this information, CopyRigher corrects microbial relative abundance by applying a weight proportional to the inverse of the estimated copy number to each species. In metagenomic surveys, a similar problem arises due to genome length variations between species, and can be corrected by CopyRighter as well. In all cases, a community file is used as input (-i option) and a corrected community file with trait-corrected (16S rRNA gene copy number or genome length) relative abundances is generated (-o option). Total abundance can optionally be provided (-t option), corrected and combined with relative abundance estimates to get the absolute abundance of each species. Also the average trait value in each community is reported on standard output. We are grateful to the Genomics Virtual Lab <https://genome.edu.au/> for providing a public Galaxy webserver in which users can run CopyRighter in a graphical environment: <http://galaxy-qld.genome.edu.au>. REQUIRED ARGUMENTS -i <input> Input community file obtained from 16S rRNA microarray, 16S rRNA amplicon sequencing or metagenomic sequencing, in biom, QIIME, GAAS, Unifrac, or generic (tabular site-by-species) format. The file must contain read counts (not percentages) and taxa must have UNALTERED taxonomic assignments. Here is an example of Greengenes 2012/10 taxonomic string (note the whitespace after each semicolon): k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhodospirillales; f__Rhodospirillaceae; g__Telmatospirillum; s__siberiense See also the <data> parameter to specify your own database of trait values. OPTIONAL ARGUMENTS -d <data> Provide the file of trait estimates to use for correction. Data files of 16S rRNA gene copy number and genome length (based on IMG 4.0 genomes mapped onto the Oct 2012 Greengenes taxonomy) are distributed with CopyRighter. In case you want to use an alternative data file, be aware that it should be tab-delimited and have two columns, an ID or taxonomic string (col 1), and trait estimate (col 2), as illustrated in this example: # ID 16S rRNA count 4 1.51098055313977 7 1.51812891020048 ... 24084 3.41268502385832 # taxstring 16S rRNA count k__Archaea; p__; c__; o__; f__; g__; s__ 1.57262 k__Archaea; p__Crenarchaeota; [...] g__Cenarchaeum; s__symbiosum 1.00000 ... k__Bacteria; p__Actinobacteria; [...] g__Actinomyces; s__europaeus 1.19211 Extra columns are ignored, as well as empty lines and comment lines (starting with #). Note that the header line can define the name of the weight used. Also, the file can contain trait values both at the ID and taxstring level. This argument is optional. When omitted, CopyRighter will look for the data file location stored in the "COPYRIGHTER_DB" environment variable. Feel free to make this variable point to your preferred data file. -l <lookup> What to match when looking up the trait value of a taxon: 'desc', use taxonomic description, or 'id', use OTU ID (if recorded in your input community file). The script bc_use_repr_id of Bio::Community can help in replacing arbitrary OTU IDs by their corresponding Greengenes ID. Default: desc -o <output> Output path for the corrected community files (in same format as input), with relative abundance expressed in percent. Default: out_copyrighted.txt -t <total> File containing the total microbial abundance to be corrected by the average trait value, e.g. 16S rRNA quantitative PCR numbers to be corrected by the average 16S rRNA copy number in each community. This file should be tab-delimited and contain two columns: community name, and total abundance. Using this option will produce two additional output files, one containing the corrected total microbial abundance, and other the absolute abundance of each taxon in the <input> (in the same format as <input>): assuming an <output> called 'out_copyrighted.txt', these files will be named, respectively, 'out_copyrighted_total.tsv' and 'out_copyrighted_combined.txt'. -v Verbose mode. Display trait value assignments. You should probably use this option and make sure that your taxa are processed as intended. HELP & FEEDBACK Mailing list New releases of CopyRighter, usage help and suggestions are discussed on this mailing list: <https://groups.google.com/d/forum/copyrighter> Bugs All complex software has bugs lurking in it, and this program is no exception. If you find a bug, please report it on the bug tracker: <http://github.com/fangly/AmpliCopyrighter/issues> AUTHOR Florent Angly <florent.angly@gmail.com> VERSION This document refers to copyrighter version 0.46 COPYRIGHT Copyright 2012-2014 Florent ANGLY <florent.angly@gmail.com> CopyRighter is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. CopyRighter is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with CopyRighter. If not, see <http://www.gnu.org/licenses/>.