3. Import your data
gbtools treats each assembly and the associated coverage data as a single object of class gbt. If you have several coverage tables for the same assembly, you can import them together with the taxon marker, SSU rRNA, and tRNA annotations using the gbt function. The examples in this README can be replicated with the files in the example_data folder, adapted from the Multi-metagenome data set.
# If you have a single coverage table, pass its filename to the covstats= option
d <- gbt(covstats="HPminus.coverage",      # File with coverage data
         mark="phylotype.result.parsed",   # Taxonomic marker data
         marksource="amphora2",            # Name for the taxonomic marker data set
         ssu="HPminus.ssu.tab",            # SSU rRNA data
         trna="HPminus.trna.tab")          # tRNA data
# With two or more coverage tables, you can import them together with the c() function
d <- gbt(covstats=c("HPminus.coverage","HPplus.coverage"), # More than one coverage table
         mark="phylotype.result.parsed",
         marksource="amphora2",
         ssu="HPminus.ssu.tab",
         trna="HPminus.trna.tab")
Only the coverage table is required. The rest are optional (though having them will provide more information and prettier plots).
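Because only the coverage table is required, it can help to confirm that every file you intend to pass actually exists before calling gbt(). The following is a minimal sketch in base R; check_inputs is a hypothetical helper written for this illustration, not a gbtools function, and the demo uses a temporary file in place of a real coverage table.

```r
# Hypothetical helper (NOT part of gbtools): stop early if an input file is missing.
# covstats: one or more coverage table filenames (required)
# optional: filenames for marker, SSU rRNA, or tRNA data (may be empty)
check_inputs <- function(covstats, optional = character(0)) {
  if (length(covstats) == 0) {
    stop("At least one coverage table is required")
  }
  files <- c(covstats, optional)
  absent <- files[!file.exists(files)]
  if (length(absent) > 0) {
    stop("Missing input file(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Demo with a temporary file standing in for a coverage table
demo_cov <- tempfile(fileext = ".coverage")
invisible(file.create(demo_cov))
ok <- check_inputs(covstats = demo_cov)
print(ok)
```

With the example data, the call might look like check_inputs(c("HPminus.coverage","HPplus.coverage"), c("phylotype.result.parsed","HPminus.ssu.tab","HPminus.trna.tab")), run from the folder that holds those files.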
Type the name of the object to see summary statistics:
d
summary(d) # Does the same thing
With the mark= parameter, you can specify a file with taxonomic marker annotations of the contigs in your assembly. There are several published sets of such conserved marker genes, some for Bacteria and Archaea in general, others specific to particular phyla or taxonomic groups. The taxonomic affiliation of each contig in your metagenome can also be annotated independently of specific genes, e.g. with the Blobology pipeline.
If you have used several different taxonomic annotations of the same metagenome, you may wish to import them together. The files have to be formatted in the same way (details in the appendix to this documentation). Suppose the marker data are in files called marker_file1, marker_file2 and marker_file3. To import them, use the combine function c() to list all the files for the mark= parameter in the gbt() function. You will also need to give a name for each of the marker sets, using the marksource= parameter.
d <- gbt(covstats="HPminus.coverage",
         mark=c("marker_file1","marker_file2","marker_file3"),
         marksource=c("marker_set1","marker_set2","marker_set3"), ... )
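Since each marker file needs its own name, the two vectors have to line up one-to-one. A small sketch of how you might check this in base R before the import follows. The variable names are illustrative only, and the requirement that set names be unique is an assumption made here, not something stated in the gbtools documentation.

```r
# Illustrative file and set names taken from the example above
marker_files <- c("marker_file1", "marker_file2", "marker_file3")
marker_names <- c("marker_set1", "marker_set2", "marker_set3")

# One name per file; unique names are assumed here so that sets stay distinguishable
stopifnot(length(marker_files) == length(marker_names),
          anyDuplicated(marker_names) == 0)

# Pair each set name with its file so the mapping is easy to inspect
marker_sets <- setNames(marker_files, marker_names)
print(marker_sets)
```

The paired vector can then supply both arguments, with mark=unname(marker_sets) and marksource=names(marker_sets).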
Special characters like the comment character "#" ("hash") and apostrophes can cause problems with data import into R. Use the input validation script input_validator.pl, found in the gbtools/inst/Perl folder, to check for possible problems in your input files.
perl input_validator.pl --covstats HPminus.coverage,HPplus.coverage --mark phylotype.result.parsed --ssu HPminus.ssu.tab --trna HPminus.trna.tab --outdir checked_output
If an output folder name is specified with the --outdir option, a new folder is created and corrected copies of the input files are written to it with the filename suffix .mod. Otherwise errors are only reported, not fixed.
A log file of all the error messages will also be written to input_validator.log.
See a help message with perl input_validator.pl --help.
You can also use it directly within the R environment:
gbt_checkinput(covstats="HPminus.coverage", mark="phylotype.result.parsed", ssu="HPminus.ssu.tab")
This will give a summary of which files contain errors. The input_validator.log file will be written in the current working folder.
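For a quick manual look at which lines contain hash characters or apostrophes, a few lines of base R are enough. This is only a rough sketch and not a replacement for the validator: find_special_chars is a hypothetical helper, it checks only these two characters, and the demo file contents are made up.

```r
# Hypothetical helper (NOT part of gbtools): flag lines containing '#' or an apostrophe
find_special_chars <- function(path) {
  lines <- readLines(path, warn = FALSE)
  has_hash <- grepl("#", lines, fixed = TRUE)
  has_apostrophe <- grepl("'", lines, fixed = TRUE)
  flagged <- has_hash | has_apostrophe
  data.frame(line = which(flagged),
             hash = has_hash[flagged],
             apostrophe = has_apostrophe[flagged])
}

# Demo on a small temporary tab-separated file with made-up contents
tmp <- tempfile(fileext = ".tab")
writeLines(c("contig\tnote",
             "c1\tstrain #1",
             "c2\tplain",
             "c3\t5' end"), tmp)
flagged <- find_special_chars(tmp)
print(flagged)
```

Each row of the result gives a line number in the file and which of the two characters was found there.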