Functions for processing and contaminant-filtering high-throughput sequencing data to identify low-abundance microbes.
Please see the exotic-manuscript repository for the analyses related to the manuscript:
Exogenous sequences in tumors and immune cells (exotic): A tool for estimating the microbe abundances in tumor RNAseq data. Hoyd R, Wheeler CE, Liu Y, Jagjit Singh MS, Muniak M, Jin N, Denko NC, Carbone DP, Mo X, Spakowicz DJ. Cancer Research Communications. AACR; 2023 https://aacrjournals.org/cancerrescommun/article/doi/10.1158/2767-9764.CRC-22-0435/729620
The package can be installed from GitHub via devtools:
install.packages("devtools")
devtools::install_github("spakowiczlab/exotic")
For more detailed instructions, please refer to the user manual.
The custom database containing bacteria, fungi, viruses, archaea, and select eukaryotes is available for download at https://go.osu.edu/exotic-database. The human reference genome (hg38) and the UniVec contaminants database are included as additional contaminant filters.
An additional filter function was developed by segmenting the CHM13 human and GRCm39 mouse transcriptome/genome into 100-base-pair fragments with 50-base-pair overlaps and running the fragments through the exotic pipeline. Because these fragments are human or mouse in origin, any taxon they are assigned to is a false-positive microbial call, and this function filters out those taxa. The synthetic genome and transcriptome are available at https://zenodo.org/records/10999313.
transcript_genome_filter(counts, filters)
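The segmentation step described above can be illustrated with a minimal base R sketch. `segment_sequence()` is a hypothetical helper written for this example and is not part of exotic. The window and overlap values come from the description above, while the handling of trailing bases is an assumption.

```r
# Split a sequence into fixed-length windows that overlap their neighbours.
# window = 100 and overlap = 50 mirror the values described above.
# segment_sequence() is a hypothetical helper, not an exotic function.
segment_sequence <- function(sequence, window = 100L, overlap = 50L) {
  stopifnot(overlap < window)
  step <- window - overlap
  n <- nchar(sequence)
  if (n <= window) {
    return(sequence)
  }
  # Start positions of each window. Assumption: trailing bases that do not
  # fill a complete window are dropped in this sketch.
  starts <- seq(1L, n - window + 1L, by = step)
  substring(sequence, starts, starts + window - 1L)
}

# Toy 250-base sequence standing in for a transcript or genome segment.
set.seed(1)
toy <- paste(sample(c("A", "C", "G", "T"), 250, replace = TRUE), collapse = "")
fragments <- segment_sequence(toy)

length(fragments)  # 4 windows, starting at positions 1, 51, 101, and 151
```

Each fragment shares its last 50 bases with the first 50 bases of the next one, so every position away from the sequence ends is covered by two fragments.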
An additional function calculates microbial abundance relative to human counts. For analysis, we now recommend using either the unnormalized, filtered counts or the abundance relative to human calculated from the unnormalized, filtered counts.
calculate_abundance_relative_to_human(counts)
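As a rough illustration of the idea, the sketch below divides each taxon's counts by the human counts from the same sample. The table layout, the column names (`sample`, `Homo_sapiens`), and the helper `relative_to_human()` are all assumptions made for this example. The input format and exact calculation used by `calculate_abundance_relative_to_human()` may differ, so consult the user manual before relying on it.

```r
# Hypothetical layout: one row per sample, one column per taxon, with human
# reads in a column named "Homo_sapiens". This is an assumed format, not
# necessarily what exotic produces.
counts <- data.frame(
  sample = c("s1", "s2"),
  Homo_sapiens = c(1000000, 500000),
  Escherichia_coli = c(200, 50),
  Candida_albicans = c(10, 0)
)

# relative_to_human() is an illustrative helper, not an exotic function.
relative_to_human <- function(counts, human_col = "Homo_sapiens",
                              id_col = "sample") {
  taxa <- setdiff(names(counts), c(human_col, id_col))
  out <- counts[, c(id_col, taxa), drop = FALSE]
  # Divide each taxon's counts by the human counts from the same sample (row).
  out[taxa] <- counts[taxa] / counts[[human_col]]
  out
}

rel <- relative_to_human(counts)
rel
```

Expressing counts as a ratio to human reads gives a per-sample scale that does not depend on total sequencing depth, which is one reason a ratio like this can be preferable to raw counts when samples differ in depth.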
KrakenUniq-based filters have been added. For analysis, we now recommend using the KrakenUniq unnormalized, filtered counts or the abundance relative to human calculated from the unnormalized, filtered counts.