diff --git a/doc/manuscript.Rmd b/doc/manuscript.Rmd
index a3ec71c..cbd5cf5 100644
--- a/doc/manuscript.Rmd
+++ b/doc/manuscript.Rmd
@@ -52,7 +52,7 @@ pvalcalc <- function(x) {
 \newpage
 ## Introduction
-Due to their mutagenic abilities and propensity for functional manipulation, human viruses are strongly associated with, and in many cases cause, cancer [@Feng:2008kr; @Shuda:2011gf; @Schiller:2012ba; @Chang:1994up]. Because bacteriophages (viruses that specifically infect bacteria) are crucial for bacterial community stability and composition [@Harcombe:2005fd; @RodriguezValera:2009cr; @Cortez:2014bk] and have been implicated as oncogenic agents [@Zackular:2014fba; @Garrett:2015fg; @Baxter:2014hb; @Arthur:2012kl], bacteriophages have the potential to indirectly impact cancer. The gut virome (the virus community of the gut) therefore has the potential to impact health and disease. Altered human virome composition and diversity have been identified in diseases including periodontal disease [@Ly:2014ew], HIV [@Monaco:2016ita], cystic fibrosis [@Willner:2009dq], antibiotic exposure [@Abeles:2015dy; @Modi:2013fi], urinary tract infections [@SantiagoRodriguez:2015gd], and inflammatory bowel disease [@Norman:2015kb]. The strong association of bacterial communities with colorectal cancer and the precedent for the virome to impact other human diseases suggest that colorectal cancer may be associated with altered virus communities.
+Due to their mutagenic abilities and propensity for functional manipulation, human viruses are strongly associated with, and in many cases cause, cancer [@Feng:2008kr; @Shuda:2011gf; @Schiller:2012ba; @Chang:1994up]. Because bacteriophages (i.e.
viruses that specifically infect bacteria) are crucial for bacterial community stability and composition [@Harcombe:2005fd; @RodriguezValera:2009cr; @Cortez:2014bk] and have been implicated as oncogenic agents [@Zackular:2014fba; @Garrett:2015fg; @Baxter:2014hb; @Arthur:2012kl], bacteriophages have the potential to indirectly impact cancer. The gut virome (i.e. the virus community of the gut) therefore has the potential to impact health and disease. Altered human virome composition and diversity have been identified in diseases including periodontal disease [@Ly:2014ew], HIV [@Monaco:2016ita], cystic fibrosis [@Willner:2009dq], antibiotic exposure [@Abeles:2015dy; @Modi:2013fi], urinary tract infections [@SantiagoRodriguez:2015gd], and inflammatory bowel disease [@Norman:2015kb]. The strong association of bacterial communities with colorectal cancer and the precedent for the virome to impact other human diseases suggest that colorectal cancer may be associated with altered virus communities. Colorectal cancer is the second leading cause of cancer-related deaths in the United States [@Siegel:2014jo]. The US National Cancer Institute estimates over 1.5 million Americans were diagnosed with colorectal cancer in 2016 and over 500,000 Americans died from the disease [@Siegel:2014jo]. Growing evidence suggests that an important component of colorectal cancer etiology may be perturbations in the colonic bacterial community [@Zackular:2016en; @Zackular:2014fba; @Baxter:2014hb; @Dejea:2014fz; @Arthur:2012kl]. Work in this area has led to a proposed disease model in which bacteria colonize the colon, develop biofilms, promote inflammation, and enter an oncogenic synergy with the cancerous human cells [@Flynn:2016iu]. This association also has allowed researchers to leverage bacterial community signatures as biomarkers to provide accurate, noninvasive colorectal cancer detection from stool [@Zackular:2014fba; @Baxter:2016dja; @Zeller:2014ix]. 
While an understanding of colorectal cancer bacterial communities has proven fruitful both for disease classification and for identifying the underlying disease etiology, bacteria are only a subset of the colon microbiome. Viruses are another important component of the colon microbial community that have yet to be studied in the context of colorectal cancer. We evaluated disruptions in virus and bacterial community composition in a human cohort whose stool was sampled at the three relevant stages of cancer development: healthy, adenomatous, and cancerous. @@ -63,17 +63,17 @@ Here we address the knowledge gap of whether virus community composition is alte ## Cohort Design, Sample Collection, and Processing Our study cohort consisted of 90 human subjects, 30 of whom had healthy colons, 30 of whom had adenomas, and 30 of whom had carcinomas **(Figure \ref{sampleproc})**. Half of each stool sample was used to sequence the bacterial communities using both 16S rRNA gene and shotgun sequencing techniques. The 16S rRNA gene sequencing was performed for a previous study, and the sequences were re-analyzed using contemporary methods [@Zackular:2014fba]. The other half of each stool sample was purified for virus like particles (VLPs) before genomic DNA extraction and shotgun metagenomic sequencing. In the VLP purification, cells were disrupted and extracellular DNA degraded **(Figure \ref{sampleproc})** to allow the exclusive analysis of viral DNA within virus capsids. In this manner, the *extracellular virome* of encapsulated viruses was targeted. -Each extraction was performed with a blank buffer control to detect contaminants from reagents or other unintentional sources. Only one of the nine controls contained detectable DNA at a minimal concentration of 0.011 ng/µl, thus providing evidence of the enrichment and purification of VLP genomic DNA over potential contaminants **(Figure \ref{qualcontrol} A)**. 
As was expected, these controls yielded few sequences and were almost entirely removed while rarefying the datasets to a common number of sequences **(Figure \ref{qualcontrol} B)**. The high quality phage and bacterial sequences were assembled into highly covered contigs longer than 1 kb **(Figure \ref{contigqc})**. Because contigs represent genome fragments, we further clustered related bacterial contigs into operational genomic units (OGUs) and viral contigs into operational viral units (OVUs) **(Figure \ref{contigqc} - \ref{clustercontigqc})** to approximate organismal units. +Each extraction was performed with a blank buffer control to detect contaminants from reagents or other unintentional sources. Only one of the nine controls contained detectable DNA at a minimal concentration of 0.011 ng/µl, thus providing evidence of the enrichment and purification of VLP genomic DNA over potential contaminants **(Figure \ref{qualcontrol} A)**. As expected, these controls yielded few sequences and were almost entirely removed while rarefying the datasets to a common number of sequences **(Figure \ref{qualcontrol} B)**. The high quality phage and bacterial sequences were assembled into highly covered contigs longer than 1 kb **(Figure \ref{contigqc})**. Because contigs represent genome fragments, we further clustered related bacterial contigs into operational genomic units (OGUs) and viral contigs into operational viral units (OVUs) **(Figure \ref{contigqc} - \ref{clustercontigqc})** to approximate organismal units. ## Unaltered Virome Diversity in Colorectal Cancer -Microbiome and disease associations are often described as being of an altered diversity (i.e., "dysbiotic"). We therefore first evaluated the influence of colorectal cancer on virome OVU diversity. We evaluated differences in communities between disease states using the Shannon diversity, richness, and Bray-Curtis metrics. 
We observed no significant alterations in either Shannon diversity or richness in the diseased states as compared to the healthy state **(Figure \ref{betaogu} C-D)**. There was no statistically significant clustering of the disease groups (ANOSIM p-value `r pvalcalc(alphadf[c(alphadf$Title %in% "anovawithoutneg"),"Stat"])`, **Figure \ref{betaogu}**). Notably, there was a significant difference between the few blank controls that remained after rarefying the data and the other study groups (ANOSIM p-value `r pvalcalc(alphadf[c(alphadf$Title %in% "anovawithneg"),"Stat"])`, **Figure \ref{betaogunegative})**, further supporting the quality of the sample set. In summary, standard alpha and beta diversity metrics were insufficient for capturing virus community differences between disease states **(Figure \ref{betaogu})**. This is consistent with what has been observed when the same metrics were applied to 16S rRNA sequenced and metagenomic samples [@Zeller:2014ix; @Zackular:2014fba; @Baxter:2016dja] and points to the need for alternate approaches to detect the impact of colorectal cancer disease state on these communities. +Microbiome and disease associations are often described as being of an altered diversity (i.e., "dysbiotic"). Therefore, we first evaluated the influence of colorectal cancer on virome OVU diversity. We evaluated differences in communities between disease states using the Shannon diversity, richness, and Bray-Curtis metrics. We observed no significant alterations in either Shannon diversity or richness in the diseased states as compared to the healthy state **(Figure \ref{betaogu} C-D)**. There was no statistically significant clustering of the disease groups (ANOSIM p-value `r pvalcalc(alphadf[c(alphadf$Title %in% "anovawithoutneg"),"Stat"])`, **Figure \ref{betaogu}**). 
Notably, there was a significant difference between the few blank controls that remained after rarefying the data and the other study groups (ANOSIM p-value `r pvalcalc(alphadf[c(alphadf$Title %in% "anovawithneg"),"Stat"])`, **Figure \ref{betaogunegative})**, further supporting the quality of the sample set. In summary, standard alpha and beta diversity metrics were insufficient for capturing virus community differences between disease states **(Figure \ref{betaogu})**. This is consistent with what has been observed when the same metrics were applied to 16S rRNA sequenced and metagenomic samples [@Zeller:2014ix; @Zackular:2014fba; @Baxter:2016dja] and points to the need for alternate approaches to detect the impact of colorectal cancer disease state on these communities. ## Altered Virome Composition in Colorectal Cancer As opposed to the diversity metrics discussed above, OTU-based relative abundance profiles generated from 16S rRNA gene sequences are effective feature sets for classifying stool samples as originating from individuals with healthy, adenomatous, or cancerous colons [@Zackular:2014fba; @Baxter:2016dja]. The exceptional performance of bacteria in these classification models supports a role for bacteria in colorectal cancer. We built off of these findings by evaluating the ability of virus community signatures to classify stool samples and compared their performance to models built using bacterial community signatures. To identify the altered virus communities associated with colorectal cancer, we built and tested random forest models for classifying stool samples as belonging to individuals with either cancerous or healthy colons. We confirmed that our bacterial 16S rRNA gene model replicated the performance of the original report which used logit models instead of random forest models **(Figure \ref{predmodel} A)** [@Zackular:2014fba]. We then compared the bacterial OTU model to a model built using OVU relative abundances. 
The viral model performed as well as the bacterial model (corrected p-value `r pvalcalc(twoss[c(twoss$First %in% "Virus" & twoss$Second %in% "Bacteria"),"pval"])`), with the viral and bacterial models achieving mean area under the curve (AUC) values of `r signif(twoauc[c(twoauc$class %in% "Virus"),"meanAUC"], digits = asigfig)` and `r signif(twoauc[c(twoauc$class %in% "Bacteria"),"meanAUC"], digits = asigfig)`, respectively **(Figure \ref{predmodel} A - B)**. To evaluate the ability of both bacterial and viral biomarkers to classify samples, we built a combined model that used both bacterial and viral community data. The combined model yielded a modest but statistically significant performance improvement beyond the viral (corrected p-value `r pvalcalc(twoss[c(twoss$First %in% "Combined" & twoss$Second %in% "Virus"),"pval"])`) and bacterial (corrected p-value `r pvalcalc(twoss[c(twoss$First %in% "Combined" & twoss$Second %in% "Bacteria"),"pval"])`) models, yielding an AUC of `r signif(twoauc[c(twoauc$class %in% "Combined"),"meanAUC"], digits = asigfig)` **(Figure \ref{predmodel} A - B)**. The combined features from the virus and bacterial communities improved our ability to classify stool as belonging to individuals with cancerous colons. -To determine the advantage of viral metagenomic methods over bacterial metagenomic methods, we compared the viral model to a model built using OGU relative abundance profiles from bacterial metagenomic shotgun sequencing data. This model performed worse than the other models (mean AUC = `r signif(twoauc[c(twoauc$class %in% "Metagenomic"),"meanAUC"], digits = asigfig)`) **(Figure \ref{predmodel} A - B)**. Because the coverage provided by the metagenomic sequencing was not as deep as the equivalent 16S rRNA gene sequencing, we attempted to compare the approaches at a common sequencing depth. 
This investigation revealed that the bacterial 16S rRNA gene model was strongly driven by sparse and low abundance OTUs **(Figure \ref{16scompare})**. Removal of OTUs with a median abundance of zero resulted in the removal of six OTUs, and a loss of model performance down to what was observed in the metagenome-based model **(Figure \ref{16scompare} A)**. The majority of these OTUs had a relative abundance lower than 1% across the samples **(Figure \ref{16scompare} B)**. Although the features in the viral model also were of low abundance **(Figure \ref{threewaymodel} F)**, the coverage was sufficient for high model performance, likely because viral genomes are orders of magnitude smaller than bacterial genomes. Thus, the targeted 16S rRNA gene sequencing approach, which represented only a fraction of the bacterial metagenomic sequencing depth, was more effective for detecting colorectal cancer in stool samples. Despite a loss of enthusiasm for 16S rRNA gene sequencing in favor of shotgun metagenomic techniques, 16S rRNA gene sequencing is still a superior methodological approach for some important applications. +To determine the advantage of viral metagenomic methods over bacterial metagenomic methods, we compared the viral model to a model built using OGU relative abundance profiles from bacterial metagenomic shotgun sequencing data. This model performed worse than the other models (mean AUC = `r signif(twoauc[c(twoauc$class %in% "Metagenomic"),"meanAUC"], digits = asigfig)`) **(Figure \ref{predmodel} A - B)**. Because the coverage provided by the metagenomic sequencing was not as deep as the equivalent 16S rRNA gene sequencing, we attempted to compare the approaches at a common sequencing depth. This investigation revealed that the bacterial 16S rRNA gene model was strongly driven by sparse and low abundance OTUs **(Figure \ref{16scompare})**. 
Removal of OTUs with a median abundance of zero resulted in the removal of six OTUs, and a loss of model performance down to what was observed in the metagenome-based model **(Figure \ref{16scompare} A)**. The majority of these OTUs had a relative abundance lower than 1% across the samples **(Figure \ref{16scompare} B)**. Although the features in the viral model also were of low abundance **(Figure \ref{threewaymodel} F)**, the coverage was sufficient for high model performance, likely because viral genomes are orders of magnitude smaller than bacterial genomes. Thus, the targeted 16S rRNA gene sequencing approach, which represented only a fraction of the bacterial metagenomic sequencing depth, was more effective for detecting colorectal cancer in stool samples. Despite the recent loss of enthusiasm for 16S rRNA gene sequencing in favor of shotgun metagenomic techniques, 16S rRNA gene sequencing is still a superior methodological approach for some important applications. The association between the bacterial and viral communities and colorectal cancer was driven by a few important microbes. *Fusobacterium* was the primary driver of the bacterial association with colorectal cancer, which is consistent with its previously described oncogenic potential **(Figure \ref{predmodel} C)**[@Flynn:2016iu]. The virome signature also was driven by a few OVUs, suggesting a role for these viruses in tumorigenesis **(Figure \ref{predmodel} D)**. The identified viruses were bacteriophages, belonging to *Siphoviridae*, *Myoviridae*, and "unclassified" phage taxa. Many of the important viruses were unidentifiable (denoted "unknown"). This is common in viromes across habitats; studies have reported as much as 95% of virus sequences belonging to unknown genomic units [@Pedulla:2003tu; @Hannigan:2015fz; @Willner:2009dq; @Brum:2015iaa]. 
When the bacterial and viral community signatures were combined, both bacterial and viral organisms drove the community association with cancer **(Figure \ref{predmodel} E)**. @@ -90,14 +90,14 @@ We evaluated whether the phages in the community were primarily lytic (i.e. obli ## Community Context of Influential Phages Because the link between colorectal cancer and the virome was driven by bacteriophages, we hypothesized that the influential phages were primarily predators of the influential bacteria, and thus influenced their relative abundance through predation. If this hypothesis were true, we would expect a correlation between the relative abundances of influential bacteria and phages. Instead, we observed a strikingly low correlation between bacterial and phage relative abundances **(Figure \ref{correlations} A,C)**. Overall, there was an absence of correlation between the most influential OVUs and bacterial OTUs **(Figure \ref{correlations} B)**. This evidence supported our null hypothesis that the influential phages were not primarily predators of influential bacteria. -Given these findings, we hypothesized that the most influential phages were acting by infecting a wide range of bacteria in the overall community, instead of just the influential bacteria. In other words, we hypothesized that the influential bacteriophages were community hubs (central members) within the bacteria and phage interactive network. We investigated the potential host ranges of all phage OVUs using a previously developed random forest model that relies on sequence features to predict which phages infected which bacteria in the community **(Figure \ref{network} A)** [@Hannigan:2017cj]. The predicted interactions were then used to identify phage community hubs. We calculated the alpha centrality (measure of importance in the ecological network) of each phage OVU's connection to the rest of the network. The phages with high centrality values were defined as community hubs. 
Next, the centrality of each OVU was compared to its importance in the colorectal cancer classification model. Phage OVU centrality was significantly and positively correlated with importance to the disease model (p-value `r pvalcalc(corstats[c(corstats$names %in% "pval"),"scores"])`, R = `r signif(corstats[c(corstats$names %in% "rho"),"scores"], digits = sigfig)`), suggesting that phages important in driving colorectal cancer also were more likely to be community hubs **(Figure \ref{network} B)**. Together these findings supported our hypothesis that influential phages were hubs within their microbial communities and had broad host ranges. +Given these findings, we hypothesized that the most influential phages were acting by infecting a wide range of bacteria in the overall community, instead of just the influential bacteria. In other words, we hypothesized that the influential bacteriophages were community hubs (i.e. central members) within the bacteria and phage interactive network. We investigated the potential host ranges of all phage OVUs using a previously developed random forest model that relies on sequence features to predict which phages infected which bacteria in the community **(Figure \ref{network} A)** [@Hannigan:2017cj]. The predicted interactions were then used to identify phage community hubs. We calculated the alpha centrality (i.e. measure of importance in the ecological network) of each phage OVU's connection to the rest of the network. The phages with high centrality values were defined as community hubs. Next, the centrality of each OVU was compared to its importance in the colorectal cancer classification model. 
Phage OVU centrality was significantly and positively correlated with importance to the disease model (p-value `r pvalcalc(corstats[c(corstats$names %in% "pval"),"scores"])`, R = `r signif(corstats[c(corstats$names %in% "rho"),"scores"], digits = sigfig)`), suggesting that phages important in driving colorectal cancer also were more likely to be community hubs **(Figure \ref{network} B)**. Together these findings supported our hypothesis that influential phages were hubs within their microbial communities and had broad host ranges. ## Working Model for Virome & Cancer Progression Because of their propensity for mutagenesis and capacity for modulating their host functionality, many viruses are oncogenic [@Feng:2008kr; @Shuda:2011gf; @Schiller:2012ba; @Chang:1994up]. Some bacteria also have oncogenic properties, suggesting that bacteriophages may play an indirect role in promoting carcinogenesis by influencing bacterial community composition and dynamics [@Zackular:2014fba; @Garrett:2015fg; @Baxter:2014hb]. Despite their carcinogenic potential and the strong association between bacteria and colorectal cancer, a mechanistic link between virus colorectal communities and colorectal cancer has yet to be evaluated. Here we show that, like colonic bacterial communities, the colon virome was altered in patients with colorectal cancer relative to those with healthy colons. Our findings support a working hypothesis for oncogenesis by phage-modulated bacterial community composition. We have begun to delineate the role the colonic virome plays in colorectal cancer **(Figure \ref{modelsummary} A)**. We found that basic diversity metrics of alpha diversity (richness and Shannon diversity) and beta diversity (Bray-Curtis dissimilarity) were insufficient for identifying virome community differences between healthy and cancerous states. 
By implementing a more sophisticated machine learning approach (random forest classification), we detected strong associations between the colon virus community composition and colorectal cancer. The colorectal cancer virome was composed primarily of bacteriophages. These phage communities were not exclusively predators of the most influential bacteria, as demonstrated by the lack of correlation between the abundances of the bacterial and phage populations. Instead, we identified influential phages as being community hubs, suggesting phages influence cancer by altering the greater bacterial community instead of directly modulating the influential bacteria. Our previous work has shown that modifying colon bacterial communities alters colorectal cancer progression and tumor burden in mice [@Zackular:2016en; @Baxter:2014hb]. This provides a precedent for phage indirectly influencing colorectal cancer progression by altering the bacterial community composition. Overall, our data support a model in which the bacteriophage community modulates the bacterial community, and through those interactions indirectly influences the bacteria driving colorectal cancer progression **(Figure \ref{modelsummary} A)**. Although our evidence suggested phages indirectly influenced colorectal cancer development, we were not able to rule out the role of phages directly interacting with the human host [@Lengeling:2013ia; @Grski:2012fa]. -In addition to modeling the potential connections between virus communities, bacteria communities, and colorectal cancer, we also used our data and existing knowledge of phage biology to develop a working hypothesis for the mechanisms by which this may occur. This was done by incorporating our findings into the current model for colorectal cancer development **(Figure \ref{modelsummary} B)** [@Flynn:2016iu]. We hypothesize that the process began with broadly infectious phages in the colon lysing and thereby disrupting the existing bacterial communities. 
This shift led to novel niche space that enabled opportunistic bacteria (such as *Fusobacterium nucleatum*) to colonize. Once the initial influential founder bacteria established themselves in the epithelium, secondary opportunistic bacteria were able to adhere to the founders, colonize, and begin establishing a biofilm. Phages may have played a role in biofilm dispersal and growth by lysing bacteria within the biofilm, a process important for effective biofilm growth [@Rossmann:2015cj]. The oncogenic bacteria may then have been able to transform the epithelial cells and disrupt tight junctions to infiltrate the epithelium, thereby initiating an inflammatory immune response. As the adenomatous polyps developed and progressed towards carcinogenesis, we observed a shift in the phages and bacteria whose relative abundances were most influential. As the bacteria entered their oncogenic synergy with the epithelium, we conjecture that the phages continued mediating biofilm dispersal. This process would thereby support the colonized oncogenic bacteria by lysing competing cells and releasing nutrients to other bacteria in the form of cellular lysates. In addition to highlighting the likely mechanisms by which the colorectal cancer virome is interacting with the bacterial communities, this outline will guide future research investigations of the role the virome plays colorectal cancer. +In addition to modeling the potential connections between virus communities, bacterial communities, and colorectal cancer, we also used our data and existing knowledge of phage biology to develop a working hypothesis for the mechanisms by which this may occur. This was done by incorporating our findings into the current model for colorectal cancer development **(Figure \ref{modelsummary} B)** [@Flynn:2016iu]. We hypothesize that the process begins with broadly infectious phages in the colon lysing and thereby disrupting the existing bacterial communities. 
This shift opens novel niche space that enables opportunistic bacteria (such as *Fusobacterium nucleatum*) to colonize. Once the initial influential founder bacteria establish themselves in the epithelium, secondary opportunistic bacteria are able to adhere to the founders, colonize, and establish a biofilm. Phages may play a role in biofilm dispersal and growth by lysing bacteria within the biofilm, a process important for effective biofilm growth [@Rossmann:2015cj]. The oncogenic bacteria may then be able to transform the epithelial cells and disrupt tight junctions to infiltrate the epithelium, thereby initiating an inflammatory immune response. As the adenomatous polyps developed and progressed towards carcinogenesis, we observed a shift in the phages and bacteria whose relative abundances were most influential. As the bacteria enter their oncogenic synergy with the epithelium, we conjecture that the phages continue mediating biofilm dispersal. This process would thereby support the colonized oncogenic bacteria by lysing competing cells and releasing nutrients to other bacteria in the form of cellular lysates. In addition to highlighting the likely mechanisms by which the colorectal cancer virome is interacting with the bacterial communities, this model will guide future research investigations of the role the virome plays in colorectal cancer.
 ## Conclusions
diff --git a/doc/manuscript.docx b/doc/manuscript.docx
index 250423c..d2a35cc 100644
Binary files a/doc/manuscript.docx and b/doc/manuscript.docx differ
diff --git a/doc/manuscript.md b/doc/manuscript.md
index f9bd787..dfd483f 100644
--- a/doc/manuscript.md
+++ b/doc/manuscript.md
@@ -35,7 +35,7 @@ fontsize: 12pt
 \newpage
 ## Introduction
-Due to their mutagenic abilities and propensity for functional manipulation, human viruses are strongly associated with, and in many cases cause, cancer [@Feng:2008kr; @Shuda:2011gf; @Schiller:2012ba; @Chang:1994up].
Because bacteriophages (viruses that specifically infect bacteria) are crucial for bacterial community stability and composition [@Harcombe:2005fd; @RodriguezValera:2009cr; @Cortez:2014bk] and have been implicated as oncogenic agents [@Zackular:2014fba; @Garrett:2015fg; @Baxter:2014hb; @Arthur:2012kl], bacteriophages have the potential to indirectly impact cancer. The gut virome (the virus community of the gut) therefore has the potential to impact health and disease. Altered human virome composition and diversity have been identified in diseases including periodontal disease [@Ly:2014ew], HIV [@Monaco:2016ita], cystic fibrosis [@Willner:2009dq], antibiotic exposure [@Abeles:2015dy; @Modi:2013fi], urinary tract infections [@SantiagoRodriguez:2015gd], and inflammatory bowel disease [@Norman:2015kb]. The strong association of bacterial communities with colorectal cancer and the precedent for the virome to impact other human diseases suggest that colorectal cancer may be associated with altered virus communities. +Due to their mutagenic abilities and propensity for functional manipulation, human viruses are strongly associated with, and in many cases cause, cancer [@Feng:2008kr; @Shuda:2011gf; @Schiller:2012ba; @Chang:1994up]. Because bacteriophages (i.e. viruses that specifically infect bacteria) are crucial for bacterial community stability and composition [@Harcombe:2005fd; @RodriguezValera:2009cr; @Cortez:2014bk] and have been implicated as oncogenic agents [@Zackular:2014fba; @Garrett:2015fg; @Baxter:2014hb; @Arthur:2012kl], bacteriophages have the potential to indirectly impact cancer. The gut virome (i.e. the virus community of the gut) therefore has the potential to impact health and disease. 
Altered human virome composition and diversity have been identified in diseases including periodontal disease [@Ly:2014ew], HIV [@Monaco:2016ita], cystic fibrosis [@Willner:2009dq], antibiotic exposure [@Abeles:2015dy; @Modi:2013fi], urinary tract infections [@SantiagoRodriguez:2015gd], and inflammatory bowel disease [@Norman:2015kb]. The strong association of bacterial communities with colorectal cancer and the precedent for the virome to impact other human diseases suggest that colorectal cancer may be associated with altered virus communities. Colorectal cancer is the second leading cause of cancer-related deaths in the United States [@Siegel:2014jo]. The US National Cancer Institute estimates over 1.5 million Americans were diagnosed with colorectal cancer in 2016 and over 500,000 Americans died from the disease [@Siegel:2014jo]. Growing evidence suggests that an important component of colorectal cancer etiology may be perturbations in the colonic bacterial community [@Zackular:2016en; @Zackular:2014fba; @Baxter:2014hb; @Dejea:2014fz; @Arthur:2012kl]. Work in this area has led to a proposed disease model in which bacteria colonize the colon, develop biofilms, promote inflammation, and enter an oncogenic synergy with the cancerous human cells [@Flynn:2016iu]. This association also has allowed researchers to leverage bacterial community signatures as biomarkers to provide accurate, noninvasive colorectal cancer detection from stool [@Zackular:2014fba; @Baxter:2016dja; @Zeller:2014ix]. While an understanding of colorectal cancer bacterial communities has proven fruitful both for disease classification and for identifying the underlying disease etiology, bacteria are only a subset of the colon microbiome. Viruses are another important component of the colon microbial community that have yet to be studied in the context of colorectal cancer. 
We evaluated disruptions in virus and bacterial community composition in a human cohort whose stool was sampled at the three relevant stages of cancer development: healthy, adenomatous, and cancerous. @@ -46,17 +46,17 @@ Here we address the knowledge gap of whether virus community composition is alte ## Cohort Design, Sample Collection, and Processing Our study cohort consisted of 90 human subjects, 30 of whom had healthy colons, 30 of whom had adenomas, and 30 of whom had carcinomas **(Figure \ref{sampleproc})**. Half of each stool sample was used to sequence the bacterial communities using both 16S rRNA gene and shotgun sequencing techniques. The 16S rRNA gene sequencing was performed for a previous study, and the sequences were re-analyzed using contemporary methods [@Zackular:2014fba]. The other half of each stool sample was purified for virus like particles (VLPs) before genomic DNA extraction and shotgun metagenomic sequencing. In the VLP purification, cells were disrupted and extracellular DNA degraded **(Figure \ref{sampleproc})** to allow the exclusive analysis of viral DNA within virus capsids. In this manner, the *extracellular virome* of encapsulated viruses was targeted. -Each extraction was performed with a blank buffer control to detect contaminants from reagents or other unintentional sources. Only one of the nine controls contained detectable DNA at a minimal concentration of 0.011 ng/µl, thus providing evidence of the enrichment and purification of VLP genomic DNA over potential contaminants **(Figure \ref{qualcontrol} A)**. As was expected, these controls yielded few sequences and were almost entirely removed while rarefying the datasets to a common number of sequences **(Figure \ref{qualcontrol} B)**. The high quality phage and bacterial sequences were assembled into highly covered contigs longer than 1 kb **(Figure \ref{contigqc})**. 
Because contigs represent genome fragments, we further clustered related bacterial contigs into operational genomic units (OGUs) and viral contigs into operational viral units (OVUs) **(Figure \ref{contigqc} - \ref{clustercontigqc})** to approximate organismal units. +Each extraction was performed with a blank buffer control to detect contaminants from reagents or other unintentional sources. Only one of the nine controls contained detectable DNA at a minimal concentration of 0.011 ng/µl, thus providing evidence of the enrichment and purification of VLP genomic DNA over potential contaminants **(Figure \ref{qualcontrol} A)**. As expected, these controls yielded few sequences and were almost entirely removed while rarefying the datasets to a common number of sequences **(Figure \ref{qualcontrol} B)**. The high quality phage and bacterial sequences were assembled into highly covered contigs longer than 1 kb **(Figure \ref{contigqc})**. Because contigs represent genome fragments, we further clustered related bacterial contigs into operational genomic units (OGUs) and viral contigs into operational viral units (OVUs) **(Figure \ref{contigqc} - \ref{clustercontigqc})** to approximate organismal units. ## Unaltered Virome Diversity in Colorectal Cancer -Microbiome and disease associations are often described as being of an altered diversity (i.e., "dysbiotic"). We therefore first evaluated the influence of colorectal cancer on virome OVU diversity. We evaluated differences in communities between disease states using the Shannon diversity, richness, and Bray-Curtis metrics. We observed no significant alterations in either Shannon diversity or richness in the diseased states as compared to the healthy state **(Figure \ref{betaogu} C-D)**. There was no statistically significant clustering of the disease groups (ANOSIM p-value = 0.4, **Figure \ref{betaogu}**). 
Notably, there was a significant difference between the few blank controls that remained after rarefying the data and the other study groups (ANOSIM p-value < 0.001, **Figure \ref{betaogunegative})**, further supporting the quality of the sample set. In summary, standard alpha and beta diversity metrics were insufficient for capturing virus community differences between disease states **(Figure \ref{betaogu})**. This is consistent with what has been observed when the same metrics were applied to 16S rRNA sequenced and metagenomic samples [@Zeller:2014ix; @Zackular:2014fba; @Baxter:2016dja] and points to the need for alternate approaches to detect the impact of colorectal cancer disease state on these communities. +Microbiome and disease associations are often described as being of an altered diversity (i.e., "dysbiotic"). Therefore, we first evaluated the influence of colorectal cancer on virome OVU diversity. We evaluated differences in communities between disease states using the Shannon diversity, richness, and Bray-Curtis metrics. We observed no significant alterations in either Shannon diversity or richness in the diseased states as compared to the healthy state **(Figure \ref{betaogu} C-D)**. There was no statistically significant clustering of the disease groups (ANOSIM p-value = 0.4, **Figure \ref{betaogu}**). Notably, there was a significant difference between the few blank controls that remained after rarefying the data and the other study groups (ANOSIM p-value < 0.001, **Figure \ref{betaogunegative})**, further supporting the quality of the sample set. In summary, standard alpha and beta diversity metrics were insufficient for capturing virus community differences between disease states **(Figure \ref{betaogu})**. 
This is consistent with what has been observed when the same metrics were applied to 16S rRNA sequenced and metagenomic samples [@Zeller:2014ix; @Zackular:2014fba; @Baxter:2016dja] and points to the need for alternate approaches to detect the impact of colorectal cancer disease state on these communities. ## Altered Virome Composition in Colorectal Cancer As opposed to the diversity metrics discussed above, OTU-based relative abundance profiles generated from 16S rRNA gene sequences are effective feature sets for classifying stool samples as originating from individuals with healthy, adenomatous, or cancerous colons [@Zackular:2014fba; @Baxter:2016dja]. The exceptional performance of bacteria in these classification models supports a role for bacteria in colorectal cancer. We built off of these findings by evaluating the ability of virus community signatures to classify stool samples and compared their performance to models built using bacterial community signatures. To identify the altered virus communities associated with colorectal cancer, we built and tested random forest models for classifying stool samples as belonging to individuals with either cancerous or healthy colons. We confirmed that our bacterial 16S rRNA gene model replicated the performance of the original report which used logit models instead of random forest models **(Figure \ref{predmodel} A)** [@Zackular:2014fba]. We then compared the bacterial OTU model to a model built using OVU relative abundances. The viral model performed as well as the bacterial model (corrected p-value = 0.4), with the viral and bacterial models achieving mean area under the curve (AUC) values of 0.793 and 0.796, respectively **(Figure \ref{predmodel} A - B)**. To evaluate the ability of both bacterial and viral biomarkers to classify samples, we built a combined model that used both bacterial and viral community data. 
The combined model yielded a modest but statistically significant performance improvement beyond the viral (corrected p-value = 0.002) and bacterial (corrected p-value = 0.002) models, yielding an AUC of 0.816 **(Figure \ref{predmodel} A - B)**. The combined features from the virus and bacterial communities improved our ability to classify stool as belonging to individuals with cancerous colons. -To determine the advantage of viral metagenomic methods over bacterial metagenomic methods, we compared the viral model to a model built using OGU relative abundance profiles from bacterial metagenomic shotgun sequencing data. This model performed worse than the other models (mean AUC = 0.505) **(Figure \ref{predmodel} A - B)**. Because the coverage provided by the metagenomic sequencing was not as deep as the equivalent 16S rRNA gene sequencing, we attempted to compare the approaches at a common sequencing depth. This investigation revealed that the bacterial 16S rRNA gene model was strongly driven by sparse and low abundance OTUs **(Figure \ref{16scompare})**. Removal of OTUs with a median abundance of zero resulted in the removal of six OTUs, and a loss of model performance down to what was observed in the metagenome-based model **(Figure \ref{16scompare} A)**. The majority of these OTUs had a relative abundance lower than 1% across the samples **(Figure \ref{16scompare} B)**. Although the features in the viral model also were of low abundance **(Figure \ref{threewaymodel} F)**, the coverage was sufficient for high model performance, likely because viral genomes are orders of magnitude smaller than bacterial genomes. Thus, the targeted 16S rRNA gene sequencing approach, which represented only a fraction of the bacterial metagenomic sequencing depth, was more effective for detecting colorectal cancer in stool samples. 
Despite a loss of enthusiasm for 16S rRNA gene sequencing in favor of shotgun metagenomic techniques, 16S rRNA gene sequencing is still a superior methodological approach for some important applications. +To determine the advantage of viral metagenomic methods over bacterial metagenomic methods, we compared the viral model to a model built using OGU relative abundance profiles from bacterial metagenomic shotgun sequencing data. This model performed worse than the other models (mean AUC = 0.505) **(Figure \ref{predmodel} A - B)**. Because the coverage provided by the metagenomic sequencing was not as deep as the equivalent 16S rRNA gene sequencing, we attempted to compare the approaches at a common sequencing depth. This investigation revealed that the bacterial 16S rRNA gene model was strongly driven by sparse and low abundance OTUs **(Figure \ref{16scompare})**. Removal of OTUs with a median abundance of zero resulted in the removal of six OTUs, and a loss of model performance down to what was observed in the metagenome-based model **(Figure \ref{16scompare} A)**. The majority of these OTUs had a relative abundance lower than 1% across the samples **(Figure \ref{16scompare} B)**. Although the features in the viral model also were of low abundance **(Figure \ref{threewaymodel} F)**, the coverage was sufficient for high model performance, likely because viral genomes are orders of magnitude smaller than bacterial genomes. Thus, the targeted 16S rRNA gene sequencing approach, which represented only a fraction of the bacterial metagenomic sequencing depth, was more effective for detecting colorectal cancer in stool samples. Despite the recent loss of enthusiasm for 16S rRNA gene sequencing in favor of shotgun metagenomic techniques, 16S rRNA gene sequencing is still a superior methodological approach for some important applications. The association between the bacterial and viral communities and colorectal cancer was driven by a few important microbes. 
*Fusobacterium* was the primary driver of the bacterial association with colorectal cancer, which is consistent with its previously described oncogenic potential **(Figure \ref{predmodel} C)**[@Flynn:2016iu]. The virome signature also was driven by a few OVUs, suggesting a role for these viruses in tumorigenesis **(Figure \ref{predmodel} D)**. The identified viruses were bacteriophages, belonging to *Siphoviridae*, *Myoviridae*, and "unclassified" phage taxa. Many of the important viruses were unidentifiable (denoted "unknown"). This is common in viromes across habitats; studies have reported as much as 95% of virus sequences belonging to unknown genomic units [@Pedulla:2003tu; @Hannigan:2015fz; @Willner:2009dq; @Brum:2015iaa]. When the bacterial and viral community signatures were combined, both bacterial and viral organisms drove the community association with cancer **(Figure \ref{predmodel} E)**. @@ -73,14 +73,14 @@ We evaluated whether the phages in the community were primarily lytic (i.e. obli ## Community Context of Influential Phages Because the link between colorectal cancer and the virome was driven by bacteriophages, we hypothesized that the influential phages were primarily predators of the influential bacteria, and thus influenced their relative abundance through predation. If this hypothesis were true, we would expect a correlation between the relative abundances of influential bacteria and phages. Instead, we observed a strikingly low correlation between bacterial and phage relative abundances **(Figure \ref{correlations} A,C)**. Overall, there was an absence of correlation between the most influential OVUs and bacterial OTUs **(Figure \ref{correlations} B)**. This evidence supported our null hypothesis that the influential phages were not primarily predators of influential bacteria. 
-Given these findings, we hypothesized that the most influential phages were acting by infecting a wide range of bacteria in the overall community, instead of just the influential bacteria. In other words, we hypothesized that the influential bacteriophages were community hubs (central members) within the bacteria and phage interactive network. We investigated the potential host ranges of all phage OVUs using a previously developed random forest model that relies on sequence features to predict which phages infected which bacteria in the community **(Figure \ref{network} A)** [@Hannigan:2017cj]. The predicted interactions were then used to identify phage community hubs. We calculated the alpha centrality (measure of importance in the ecological network) of each phage OVU's connection to the rest of the network. The phages with high centrality values were defined as community hubs. Next, the centrality of each OVU was compared to its importance in the colorectal cancer classification model. Phage OVU centrality was significantly and positively correlated with importance to the disease model (p-value = 0.02, R = 0.14), suggesting that phages important in driving colorectal cancer also were more likely to be community hubs **(Figure \ref{network} B)**. Together these findings supported our hypothesis that influential phages were hubs within their microbial communities and had broad host ranges. +Given these findings, we hypothesized that the most influential phages were acting by infecting a wide range of bacteria in the overall community, instead of just the influential bacteria. In other words, we hypothesized that the influential bacteriophages were community hubs (i.e. central members) within the bacteria and phage interactive network. 
We investigated the potential host ranges of all phage OVUs using a previously developed random forest model that relies on sequence features to predict which phages infected which bacteria in the community **(Figure \ref{network} A)** [@Hannigan:2017cj]. The predicted interactions were then used to identify phage community hubs. We calculated the alpha centrality (i.e. measure of importance in the ecological network) of each phage OVU's connection to the rest of the network. The phages with high centrality values were defined as community hubs. Next, the centrality of each OVU was compared to its importance in the colorectal cancer classification model. Phage OVU centrality was significantly and positively correlated with importance to the disease model (p-value = 0.02, R = 0.14), suggesting that phages important in driving colorectal cancer also were more likely to be community hubs **(Figure \ref{network} B)**. Together these findings supported our hypothesis that influential phages were hubs within their microbial communities and had broad host ranges. ## Working Model for Virome & Cancer Progression Because of their propensity for mutagenesis and capacity for modulating their host functionality, many viruses are oncogenic [@Feng:2008kr; @Shuda:2011gf; @Schiller:2012ba; @Chang:1994up]. Some bacteria also have oncogenic properties, suggesting that bacteriophages may play an indirect role in promoting carcinogenesis by influencing bacterial community composition and dynamics [@Zackular:2014fba; @Garrett:2015fg; @Baxter:2014hb]. Despite their carcinogenic potential and the strong association between bacteria and colorectal cancer, a mechanistic link between virus colorectal communities and colorectal cancer has yet to be evaluated. Here we show that, like colonic bacterial communities, the colon virome was altered in patients with colorectal cancer relative to those with healthy colons. 
Our findings support a working hypothesis for oncogenesis by phage-modulated bacterial community composition. We have begun to delineate the role the colonic virome plays in colorectal cancer **(Figure \ref{modelsummary} A)**. We found that basic diversity metrics of alpha diversity (richness and Shannon diversity) and beta diversity (Bray-Curtis dissimilarity) were insufficient for identifying virome community differences between healthy and cancerous states. By implementing a more sophisticated machine learning approach (random forest classification), we detected strong associations between the colon virus community composition and colorectal cancer. The colorectal cancer virome was composed primarily of bacteriophages. These phage communities were not exclusively predators of the most influential bacteria, as demonstrated by the lack of correlation between the abundances of the bacterial and phage populations. Instead, we identified influential phages as being community hubs, suggesting phages influence cancer by altering the greater bacterial community instead of directly modulating the influential bacteria. Our previous work has shown that modifying colon bacterial communities alters colorectal cancer progression and tumor burden in mice [@Zackular:2016en; @Baxter:2014hb]. This provides a precedent for phage indirectly influencing colorectal cancer progression by altering the bacterial community composition. Overall, our data support a model in which the bacteriophage community modulates the bacterial community, and through those interactions indirectly influences the bacteria driving colorectal cancer progression **(Figure \ref{modelsummary} A)**. Although our evidence suggested phages indirectly influenced colorectal cancer development, we were not able to rule out the role of phages directly interacting with the human host [@Lengeling:2013ia; @Grski:2012fa]. 
-In addition to modeling the potential connections between virus communities, bacteria communities, and colorectal cancer, we also used our data and existing knowledge of phage biology to develop a working hypothesis for the mechanisms by which this may occur. This was done by incorporating our findings into the current model for colorectal cancer development **(Figure \ref{modelsummary} B)** [@Flynn:2016iu]. We hypothesize that the process began with broadly infectious phages in the colon lysing and thereby disrupting the existing bacterial communities. This shift led to novel niche space that enabled opportunistic bacteria (such as *Fusobacterium nucleatum*) to colonize. Once the initial influential founder bacteria established themselves in the epithelium, secondary opportunistic bacteria were able to adhere to the founders, colonize, and begin establishing a biofilm. Phages may have played a role in biofilm dispersal and growth by lysing bacteria within the biofilm, a process important for effective biofilm growth [@Rossmann:2015cj]. The oncogenic bacteria may then have been able to transform the epithelial cells and disrupt tight junctions to infiltrate the epithelium, thereby initiating an inflammatory immune response. As the adenomatous polyps developed and progressed towards carcinogenesis, we observed a shift in the phages and bacteria whose relative abundances were most influential. As the bacteria entered their oncogenic synergy with the epithelium, we conjecture that the phages continued mediating biofilm dispersal. This process would thereby support the colonized oncogenic bacteria by lysing competing cells and releasing nutrients to other bacteria in the form of cellular lysates. In addition to highlighting the likely mechanisms by which the colorectal cancer virome is interacting with the bacterial communities, this outline will guide future research investigations of the role the virome plays colorectal cancer. 
+In addition to modeling the potential connections between virus communities, bacterial communities, and colorectal cancer, we also used our data and existing knowledge of phage biology to develop a working hypothesis for the mechanisms by which this may occur. This was done by incorporating our findings into the current model for colorectal cancer development **(Figure \ref{modelsummary} B)** [@Flynn:2016iu]. We hypothesize that the process begins with broadly infectious phages in the colon lysing and thereby disrupting the existing bacterial communities. This shift opens novel niche space that enables opportunistic bacteria (such as *Fusobacterium nucleatum*) to colonize. Once the initial influential founder bacteria establish themselves in the epithelium, secondary opportunistic bacteria are able to adhere to the founders, colonize, and establish a biofilm. Phages may play a role in biofilm dispersal and growth by lysing bacteria within the biofilm, a process important for effective biofilm growth [@Rossmann:2015cj]. The oncogenic bacteria may then be able to transform the epithelial cells and disrupt tight junctions to infiltrate the epithelium, thereby initiating an inflammatory immune response. As the adenomatous polyps developed and progressed towards carcinogenesis, we observed a shift in the phages and bacteria whose relative abundances were most influential. As the bacteria enter their oncogenic synergy with the epithelium, we conjecture that the phages continue mediating biofilm dispersal. This process would thereby support the colonized oncogenic bacteria by lysing competing cells and releasing nutrients to other bacteria in the form of cellular lysates. In addition to highlighting the likely mechanisms by which the colorectal cancer virome is interacting with the bacterial communities, this model will guide future research investigations of the role the virome plays in colorectal cancer.
## Conclusions diff --git a/doc/manuscript.pdf b/doc/manuscript.pdf index f3cde89..3cb3b41 100644 Binary files a/doc/manuscript.pdf and b/doc/manuscript.pdf differ diff --git a/doc/missfont.log b/doc/missfont.log index 1d03af5..ce35d3a 100644 --- a/doc/missfont.log +++ b/doc/missfont.log @@ -217,3 +217,9 @@ mktextfm Times mktextfm Times mktextfm Times mktextfm Times +mktextfm Times +mktextfm Times +mktextfm Times +mktextfm Times +mktextfm Times +mktextfm Times