-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathTutorial.Rmd
executable file
·147 lines (99 loc) · 4.6 KB
/
Tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
---
title: "Tutorial on Microbiome Data Analysis"
author: "Leo Lahti, Sudarshan Shetty et al."
bibliography:
- bibliography.bib
output:
BiocStyle::html_document:
number_sections: no
toc: yes
toc_depth: 4
toc_float: true
self_contained: true
thumbnails: true
lightbox: true
gallery: true
use_bookdown: false
highlight: haddock
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{microbiome tutorial - Installation}
%\usepackage[utf8]{inputenc}
%\VignetteEncoding{UTF-8}
-->
This tutorial gets You started with R tools for microbial ecology. In particular, to provide an introduction to
* R tools for microbial ecology
* Role of custom data formats and tools in data analytical workflows
* Reproducible document generation
* Possibilities and challenges in population-level microbiome profiling studies
# Installation
Launch R/RStudio and install the microbiome R package (see [installation instructions](http://microbiome.github.io/tutorials/Installation.html)).
To initiate reproducible documentation, do the following in RStudio:
1. Open a new Rmarkdown (.html) file
1. Convert that .html file with the 'knit' button
1. Modify the file and knit again to make your own reproducible report
# Example data: Intestinal microbiota of 1006 Western adults
Example data set will be the [HITChip Atlas](Atlas.html), which is available via the microbiome R package in phyloseq format. This data set from [Lahti et al. Nat. Comm. 5:4344, 2014](http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html) comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Load the data in R with:
```{r atlasdata, message=FALSE, warning=FALSE}
# Data citation doi: 10.1038/ncomms5344
library(microbiome)
library(knitr)
data(atlas1006)
print(atlas1006)
# Rename the data
pseq <- atlas1006
```
# Phyloseq data stucture for taxonomic profiling
Explore the phyloseq data format. See [examples on microbiome data processing](Preprocessing.html).
# Alpha diversity
Explore the estimation and analysis of various [diversity indices](Alphadiversity.html) and [taxonomic composition](Composition.html).
```{r global, message=FALSE, warning=FALSE}
alpha <- microbiome::alpha
tab <- alpha(pseq, index = "all")
kable(head(tab))
```
Assign the estimated diversity to sample metadata
```{r alpha2, message=FALSE, warning=FALSE}
sample_data(pseq)$diversity <- tab$diversity_shannon
```
Visualize the data
```{r diversity, message=FALSE, warning=FALSE}
p <- plot_regression(diversity ~ age, meta(pseq)) +
labs(x = "Age", y = "Alpha diversity")
```
# Technical biases
Explore potential technical biases in the data. DNA extraction method has a remarkable effect on sample grouping.
```{r dnaextraction, message=FALSE, warning=FALSE, fig.width=12, fig.height=8}
# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")
# Pick core taxa
ps <- core(ps, detection = 0, prevalence = 80/100)
# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))
# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")
```
# Core microbiota
[Core microbiota](Core.html) refers to the set of species shared by (almost) all individuals.
A full phyloseq object with just the core taxa is obtained as follows:
```{r core-data, message=FALSE, warning=FALSE}
# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")
# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)
```
Visualizing the core. Using all data can give a visual of which taxa are core at various detection thresholds. The pseq.core object is filtered to have only core taxa based on the user provided values. We suggest using all data for plotting with plot_core and then changing settings for prevalence to limit which taxa you want to visualise.
```{r core-example3,fig.width=12, fig.height=8, warning=FALSE}
# Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- round(10^seq(log10(5e-3), log10(.2), length = 10), 3)
p <- plot_core(pseq.rel, plot.type = "heatmap",
prevalences = prevalences, detections = detections, min.prevalence = 0.5) +
xlab("Detection Threshold (Relative Abundance)") +
theme(axis.text.x = element_text(size = 9))
print(p)
```
```
# Other tools
Explore further tools in [microbiome tutorial](http://microbiome.github.io/tutorials/).