Tutorial.Rmd

---
title: "Tutorial on Microbiome Data Analysis"
author: "Leo Lahti, Sudarshan Shetty et al."
bibliography: 
- bibliography.bib
output:
  BiocStyle::html_document:
    number_sections: no
    toc: yes
    toc_depth: 4
    toc_float: true
    self_contained: true
    thumbnails: true
    lightbox: true
    gallery: true
    use_bookdown: false
    highlight: haddock
---
<!--
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{microbiome tutorial - Installation}
  %\usepackage[utf8]{inputenc}
  %\VignetteEncoding{UTF-8}  
-->


This tutorial gets You started with R tools for microbial ecology. In particular, to provide an introduction to

 * R tools for microbial ecology
 * Role of custom data formats and tools in data analytical workflows
 * Reproducible document generation 
 * Possibilities and challenges in population-level microbiome profiling studies


# Installation

Launch R/RStudio and install the microbiome R package (see [installation instructions](http://microbiome.github.io/tutorials/Installation.html)).

To initiate reproducible documentation, do the following in RStudio:

 1. Open a new Rmarkdown (.html) file 
 1. Convert that .html file with the 'knit' button
 1. Modify the file and knit again to make your own reproducible report


# Example data: Intestinal microbiota of 1006 Western adults

Example data set will be the [HITChip Atlas](Atlas.html), which is available via the microbiome R package in phyloseq format. This data set from [Lahti et al. Nat. Comm. 5:4344, 2014](http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html) comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Load the data in R with:

```{r atlasdata, message=FALSE, warning=FALSE}
# Data citation doi: 10.1038/ncomms5344
library(microbiome)
library(knitr)
data(atlas1006) 
print(atlas1006)

# Rename the data
pseq <- atlas1006
```

# Phyloseq data stucture for taxonomic profiling

Explore the phyloseq data format. See [examples on microbiome data processing](Preprocessing.html).


# Alpha diversity

Explore the estimation and analysis of various [diversity indices](Alphadiversity.html) and [taxonomic composition](Composition.html).

```{r global, message=FALSE, warning=FALSE}
alpha <- microbiome::alpha
tab <- alpha(pseq, index = "all")
kable(head(tab))
```

Assign the estimated diversity to sample metadata
```{r alpha2, message=FALSE, warning=FALSE}
sample_data(pseq)$diversity <- tab$diversity_shannon
```

Visualize the data

```{r diversity, message=FALSE, warning=FALSE}
p <- plot_regression(diversity ~ age, meta(pseq)) +
       labs(x = "Age", y = "Alpha diversity")
```


# Technical biases

Explore potential technical biases in the data. DNA extraction method has a remarkable effect on sample grouping.

```{r dnaextraction, message=FALSE, warning=FALSE, fig.width=12, fig.height=8}
# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")

# Pick core taxa
ps <- core(ps, detection = 0, prevalence = 80/100)

# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))

# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")
```


# Core microbiota

[Core microbiota](Core.html) refers to the set of species shared by (almost) all individuals.


A full phyloseq object with just the core taxa is obtained as follows:

```{r core-data, message=FALSE, warning=FALSE}
# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")

# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)
```

Visualizing the core. Using all data can give a visual of which taxa are core at various detection thresholds. The pseq.core object is filtered to have only core taxa based on the user provided values. We suggest using all data for plotting with plot_core and then changing settings for prevalence to limit which taxa you want to visualise.

```{r core-example3,fig.width=12, fig.height=8, warning=FALSE}

# Core with compositionals:
prevalences <- seq(.05, 1, .05)

detections <- round(10^seq(log10(5e-3), log10(.2), length = 10), 3)

p <- plot_core(pseq.rel, plot.type = "heatmap",
  prevalences = prevalences, detections = detections, min.prevalence = 0.5) +
  xlab("Detection Threshold (Relative Abundance)") +
  theme(axis.text.x = element_text(size = 9))

print(p)
```
```


# Other tools

Explore further tools in [microbiome tutorial](http://microbiome.github.io/tutorials/).