---
######################################
# Click on "Run Document" in RStudio #
######################################
title: "Quick Introduction to Bayesian Statistics"
author: "Miguel de Navascués"
output: learnr::tutorial
runtime: shiny_prerendered
bibliography: src/references.bib
---
These learning resources are available at [GitHub](https://github.com/mnavascues/ABCRFtutorial) & [Zenodo](http://doi.org/10.5281/zenodo.1435503).
## Setup (tl;dr)
This unit uses the R package `learnr` [@Schloerke2020].
```{r setup, include=TRUE}
library(learnr)
# function to calculate likelihood for the coin toss experiment
source("src/coin_toss.R", local = TRUE)
```
## Question, model, experiment, data
**The question**: We have a coin that we use to choose randomly between two alternatives. Is the coin fair?
**The [model](http://en.wikipedia.org/wiki/Bernoulli_trial)**: For each coin toss there is a probability $p$ of getting *heads* and $q=1-p$ of getting *tails*. Under this model, the question can be formulated as: is $p=q=0.5$?
**The experiment**: A coin is tossed 10 times and the results (*heads/tails*, plough/cereal in this case) are recorded.
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/10_lire_1975_O.jpg/240px-10_lire_1975_O.jpg" alt="heads" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/10_lire_1975_R.jpg/240px-10_lire_1975_R.jpg" alt="tails" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/10_lire_1975_R.jpg/240px-10_lire_1975_R.jpg" alt="tails" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/10_lire_1975_R.jpg/240px-10_lire_1975_R.jpg" alt="tails" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/10_lire_1975_O.jpg/240px-10_lire_1975_O.jpg" alt="heads" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/10_lire_1975_R.jpg/240px-10_lire_1975_R.jpg" alt="tails" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/10_lire_1975_O.jpg/240px-10_lire_1975_O.jpg" alt="heads" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/10_lire_1975_R.jpg/240px-10_lire_1975_R.jpg" alt="tails" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/10_lire_1975_R.jpg/240px-10_lire_1975_R.jpg" alt="tails" style="height: 80px; width:80px;"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/10_lire_1975_O.jpg/240px-10_lire_1975_O.jpg" alt="heads" style="height: 80px; width:80px;"/>
**The data**: The sequence of coin tosses is: H, T, T, T, H, T, H, T, T, H. That is `h=4` and `t=6`.
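For reference, the observed sequence can be encoded directly in R and tallied (a minimal sketch; the name `tosses` is illustrative, and only the counts `h` and `t` are used in the exercises):
```{r data_sketch}
# illustrative encoding of the observed sequence of tosses
tosses <- c("H", "T", "T", "T", "H", "T", "H", "T", "T", "H")
table(tosses)            # tally of heads and tails
h <- sum(tosses == "H")  # 4
t <- sum(tosses == "T")  # 6
```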
## Likelihood
The likelihood of $p=q=0.5$ given our data ($D=(h=4,t=6)$) is the probability of the data under the model with $p=q=0.5$:
\[ L(p=0.5|D) = P(D|p=0.5) \]
### Exercise 1
How do you calculate that probability?
```{r likelihood, exercise=TRUE}
p<-0.5; h<-4; t<-6
_____
```
```{r likelihood-hint-1}
p<-0.5; h<-4; t<-6
_____ * p^h * _____
```
```{r likelihood-hint-2}
p<-0.5; h<-4; t<-6
_____ * p^h * (1-p)^t
```
```{r likelihood-solution}
p<-0.5; h<-4; t<-6
choose(h+t, h) * p^h * (1-p)^t
```
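As a quick sanity check (not part of the exercise), the same probability can be obtained from R's built-in binomial density:
```{r dbinom_check}
# binomial probability of h heads in h+t tosses, via the built-in density
p <- 0.5; h <- 4; t <- 6
dbinom(h, size = h + t, prob = p)  # equals choose(h+t, h) * p^h * (1-p)^t
```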
### Exercise 2
This value in itself does not answer your question; you need to compare it with the likelihood for other values of $p$. Create a function that calculates the likelihood and plot the likelihood values for `p=seq(0,1,0.01)`.
```{r likelihood_function, exercise=TRUE}
flip_coin_likelihood <- function(p,h,t){
likelihood <- _____
return (list(p=p,l=likelihood))
}
likelihood_profile <- _____
plot(likelihood_profile$_____,
likelihood_profile$_____,
xlab = "p", ylab = "Likelihood", type = "l", col="red")
```
```{r likelihood_function-solution}
flip_coin_likelihood <- function(p,h,t){
likelihood <- choose(h+t,h) * p^h * (1-p)^t
return (list(p=p,l=likelihood))
}
likelihood_profile <- flip_coin_likelihood(p=seq(0,1,0.01),h=4,t=6)
plot(likelihood_profile$p,
likelihood_profile$l,
xlab = "p", ylab = "Likelihood", type = "l", col="red")
```
Note: the function `flip_coin_likelihood()` has been loaded with `source("src/coin_toss.R")` in the setup chunk and is available in all exercises.
### Exercise 3
From the likelihood profile we can obtain the value of $p$ that best explains the observed data: the [maximum likelihood estimate](http://en.wikipedia.org/wiki/Maximum_likelihood_estimation). What is the maximum likelihood estimate of $p$ ($\hat{p}$)?
```{r maximum_likelihood, exercise=TRUE}
likelihood_profile <- flip_coin_likelihood(p=seq(0,1,0.01),h=4,t=6)
maxL <- _____(likelihood_profile$l)
p_hat <- likelihood_profile$p[which(likelihood_profile$l==_____)]
p_hat
```
```{r maximum_likelihood-solution}
likelihood_profile <- flip_coin_likelihood(p=seq(0,1,0.01),h=4,t=6)
maxL <- max(likelihood_profile$l)
p_hat <- likelihood_profile$p[which(likelihood_profile$l==maxL)]
p_hat
```
For this simple model there is an analytical solution for the maximum likelihood estimator of $p$:
\[ \hat{p}=h/(h+t) \]
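A quick numerical check of this formula against the grid-based estimate obtained above:
```{r mle_formula_check}
# analytical maximum likelihood estimate for h = 4 heads and t = 6 tails
h <- 4; t <- 6
h / (h + t)  # 0.4, the same value as p_hat from the likelihood profile
```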
What about our initial question? Is the coin fair? From the likelihood profile we can also determine whether the maximum likelihood estimate of $p$ is significantly better than $p=0.5$ (*via* the [confidence interval](http://en.wikipedia.org/wiki/Confidence_interval#Methods_of_derivation)). Based on the likelihood-ratio test (with $\alpha=0.05$):
```{r confidence_interval, eval=TRUE, echo=TRUE}
likelihood_profile <- flip_coin_likelihood(p=seq(0,1,0.01),h=4,t=6)
maxL <- max(likelihood_profile$l)
CI95 <- likelihood_profile$p[which(likelihood_profile$l > exp(log(maxL)-1.92))]
CI95 <- CI95[c(1,length(CI95))]
CI95
```
According to these results, we cannot reject the hypothesis that $p=0.5$ (with $\alpha=0.05$).
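The constant 1.92 in the code above is half of the 95% quantile of a $\chi^2$ distribution with one degree of freedom: the likelihood-ratio test keeps the values of $p$ whose log-likelihood lies within $\chi^2_{1,0.95}/2$ of the maximum. A quick check of that constant:
```{r lrt_threshold}
# half of the chi-squared(1) 95% quantile, i.e. the allowed drop in log-likelihood
qchisq(0.95, df = 1) / 2  # approximately 1.92
```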
### Exercise 4
What would happen for a larger number of coin tosses? What about `h=20` and `t=30`? Or `h=400` and `t=600`? Explore these and other values below:
```{r explore_binomial_likelihood, exercise=TRUE}
h=_____
t=_____
likelihood_profile <- flip_coin_likelihood(p=seq(0,1,0.001),h=h,t=t)
maxL <- max(likelihood_profile$l)
p_hat <- likelihood_profile$p[which(likelihood_profile$l==maxL)]
CI95 <- likelihood_profile$p[which(likelihood_profile$l > exp(log(maxL)-1.92))]
CI95 <- CI95[c(1,length(CI95))]
plot(x = likelihood_profile$p,
y = likelihood_profile$l,
xlab = "p", ylab = "Likelihood", type = "l", col="red")
abline(v=p_hat, col="red")
abline(v=CI95[1], lty=2, col="red")
abline(v=CI95[2], lty=2, col="red")
abline(v=0.5)
```
## Bayesian statistics
The posterior probability is the probability of the parameter $p$ given the data $D$: $P(p|D)$. Using Bayes' Theorem:
\[P(p|D)=\frac{P(D|p)P(p)}{P(D)}\]
where $P(D|p)=L(p|D)$ is the likelihood and $P(p)$ is the prior probability. The marginal likelihood, $P(D)$, is constant and does not need to be calculated: $P(p|D)\propto L(p|D)P(p)$.
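This proportionality can be used directly: evaluating likelihood $\times$ prior on a grid of $p$ values and renormalizing gives a numerical approximation of the posterior. A minimal sketch, assuming the `flip_coin_likelihood()` function from the setup and a uniform prior:
```{r grid_posterior_sketch}
# grid approximation of the posterior: likelihood x prior, then renormalize
p      <- seq(0, 1, 0.001)
prior  <- dbeta(p, 1, 1)                           # uniform prior on [0,1]
unnorm <- flip_coin_likelihood(p, h = 4, t = 6)$l * prior
posterior <- unnorm / sum(unnorm * 0.001)          # normalize so the density integrates to 1
plot(p, posterior, type = "l", col = "blue",
     xlab = "p", ylab = "Approximate posterior density")
```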
In this simple case (the coin-flipping experiment) there is an analytical solution for the posterior when using a conjugate prior such as the beta distribution $\mathrm{B}(\alpha, \beta)$: the posterior is then $\mathrm{B}(\alpha+h, \beta+t)$. Note that the uniform distribution $\mathrm{U}(0,1)$ is the same as the beta distribution $\mathrm{B}(1,1)$.
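The conjugacy can be seen by multiplying the likelihood and the prior and dropping all factors that do not depend on $p$:
\[ P(p|D) \propto \underbrace{p^h (1-p)^t}_{\text{likelihood}} \times \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{prior } \mathrm{B}(\alpha,\beta)} = p^{(\alpha+h)-1}(1-p)^{(\beta+t)-1}, \]
which is the kernel of the beta distribution $\mathrm{B}(\alpha+h, \beta+t)$.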
In Bayesian statistics, a point estimate of the parameter is often given by the posterior median, and credibility intervals by the posterior quantiles.
### Exercise 5
Practice the Bayesian analysis of the coin toss experiment. How do the results change when the prior parameters change? What is the interpretation of the prior, i.e. of $\alpha$ and $\beta$?
```{r explore_binomial_posterior, exercise=TRUE}
h=4;t=6;a=1;b=1
p=seq(0,1,0.001)
plot(p, y = dbeta(p,a,b), # prior probability
xlab = "p", ylab = "Probability density", type = "l", ylim=c(0,5))
lines(p, y = dbeta(p,a+h,b+t), col="blue") # posterior probability
quant <- qbeta(c(0.5,0.025,0.975),a+h,b+t)
abline(v=quant[1], col="blue")
abline(v=quant[2], lty=2, col="blue")
abline(v=quant[3], lty=2, col="blue")
abline(v=0.5)
```
## Likelihood *vs.* Posterior probability
### Exercise 6
Explore the difference between the likelihood and the posterior probability. Note the scale difference. What happens to the likelihood and posterior probability values when the amount of data ($h+t$) increases? What happens to the likelihood when the prior changes? Note: you might need to change the scale (`ylim=c(0,_____)`) depending on the values you explore.
```{r explore_likelihood_vs_posterior, exercise=TRUE}
h=4;t=6;a=1;b=1
p=seq(0,1,0.001)
plot(p, y = dbeta(p,a,b), # prior probability
xlab = "p", ylab = "Probability density/Likelihood", type = "l", ylim=c(0,3))
lines(p, dbeta(p,a+h,b+t), col="blue") # posterior probability
lines(p, y = flip_coin_likelihood(p,h,t)$l, col = "red") # likelihood
```
## *All models are wrong*
Up to this point we have assumed that the binomial model describes our coin well. However, in addition to heads and tails there are other possible outcomes when tossing a coin, e.g. the coin stays on its side or falls into a drainage grate... [see also @Diaconis2007]
### Exercise 7
Analyze the data under a model that considers three outcomes (i.e. heads/tails/side):
```{r multinomial, exercise=TRUE, exercise.lines=15}
h=4;t=6;s=0
```
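One possible starting point (a sketch, not the tutorial's solution): with three outcomes the likelihood becomes multinomial, and a Dirichlet prior would play the role the beta prior played above. The helper below only evaluates the multinomial likelihood for given head/tail probabilities; its name and the example values are illustrative assumptions.
```{r multinomial_sketch}
# illustrative multinomial likelihood for (heads, tails, side) counts
h <- 4; t <- 6; s <- 0
multinom_likelihood <- function(p_heads, p_tails, h, t, s) {
  p_side <- 1 - p_heads - p_tails
  if (p_side < 0) return(NA)  # probabilities must sum to at most 1
  dmultinom(c(h, t, s), prob = c(p_heads, p_tails, p_side))
}
# a fair coin that never lands on its side vs. one with a 1% chance of landing on its side
multinom_likelihood(0.5, 0.5, h, t, s)
multinom_likelihood(0.495, 0.495, h, t, s)
```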
#### References