6_MEmodels.Rmd

---
title: "Mixed effect models"
output: html_notebook
---

American Foulbrood (AFB) is an infectious disease affecting the larval stage of honeybees (*Apis mellifera*) and is the most widespread and destructive of the brood diseases. The causative agent is *Paenibacillus larvae*, and the spore forming bacterium infects queen, drone, and worker larvae. Only the spore stage of the bacterium is infectious to honey bee larvae. The spores germinate into the vegetative stage soon after they enter the larval gut and continue to multiply until larval death. The spores are extremely infective and resilient, and one dead larva may contain billions of spores. 

Although adult bees are not directly affected by AFB, some of the tasks carried out by workers might have an impact on the transmission of AFB spores within the colony and on the transmission of spores between colonies. When a bee hatches from its cell, its first task is to clean the surrounding cells, and its next task is tending and feeding of larvae. Here, the risk of transmitting AFB spores is particularly great if larvae that succumbed to AFB are cleaned prior to feeding susceptible larvae. 

Because AFB is extremely contagious, hard to cure, and lethal at the colony level, it is of importance to detect outbreaks, before they spread and become difficult to control. Reliable detection methods are also important for studies of pathogen transmission within and between colonies. Of the available methods, sampling adult bees has been shown the most effective. Hornitzky and Karlovskis (1989) introduced the method of culturing adult honey bees for AFB, and demonstrated that spores can be detected from colonies without clinical symptoms. Recently, culturing of P. larvae from adult honey bee samples has been shown to be a more sensitive tool for AFB screening compared to culturing of honey samples. When samples of adult bees are used, the detection level of *P. larvae* is closely linked to the distribution of spores among the bees. 

For this reason, we will model the density of *P. larvae* with the potential explanatory variables as number of bees in the hive, presence or absence of AFB, and hive identity.

# Read in bee data
```{r load}
# Spobee column has density of P. larvae spores (the bacterium). 
# Hive has the ID of the hive sampled (3 samples/hive)
# Infection has a metric quantifying the degree of infection. We will turn this into yes/no whether infection is present. 
Bees <- read.table('Bees.txt', header=TRUE)
head(Bees)

# make hive a factor
Bees$fhive <- factor(Bees$Hive)

# Make a yes/no infection column (Infection01)
Bees$Infection01 <- Bees$Infection 
Bees$Infection01[Bees$Infection01 > 0] <- 1
Bees$fInfection01 <- factor(Bees$Infection01) # turn this into a factor

# Scale BeesN to improve model convergence (mean 0, standard deviation 1)
Bees$sBeesN <- scale(Bees$BeesN)
```

# Make a Cleveland dot-chart of spores vs. hive
```{r}
dotchart(Bees$Spobee, groups = Bees$fhive, xlab='Spores', ylab='Hive ID')
```

## Q1. Does variance of spore density appear homogenous among hives? Why or why not?
type answer here

Try some transformations of the response variable to homogenize the variances (or at least improve it). 
```{r transformations}
#type your code here
```

## Q2. Which transformation of spore density seems reasonable? Why?
type answer here


## Develop a simple linear model for transformed spore density. 
Include infection (fInfection01), number of bees (sBeesN) and their interaction as explanatory variables. Check for a hive effect by plotting standardized residuals (see the residuals(yourmodel, type='pearson') function) against hive ID (fhive). Show your code and your plots. 
```{r linear model}
#type your code here
```

### Q3. Do residuals look homogenous among hives?
type answer here


### Q4. What are the advantages of including hive as a random effect, rather than as a fixed effect?
type answer here


# Apply the Zuur protocol
The 10-step version is outlined here, as used with the barn owl nesting data in Zuur Ch. 5

## Step 1: Fit and check a "beyond optimal" linear regression 
(already done above, so skip this)

## Step 2: Fit a generalized least squares version of the "beyond optimal" model 
(no need: we will use the linear regression model).

## Step 3. Choose a variance structure or structures (the random effects). 
### Q5 What random effects do you want to try?
type answer here

We will now fit a mixed effects (ME) model. Zuur et al. used the nlme package in R, but Douglas Bates now has a newer package that is widely used and that is called lme4. The benefits of lme4 include greater flexibility in the structure of the random effects, the option to use non-Gaussian error structures (for generalized linear mixed effects models, or GLMMs), and more efficient code to fit models. The main difference between nlme's lme() function and the lmer() function in lme4 is in how random effects are specified:

model <- lmer(response ~ explanantoryvars + (1|random), data=mydata) # a random intercept model

model <- lmer(response ~ explanantoryvars + (slope|random), data=mydata) # a random intercept and slope model

One of the frustrations some people run into is that the lme4 package doesn't provide p-values. This stems from disagreements and uncertainty about how best to calculate p-values. However, if p-values are important to you, approximate p-values can be derived from the lmerTest package

```{r}
# install.packages('lme4') # if needed
# install.packages('lmerTest') if needed
require(lmerTest)
```

## Step 4. Fit the "beyond optimal" ME model(s)
Use lmer() in the lme4 package (transformed spore density is response, fInfection01, sBeesN, and interaction are the explanatory variables). 

### Q6. Show your code.
```{r beyond optimal model}
#type your code here
```


## Step 5. Compare the linear regression and ME model(s) with a likelihood ratio test.
Include correction for testing on the boundary if needed. Use the anova() command. This will re-fit your lmer model with maximum likelihood, but this is OK (note there are some debates about exactly how to best compare an lm and lmer model). Show your work and the results. 
```{r LRT}
#type your code here

```

### Q7. Which random effect structure do you choose based on the results?
type your answer here


## Step 6. Check the model
Plot standardized residuals vs. fitted values and vs. each predictor. (You can get standardized residuals with residuals(yourmodel, type='pearson')). 
```{r check}
#type your code here

```

### Q8. How do they look?
type your answer here


## Step 7. Re-fit the full model with ML
Use REML=FALSE and compare against a reduced model without the interaction term, also fit with ML. Use anova() to compare the models. 
```{r refit}

```

### Q9. Which model do you choose? Why?
type your answer here


## Step 8. Iterate Step 7 to arrive at the final model. 
Show your work. 
```{r simplification}
#type your code here

```

### Q10. What is your final set of fixed effects?
type your answer here


## Step 9. Fit the final model with REML. 
Check assumptions by plotting a histogram of residuals, plotting Pearson standardized residuals vs. fitted values, and plotting Pearson standardized residuals vs. explanatory variables. 
```{r check assumptions}
#type your code here

```

### Q11. Are there issues with the model? If so, how might you address them?
type your answer here


## Step 10. Interpret the model. 
The summary() command is useful here. 
```{r summary}
#type your code here

```

### Q12. What have you learned about American Foulbrood? 
type your answer here


Calculate the correlation between observations from the same hive as variance(fhive random effect)/(variance(fhive random effect) + variance(residual)). 
```{r correlation}
#type your code here

```

### Q13. Given the correlation among observations from the same hive, do you think it's a good use of time to sample each hive multiple times? Why or why not?