-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
152 lines (122 loc) · 5.41 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
---
title: ICS Modeling with baseline
author: Greg Finak <gfinak@fredhutch.org>
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# Background
MIMOSA2 is an extension of the [http://github.com:RGLab/MIMOSA](MIMOSA) package for
modeling immune responses to antigen stimulate using intracellular cytokine staining data.
Input data are cell counts from ICS flow cytometry experiments. Specifically, the number of
cytokine positive cells, and the total number of cells.
Package usage is described below.
# Usage
```{r load}
library(MIMOSA2)
library(data.table)
library(ggplot2)
```
We simulate from the model.
```{r sim}
set.seed(100)
s = simulate_MIMOSA2(effect=2e-4, baseline_stim_effect=5e-5,baseline_background = 2e-5,bg_effect =2e-5,phi=c(50000,50000,50000,50000),P=100,rng=c(120000,150000))
```
`s` is a `list` with components `r names(s)`. `Ntot` is a matrix with column names "ns1" "nu1" "ns0" and "nu0".
These correspond to the total T-cell counts for stimulated (ns) and non-stimulated (nu) conditions at time 0 and time 1 (pre-vaccine and post-vaccine, respectively).
"ns0" through "nu1" are the positive cell counts.
`truth` is the true model component producing each observation.
There are eight different components in the model, each represents a different configuration of observed counts, producing
either response, non-response, or non-specific response.
```{r include=FALSE,simulation_freqs,echo=FALSE,results='asis'}
tbl = table(data.frame(s$truth))
tbl = data.frame(tbl)
setnames(tbl, c("Component","Nobs"))
knitr::kable(data.frame(tbl))
```
We fit the model with the `MIMOSA2` modeling function.
```{r fitmodel,fig.width=6,fig.height=6}
fit = with(s, MIMOSA2(Ntot=Ntot,ns1=ns1,nu1=nu1,ns0=ns0,nu0=nu0,tol=1e-5))
```
The output of MIMOSA2 is a list with components `r names(fit)`. "z" is matrix with the probability that each observation arises from each component. "inds" holds the hard assignment of each observation to each model component.
The confusion matrix from the model fit is shown below.
There is some confusion between the responder model components, and between the non-responder model components.
However, discrimination of responders from non-responders is generally good, which is the important part.
```{r confusion}
table(fit=c("R1","R2","R3","R4","NR1","NR2","NR3","NSR","NSR2","NSR3","NSR4")[max.col(fit$z)],truth = s$truth)
```
We plot the empirical proportions from stimulated and non-stimulated samples at each timepoint,
conditioning the true response status.
```{r,empirical_boxplots_vs_ps,message=FALSE,warning=FALSE}
p = with(s, {
p = do.call(cbind, mget(colnames(Ntot))) / Ntot
colnames(p) = c("pu1", "ps1", "pu0", "ps0")
data.frame(p, truth=truth%like%"^R")
})
ggplot(melt(p, value.var = c("pu1", "ps1", "pu0", "ps0"))) +
geom_boxplot(outlier.color = NA) +
facet_wrap( ~ variable) +
geom_jitter() +
aes(x = truth, y = value)+theme_classic()
```
Here we show the empirical background-corrected and baseline-subtracted proportions.
```{r empiricalboxplots_vs_truth,message=FALSE,warning=FALSE}
ggplot(data.frame(p,ns1=s$ns1))+
geom_boxplot(outlier.color=NA)+
geom_jitter(aes(col=getResponse(fit,threshold=0.01)))+
aes(x=truth,y=ps1-pu1-ps0+pu0)+
scale_color_discrete("Predicted Response\n(1% FDR)",labels=c("NR","R")) +
theme_classic() +
scale_x_discrete(labels=c("NR","R")) +
scale_y_continuous("Background and baseline adjusted proportion.")
```
We can generate ROCs comparing MIMOSA2 positivity calls against a test for a difference in log-odds ratios.
The ROC curves show that MIMOSA2 performs better than a one-sided test for a difference in log-odds ratios.
```{r ROC,fig.width=6,fig.height=6}
ortest = with(s,ORTest(Ntot,ns1,nu1,ns0,nu0))
toplot = ROC(or_test=ortest, fit=fit,truth=s$truth%like%"^R")
ROCPlot(toplot)+guides(color=guide_legend(nrow=2))
```
```{r include=FALSE,echo=FALSE,ratevthreshold,fig.width=6,fig.height=4.5}
fdr = function (z)
{
fdr <- rep(0, nrow(z))
o <- order(z[, 2], decreasing = TRUE)
fdr[o] <- (cumsum(z[o, 1])/1:nrow(z))
return(fdr)
}
RRvTh = function(thresh,truth,add=FALSE,lty=3,...){
th = thresh
o = order(th,decreasing=FALSE)
tpc = cumsum(truth[o])
fpc = cumsum(!truth[o])
tpc = tpc/sum(truth)
fpc = fpc/sum(!truth)
if(add){
lines(y=tpc,th[o],type="l",ylab="Rate",xlab="FDR Threshold",col="blue",lty=ifelse(add,lty,1))
}else{
plot(y=tpc,th[o],type="l",ylab="Rate",xlab="FDR Threshold",col="blue",...)
}
lines(y=fpc,th[o],type="l",col="red",lty=ifelse(add,lty,1))
}
RRvTh(fdr(cbind(1-rowSums(fit$z[,1:4]),rowSums(fit$z[,1:4]))),s$truth%in%c("R1","R2","R3","R4"),xlim=c(0,0.1),ylim=c(0,1))
RRvTh(ortest,s$truth%in%c("R1","R2","R3","R4"),add=TRUE)
legend("topright",lty=c(1,2),c("MIMOSA","OR Test"))
```
MIMOSA2 also provides better control of the nominal false discovery rate.
```{r,echo=FALSE,nomfdr,fig.width=6,fig.height=6}
fdr_th = getFDR(fit)
o = order(fdr_th,decreasing=FALSE)
plot(y=fdr_th[o],x=cumsum(!s$truth[o]%like%c("^R"))/1:length(s$truth),type="l",ylab="observed fdr",xlab="nominal fdr",xlim=c(0,1),ylim=c(0,1),col="blue")
abline(0,1,lty=2)
o = order(ortest,decreasing=FALSE)
lines(y=ortest[o],x=cumsum(!s$truth[o]%like%c("^R"))/1:100,type="l",ylab="observed fdr",xlab="nominal fdr",col="red")
abline(0,1,lty=2)
legend("bottomrigh",c("MIMOSA2","ORTest"),col=c("blue","red"),lty=1)
```