Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
200 changes: 200 additions & 0 deletions vignettes/independent_filtering_plots.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,200 @@
---
title: "Additional plots for: Independent filtering increases power for detecting differentially expressed genes, Bourgon et al., PNAS (2010)"
author:
- name: "Richard Bourgon"
- name: "Wolfgang Huber"
- name: "Paul Villafuerte"
affiliation: "Vignette translation from Sweave to Rmarkdown / HTML"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{03 - Additional plots for: Independent filtering increases power for detecting differentially expressed genes, Bourgon et al., PNAS (2010)}
%\VignettePackage{genefilter}
%\VignetteEncoding{UTF-8}
output:
BiocStyle::html_document:
number_sections: yes
toc: yes
toc_depth: 4
---

```{r functions, message=TRUE, warning=TRUE, include=FALSE}
## Color Format
colFmt <- function(x,color) {
outputFormat <- knitr::opts_knit$get("rmarkdown.pandoc.to")
if(outputFormat == 'latex') {
ret <- paste("\\textcolor{",color,"}{",x,"}",sep="")
} else if(outputFormat == 'html') {
ret <- paste("<font color='",color,"'>",x,"</font>",sep="")
} else {
ret <- x
}
return(ret)
}
```

```{r setup, echo=FALSE}
options(width=80 )
```



# Introduction

This vignette illustrates use of some functions in the *genefilter*
package that provide useful diagnostics for independent
filtering. ^[Richard Bourgon, Robert Gentleman and Wolfgang Huber. Independent filtering
increases power for detecting differentially expressed genes.]

- `kappa_p` and `kappa_t`

- `filtered_p` and `filtered_R`

- `filter_volcano`

- `rejection_plot`

# Data preparation

Load the ALL data set and the *genefilter* package:

```{r libraries, message=FALSE, warning=FALSE}
library("genefilter")
library("ALL")
data("ALL")
```

Reduce to just two conditions, then take a small subset of arrays from
these, with 3 arrays per condition:

```{r sample_data, cache=TRUE}
bcell <- grep("^B", as.character(ALL$BT))
moltyp <- which(as.character(ALL$mol.biol) %in%
c("NEG", "BCR/ABL"))
ALL_bcrneg <- ALL[, intersect(bcell, moltyp)]
ALL_bcrneg$mol.biol <- factor(ALL_bcrneg$mol.biol)
n1 <- n2 <- 3
set.seed(1969)
use <- unlist(tapply(1:ncol(ALL_bcrneg),
ALL_bcrneg$mol.biol, sample, n1))
subsample <- ALL_bcrneg[,use]
```

We now use functions from *genefilter* to compute overall standard
deviation filter statistics as well as standard two-sample $t$ and
related statistics.

```{r stats, cache=TRUE}
S <- rowSds( exprs( subsample ) )
temp <- rowttests( subsample, subsample$mol.biol )
d <- temp$dm
p <- temp$p.value
t <- temp$statistic
```


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging to remove the extra carriage return here

# Filtering volcano plot

Filtering on overall standard deviation and then using a standard
$t$-statistic induces a lower bound of fold change, albeit one which
varies somewhat with the significance of the $t$-statistic. The
`filter_volcano` function allows you to visualize this effect.

The output is shown in the left panel of Fig. \@ref(fig:filter-volcano).

The `kappa_p` and `kappa_t` functions, used to make the volcano plot,
compute the fold change bound multiplier as a function of either a
$t$-test $p$-value or the $t$-statistic itself. The actual induced bound
on the fold change is $\kappa$ times the filter's cutoff on the overall
standard deviation. Note that fold change bounds for values of $|T|$
which are close to 0 are not of practical interest because we will not
reject the null hypothesis with test statistics in this range.

The output is shown in the right panel of Fig. \@ref(fig:filter-volcano).


```{r filter-volcano, fig.cap = "Left panel: plot produced by the filter_volcano function. Right panel: graph of the `kappa_t` function."}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(You may have already been working with this) but it looks like this syntax is not shown in the original vignette PDF. Please add echo=FALSE to the chunk header to hide the code

par(mfrow=c(1, 2))

S_cutoff <- quantile(S, .50)
ss <- filter_volcano(d, p, S, n1, n2, alpha=.01, S_cutoff)

t <- seq(0, 5, length=100)
plot(t, kappa_t(t, n1, n2) * S_cutoff,
xlab="|T|", ylab="Fold change bound", type="l")
```

# Rejection count plots

## Across $p$-value cutoffs

The `filtered_p` function permits easy simultaneous calculation of
unadjusted or adjusted $p$-values over a range of filtering thresholds
($\theta$). Here, we return to the full "BCR/ABL" versus "NEG" data set,
and compute adjusted $p$-values using the method of Benjamini and
Hochberg, for a range of different filter stringencies.

```{r table}
table(ALL_bcrneg$mol.biol)
```

```{r filtered_p, message=FALSE, warning=FALSE, include=TRUE}
S2 <- rowVars(exprs(ALL_bcrneg))
p2 <- rowttests(ALL_bcrneg, "mol.biol")$p.value
theta <- seq(0, .5, .1)
p_bh <- filtered_p(S2, p2, theta, method="BH")
```

```{r p_bh, echo=TRUE}
head(p_bh)
```

The `rejection_plot` function takes sets of $p$-values corresponding to
different filtering choices --- in the columns of a matrix or in a list
--- and shows how rejection count ($R$) relates to the choice of cutoff
for the $p$-values. For these data, over a reasonable range of FDR
cutoffs, increased filtering corresponds to increased rejections.

```{r rejection-plot, fig.cap = "Plot produced by the `rejection_plot function`"}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the vignette PDF, I see that "Figure 2" is in bold black font and the caption is black font whereas this caption has bold blue font as the caption text. Is this an easy edit? Not sure whether it would be a headache to match the original vignette PDF or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to accomplish this with the MiKTeX package, but do we want to have another library?

rejection_plot(p_bh, at="sample",
xlim=c(0,.3), ylim=c(0,1000),
main="Benjamini & Hochberg adjustment")
```

The plot is shown in Fig. \@ref(fig:rejection-plot).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we've had a difficult time as a group figuring out how to format these figures so that they're side-by-side. I checked in with Jen during our last Sweave meeting and because the different formatting doesn't significantly change reader's understanding, it sounds like we can keep them in current format, one on top of the other

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording of some of the comments and captions. We should double-check to make sure they're right.

Also, l moved the lines "The output is shown in Fig ..." above each graph to be consistent with the first graph. This line was below all the graphs except Figure 1. Let me know you would this editorial change changed back.


## Across filtering fractions

If we select a fixed cutoff for the adjusted $p$-values, we can also
look more closely at the relationship between the fraction of null
hypotheses filtered and the total number of discoveries. The
`filtered_R` function wraps `filtered_p` and just returns rejection
counts. It requires a $p$-value cutoff.

```{r filtered_R}
theta <- seq(0, .80, .01)
R_BH <- filtered_R(alpha=.10, S2, p2, theta, method="BH")
```

```{r R_BH}
head(R_BH)
```

Because overfiltering (or use of a filter which is inappropriate for the
application domain) discards both false and true null hypotheses, very
large values of $\theta$ reduce power in this example:

```{r filtered-R-plot, fig.cap="graph of theta."}
plot(theta, R_BH, type="l",
xlab=expression(theta), ylab="Rejections",
main="BH cutoff = 0.1")
```

The plot is shown in Fig. \@ref(fig:filtered-R-plot).

# Session information {#session-information .unnumbered .unlisted}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know how to add a header that isn't reflected in the table of contents? Ideally, it would be nice to add a "References" header here to match the original vignette but I don't want it to show up in the TOC. This is something I was playing w/ in another vignette I converted and I have not come up with a clean answer. Do you have any ideas @villafup ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the reference as a side note under the "Introduction" section. We can add the reference at the bottom like in the original vignette, but I think we'd need to add another file to the vignette folder. It's definitely doable.