-
Notifications
You must be signed in to change notification settings - Fork 5
Convert independent_filtering_plots.Rnw to independent_filtering_plots.Rmd #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Changes from all commits
8ffc90d
89a5d7c
1e2c9ed
dc53ca8
17de333
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,200 @@ | ||
| --- | ||
| title: "Additional plots for: Independent filtering increases power for detecting differentially expressed genes, Bourgon et al., PNAS (2010)" | ||
| author: | ||
| - name: "Richard Bourgon" | ||
| - name: "Wolfgang Huber" | ||
| - name: "Paul Villafuerte" | ||
| affiliation: "Vignette translation from Sweave to Rmarkdown / HTML" | ||
| date: "`r format(Sys.time(), '%B %d, %Y')`" | ||
| vignette: > | ||
| %\VignetteEngine{knitr::rmarkdown} | ||
| %\VignetteIndexEntry{03 - Additional plots for: Independent filtering increases power for detecting differentially expressed genes, Bourgon et al., PNAS (2010)} | ||
| %\VignettePackage{genefilter} | ||
| %\VignetteEncoding{UTF-8} | ||
| output: | ||
| BiocStyle::html_document: | ||
| number_sections: yes | ||
| toc: yes | ||
| toc_depth: 4 | ||
| --- | ||
|
|
||
| ```{r functions, message=TRUE, warning=TRUE, include=FALSE} | ||
| ## Color Format | ||
| colFmt <- function(x,color) { | ||
| outputFormat <- knitr::opts_knit$get("rmarkdown.pandoc.to") | ||
| if(outputFormat == 'latex') { | ||
| ret <- paste("\\textcolor{",color,"}{",x,"}",sep="") | ||
| } else if(outputFormat == 'html') { | ||
| ret <- paste("<font color='",color,"'>",x,"</font>",sep="") | ||
| } else { | ||
| ret <- x | ||
| } | ||
| return(ret) | ||
| } | ||
| ``` | ||
|
|
||
| ```{r setup, echo=FALSE} | ||
| options(width=80 ) | ||
| ``` | ||
|
|
||
|
|
||
|
|
||
| # Introduction | ||
|
|
||
| This vignette illustrates use of some functions in the *genefilter* | ||
| package that provide useful diagnostics for independent | ||
| filtering. ^[Richard Bourgon, Robert Gentleman and Wolfgang Huber. Independent filtering | ||
| increases power for detecting differentially expressed genes.] | ||
|
|
||
| - `kappa_p` and `kappa_t` | ||
|
|
||
| - `filtered_p` and `filtered_R` | ||
|
|
||
| - `filter_volcano` | ||
|
|
||
| - `rejection_plot` | ||
|
|
||
| # Data preparation | ||
|
|
||
| Load the ALL data set and the *genefilter* package: | ||
|
|
||
| ```{r libraries, message=FALSE, warning=FALSE} | ||
| library("genefilter") | ||
| library("ALL") | ||
| data("ALL") | ||
| ``` | ||
|
|
||
| Reduce to just two conditions, then take a small subset of arrays from | ||
| these, with 3 arrays per condition: | ||
|
|
||
| ```{r sample_data, cache=TRUE} | ||
| bcell <- grep("^B", as.character(ALL$BT)) | ||
| moltyp <- which(as.character(ALL$mol.biol) %in% | ||
| c("NEG", "BCR/ABL")) | ||
| ALL_bcrneg <- ALL[, intersect(bcell, moltyp)] | ||
| ALL_bcrneg$mol.biol <- factor(ALL_bcrneg$mol.biol) | ||
| n1 <- n2 <- 3 | ||
| set.seed(1969) | ||
| use <- unlist(tapply(1:ncol(ALL_bcrneg), | ||
| ALL_bcrneg$mol.biol, sample, n1)) | ||
| subsample <- ALL_bcrneg[,use] | ||
| ``` | ||
|
|
||
| We now use functions from *genefilter* to compute overall standard | ||
| deviation filter statistics as well as standard two-sample $t$ and | ||
| related statistics. | ||
|
|
||
| ```{r stats, cache=TRUE} | ||
| S <- rowSds( exprs( subsample ) ) | ||
| temp <- rowttests( subsample, subsample$mol.biol ) | ||
| d <- temp$dm | ||
| p <- temp$p.value | ||
| t <- temp$statistic | ||
| ``` | ||
|
|
||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Flagging to remove the extra carriage return here |
||
| # Filtering volcano plot | ||
|
|
||
| Filtering on overall standard deviation and then using a standard | ||
| $t$-statistic induces a lower bound of fold change, albeit one which | ||
| varies somewhat with the significance of the $t$-statistic. The | ||
| `filter_volcano` function allows you to visualize this effect. | ||
|
|
||
| The output is shown in the left panel of Fig. \@ref(fig:filter-volcano). | ||
villafup marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| The `kappa_p` and `kappa_t` functions, used to make the volcano plot, | ||
| compute the fold change bound multiplier as a function of either a | ||
| $t$-test $p$-value or the $t$-statistic itself. The actual induced bound | ||
| on the fold change is $\kappa$ times the filter's cutoff on the overall | ||
| standard deviation. Note that fold change bounds for values of $|T|$ | ||
| which are close to 0 are not of practical interest because we will not | ||
| reject the null hypothesis with test statistics in this range. | ||
|
|
||
| The output is shown in the right panel of Fig. \@ref(fig:filter-volcano). | ||
|
|
||
villafup marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| ```{r filter-volcano, fig.cap = "Left panel: plot produced by the filter_volcano function. Right panel: graph of the `kappa_t` function."} | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (You may have already been working with this) but it looks like this syntax is not shown in the original vignette PDF. Please add echo=FALSE to the chunk header to hide the code
villafup marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| par(mfrow=c(1, 2)) | ||
|
|
||
| S_cutoff <- quantile(S, .50) | ||
| ss <- filter_volcano(d, p, S, n1, n2, alpha=.01, S_cutoff) | ||
|
|
||
| t <- seq(0, 5, length=100) | ||
| plot(t, kappa_t(t, n1, n2) * S_cutoff, | ||
| xlab="|T|", ylab="Fold change bound", type="l") | ||
| ``` | ||
|
|
||
| # Rejection count plots | ||
|
|
||
| ## Across $p$-value cutoffs | ||
|
|
||
| The `filtered_p` function permits easy simultaneous calculation of | ||
| unadjusted or adjusted $p$-values over a range of filtering thresholds | ||
| ($\theta$). Here, we return to the full "BCR/ABL" versus "NEG" data set, | ||
| and compute adjusted $p$-values using the method of Benjamini and | ||
| Hochberg, for a range of different filter stringencies. | ||
|
|
||
| ```{r table} | ||
| table(ALL_bcrneg$mol.biol) | ||
| ``` | ||
|
|
||
| ```{r filtered_p, message=FALSE, warning=FALSE, include=TRUE} | ||
| S2 <- rowVars(exprs(ALL_bcrneg)) | ||
| p2 <- rowttests(ALL_bcrneg, "mol.biol")$p.value | ||
| theta <- seq(0, .5, .1) | ||
| p_bh <- filtered_p(S2, p2, theta, method="BH") | ||
| ``` | ||
|
|
||
| ```{r p_bh, echo=TRUE} | ||
| head(p_bh) | ||
| ``` | ||
|
|
||
| The `rejection_plot` function takes sets of $p$-values corresponding to | ||
| different filtering choices --- in the columns of a matrix or in a list | ||
| --- and shows how rejection count ($R$) relates to the choice of cutoff | ||
| for the $p$-values. For these data, over a reasonable range of FDR | ||
| cutoffs, increased filtering corresponds to increased rejections. | ||
|
|
||
| ```{r rejection-plot, fig.cap = "Plot produced by the `rejection_plot function`"} | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the vignette PDF, I see that "Figure 2" is in bold black font and the caption is black font whereas this caption has bold blue font as the caption text. Is this an easy edit? Not sure whether it would be a headache to match the original vignette PDF or not.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might be able to accomplish this with the MiKTeX package, but do we want to have another library? |
||
| rejection_plot(p_bh, at="sample", | ||
| xlim=c(0,.3), ylim=c(0,1000), | ||
| main="Benjamini & Hochberg adjustment") | ||
| ``` | ||
|
|
||
| The plot is shown in Fig. \@ref(fig:rejection-plot). | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know we've had a difficult time as a group figuring out how to format these figures so that they're side-by-side. I checked in with Jen during our last Sweave meeting and because the different formatting doesn't significantly change reader's understanding, it sounds like we can keep them in current format, one on top of the other
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I changed the wording of some of the comments and captions. We should double-check to make sure they're right. Also, l moved the lines "The output is shown in Fig ..." above each graph to be consistent with the first graph. This line was below all the graphs except Figure 1. Let me know you would this editorial change changed back. |
||
|
|
||
| ## Across filtering fractions | ||
|
|
||
| If we select a fixed cutoff for the adjusted $p$-values, we can also | ||
| look more closely at the relationship between the fraction of null | ||
| hypotheses filtered and the total number of discoveries. The | ||
| `filtered_R` function wraps `filtered_p` and just returns rejection | ||
| counts. It requires a $p$-value cutoff. | ||
|
|
||
| ```{r filtered_R} | ||
| theta <- seq(0, .80, .01) | ||
| R_BH <- filtered_R(alpha=.10, S2, p2, theta, method="BH") | ||
| ``` | ||
|
|
||
| ```{r R_BH} | ||
| head(R_BH) | ||
| ``` | ||
|
|
||
| Because overfiltering (or use of a filter which is inappropriate for the | ||
| application domain) discards both false and true null hypotheses, very | ||
| large values of $\theta$ reduce power in this example: | ||
|
|
||
| ```{r filtered-R-plot, fig.cap="graph of theta."} | ||
villafup marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| plot(theta, R_BH, type="l", | ||
| xlab=expression(theta), ylab="Rejections", | ||
| main="BH cutoff = 0.1") | ||
| ``` | ||
|
|
||
| The plot is shown in Fig. \@ref(fig:filtered-R-plot). | ||
|
|
||
| # Session information {#session-information .unnumbered .unlisted} | ||
|
|
||
| ```{r sessionInfo, echo=FALSE} | ||
| sessionInfo() | ||
| ``` | ||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we know how to add a header that isn't reflected in the table of contents? Ideally, it would be nice to add a "References" header here to match the original vignette but I don't want it to show up in the TOC. This is something I was playing w/ in another vignette I converted and I have not come up with a clean answer. Do you have any ideas @villafup ?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added the reference as a side note under the "Introduction" section. We can add the reference at the bottom like in the original vignette, but I think we'd need to add another file to the vignette folder. It's definitely doable. |
||
Uh oh!
There was an error while loading. Please reload this page.