diff --git a/02-setup.md b/02-setup.md index dbecd2e5..abc266fe 100644 --- a/02-setup.md +++ b/02-setup.md @@ -182,7 +182,7 @@ The data file is located at https://github.com/carpentries-incubator/bioc-rnaseq We shall save the downloaded file in the `data` folder of our working directory with the name `GSE96870_counts_cerebellum.csv`. -```r +``` r download.file( url = "https://github.com/carpentries-incubator/bioc-rnaseq/raw/main/episodes/data/GSE96870_counts_cerebellum.csv", destfile = "data/GSE96870_counts_cerebellum.csv" @@ -215,7 +215,7 @@ working directory. ::::::::::::::::::::::::::::::::::::: solution -```r +``` r download.file( url = "https://github.com/carpentries-incubator/bioc-rnaseq/raw/main/episodes/data/GSE96870_coldata_cerebellum.csv", destfile = "data/GSE96870_coldata_cerebellum.csv" diff --git a/03-import-annotate.md b/03-import-annotate.md index f989d85a..c56e23bd 100644 --- a/03-import-annotate.md +++ b/03-import-annotate.md @@ -27,7 +27,7 @@ exercises: 40 In this episode we will use some functions from add-on R packages. In order to use them, we need to load them from our `library`: -```r +``` r suppressPackageStartupMessages({ library(AnnotationDbi) library(org.Mm.eg.db) @@ -47,17 +47,17 @@ Let's read in the data files that we downloaded in the last episode and start to ### Counts -```r +``` r counts <- read.csv("data/GSE96870_counts_cerebellum.csv", row.names = 1) dim(counts) ``` -```{.output} +``` output [1] 41786 22 ``` -```r +``` r # View(counts) ``` @@ -68,17 +68,17 @@ Genes are in rows and samples are in columns, so we have counts for 41,786 genes Next read in the sample annotations. Because samples are in columns in the count matrix, we will name the object `coldata`: -```r +``` r coldata <- read.csv("data/GSE96870_coldata_cerebellum.csv", row.names = 1) dim(coldata) ``` -```{.output} +``` output [1] 22 10 ``` -```r +``` r # View(coldata) ``` @@ -88,7 +88,7 @@ Now samples are in rows with the GEO sample IDs as the rownames, and we have 10 The counts only have gene symbols, which while short and somewhat recognizable to the human brain, are not always good absolute identifiers for exactly what gene was measured. For this we need additional gene annotations that were provided by the authors. The `count` and `coldata` files were in comma separated value (.csv) format, but we cannot use that for our gene annotation file because the descriptions can contain commas that would prevent a .csv file from being read in correctly. Instead the gene annotation file is in tab separated value (.tsv) format. Likewise, the descriptions can contain the single quote `'` (e.g., 5'), which by default R assumes indicates a character entry. So we have to use a more generic function `read.delim()` with extra arguments to specify that we have tab-separated data (`sep = "\t"`) with no quotes used (`quote = ""`). We also put in other arguments to specify that the first row contains our column names (`header = TRUE`), the gene symbols that should be our `row.names` are in the 5th column (`row.names = 5`), and that NCBI's species-specific gene ID (i.e., ENTREZID) should be read in as character data even though they look like numbers (`colClasses` argument). You can look up this details on available arguments by simply entering the function name starting with question mark. (e.g., `?read.delim`) -```r +``` r rowranges <- read.delim("data/GSE96870_rowranges.tsv", sep = "\t", colClasses = c(ENTREZID = "character"), @@ -98,11 +98,11 @@ rowranges <- read.delim("data/GSE96870_rowranges.tsv", dim(rowranges) ``` -```{.output} +``` output [1] 41786 7 ``` -```r +``` r # View(rowranges) ``` @@ -113,11 +113,11 @@ useful for the downstream analysis. For example, from the `gbkey` column, we can check what types of genes and how many of them are in our dataset: -```r +``` r table(rowranges$gbkey) ``` -```{.output} +``` output C_region D_segment exon J_segment misc_RNA 20 23 4008 94 1988 @@ -163,23 +163,23 @@ Before we put them together, you ABSOLUTELY MUST MAKE SURE THE SAMPLES AND GENES -```r +``` r all.equal(colnames(counts), rownames(coldata)) # samples ``` -```{.output} +``` output [1] TRUE ``` -```r +``` r all.equal(rownames(counts), rownames(rowranges)) # genes ``` -```{.output} +``` output [1] TRUE ``` -```r +``` r # If the first is not TRUE, you can match up the samples/columns in # counts with the samples/rows in coldata like this (which is fine # to run even if the first was TRUE): @@ -191,7 +191,7 @@ coldata <- coldata[tempindex, ] all.equal(colnames(counts), rownames(coldata)) ``` -```{.output} +``` output [1] TRUE ``` @@ -206,7 +206,7 @@ Write the codes. ::::::::::::::::::::::::::::::::::: solution -```r +``` r tempindex <- match(rownames(counts), rownames(rowranges)) rowranges <- rowranges[tempindex, ] @@ -221,7 +221,7 @@ Once we have verified that samples and genes are in the same order, we can then create our `SummarizedExperiment` object. -```r +``` r # One final check: stopifnot(rownames(rowranges) == rownames(counts), # features rownames(coldata) == colnames(counts)) # samples @@ -238,7 +238,7 @@ Because matching the genes and samples is so important, the `SummarizedExperimen genes/samples and the sample/row names match. If not, you will get some error messages: -```r +``` r # wrong number of samples: bad1 <- SummarizedExperiment( @@ -248,14 +248,14 @@ bad1 <- SummarizedExperiment( ) ``` -```{.error} +``` error Error in validObject(.Object): invalid class "SummarizedExperiment" object: nb of cols in 'assay' (22) must equal nb of rows in 'colData' (3) ``` -```r +``` r # same number of genes but in different order: bad2 <- SummarizedExperiment( @@ -265,7 +265,7 @@ bad2 <- SummarizedExperiment( ) ``` -```{.error} +``` error Error in SummarizedExperiment(assays = list(counts = as.matrix(counts)), : the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct @@ -276,12 +276,12 @@ Error in SummarizedExperiment(assays = list(counts = as.matrix(counts)), : the r A brief recap of how to access the various data slots in a `SummarizedExperiment` and how to make some manipulations: -```r +``` r # Access the counts head(assay(se)) ``` -```{.output} +``` output GSM2545336 GSM2545337 GSM2545338 GSM2545339 GSM2545340 GSM2545341 Xkr4 1891 2410 2159 1980 1977 1945 LOC105243853 0 0 1 4 0 0 @@ -312,15 +312,15 @@ Rp1 3 3 1 0 Sox17 273 197 310 246 ``` -```r +``` r dim(assay(se)) ``` -```{.output} +``` output [1] 41786 22 ``` -```r +``` r # The above works now because we only have one assay, "counts" # But if there were more than one assay, we would have to specify # which one like so: @@ -328,7 +328,7 @@ dim(assay(se)) head(assay(se, "counts")) ``` -```{.output} +``` output GSM2545336 GSM2545337 GSM2545338 GSM2545339 GSM2545340 GSM2545341 Xkr4 1891 2410 2159 1980 1977 1945 LOC105243853 0 0 1 4 0 0 @@ -359,12 +359,12 @@ Rp1 3 3 1 0 Sox17 273 197 310 246 ``` -```r +``` r # Access the sample annotations colData(se) ``` -```{.output} +``` output DataFrame with 22 rows and 10 columns title geo_accession organism age sex @@ -394,20 +394,20 @@ GSM2545363 InfluenzaA C57BL/6 Day4 Cerebellum 12 GSM2545380 InfluenzaA C57BL/6 Day8 Cerebellum 19 ``` -```r +``` r dim(colData(se)) ``` -```{.output} +``` output [1] 22 10 ``` -```r +``` r # Access the gene annotations head(rowData(se)) ``` -```{.output} +``` output DataFrame with 6 rows and 3 columns ENTREZID product gbkey @@ -419,22 +419,22 @@ Rp1 19888 retinitis pigmentosa.. mRNA Sox17 20671 SRY (sex determining.. mRNA ``` -```r +``` r dim(rowData(se)) ``` -```{.output} +``` output [1] 41786 3 ``` -```r +``` r # Make better sample IDs that show sex, time and mouse ID: se$Label <- paste(se$sex, se$time, se$mouse, sep = "_") se$Label ``` -```{.output} +``` output [1] "Female_Day8_14" "Female_Day0_9" "Female_Day0_10" "Female_Day4_15" [5] "Male_Day4_18" "Male_Day8_6" "Female_Day8_5" "Male_Day0_11" [9] "Female_Day4_22" "Male_Day4_13" "Male_Day8_23" "Male_Day8_24" @@ -443,7 +443,7 @@ se$Label [21] "Male_Day4_12" "Female_Day8_19" ``` -```r +``` r colnames(se) <- se$Label # Our samples are not in order based on sex and time @@ -451,7 +451,7 @@ se$Group <- paste(se$sex, se$time, sep = "_") se$Group ``` -```{.output} +``` output [1] "Female_Day8" "Female_Day0" "Female_Day0" "Female_Day4" "Male_Day4" [6] "Male_Day8" "Female_Day8" "Male_Day0" "Female_Day4" "Male_Day4" [11] "Male_Day8" "Male_Day8" "Female_Day0" "Male_Day0" "Male_Day4" @@ -459,7 +459,7 @@ se$Group [21] "Male_Day4" "Female_Day8" ``` -```r +``` r # change this to factor data with the levels in order # that we want, then rearrange the se object: @@ -470,7 +470,7 @@ se <- se[, order(se$Group)] colData(se) ``` -```{.output} +``` output DataFrame with 22 rows and 12 columns title geo_accession organism age @@ -513,7 +513,7 @@ Male_Day8_23 23 Male_Day8_23 Male_Day8 Male_Day8_24 24 Male_Day8_24 Male_Day8 ``` -```r +``` r # Finally, also factor the Label column to keep in order in plots: se$Label <- factor(se$Label, levels = se$Label) @@ -535,18 +535,18 @@ of expression levels for infected and non-infected samples based on these genes. ::::::::::::::::::::::::::::::::::: solution -```r +``` r # 1 table(se$infection) ``` -```{.output} +``` output InfluenzaA NonInfected 15 7 ``` -```r +``` r # 2 se_infected <- se[, se$infection == "InfluenzaA"] se_noninfected <- se[, se$infection == "NonInfected"] @@ -557,26 +557,26 @@ means_noninfected <- rowMeans(assay(se_noninfected)[1:500, ]) summary(means_infected) ``` -```{.output} +``` output Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 0.133 2.867 764.068 337.417 18896.600 ``` -```r +``` r summary(means_noninfected) ``` -```{.output} +``` output Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 0.143 3.143 771.017 366.571 20010.143 ``` -```r +``` r # 3 ncol(se[, se$sex == "Female" & se$infection == "InfluenzaA" & se$time == "Day8"]) ``` -```{.output} +``` output [1] 4 ``` @@ -588,7 +588,7 @@ ncol(se[, se$sex == "Female" & se$infection == "InfluenzaA" & se$time == "Day8"] This was a bit of code and time to create our `SummarizedExperiment` object. We will need to keep using it throughout the workshop, so it can be useful to save it as an actual single file on our computer to read it back in to R's memory if we have to shut down RStudio. To save an R-specific file we can use the `saveRDS()` function and later read it back into R using the `readRDS()` function. -```r +``` r saveRDS(se, "data/GSE96870_se.rds") rm(se) # remove the object! se <- readRDS("data/GSE96870_se.rds") @@ -613,12 +613,12 @@ Before, we conceptually discussed subsetting to only the mRNA genes. Now that we ::::::::::::::::::::::::::::::::::: solution -```r +``` r se_mRNA <- se[rowData(se)$gbkey == "mRNA" , ] dim(se_mRNA) ``` -```{.output} +``` output [1] 21198 22 ``` @@ -656,15 +656,15 @@ Where - *keytype* is the type of key used -```r +``` r mapIds(org.Mm.eg.db, keys = "497097", column = "SYMBOL", keytype = "ENTREZID") ``` -```{.output} +``` output 'select()' returned 1:1 mapping between keys and columns ``` -```{.output} +``` output 497097 "Xkr4" ``` @@ -675,23 +675,23 @@ The below example demonstrate this functionality using the `hgu95av2.db` package, an Affymetrix Human Genome U95 Set annotation data. -```r +``` r keys <- head(keys(hgu95av2.db, "ENTREZID")) last <- function(x){x[[length(x)]]} mapIds(hgu95av2.db, keys = keys, column = "ALIAS", keytype = "ENTREZID") ``` -```{.output} +``` output 'select()' returned 1:many mapping between keys and columns ``` -```{.output} +``` output 10 100 1000 10000 100008586 10001 "AAC2" "ADA1" "ACOGS" "MPPH" "AL4" "ARC33" ``` -```r +``` r # When there is 1:many mapping, the default behavior was # to output the first match. This can be changed to a function, # which we defined above to give us the last match: @@ -699,26 +699,26 @@ mapIds(hgu95av2.db, keys = keys, column = "ALIAS", keytype = "ENTREZID") mapIds(hgu95av2.db, keys = keys, column = "ALIAS", keytype = "ENTREZID", multiVals = last) ``` -```{.output} +``` output 'select()' returned 1:many mapping between keys and columns ``` -```{.output} +``` output 10 100 1000 10000 100008586 10001 "NAT2" "ADA" "CDH2" "AKT3" "GAGE12F" "MED6" ``` -```r +``` r # Or we can get back all the many mappings: mapIds(hgu95av2.db, keys = keys, column = "ALIAS", keytype = "ENTREZID", multiVals = "list") ``` -```{.output} +``` output 'select()' returned 1:many mapping between keys and columns ``` -```{.output} +``` output $`10` [1] "AAC2" "NAT-2" "PNAT" "NAT2" @@ -747,14 +747,14 @@ $`10001` ## Session info -```r +``` r sessionInfo() ``` -```{.output} -R version 4.3.2 (2023-10-31) -Platform: x86_64-pc-linux-gnu (64-bit) -Running under: Ubuntu 22.04.4 LTS +``` output +R version 4.4.1 (2024-06-14) +Platform: x86_64-pc-linux-gnu +Running under: Ubuntu 22.04.5 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 @@ -774,28 +774,28 @@ attached base packages: [8] base other attached packages: - [1] hgu95av2.db_3.13.0 org.Hs.eg.db_3.17.0 - [3] org.Mm.eg.db_3.17.0 AnnotationDbi_1.62.2 - [5] SummarizedExperiment_1.30.2 Biobase_2.60.0 - [7] MatrixGenerics_1.12.3 matrixStats_1.2.0 - [9] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4 -[11] IRanges_2.34.1 S4Vectors_0.38.2 -[13] BiocGenerics_0.46.0 knitr_1.45 + [1] hgu95av2.db_3.13.0 org.Hs.eg.db_3.19.1 + [3] org.Mm.eg.db_3.19.1 AnnotationDbi_1.66.0 + [5] SummarizedExperiment_1.34.0 Biobase_2.64.0 + [7] MatrixGenerics_1.16.0 matrixStats_1.4.1 + [9] GenomicRanges_1.56.1 GenomeInfoDb_1.40.1 +[11] IRanges_2.38.1 S4Vectors_0.42.1 +[13] BiocGenerics_0.50.0 knitr_1.48 loaded via a namespace (and not attached): - [1] Matrix_1.6-5 bit_4.0.5 highr_0.10 - [4] compiler_4.3.2 BiocManager_1.30.22 renv_1.0.5 - [7] crayon_1.5.2 blob_1.2.4 Biostrings_2.68.1 -[10] bitops_1.0-7 png_0.1-8 fastmap_1.1.1 -[13] yaml_2.3.8 lattice_0.22-5 R6_2.5.1 -[16] XVector_0.40.0 S4Arrays_1.0.6 DelayedArray_0.26.7 -[19] GenomeInfoDbData_1.2.10 DBI_1.2.2 rlang_1.1.3 -[22] KEGGREST_1.40.1 cachem_1.0.8 xfun_0.42 -[25] bit64_4.0.5 RSQLite_2.3.5 memoise_2.0.1 -[28] cli_3.6.2 zlibbioc_1.46.0 grid_4.3.2 -[31] vctrs_0.6.5 evaluate_0.23 abind_1.4-5 -[34] RCurl_1.98-1.14 httr_1.4.7 pkgconfig_2.0.3 -[37] tools_4.3.2 + [1] Matrix_1.7-0 bit_4.5.0 jsonlite_1.8.9 + [4] highr_0.11 compiler_4.4.1 BiocManager_1.30.25 + [7] renv_1.0.11 crayon_1.5.3 blob_1.2.4 +[10] Biostrings_2.72.1 png_0.1-8 fastmap_1.2.0 +[13] yaml_2.3.10 lattice_0.22-6 R6_2.5.1 +[16] XVector_0.44.0 S4Arrays_1.4.1 DelayedArray_0.30.1 +[19] GenomeInfoDbData_1.2.12 DBI_1.2.3 rlang_1.1.4 +[22] KEGGREST_1.44.1 cachem_1.1.0 xfun_0.47 +[25] bit64_4.0.5 memoise_2.0.1 SparseArray_1.4.8 +[28] RSQLite_2.3.7 cli_3.6.3 zlibbioc_1.50.0 +[31] grid_4.4.1 vctrs_0.6.5 evaluate_1.0.0 +[34] abind_1.4-8 httr_1.4.7 pkgconfig_2.0.3 +[37] tools_4.4.1 UCSC.utils_1.0.0 ``` ::: keypoints diff --git a/04-exploratory-qc.md b/04-exploratory-qc.md index 7716a4a4..33e0843f 100644 --- a/04-exploratory-qc.md +++ b/04-exploratory-qc.md @@ -31,7 +31,7 @@ Assuming you just started RStudio again, load some packages we will use in this -```r +``` r suppressPackageStartupMessages({ library(SummarizedExperiment) library(DESeq2) @@ -45,7 +45,7 @@ suppressPackageStartupMessages({ ``` -```r +``` r se <- readRDS("data/GSE96870_se.rds") ``` @@ -59,21 +59,21 @@ These tools are in no way limited to (or developed for) analysis of RNA-seq data However, there are certain characteristics of count assays that need to be taken into account when they are applied to this type of data. First of all, not all mouse genes in the genome will be expressed in our Cerebellum samples. There are many different threshold you could use to say whether a gene's expression was detectable or not; here we are going to use a very minimal one that if a gene does not have more than 5 counts total across all samples, there is simply not enough data to be able to do anything with it anyway. -```r +``` r nrow(se) ``` -```{.output} +``` output [1] 41786 ``` -```r +``` r # Remove genes/rows that do not have > 5 total counts se <- se[rowSums(assay(se, "counts")) > 5, ] nrow(se) ``` -```{.output} +``` output [1] 27430 ``` @@ -94,11 +94,11 @@ Last episode we discussed subsetting down to only mRNA genes. Here we subsetted 1. -```r +``` r table(rowData(se)$gbkey) ``` -```{.output} +``` output C_region exon J_segment misc_RNA mRNA 14 1765 14 1539 16859 @@ -108,27 +108,27 @@ table(rowData(se)$gbkey) 2. -```r +``` r nrow(se) # represents the number of genes using 5 as filtering threshold ``` -```{.output} +``` output [1] 27430 ``` -```r +``` r length(which(rowSums(assay(se, "counts")) > 10)) ``` -```{.output} +``` output [1] 25736 ``` -```r +``` r length(which(rowSums(assay(se, "counts")) > 20)) ``` -```{.output} +``` output [1] 23860 ``` @@ -150,7 +150,7 @@ Differences in the total number of reads assigned to genes between samples typic In the rest of this section, we will use the term *library size* to refer to the total number of reads assigned to genes for a sample. First we should compare the library sizes of all samples. -```r +``` r # Add in the sum of all counts se$libSize <- colSums(assay(se)) @@ -183,16 +183,16 @@ The latter is important due to the compositional nature of RNA-seq data. There is a fixed number of reads to distribute between the genes, and if a single (or a few) very highly expressed gene consume a large part of the reads, all other genes will consequently receive very low counts. We now switch our `SummarizedExperiment` object over to a `DESeqDataSet` as it has the internal structure to store these size factors. We also need to tell it our main experiment design, which is sex and time: -```r +``` r dds <- DESeq2::DESeqDataSet(se, design = ~ sex + time) ``` -```{.warning} +``` warning Warning in DESeq2::DESeqDataSet(se, design = ~sex + time): some variables in design formula are characters, converting to factors ``` -```r +``` r dds <- estimateSizeFactors(dds) # Plot the size factors against library size @@ -216,7 +216,7 @@ For read count data such as RNA-seq, this is not the case. In fact, the variance increases with the average read count. -```r +``` r meanSdPlot(assay(dds), ranks = FALSE) ``` @@ -226,7 +226,7 @@ There are two ways around this: either we develop methods specifically adapted t Both ways have been explored; however, at the moment the second approach is arguably more widely applied in practice. We can transform our data using DESeq2's variance stabilizing transformation and then verify that it has removed the correlation between average read count and variance. -```r +``` r vsd <- DESeq2::vst(dds, blind = TRUE) meanSdPlot(assay(vsd), ranks = FALSE) ``` @@ -238,7 +238,7 @@ meanSdPlot(assay(vsd), ranks = FALSE) There are many ways to cluster samples based on their similarity of expression patterns. One simple way is to calculate Euclidean distances between all pairs of samples (longer distance = more different) and then display the results with both a branching dendrogram and a heatmap to visualize the distances in color. From this, we infer that the Day 8 samples are more similar to each other than the rest of the samples, although Day 4 and Day 0 do not separate distinctly. Instead, males and females reliably separate. -```r +``` r dst <- dist(t(assay(vsd))) colors <- colorRampPalette(brewer.pal(9, "Blues"))(255) ComplexHeatmap::Heatmap( @@ -269,9 +269,16 @@ By definition, the first principal component will always represent more of the v The fraction of explained variance is a measure of how much of the 'signal' in the data that is retained when we project the samples from the original, high-dimensional space to the low-dimensional space for visualization. -```r +``` r pcaData <- DESeq2::plotPCA(vsd, intgroup = c("sex", "time"), returnData = TRUE) +``` + +``` output +using ntop=500 top features by variance +``` + +``` r percentVar <- round(100 * attr(pcaData, "percentVar")) ggplot(pcaData, aes(x = PC1, y = PC2)) + geom_point(aes(color = sex, shape = time), size = 5) + @@ -292,6 +299,11 @@ ggplot(pcaData, aes(x = PC1, y = PC2)) + 2. Consider an experimental design where you have multiple samples from the same donor. You are still interested in differences by time and observe the following PCA plot. What does this PCA plot suggest? + +``` output +using ntop=500 top features by variance +``` + @@ -325,9 +337,16 @@ Compare before and after variance stabilizing transformation. ::::::::::::::::::::::::::::::::::: solution -```r +``` r pcaDataVst <- DESeq2::plotPCA(vsd, intgroup = c("libSize"), returnData = TRUE) +``` + +``` output +using ntop=500 top features by variance +``` + +``` r percentVar <- round(100 * attr(pcaDataVst, "percentVar")) ggplot(pcaDataVst, aes(x = PC1, y = PC2)) + geom_point(aes(color = libSize / 1e6), size = 5) + @@ -342,9 +361,16 @@ ggplot(pcaDataVst, aes(x = PC1, y = PC2)) + -```r +``` r pcaDataCts <- DESeq2::plotPCA(DESeqTransform(se), intgroup = c("libSize"), returnData = TRUE) +``` + +``` output +using ntop=500 top features by variance +``` + +``` r percentVar <- round(100 * attr(pcaDataCts, "percentVar")) ggplot(pcaDataCts, aes(x = PC1, y = PC2)) + geom_point(aes(color = libSize / 1e6), size = 5) + @@ -371,7 +397,7 @@ Useful tools for interactive exploratory data analysis for RNA-seq are [Glimma]( ## Challenge: Interactively explore our data using iSEE -```r +``` r ## Convert DESeqDataSet object to a SingleCellExperiment object, in order to ## be able to store the PCA representation sce <- as(dds, "SingleCellExperiment") @@ -395,14 +421,14 @@ shiny::runApp(app) ## Session info -```r +``` r sessionInfo() ``` -```{.output} -R version 4.3.2 (2023-10-31) -Platform: x86_64-pc-linux-gnu (64-bit) -Running under: Ubuntu 22.04.4 LTS +``` output +R version 4.4.1 (2024-06-14) +Platform: x86_64-pc-linux-gnu +Running under: Ubuntu 22.04.5 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 @@ -422,48 +448,49 @@ attached base packages: [8] methods base other attached packages: - [1] iSEE_2.12.0 SingleCellExperiment_1.22.0 - [3] hexbin_1.28.3 RColorBrewer_1.1-3 - [5] ComplexHeatmap_2.16.0 ggplot2_3.5.0 - [7] vsn_3.68.0 DESeq2_1.40.2 - [9] SummarizedExperiment_1.30.2 Biobase_2.60.0 -[11] MatrixGenerics_1.12.3 matrixStats_1.2.0 -[13] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4 -[15] IRanges_2.34.1 S4Vectors_0.38.2 -[17] BiocGenerics_0.46.0 + [1] iSEE_2.16.0 SingleCellExperiment_1.26.0 + [3] hexbin_1.28.4 RColorBrewer_1.1-3 + [5] ComplexHeatmap_2.20.0 ggplot2_3.5.1 + [7] vsn_3.72.0 DESeq2_1.44.0 + [9] SummarizedExperiment_1.34.0 Biobase_2.64.0 +[11] MatrixGenerics_1.16.0 matrixStats_1.4.1 +[13] GenomicRanges_1.56.1 GenomeInfoDb_1.40.1 +[15] IRanges_2.38.1 S4Vectors_0.42.1 +[17] BiocGenerics_0.50.0 loaded via a namespace (and not attached): - [1] bitops_1.0-7 rlang_1.1.3 magrittr_2.0.3 - [4] shinydashboard_0.7.2 clue_0.3-65 GetoptLong_1.0.5 - [7] compiler_4.3.2 mgcv_1.9-1 png_0.1-8 -[10] vctrs_0.6.5 pkgconfig_2.0.3 shape_1.4.6.1 -[13] crayon_1.5.2 fastmap_1.1.1 XVector_0.40.0 -[16] ellipsis_0.3.2 labeling_0.4.3 utf8_1.2.4 -[19] promises_1.2.1 preprocessCore_1.62.1 shinyAce_0.4.2 -[22] xfun_0.42 cachem_1.0.8 zlibbioc_1.46.0 -[25] jsonlite_1.8.8 highr_0.10 later_1.3.2 -[28] DelayedArray_0.26.7 BiocParallel_1.34.2 parallel_4.3.2 -[31] cluster_2.1.6 R6_2.5.1 bslib_0.6.1 -[34] limma_3.56.2 jquerylib_0.1.4 Rcpp_1.0.12 -[37] iterators_1.0.14 knitr_1.45 httpuv_1.6.14 -[40] Matrix_1.6-5 splines_4.3.2 igraph_2.0.2 -[43] tidyselect_1.2.0 abind_1.4-5 yaml_2.3.8 -[46] doParallel_1.0.17 codetools_0.2-19 affy_1.78.2 -[49] miniUI_0.1.1.1 lattice_0.22-5 tibble_3.2.1 -[52] shiny_1.8.0 withr_3.0.0 evaluate_0.23 -[55] circlize_0.4.16 pillar_1.9.0 affyio_1.70.0 -[58] BiocManager_1.30.22 renv_1.0.5 DT_0.32 + [1] rlang_1.1.4 magrittr_2.0.3 shinydashboard_0.7.2 + [4] clue_0.3-65 GetoptLong_1.0.5 compiler_4.4.1 + [7] mgcv_1.9-1 png_0.1-8 vctrs_0.6.5 +[10] pkgconfig_2.0.3 shape_1.4.6.1 crayon_1.5.3 +[13] fastmap_1.2.0 XVector_0.44.0 labeling_0.4.3 +[16] utf8_1.2.4 promises_1.3.0 shinyAce_0.4.2 +[19] UCSC.utils_1.0.0 preprocessCore_1.66.0 xfun_0.47 +[22] cachem_1.1.0 zlibbioc_1.50.0 jsonlite_1.8.9 +[25] listviewer_4.0.0 highr_0.11 later_1.3.2 +[28] DelayedArray_0.30.1 BiocParallel_1.38.0 parallel_4.4.1 +[31] cluster_2.1.6 R6_2.5.1 bslib_0.8.0 +[34] limma_3.60.4 jquerylib_0.1.4 Rcpp_1.0.13 +[37] iterators_1.0.14 knitr_1.48 httpuv_1.6.15 +[40] Matrix_1.7-0 splines_4.4.1 igraph_2.0.3 +[43] tidyselect_1.2.1 abind_1.4-8 yaml_2.3.10 +[46] doParallel_1.0.17 codetools_0.2-20 affy_1.82.0 +[49] miniUI_0.1.1.1 lattice_0.22-6 tibble_3.2.1 +[52] shiny_1.9.1 withr_3.0.1 evaluate_1.0.0 +[55] circlize_0.4.16 pillar_1.9.0 affyio_1.74.0 +[58] BiocManager_1.30.25 renv_1.0.11 DT_0.33 [61] foreach_1.5.2 shinyjs_2.1.0 generics_0.1.3 -[64] RCurl_1.98-1.14 munsell_0.5.0 scales_1.3.0 -[67] xtable_1.8-4 glue_1.7.0 tools_4.3.2 -[70] colourpicker_1.3.0 locfit_1.5-9.9 colorspace_2.1-0 -[73] nlme_3.1-164 GenomeInfoDbData_1.2.10 vipor_0.4.7 -[76] cli_3.6.2 fansi_1.0.6 viridisLite_0.4.2 -[79] S4Arrays_1.0.6 dplyr_1.1.4 gtable_0.3.4 -[82] rintrojs_0.3.4 sass_0.4.8 digest_0.6.34 -[85] ggrepel_0.9.5 farver_2.1.1 rjson_0.2.21 -[88] htmlwidgets_1.6.4 htmltools_0.5.7 lifecycle_1.0.4 -[91] shinyWidgets_0.8.2 GlobalOptions_0.1.2 mime_0.12 +[64] munsell_0.5.1 scales_1.3.0 xtable_1.8-4 +[67] glue_1.7.0 tools_4.4.1 colourpicker_1.3.0 +[70] locfit_1.5-9.10 colorspace_2.1-1 nlme_3.1-164 +[73] GenomeInfoDbData_1.2.12 vipor_0.4.7 cli_3.6.3 +[76] fansi_1.0.6 viridisLite_0.4.2 S4Arrays_1.4.1 +[79] dplyr_1.1.4 gtable_0.3.5 rintrojs_0.3.4 +[82] sass_0.4.9 digest_0.6.37 SparseArray_1.4.8 +[85] ggrepel_0.9.6 farver_2.1.2 rjson_0.2.23 +[88] htmlwidgets_1.6.4 htmltools_0.5.8.1 lifecycle_1.0.4 +[91] shinyWidgets_0.8.6 httr_1.4.7 GlobalOptions_0.1.2 +[94] statmod_1.5.0 mime_0.12 ``` :::::::::::::::::::::::::::::::::::::::: keypoints diff --git a/05-differential-expression.md b/05-differential-expression.md index dc8ad0fd..5b3443d4 100644 --- a/05-differential-expression.md +++ b/05-differential-expression.md @@ -40,7 +40,7 @@ The *design formula* expresses the variables which will be used in modeling. The ### Load packages -```r +``` r suppressPackageStartupMessages({ library(SummarizedExperiment) library(DESeq2) @@ -57,7 +57,7 @@ suppressPackageStartupMessages({ Let's load in our `SummarizedExperiment` object again. In the last episode for quality control exploration, we removed ~35% genes that had 5 or fewer counts because they had too little information in them. For DESeq2 statistical analysis, we do not technically have to remove these genes because by default it will do some independent filtering, but it can reduce the memory size of the `DESeqDataSet` object resulting in faster computation. Plus, we do not want these genes cluttering up some of the visualizations. -```r +``` r se <- readRDS("data/GSE96870_se.rds") se <- se[rowSums(assay(se, "counts")) > 5, ] ``` @@ -67,12 +67,12 @@ se <- se[rowSums(assay(se, "counts")) > 5, ] The design matrix we will use in this example is `~ sex + time`. This will allow us test the difference between males and females (averaged over time point) and the difference between day 0, 4 and 8 (averaged over males and females). If we wanted to test other comparisons (e.g., Female.Day8 vs. Female.Day0 and also Male.Day8 vs. Male.Day0) we could use a different design matrix to more easily extract those pairwise comparisons. -```r +``` r dds <- DESeq2::DESeqDataSet(se, design = ~ sex + time) ``` -```{.warning} +``` warning Warning in DESeq2::DESeqDataSet(se, design = ~sex + time): some variables in design formula are characters, converting to factors ``` @@ -82,7 +82,7 @@ design formula are characters, converting to factors The function to generate a `DESeqDataSet` needs to be adapted depending on the input type, e.g, -```r +``` r #From SummarizedExperiment object ddsSE <- DESeqDataSet(se, design = ~ sex + time) @@ -106,7 +106,7 @@ As shown in the [previous section](../episodes/04-exploratory-qc.Rmd) on explora Recall the `estimateSizeFactors()` function from the previous section: -```r +``` r dds <- estimateSizeFactors(dds) ``` @@ -132,23 +132,23 @@ $\theta$ represents the gene-specific __dispersion__, a measure of variability o We can visualize the effect of *shrinkage* using `plotDispEsts()`: -```r +``` r dds <- estimateDispersions(dds) ``` -```{.output} +``` output gene-wise dispersion estimates ``` -```{.output} +``` output mean-dispersion relationship ``` -```{.output} +``` output final dispersion estimates ``` -```r +``` r plotDispEsts(dds) ``` @@ -166,7 +166,7 @@ Finally, the estimated log2 fold changes are scaled by their standard error and -```r +``` r dds <- nbinomWaldTest(dds) ``` @@ -177,12 +177,12 @@ dds <- nbinomWaldTest(dds) Standard differential expression analysis as performed above is wrapped into a single function, `DESeq()`. Running the first code chunk is equivalent to running the second one: -```r +``` r dds <- DESeq(dds) ``` -```r +``` r dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomWaldTest(dds) @@ -209,11 +209,11 @@ In the lesson example the last variable of the design formula is `time`. The __reference level__ (first in alphabetical order) is `Day0` and the __last level__ is `Day8` -```r +``` r levels(dds$time) ``` -```{.output} +``` output [1] "Day0" "Day4" "Day8" ``` @@ -226,13 +226,13 @@ To explore the output of the `results()` function we can use the `summary()` fun -```r +``` r ## Day 8 vs Day 0 resTime <- results(dds, contrast = c("time", "Day8", "Day0")) summary(resTime) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -245,12 +245,12 @@ low counts [2] : 3723, 14% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r # View(resTime) head(resTime[order(resTime$pvalue), ]) ``` -```{.output} +``` output log2 fold change (MLE): time Day8 vs Day0 Wald test p-value: time Day8 vs Day0 DataFrame with 6 rows and 6 columns @@ -277,7 +277,7 @@ Both of the below ways of specifying the contrast are essentially equivalent. The `name` parameter can be accessed using `resultsNames()`. -```r +``` r resTime <- results(dds, contrast = c("time", "Day8", "Day0")) resTime <- results(dds, name = "time_Day8_vs_Day0") ``` @@ -295,13 +295,13 @@ Explore the DE genes between males and females independent of time. -```r +``` r ## Male vs Female resSex <- results(dds, contrast = c("sex", "Male", "Female")) summary(resSex) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -314,11 +314,11 @@ low counts [2] : 8504, 31% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r head(resSex[order(resSex$pvalue), ]) ``` -```{.output} +``` output log2 fold change (MLE): sex Male vs Female Wald test p-value: sex Male vs Female DataFrame with 6 rows and 6 columns @@ -362,7 +362,7 @@ We can visualize the results in many ways. A good check is to explore the relati `DESeq2` provides a useful function to do so, `plotMA()`. -```r +``` r plotMA(resTime) ``` @@ -373,18 +373,18 @@ This is caused by counts from lowly expressed genes tending to be very noisy. We can *shrink* the log fold changes of these genes with low mean and high dispersion, as they contain little information. -```r +``` r resTimeLfc <- lfcShrink(dds, coef = "time_Day8_vs_Day0", res = resTime) ``` -```{.output} +``` output using 'apeglm' for LFC shrinkage. If used in published research, please cite: Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895 ``` -```r +``` r plotMA(resTimeLfc) ``` @@ -398,14 +398,14 @@ By default `independentFiltering` is set to `TRUE`. What happens without filteri :::::::::::::::::::::::: solution -```r +``` r resTimeNotFiltered <- results(dds, contrast = c("time", "Day8", "Day0"), independentFiltering = FALSE) summary(resTime) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -418,11 +418,11 @@ low counts [2] : 3723, 14% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r summary(resTimeNotFiltered) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -449,7 +449,7 @@ We will use transformed data (see [exploratory data analysis](../episodes/04-exp -```r +``` r # Transform counts vsd <- vst(dds, blind = TRUE) @@ -487,7 +487,7 @@ Check the heatmap and top DE genes. Do you find something expected/unexpected in We may want to to output our results out of R to have a stand-alone file. The format of `resTime` only has the gene symbols as rownames, so let us join the gene annotation information, and then write out as .csv file: -```r +``` r head(as.data.frame(resTime)) head(as.data.frame(rowRanges(se))) diff --git a/06-extra-design.md b/06-extra-design.md index 7932d029..f50d35ad 100644 --- a/06-extra-design.md +++ b/06-extra-design.md @@ -32,7 +32,7 @@ We start by loading a few packages that will be needed in this episode. In particular, the [ExploreModelMatrix](https://bioconductor.org/packages/ExploreModelMatrix/) package provides resources for exploring design matrices in a graphical fashion, for easier interpretation. -```r +``` r suppressPackageStartupMessages({ library(SummarizedExperiment) library(ExploreModelMatrix) @@ -47,12 +47,12 @@ Moreover, all mice have the same age (8 weeks). Hence, in the first part of this episode we consider only the sex, tissue and time variables further. -```r +``` r meta <- read.csv("data/GSE96870_coldata_all.csv", row.names = 1) meta ``` -```{.output} +``` output title geo_accession organism age sex GSM2545336 CNS_RNA-seq_10C GSM2545336 Mus musculus 8 weeks Female GSM2545337 CNS_RNA-seq_11C GSM2545337 Mus musculus 8 weeks Female @@ -147,11 +147,11 @@ GSM2545379 InfluenzaA C57BL/6 Day8 Spinalcord 24 GSM2545380 InfluenzaA C57BL/6 Day8 Cerebellum 19 ``` -```r +``` r table(meta$time, meta$infection) ``` -```{.output} +``` output InfluenzaA NonInfected Day0 0 15 @@ -159,11 +159,11 @@ table(meta$time, meta$infection) Day8 14 0 ``` -```r +``` r table(meta$age) ``` -```{.output} +``` output 8 weeks 45 @@ -172,19 +172,19 @@ table(meta$age) We can start by visualizing the number of observations for each combination of the three predictor variables. -```r +``` r vd <- VisualizeDesign(sampleData = meta, designFormula = ~ tissue + time + sex) vd$cooccurrenceplots ``` -```{.output} +``` output $`tissue = Cerebellum` ``` -```{.output} +``` output $`tissue = Spinalcord` ``` @@ -210,14 +210,14 @@ This intercept will represent the 'baseline' level of the predictor variable, wh If not explicitly specified, R will order the values of the predictor in alphabetical order and select the first one as the reference or baseline level. -```r +``` r ## Subset metadata meta_noninf_spc <- meta %>% filter(time == "Day0" & tissue == "Spinalcord") meta_noninf_spc ``` -```{.output} +``` output title geo_accession organism age sex GSM2545356 CNS_RNA-seq_574 GSM2545356 Mus musculus 8 weeks Male GSM2545357 CNS_RNA-seq_575 GSM2545357 Mus musculus 8 weeks Male @@ -238,7 +238,7 @@ GSM2545366 NonInfected C57BL/6 Day0 Spinalcord 10 GSM2545367 NonInfected C57BL/6 Day0 Spinalcord 11 ``` -```r +``` r ## Use ExploreModelMatrix to create a design matrix and visualizations, given ## the desired design formula. vd <- VisualizeDesign(sampleData = meta_noninf_spc, @@ -246,7 +246,7 @@ vd <- VisualizeDesign(sampleData = meta_noninf_spc, vd$designmatrix ``` -```{.output} +``` output (Intercept) sexMale GSM2545356 1 1 GSM2545357 1 1 @@ -258,22 +258,22 @@ GSM2545366 1 0 GSM2545367 1 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` -```r +``` r ## Note that we can also generate the design matrix like this model.matrix(~ sex, data = meta_noninf_spc) ``` -```{.output} +``` output (Intercept) sexMale GSM2545356 1 1 GSM2545357 1 1 @@ -309,13 +309,13 @@ Set up the design formula to compare male and female spinal cord samples from Da ### Solution -```r +``` r meta_noninf_spc <- meta %>% filter(time == "Day0" & tissue == "Spinalcord") meta_noninf_spc ``` -```{.output} +``` output title geo_accession organism age sex GSM2545356 CNS_RNA-seq_574 GSM2545356 Mus musculus 8 weeks Male GSM2545357 CNS_RNA-seq_575 GSM2545357 Mus musculus 8 weeks Male @@ -336,13 +336,13 @@ GSM2545366 NonInfected C57BL/6 Day0 Spinalcord 10 GSM2545367 NonInfected C57BL/6 Day0 Spinalcord 11 ``` -```r +``` r vd <- VisualizeDesign(sampleData = meta_noninf_spc, designFormula = ~ 0 + sex) vd$designmatrix ``` -```{.output} +``` output sexFemale sexMale GSM2545356 0 1 GSM2545357 0 1 @@ -354,11 +354,11 @@ GSM2545366 1 0 GSM2545367 0 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -379,12 +379,12 @@ Set up the design formula to compare the three time points (Day0, Day4, Day8) in ### Solution -```r +``` r meta_male_spc <- meta %>% filter(sex == "Male" & tissue == "Spinalcord") meta_male_spc ``` -```{.output} +``` output title geo_accession organism age sex infection GSM2545355 CNS_RNA-seq_571 GSM2545355 Mus musculus 8 weeks Male InfluenzaA GSM2545356 CNS_RNA-seq_574 GSM2545356 Mus musculus 8 weeks Male NonInfected @@ -413,12 +413,12 @@ GSM2545378 C57BL/6 Day8 Spinalcord 23 GSM2545379 C57BL/6 Day8 Spinalcord 24 ``` -```r +``` r vd <- VisualizeDesign(sampleData = meta_male_spc, designFormula = ~ time) vd$designmatrix ``` -```{.output} +``` output (Intercept) timeDay4 timeDay8 GSM2545355 1 1 0 GSM2545356 1 0 0 @@ -434,11 +434,11 @@ GSM2545378 1 0 1 GSM2545379 1 0 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -454,12 +454,12 @@ Next, we again consider only non-infected mice, but fit a model incorporating bo We assume that the tissue differences are the same for both male and female mice, and consequently fit an additive model, without interaction terms. -```r +``` r meta_noninf <- meta %>% filter(time == "Day0") meta_noninf ``` -```{.output} +``` output title geo_accession organism age sex GSM2545337 CNS_RNA-seq_11C GSM2545337 Mus musculus 8 weeks Female GSM2545338 CNS_RNA-seq_12C GSM2545338 Mus musculus 8 weeks Female @@ -494,13 +494,13 @@ GSM2545366 NonInfected C57BL/6 Day0 Spinalcord 10 GSM2545367 NonInfected C57BL/6 Day0 Spinalcord 11 ``` -```r +``` r vd <- VisualizeDesign(sampleData = meta_noninf, designFormula = ~ sex + tissue) vd$designmatrix ``` -```{.output} +``` output (Intercept) sexMale tissueSpinalcord GSM2545337 1 0 0 GSM2545338 1 0 0 @@ -519,11 +519,11 @@ GSM2545366 1 0 1 GSM2545367 1 1 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -535,12 +535,12 @@ In the previous model, we assumed that the tissue differences were the same for To allow for the estimation of sex-specific tissue differences (at the expense of having one additional coefficient to estimate from the data), we can include an interaction term in the model. -```r +``` r meta_noninf <- meta %>% filter(time == "Day0") meta_noninf ``` -```{.output} +``` output title geo_accession organism age sex GSM2545337 CNS_RNA-seq_11C GSM2545337 Mus musculus 8 weeks Female GSM2545338 CNS_RNA-seq_12C GSM2545338 Mus musculus 8 weeks Female @@ -575,7 +575,7 @@ GSM2545366 NonInfected C57BL/6 Day0 Spinalcord 10 GSM2545367 NonInfected C57BL/6 Day0 Spinalcord 11 ``` -```r +``` r ## Define a design including an interaction term ## Note that ~ sex * tissue is equivalent to ## ~ sex + tissue + sex:tissue @@ -584,7 +584,7 @@ vd <- VisualizeDesign(sampleData = meta_noninf, vd$designmatrix ``` -```{.output} +``` output (Intercept) sexMale tissueSpinalcord sexMale:tissueSpinalcord GSM2545337 1 0 0 0 GSM2545338 1 0 0 0 @@ -603,11 +603,11 @@ GSM2545366 1 0 1 0 GSM2545367 1 1 1 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -619,13 +619,13 @@ Sometimes, for experiments with multiple factors, it is easier to interpret coef Let's consider the previous example again, using this approach: -```r +``` r meta_noninf <- meta %>% filter(time == "Day0") meta_noninf$sex_tissue <- paste0(meta_noninf$sex, "_", meta_noninf$tissue) meta_noninf ``` -```{.output} +``` output title geo_accession organism age sex GSM2545337 CNS_RNA-seq_11C GSM2545337 Mus musculus 8 weeks Female GSM2545338 CNS_RNA-seq_12C GSM2545338 Mus musculus 8 weeks Female @@ -660,13 +660,13 @@ GSM2545366 NonInfected C57BL/6 Day0 Spinalcord 10 Female_Spinalcord GSM2545367 NonInfected C57BL/6 Day0 Spinalcord 11 Male_Spinalcord ``` -```r +``` r vd <- VisualizeDesign(sampleData = meta_noninf, designFormula = ~ 0 + sex_tissue) vd$designmatrix ``` -```{.output} +``` output sex_tissueFemale_Cerebellum sex_tissueFemale_Spinalcord GSM2545337 1 0 GSM2545338 1 0 @@ -701,11 +701,11 @@ GSM2545366 0 0 GSM2545367 0 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -720,7 +720,7 @@ However, accounting for it can increase power to detect tissue differences by el Here, we define a paired design for the female non-infected mice, aimed at testing for differences between tissues after accounting for baseline differences between mice. -```r +``` r meta_fem_day0 <- meta %>% filter(sex == "Female" & time == "Day0") @@ -730,7 +730,7 @@ meta_fem_day0$mouse <- factor(meta_fem_day0$mouse) meta_fem_day0 ``` -```{.output} +``` output title geo_accession organism age sex GSM2545337 CNS_RNA-seq_11C GSM2545337 Mus musculus 8 weeks Female GSM2545338 CNS_RNA-seq_12C GSM2545338 Mus musculus 8 weeks Female @@ -751,13 +751,13 @@ GSM2545365 NonInfected C57BL/6 Day0 Spinalcord 9 GSM2545366 NonInfected C57BL/6 Day0 Spinalcord 10 ``` -```r +``` r vd <- VisualizeDesign(sampleData = meta_fem_day0, designFormula = ~ mouse + tissue) vd$designmatrix ``` -```{.output} +``` output (Intercept) mouse8 mouse9 mouse10 tissueSpinalcord GSM2545337 1 0 1 0 0 GSM2545338 1 0 0 1 0 @@ -769,11 +769,11 @@ GSM2545365 1 0 1 0 1 GSM2545366 1 0 0 1 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -788,7 +788,7 @@ One way to view this type of design is as two paired experiments, one for each i We can then easily compare the two tissues in each infection group, and contrast the tissue differences between the infection groups. -```r +``` r meta_fem_day04 <- meta %>% filter(sex == "Female" & time %in% c("Day0", "Day4")) %>% @@ -799,7 +799,7 @@ meta_fem_day04$mouse <- factor(meta_fem_day04$mouse) meta_fem_day04 ``` -```{.output} +``` output title geo_accession organism age sex GSM2545337 CNS_RNA-seq_11C GSM2545337 Mus musculus 8 weeks Female GSM2545338 CNS_RNA-seq_12C GSM2545338 Mus musculus 8 weeks Female @@ -836,7 +836,7 @@ GSM2545376 InfluenzaA C57BL/6 Day4 Spinalcord 21 GSM2545377 InfluenzaA C57BL/6 Day4 Spinalcord 22 ``` -```r +``` r design <- model.matrix(~ mouse, data = meta_fem_day04) design <- cbind(design, Spc.Day0 = meta_fem_day04$tissue == "Spinalcord" & @@ -847,7 +847,7 @@ rownames(design) <- rownames(meta_fem_day04) design ``` -```{.output} +``` output (Intercept) mouse8 mouse9 mouse10 mouse15 mouse20 mouse21 mouse22 GSM2545337 1 0 1 0 0 0 0 0 GSM2545338 1 0 0 1 0 0 0 0 @@ -884,7 +884,7 @@ GSM2545376 0 1 GSM2545377 0 1 ``` -```r +``` r vd <- VisualizeDesign(sampleData = meta_fem_day04 %>% select(time, tissue, mouse), designFormula = NULL, @@ -892,7 +892,7 @@ vd <- VisualizeDesign(sampleData = meta_fem_day04 %>% vd$designmatrix ``` -```{.output} +``` output (Intercept) mouse8 mouse9 mouse10 mouse15 mouse20 mouse21 mouse22 GSM2545337 1 0 1 0 0 0 0 0 GSM2545338 1 0 0 1 0 0 0 0 @@ -929,17 +929,17 @@ GSM2545376 0 1 GSM2545377 0 1 ``` -```r +``` r vd$plotlist ``` -```{.output} +``` output $`time = Day0` ``` -```{.output} +``` output $`time = Day4` ``` @@ -952,45 +952,45 @@ Now that we have learnt more about interpreting design matrices, let's look back We will repeat the main lines of code here. -```r +``` r se <- readRDS("data/GSE96870_se.rds") se <- se[rowSums(assay(se, "counts")) > 5, ] dds <- DESeq2::DESeqDataSet(se, design = ~ sex + time) dds <- DESeq(dds) ``` -```{.output} +``` output estimating size factors ``` -```{.output} +``` output estimating dispersions ``` -```{.output} +``` output gene-wise dispersion estimates ``` -```{.output} +``` output mean-dispersion relationship ``` -```{.output} +``` output final dispersion estimates ``` -```{.output} +``` output fitting model and testing ``` `DESeq2` stores the design matrix in the object: -```r +``` r attr(dds, "modelMatrix") ``` -```{.output} +``` output Intercept sex_Male_vs_Female time_Day4_vs_Day0 time_Day8_vs_Day0 Female_Day0_9 1 0 0 0 Female_Day0_10 1 0 0 0 @@ -1027,11 +1027,11 @@ attr(,"contrasts")$time The column names can be obtained via the `resultsNames` function: -```r +``` r resultsNames(dds) ``` -```{.output} +``` output [1] "Intercept" "sex_Male_vs_Female" "time_Day4_vs_Day0" [4] "time_Day8_vs_Day0" ``` @@ -1039,14 +1039,14 @@ resultsNames(dds) Let's visualize this design: -```r +``` r vd <- VisualizeDesign(sampleData = colData(dds)[, c("sex", "time")], designMatrix = attr(dds, "modelMatrix"), flipCoordFitted = TRUE) vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -1055,7 +1055,7 @@ vd$plotlist In the previous episode, we performed a test comparing Day8 samples to Day0 samples: -```r +``` r resTime <- results(dds, contrast = c("time", "Day8", "Day0")) ``` @@ -1063,18 +1063,18 @@ From the figure above, we see that this comparison is represented by the `time_D Thus, an alternative way of specifying the contrast for the test would be: -```r +``` r resTimeNum <- results(dds, contrast = c(0, 0, 0, 1)) ``` Let's check if the results are comparable: -```r +``` r summary(resTime) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -1087,11 +1087,11 @@ low counts [2] : 3723, 14% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r summary(resTimeNum) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -1104,7 +1104,7 @@ low counts [2] : 3723, 14% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r ## logFC plot(resTime$log2FoldChange, resTimeNum$log2FoldChange) abline(0, 1) @@ -1112,7 +1112,7 @@ abline(0, 1) -```r +``` r ## -log10(p-value) plot(-log10(resTime$pvalue), -log10(resTimeNum$pvalue)) abline(0, 1) @@ -1127,42 +1127,42 @@ We still consider the sex and time predictors, but now we allow an interaction b In other words, we allow the time effect to be different for males and females. -```r +``` r se <- readRDS("data/GSE96870_se.rds") se <- se[rowSums(assay(se, "counts")) > 5, ] dds <- DESeq2::DESeqDataSet(se, design = ~ sex * time) dds <- DESeq(dds) ``` -```{.output} +``` output estimating size factors ``` -```{.output} +``` output estimating dispersions ``` -```{.output} +``` output gene-wise dispersion estimates ``` -```{.output} +``` output mean-dispersion relationship ``` -```{.output} +``` output final dispersion estimates ``` -```{.output} +``` output fitting model and testing ``` -```r +``` r attr(dds, "modelMatrix") ``` -```{.output} +``` output Intercept sex_Male_vs_Female time_Day4_vs_Day0 time_Day8_vs_Day0 Female_Day0_9 1 0 0 0 Female_Day0_10 1 0 0 0 @@ -1222,14 +1222,14 @@ attr(,"contrasts")$time Let's visualize this design: -```r +``` r vd <- VisualizeDesign(sampleData = colData(dds)[, c("sex", "time")], designMatrix = attr(dds, "modelMatrix"), flipCoordFitted = TRUE) vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -1239,7 +1239,7 @@ Note that now, the `time_Day8_vs_Day0` coefficient represents the difference bet To get the corresponding difference for the male samples, we need to also add the interaction effect (`sexMale.timeDay8`). -```r +``` r ## Day8 vs Day0, female resTimeFemale <- results(dds, contrast = c("time", "Day8", "Day0")) @@ -1250,7 +1250,7 @@ resTimeInt <- results(dds, name = "sexMale.timeDay8") Let's try to fit this model with the second approach mentioned above, namely to create a single factor. -```r +``` r se <- readRDS("data/GSE96870_se.rds") se <- se[rowSums(assay(se, "counts")) > 5, ] se$sex_time <- paste0(se$sex, "_", se$time) @@ -1258,35 +1258,35 @@ dds <- DESeq2::DESeqDataSet(se, design = ~ 0 + sex_time) dds <- DESeq(dds) ``` -```{.output} +``` output estimating size factors ``` -```{.output} +``` output estimating dispersions ``` -```{.output} +``` output gene-wise dispersion estimates ``` -```{.output} +``` output mean-dispersion relationship ``` -```{.output} +``` output final dispersion estimates ``` -```{.output} +``` output fitting model and testing ``` -```r +``` r attr(dds, "modelMatrix") ``` -```{.output} +``` output sex_timeFemale_Day0 sex_timeFemale_Day4 sex_timeFemale_Day8 Female_Day0_9 1 0 0 Female_Day0_10 1 0 0 @@ -1343,14 +1343,14 @@ attr(,"contrasts")$sex_time We again visualize this design: -```r +``` r vd <- VisualizeDesign(sampleData = colData(dds)[, c("sex", "time")], designMatrix = attr(dds, "modelMatrix"), flipCoordFitted = TRUE) vd$plotlist ``` -```{.output} +``` output [[1]] ``` @@ -1359,7 +1359,7 @@ vd$plotlist We then set up the same contrasts as above -```r +``` r ## Day8 vs Day0, female resTimeFemaleSingle <- results(dds, contrast = c("sex_time", "Female_Day8", "Female_Day0")) @@ -1367,23 +1367,23 @@ resTimeFemaleSingle <- results(dds, contrast = c("sex_time", "Female_Day8", "Fem resultsNames(dds) ``` -```{.output} +``` output [1] "sex_timeFemale_Day0" "sex_timeFemale_Day4" "sex_timeFemale_Day8" [4] "sex_timeMale_Day0" "sex_timeMale_Day4" "sex_timeMale_Day8" ``` -```r +``` r resTimeIntSingle <- results(dds, contrast = c(1, 0, -1, -1, 0, 1)) ``` Check that these results agree with the ones obtained by fitting the model with the two factors and the interaction term. -```r +``` r summary(resTimeFemale) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -1396,11 +1396,11 @@ low counts [2] : 6382, 23% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r summary(resTimeFemaleSingle) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -1413,18 +1413,18 @@ low counts [2] : 6382, 23% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r plot(-log10(resTimeFemale$pvalue), -log10(resTimeFemaleSingle$pvalue)) abline(0, 1) ``` -```r +``` r summary(resTimeInt) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -1437,11 +1437,11 @@ low counts [2] : 0, 0% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r summary(resTimeIntSingle) ``` -```{.output} +``` output out of 27430 with nonzero total read count adjusted p-value < 0.1 @@ -1454,7 +1454,7 @@ low counts [2] : 0, 0% [2] see 'independentFiltering' argument of ?results ``` -```r +``` r plot(-log10(resTimeInt$pvalue), -log10(resTimeIntSingle$pvalue)) abline(0, 1) ``` diff --git a/07-gene-set-analysis.md b/07-gene-set-analysis.md index 9d4a2d92..56a4287f 100644 --- a/07-gene-set-analysis.md +++ b/07-gene-set-analysis.md @@ -70,7 +70,7 @@ Following is a list of packages that will be used in this episode: -```r +``` r library(SummarizedExperiment) library(DESeq2) library(gplots) @@ -99,7 +99,7 @@ In the following code, there are also comments that explain every step of the analysis. -```r +``` r library(SummarizedExperiment) library(DESeq2) @@ -125,19 +125,19 @@ Let's check the number of DE genes and how they look like. It seems the number of DE genes is very small, but it is OK for this example. -```r +``` r length(sexDEgenes) ``` -```{.output} +``` output [1] 54 ``` -```r +``` r head(sexDEgenes) ``` -```{.output} +``` output [1] "Lgr6" "Myoc" "Fibcd1" "Kcna4" "Ctxn2" "S100a9" ``` @@ -152,22 +152,22 @@ format and we need to explicitly convert it to a normal vector by `as.vector()`. -```r +``` r geneGR <- rowRanges(se) totalGenes <- rownames(se) XYGeneSet <- totalGenes[as.vector(seqnames(geneGR)) %in% c("X", "Y")] head(XYGeneSet) ``` -```{.output} +``` output [1] "Gm21950" "Gm14346" "Gm14345" "Gm14351" "Spin2-ps1" "Gm3701" ``` -```r +``` r length(XYGeneSet) ``` -```{.output} +``` output [1] 1134 ``` @@ -189,7 +189,7 @@ Since the DE genes and the gene set can be mathematically thought of as two sets a natural way is to first visualize them with a Venn diagram. -```r +``` r library(gplots) plot(venn(list("sexDEgenes" = sexDEgenes, "XY gene set" = XYGeneSet))) @@ -230,7 +230,7 @@ These numbers can be obtained as in the following code^[Genes must be unique in R variable names. -```r +``` r n <- nrow(se) n_01 <- length(XYGeneSet) n_10 <- length(sexDEgenes) @@ -240,7 +240,7 @@ n_11 <- length(intersect(sexDEgenes, XYGeneSet)) Other values can be obtained by: -```r +``` r n_12 <- n_10 - n_11 n_21 <- n_01 - n_11 n_20 <- n - n_10 @@ -251,12 +251,12 @@ n_22 <- n_02 - n_12 All the values are: -```r +``` r matrix(c(n_11, n_12, n_10, n_21, n_22, n_20, n_01, n_02, n), nrow = 3, byrow = TRUE) ``` -```{.output} +``` output [,1] [,2] [,3] [1,] 13 41 54 [2,] 1121 20023 21144 @@ -283,12 +283,12 @@ the test. The input is the top-left 2x2 sub-matrix. We specify `alternative = over-representation. -```r +``` r fisher.test(matrix(c(n_11, n_12, n_21, n_22), nrow = 2, byrow = TRUE), alternative = "greater") ``` -```{.output} +``` output Fisher's Exact Test for Count Data @@ -309,13 +309,13 @@ Results of the Fisher's Exact test can be saved into an object `t`, which is a simple list, and the _p_-value can be obtained by `t$p.value`. -```r +``` r t <- fisher.test(matrix(c(n_11, n_12, n_21, n_22), nrow = 2, byrow = TRUE), alternative = "greater") t$p.value ``` -```{.output} +``` output [1] 3.9059e-06 ``` @@ -350,12 +350,12 @@ and put whether genes are DE on columns. And the corresponding `fisher.test()` is: -```r +``` r fisher.test(matrix(c(13, 1121, 41, 20023), nrow = 2, byrow = TRUE), alternative = "greater") ``` -```{.output} +``` output Fisher's Exact Test for Count Data @@ -451,11 +451,11 @@ Then, the correct use of `phyper()` is: Let's plugin our variables: -```r +``` r 1 - phyper(n_11 - 1, n_10, n_20, n_01) ``` -```{.output} +``` output [1] 3.9059e-06 ``` @@ -463,22 +463,22 @@ Optionally, `lower.tail` argument can be specified which directly calculates _p_-values from the upper tail of the distribution. -```r +``` r phyper(n_11 - 1, n_10, n_20, n_01, lower.tail = FALSE) ``` -```{.output} +``` output [1] 3.9059e-06 ``` If we switch `n_01` and `n_10`, the _p_-values are identical: -```r +``` r 1 - phyper(n_11 - 1, n_01, n_02, n_10) ``` -```{.output} +``` output [1] 3.9059e-06 ``` @@ -489,7 +489,7 @@ Let's test the runtime of the two functions: -```r +``` r library(microbenchmark) microbenchmark( fisher = fisher.test(matrix(c(n_11, n_12, n_21, n_22), nrow = 2, byrow = TRUE), @@ -498,11 +498,11 @@ microbenchmark( ) ``` -```{.output} +``` output Unit: microseconds - expr min lq mean median uq max neval - fisher 238.955 243.980 270.67608 248.163 259.2635 1920.491 100 - hyper 1.333 1.488 2.28346 2.550 2.8505 6.953 100 + expr min lq mean median uq max neval + fisher 247.441 253.1220 287.03204 261.883 275.659 2133.728 100 + hyper 1.453 1.6835 2.51601 1.999 3.061 9.678 100 ``` It is very astonishing that `phyper()` is hundreds of times faster than @@ -573,7 +573,7 @@ list of vectors. In the following example, there are three gene sets with 3, 5 and 2 genes. Some genes exist in multiple gene sets. -```r +``` r lt <- list(gene_set_1 = c("gene_1", "gene_2", "gene_3"), gene_set_2 = c("gene_1", "gene_3", "gene_4", "gene_5", "gene_6"), gene_set_3 = c("gene_4", "gene_7") @@ -581,7 +581,7 @@ lt <- list(gene_set_1 = c("gene_1", "gene_2", "gene_3"), lt ``` -```{.output} +``` output $gene_set_1 [1] "gene_1" "gene_2" "gene_3" @@ -598,12 +598,12 @@ i.e. which column locates as the first column, are quite arbitrary. Different tools may require differently. -```r +``` r data.frame(gene_set = rep(names(lt), times = sapply(lt, length)), gene = unname(unlist(lt))) ``` -```{.output} +``` output gene_set gene 1 gene_set_1 gene_1 2 gene_set_1 gene_2 @@ -621,7 +621,7 @@ Or genes be in the first column: -```{.output} +``` output gene gene_set 1 gene_1 gene_set_1 2 gene_2 gene_set_1 @@ -658,7 +658,7 @@ list to a two-column data frame? ::::::::::::::::::::::::::::::::::: solution -```r +``` r lt <- list(gene_set_1 = c("gene_1", "gene_2", "gene_3"), gene_set_2 = c("gene_1", "gene_3", "gene_4", "gene_5", "gene_6"), gene_set_3 = c("gene_4", "gene_7") @@ -669,13 +669,13 @@ To convert `lt` to a data frame (e.g. let's put gene sets in the first column): -```r +``` r df = data.frame(gene_set = rep(names(lt), times = sapply(lt, length)), gene = unname(unlist(lt))) df ``` -```{.output} +``` output gene_set gene 1 gene_set_1 gene_1 2 gene_set_1 gene_2 @@ -692,11 +692,11 @@ df To convert `df` back to the list: -```r +``` r split(df$gene, df$gene_set) ``` -```{.output} +``` output $gene_set_1 [1] "gene_1" "gene_2" "gene_3" @@ -764,12 +764,12 @@ All the columns (the key column or the source column) can be obtained by `keytypes()`: -```r +``` r library(org.Hs.eg.db) keytypes(org.Hs.eg.db) ``` -```{.output} +``` output [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME" [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP" @@ -784,13 +784,13 @@ valid "key column", thus we can query "_select all GO IDs where the correspondin ONTOLOGY is BP_", which is translated into the following code: -```r +``` r BP_Id = mapIds(org.Hs.eg.db, keys = "BP", keytype = "ONTOLOGY", column = "GO", multiVals = "list")[[1]] head(BP_Id) ``` -```{.output} +``` output [1] "GO:0008150" "GO:0001553" "GO:0001869" "GO:0002438" "GO:0006953" [6] "GO:0007584" ``` @@ -804,7 +804,7 @@ Next we do mapping from GO IDs to gene Entrez IDs. Now the query becomes "_provi a vector of GO IDs, select ENTREZIDs which correspond to every one of them_". -```r +``` r BPGeneSets = mapIds(org.Hs.eg.db, keys = BP_Id, keytype = "GOALL", column = "ENTREZID", multiVals = "list") ``` @@ -823,12 +823,12 @@ annotations which is not complete, and mapping between `"GOALL"` and We filter out GO gene sets with no gene annotated. -```r +``` r BPGeneSets = BPGeneSets[sapply(BPGeneSets, length) > 0] BPGeneSets[2:3] # BPGeneSets[[1]] is too long ``` -```{.output} +``` output $`GO:0001553` [1] "2" "2516" "2661" "2661" "3624" "4313" "5156" "5798" "6777" [10] "8322" "8879" "56729" "59338" @@ -897,12 +897,12 @@ remote web server. -```r +``` r keggGeneSets = read.table(url("https://rest.kegg.jp/link/pathway/hsa"), sep = "\t") head(keggGeneSets) ``` -```{.output} +``` output V1 V2 1 hsa:10327 path:hsa00010 2 hsa:124 path:hsa00010 @@ -917,13 +917,13 @@ Let's remove the `"hsa:"` prefix, also we remove the `"path:"` prefix for pathway IDs in the second column. -```r +``` r keggGeneSets[, 1] = gsub("hsa:", "", keggGeneSets[, 1]) keggGeneSets[, 2] = gsub("path:", "", keggGeneSets[, 2]) head(keggGeneSets) ``` -```{.output} +``` output V1 V2 1 10327 hsa00010 2 124 hsa00010 @@ -936,12 +936,12 @@ head(keggGeneSets) The full pathway names can be obtained via the "list" command. -```r +``` r keggNames = read.table(url("https://rest.kegg.jp/list/pathway/hsa"), sep = "\t") head(keggNames) ``` -```{.output} +``` output V1 V2 1 hsa01100 Metabolic pathways - Homo sapiens (human) 2 hsa01200 Carbon metabolism - Homo sapiens (human) @@ -997,12 +997,12 @@ Let's check which organisms are supported and which gene sets collections it pro -```r +``` r library(msigdbr) msigdbr_species() ``` -```{.output} +``` output # A tibble: 20 × 2 species_name species_common_name @@ -1028,11 +1028,11 @@ msigdbr_species() 20 Xenopus tropicalis tropical clawed frog, western clawed frog ``` -```r +``` r msigdbr_collections() ``` -```{.output} +``` output # A tibble: 23 × 3 gs_cat gs_subcat num_genesets @@ -1078,12 +1078,12 @@ msigdbr(species, category, subcategory) For example, we want to obtain the hallmark gene sets for mouse. -```r +``` r MSigDBGeneSets = msigdbr(species = "mouse", category = "H") head(MSigDBGeneSets) ``` -```{.output} +``` output # A tibble: 6 × 18 gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene @@ -1106,11 +1106,11 @@ table. So users can easily pick one with the same gene ID type as in the DE gene list. For example: -```r +``` r MSigDBGeneSets[, c("gs_name", "ensembl_gene")] ``` -```{.output} +``` output # A tibble: 7,384 × 2 gs_name ensembl_gene @@ -1127,12 +1127,12 @@ MSigDBGeneSets[, c("gs_name", "ensembl_gene")] # ℹ 7,374 more rows ``` -```r +``` r # or put genes in the first column MSigDBGeneSets[, c("ensembl_gene", "gs_name")] ``` -```{.output} +``` output # A tibble: 7,384 × 2 ensembl_gene gs_name @@ -1180,7 +1180,7 @@ recommended when the number of DE genes is too large. The filtering on log2 fold change can be thought as a filtering from the biology aspect. -```r +``` r resTime <- DESeq2::results(dds, contrast = c("time", "Day8", "Day0")) timeDE <- as.data.frame(subset(resTime, padj < 0.05 & abs(log2FoldChange) > log2(1.5) @@ -1189,16 +1189,16 @@ timeDEgenes <- rownames(timeDE) head(timeDEgenes) ``` -```{.output} +``` output [1] "3110035E14Rik" "Sgk3" "Kcnb2" "Sbspon" [5] "Gsta3" "Lman2l" ``` -```r +``` r length(timeDEgenes) ``` -```{.output} +``` output [1] 1134 ``` @@ -1217,7 +1217,7 @@ GO gene sets are automatically retrieved and processed from `org.Mm.eg.db` in `enrichGO()`. -```r +``` r library(clusterProfiler) library(org.Mm.eg.db) resTimeGO = enrichGO(gene = timeDEgenes, @@ -1225,15 +1225,15 @@ resTimeGO = enrichGO(gene = timeDEgenes, OrgDb = org.Mm.eg.db) ``` -```{.output} +``` output --> No gene can be mapped.... ``` -```{.output} ---> Expected input gene ID: 23920,12589,22068,330470,21808,74716 +``` output +--> Expected input gene ID: 22702,16578,19113,192287,14211,210529 ``` -```{.output} +``` output --> return NULL... ``` @@ -1252,7 +1252,7 @@ tell the function that DE genes are in gene symbols. Recall that all valid value for `keyType` are in `keytypes(org.Mm.eg.db)`. -```r +``` r resTimeGO = enrichGO(gene = timeDEgenes, keyType = "SYMBOL", ont = "BP", @@ -1261,35 +1261,35 @@ resTimeGOTable = as.data.frame(resTimeGO) head(resTimeGOTable) ``` -```{.output} - ID Description GeneRatio BgRatio -GO:0050900 GO:0050900 leukocyte migration 49/969 386/28564 -GO:0030595 GO:0030595 leukocyte chemotaxis 35/969 230/28564 -GO:0071674 GO:0071674 mononuclear cell migration 30/969 185/28564 -GO:0060326 GO:0060326 cell chemotaxis 39/969 308/28564 -GO:0035456 GO:0035456 response to interferon-beta 17/969 71/28564 -GO:0097529 GO:0097529 myeloid leukocyte migration 31/969 242/28564 - pvalue p.adjust qvalue -GO:0050900 2.163956e-15 1.096260e-11 8.402981e-12 -GO:0030595 1.066463e-13 2.701351e-10 2.070622e-10 -GO:0071674 1.143223e-12 1.930522e-09 1.479771e-09 -GO:0060326 1.728979e-12 2.189752e-09 1.678475e-09 -GO:0035456 1.681085e-10 1.703276e-07 1.305584e-07 -GO:0097529 2.447325e-10 2.066358e-07 1.583892e-07 - geneID -GO:0050900 Tnfsf18/Sell/Slamf9/Fut7/Itga4/Mdk/Grem1/Ada/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Enpp1/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Aoc3/Itgb3/Ccl28/Lgals3/Ptk2b/Emp2/Apod/Retnlg/Plg/Dusp1/Ager/Il33/Ch25h -GO:0030595 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Retnlg/Dusp1/Ch25h -GO:0071674 Tnfsf18/Slamf9/Fut7/Itga4/Mdk/Grem1/Il12a/Nbl1/Padi2/Alox5/Trpm4/Hsd3b7/Adam8/Calr/Ccl17/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Lgals3/Ptk2b/Apod/Retnlg/Plg/Dusp1/Ager/Ch25h -GO:0060326 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Ccl28/Lgals3/Ptk2b/Nr4a1/Retnlg/Dusp1/Ch25h/Plxnb3 -GO:0035456 Aim2/Ifi204/Gbp6/Oas1c/Ifitm6/Bst2/Irgm1/Tgtp1/Tgtp2/Ifi47/Igtp/Irgm2/Ifitm7/Gm4951/F830016B08Rik/Iigp1/Ifit1 -GO:0097529 Tnfsf18/Sell/Fut7/Mdk/Grem1/Prex1/Edn3/P2ry12/S100a8/S100a9/Nbl1/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Itgam/Adam8/Ccl17/Enpp1/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Emp2/Retnlg/Dusp1/Ager +``` output + ID Description GeneRatio BgRatio RichFactor +GO:0050900 GO:0050900 leukocyte migration 51/968 402/28905 0.1268657 +GO:0006935 GO:0006935 chemotaxis 52/968 465/28905 0.1118280 +GO:0042330 GO:0042330 taxis 52/968 467/28905 0.1113490 +GO:0030595 GO:0030595 leukocyte chemotaxis 36/968 242/28905 0.1487603 +GO:0071674 GO:0071674 mononuclear cell migration 32/968 203/28905 0.1576355 +GO:0060326 GO:0060326 cell chemotaxis 41/968 334/28905 0.1227545 + FoldEnrichment zScore pvalue p.adjust qvalue +GO:0050900 3.788277 10.479256 3.536123e-16 1.795643e-12 1.304643e-12 +GO:0006935 3.339243 9.465940 3.336947e-14 6.712316e-11 4.876903e-11 +GO:0042330 3.324942 9.428613 3.965528e-14 6.712316e-11 4.876903e-11 +GO:0030595 4.442063 10.009042 6.591196e-14 8.367524e-11 6.079511e-11 +GO:0071674 4.707080 9.866216 3.208410e-13 3.258461e-10 2.367469e-10 +GO:0060326 3.665515 9.120495 8.641191e-13 7.313328e-10 5.313575e-10 + geneID +GO:0050900 Tnfsf18/Sell/Slamf9/Fut7/Itga4/Mdk/Grem1/Ada/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Ascl2/Calr/Ccl17/Enpp1/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Aoc3/Itgb3/Ccl28/Lgals3/Ptk2b/Emp2/Apod/Retnlg/Plg/Fpr2/Dusp1/Ager/Il33/Ch25h +GO:0006935 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Ntf3/Trpm4/Hsd3b7/Itgam/Adam8/Lsp1/Calr/Ccl17/Robo3/Cmtm7/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Tubb2b/Ccl28/Lgals3/Cmtm5/Ptk2b/Nr4a1/Casr/Retnlg/Fpr2/Dusp1/Ager/Stx3/Ch25h/Plxnb3/Nox1 +GO:0042330 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Ntf3/Trpm4/Hsd3b7/Itgam/Adam8/Lsp1/Calr/Ccl17/Robo3/Cmtm7/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Tubb2b/Ccl28/Lgals3/Cmtm5/Ptk2b/Nr4a1/Casr/Retnlg/Fpr2/Dusp1/Ager/Stx3/Ch25h/Plxnb3/Nox1 +GO:0030595 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Retnlg/Fpr2/Dusp1/Ch25h +GO:0071674 Tnfsf18/Slamf9/Fut7/Itga4/Mdk/Grem1/Il12a/Nbl1/Padi2/Alox5/Trpm4/Hsd3b7/Adam8/Ascl2/Calr/Ccl17/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Lgals3/Ptk2b/Apod/Retnlg/Plg/Fpr2/Dusp1/Ager/Ch25h +GO:0060326 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Ccl28/Lgals3/Ptk2b/Nr4a1/Retnlg/Fpr2/Dusp1/Ch25h/Plxnb3/Nox1 Count -GO:0050900 49 -GO:0030595 35 -GO:0071674 30 -GO:0060326 39 -GO:0035456 17 -GO:0097529 31 +GO:0050900 51 +GO:0006935 52 +GO:0042330 52 +GO:0030595 36 +GO:0071674 32 +GO:0060326 41 ``` Now `enrichGO()` went through! The returned object `resTimeGO` is in a special @@ -1344,7 +1344,7 @@ is suggested to set both `pvalueCutoff` and `qvalueCutoff` to 1 in `enrichGO()`. -```r +``` r resTimeGO = enrichGO(gene = timeDEgenes, keyType = "SYMBOL", ont = "BP", @@ -1355,35 +1355,35 @@ resTimeGOTable = as.data.frame(resTimeGO) head(resTimeGOTable) ``` -```{.output} - ID Description GeneRatio BgRatio -GO:0050900 GO:0050900 leukocyte migration 49/969 386/28564 -GO:0030595 GO:0030595 leukocyte chemotaxis 35/969 230/28564 -GO:0071674 GO:0071674 mononuclear cell migration 30/969 185/28564 -GO:0060326 GO:0060326 cell chemotaxis 39/969 308/28564 -GO:0035456 GO:0035456 response to interferon-beta 17/969 71/28564 -GO:0097529 GO:0097529 myeloid leukocyte migration 31/969 242/28564 - pvalue p.adjust qvalue -GO:0050900 2.163956e-15 1.096260e-11 8.402981e-12 -GO:0030595 1.066463e-13 2.701351e-10 2.070622e-10 -GO:0071674 1.143223e-12 1.930522e-09 1.479771e-09 -GO:0060326 1.728979e-12 2.189752e-09 1.678475e-09 -GO:0035456 1.681085e-10 1.703276e-07 1.305584e-07 -GO:0097529 2.447325e-10 2.066358e-07 1.583892e-07 - geneID -GO:0050900 Tnfsf18/Sell/Slamf9/Fut7/Itga4/Mdk/Grem1/Ada/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Enpp1/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Aoc3/Itgb3/Ccl28/Lgals3/Ptk2b/Emp2/Apod/Retnlg/Plg/Dusp1/Ager/Il33/Ch25h -GO:0030595 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Retnlg/Dusp1/Ch25h -GO:0071674 Tnfsf18/Slamf9/Fut7/Itga4/Mdk/Grem1/Il12a/Nbl1/Padi2/Alox5/Trpm4/Hsd3b7/Adam8/Calr/Ccl17/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Lgals3/Ptk2b/Apod/Retnlg/Plg/Dusp1/Ager/Ch25h -GO:0060326 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Ccl28/Lgals3/Ptk2b/Nr4a1/Retnlg/Dusp1/Ch25h/Plxnb3 -GO:0035456 Aim2/Ifi204/Gbp6/Oas1c/Ifitm6/Bst2/Irgm1/Tgtp1/Tgtp2/Ifi47/Igtp/Irgm2/Ifitm7/Gm4951/F830016B08Rik/Iigp1/Ifit1 -GO:0097529 Tnfsf18/Sell/Fut7/Mdk/Grem1/Prex1/Edn3/P2ry12/S100a8/S100a9/Nbl1/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Itgam/Adam8/Ccl17/Enpp1/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Emp2/Retnlg/Dusp1/Ager +``` output + ID Description GeneRatio BgRatio RichFactor +GO:0050900 GO:0050900 leukocyte migration 51/968 402/28905 0.1268657 +GO:0006935 GO:0006935 chemotaxis 52/968 465/28905 0.1118280 +GO:0042330 GO:0042330 taxis 52/968 467/28905 0.1113490 +GO:0030595 GO:0030595 leukocyte chemotaxis 36/968 242/28905 0.1487603 +GO:0071674 GO:0071674 mononuclear cell migration 32/968 203/28905 0.1576355 +GO:0060326 GO:0060326 cell chemotaxis 41/968 334/28905 0.1227545 + FoldEnrichment zScore pvalue p.adjust qvalue +GO:0050900 3.788277 10.479256 3.536123e-16 1.795643e-12 1.304643e-12 +GO:0006935 3.339243 9.465940 3.336947e-14 6.712316e-11 4.876903e-11 +GO:0042330 3.324942 9.428613 3.965528e-14 6.712316e-11 4.876903e-11 +GO:0030595 4.442063 10.009042 6.591196e-14 8.367524e-11 6.079511e-11 +GO:0071674 4.707080 9.866216 3.208410e-13 3.258461e-10 2.367469e-10 +GO:0060326 3.665515 9.120495 8.641191e-13 7.313328e-10 5.313575e-10 + geneID +GO:0050900 Tnfsf18/Sell/Slamf9/Fut7/Itga4/Mdk/Grem1/Ada/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Ascl2/Calr/Ccl17/Enpp1/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Aoc3/Itgb3/Ccl28/Lgals3/Ptk2b/Emp2/Apod/Retnlg/Plg/Fpr2/Dusp1/Ager/Il33/Ch25h +GO:0006935 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Ntf3/Trpm4/Hsd3b7/Itgam/Adam8/Lsp1/Calr/Ccl17/Robo3/Cmtm7/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Tubb2b/Ccl28/Lgals3/Cmtm5/Ptk2b/Nr4a1/Casr/Retnlg/Fpr2/Dusp1/Ager/Stx3/Ch25h/Plxnb3/Nox1 +GO:0042330 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Ntf3/Trpm4/Hsd3b7/Itgam/Adam8/Lsp1/Calr/Ccl17/Robo3/Cmtm7/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Tubb2b/Ccl28/Lgals3/Cmtm5/Ptk2b/Nr4a1/Casr/Retnlg/Fpr2/Dusp1/Ager/Stx3/Ch25h/Plxnb3/Nox1 +GO:0030595 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Retnlg/Fpr2/Dusp1/Ch25h +GO:0071674 Tnfsf18/Slamf9/Fut7/Itga4/Mdk/Grem1/Il12a/Nbl1/Padi2/Alox5/Trpm4/Hsd3b7/Adam8/Ascl2/Calr/Ccl17/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Lgals3/Ptk2b/Apod/Retnlg/Plg/Fpr2/Dusp1/Ager/Ch25h +GO:0060326 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Ccl28/Lgals3/Ptk2b/Nr4a1/Retnlg/Fpr2/Dusp1/Ch25h/Plxnb3/Nox1 Count -GO:0050900 49 -GO:0030595 35 -GO:0071674 30 -GO:0060326 39 -GO:0035456 17 -GO:0097529 31 +GO:0050900 51 +GO:0006935 52 +GO:0042330 52 +GO:0030595 36 +GO:0071674 32 +GO:0060326 41 ``` @@ -1421,21 +1421,21 @@ use `select()` function: `select(org.Mm.eg.db, keys = timeDEgenes, keytype = "SYMBOL", column = "ENTREZID")`]. -```r +``` r EntrezIDs = mapIds(org.Mm.eg.db, keys = timeDEgenes, keytype = "SYMBOL", column = "ENTREZID", multiVals = "first") ``` -```{.output} +``` output 'select()' returned 1:1 mapping between keys and columns ``` -```r +``` r EntrezIDs = EntrezIDs[!is.na(EntrezIDs)] head(EntrezIDs) ``` -```{.output} +``` output Sgk3 Kcnb2 Sbspon Gsta3 Lman2l Ankrd39 "170755" "98741" "226866" "14859" "214895" "109346" ``` @@ -1446,7 +1446,7 @@ data frame. -```r +``` r resTimeKEGG = enrichKEGG(gene = EntrezIDs, organism = "mmu", pvalueCutoff = 1, @@ -1455,40 +1455,54 @@ resTimeKEGGTable = as.data.frame(resTimeKEGG) head(resTimeKEGGTable) ``` -```{.output} - ID -mmu00590 mmu00590 -mmu00565 mmu00565 -mmu00592 mmu00592 -mmu00591 mmu00591 -mmu04913 mmu04913 -mmu04061 mmu04061 +``` output + category +mmu00590 Metabolism +mmu00591 Metabolism +mmu00565 Metabolism +mmu00592 Metabolism +mmu04913 Organismal Systems +mmu04061 Environmental Information Processing + subcategory ID +mmu00590 Lipid metabolism mmu00590 +mmu00591 Lipid metabolism mmu00591 +mmu00565 Lipid metabolism mmu00565 +mmu00592 Lipid metabolism mmu00592 +mmu04913 Endocrine system mmu04913 +mmu04061 Signaling molecules and interaction mmu04061 Description mmu00590 Arachidonic acid metabolism - Mus musculus (house mouse) +mmu00591 Linoleic acid metabolism - Mus musculus (house mouse) mmu00565 Ether lipid metabolism - Mus musculus (house mouse) mmu00592 alpha-Linolenic acid metabolism - Mus musculus (house mouse) -mmu00591 Linoleic acid metabolism - Mus musculus (house mouse) mmu04913 Ovarian steroidogenesis - Mus musculus (house mouse) mmu04061 Viral protein interaction with cytokine and cytokine receptor - Mus musculus (house mouse) - GeneRatio BgRatio pvalue p.adjust qvalue -mmu00590 16/454 85/9465 2.224011e-06 0.0006827712 0.0005946302 -mmu00565 11/454 48/9465 1.234403e-05 0.0014156451 0.0012328951 -mmu00592 8/454 25/9465 1.383367e-05 0.0014156451 0.0012328951 -mmu00591 11/454 50/9465 1.870748e-05 0.0014357990 0.0012504473 -mmu04913 12/454 63/9465 3.669398e-05 0.0022530101 0.0019621620 -mmu04061 14/454 95/9465 1.606325e-04 0.0072719317 0.0063331756 + GeneRatio BgRatio RichFactor FoldEnrichment zScore pvalue +mmu00590 16/460 89/9787 0.1797753 3.824915 5.945245 3.232256e-06 +mmu00591 12/460 55/9787 0.2181818 4.642055 6.015024 6.976725e-06 +mmu00565 11/460 48/9787 0.2291667 4.875770 5.977670 1.021306e-05 +mmu00592 8/460 25/9787 0.3200000 6.808348 6.457490 1.194030e-05 +mmu04913 12/460 64/9787 0.1875000 3.989266 5.328009 3.553165e-05 +mmu04061 14/460 95/9787 0.1473684 3.135423 4.644614 1.301689e-04 + p.adjust qvalue +mmu00590 0.0009253733 0.0008012571 +mmu00591 0.0009253733 0.0008012571 +mmu00565 0.0009253733 0.0008012571 +mmu00592 0.0009253733 0.0008012571 +mmu04913 0.0022029625 0.0019074887 +mmu04061 0.0067253926 0.0058233450 geneID mmu00590 18783/19215/211429/329502/78390/19223/67103/242546/13118/18781/18784/11689/232889/15446/237625/11687 +mmu00591 18783/211429/329502/78390/242546/18781/18784/13113/622127/232889/237625/11687 mmu00565 18783/211429/329502/78390/22239/18781/18784/232889/320981/237625/53897 mmu00592 18783/211429/329502/78390/18781/18784/232889/237625 -mmu00591 18783/211429/329502/78390/242546/18781/18784/13113/232889/237625/11687 mmu04913 18783/211429/329502/78390/242546/11689/232889/13076/13070/15485/13078/16867 mmu04061 16174/20311/57349/56744/14825/20295/20296/20306/20304/20305/12775/56838/16185/16186 Count mmu00590 16 +mmu00591 12 mmu00565 11 mmu00592 8 -mmu00591 11 mmu04913 12 mmu04061 14 ``` @@ -1515,13 +1529,13 @@ data frame of genes and gene sets (or a class that can be converted to a data frame). -```r +``` r library(msigdbr) gene_sets = msigdbr(category = "H", species = "mouse") head(gene_sets) ``` -```{.output} +``` output # A tibble: 6 × 18 gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene @@ -1541,7 +1555,7 @@ As mentioned before, it is important the gene ID type in the gene sets should be the same as in the DE genes, so here we choose the `"gene_symbol"` column. -```r +``` r resTimeHallmark = enricher(gene = timeDEgenes, TERM2GENE = gene_sets[, c("gs_name", "gene_symbol")], pvalueCutoff = 1, @@ -1550,7 +1564,7 @@ resTimeHallmarkTable = as.data.frame(resTimeHallmark) head(resTimeHallmarkTable) ``` -```{.output} +``` output ID HALLMARK_MYOGENESIS HALLMARK_MYOGENESIS HALLMARK_COMPLEMENT HALLMARK_COMPLEMENT @@ -1565,13 +1579,20 @@ HALLMARK_COAGULATION HALLMARK_COAGULATION 20/291 HALLMARK_ALLOGRAFT_REJECTION HALLMARK_ALLOGRAFT_REJECTION 23/291 HALLMARK_ESTROGEN_RESPONSE_LATE HALLMARK_ESTROGEN_RESPONSE_LATE 23/291 HALLMARK_INFLAMMATORY_RESPONSE HALLMARK_INFLAMMATORY_RESPONSE 22/291 - BgRatio pvalue p.adjust qvalue -HALLMARK_MYOGENESIS 201/4394 5.710171e-06 0.0002740882 0.0002284068 -HALLMARK_COMPLEMENT 196/4394 4.300844e-04 0.0103220246 0.0086016872 -HALLMARK_COAGULATION 139/4394 7.088433e-04 0.0113414921 0.0094512434 -HALLMARK_ALLOGRAFT_REJECTION 201/4394 6.397659e-03 0.0767719033 0.0639765861 -HALLMARK_ESTROGEN_RESPONSE_LATE 206/4394 8.584183e-03 0.0824081536 0.0686734614 -HALLMARK_INFLAMMATORY_RESPONSE 201/4394 1.256794e-02 0.0960740794 0.0800617329 + BgRatio RichFactor FoldEnrichment zScore +HALLMARK_MYOGENESIS 201/4394 0.1542289 2.328803 5.135378 +HALLMARK_COMPLEMENT 196/4394 0.1326531 2.003016 3.825523 +HALLMARK_COAGULATION 139/4394 0.1438849 2.172612 3.741006 +HALLMARK_ALLOGRAFT_REJECTION 201/4394 0.1144279 1.727821 2.812786 +HALLMARK_ESTROGEN_RESPONSE_LATE 206/4394 0.1116505 1.685884 2.685080 +HALLMARK_INFLAMMATORY_RESPONSE 201/4394 0.1094527 1.652699 2.522462 + pvalue p.adjust qvalue +HALLMARK_MYOGENESIS 5.710171e-06 0.0002740882 0.0002284068 +HALLMARK_COMPLEMENT 4.300844e-04 0.0103220246 0.0086016872 +HALLMARK_COAGULATION 7.088433e-04 0.0113414921 0.0094512434 +HALLMARK_ALLOGRAFT_REJECTION 6.397659e-03 0.0767719033 0.0639765861 +HALLMARK_ESTROGEN_RESPONSE_LATE 8.584183e-03 0.0824081536 0.0686734614 +HALLMARK_INFLAMMATORY_RESPONSE 1.256794e-02 0.0960740794 0.0800617329 geneID HALLMARK_MYOGENESIS Myl1/Casq1/Aplnr/Tnnc2/Ptgis/Gja5/Col15a1/Cav3/Tnnt1/Ryr1/Cox6a2/Tnni2/Lsp1/Nqo1/Hspb2/Cryab/Erbb3/Stc2/Gpx3/Sparc/Myh3/Myh1/Col1a1/Cacng1/Itgb4/Bdkrb2/Mapk12/Apod/Spdef/Cdkn1a/Actn3 HALLMARK_COMPLEMENT Serpinb2/Pla2g4a/Cd46/Hspa5/Cp/Gnb4/S100a9/Cda/Cxcl1/Apoc1/Itgam/Irf7/Klkb1/Mmp15/Mmp8/Ccl5/Lgals3/Gzmb/Tmprss6/Maff/Plg/Psmb9/Hspa1a/C3/Dusp5/Phex @@ -1597,7 +1618,7 @@ Implementing ORA is rather simple. The following function `ora()` performs ORA on a list of gene sets. Try to read and understand the code. -```r +``` r ora = function(genes, gene_sets, universe = NULL) { if(is.null(universe)) { universe = unique(unlist(gene_sets)) @@ -1633,13 +1654,13 @@ ora = function(genes, gene_sets, universe = NULL) { Test on the MSigDB hallmark gene sets: -```r +``` r HallmarkGeneSets = split(gene_sets$gene_symbol, gene_sets$gs_name) df = ora(timeDEgenes, HallmarkGeneSets, rownames(se)) head(df) ``` -```{.output} +``` output gene_set hits n_genes HALLMARK_ADIPOGENESIS HALLMARK_ADIPOGENESIS 9 1134 HALLMARK_ALLOGRAFT_REJECTION HALLMARK_ALLOGRAFT_REJECTION 23 1134 @@ -1722,7 +1743,7 @@ the `ora()` function which we have implemented in previous "Further reading" section and we compare three different universe settings. -```r +``` r # all genes in the gene sets collection ~ 4k genes df1 = ora(timeDEgenes, HallmarkGeneSets) # all protein-coding genes, ~ 20k genes @@ -1770,7 +1791,7 @@ https://yulab-smu.top/biomedical-knowledge-mining-book/enrichplot.html. We first re-generate the enrichment table. -```r +``` r library(enrichplot) resTimeGO = enrichGO(gene = timeDEgenes, keyType = "SYMBOL", @@ -1782,48 +1803,48 @@ resTimeGOTable = as.data.frame(resTimeGO) head(resTimeGOTable) ``` -```{.output} - ID Description GeneRatio BgRatio -GO:0050900 GO:0050900 leukocyte migration 49/969 386/28564 -GO:0030595 GO:0030595 leukocyte chemotaxis 35/969 230/28564 -GO:0071674 GO:0071674 mononuclear cell migration 30/969 185/28564 -GO:0060326 GO:0060326 cell chemotaxis 39/969 308/28564 -GO:0035456 GO:0035456 response to interferon-beta 17/969 71/28564 -GO:0097529 GO:0097529 myeloid leukocyte migration 31/969 242/28564 - pvalue p.adjust qvalue -GO:0050900 2.163956e-15 1.096260e-11 8.402981e-12 -GO:0030595 1.066463e-13 2.701351e-10 2.070622e-10 -GO:0071674 1.143223e-12 1.930522e-09 1.479771e-09 -GO:0060326 1.728979e-12 2.189752e-09 1.678475e-09 -GO:0035456 1.681085e-10 1.703276e-07 1.305584e-07 -GO:0097529 2.447325e-10 2.066358e-07 1.583892e-07 - geneID -GO:0050900 Tnfsf18/Sell/Slamf9/Fut7/Itga4/Mdk/Grem1/Ada/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Enpp1/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Aoc3/Itgb3/Ccl28/Lgals3/Ptk2b/Emp2/Apod/Retnlg/Plg/Dusp1/Ager/Il33/Ch25h -GO:0030595 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Retnlg/Dusp1/Ch25h -GO:0071674 Tnfsf18/Slamf9/Fut7/Itga4/Mdk/Grem1/Il12a/Nbl1/Padi2/Alox5/Trpm4/Hsd3b7/Adam8/Calr/Ccl17/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Lgals3/Ptk2b/Apod/Retnlg/Plg/Dusp1/Ager/Ch25h -GO:0060326 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Ccl28/Lgals3/Ptk2b/Nr4a1/Retnlg/Dusp1/Ch25h/Plxnb3 -GO:0035456 Aim2/Ifi204/Gbp6/Oas1c/Ifitm6/Bst2/Irgm1/Tgtp1/Tgtp2/Ifi47/Igtp/Irgm2/Ifitm7/Gm4951/F830016B08Rik/Iigp1/Ifit1 -GO:0097529 Tnfsf18/Sell/Fut7/Mdk/Grem1/Prex1/Edn3/P2ry12/S100a8/S100a9/Nbl1/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Itgam/Adam8/Ccl17/Enpp1/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Emp2/Retnlg/Dusp1/Ager +``` output + ID Description GeneRatio BgRatio RichFactor +GO:0050900 GO:0050900 leukocyte migration 51/968 402/28905 0.1268657 +GO:0006935 GO:0006935 chemotaxis 52/968 465/28905 0.1118280 +GO:0042330 GO:0042330 taxis 52/968 467/28905 0.1113490 +GO:0030595 GO:0030595 leukocyte chemotaxis 36/968 242/28905 0.1487603 +GO:0071674 GO:0071674 mononuclear cell migration 32/968 203/28905 0.1576355 +GO:0060326 GO:0060326 cell chemotaxis 41/968 334/28905 0.1227545 + FoldEnrichment zScore pvalue p.adjust qvalue +GO:0050900 3.788277 10.479256 3.536123e-16 1.795643e-12 1.304643e-12 +GO:0006935 3.339243 9.465940 3.336947e-14 6.712316e-11 4.876903e-11 +GO:0042330 3.324942 9.428613 3.965528e-14 6.712316e-11 4.876903e-11 +GO:0030595 4.442063 10.009042 6.591196e-14 8.367524e-11 6.079511e-11 +GO:0071674 4.707080 9.866216 3.208410e-13 3.258461e-10 2.367469e-10 +GO:0060326 3.665515 9.120495 8.641191e-13 7.313328e-10 5.313575e-10 + geneID +GO:0050900 Tnfsf18/Sell/Slamf9/Fut7/Itga4/Mdk/Grem1/Ada/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Ascl2/Calr/Ccl17/Enpp1/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Aoc3/Itgb3/Ccl28/Lgals3/Ptk2b/Emp2/Apod/Retnlg/Plg/Fpr2/Dusp1/Ager/Il33/Ch25h +GO:0006935 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Ntf3/Trpm4/Hsd3b7/Itgam/Adam8/Lsp1/Calr/Ccl17/Robo3/Cmtm7/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Tubb2b/Ccl28/Lgals3/Cmtm5/Ptk2b/Nr4a1/Casr/Retnlg/Fpr2/Dusp1/Ager/Stx3/Ch25h/Plxnb3/Nox1 +GO:0042330 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/P2ry12/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Ntf3/Trpm4/Hsd3b7/Itgam/Adam8/Lsp1/Calr/Ccl17/Robo3/Cmtm7/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Tubb2b/Ccl28/Lgals3/Cmtm5/Ptk2b/Nr4a1/Casr/Retnlg/Fpr2/Dusp1/Ager/Stx3/Ch25h/Plxnb3/Nox1 +GO:0030595 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Lgals3/Ptk2b/Retnlg/Fpr2/Dusp1/Ch25h +GO:0071674 Tnfsf18/Slamf9/Fut7/Itga4/Mdk/Grem1/Il12a/Nbl1/Padi2/Alox5/Trpm4/Hsd3b7/Adam8/Ascl2/Calr/Ccl17/Aire/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Itgb3/Lgals3/Ptk2b/Apod/Retnlg/Plg/Fpr2/Dusp1/Ager/Ch25h +GO:0060326 Tnfsf18/Sell/Slamf9/Mdk/Grem1/Prex1/Edn3/Il12a/S100a8/S100a9/Lpar1/Nbl1/Padi2/Bst1/Cxcl5/Ppbp/Pf4/Cxcl1/Ptn/Alox5/Trpm4/Hsd3b7/Itgam/Adam8/Calr/Ccl17/Ccl2/Ccl7/Ccl5/Ccl6/Ccr7/Ccl28/Lgals3/Ptk2b/Nr4a1/Retnlg/Fpr2/Dusp1/Ch25h/Plxnb3/Nox1 Count -GO:0050900 49 -GO:0030595 35 -GO:0071674 30 -GO:0060326 39 -GO:0035456 17 -GO:0097529 31 +GO:0050900 51 +GO:0006935 52 +GO:0042330 52 +GO:0030595 36 +GO:0071674 32 +GO:0060326 41 ``` `barplot()` and `dotplot()` generate plots for a small number of significant gene sets. Note the two functions are directly applied on `resTimeGO` returned by `enrichGO()`. -```r +``` r barplot(resTimeGO, showCategory = 20) ``` -```r +``` r dotplot(resTimeGO, showCategory = 20) ``` @@ -1866,7 +1887,7 @@ gene sets. Recall the denotations in the 2x2 contingency table (we are too far from that!). Let's take these numbers from the enrichment table. -```r +``` r n_11 = resTimeGOTable$Count n_10 = 983 # length(intersect(resTimeGO@gene, resTimeGO@universe)) n_01 = as.numeric(gsub("/.*$", "", resTimeGOTable$BgRatio)) @@ -1877,7 +1898,7 @@ Instead of using `GeneRatio`, we use the fraction of DE genes in the gene sets which are kind of like a "scaled" value for all gene sets. Let's calculate it: -```r +``` r resTimeGOTable$DE_Ratio = n_11/n_01 resTimeGOTable$GS_size = n_01 # size of gene sets ``` @@ -1896,7 +1917,7 @@ universe_ or the log2 of the ratio of _gene\_set% in the DE genes_ and _gene\_set% in the universe_. The two are identical. -```r +``` r resTimeGOTable$log2_Enrichment = log( (n_11/n_10)/(n_01/n) ) ``` @@ -1910,7 +1931,7 @@ distribution](https://en.wikipedia.org/wiki/Hypergeometric_distribution). They can be calculated as: -```r +``` r hyper_mean = n_01*n_10/n n_02 = n - n_01 @@ -1929,7 +1950,7 @@ In `resTimeGOTable`, gene sets are already ordered by the significance, so we take the first 10 gene sets which are the 10 most significant gene sets. -```r +``` r library(ggplot2) ggplot(resTimeGOTable[1:10, ], aes(x = log2_Enrichment, y = factor(Description, levels = rev(Description)), @@ -1946,7 +1967,7 @@ In the next example, we use _z_-score as the primary variable to map to the offset to origin, `DE_Ratio` and `Count` to map to dot colors and sizes. -```r +``` r ggplot(resTimeGOTable[1:10, ], aes(x = zScore, y = factor(Description, levels = rev(Description)), col = DE_Ratio, size = Count)) + @@ -1971,7 +1992,7 @@ then the gene sets on the top right region can be thought as being both statistically significant and also biologically sensible. -```r +``` r ggplot(resTimeGOTable, aes(x = log2_Enrichment, y = -log10(p.adjust), color = DE_Ratio, size = GS_size)) + @@ -1998,7 +2019,7 @@ the two ORA analysis in one plot. In the following code, we first generate two enrichment tables for up-regulated genes and down-regulated separately. -```r +``` r # up-regulated genes timeDEup <- as.data.frame(subset(resTime, padj < 0.05 & log2FoldChange > log2(1.5))) timeDEupGenes <- rownames(timeDEup) @@ -2041,7 +2062,7 @@ up-regulated genes and the first 5 most significant terms for down-regulated genes. The following **ggplot2** code should be easy to read. -```r +``` r # The name of the 3rd term is too long, we wrap it into two lines. resTimeGOupTable[3, "Description"] = paste(strwrap(resTimeGOupTable[3, "Description"]), collapse = "\n") @@ -2072,7 +2093,7 @@ The input of the `simplifyGO()` function is a vector of GO IDs. It is recommende to have at least 100 GO IDs for summarization and visualization. -```r +``` r GO_ID = resTimeGOTable$ID[resTimeGOTable$p.adjust < 0.1] library(simplifyEnrichment) simplifyGO(GO_ID) diff --git a/data/GSE96870_se.rds b/data/GSE96870_se.rds index 17481546..b1208299 100644 Binary files a/data/GSE96870_se.rds and b/data/GSE96870_se.rds differ diff --git a/fig/04-exploratory-qc-rendered-pca-exercise-1.png b/fig/04-exploratory-qc-rendered-pca-exercise-1.png index 3df85581..10835aaa 100644 Binary files a/fig/04-exploratory-qc-rendered-pca-exercise-1.png and b/fig/04-exploratory-qc-rendered-pca-exercise-1.png differ diff --git a/fig/05-differential-expression-rendered-heatmap-time-1.png b/fig/05-differential-expression-rendered-heatmap-time-1.png index 5838fb04..2697d73d 100644 Binary files a/fig/05-differential-expression-rendered-heatmap-time-1.png and b/fig/05-differential-expression-rendered-heatmap-time-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-compare-universe-1.png b/fig/07-gene-set-analysis-rendered-compare-universe-1.png index 91df3830..8a2c1aa3 100644 Binary files a/fig/07-gene-set-analysis-rendered-compare-universe-1.png and b/fig/07-gene-set-analysis-rendered-compare-universe-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-hypergeom-1.png b/fig/07-gene-set-analysis-rendered-hypergeom-1.png index d0896b35..0cbfa913 100644 Binary files a/fig/07-gene-set-analysis-rendered-hypergeom-1.png and b/fig/07-gene-set-analysis-rendered-hypergeom-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-more-enrichplots-1.png b/fig/07-gene-set-analysis-rendered-more-enrichplots-1.png index b571e23f..14110b30 100644 Binary files a/fig/07-gene-set-analysis-rendered-more-enrichplots-1.png and b/fig/07-gene-set-analysis-rendered-more-enrichplots-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-more-enrichplots-2.png b/fig/07-gene-set-analysis-rendered-more-enrichplots-2.png index e9908f6c..d71baa6d 100644 Binary files a/fig/07-gene-set-analysis-rendered-more-enrichplots-2.png and b/fig/07-gene-set-analysis-rendered-more-enrichplots-2.png differ diff --git a/fig/07-gene-set-analysis-rendered-plot-enrichment-1.png b/fig/07-gene-set-analysis-rendered-plot-enrichment-1.png index 7383bc4a..eaf00d8c 100644 Binary files a/fig/07-gene-set-analysis-rendered-plot-enrichment-1.png and b/fig/07-gene-set-analysis-rendered-plot-enrichment-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-plot-enrichment-padj-1.png b/fig/07-gene-set-analysis-rendered-plot-enrichment-padj-1.png index a4ada06f..759cab70 100644 Binary files a/fig/07-gene-set-analysis-rendered-plot-enrichment-padj-1.png and b/fig/07-gene-set-analysis-rendered-plot-enrichment-padj-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-plot-up-down-1.png b/fig/07-gene-set-analysis-rendered-plot-up-down-1.png index 08574b0e..2effa76a 100644 Binary files a/fig/07-gene-set-analysis-rendered-plot-up-down-1.png and b/fig/07-gene-set-analysis-rendered-plot-up-down-1.png differ diff --git a/fig/07-gene-set-analysis-rendered-plot-z-1.png b/fig/07-gene-set-analysis-rendered-plot-z-1.png index 52ea91ac..5acf482f 100644 Binary files a/fig/07-gene-set-analysis-rendered-plot-z-1.png and b/fig/07-gene-set-analysis-rendered-plot-z-1.png differ diff --git a/md5sum.txt b/md5sum.txt index 7c5ef97f..8ecb2c61 100644 --- a/md5sum.txt +++ b/md5sum.txt @@ -1,19 +1,19 @@ "file" "checksum" "built" "date" -"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-03-05" -"LICENSE.md" "2fb7308e51efc6894f5fb9794505973e" "site/built/LICENSE.md" "2024-03-05" -"config.yaml" "6145f67cb03c9875f6ec5b6417a330eb" "site/built/config.yaml" "2024-03-05" -"index.md" "6a1b0b9e8983481699e2e4ef54a4893a" "site/built/index.md" "2024-03-05" -"episodes/01-intro-to-rnaseq.Rmd" "b4d2d36be5042834578061b30665d493" "site/built/01-intro-to-rnaseq.md" "2024-03-05" -"episodes/02-setup.Rmd" "e62af0b59eec704194484af83db5b859" "site/built/02-setup.md" "2024-03-05" -"episodes/03-import-annotate.Rmd" "c43d8eab0fcc982993ea35a952d323c0" "site/built/03-import-annotate.md" "2024-03-05" -"episodes/04-exploratory-qc.Rmd" "af033b6158f06de3f33d41673272a87d" "site/built/04-exploratory-qc.md" "2024-03-05" -"episodes/05-differential-expression.Rmd" "29ab3939367a0b3521cfd764929e3aee" "site/built/05-differential-expression.md" "2024-03-05" -"episodes/06-extra-design.Rmd" "e4859b8e76bc6d0c8d5c598e1d00b37b" "site/built/06-extra-design.md" "2024-03-05" -"episodes/07-gene-set-analysis.Rmd" "e7b69b6f2af11dfab883303e9a50981d" "site/built/07-gene-set-analysis.md" "2024-03-05" -"episodes/08-next-steps.Rmd" "88227f8b7b9ae91bd86a66a93d722946" "site/built/08-next-steps.md" "2024-03-05" -"instructors/instructor-notes.md" "a59fd3b94c07c3fe3218c054a0f03277" "site/built/instructor-notes.md" "2024-03-05" -"learners/discuss.md" "2758e2e5abd231d82d25c6453d8abbc6" "site/built/discuss.md" "2024-03-05" -"learners/reference.md" "b9aea3dd8169bf1105bf4f462e8e38f5" "site/built/reference.md" "2024-03-05" -"learners/setup.md" "29d12719c4fcebc031f726275706b330" "site/built/setup.md" "2024-03-05" -"profiles/learner-profiles.md" "28ba7f4d04639862db131119f4dc4f4d" "site/built/learner-profiles.md" "2024-03-05" -"renv/profiles/lesson-requirements/renv.lock" "01d50d1ab270c12e4feea7ab4a295486" "site/built/renv.lock" "2024-03-05" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-10-22" +"LICENSE.md" "2fb7308e51efc6894f5fb9794505973e" "site/built/LICENSE.md" "2024-10-22" +"config.yaml" "6145f67cb03c9875f6ec5b6417a330eb" "site/built/config.yaml" "2024-10-22" +"index.md" "6a1b0b9e8983481699e2e4ef54a4893a" "site/built/index.md" "2024-10-22" +"episodes/01-intro-to-rnaseq.Rmd" "b4d2d36be5042834578061b30665d493" "site/built/01-intro-to-rnaseq.md" "2024-10-22" +"episodes/02-setup.Rmd" "e62af0b59eec704194484af83db5b859" "site/built/02-setup.md" "2024-10-22" +"episodes/03-import-annotate.Rmd" "c43d8eab0fcc982993ea35a952d323c0" "site/built/03-import-annotate.md" "2024-10-22" +"episodes/04-exploratory-qc.Rmd" "af033b6158f06de3f33d41673272a87d" "site/built/04-exploratory-qc.md" "2024-10-22" +"episodes/05-differential-expression.Rmd" "29ab3939367a0b3521cfd764929e3aee" "site/built/05-differential-expression.md" "2024-10-22" +"episodes/06-extra-design.Rmd" "e4859b8e76bc6d0c8d5c598e1d00b37b" "site/built/06-extra-design.md" "2024-10-22" +"episodes/07-gene-set-analysis.Rmd" "e7b69b6f2af11dfab883303e9a50981d" "site/built/07-gene-set-analysis.md" "2024-10-22" +"episodes/08-next-steps.Rmd" "88227f8b7b9ae91bd86a66a93d722946" "site/built/08-next-steps.md" "2024-10-22" +"instructors/instructor-notes.md" "a59fd3b94c07c3fe3218c054a0f03277" "site/built/instructor-notes.md" "2024-10-22" +"learners/discuss.md" "2758e2e5abd231d82d25c6453d8abbc6" "site/built/discuss.md" "2024-10-22" +"learners/reference.md" "b9aea3dd8169bf1105bf4f462e8e38f5" "site/built/reference.md" "2024-10-22" +"learners/setup.md" "29d12719c4fcebc031f726275706b330" "site/built/setup.md" "2024-10-22" +"profiles/learner-profiles.md" "28ba7f4d04639862db131119f4dc4f4d" "site/built/learner-profiles.md" "2024-10-22" +"renv/profiles/lesson-requirements/renv.lock" "68edba35866985308613a6a636279a2c" "site/built/renv.lock" "2024-10-22"