Add detailed README and indicate lines to modify

sjessa · sjessa · commit a8ec6a30c4de · 2019-02-06T22:09:25.000-05:00
diff --git a/README.md b/README.md
@@ -0,0 +1,110 @@
+
+# cytoscope
+
+A simple Shiny app for visualizing single-cell data.
+
+## 0. Structure
+
+This app expects the following directory structure with these naming conventions for samples, where `sn_id` corresponds to the ID of sample *n*:
+
+```
+app.R
+get_genes.R
+data/
+  seurat_genes.Rda
+  markers/
+      s1_id.markers.tsv
+      ...
+      s5_id.markers.tsv
+  seurat/
+      s1_id.seurat_small.Rda
+      ...
+      s5_id.seurat_small.Rda
+```
+
+You can have as many samples as you like!
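+
+To catch naming mistakes before launching the app, a quick check along these lines can
+help. This is a minimal sketch, not part of the app: `check_data_dir()` is a hypothetical
+helper that returns the files expected by the layout above that are missing on disk.
+Marker files are optional, so they can be skipped with `check_markers = FALSE`.
+
+```r
+# Hypothetical helper (not part of cytoscope): return the expected data files
+# that are missing, following the directory layout described above
+check_data_dir <- function(sample_ids, data_dir = "data", check_markers = TRUE) {
+  expected <- c(
+    file.path(data_dir, "seurat_genes.Rda"),
+    file.path(data_dir, "seurat", paste0(sample_ids, ".seurat_small.Rda"))
+  )
+  if (check_markers) {
+    expected <- c(expected,
+                  file.path(data_dir, "markers", paste0(sample_ids, ".markers.tsv")))
+  }
+  expected[!file.exists(expected)]
+}
+
+# Example: an empty result means every expected file is in place
+# check_data_dir(c("s1_id", "s2_id", "s3_id", "s4_id", "s5_id"))
+```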
+
+## 1. Add your data
+
+### Seurat objects
+
+Currently the app works with single-cell data stored in Seurat objects. These objects
+tend to be large and hold a lot of data that the app's simple functions never use, so
+generating smaller versions to store alongside the app helps with speed.
+Save these in the `data/seurat` directory, each named `sn_id.seurat_small.Rda`.
+
+#### Shrinking Seurat objects
+
+Here is an example function that takes a Seurat object as input, deletes some of the
+content not needed for the app, and returns the shrunken object:
+
+```r
+reduce_seurat <- function(seurat,
+                          n_pcs = 5,
+                          keep_raw_data = FALSE) {
+  
+  # Remove scale.data
+  seurat@scale.data <- NULL
+  
+  # Remove raw data, only used for scaling and differential expression
+  # as per https://satijalab.org/seurat/faq
+  if (!keep_raw_data) seurat@raw.data <- NULL
+  
+  # Only keep data for the first few PCs; never ask for more PCs than exist,
+  # and use drop = FALSE so that n_pcs = 1 still leaves a matrix in the slot
+  pcs <- seq_len(min(n_pcs, ncol(seurat@dr$pca@cell.embeddings)))
+  seurat@dr$pca@cell.embeddings <- seurat@dr$pca@cell.embeddings[, pcs, drop = FALSE]
+  seurat@dr$pca@gene.loadings <- seurat@dr$pca@gene.loadings[, pcs, drop = FALSE]
+  
+  # Only subset the full gene loadings if they are present (non-empty)
+  if (!all(dim(seurat@dr$pca@gene.loadings.full) == 0)) {
+    seurat@dr$pca@gene.loadings.full <- seurat@dr$pca@gene.loadings.full[, pcs, drop = FALSE]
+  }
+  
+  return(seurat)
+}
+```
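+
+A possible way to apply `reduce_seurat()` and save the result under the expected name is
+sketched below. It is only a sketch: `save_small_seurat()` is a hypothetical helper, and
+saving the object under the name `seurat` is an assumption, so check how `app.R` loads
+these files and use the object name it expects.
+
+```r
+# Hypothetical helper: shrink one sample's object and save it as
+# <out_dir>/<sample_id>.seurat_small.Rda
+# ASSUMPTION: the app accepts an object named `seurat` inside the .Rda file
+save_small_seurat <- function(seurat, sample_id, out_dir = "data/seurat",
+                              shrink = reduce_seurat, ...) {
+  seurat <- shrink(seurat, ...)  # extra arguments (e.g. n_pcs) go to shrink()
+  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
+  path <- file.path(out_dir, paste0(sample_id, ".seurat_small.Rda"))
+  save(seurat, file = path)
+  invisible(path)
+}
+
+# Example usage, given a full Seurat (v2) object `seurat_full` for sample s1_id:
+# save_small_seurat(seurat_full, "s1_id", n_pcs = 10)
+```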
+
+#### Other requirements
+
+- The app will title certain plots using the `@project.name` slot in the Seurat
+objects
+- The app expects that a named character vector (names matching cluster names and
+values corresponding to colours) is stored at `@misc$colours`
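+
+For example, a colour vector with one entry per cluster could be built with base R as
+below. The cluster names and the `rainbow()` palette are placeholders; in a Seurat v2
+object the cluster names would usually come from `levels(seurat@ident)`.
+
+```r
+# Placeholder cluster names; replace with your object's cluster identities,
+# e.g. levels(seurat@ident) for a Seurat v2 object
+clusters <- c("0", "1", "2", "3")
+
+# Named character vector: names are cluster names, values are colours
+colours <- grDevices::rainbow(length(clusters))
+names(colours) <- clusters
+
+# Store it (and a plot title) where the app looks for them:
+# seurat@misc$colours <- colours
+# seurat@project.name <- "Sample 1 name"
+```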
+
+### Cluster markers
+
+Cluster markers can also be provided, which allows them to be searched and filtered
+in a separate tab of the app. Save the output of `Seurat::FindAllMarkers` for each sample
+as a TSV file in the `data/markers` directory, named `sn_id.markers.tsv`.
+
+These are the expected columns:
+
+```
+p_val	avg_logFC	pct.1	pct.2	p_val_adj	cluster	external_gene_name	ensembl_gene_id	gene_biotype	description
+```
+
+The `server` function in the `app.R` script can be easily modified to suit the
+columns of your markers files, at the step which generates `output$markers`.
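+
+As a sketch of the export step, assume `markers` is a data frame from
+`Seurat::FindAllMarkers` that has already been joined to gene annotations (the
+`external_gene_name`, `ensembl_gene_id`, `gene_biotype` and `description` columns are not
+produced by `FindAllMarkers` itself). A hypothetical `write_markers()` helper could then
+write it as a tab-separated file with the expected name:
+
+```r
+# Hypothetical helper: write one sample's marker table to
+# <out_dir>/<sample_id>.markers.tsv as tab-separated text with a header row
+write_markers <- function(markers, sample_id, out_dir = "data/markers") {
+  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
+  path <- file.path(out_dir, paste0(sample_id, ".markers.tsv"))
+  write.table(markers, path, sep = "\t", quote = FALSE,
+              row.names = FALSE, col.names = TRUE)
+  invisible(path)
+}
+
+# Example usage:
+# write_markers(markers, "s1_id")
+```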
+
+## 2. Prepare `seurat_genes.Rda`
+
+We save the gene lists for each sample to allow them to be searched by the user
+in the app when visualizing expression. To prepare this list:
+
+1. Populate the `data` directory
+2. Modify the indicated lines in `get_genes.R` with your sample IDs
+3. Run `$ Rscript get_genes.R` from the top level of the directory storing your app
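+
+The real logic lives in `get_genes.R`. As a rough, hypothetical illustration of the kind
+of work involved, the sketch below loads each sample's small Seurat object and collects
+its gene names into a named list. Several details are assumptions: that each `.Rda` holds a
+single object, that genes are the row names of the Seurat v2 `@data` slot, and the name
+`seurat_genes` used when saving (check which object name `app.R` expects).
+
+```r
+# Hypothetical sketch, not the actual get_genes.R
+collect_genes <- function(sample_ids, data_dir = "data",
+                          get_names = function(obj) rownames(obj@data)) {
+  genes <- lapply(sample_ids, function(id) {
+    path <- file.path(data_dir, "seurat", paste0(id, ".seurat_small.Rda"))
+    env <- new.env()
+    obj_name <- load(path, envir = env)[1]  # load() returns the names it created
+    get_names(env[[obj_name]])
+  })
+  names(genes) <- sample_ids  # one character vector of genes per sample ID
+  genes
+}
+
+# Example usage from the top level of the app directory:
+# seurat_genes <- collect_genes(c("s1_id", "s2_id", "s3_id", "s4_id", "s5_id"))
+# save(seurat_genes, file = "data/seurat_genes.Rda")
+```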
+
+## 3. Modify `app.R`
+
+Modify the `app.R` script at the indicated lines with your sample IDs. The example script
+has two collections of samples; you can have as many collections as you like,
+each containing as many samples as you like.
+
+NOTE: The `sn_id` fields **must** exactly match the sample IDs used to name your data files.
+The `Sample n name` fields can be friendlier versions of the sample IDs, or exactly the same; nothing depends on them, as they are only used to list the available datasets.
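+
+For example, a `choices` list with three collections could look like the following. All
+labels and IDs are placeholders, and `sample_choices` is a hypothetical variable name;
+the `selected` argument of `selectInput()` should also be set to one of your own sample IDs.
+
+```r
+# Names are the labels shown in the app; values must be the exact sample IDs.
+# The outer names become option-group headings in selectInput()
+sample_choices <- list(
+  "Sample collection 1" = c("Sample 1 name" = "s1_id",
+                            "Sample 2 name" = "s2_id"),
+  "Sample collection 2" = c("Sample 3 name" = "s3_id"),
+  "Sample collection 3" = c("Sample 4 name" = "s4_id",
+                            "Sample 5 name" = "s5_id",
+                            "Sample 6 name" = "s6_id")
+)
+
+# The sample IDs that must match the data file names
+sample_ids <- unlist(sample_choices, use.names = FALSE)
+
+# In app.R this would be passed as, e.g.:
+# selectInput("sample", "Dataset", choices = sample_choices, selected = sample_ids[1])
+```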
+
+## 4. Open `app.R` in RStudio and hit `Run App`! 
diff --git a/app.R b/app.R
@@ -19,11 +19,18 @@ ui <- fluidPage(
                  h3("Data"),
                  selectInput("sample", "Dataset", multiple = FALSE, selected = "ct_p3",
                              choices = list(
+                               
+                               # ********************
+                               # *** MODIFY THIS ****
+                               # ********************
+                               
                                "Sample collection 1" = c("Sample 1 name" = "s1_id",
                                                          "Sample 2 name" = "s2_id"),
                                "Sample collection 2" = c("Sample 3 name" = "s3_id",
                                                          "Sample 4 name" = "s4_id",
                                                          "Sample 5 name" = "s5_id")
+                               
+                               
                              )),
       selectInput("gene", "Genes (max 3)", choices = character(0), multiple = TRUE),
       selectInput("dr", "Dimensionality reduction", multiple = FALSE, choices = c("tsne", "pca"), selected = "tsne"),
diff --git a/get_genes.R b/get_genes.R
@@ -9,8 +9,10 @@ for (i in seq_along(seurat_obj)) {
   
 }
 
+# ** MODIFY THIS **
 samples <- list(s1_id, s2_id, s3_id, s4_id, s5_id)
 
+# ** MODIFY THIS **
 names(samples) <- c("s1_id",
                     "s2_id",
                     "s3_id",
