forked from YutingPKU/SingleCell-Multiomics
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathSeuratCellLevelFiltering.Rmd
110 lines (77 loc) · 2.77 KB
/
SeuratCellLevelFiltering.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# Seurat QC Cell-level Filtering
## Description
Basic quality control for snRNA-seq: check the distribution of
* number of UMIs per cell
* should above 500
* number of genes detected per cell
* number of genes detected per UMI
* check the complexity. outlier cells might be cells have less complex RNA species like red blood cells.
expected higher than 0.8
* mitochondrial ratio
* dead or dying cells will cause large amount of mitochondrial contamination
## Load seurat object
```{r, cache=TRUE}
combined <- get(load('data/Demo_CombinedSeurat_SCT_Preprocess.RData'))
```
## Add other meta info
* fraction of reads mapping to mitochondrial gene
```{r, cache=TRUE}
# for macaque, not all genes start with MT is mitochondrion genes
mt.gene <- c("MTARC2","MTFR1L","MTERF1","MTFR2","MTRF1L","MTRES1",
"MTO1","MTCH1","MTFMT","MTFR1","MTERF3","MTERF2","MTPAP",
"MTERF4","MTCH2",'MTIF2',"MTG2","MTIF3","MTRF1","MTCL1")
combined[["percent.mt"]] <- PercentageFeatureSet(combined, features = mt.gene )
```
* number of genes detected per UMI
```{r}
combined$log10GenesPerUMI <- log10(combined$nFeature_RNA) / log10(combined$nCount_RNA)
```
## Violin plots to check
* get the meta data
```{r, cache=TRUE}
df <- as.data.table(combined@meta.data)
sel <- c("orig.ident", "nCount_RNA", "nFeature_RNA", "percent.mt", "log10GenesPerUMI")
df <- df[, sel, with = FALSE]
df[1:3, ]
```
* define plotting function
```{r, cache=TRUE}
fontsize <- 10
linesize <- 0.35
gp.ls <- df[, 2:5] %>% imap( ~ {
# define lable fun
give.n <- function(x) {
return(c(y = median(x) + max(x) / 10, label = round(median(x), 2)))
}
# assign colors
col.ls <-
setNames(
c('lightpink2', 'lightblue2', 'lightgreen', 'coral1'),
c("nCount_RNA", "nFeature_RNA", "percent.mt", "log10GenesPerUMI")
)
ggplot(data = df, aes(x = orig.ident, y = .x)) +
geom_violin(trim = FALSE, fill = col.ls[.y]) +
ggtitle(label = .y) + ylab(label = .y) +
theme_bw() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background = element_blank(),
panel.border = element_blank()
) +
theme(
axis.text = element_text(size = fontsize),
axis.line = element_line(colour = "black", size = linesize),
axis.ticks = element_line(size = linesize),
axis.title.x = element_blank(),
axis.ticks.length = unit(.05, "cm"),
plot.title = element_text(size = fontsize + 2, hjust = 0.5),
legend.position = 'none'
) +
stat_summary(fun = median, geom = "point", col = "black") + # Add points to plot
stat_summary(fun.data = give.n,
geom = "text",
col = "black")
})
grid.arrange(gp.ls[[1]], gp.ls[[2]], gp.ls[[3]], gp.ls[[4]], ncol = 2)
```