-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path22-Metacells_QCs.Rmd
198 lines (162 loc) · 9.38 KB
/
22-Metacells_QCs.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
# Metacells' QCs {#QCs}
```{r include=FALSE}
TO_CACHE <- FALSE
```
Different metrics have been proposed in previous metacell studies to evaluate the quality of metacells.
We propose a R package called *MetacellAnalysisToolkit*, to compute and visualize these metrics.
The package also provides a function to visualize metacells projected in the single-cell space.
Import packages:
```{r true-import, echo = TRUE, message = FALSE, warning = FALSE}
library(MetacellAnalysisToolkit)
library(Seurat)
# If you have Seurat V5 installed, specify that you want to analyze Seurat V4 objects
if(packageVersion("Seurat") >= 5){options(Seurat.object.assay.version = "v4"); print("you are using seurat v5 with assay option v4")}
```
To explore metacells QCs, we need to load:
(i) the single-cell data used to build the metacells and
(ii) the metacell data saved in a Seurat object (see chapter \@ref(Metacell-construction-chapter)).
```{r mc-QC-global-parameters, echo = FALSE}
## Parameters
MC_tool = "SuperCell"
## Here we can modify dataset
proj_name = c("cell_lines", "3k_pbmc", "cd34_multiome","bmcite")[4]
annotation_label = unlist(list('cell_lines'='cell_line',
'3k_pbmc'='louvain',
'cd34_multiome'='celltype',
"bmcite" = "celltype_simplified")[proj_name])
```
```{r sc-mc-load, eval = TRUE, echo = TRUE}
MC_tool = "SuperCell"
proj_name = "bmcite"
annotation_label = "celltype_simplified"
cell_types <- c("Unconventional T", "Naive T cell", "Non-Naive CD4 cell", "CD14 Mono", "B cell", "Non-Naive CD8 cell",
"NK", "GMP", "CD16 Mono", "pDC", "cDC2", "Prog_B", "Plasmablast", "HSC", "LMPP", "Prog_DC", "MEP")
celltype_colors <- c("#1E88E5", "#FFC107", "#004D40", "#9E9D24",
"#F06292", "#546E7A", "#D4E157", "#76FF03",
"#26A69A", "#AB47BC", "#D81B60", "#42A5F5",
"#2E7D32", "#FFA726", "#5E35B1", "#EF5350","#6D4C41")
names(celltype_colors) <-cell_types
# Load the single-cell data
sc_data = readRDS(paste0("data/", proj_name, "/singlecell_seurat_filtered.rds"))
# Load the metacell data
mc_data = readRDS(paste0('data/', proj_name, '/metacell_', MC_tool,'.rds'))
```
## Quantitative metrics
### Purity
When available, cell annotations can be used to annotate each metacell to the most abundant cell category (*e.g.* cell type) composing the metacell (see chapter \@ref(Metacell-construction-chapter)).
This also allows us to compute metacell purity. If the annotation considered is the cell type, the **purity** of a metacell is the proportion of the most abundant
cell type within the metacell [@SuperCell].
```{r compute_purity}
membership_df <- mc_data@misc$cell_membership
mc_data$purity <- mc_purity(membership = membership_df$membership, annotation = sc_data@meta.data[, annotation_label])
qc_boxplot(mc.obj = mc_data, qc.metrics = "purity")
qc_boxplot(mc.obj = mc_data, qc.metrics = "purity", split.by = annotation_label)
```
### Compactness
The **compactness** of a metacell is the variance of the components within the metacell [@SEACells].
The lower the compactness value the better.
This metric as well as the separation metric are computed based on a low embedding of the single-cell data (e.g., PCA).
Note that it is important to use the embedding used initially to construct the metacells.
In the next chunk, we retrieve the principal components computed for metacell construction
(in chapter \@ref(Metacell-construction-chapter) these principal components were saved in the Seurat objects containing the metacell data)
and run UMAP for visualization.
```{r sc-embessing, cache = TO_CACHE, evaluate = TRUE}
sc_data@reductions[["pca"]] <- mc_data@misc$sc.pca
sc_data <- RunUMAP(sc_data, reduction = "pca", dims = c(1:30), n.neighbors = 15, verbose = F, min.dist = 0.5)
UMAPPlot(sc_data, group.by = annotation_label, reduction = "umap", cols=celltype_colors)
```
We can compute the compactness of each metacell using the PCA components.
```{r compute_compactness}
mc_data$compactness <- mc_compactness(cell.membership = membership_df,
sc.obj = sc_data,
sc.reduction = "pca",
dims = 1:30)
qc_boxplot(mc.obj = mc_data, qc.metrics = "compactness")
qc_boxplot(mc.obj = mc_data, qc.metrics = "compactness", split.by = annotation_label)
```
We can also compute the compactness of each metacell using diffusion map components computed based on the PCA axes, as suggested in [@SEACells].
To compute the diffusion map components, you can use the function `get_dim_reduc()` from the MetacellAnalysisTookit R package.
```{r compute_compactness_diffMaps}
# we run diffufion maps on the pca axes saved in the seurat object. Note that a matrix containing the PCA components can be provided as well
diffusion_comp <- get_diffusion_comp(sc.obj = sc_data, sc.reduction = "pca", dims = 1:30)
mc_data$compactness <- mc_compactness(cell.membership = membership_df,
sc.obj = sc_data,
sc.reduction = diffusion_comp,
dims = 1:ncol(diffusion_comp))
qc_boxplot(mc.obj = mc_data, qc.metrics = "compactness")
qc_boxplot(mc.obj = mc_data, qc.metrics = "compactness", split.by = annotation_label)
```
### Separation
The **separation** of a metacell is the distance to the closest metacell [@SEACells].
The higher the separation value the better.
```{r compute_separation}
mc_data$separation <- mc_separation(cell.membership = membership_df,
sc.obj = sc_data,
sc.reduction = "pca")
qc_boxplot(mc.obj = mc_data, qc.metrics = "separation")
qc_boxplot(mc.obj = mc_data, qc.metrics = "separation", split.by = annotation_label)
```
```{r compute_separation_diffMaps}
mc_data$separation <- mc_separation(cell.membership = membership_df,
sc.obj = sc_data,
sc.reduction = diffusion_comp)
qc_boxplot(mc.obj = mc_data, qc.metrics = "separation")
qc_boxplot(mc.obj = mc_data, qc.metrics = "separation", split.by = annotation_label)
```
Note that compactness and separation metrics are correlated, better compactness results in worse separation and vice versa.
Metacells from dense regions will have better compactness but worse separation,
while metacells from sparse regions will have better separation but worse compactness.
```{r comparison_compactness_separation, eval = TRUE, echo = TRUE}
library(ggplot2)
ggplot(data.frame(compactness = log(mc_data$compactness), separation = log(mc_data$separation)),
aes(x=compactness, y=separation)) +
geom_point()+
geom_smooth(method=lm) + ggpubr::stat_cor(method = "pearson")
```
### INV
The **inner normalized variance (INV)** of a metacell is the mean-normalized variance of gene expression within the metacell [@MC2].
The lower the INV value the better.
Note that it is the only metric that is latent-space independent.
```{r compute_INV}
mc_data$INV <- mc_INV(cell.membership = membership_df, sc.obj = sc_data, group.label = "membership")
qc_boxplot(mc.obj = mc_data, qc.metrics = "INV")
qc_boxplot(mc.obj = mc_data, qc.metrics = "INV", split.by = annotation_label)
```
## Size distribution
The size of a metacell corresponds to the number of single cells it contains.
Having a homogeneous metacell size distribution is ideal for downstream analyses, since larger metacells will express more genes,
which could confound analyses. When heterogeneous size distributions are obtained we recommend weighted downstream analyses as described in section \@ref(weighted-analysis).
```{r visualize_size_distribution}
# Seurat::VlnPlot(mc_data, features = "size", pt.size = 2)
# Seurat::VlnPlot(mc_data, features = "size", pt.size = 2, group.by = annotation_label)
hist(mc_data$size, main = "Size distribution", xlab = "Size", breaks = 50)
qc_boxplot(mc.obj = mc_data, qc.metrics = "size")
qc_boxplot(mc.obj = mc_data, qc.metrics = "size", split.by = annotation_label)
```
## Representativeness of metacells
To visualize the metacells, we can project the metacells on the single-cell UMAP representation using the `mc_projection()` function (adapted from the `plot.plot_2D()` from the SEACells package).
A good metacell partition should reproduce the overall structure of the single-cell data by uniformly representing the latent space.
To use this function we need the data at the single-cell level (or at least a low-dimensional embedding of the data) and the single-cell membership to each the metacell.
```{r visualize_metacells}
mc_projection(
sc.obj = sc_data,
mc.obj = mc_data,
cell.membership = membership_df,
sc.reduction = "umap",
sc.label = unlist(annotation_label), # single cells will be colored according the sc.label
metacell.label = unlist(annotation_label) # metacells cell will be colored according the metacell.label
)
```
By default the size of the metacells dots is proportionnal to the size of the metacells.
Metacells can also be colored by a continuous variable such as one of the QC metrics computed in the previous chunks:
```{r visualize_metacells_continuous}
mc_projection(
sc.obj = sc_data,
mc.obj = mc_data,
cell.membership = membership_df,
sc.reduction = "umap",
sc.label = unlist(annotation_label), # single cells will be colored according the sc.label
continuous_metric = TRUE,
metric = "compactness"
)
```