Skip to content

Commit a3920da

Browse files
committed
updated made during the workshop
1 parent 39aa426 commit a3920da

File tree

3 files changed

+113
-30
lines changed

3 files changed

+113
-30
lines changed

vignettes/Session_1_sequencing_assays.Rmd

Lines changed: 27 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -195,7 +195,7 @@ Maynard and Torres et al., doi: [10.1038/s41593-020-00787-0](https://www.ncbi.nl
195195
library(spatialLIBD)
196196
library(ExperimentHub)
197197
198-
# To avoid error for SPE loading
198+
# To avoid error for SPE loading
199199
# https://support.bioconductor.org/p/9161859/#9161863
200200
setClassUnion("ExpData", c("matrix", "SpatialExperiment"))
201201
@@ -218,6 +218,10 @@ spatial_data
218218

219219
From: <https://bookdown.org/sjcockell/ismb-tutorial-2023/practical-session-2.html>
220220

221+
::: {.note}
222+
If `ExperimentHub` should not work. The `spatial_data` object from the previous code block can be downloaded from [Zenodo - 10.5281/zenodo.11233385](https://zenodo.org/records/11233385/files/tidySpatialWorkshop2024_spatial_data.rds)
223+
:::
224+
221225
We shows metadata for each cell, helping understand the dataset's structure.
222226

223227
```{r}
@@ -487,6 +491,23 @@ The final step in data preprocessing involves removing all spots identified as l
487491
spatial_data = spatial_data[,!colData(spatial_data)$discard ]
488492
```
489493

494+
::: {.note}
495+
**Exercise 1.0.1**
496+
497+
It is good practice to perform quality control independently for each sample and different cell types. This because samples and cell types (or tissue regions) could have a distinct baseline distributions of quality control factors (e.g. mitochondrial transcription).
498+
499+
1) Let's try to plot `subsets_mito_percent` grouping by sample, using `ggplot2`.
500+
2) Also, let's try to add tissue regions (present in the `colData` as already described) as colors, using `ggplot2`
501+
:::
502+
503+
::: {.note}
504+
**Exercise 1.0.2**
505+
506+
Thresholding is a easy-to-understand approach, but often arbitrary. A better strategy is outlier detection. With this strategy the baseline distribution of a QC factor (e.g. mitochondrial transcription) will be used to detect anomalous spots/cells. Read the documentation of `scater::isOutlier`, and use it to label outlier spots for mitochondrial transcription.
507+
508+
Then, note which method is the most stringent, between our thresholding and outlier-detection.
509+
:::
510+
490511
### 6. Dimensionality reduction
491512

492513
Dimensionality reduction is essential in spatial transcriptomics due to the high-dimensional nature of the data, which includes vast gene expression profiles across various spatial locations. Techniques such as PCA (Principal Component Analysis) and UMAP (Uniform Manifold Approximation and Projection) are particularly valuable. PCA helps to reduce noise and highlight the most significant variance in the data, making it simpler to uncover underlying patterns and correlations. UMAP, ofen calculated from principal components (and not directly from features) preserves both global and local data structures, enabling more nuanced visualisations of complex cellular landscapes. Together, these methods facilitate a deeper understanding of spatial gene expression, helping to reveal biological insights such as cellular heterogeneity and tissue structure, which are crucial for both basic biological research and clinical applications.
@@ -749,7 +770,7 @@ cluster_metadata =
749770
colData(spatial_data_list$sample_151673)[, c("clust_M0_lam0.2_k50_res0.7", "clust_M0_lam0.2_k50_res0.7_smooth")]
750771
751772
752-
knitr::kable(head(cluster_metadata), format = "html")
773+
knitr::kable(head(cluster_metadata, 10), format = "html")
753774
```
754775

755776
Using cluster comparison metrics like the adjusted Rand index (ARI) we evaluate the performance of our clustering approach. This statistical analysis helps validate the clustering results against known labels or pathologies.
@@ -854,23 +875,20 @@ table(brain_reference$cell_types)
854875
855876
```
856877

857-
These are the number of samples we have for each of the three data sets.
878+
These are the number of samples we have.
858879

859880
```{r}
860881
861882
table(brain_reference$sample)
862883
```
863884

864885

865-
Now, we identify the variable genes within each dataset, to not capture technical effects, and identify the union of variable genes for further analysis.
886+
Now, we identify the variable genes, to not capture technical effects, and identify the union of variable genes for further analysis.
866887

867888
```{r, warning=FALSE}
868889
genes <- !grepl(pattern = "^Rp[l|s]|Mt", x = rownames(brain_reference))
869890
870-
# Convert to list
871-
brain_reference_list <- lapply(unique(brain_reference$dataset_id), function(x) brain_reference[, brain_reference$dataset_id == x])
872-
873-
dec = scran::modelGeneVar(brain_reference, subset.row = genes, block = brain_reference$sample_id)
891+
dec = scran::modelGeneVar(brain_reference, subset.row = genes, block = brain_reference$sample)
874892
hvg_CAQ = scran::getTopHVGs(dec, n = 1000)
875893
876894
hvg_CAQ = unique( unlist(hvg_CAQ))
@@ -990,6 +1008,7 @@ No, let's look at the correlation matrices to see which cell type are most often
9901008
9911009
plotCorrelationMatrix(res$mat)
9921010
```
1011+
9931012
```{r}
9941013
mat_df = as.data.frame(res$mat)
9951014
```

vignettes/Session_2_Tidy_spatial_analyses.Rmd

Lines changed: 36 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -111,8 +111,9 @@ rownames(spatialCoords(spatial_data)) = colnames(spatial_data) # Bug?
111111
# Display the object
112112
spatial_data
113113
```
114+
114115
::: {.note}
115-
If `ExperimentHub` should not work. The `spatial_data` object from the previous code block can be downloaded from [Zenodo - 10.5281/zenodo.11233385](https://zenodo.org/records/11233385/files/tidySpatialWorkshop_spatial_data.rds?download=1)
116+
If `ExperimentHub` should not work. The `spatial_data` object from the previous code block can be downloaded from [Zenodo - 10.5281/zenodo.11233385](https://zenodo.org/records/11233385/files/tidySpatialWorkshop2024_spatial_data.rds)
116117
:::
117118

118119
## Working with tidySpatialExperiment
@@ -180,7 +181,7 @@ spatial_data |> select(.cell, sample_id, in_tissue, spatialLIBD)
180181
```
181182

182183
::: {.note}
183-
Note that some columns are always displayed no matter whet. These column include special slots in the objects such as reduced dimensions, spatial coordinates (mandatory for `SpatialExperiment`), and sample identifier (mandatory for `SpatialExperiment`).
184+
Note that some columns are always displayed no matter what. These column include special slots in the objects such as reduced dimensions, spatial coordinates (mandatory for `SpatialExperiment`), and sample identifier (mandatory for `SpatialExperiment`).
184185
:::
185186

186187
Although the select operation can be used as a display tool, to explore our object, it updates the `SpatialExperiment` metadata, subsetting the desired columns.
@@ -277,7 +278,7 @@ spatial_data |>
277278
We can update the underlying `SpatialExperiment` object, for future analyses. And confirm that the `SpatialExperiment` metadata has been mutated.
278279

279280
```{r message=FALSE}
280-
spatial_data =
281+
spatial_data <-
281282
spatial_data |>
282283
mutate(spatialLIBD_lower = tolower(spatialLIBD))
283284
@@ -313,11 +314,12 @@ Extract specific identifiers from complex data paths, simplifying the dataset by
313314
```{r}
314315
# Create column for sample
315316
spatial_data <- spatial_data |>
317+
316318
# Extract sample ID from file path and display the updated data
317319
tidyr::extract(file_path, "sample_id_from_file_path", "\\.\\./data/single_cell/([0-9]+)/outs/raw_feature_bc_matrix/", remove = FALSE)
318320
319321
# Take a look
320-
spatial_data |> select(.cell, sample_id_from_file_path, everything())
322+
spatial_data |> select(.cell, sample_id_from_file_path, file_path, everything())
321323
```
322324

323325
#### Unite
@@ -329,7 +331,7 @@ We could use tidyverse `unite` to combine columns, for example to create a new c
329331
spatial_data <- spatial_data |> unite("sample_subject", sample_id, subject, remove = FALSE)
330332
331333
# Take a look
332-
spatial_data |> select(.cell, sample_id, sample_subject, subject)
334+
spatial_data |> select(.cell, sample_id, subject, sample_subject)
333335
```
334336

335337
### 3. Advanced filtering/gating and pseudobulk
@@ -350,9 +352,19 @@ spatial_data =
350352
# Gate based on tissue morphology
351353
tidySpatialExperiment::gate(alpha = 0.1, colour = "spatialLIBD")
352354
355+
spatial_data |> select(.cell, .gated)
356+
```
357+
358+
```{r, eval=FALSE}
359+
tidygate_env$gates |> saveRDS("<PATH>")
360+
```
361+
362+
```{r}
353363
spatial_data_gated = tidygate_env$gates
354364
```
355365

366+
367+
356368
You can reload a pre-made gate for reproducibility
357369

358370
```{r}
@@ -433,7 +445,7 @@ Join the feature to the metadata
433445
```{r}
434446
spatial_data =
435447
spatial_data |>
436-
join_features("ENSG00000131095", shape="wide")
448+
join_features("ENSG00000131095", shape="wide", assay = "logcounts")
437449
438450
spatial_data |>
439451
select(.cell, ENSG00000131095)
@@ -443,13 +455,15 @@ spatial_data |>
443455

444456
::: {.note}
445457
**Exercise 2.2**
458+
446459
Join the endothelial marker PECAM1 (CD31, look for ENSEMBL ID), and plot in space the pixel that are in the 0.75 percentile of EPCAM1 expression. Are the PECAM1-positive pixels (endothelial?) spatially clustered?
447460

448461
- Get the ENSEMBL ID
449462
- Join the feature to the tidy data abstraction
450463
- Calculate the 0.75 quantile across all pixels `mutate()`
451464
- Label the cells with high PECAM1
452465
- Plot the slide colouring for the new label
466+
453467
:::
454468

455469

@@ -484,7 +498,7 @@ We calculate summary statistics of a subset of data
484498

485499
```{r}
486500
spatial_data |>
487-
filter(Cluster==1) |>
501+
filter(Cluster==1) |>
488502
count(sample_id) |>
489503
arrange(desc(n))
490504
@@ -589,9 +603,7 @@ spatial_data <-
589603
addPerCellQC(subsets = list(mito = is_gene_mitochondrial)) |>
590604
591605
## Add threshold in colData
592-
mutate(
593-
qc_mitochondrial_transcription = subsets_mito_percent > 30
594-
)
606+
mutate( qc_mitochondrial_transcription = subsets_mito_percent > 30 )
595607
596608
spatial_data |> select(.cell, qc_mitochondrial_transcription)
597609
@@ -623,6 +635,8 @@ marker_genes =
623635
}
624636
)
625637
638+
marker_genes = cbind(marker_genes)
639+
626640
head(unique(unlist(marker_genes)))
627641
628642
```
@@ -726,6 +740,15 @@ spatial_data_filtered =
726740
**Maintainability:** Fewer and self-explanatory lines of code and no need for intermediate steps make the code easier to maintain and modify, especially when conditions change or additional filters are needed.
727741

728742

743+
::: {.note}
744+
**Exercise 2.2.1**
745+
746+
In Session 1 we showed that a good strategy for QC filtering is outlier detection. With this strategy the baseline distribution of a QC factor (e.g. mitochondrial transcription) will be used to detect anomalous spots/cells. Read the documentation of `scater::isOutlier`, and use it WITH `tidyomics`/`tidyverse` to label outlier spots for mitochondrial transcription.
747+
748+
Then, note which method is the most stringent, between our thresholding and outlier-detection, solely using `tidyomics`/`tidyverse`.
749+
750+
:::
751+
729752
### 7. Visualisation
730753

731754
Here, we will show how to use ad-hoc spatial visualisation, as well as `ggplot` to explore spatial data we will show how `tidySpatialExperiment` allowed to alternate between tidyverse visualisation, and any visualisation compatible with `SpatialExperiment`.
@@ -780,9 +803,10 @@ We provide another example of how the use of tidy. Spatial experiment makes cust
780803
spatial_data_filtered |>
781804
ggplot(aes(subsets_mito_percent, sum_gene)) +
782805
geom_point(aes(color = spatialLIBD), size=0.2) +
783-
stat_ellipse(aes(group = spatialLIBD), alpha = 0.3) +
806+
stat_ellipse(aes(group = spatialLIBD, color = spatialLIBD), alpha = 0.3) +
784807
scale_color_manual(values = libd_layer_colors |>
785808
str_remove("ayer")) +
809+
geom_smooth(aes(group = spatialLIBD), method="lm") +
786810
scale_y_log10() +
787811
theme_bw()
788812
@@ -828,7 +852,7 @@ We assume that the cells we filtered as non-alive or damaged, characterised by b
828852

829853
Use `tidyomic`/`tidyverse` tools to label dead cells and perform differential expression within each region. Some of the comments you can use are: `mutate`, `nest`, `map`, `aggregate_cells`, `tidybulk:::test_differential_abundance`,
830854

831-
A hist:
855+
A hint:
832856

833857
- spatial_data |>
834858
mutate(

vignettes/Solutions.Rmd

Lines changed: 50 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,31 @@ spot_counts <-
3737
spot_counts
3838
```
3939

40+
::: {.note}
41+
**Exercise 1.0.1**
42+
:::
43+
44+
```{r}
45+
colData(spatial_data) |>
46+
ggplot(aes(subsets_mito_percent)) +
47+
geom_density(aes(linetype = sample_id))
48+
49+
colData(spatial_data) |>
50+
ggplot(aes(subsets_mito_percent)) +
51+
geom_density(aes(color = spatialLIBD, linetype = sample_id))
52+
53+
54+
```
55+
56+
::: {.note}
57+
**Exercise 1.0.2**
58+
:::
59+
60+
```{r}
61+
scater::isOutlier(colData(spatial_data)$subsets_mito_percent, type="higher") |> table()
62+
colData(spatial_data)$qc_mitochondrial_transcription |> table()
63+
```
64+
4065
::: {.note}
4166
**Exercise 1.1**
4267
:::
@@ -232,10 +257,10 @@ spatial_data |>
232257
filter(in_tissue, sample_id=="151673") |>
233258
234259
# Gate based on tissue morphology
235-
tidySpatialExperiment::gate_spatial(alpha = 0.1) |>
260+
tidySpatialExperiment::gate(alpha = 0.1) |>
236261
237262
# Plot
238-
scater::plotUMAP(colour_by = ".gate")
263+
scater::plotUMAP(colour_by = ".gated")
239264
```
240265

241266

@@ -252,22 +277,36 @@ rowData(spatial_data) |>
252277
gene = "ENSG00000261371"
253278
254279
spatial_data |>
255-
280+
256281
# Join the feature
257-
join_features(gene, shape="wide") |>
258-
282+
join_features(gene, shape="wide", assay = "logcounts") |>
283+
259284
# Calculate the quantile
260-
mutate(my_quantile = quantile(ENSG00000261371, 0.75)) |>
261-
285+
mutate(my_quantile = quantile(ENSG00000261371, 0.99)) |>
286+
# mutate(my_quantile = ENSG00000261371 > 0) |> # A possibility
287+
262288
# Label the pixels
263-
mutate(PECAM1_positive = ENSG00000261371 > my_quantile) |>
264-
289+
mutate(PECAM1_positive = ENSG00000261371 >= my_quantile) |>
290+
265291
# Plot
266292
ggspavis::plotSpots(annotate = "PECAM1_positive") +
267293
facet_wrap(~sample_id)
268294
269295
```
270296

297+
::: {.note}
298+
**Exercise 2.2.1**
299+
:::
300+
301+
```{r}
302+
spatial_data |>
303+
mutate(qc_mitochondrial_transcription = scater::isOutlier(subsets_mito_percent, type="higher")) |>
304+
count(qc_mitochondrial_transcription)
305+
306+
spatial_data |>
307+
count(qc_mitochondrial_transcription)
308+
309+
```
271310

272311
::: {.note}
273312
**Excercise 2.3**
@@ -532,4 +571,5 @@ annotate = "region"
532571
guides(color = "none") +
533572
labs(title = "ground truth")
534573
535-
```
574+
```
575+

0 commit comments

Comments
 (0)