If `ExperimentHub` does not work, the `spatial_data` object from the previous code block can be downloaded from [Zenodo - 10.5281/zenodo.11233385](https://zenodo.org/records/11233385/files/tidySpatialWorkshop2024_spatial_data.rds)
+:::
+
We show metadata for each cell, which helps in understanding the dataset's structure.
```{r}
@@ -487,6 +491,23 @@ The final step in data preprocessing involves removing all spots identified as l
It is good practice to perform quality control independently for each sample and for each cell type. This is because samples and cell types (or tissue regions) can have distinct baseline distributions of quality control factors (e.g. mitochondrial transcription).
+
+1) Let's try to plot `subsets_mito_percent` grouping by sample, using `ggplot2`.
+2) Also, let's try to add tissue regions (present in the `colData`, as already described) as colors, using `ggplot2`.
+:::
+
+::: {.note}
+**Exercise 1.0.2**
+
+Thresholding is an easy-to-understand approach, but often arbitrary. A better strategy is outlier detection. With this strategy, the baseline distribution of a QC factor (e.g. mitochondrial transcription) is used to detect anomalous spots/cells. Read the documentation of `scater::isOutlier`, and use it to label outlier spots for mitochondrial transcription.
+
+Then, note which method is the more stringent: our thresholding or outlier detection.
+:::
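As a rough illustration of the idea behind outlier detection, the sketch below flags values lying more than three median absolute deviations (MADs) above the median and compares the result with a fixed cutoff. It uses simulated percentages and base R only. The vector `mito_percent`, the helper `flag_high_outliers`, and the cutoff of 30 are assumptions made for illustration, not values from the workshop. `scater::isOutlier` applies a comparable MAD-based rule; its documentation remains the reference for the actual arguments.

```r
set.seed(42)

# Simulated mitochondrial percentages: 195 typical spots plus 5 anomalous ones
mito_percent <- c(rnorm(195, mean = 15, sd = 3), c(40, 45, 50, 55, 60))

# Hypothetical helper: flag values more than `nmads` MADs above the median
flag_high_outliers <- function(x, nmads = 3) {
  centre <- median(x)
  spread <- mad(x)  # scaled MAD, comparable to a standard deviation
  x > centre + nmads * spread
}

is_outlier <- flag_high_outliers(mito_percent)
is_above_threshold <- mito_percent > 30  # arbitrary fixed cutoff (assumption)

# Cross-tabulate the two labelling strategies
table(outlier = is_outlier, threshold = is_above_threshold)
```

Because the MAD-based limit adapts to the observed distribution, it can be stricter or more lenient than a fixed cutoff depending on the sample, which is why the comparison is worth making per sample.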
+
### 6. Dimensionality reduction

Dimensionality reduction is essential in spatial transcriptomics because the data are high-dimensional: each spatial location carries a large gene expression profile. Techniques such as PCA (Principal Component Analysis) and UMAP (Uniform Manifold Approximation and Projection) are particularly valuable. PCA helps to reduce noise and highlight the most significant variance in the data, making it simpler to uncover underlying patterns and correlations. UMAP, often calculated from principal components (and not directly from features), aims to preserve local and, to a lesser extent, global data structure, enabling more nuanced visualisations of complex cellular landscapes. Together, these methods help to reveal biological insights such as cellular heterogeneity and tissue structure, which are crucial for both basic biological research and clinical applications.
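To make the PCA step concrete, here is a minimal base-R sketch on a simulated matrix. The matrix `expr`, its dimensions, and the injected signal are all assumptions for illustration; the workshop itself operates on a `SpatialExperiment` object and may use different functions.

```r
set.seed(1)

# Simulated log-expression matrix: 100 spots (rows) x 50 genes (columns)
n_spots <- 100
n_genes <- 50
expr <- matrix(rnorm(n_spots * n_genes), nrow = n_spots, ncol = n_genes)

# Add a shared signal to half of the spots for the first 10 genes
expr[1:50, 1:10] <- expr[1:50, 1:10] + 3

# PCA on centred and scaled genes
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained, 5), 3)

# Coordinates of each spot on the first two components
pc_scores <- pca$x[, 1:2]
head(pc_scores)
```

A UMAP implementation would typically take the leading columns of `pca$x` as input rather than the full gene matrix.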
-knitr::kable(head(cluster_metadata), format = "html")
+knitr::kable(head(cluster_metadata, 10), format = "html")
```
Using cluster comparison metrics such as the adjusted Rand index (ARI), we evaluate the performance of our clustering approach. This statistical analysis helps validate the clustering results against known labels or pathologies.
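For readers who want to see what the ARI measures, the following base-R sketch computes it from the contingency table of two labelings. The helper `adjusted_rand_index` and the toy label vectors are hypothetical and written for illustration; the workshop may rely on a package function instead.

```r
# Hypothetical helper: adjusted Rand index from two label vectors
adjusted_rand_index <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b))
  contingency <- table(labels_a, labels_b)
  n <- sum(contingency)

  # Pairs of items that share a cell, a row, or a column of the table
  sum_cells <- sum(choose(contingency, 2))
  sum_rows <- sum(choose(rowSums(contingency), 2))
  sum_cols <- sum(choose(colSums(contingency), 2))

  expected <- sum_rows * sum_cols / choose(n, 2)  # agreement expected by chance
  maximum <- (sum_rows + sum_cols) / 2            # largest attainable agreement

  (sum_cells - expected) / (maximum - expected)
}

known <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
renamed <- c("x", "x", "x", "y", "y", "y", "z", "z", "z")
noisy <- c("x", "x", "y", "y", "y", "y", "z", "z", "x")

adjusted_rand_index(known, renamed)  # same partition under different names
adjusted_rand_index(known, noisy)    # partially disagreeing partition
```

The index depends only on how items are grouped, not on the label names, and is corrected so that random labelings score close to zero. This sketch does not handle the degenerate case where both labelings contain a single cluster.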
-These are the number of samples we have for each of the three data sets.
+These are the number of samples we have.
```{r}
table(brain_reference$sample)
```
-Now, we identify the variable genes within each dataset, to not capture technical effects, and identify the union of variable genes for further analysis.
+Now, we identify the variable genes, to not capture technical effects, and identify the union of variable genes for further analysis.
```{r, warning=FALSE}
genes <- !grepl(pattern = "^Rp[l|s]|Mt", x = rownames(brain_reference))
-If `ExperimentHub` should not work. The `spatial_data` object from the previous code block can be downloaded from [Zenodo - 10.5281/zenodo.11233385](https://zenodo.org/records/11233385/files/tidySpatialWorkshop_spatial_data.rds?download=1)
+If `ExperimentHub` does not work, the `spatial_data` object from the previous code block can be downloaded from [Zenodo - 10.5281/zenodo.11233385](https://zenodo.org/records/11233385/files/tidySpatialWorkshop2024_spatial_data.rds)
-Note that some columns are always displayed no matter whet. These column include special slots in the objects such as reduced dimensions, spatial coordinates (mandatory for `SpatialExperiment`), and sample identifier (mandatory for `SpatialExperiment`).
+Note that some columns are always displayed no matter what. These columns include special slots in the objects such as reduced dimensions, spatial coordinates (mandatory for `SpatialExperiment`), and sample identifier (mandatory for `SpatialExperiment`).
:::
Although the select operation can be used as a display tool to explore our object, it updates the `SpatialExperiment` metadata, subsetting the desired columns.
@@ -277,7 +278,7 @@ spatial_data |>
We can update the underlying `SpatialExperiment` object for future analyses, and confirm that the `SpatialExperiment` metadata has been mutated.
```{r message=FALSE}
-spatial_data =
+spatial_data <-
  spatial_data |>
  mutate(spatialLIBD_lower = tolower(spatialLIBD))
@@ -313,11 +314,12 @@ Extract specific identifiers from complex data paths, simplifying the dataset by
```{r}
# Create column for sample
spatial_data <- spatial_data |>
+
# Extract sample ID from file path and display the updated data
Join the endothelial marker PECAM1 (CD31; look up its ENSEMBL ID), and plot in space the pixels that are above the 0.75 quantile of PECAM1 expression. Are the PECAM1-positive pixels (endothelial?) spatially clustered?
- Get the ENSEMBL ID
- Join the feature to the tidy data abstraction
- Calculate the 0.75 quantile across all pixels with `mutate()`
- Label the cells with high PECAM1
- Plot the slide colouring for the new label
+
:::
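The quantile and labelling steps listed above can be sketched with `dplyr` on a plain data frame. Everything here is simulated: the table `pixels`, its columns, and the `expression` values stand in for a gene joined to the tidy abstraction, and are not the workshop's data or its solution.

```r
library(dplyr)

set.seed(7)

# Simulated per-pixel table; `expression` stands in for a joined gene's counts
pixels <- data.frame(
  pixel_id = seq_len(200),
  x = runif(200),
  y = runif(200),
  expression = rpois(200, lambda = 2)
)

labelled <- pixels |>
  mutate(
    # One value for the whole table, recycled across rows
    q75 = quantile(expression, 0.75, names = FALSE),
    # TRUE for pixels strictly above the quantile
    high_expression = expression > q75
  )

count(labelled, high_expression)
```

With count data many pixels tie at the quantile, so fewer than a quarter of them may end up labelled as high.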
@@ -484,7 +498,7 @@ We calculate summary statistics of a subset of data
**Maintainability:** Fewer and self-explanatory lines of code and no need for intermediate steps make the code easier to maintain and modify, especially when conditions change or additional filters are needed.
+::: {.note}
+**Exercise 2.2.1**
+
+In Session 1 we showed that a good strategy for QC filtering is outlier detection. With this strategy, the baseline distribution of a QC factor (e.g. mitochondrial transcription) is used to detect anomalous spots/cells. Read the documentation of `scater::isOutlier`, and use it WITH `tidyomics`/`tidyverse` to label outlier spots for mitochondrial transcription.
+
+Then, note which method is the more stringent, our thresholding or outlier detection, solely using `tidyomics`/`tidyverse`.
+
+:::
+
### 7. Visualisation
Here, we will show how to use ad-hoc spatial visualisation, as well as `ggplot`, to explore spatial data. We will show how `tidySpatialExperiment` allows us to alternate between tidyverse visualisation and any visualisation compatible with `SpatialExperiment`.
@@ -780,9 +803,10 @@ We provide another example of how the use of tidy. Spatial experiment makes cust
@@ -828,7 +852,7 @@ We assume that the cells we filtered as non-alive or damaged, characterised by b
Use `tidyomics`/`tidyverse` tools to label dead cells and perform differential expression within each region. Some of the commands you can use are: `mutate`, `nest`, `map`, `aggregate_cells`, `tidybulk:::test_differential_abundance`,