Skip to content

Commit

Permalink
added kontextual + classifyR [skip ci]
Browse files Browse the repository at this point in the history
  • Loading branch information
fame2827 committed Oct 28, 2024
1 parent 27664de commit 5cfc4ed
Show file tree
Hide file tree
Showing 2 changed files with 369 additions and 5 deletions.
Binary file added inst/extdata/kontext.rda
Binary file not shown.
374 changes: 369 additions & 5 deletions vignettes/scdneySpatial.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ editor_options:
wrap: 72
---


```{r, include = FALSE}
knitr::opts_chunk$set(
#collapse = TRUE,
Expand All @@ -35,11 +34,10 @@ knitr::opts_chunk$set(
}
</style>
```

**Presenting Authors**

Farhan Ameen$^{1,2,3}$, Alex Qin$^{1,2,3}$, Shreya Rao$^{1,2,3}$,
Ellis Patrick$^{1,2,3}$.
Farhan Ameen$^{1,2,3}$, Alex Qin$^{1,2,3}$, Shreya Rao$^{1,2,3}$, Ellis
Patrick$^{1,2,3}$.

$^1$ Westmead Institute for Medical Research, University of Sydney,
Australia\
Expand All @@ -48,4 +46,370 @@ Australia\
$^3$ School of Mathematics and Statistics, University of Sydney,
Australia

<br/> Contact: ellis.patrick\@sydney.edu.au
<br/> Contact: ellis.patrick\@sydney.edu.au

## Overview

Understanding the interplay between different types of cells and their
immediate environment is critical for understanding the mechanisms of
cells themselves and their function in the context of human diseases.
Recent advances in high dimensional in situ cytometry technologies have
fundamentally revolutionized our ability to observe these complex
cellular relationships providing an unprecedented characterisation of
cellular heterogeneity in a tissue environment.

### Description

In this workshop we will introduce some of the key analytical concepts
needed to analyse data from high dimensional spatial omics technologies
such as, PhenoCycler, IMC, Xenium and MERFISH. We will show how
functionality from our Bioconductor packages simpleSeg, FuseSOM,
scClassify, scHot, spicyR, listClust, statial, scFeatures and ClassifyR
can be used to address various biological hypotheses. By the end of this
workshop attendees will be able to implement and assess some of the key
steps of a spatial analysis pipeline including cell segmentation,
feature normalisation, cell type identification, microenvironment and
cell-state characterisation, spatial hypothesis testing and patient
classification. Understanding these key steps will provide attendees
with the core skills needed to interrogate the comprehensive spatial
information generated by these exciting new technologies.

### Pre-requisites

It is expected that students will have:

- basic knowledge of R syntax,
- this workshop will not provide an in-depth description of
cell-resolution spatial omics technologies.

### *R* / *Bioconductor* packages used

Several single cell R packages will be used from the scdney package, for
more information visit: <https://sydneybiox.github.io/scdney/>.

### Learning objectives

- Understand and visualise spatial omics datasets.
- Identify key biological questions that can be addressed with these
technologies and spatial analysis.
- Understand the key analytical steps involved in spatial omics
analysis, and perform these steps using R.
- Evaluate the performance of data normalisation and cell
segmentation.
- Understand and generate individual feature representations from
spatial omics data.
- Develop appreciation on how to assess performance of classification
models.
- Perform disease outcome prediction using the feature representation
and robust classification framework.

## Workshop

## Setup

### Installation

```{r install, warning=FALSE, message=FALSE, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install(c( "simpleSeg", "cytomapper", "scClassify", "scHOT", "FuseSOM","glmnet", "spicyR", "lisaClust","Statial", "scFeatures", "ClassifyR", "tidyverse", "scater", "SingleCellExperiment", "STexampleData", "SpatialDatasets", "tidySingleCellExperiment", "scuttle", "batchelor"))
```

### Load packages

```{r library, warning=FALSE, message=FALSE}
# packages from scdney
library(SpatialDatasets)
library(simpleSeg)
library(FuseSOM)
library(lisaClust)
library(spicyR)
#library(Statial)
devtools::load_all("~/Desktop/UNI/Phd/Projects/Statial")
devtools::load_all("~/Desktop/UNI/Phd/Projects/spicyR")
library(ClassifyR)
# Other required packages
library(tidySingleCellExperiment)
library(tidyverse)
library(SingleCellExperiment)
theme_set(theme_classic())
nCores <- 8 # Feel free to parallelise things if you have the cores to spare.
BPPARAM <- simpleSeg:::generateBPParam(nCores)
options("restore_SingleCellExperiment_show" = TRUE)
```

### Download datasets

Please download these datasets before you start the workshop

```{r data}
#Download data now
#SpatialDatasets::Ferguson_Images()
spe = SpatialDatasets::spe_Ferguson_2022()
spe$x = spatialCoords(spe)[, "x"]
spe$y = spatialCoords(spe)[, "y"]
```

## SimpleSeg: Segmentation and normalisation

### ALEX CHANGE LATER: Normalise and cluster

```{r normalise}
useMarkers <- rownames(spe)[!rownames(spe) %in% c("DNA1", "DNA2", "HH3")]
spe <- normalizeCells(spe,
markers = useMarkers,
transformation = NULL,
method = c("trim99", "mean", "PC1"),
assayIn = "counts",
cores = nCores
)
# Usually we specify a subset of the original markers which are informative to separating out distinct cell types for the UMAP and clustering.
ct_markers <- c("podoplanin", "CD13", "CD31",
"panCK", "CD3", "CD4", "CD8a",
"CD20", "CD68", "CD16", "CD14", "HLADR", "CD66a")
# Set seed.
set.seed(51773)
# Generate SOM and cluster cells into 10 groups
spe <- runFuseSOM(
spe,
markers = ct_markers,
assay = "norm",
numClusters = 10
)
spe <- spe |>
mutate(cellType = case_when(
clusters == "cluster_1" ~ "GC",
clusters == "cluster_2" ~ "MC",
clusters == "cluster_3" ~ "SC",
clusters == "cluster_4" ~ "EP",
clusters == "cluster_5" ~ "SC",
clusters == "cluster_6" ~ "TC_CD4",
clusters == "cluster_7" ~ "BC",
clusters == "cluster_8" ~ "EC",
clusters == "cluster_9" ~ "TC_CD8",
clusters == "cluster_10" ~ "DC"
))
```

## Statial

[Statial
Package](https://www.bioconductor.org/packages/release/bioc/html/Statial.html)

### SpatioMark

```{r}
# Preparing outcome vector
outcome <- spe$group[!duplicated(spe$imageID)]
names(outcome) <- spe$imageID[!duplicated(spe$imageID)]
```

### Kontextual

[Pre-print](https://www.biorxiv.org/content/10.1101/2024.09.03.611109v1)

```{r hierarchy}
# Cluster cell type hierarchy
hierarchy <- treekoR::getClusterTree(t(assay(spe, "norm")),
clusters = spe$cellType,
hierarchy_method = "hopach")
parentList = Statial::getParentPhylo(hierarchy)
# Visualise tree
hierarchy$clust_tree |>
plot()
# Add parent 4 and 5
parentList$parent_4 = c(parentList$parent_2, parentList$parent_3)
parentList$parent_5 = c(parentList$parent_1, "EC")
all = unique(spe$cellType)
# Create data frame with all pairwise combinations of cells
treeDf = Statial::parentCombinations(all, parentList = parentList)
```

```{r kontextual, eval = FALSE}
kontext = Kontextual(
cells = spe,
cellType = "cellType",
r = 50,
parentDf = treeDf,
cores = nCores
)
```

To save time we have saved the output of the results above.

```{r loadKontextual}
load(system.file("extdata", "kontextual.rda", package = "ScdneySpatial"))
```

```{r}
imageChoose = "H3"
spe |>
colData() |>
as.data.frame() |>
filter(imageID == imageChoose) |>
filter(cellType %in% c("SC", "TC_CD8", parentList$parent_4)) |>
mutate(cellType = case_when(
cellType %in% parentList$parent_4[parentList$parent_4 != "TC_CD8"] ~ "Immune",
TRUE ~ cellType
)) |>
mutate(cellType = factor(cellType, levels = c("TC_CD8", "SC", "Immune"))) |>
arrange(desc(cellType)) |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 1) +
scale_colour_manual(values = c("SC" = "#404040", "TC_CD8" = "#E25758", "Immune" = "#D7D8D8")) +
guides(colour = guide_legend(title = "Cell types", override.aes = list(size = 3)))
```

We can use the `spicyR::spicy` to test for associations between the `Kontextual` values and progression status. To do so we use the `alternateResult` arguement.

```{r kontextOutcome, fig.height = 5, fig.width = 10}
# Converting Kontextual result into data matrix
kontextMat <- prepMatrix(kontext)
# Replace NAs with 0
kontextMat[is.na(kontextMat)] <- 0
# Test association with outcome
kontextOutcome = spicyR::spicy(
cells = spe,
alternateResult = kontextMat,
condition = "group",
BPPARAM = BPPARAM
)
# Plot results
signifPlot(kontextOutcome)
```

# ClassifyR: Patient classification

[ClassifyR
Package](https://www.bioconductor.org/packages/release/bioc/html/ClassifyR.html)

[Journal
article](https://academic.oup.com/bioinformatics/article/31/11/1851/2365648?login=false)


Our ClassifyR package formalises a convenient framework for evaluating classification in R. We provide functionality to easily include four key modelling stages; Data transformation, feature selection, classifier training and prediction; into a cross-validation loop. Here we use the `crossValidate` function to perform 50 repeats of 5-fold cross-validation to evaluate the performance of a random forest applied to five quantifications of our IMC data; 1) Cell type proportions 2) Cell type localisation from `spicyR` 3) Region proportions from `lisaClust` 4) Cell type localisation in reference to a parent cell type from `Kontextual` 5) Cell changes in response to proximal changes from `SpatioMark`

```{r features}
# Create list to store data.frames
data <- list()
# Add proportions of each cell type in each image
data[["Proportions"]] <- getProp(spe, "cellType")
# Add pair-wise associations
# spicyMat <- bind(spicyTest)
# spicyMat[is.na(spicyMat)] <- 0
# spicyMat <- spicyMat |>
# select(!condition) |>
# tibble::column_to_rownames("imageID")
#
# data[["SpicyR"]] <- spicyMat
# Add proportions of each region in each image to the list of dataframes.
#data[["LisaClust"]] <- getProp(cells, "region")
# Add SpatioMark features
#data[["SpatioMark"]] <- distMat
# Add Kontextual features
data[["Kontextual"]] <- kontextMat
```


```{r classification}
# Set seed
set.seed(51773)
# Perform 5-fold cross-validation with 50 repeats.
cv <- crossValidate(
measurements = data,
outcome = outcome,
selectionMethod = "t-test",
classifier = "LASSOGLM",
nFolds = 5,
nFeatures = 50,
nRepeats = 50,
nCores = nCores
)
```

## Visualise cross-validated prediction performance

Here we use the `performancePlot` function to assess the AUC from each repeat of the 5-fold cross-validation.

```{r performancePlot}
performancePlot(
cv,
metric = "AUC",
characteristicsList = list(x = "Assay Name"),
orderingList = list("Assay Name" = c("Proportions", "SpicyR", "LisaClust", "Kontextual", "SpatioMark"))
)
```


::: question
**Questions**

1. Run the following chunk of code, what is this plot telling us?
2. Do different methods perform differently on different samples?
3. Knowing this, what might we do to improve our model?
:::



```{r sampleMetrics}
samplesMetricMap(cv) |>
plot()
```


```{r multiViewClassification}
cv <- crossValidate(
measurements = data,
outcome = outcome,
classifier = "randomForest",
nFolds = 5,
nRepeats = 50,
multiViewMethod = "merge",
nCores = nCores
)
performancePlot(
cv,
metric = "AUC",
characteristicsList = list(x = "Assay Name"),
)
```

0 comments on commit 5cfc4ed

Please sign in to comment.