Figure 14.9: Spatial distribution of the integrated Leiden clusters.
diff --git a/reference-keys.txt b/reference-keys.txt
index 713d906..729bff8 100644
--- a/reference-keys.txt
+++ b/reference-keys.txt
@@ -154,29 +154,29 @@ fig:unnamed-chunk-491
fig:unnamed-chunk-493
fig:unnamed-chunk-495
fig:unnamed-chunk-497
-fig:unnamed-chunk-557
fig:unnamed-chunk-558
-fig:unnamed-chunk-564
-fig:unnamed-chunk-571
-fig:unnamed-chunk-573
-fig:unnamed-chunk-579
-fig:unnamed-chunk-581
-fig:unnamed-chunk-587
-fig:unnamed-chunk-589
-fig:unnamed-chunk-663
-fig:unnamed-chunk-668
-fig:unnamed-chunk-671
+fig:unnamed-chunk-559
+fig:unnamed-chunk-565
+fig:unnamed-chunk-572
+fig:unnamed-chunk-574
+fig:unnamed-chunk-580
+fig:unnamed-chunk-582
+fig:unnamed-chunk-588
+fig:unnamed-chunk-590
+fig:unnamed-chunk-664
+fig:unnamed-chunk-669
fig:unnamed-chunk-672
-fig:unnamed-chunk-675
+fig:unnamed-chunk-673
fig:unnamed-chunk-676
-fig:unnamed-chunk-678
-fig:unnamed-chunk-680
-fig:unnamed-chunk-682
-fig:unnamed-chunk-684
-fig:unnamed-chunk-688
-fig:unnamed-chunk-691
-fig:unnamed-chunk-693
-fig:unnamed-chunk-695
+fig:unnamed-chunk-677
+fig:unnamed-chunk-679
+fig:unnamed-chunk-681
+fig:unnamed-chunk-683
+fig:unnamed-chunk-685
+fig:unnamed-chunk-689
+fig:unnamed-chunk-692
+fig:unnamed-chunk-694
+fig:unnamed-chunk-696
giotto-suite-workshop-2024
instructors
topics-and-schedule
@@ -515,3 +515,8 @@ visualizing-clustering
cropping-objects
contributing-to-giotto
contribution-guideline
+coding-style
+stat-functions
+auxiliary-functions
+package-imports
+python-code
diff --git a/search_index.json b/search_index.json
index ec5e681..a8d87cf 100644
--- a/search_index.json
+++ b/search_index.json
@@ -1 +1 @@
-[["index.html", "Workshop: Spatial multi-omics data analysis with Giotto Suite 1 Giotto Suite Workshop 2024 1.1 Instructors 1.2 Topics and Schedule: 1.3 License", " Workshop: Spatial multi-omics data analysis with Giotto Suite Ruben Dries, Jiaji George Chen, Joselyn Cristina Chávez-Fuentes, Junxiang Xu ,Edward Ruiz, Jeff Sheridan, Iqra Amin, Wen Wang 1 Giotto Suite Workshop 2024 Workshop: Spatial multi-omics data analysis with Giotto Suite Github repo: https://github.com/drieslab/giotto_workshop_2024/ Giotto Suite Website: http://www.giottosuite.com Twitter/X: https://x.com/GiottoSpatial Code repo: https://github.com/drieslab/Giotto Issues page: https://github.com/drieslab/Giotto/issues Discussions page: https://github.com/drieslab/Giotto/discussions 1.1 Instructors Ruben Dries: Assistant Professor of Medicine at Boston University Joselyn Cristina Chávez Fuentes: Postdoctoral fellow at Icahn School of Medicine at Mount Sinai Jiaji George Chen: Ph.D. student at Boston University Junxiang Xu: Ph.D. student at Boston University Edward C. Ruiz: Ph.D. 
student at Boston University Jeff Sheridan: Postdoctoral fellow at Boston University Iqra Amin: Bioinformatician at Boston University Wen Wang: Postdoctoral fellow at Icahn School of Medicine at Mount Sinai 1.2 Topics and Schedule: Day 1: Introduction Spatial omics technologies Spatial sequencing Spatial in situ Spatial proteomics spatial other: ATAC-seq, lipidomics, etc Introduction to the Giotto package Ecosystem Installation + python environment Giotto instructions Data formatting and Pre-processing Creating a Giotto object From matrix + locations From subcellular raw data (transcripts or images) + polygons Using convenience functions for popular technologies (Vizgen, Xenium, CosMx, …) Spatial plots Subsetting: Based on IDs Based on locations Visualizations Introduction to spatial multi-modal dataset (10X Genomics breast cancer) and goal for the next days Quality control Statistics Normalization Feature selection: Highly Variable Features: loess regression binned pearson residuals Spatial variable genes Dimension Reduction PCA UMAP/t-SNE Visualizations Clustering Non-spatial k-means Hierarchical clustering Leiden/Louvain Spatial Spatial variable genes Spatial co-expression modules Day 2: Spatial Data Analysis Spatial sequencing based technology: Visium Differential expression Enrichment & Deconvolution PAGE/Rank SpatialDWLS Visualizations Interactive tools Spatial expression patterns Spatial variable genes Spatial co-expression modules Spatial HMRF Spatial sequencing based technology: Visium HD Tiling and aggregation Scalability (duckdb) and projection functions Spatial expression patterns Spatial co-expression module Spatial in situ technology: Xenium Read in raw data Transcript coordinates Polygon coordinates Visualizations Overlap txs & polygons Typical aggregated workflow Feature/molecule specific analysis Visualizations Transcript enrichment GSEA Spatial location analysis Spatial cell type co-localization analysis Spatial niche analysis Spatial niche 
trajectory analysis Visualizations Spatial proteomics: multiplex IF Read in raw data Intensity data (IF or any other image) Polygon coordinates Visualizations Overlap intensity & workflows Typical aggregated workflow Visualizations Day 3: Advanced Tutorials Multiple samples Create individual giotto objects Join Giotto Objects Perform Harmony and default workflows Visualizations Spatial multi-modal Co-registration of datasets Examples in giotto suite manuscript Multi-omics integration Example in giotto suite manuscript Interoperability w/ other frameworks AnnData/SpatialData SpatialExperiment Seurat Interoperability w/ isolated tools Spatial niche trajectory analysis Interactivity with the R/Spatial ecosystem Kriging Contributing to Giotto 1.3 License This material has a Creative Commons Attribution-ShareAlike 4.0 International License. To get more information about this license, visit http://creativecommons.org/licenses/by-sa/4.0/ "],["datasets-packages.html", "2 Datasets & Packages 2.1 Datasets to download 2.2 Needed packages", " 2 Datasets & Packages 2.1 Datasets to download Here we provide links to the original datasets that were used for this workshop. Some of the datasets were modified (e.g. downsampled or subsetted) for the purpose of this workshop. 
You can download them from their original source or download all of them - including intermediate files - from the following Zenodo repository: 2.1.1 Zenodo repository https://zenodo.org/communities/gw2024/ 2.1.2 10X Genomics Visium Mouse Brain Section (Coronal) dataset https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Adult_Mouse_Brain 2.1.3 10X Genomics Visium HD: FFPE Human Colon Cancer https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-human-crc 2.1.4 10X Genomics multi-modal dataset https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast 2.1.5 10X Genomics multi-omics Visium CytAssist Human Tonsil dataset https://www.10xgenomics.com/resources/datasets/gene-protein-expression-library-of-human-tonsil-cytassist-ffpe-2-standard 2.1.6 10X Genomics Human Prostate Cancer Adenocarcinoma with Invasive Carcinoma (FFPE) https://www.10xgenomics.com/datasets/human-prostate-cancer-adenocarcinoma-with-invasive-carcinoma-ffpe-1-standard-1-3-0 2.1.7 10X Genomics Normal Human Prostate (FFPE) https://www.10xgenomics.com/datasets/normal-human-prostate-ffpe-1-standard-1-3-0 2.1.8 Xenium https://www.10xgenomics.com/datasets/preview-data-ffpe-human-lung-cancer-with-xenium-multimodal-cell-segmentation-1-standard 2.1.9 MERFISH cortex dataset https://doi.brainimagelibrary.org/doi/10.35077/g.21 2.1.10 Lunaphore IF dataset https://zenodo.org/records/13175721 2.2 Needed packages To run all the tutorials from this Giotto Suite workshop you will need to install additional R and Python packages. Here we provide detailed instructions and discuss some common difficulties with installing these packages. The easiest way is to copy each code snippet into your R/RStudio console using a fresh R session. 
2.2.1 CRAN dependencies: cran_dependencies <- c("BiocManager", "devtools", "pak") install.packages(cran_dependencies, Ncpus = 4) 2.2.2 terra installation terra may have some additional steps when installing depending on which system you are on. Please see the terra repo for specifics. Installations of the CRAN release on Windows and Mac are expected to be simple, only requiring the code below. For Linux, there are several prerequisite installs: GDAL (>= 2.2.3), GEOS (>= 3.4.0), PROJ (>= 4.9.3), sqlite3 On our AlmaLinux 8 HPC, the following versions have been working well: gdal/3.6.4 geos/3.11.1 proj/9.2.0 sqlite3/3.37.2 install.packages("terra") 2.2.3 Matrix installation !! FOR R VERSIONS LOWER THAN 4.4.0 !! Giotto requires Matrix 1.6-2 or greater, but when installing Giotto with pak on an R version lower than 4.4.0, the installation can fail asking for R 4.5 which doesn’t exist yet. We can solve this by installing the 1.6-5 version directly by un-commenting and running the line below. # devtools::install_version("Matrix", version = "1.6-5") 2.2.4 Rtools installation Before installing Giotto on a windows PC please make sure to install the relevant version of Rtools. If you have a Mac or linux PC, or have already installed Rtools, please ignore this step. 2.2.5 Giotto installation pak::pak("drieslab/Giotto") pak::pak("drieslab/GiottoData") 2.2.6 irlba install Reinstall irlba from source. Avoids the common function 'as_cholmod_sparse' not provided by package 'Matrix' error. See this issue for more info. install.packages("irlba", type = "source") 2.2.7 arrow install arrow is a suggested package that we use here to open parquet files. The parquet files that 10X provides use zstd compression which the default arrow installation may not provide. 
has_arrow <- requireNamespace("arrow", quietly = TRUE) zstd <- TRUE if (has_arrow) { zstd <- arrow::arrow_info()$capabilities[["zstd"]] } if (!has_arrow || !zstd) { Sys.setenv(ARROW_WITH_ZSTD = "ON") install.packages(c("assertthat", "bit64")) install.packages("arrow", repos = c("https://apache.r-universe.dev")) } 2.2.8 Bioconductor dependencies: bioc_dependencies <- c( "scran", "ComplexHeatmap", "SpatialExperiment", "ggspavis", "scater", "nnSVG" ) 2.2.9 CRAN packages: needed_packages_cran <- c( "dplyr", "gstat", "hdf5r", "miniUI", "shiny", "xml2", "future", "future.apply", "exactextractr", "tidyr", "viridis", "quadprog", "Rfast", "pheatmap", "patchwork", "Seurat", "harmony", "scatterpie", "R.utils", "qs" ) pak::pkg_install(c(bioc_dependencies, needed_packages_cran)) 2.2.10 Packages from GitHub github_packages <- c( "satijalab/seurat-data" ) pak::pkg_install(github_packages) 2.2.11 Python environments # default giotto environment Giotto::installGiottoEnvironment() reticulate::py_install( pip = TRUE, envname = 'giotto_env', packages = c( "scanpy" ) ) # install another environment with py 3.8 for cellpose reticulate::conda_create(envname = "giotto_cellpose", python_version = 3.8) # .rs.restartR() reticulate::use_condaenv('giotto_cellpose') reticulate::py_install( pip = TRUE, envname = 'giotto_cellpose', packages = c( "pandas", "networkx", "python-igraph", "leidenalg", "scikit-learn", "cellpose", "smfishhmrf", 'tifffile', 'scikit-image' ) ) "],["spatial-omics-technologies.html", "3 Spatial omics technologies 3.1 Presentation 3.2 Short summary", " 3 Spatial omics technologies Ruben Dries August 5th 2024 3.1 Presentation 3.2 Short summary 3.2.1 Why do we need spatial omics technologies? Spatial omics allows us to examine the role of one or more cells within their normal context. This spatial context is typically organized at multiple length scales, and considers both adjacent neighboring cells and larger levels of tissue organization. 
Figure 3.1: Capturing tissue complexity with RNA-seq, scRNAseq, and Spatial Omics 3.2.2 What is spatial omics? Spatial omics is typically a combination of spatial sequencing and/or imaging together with understanding the obtained results through spatial data science. Figure 3.2: Spatial Omics Constituents 3.2.3 What are the main spatial omics technologies? The large majority - and most popular or accessible - spatial technologies are: - spatial antibody-multiplex proteomics - spatial multiplex in situ hybridization (ISH)-based transcriptomics - spatial sequencing-based transcriptomics Figure 3.3: Lewis et al. Nat Meth Review. Characteristics of spatial omics technologies 3.2.4 Other Spatial omics: ATAC-seq, CUT&Tag, lipidomics, etc A growing number of other spatial technologies exist that profile different types of molecular analytes. One example is using a deterministic barcoding approach (Rong Fan’s group) to explore open (ATAC-seq) or modified (CUT&Tag) chromatin in a spatially aware manner. Figure 3.4: Vandereyken et al. Nat Rev Genetics. Spatial deterministic barcoding for ATAC-seq and CUT&Tag 3.2.5 What are the different types of spatial downstream analyses? There is a large and diverse set of downstream spatial data analyses that take different data types and formats as input. Figure 3.5: Dries, R. et al. Genome Res. Downstream analysis in spatial data analysis. "],["introduction-to-the-giotto-package.html", "4 Introduction to the Giotto package 4.1 Presentation 4.2 Ecosystem 4.3 Installation + python environment 4.4 Giotto instructions", " 4 Introduction to the Giotto package Ruben Dries & Jiaji George Chen August 5th 2024 4.1 Presentation 4.2 Ecosystem Giotto Suite is a modular ecosystem of individual R packages that each provide different functionality and that together provide users with a fully integrated spatial multi-omics workflow. 
Figure 4.1: Overview of the modular Giotto Suite ecosystem Each package also has its own website: - GiottoUtils: https://drieslab.github.io/GiottoUtils/ - GiottoClass: https://drieslab.github.io/GiottoClass/ - GiottoData: https://drieslab.github.io/GiottoData/ - GiottoVisuals: https://drieslab.github.io/GiottoVisuals/ More information is available at https://drieslab.github.io/Giotto_website/articles/ecosystem.html 4.3 Installation + python environment 4.3.1 Giotto installation Giotto Suite is currently installable only from GitHub, but we are actively working on getting it into a major repository. Much of this is already covered in Section 2.2, but the highlights are: 4.3.1.1 System prerequisites On Windows, Rtools needs to be installed. On Linux, a major dependency, terra, needs GDAL (>= 2.2.3), GEOS (>= 3.4.0), PROJ (>= 4.9.3), and sqlite3. 4.3.1.2 Installation of released version To install the currently released version of Giotto in a single step: pak::pak("drieslab/Giotto") This should automatically install all the Giotto dependencies and other Giotto module packages (main branch). 4.3.1.3 Installation of dev branch Giotto packages pak tends to forcibly install all dependencies, which can cause issues when working with multiple dev branch packages. You can install dev branch versions by using devtools::install_github() instead. Core module dev branches: \"drieslab/Giotto@suite_dev\" \"drieslab/GiottoVisuals@dev\" \"drieslab/GiottoClass@dev\" \"drieslab/GiottoUtils@dev\" devtools::install_github("drieslab/GiottoClass@dev") 4.3.1.4 Common install issues If installing on an R version earlier than 4.4, pak can throw errors when installing Matrix. To get around this, install Matrix v1.6-5 first; installing Giotto with pak should then work. devtools::install_version("Matrix", version = "1.6-5") If you come across the function 'as_cholmod_sparse' not provided by package 'Matrix' error when running Giotto, reinstalling irlba from source may resolve it. 
install.packages("irlba", type = "source") 4.3.2 Python environment 4.3.2.1 Default installation In order to make use of python packages, the first thing to do after installing Giotto for the first time is to create a giotto python environment. Giotto provides the following as a convenience wrapper around reticulate functions to setup a default environment. library(Giotto) installGiottoEnvironment() Two things are needed for python to work: A conda (e.g. miniconda or anaconda) installation which is the package and environment management system. Independent environment(s) with specific versions of the python language and associated python packages. installGiottoEnvironment() checks both and will install miniconda using reticulate if necessary. If a specific conda binary already exists that you want to use, the conda param can be set, or you can set the reticulate option options(\"reticulate.conda_binary\" = \"[conda path]\") or Sys.setenv(\"RETICULATE_CONDA\" = \"[conda path]\"). After ensuring the conda binary exists, the default Giotto environment is installed which is a python 3.10.2 environment named ‘giotto_env’. It will contain several default packages that Giotto installs: “pandas==1.5.1” “networkx==2.8.8” “python-igraph==0.10.2” “leidenalg==0.9.0” “python-louvain==0.16” “python.app==1.4” (if needed) “scikit-learn==1.1.3” 4.3.2.2 Custom installs Custom python environments can be made by first setting up a new environment and establishing the name and python version to use. reticulate::conda_create(envname = "[name of env]", python_version = ???) Following that, one or more python packages to install can be added to the environment. reticulate::py_install( pip = TRUE, envname = '[name of env]', packages = c( "package1", "package2", "..." ) ) Once an environment has been set up, Giotto can hook into it. 4.3.2.3 Using a specific environment When using python through reticulate, R only allows one environment to be activated per session. 
Once a session has loaded a python environment, it can no longer switch to another one. Giotto activates a python environment when any of the following happens: a giotto object is created giottoInstructions are created (createGiottoInstructions()) GiottoClass::set_giotto_python_path() is called (most straightforward) Which environment is activated is based on a set of 5 defaults in decreasing priority. User provided (when python_path param is given. Either a full filepath or an env name are accepted.) Any provided path or envname set in options options(\"giotto.py_path\" = \"[path to env or envname]\") Default expected giotto environment location based on reticulate::miniconda_path() Envname \"giotto_env\" System default python environment Method 2 is most recommended when there is a non-standard python environment to regularly use with Giotto. You would run file.edit(\"~/.Rprofile\") and then add options(\"giotto.py_path\" = \"[path to env or envname]\") as a line so that it is automatically set at the start of each session. If a specific environment should only be used a couple times then method 1 is easiest: GiottoClass::set_giotto_python_path(python_path = "[path to env or envname]") To check which conda environments exist on your machine: reticulate::conda_list() Once an environment is activated, you can check more details and ensure that it is the one you are expecting by running: reticulate::py_config() 4.4 Giotto instructions Giotto uses giottoInstructions in order to set a behavior for a particular giotto object. Most commonly used are: python_path - when set, will activate a python environment save_dir - save directory to use. Usually for plots generated. This can help speed things up since the viewer no longer has to render. save_plot - whether to save plots to the save_dir return_plot - whether to return the plot objects. 
When FALSE, only NULL is returned show_plot - whether to show the plot in the viewer These objects are created with createGiottoInstructions() and the created objects can be edited afterwards using the instructions() generic function. library(Giotto) save_dir <- "results/01_session2/" # this call will also initialize the python env instrs <- createGiottoInstructions( save_dir = save_dir, # working directory is the default show_plot = FALSE, save_plot = TRUE, return_plot = FALSE, python_path = NULL # when NULL, this calls GiottoClass::set_giotto_python_path() to get the default ) force(instrs) Giotto object creation functions all have an instructions param for passing in instructions objects. giotto objects will also respond to the instructions() generic. test <- giotto(instructions = instrs) # passing NULL instead will also generate a default instructions object # example plot g <- GiottoData::loadGiottoMini("visium") instructions(g) <- instrs instructions(g, "show_plot") # instructions say not to plot to viewer spatPlot2D(g, show_image = TRUE, image_name = "image") # instead it will directly write to the results folder As an example, you can also set individual instructions instructions(g, "show_plot") <- TRUE spatPlot2D(g, show_image = TRUE, image_name = "image") Figure 4.2: example image output "],["data-formatting-and-pre-processing.html", "5 Data formatting and Pre-processing 5.1 Data formats 5.2 Pre-processing 5.3 Subobject utility functions", " 5 Data formatting and Pre-processing Jiaji George Chen August 5th 2024 5.1 Data formats There are many kinds of outputs and data formats that are currently being used in the spatial omics field for storage and dissemination of information. The following are some that we commonly work with. For Giotto, much of the data wrangling task is to get the information read in from these formats into R native formats and wrapped as Giotto subobjects. 
The subobjects then enforce formatting and allow the data types to behave as building blocks of the giotto object 5.1.1 General formats .csv/.tsv are standard delimited filetypes, where the values are separated by commas (.csv), tabs (.tsv). These can be read in with a wide array of functions and packages: utils::read.delim(), readr::read_delim(), data.table::fread() etc. They are easy to use, but large files are hard to scan through. 5.1.2 Matrix formats 10X regularly provides their cell feature counts matrices in both the .mtx (matrix market or MM) and .h5 formats. The MM formats come in a zipped folder. Within, the structure is usually ├── barcodes.tsv.gz ├── features.tsv.gz └── matrix.mtx.gz MM format by itself does not carry dimnames so they are stored in .tsv files for the barcodes (cells/observations) and features. barcodes.tsv.gz from a Xenium dataset V1 <char> 1: aaaadpbp-1 2: aaaaficg-1 3: aaabbaka-1 4: aaabbjoo-1 5: aaablchg-1 --- 162250: ojaaphhh-1 162251: ojabeldf-1 162252: ojacfbid-1 162253: ojacfhhg-1 162254: ojacpeii-1 features.tsv.gz from a Xenium dataset V1 V2 V3 <char> <char> <char> 1: ENSG00000121270 ABCC11 Gene Expression 2: ENSG00000130234 ACE2 Gene Expression 3: ENSG00000213088 ACKR1 Gene Expression 4: ENSG00000107796 ACTA2 Gene Expression 5: ENSG00000163017 ACTG2 Gene Expression --- 537: UnassignedCodeword_0495 UnassignedCodeword_0495 Unassigned Codeword 538: UnassignedCodeword_0496 UnassignedCodeword_0496 Unassigned Codeword 539: UnassignedCodeword_0497 UnassignedCodeword_0497 Unassigned Codeword 540: UnassignedCodeword_0498 UnassignedCodeword_0498 Unassigned Codeword 541: UnassignedCodeword_0499 UnassignedCodeword_0499 Unassigned Codeword The matrix.mtx file then contains the actual sparse matrix values in triplet format. The .h5 format is very similar, except that it is a hierarchical format that contains all three of these items in the same file. 
Giotto provides get10Xmatrix() and get10Xmatrix_h5() as convenient functions to open these exports and read them in as one or more Matrix sparse representations. 5.1.3 Tabular formats .parquet is a great format for storing large amounts of table information and providing fast access to only portions of the data at a time. 10X is using this format for things such as the table of all transcripts detections in Xenium or the polygons. They can be opened and worked with using arrow and dplyr verbs. Currently, giotto extracts information from these files and then converts them to in-memory data.tables or terra SpatVectors depending on what data they contain. 5.1.4 Spatial formats .shp and .geojson are common formats for polygon and point data. They are commonly used as exports from segmentation software such as QuPath. GiottoClass::createGiottoPolygon() and the more specific createGiottoPolygonsFromGeoJSON() can be used for reading these in. library(GiottoClass) shp <- system.file("extdata/toy_poly.shp", package = "GiottoClass") gpoly <- createGiottoPolygon(shp, name = "test") plot(gpoly) Figure 5.1: Plot of giottoPolygon from .shp 5.1.5 Mask files .tif files can be used as mask files where the integer values of the image encode where an annotation is. createGiottoPolygonsFromMask() guesses whether the image is single value or multi value mask. NanoString CosMx is one example of a platform that distributes the polygon information through a series of mask files. m <- system.file("extdata/toy_mask_multi.tif", package = "GiottoClass") plot(terra::rast(m), col = grDevices::hcl.colors(7)) Figure 5.2: Example mask image. 
Integer values are shown as different colors gp <- createGiottoPolygon( m, flip_vertical = FALSE, flip_horizontal = FALSE, shift_horizontal_step = FALSE, shift_vertical_step = FALSE, ID_fmt = "id_test_%03d", name = "test" ) force(gp) An object of class giottoPolygon spat_unit : "test" Spatial Information: class : SpatVector geometry : polygons dimensions : 7, 1 (geometries, attributes) extent : 3, 27, 1.04, 11.96 (xmin, xmax, ymin, ymax) coord. ref. : centroids : NULL overlaps : NULL plot(gp, col = grDevices::hcl.colors(7)) Figure 5.3: giottoPolygon from mask image. Identical coloring order implies that encoded IDs have been properly imported. For situations where all pixel values are the same, but not touching indicates different annotations: m2 <- system.file("extdata/toy_mask_single.tif", package = "GiottoClass") plot(terra::rast(m2), col = grDevices::hcl.colors(7)) Figure 5.4: Example mask image with only 1 value gpoly1 <- createGiottoPolygonsFromMask( m2, flip_vertical = FALSE, flip_horizontal = FALSE, shift_horizontal_step = FALSE, shift_vertical_step = FALSE, ID_fmt = "id_test_%03d", name = "multi_test" ) plot(gpoly1, col = grDevices::hcl.colors(7)) Figure 5.5: giottoPolygon from single value mask 5.1.6 images Most images are openable using createGiottoLargeImage() which wraps terra::rast(). This allows compatibility with most common image types. Recent and non-geospatially related image formats are not well supported however. One example is ome.tif which 10X uses for large image exports from Xenium. For these, we use ometif_to_tif() to convert them into normal .tif files using the python tifffile package. ometif_metadata() can be used to extract and access the associated ome xml image metadata. 5.1.7 jsonlike formats jsonlike formats are ones that can be read in with jsonlite::read_json() and then coerced into list-like or tabular structures. 10X uses these .json to report the scalefactors information in Visium datasets. 
The .xenium file format is also openable as a json-like. 5.1.8 Hierarchical formats There are many types of data in spatial-omics analysis. Hierarchical formats afford both a way to organize complex multi-type data and also to store and distribute them. In R, these can be opened with either hdf5r on CRAN or rhdf5 on BioConductor. The complex nature of these formats and also the fact they are just a storage format and not an organizational specification means that what data and how it is stored and represented can often be very different. .gef and .bgef which StereoSeq exports are .hdf5-like formats. .h5ad is a specific flavor of these file formats where they follow the AnnData framework so that there is more common structure in how datasets are stored. Giotto provides anndataToGiotto() and giottoToAnnData() interoperability functions for interconverting. .zarr is another hierarchical storage structure, however currently the R-native support is still being developed. 5.2 Pre-processing The most common types of raw data needed for a Giotto object are expression matrices, centroids information, spatial feature points, polygons. Evaluation of input data and conversion to compatible formats happens inside the create* functions that Giotto exports. There is one of these for each of the subobject classes. 5.2.1 Expression matrix Not much processing is needed for matrices. All that is needed is a data type that is coercible to matrix (or Matrix classes). Dimnames should be added. Columns should be cells or observations. Rows should be features or variables. m <- matrix(sample(c(rep(1, 10), rep(0, 90))), nrow = 10) rownames(m) <- sprintf("feat_%02d", seq(10)) colnames(m) <- sprintf("cell_%02d", seq(10)) x <- createExprObj(m) An object of class exprObj : "test" spat_unit : "cell" feat_type : "rna" contains: 10 x 10 sparse Matrix of class "dgCMatrix" feat_01 . 1 1 1 . . . . . . feat_02 . . . . . . . . . . feat_03 . . . . . 1 . . . . feat_04 1 . . . 1 . . . . . 
........suppressing 2 rows in show(); maybe adjust options(max.print=, width=) feat_07 . 1 . . . 1 . . . 1 feat_08 . . . . . . . . . . feat_09 . . . . . . . . . . feat_10 . . . . . . . . . . First four colnames: cell_01 cell_02 cell_03 cell_04 5.2.2 Spatial locations For Giotto centroid locations, a tabular data.frame-like format is required. The first non-numeric column found will be set as the cell_ID. The numeric columns will then be kept as coordinates information. set.seed(1234) xy <- data.frame( a = as.character(seq(100)), b = rnorm(100), c = rnorm(100) ) sl_xy <- createSpatLocsObj(xy) plot(sl_xy) Figure 5.6: Plot of spatLocsObj created from xy information set.seed(1234) xyz <- data.frame( a = as.character(seq(100)), b = rnorm(100), c = rnorm(100), d = rnorm(100) ) sl_xyz <- createSpatLocsObj(xyz) plot(sl_xyz) Figure 5.7: Plot of spatLocsObj created from xy and z information 5.2.3 giottoPoints giottoPoints are very similar. These subobjects wrap a terra SpatVector object and if tabular data is provided, what is needed are x, y, and feature ID. Additional columns are kept as metadata information. set.seed(1234) tx <- data.frame( id = sprintf("gene_%05d", seq(1e4)), x = rnorm(1e4), y = rnorm(1e4), meta = sprintf("metadata_%05d", seq(1e4)) ) gpoints <- createGiottoPoints(tx) plot(gpoints, raster = FALSE) plot(gpoints, dens = TRUE) An object of class giottoPoints feat_type : "rna" Feature Information: class : SpatVector geometry : points dimensions : 10000, 3 (geometries, attributes) extent : -3.396064, 3.618107, -4.126628, 3.727291 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID meta feat_ID_uniq type : <chr> <chr> <int> values : gene_00001 metadata_00001 1 gene_00002 metadata_00002 2 gene_00003 metadata_00003 3 Figure 5.8: giottoPoints plotted without rasterization (left), with rasterization and colored by density (right) 5.2.4 giottoPolygon Polygon information is often provided as a known spatial format or as image masks, which can be read in as shown earlier. However, they can also be provided as numerical values. This is the case for Vizgen MERSCOPE and 10X Xenium outputs, both of which now use .parquet to provide cell barcodes and xy vertices associated with them. set.seed(1234) hex <- hexVertices(radius = 1) spatlocs <- data.table::data.table( sdimx = rnorm(10, mean = 5, sd = 20), sdimy = rnorm(10, mean = 5, sd = 20), cell_ID = paste0("spot_", seq_len(10)) ) random_hex <- polyStamp(hex, spatlocs) random_hex_poly <- createGiottoPolygon(random_hex) plot(random_hex_poly) Figure 5.9: giottoPolygon created from ID and vertices 5.3 Subobject utility functions The giotto object is hierarchically organized first by slots that define their subobject/information type, then usually by which spatial unit and feature type information they contain. Lastly, they have specific object names. This makes the object easy to explore manually. Most of the subobjects are tagged with metadata information that allows them to find their place within this nesting, and there are also common functions that giotto subobjects respond to. 5.3.1 IDs spatIDs() and featIDs() are used to find the spatial or feature IDs of an object. 
spatIDs(sl_xy) [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" [13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" [25] "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" [37] "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" [49] "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60" [61] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" [73] "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84" [85] "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" [97] "97" "98" "99" "100" spatIDs(gpoly) "a" "b" "c" "d" "e" "f" "g" head(featIDs(gpoints)) "gene_00001" "gene_00002" "gene_00003" "gene_00004" "gene_00005" "gene_00006" 5.3.2 Bracket subsetting and extraction Most of the subobjects also respond to indexing with [, but since many of them are wrappers around an underlying data structure, empty [ calls will drop the object to the contained data structure gpoly[1:2] An object of class giottoPolygon spat_unit : "test" Spatial Information: class : SpatVector geometry : polygons dimensions : 2, 2 (geometries, attributes) extent : 3.015771, 12, 1.003947, 6.996053 (xmin, xmax, ymin, ymax) coord. ref. : names : poly_ID idx type : <chr> <int> values : a 10 b 9 centroids : NULL overlaps : NULL gpoly[c("a", "e")] An object of class giottoPolygon spat_unit : "test" Spatial Information: class : SpatVector geometry : polygons dimensions : 2, 2 (geometries, attributes) extent : 3.015771, 27, 1.003947, 6.996053 (xmin, xmax, ymin, ymax) coord. ref. : names : poly_ID idx type : <chr> <int> values : a 10 e 6 gpoints[] class : SpatVector geometry : points dimensions : 10000, 3 (geometries, attributes) extent : -3.396064, 3.618107, -4.126628, 3.727291 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID meta feat_ID_uniq type : <chr> <chr> <int> values : gene_00001 metadata_00001 1 gene_00002 metadata_00002 2 gene_00003 metadata_00003 3 5.3.3 Nesting metadata generics spatUnit(), featType(), objName(), prov() are all generics that act on the metadata of the subobjects. They work both to access and replace the information. featType(x) [1] "rna" objName(x) <- "raw2" spatUnit(x) <- "aggregate" force(x) An object of class exprObj : "raw2" spat_unit : "aggregate" feat_type : "rna" contains: 10 x 10 sparse Matrix of class "dgCMatrix" feat_01 . 1 1 1 . . . . . . feat_02 . . . . . . . . . . feat_03 . . . . . 1 . . . . feat_04 1 . . . 1 . . . . . ........suppressing 2 rows in show(); maybe adjust options(max.print=, width=) feat_07 . 1 . . . 1 . . . 1 feat_08 . . . . . . . . . . feat_09 . . . . . . . . . . feat_10 . . . . . . . . . . First four colnames: cell_01 cell_02 cell_03 cell_04 5.3.4 Appending to a giotto object Subobjects are formatted for Giotto and can be added directly to the giotto object using the setGiotto() generic. 
# initialize an empty object g <- giotto() g <- setGiotto(g, x) force(g) An object of class giotto >Active spat_unit: aggregate >Active feat_type: rna [SUBCELLULAR INFO] [AGGREGATE INFO] expression ----------------------- [aggregate][rna] raw2 Use objHistory() to see steps and params used "],["creating-a-giotto-object.html", "6 Creating a Giotto object 6.1 Overview 6.2 GiottoData modular package 6.3 From matrix + locations 6.4 From subcellular raw data (transcripts or images) + polygons 6.5 From piece-wise 6.6 Using convenience functions for popular technologies (Vizgen, Xenium, CosMx, …) 6.7 Plotting 6.8 Subsetting", " 6 Creating a Giotto object Jiaji George Chen August 5th 2024 6.1 Overview The minimal amount of raw data needed to put together a fully functional giotto object is either of the following: spatial coordinates (centroids) and expression matrix information spatial feature information (points or image intensity values) and spatial annotations to aggregate that feature information with (polygons/mask). You can either use the create* style functions introduced in the previous session and build up the object piecewise, or you can use the giotto object constructor functions createGiottoObject() and createGiottoObjectSubcellular(). 6.2 GiottoData modular package We can showcase the construction of objects by pulling some raw data from the GiottoData package. A dataset was loaded from here earlier in the previous section, but to formally introduce it, this package contains mini datasets and also download links to other publicly available datasets. It helps with prototyping and development and also with making reproducible examples. The mini examples from popular platform datasets can also help give an understanding of what their data is like and how Giotto represents them. 6.3 From matrix + locations For this, we will load some visium expression information and spatial locations. 
library(Giotto) # function to get a filepath from GiottoData mini_vis_raw <- function(x) { system.file( package = "GiottoData", file.path("Mini_datasets", "Visium", "Raw", x) ) } mini_vis_expr <- mini_vis_raw("visium_DG_expr.txt.gz") |> data.table::fread() |> GiottoUtils::dt_to_matrix() mini_vis_expr[seq(5), seq(5)] 5 x 5 sparse Matrix of class "dgCMatrix" AAAGGGATGTAGCAAG-1 AAATGGCATGTCTTGT-1 AAATGGTCAATGTGCC-1 AAATTAACGGGTAGCT-1 AACAACTGGTAGTTGC-1 Gna12 1 2 1 1 9 Ccnd2 . 1 1 . . Btbd17 . 1 1 1 . Sox9 . . . . . Sez6 . 1 4 3 . mini_vis_slocs <- mini_vis_raw("visium_DG_locs.txt") |> data.table::fread() head(mini_vis_slocs) V1 V2 <int> <int> 1: 5477 -4125 2: 5959 -2808 3: 4720 -5202 4: 5202 -5322 5: 4101 -4604 6: 5821 -3047 With these two pieces of data, we can make a fully working giotto object. The spatial locations are missing cell_ID names, but they will be detected from the expression information. mini_vis <- createGiottoObject( expression = mini_vis_expr, spatial_locs = mini_vis_slocs ) instructions(mini_vis, "return_plot") <- FALSE # set return_plot = FALSE otherwise we will get duplicate outputs in code chunks For a simple example plot: spatFeatPlot2D(mini_vis, feats = c("Gna12", "Gfap"), expression_values = "raw", point_size = 2.5, gradient_style = "sequential", background_color = "black" ) Figure 6.1: Example spatial feature plot to show functioning object 6.4 From subcellular raw data (transcripts or images) + polygons You can also make giotto objects starting from raw spatial feature information and annotations that give them spatial context. 
# function to get a filepath from GiottoData mini_viz_raw <- function(x) { system.file( package = "GiottoData", file.path("Mini_datasets", "Vizgen", "Raw", x) ) } mini_viz_dt <- mini_viz_raw(file.path("cell_boundaries", "z0_polygons.gz")) |> data.table::fread() mini_viz_poly <- createGiottoPolygon(mini_viz_dt) force(mini_viz_poly) An object of class giottoPolygon spat_unit : "cell" Spatial Information: class : SpatVector geometry : polygons dimensions : 498, 1 (geometries, attributes) extent : 6399.244, 6903.243, -5152.39, -4694.868 (xmin, xmax, ymin, ymax) coord. ref. : names : poly_ID type : <chr> values : 40951783403982682273285375368232495429 240649020551054330404932383065726870513 274176126496863898679934791272921588227 centroids : NULL overlaps : NULL plot(mini_viz_poly) Figure 6.2: Example MERSCOPE polygons loaded from vertex info mini_viz_tx <- mini_viz_raw("vizgen_transcripts.gz") |> data.table::fread() mini_viz_tx[, global_y := -global_y] # flip values to match polys viz_gpoints <- createGiottoPoints(mini_viz_tx) force(viz_gpoints) An object of class giottoPoints feat_type : "rna" Feature Information: class : SpatVector geometry : points dimensions : 80343, 3 (geometries, attributes) extent : 6400.037, 6900.032, 4699.979, 5149.983 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID global_z feat_ID_uniq type : <chr> <int> <int> values : Mlc1 0 1 Gprc5b 0 2 Gfap 0 3 plot(viz_gpoints) Figure 6.3: Example mini MERSCOPE transcripts data mini_viz <- createGiottoObjectSubcellular( gpolygons = mini_viz_poly, gpoints = viz_gpoints ) instructions(mini_viz, "return_plot") <- FALSE force(mini_viz) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell features : rna [AGGREGATE INFO] Use objHistory() to see steps and params used # calculate centroids mini_viz <- addSpatialCentroidLocations(mini_viz) # create aggregated information mini_viz <- calculateOverlap(mini_viz) mini_viz <- overlapToMatrix(mini_viz) spatFeatPlot2D( mini_viz, feats = c("Grm4", "Gfap"), expression_values = "raw", point_size = 2.5, gradient_style = "sequential", background_color = "black" ) Figure 6.4: Example mini MERSCOPE aggregated feature counts 6.5 From piece-wise You can also piece-wise assemble an object independently of one of the 2 previously shown convenience functions. g <- giotto() # initialize empty gobject g <- setGiotto(g, mini_viz_poly) g <- setGiotto(g, viz_gpoints) force(g) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell features : rna [AGGREGATE INFO] Use objHistory() to see steps and params used This is essentially the same object as the one created through createGiottoObjectSubcellular() earlier. 6.6 Using convenience functions for popular technologies (Vizgen, Xenium, CosMx, …) There are also several convenience functions we provide for loading in data from popular platforms. These functions take care of reading the expected output folder structures, auto-detecting where needed data items are, formatting items for ingestion, then object creation. Many of these will be touched on later during other sessions. 
createGiottoVisiumObject() createGiottoVisiumHDObject() createGiottoXeniumObject() createGiottoCosMxObject() createGiottoMerscopeObject() 6.7 Plotting 6.7.1 Subobject plotting Giotto has several spatial plotting functions. At the lowest level, you can directly call plot() on several subobjects in order to see what they look like, particularly the ones containing spatial info. Here we load several mini subobjects which are taken from the vizgen MERSCOPE mini dataset. To see which mini objects are available for independent loading with GiottoData::loadSubObjectMini(), you can run GiottoData::listSubObjectMini() gpoints <- GiottoData::loadSubObjectMini("giottoPoints") plot(gpoints) plot(gpoints, dens = TRUE, col = getColors("magma", 255)) plot(gpoints, raster = FALSE) plot(gpoints, feats = c("Grm4", "Gfap")) Figure 6.5: giottoPoints plots. Rasterized (top left), Rasterized and colored with ‘magma’ color scale by density (top right), Non-rasterized (bottom left), Plotting specifically 2 features (bottom right) gpoly <- GiottoData::loadSubObjectMini("giottoPolygon") plot(gpoly) plot(gpoly, type = "centroid") plot(gpoly, max_poly = 10) Figure 6.6: giottoPolygon plots. default (left), plotting centroids (middle), auto changing to centroids after there are more polygons to plot than max_poly param (right) spatlocs <- GiottoData::loadSubObjectMini("spatLocsObj") plot(spatlocs) Figure 6.7: Plot of spatLocsObj spatnet <- GiottoData::loadSubObjectMini("spatialNetworkObj") plot(spatnet) Figure 6.8: Plot of spatialNetworkObj pca <- GiottoData::loadSubObjectMini("dimObj") plot(pca, dims = c(3,10)) Figure 6.9: Plot of PCA dimObj showing the 3rd and 10th PCs 6.7.2 Additive subobject plotting These base plotting functions inherit from terra::plot(). They can be used additively with more than one object. 
gimg <- GiottoData::loadSubObjectMini("giottoLargeImage") plot(gimg, col = getMonochromeColors("#5FAFFF")) plot(gpoly, border = "maroon", lwd = 0.5, add = TRUE) Figure 6.10: Plot image with monochrome color scaling with added polygon borders 6.7.3 Giotto object plotting Giotto also has several ggplot2-based plotting functions that work on the whole giotto object. Here we load the vizgen mini dataset from GiottoData which contains a lot of worked through data. 6.7.3.1 Giotto spatial plot functions spatPlot() - standard centroid-based plotting geared towards metadata plotting g <- GiottoData::loadGiottoMini("vizgen") activeSpatUnit(g) <- "aggregate" # set default spat_unit to the one with lots of results force(g) An object of class giotto >Active spat_unit: aggregate >Active feat_type: rna [SUBCELLULAR INFO] polygons : z0 z1 aggregate features : rna [AGGREGATE INFO] expression ----------------------- [z0][rna] raw [z1][rna] raw [aggregate][rna] raw normalized scaled pearson spatial locations ---------------- [z0] raw [z1] raw [aggregate] raw spatial networks ----------------- [aggregate] Delaunay_network kNN_network spatial enrichments -------------- [aggregate][rna] cluster_metagene dim reduction -------------------- [aggregate][rna] pca umap tsne nearest neighbor networks -------- [aggregate][rna] sNN.pca attached images ------------------ images : 4 items... Use objHistory() to see steps and params used spatPlot2D(g) What metadata do we have in this mini object? 
pDataDT(g) cell_ID nr_feats perc_feats total_expr leiden_clus <char> <int> <num> <num> <num> 1: 240649020551054330404932383065726870513 5 1.483680 49.40986 2 2: 274176126496863898679934791272921588227 27 8.011869 191.50684 2 3: 323754550002953984063006506310071917306 23 6.824926 173.86955 4 4: 87260224659312905497866017323180367450 37 10.979228 246.04928 5 5: 17817477728742691260808256980746537959 18 5.341246 142.44520 4 --- 458: 6380671372744430258754116433861320161 54 16.023739 339.24383 2 459: 75286702783716447443887872812098770697 45 13.353116 286.81011 1 460: 9677424102111816817518421117250891895 30 8.902077 211.71790 2 461: 17685062374745280598492217386845129350 5 1.483680 48.99550 2 462: 32422253415776258079819139802733069941 12 3.560831 102.52805 2 louvain_clus <num> 1: 0 2: 3 3: 8 4: 6 5: 7 --- 458: 0 459: 23 460: 3 461: 14 462: 0 We have some expression count statistics and clustering annotations already present in the object spatPlot2D(g, cell_color = "leiden_clus") spatPlot2D(g, cell_color = "leiden_clus", show_image = TRUE, image_name = "dapi_z0") spatPlot2D(g, cell_color = "total_expr", color_as_factor = FALSE, gradient_style = "sequential") spatPlot2D(g, cell_color = "leiden_clus", group_by = "leiden_clus") Figure 6.11: Spatial plots spatCellPlot() - centroid-based plotting for spatial enrichment values We have a cluster_metagene enrichment already made in the object that is a numerical measure of how much each of the cells map to the leiden clusters we have above spatCellPlot2D(g, spat_enr_names = "cluster_metagene", cell_annotation_values = as.character(1:5)) Figure 6.12: Spatial cell plot of cluster_metagene spatial enrichments spatCellPlot2D(g, spat_enr_names = "cluster_metagene", cell_annotation_values = as.character(1:5), cell_color_gradient = "magma", background_color = "black") Figure 6.13: Spatial cell plot of cluster_metagene spatial enrichments spatFeatPlot() - centroid-based plotting for feature expression plotting spatFeatPlot2D(g, feats 
= c("Flt4", "Mertk"), point_size = 2, expression_values = "scaled") Figure 6.14: Spatial feature expression plot of normalized Flt4 (left) and Mertk expression (right) spatInSituPlotPoints() - subcellular plotting with support for transcript points and polygons spatInSituPlotPoints(g, feats = list(rna = c("Flt4", "Mertk", "Gfap")), # this should be a named list point_size = 0.5, polygon_fill = "total_expr", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", polygon_alpha = 0.5, plot_last = "points", show_image = TRUE ) # without overlaps spatInSituPlotPoints(g, feats = list(rna = c("Flt4", "Mertk", "Gfap")), # this should be a named list point_size = 0.5, use_overlap = FALSE, polygon_fill = "total_expr", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", polygon_alpha = 0.5, plot_last = "points", show_image = TRUE ) Figure 6.15: Points and polygons subcellular plot with 3 transcript species plotted, polygons colored as number of detected transcripts, and dapi image plotted. Left is with only the points overlapped by polygons, right is with all points 6.7.3.2 Giotto expression space plot functions dimPlot() - dimension reduction plotting Also has more specific functions for PCA plotPCA(), UMAP plotUMAP(), tSNE plotTSNE() results. dimPlot(g, dim_reduction_name = "umap", dim_reduction_to_use = "umap", cell_color = "leiden_clus") Figure 6.16: UMAP projection with leiden clustering colors 6.7.3.3 Giotto common plotting args gradient_style - Should the gradient be of ‘divergent’ or ‘sequential’ styles? color_as_factor - Is annotation value a numerical or factor/categorical based item to plot. cell_color_code - What color mapping to provide cell_color - What column of information to use when plotting (metadata, expression, etc.) point_shape - Either ‘border’ or ‘no_border’ to draw on the points. 
6.8 Subsetting 6.8.1 ID subsetting Subset the giotto object for a random 300 cell IDs cx <- pDataDT(g) nrow(cx) [1] 462 ex <- getExpression(g) dim(ex) [1] 337 462 instructions(g, "cell_color_c_pal") <- "viridis" instructions(g, "poly_color_c_pal") <- "viridis" set.seed(1234) gsubset <- subsetGiotto(g, cell_ids = sample(spatIDs(g), 300)) cx_sub <- pDataDT(gsubset) nrow(cx_sub) [1] 300 spatPlot(g, cell_color = "total_expr", color_as_factor = FALSE, background_color = "black") spatPlot(gsubset, cell_color = "total_expr", color_as_factor = FALSE, background_color = "black") Figure 6.17: plot showing starting object (left) and subset object (right) 6.8.2 Coordinate-based subsetting gsubsetlocs <- subsetGiottoLocs(g, x_min = 6500, x_max = 6700, poly_info = "aggregate" ) spatPlot(gsubsetlocs, cell_color = "total_expr", color_as_factor = FALSE, background_color = "black") spatInSituPlotPoints(gsubsetlocs, polygon_fill = "total_expr", polygon_fill_as_factor = FALSE) Figure 6.18: plot showing starting object (left) and subset object (right) "],["visium-part-i.html", "7 Visium Part I 7.1 The Visium technology 7.2 Introduction to the spatial dataset 7.3 Download dataset 7.4 Create the Giotto object 7.5 Subset on spots that were covered by tissue 7.6 Quality control 7.7 Filtering 7.8 Normalization 7.9 Feature selection 7.10 Dimension Reduction 7.11 Clustering 7.12 Save the object 7.13 Session info", " 7 Visium Part I Joselyn Cristina Chávez Fuentes August 5th 2024 7.1 The Visium technology Visium allows you to perform spatial transcriptomics, which combines histological information with whole transcriptome gene expression profiles (fresh frozen or FFPE) to provide you with spatially resolved gene expression. Figure 7.1: Visium workflow. 
Source: 10X Genomics You can use standard fixation and staining techniques, including hematoxylin and eosin (H&E) staining, to visualize tissue sections on slides using a brightfield microscope and immunofluorescence (IF) staining to visualize protein detection in tissue sections on slides using a fluorescent microscope. 7.2 Introduction to the spatial dataset The visium fresh frozen mouse brain tissue (Strain C57BL/6) dataset was obtained from 10X genomics. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols - Tissue Preparation Guide (Demonstrated Protocol CG000240). Tissue sections of 10 µm thickness from a slice of the coronal plane were placed on Visium Gene Expression Slides. You can find more information about this sample here 7.3 Download dataset You need to download the expression matrix and spatial information by running these commands: dir.create("data/01_session5") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz", destfile = "data/01_session5/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz", destfile = "data/01_session5/V1_Adult_Mouse_Brain_spatial.tar.gz") After downloading, extract the tar.gz archives. You should get the “raw_feature_bc_matrix” and “spatial” folders inside “data/01_session5/”. untar(tarfile = "data/01_session5/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz", exdir = "data/01_session5") untar(tarfile = "data/01_session5/V1_Adult_Mouse_Brain_spatial.tar.gz", exdir = "data/01_session5") 7.4 Create the Giotto object createGiottoVisiumObject() will look for the standardized file organization from the visium technology in the data folder and will automatically load the expression and spatial information to create the Giotto object. 
library(Giotto) ## Set instructions results_folder <- "results/01_session5" python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE, python_path = python_path ) ## Provide the path to the visium folder data_path <- "data/01_session5" ## Create object directly from the visium folder visium_brain <- createGiottoVisiumObject( visium_dir = data_path, expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instructions ) 7.5 Subset on spots that were covered by tissue Use the metadata column “in_tissue” to highlight the spots corresponding to the tissue area. spatPlot2D( gobject = visium_brain, cell_color = "in_tissue", point_size = 2, cell_color_code = c("0" = "lightgrey", "1" = "blue"), show_image = TRUE) Figure 7.2: Spatial plot of the Visium mouse brain sample, color indicates whether the spot is in tissue (1) or not (0). Use the same metadata column “in_tissue” to subset the object and keep only the spots corresponding to the tissue area. metadata <- getCellMetadata(gobject = visium_brain, output = "data.table") in_tissue_barcodes <- metadata[in_tissue == 1]$cell_ID visium_brain <- subsetGiotto(gobject = visium_brain, cell_ids = in_tissue_barcodes) 7.6 Quality control Statistics Use the function addStatistics() to count the number of features per spot. The statistics information will be stored in the metadata table under the new column “nr_feats”. Then, use this column to visualize the number of features per spot across the sample. visium_brain_statistics <- addStatistics(gobject = visium_brain, expression_values = "raw") ## visualize spatPlot2D(gobject = visium_brain_statistics, cell_color = "nr_feats", color_as_factor = FALSE) Figure 7.3: Spatial distribution of features per spot. filterDistributions() creates a histogram to show the distribution of features per spot across the sample. 
filterDistributions(gobject = visium_brain_statistics, detection = "cells") Figure 7.4: Distribution of features per spot. When setting the detection = “feats”, the histogram shows the distribution of cells with certain numbers of features across the sample. filterDistributions(gobject = visium_brain_statistics, detection = "feats") Figure 7.5: Distribution of cells with different features per spot. filterCombinations() may be used to test how different filtering parameters will affect the number of cells and features in the filtered data: filterCombinations(gobject = visium_brain_statistics, expression_thresholds = c(1, 2, 3), feat_det_in_min_cells = c(50, 100, 200), min_det_feats_per_cell = c(500, 1000, 1500)) Figure 7.6: Number of spots and features filtered when using multiple feat_det_in_min_cells and min_det_feats_per_cell combinations. 7.7 Filtering Use the arguments feat_det_in_min_cells and min_det_feats_per_cell to set the minimal number of cells where an individual feature must be detected and the minimal number of features per spot/cell, respectively, to filter the giotto object. All the features and cells under those thresholds will be removed from the sample. visium_brain <- filterGiotto( gobject = visium_brain, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1000, expression_values = "raw", verbose = TRUE ) Feature type: rna Number of cells removed: 4 out of 2702 Number of feats removed: 7311 out of 22125 7.8 Normalization Use scalefactor to set the scale factor to use after library size normalization. The default value is 6000, but you can use a different one. visium_brain <- normalizeGiotto( gobject = visium_brain, scalefactor = 6000, verbose = TRUE ) Calculate the normalized number of features per spot and save the statistics in the metadata table. 
visium_brain <- addStatistics(gobject = visium_brain) ## visualize spatPlot2D(gobject = visium_brain, cell_color = "nr_feats", color_as_factor = FALSE) Figure 7.7: Spatial distribution of the number of features per spot. 7.9 Feature selection 7.9.1 Highly Variable Features: Calculating Highly Variable Features (HVF) is necessary to identify genes (or features) that display significant variability across the spots. There are a few methods to choose from depending on the underlying distribution of the data: loess regression is used when the relationship between mean expression and variance is non-linear or can be described by a non-parametric model. visium_brain <- calculateHVF(gobject = visium_brain, method = "cov_loess", save_plot = TRUE, default_save_name = "HVFplot_loess") Figure 7.8: Covariance of HVFs using the loess method. pearson residuals are used for variance stabilization (to account for technical noise) and highlighting overdispersed genes. visium_brain <- calculateHVF(gobject = visium_brain, method = "var_p_resid", save_plot = TRUE, default_save_name = "HVFplot_pearson") Figure 7.9: Variance of HVFs using the pearson residuals method. binned (covariance groups) are used when gene expression variability differs across expression levels or spatial regions, without assuming a specific relationship between mean expression and variance. This is the default method in the calculateHVF() function. visium_brain <- calculateHVF(gobject = visium_brain, method = "cov_groups", save_plot = TRUE, default_save_name = "HVFplot_binned") Figure 7.10: Covariance of HVFs using the binned method. 7.10 Dimension Reduction 7.10.1 PCA Principal Components Analysis (PCA) is applied to reduce the dimensionality of gene expression data by transforming it into principal components, which are linear combinations of genes ranked by the variance they explain, with the first components capturing the most variance. 
runPCA() will look for the previous calculation of highly variable features, stored as a column in the feature metadata. If the HVF labels are not found in the giotto object, then runPCA() will use all the features available in the sample to calculate the Principal Components. visium_brain <- runPCA(gobject = visium_brain) You can also use specific features for the Principal Components calculation, by passing a vector of features in the “feats_to_use” argument. my_features <- head(getFeatureMetadata(visium_brain, output = "data.table")$feat_ID, 1000) visium_brain <- runPCA(gobject = visium_brain, feats_to_use = my_features, name = "custom_pca") Visualization Create a screeplot to visualize the percentage of variance explained by each component. screePlot(gobject = visium_brain, ncp = 30) Figure 7.11: Screeplot showing the variance explained per principal component. Visualize the PCA calculated using the HVFs. plotPCA(gobject = visium_brain) Figure 7.12: PCA plot using HVFs. Visualize the custom PCA calculated using the vector of features. plotPCA(gobject = visium_brain, dim_reduction_name = "custom_pca") Figure 7.13: PCA using custom features. Unlike PCA, Uniform Manifold Approximation and Projection (UMAP) and t-Stochastic Neighbor Embedding (t-SNE) do not assume linearity. After running PCA, UMAP or t-SNE allows you to visualize the dataset in 2D. 7.10.2 UMAP visium_brain <- runUMAP(visium_brain, dimensions_to_use = 1:10) Visualization plotUMAP(gobject = visium_brain) Figure 7.14: UMAP using the first 10 principal components. 7.10.3 t-SNE visium_brain <- runtSNE(gobject = visium_brain, dimensions_to_use = 1:10) Visualization plotTSNE(gobject = visium_brain) Figure 7.15: tSNE using the first 10 principal components. 
7.11 Clustering Create an sNN network (default) visium_brain <- createNearestNetwork(gobject = visium_brain, dimensions_to_use = 1:10, k = 15) Create a kNN network visium_brain <- createNearestNetwork(gobject = visium_brain, dimensions_to_use = 1:10, k = 15, type = "kNN") 7.11.1 Calculate Leiden clustering Use the previously calculated shared nearest neighbors to create clusters. The default resolution is 1, but you can decrease the value to avoid over-clustering. visium_brain <- doLeidenCluster(gobject = visium_brain, resolution = 0.4, n_iterations = 1000) Visualization plotPCA(gobject = visium_brain, cell_color = "leiden_clus") Figure 7.16: PCA plot, colors indicate the Leiden clusters. Use the cluster IDs to visualize the clusters in the UMAP space. plotUMAP(gobject = visium_brain, cell_color = "leiden_clus", show_NN_network = FALSE, point_size = 2.5) Figure 7.17: UMAP plot, colors indicate the Leiden clusters. Set the argument “show_NN_network = TRUE” to visualize the connections between spots. plotUMAP(gobject = visium_brain, cell_color = "leiden_clus", show_NN_network = TRUE, point_size = 2.5) Figure 7.18: UMAP showing the nearest network. Use the cluster IDs to visualize the clusters on the tSNE. plotTSNE(gobject = visium_brain, cell_color = "leiden_clus", point_size = 2.5) Figure 7.19: tSNE plot, colors indicate the Leiden clusters. Set the argument “show_NN_network = TRUE” to visualize the connections between spots. plotTSNE(gobject = visium_brain, cell_color = "leiden_clus", point_size = 2.5, show_NN_network = TRUE) Figure 7.20: tSNE showing the nearest network. Use the cluster IDs to visualize their spatial location. spatPlot2D(visium_brain, cell_color = "leiden_clus", point_size = 3) Figure 7.21: Spatial plot, colors indicate the Leiden clusters. 7.11.2 Calculate Louvain clustering Louvain is an alternative clustering method, used to detect communities in large networks. 
visium_brain <- doLouvainCluster(visium_brain) spatPlot2D(visium_brain, cell_color = "louvain_clus") Figure 7.22: Spatial plot, colors indicate the Louvain clusters. You can find more information about the differences between the Leiden and Louvain methods in this paper: From Louvain to Leiden: guaranteeing well-connected communities, 2019 7.12 Save the object saveGiotto(visium_brain, "results/01_session5/visium_brain_object") 7.13 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Giotto_4.1.0 GiottoClass_0.3.3 loaded via a namespace (and not attached): [1] colorRamp2_0.1.0 deldir_2.0-4 [3] rlang_1.1.4 magrittr_2.0.3 [5] GiottoUtils_0.1.10 matrixStats_1.3.0 [7] compiler_4.4.1 png_0.1-8 [9] systemfonts_1.1.0 vctrs_0.6.5 [11] reshape2_1.4.4 stringr_1.5.1 [13] pkgconfig_2.0.3 SpatialExperiment_1.14.0 [15] crayon_1.5.3 fastmap_1.2.0 [17] backports_1.5.0 magick_2.8.4 [19] XVector_0.44.0 labeling_0.4.3 [21] utf8_1.2.4 rmarkdown_2.27 [23] UCSC.utils_1.0.0 ragg_1.3.2 [25] purrr_1.0.2 xfun_0.46 [27] beachmat_2.20.0 zlibbioc_1.50.0 [29] GenomeInfoDb_1.40.1 jsonlite_1.8.8 [31] DelayedArray_0.30.1 BiocParallel_1.38.0 [33] terra_1.7-78 irlba_2.3.5.1 [35] parallel_4.4.1 R6_2.5.1 [37] stringi_1.8.4 RColorBrewer_1.1-3 [39] reticulate_1.38.0 parallelly_1.37.1 [41] GenomicRanges_1.56.1 scattermore_1.2 [43] Rcpp_1.0.13 bookdown_0.40 [45] SummarizedExperiment_1.34.0 knitr_1.48 [47] future.apply_1.11.2 R.utils_2.12.3 [49] FNN_1.1.4 
IRanges_2.38.1 [51] Matrix_1.7-0 igraph_2.0.3 [53] tidyselect_1.2.1 rstudioapi_0.16.0 [55] abind_1.4-5 yaml_2.3.9 [57] codetools_0.2-20 listenv_0.9.1 [59] lattice_0.22-6 tibble_3.2.1 [61] plyr_1.8.9 Biobase_2.64.0 [63] withr_3.0.0 Rtsne_0.17 [65] evaluate_0.24.0 future_1.33.2 [67] pillar_1.9.0 MatrixGenerics_1.16.0 [69] checkmate_2.3.1 stats4_4.4.1 [71] plotly_4.10.4 generics_0.1.3 [73] dbscan_1.2-0 sp_2.1-4 [75] S4Vectors_0.42.1 ggplot2_3.5.1 [77] munsell_0.5.1 scales_1.3.0 [79] globals_0.16.3 gtools_3.9.5 [81] glue_1.7.0 lazyeval_0.2.2 [83] tools_4.4.1 GiottoVisuals_0.2.4 [85] data.table_1.15.4 ScaledMatrix_1.12.0 [87] cowplot_1.1.3 grid_4.4.1 [89] tidyr_1.3.1 colorspace_2.1-0 [91] SingleCellExperiment_1.26.0 GenomeInfoDbData_1.2.12 [93] BiocSingular_1.20.0 rsvd_1.0.5 [95] cli_3.6.3 textshaping_0.4.0 [97] fansi_1.0.6 S4Arrays_1.4.1 [99] viridisLite_0.4.2 dplyr_1.1.4 [101] uwot_0.2.2 gtable_0.3.5 [103] R.methodsS3_1.8.2 digest_0.6.36 [105] BiocGenerics_0.50.0 SparseArray_1.4.8 [107] ggrepel_0.9.5 farver_2.1.2 [109] rjson_0.2.21 htmlwidgets_1.6.4 [111] htmltools_0.5.8.1 R.oo_1.26.0 [113] lifecycle_1.0.4 httr_1.4.7 "],["visium-part-ii.html", "8 Visium Part II 8.1 Load the object 8.2 Differential expression 8.3 Enrichment & Deconvolution 8.4 Spatial expression patterns 8.5 Spatially informed clusters 8.6 Spatial domains HMRF 8.7 Interactive tools 8.8 Save the object 8.9 Session info", " 8 Visium Part II Joselyn Cristina Chávez Fuentes August 6th 2024 8.1 Load the object library(Giotto) visium_brain <- loadGiotto("results/01_session5/visium_brain_object") 8.2 Differential expression 8.2.1 Gini markers The Gini method identifies genes that are very selectively expressed in a specific cluster, however not always expressed in all cells of that cluster. In other words, highly specific but not necessarily sensitive at the single-cell level. Calculate the top marker genes per cluster using the gini method. 
gini_markers <- findMarkers_one_vs_all(gobject = visium_brain, method = "gini", expression_values = "normalized", cluster_column = "leiden_clus", min_feats = 10) topgenes_gini <- gini_markers[, head(.SD, 2), by = "cluster"]$feats Visualize Plot the normalized expression distribution of the top expressed genes. violinPlot(visium_brain, feats = unique(topgenes_gini), cluster_column = "leiden_clus", strip_text = 6, strip_position = "right", save_param = list(base_width = 5, base_height = 30)) Figure 8.1: Violin plot showing the top gini genes normalized expression. Use the cluster IDs to create a heatmap with the normalized expression of the top expressed genes per cluster. plotMetaDataHeatmap(visium_brain, selected_feats = unique(topgenes_gini), metadata_cols = "leiden_clus", x_text_size = 10, y_text_size = 10) Figure 8.2: Heatmap showing the top gini genes normalized expression per Leiden cluster. Visualize the scaled expression spatial distribution of the top expressed genes across the sample. dimFeatPlot2D(visium_brain, expression_values = "scaled", feats = sort(unique(topgenes_gini)), cow_n_col = 5, point_size = 1, save_param = list(base_width = 15, base_height = 20)) Figure 8.3: Spatial distribution of the top gini genes scaled expression. 8.2.2 Scran markers The Scran method is preferred for robust differential expression analysis, especially when addressing technical variability or differences in sequencing depth across spatial locations. Calculate the top marker genes per cluster using the scran method. scran_markers <- findMarkers_one_vs_all(gobject = visium_brain, method = "scran", expression_values = "normalized", cluster_column = "leiden_clus", min_feats = 10) topgenes_scran <- scran_markers[, head(.SD, 2), by = "cluster"]$feats Visualize Plot the normalized expression distribution of the top expressed genes. 
violinPlot(visium_brain, feats = unique(topgenes_scran), cluster_column = "leiden_clus", strip_text = 6, strip_position = "right", save_param = list(base_width = 5, base_height = 30)) Figure 8.4: Violin plot of the top scran genes normalized expression. Use the cluster IDs to create a heatmap with the normalized expression of the top expressed genes per cluster. plotMetaDataHeatmap(visium_brain, selected_feats = unique(topgenes_scran), metadata_cols = "leiden_clus", x_text_size = 10, y_text_size = 10) Figure 8.5: Heatmap showing the top scran genes normalized expression per Leiden cluster. Visualize the scaled expression spatial distribution of the top expressed genes across the sample. dimFeatPlot2D(visium_brain, expression_values = "scaled", feats = sort(unique(topgenes_scran)), cow_n_col = 5, point_size = 1, save_param = list(base_width = 20, base_height = 20)) Figure 8.6: Spatial distribution of the top scran genes scaled expression. In practice, it is often beneficial to apply both Gini and Scran methods and compare results for a more complete understanding of differential gene expression across clusters. 8.3 Enrichment & Deconvolution Visium spatial transcriptomics does not provide single-cell resolution, making cell type annotation a harder problem. Giotto provides several ways to calculate enrichment of specific cell-type signature gene lists. 
Download the single-cell dataset GiottoData::getSpatialDataset(dataset = "scRNA_mouse_brain", directory = "data/02_session1") Create the single-cell object and run the normalization step results_folder <- "results/02_session1" python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, python_path = python_path ) sc_expression <- "data/02_session1/brain_sc_expression_matrix.txt.gz" sc_metadata <- "data/02_session1/brain_sc_metadata.csv" giotto_SC <- createGiottoObject(expression = sc_expression, instructions = instructions) giotto_SC <- addCellMetadata(giotto_SC, new_metadata = data.table::fread(sc_metadata)) giotto_SC <- normalizeGiotto(giotto_SC) 8.3.1 PAGE/Rank Parametric Analysis of Gene Set Enrichment (PAGE) and Rank enrichment both aim to determine whether a predefined set of genes shows statistically significant differences in expression compared to other genes in the dataset. Calculate the cell type markers markers_scran <- findMarkers_one_vs_all(gobject = giotto_SC, method = "scran", expression_values = "normalized", cluster_column = "Class", min_feats = 3) top_markers <- markers_scran[, head(.SD, 10), by = "cluster"] celltypes <- levels(factor(markers_scran$cluster)) Create the signature matrix sign_list <- list() for (i in seq_along(celltypes)) { sign_list[[i]] <- top_markers[which(top_markers$cluster == celltypes[i]),]$feats } sign_matrix <- makeSignMatrixPAGE(sign_names = celltypes, sign_list = sign_list) Run the enrichment test with PAGE visium_brain <- runPAGEEnrich(gobject = visium_brain, sign_matrix = sign_matrix) Visualize Create a heatmap showing the enrichment of cell types (from the single-cell data annotation) in the spatial dataset clusters. 
cell_types_PAGE <- colnames(sign_matrix) plotMetaDataCellsHeatmap(gobject = visium_brain, metadata_cols = "leiden_clus", value_cols = cell_types_PAGE, spat_enr_names = "PAGE", x_text_size = 8, y_text_size = 8) Figure 8.7: Cell types enrichment per Leiden cluster, identified using the PAGE method. Plot the spatial distribution of the cell types. spatCellPlot2D(gobject = visium_brain, spat_enr_names = "PAGE", cell_annotation_values = cell_types_PAGE, cow_n_col = 3, coord_fix_ratio = 1, point_size = 1, show_legend = TRUE) Figure 8.8: Spatial distribution of cell types identified using the PAGE method. 8.3.2 SpatialDWLS Spatial Dampened Weighted Least Squares (DWLS) estimates the proportions of different cell types across spots in a tissue. Create the signature matrix sign_matrix <- makeSignMatrixDWLSfromMatrix( matrix = getExpression(giotto_SC, values = "normalized", output = "matrix"), cell_type = pDataDT(giotto_SC)$Class, sign_gene = top_markers$feats) Run the DWLS Deconvolution This step may take a couple of minutes to run. visium_brain <- runDWLSDeconv(gobject = visium_brain, sign_matrix = sign_matrix) Visualize Plot the DWLS deconvolution result as pie plots showing the proportion of each cell type per spot. spatDeconvPlot(visium_brain, show_image = FALSE, radius = 50, save_param = list(save_name = "8_spat_DWLS_pie_plot")) Figure 8.9: Spatial deconvolution plot showing the proportion of cell types per spot, identified using the DWLS method. 8.4 Spatial expression patterns 8.4.1 Spatial variable genes Create a spatial network visium_brain <- createSpatialNetwork(gobject = visium_brain, method = "kNN", k = 6, maximum_distance_knn = 400, name = "spatial_network") spatPlot2D(gobject = visium_brain, show_network = TRUE, network_color = "blue", spatial_network_name = "spatial_network") Figure 8.10: Spatial network across spots in the Visium mouse sample. 
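The kNN spatial network created above links each spot to its k nearest neighbours and drops links longer than a maximum distance. The sketch below illustrates that idea in base R on a toy grid. The helper `knn_edges()` and the coordinates are hypothetical and written only for illustration; they are not part of Giotto, and the internals of createSpatialNetwork may differ.

```r
# Hypothetical helper, for illustration only (not a Giotto function).
# Builds a kNN edge list from x/y coordinates and removes edges longer than max_dist.
knn_edges <- function(coords, k = 6, max_dist = Inf) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances between spots
  diag(d) <- Inf                 # a spot is never its own neighbour
  edges <- lapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(min(k, nrow(d) - 1))]   # indices of the k closest spots
    data.frame(from = i, to = nn, distance = unname(d[i, nn]))
  })
  edges <- do.call(rbind, edges)
  edges[edges$distance <= max_dist, , drop = FALSE]      # apply the distance cap
}

# toy 3 x 3 grid of spots spaced 100 units apart
coords <- expand.grid(x = c(0, 100, 200), y = c(0, 100, 200))
net <- knn_edges(coords, k = 4, max_dist = 120)
head(net)
```

With the distance cap set just above the grid spacing, only direct horizontal and vertical neighbours stay connected, which is the role maximum_distance_knn plays in the call above.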
Rank binarization Rank the genes on the spatial dataset depending on whether they exhibit a spatial pattern or not. This step may take a few minutes to run. ranktest <- binSpect(visium_brain, bin_method = "rank", calc_hub = TRUE, hub_min_int = 5, spatial_network_name = "spatial_network") Visualize top results Plot the scaled expression of genes with the highest probability of being spatial genes. spatFeatPlot2D(visium_brain, expression_values = "scaled", feats = ranktest$feats[1:6], cow_n_col = 2, point_size = 1) Figure 8.11: Spatial distribution of the top spatial genes scaled expression. 8.4.2 Spatial co-expression modules Cluster the top 500 spatial genes into 20 clusters ext_spatial_genes <- ranktest[1:500,]$feats Use the detectSpatialCorFeats function to calculate pairwise distances between genes. spat_cor_netw_DT <- detectSpatialCorFeats( visium_brain, method = "network", spatial_network_name = "spatial_network", subset_feats = ext_spatial_genes) Identify most similar spatially correlated genes for one gene top10_genes <- showSpatialCorFeats(spat_cor_netw_DT, feats = "Mbp", show_top_feats = 10) Visualize Plot the scaled expression of the 3 genes with most similar spatial patterns to Mbp. spatFeatPlot2D(visium_brain, expression_values = "scaled", feats = top10_genes$variable[1:4], point_size = 1.5) Figure 8.12: Spatial distribution of the scaled expression of 3 genes with similar spatial pattern to Mbp. Cluster spatial genes spat_cor_netw_DT <- clusterSpatialCorFeats(spat_cor_netw_DT, name = "spat_netw_clus", k = 20) Visualize clusters Plot the correlation of the top 500 spatial genes with their assigned cluster. heatmSpatialCorFeats(visium_brain, spatCorObject = spat_cor_netw_DT, use_clus_name = "spat_netw_clus", heatmap_legend_param = list(title = NULL)) Figure 8.13: Correlations heatmap between spatial genes and correlated clusters. 
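Grouping correlated genes into k modules, as done above, can be pictured as hierarchical clustering on a gene-by-gene correlation matrix followed by cutting the tree into k groups. The base R sketch below uses simulated data. The distance definition (1 minus correlation) and the average linkage are assumptions chosen for illustration and are not claimed to match the internals of clusterSpatialCorFeats.

```r
set.seed(1)
n_spots <- 50

# two simulated expression patterns, each shared by a pair of toy genes
pattern_a <- rnorm(n_spots)
pattern_b <- rnorm(n_spots)
expr <- cbind(
  geneA1 = pattern_a + rnorm(n_spots, sd = 0.1),
  geneA2 = pattern_a + rnorm(n_spots, sd = 0.1),
  geneB1 = pattern_b + rnorm(n_spots, sd = 0.1),
  geneB2 = pattern_b + rnorm(n_spots, sd = 0.1)
)

cor_mat <- cor(expr)                    # gene-by-gene correlation across spots
gene_dist <- as.dist(1 - cor_mat)       # highly correlated genes are close together
tree <- hclust(gene_dist, method = "average")
modules <- cutree(tree, k = 2)          # named vector: module id per gene
modules
```

Genes that follow the same pattern end up in the same module, which is the behaviour the heatmap in Figure 8.13 is meant to show for the 20 clusters.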
Rank spatial correlated clusters and show genes for selected clusters netw_ranks <- rankSpatialCorGroups( visium_brain, spatCorObject = spat_cor_netw_DT, use_clus_name = "spat_netw_clus") Plot the correlation and number of spatial genes in each cluster. top_netw_spat_cluster <- showSpatialCorFeats(spat_cor_netw_DT, use_clus_name = "spat_netw_clus", selected_clusters = 6, show_top_feats = 1) Figure 8.14: Ranking of spatial correlated groups. Size indicates the number of spatial genes per group. Create the metagene enrichment score per co-expression cluster cluster_genes_DT <- showSpatialCorFeats(spat_cor_netw_DT, use_clus_name = "spat_netw_clus", show_top_feats = 1) cluster_genes <- cluster_genes_DT$clus names(cluster_genes) <- cluster_genes_DT$feat_ID visium_brain <- createMetafeats(visium_brain, feat_clusters = cluster_genes, name = "cluster_metagene") Plot the spatial distribution of the metagene enrichment scores of each spatial co-expression cluster. spatCellPlot(visium_brain, spat_enr_names = "cluster_metagene", cell_annotation_values = netw_ranks$clusters, point_size = 1, cow_n_col = 5) Figure 8.15: Spatial distribution of metagene enrichment scores per co-expression cluster. 
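A metagene score summarises one co-expression cluster as a single value per spot. The base R sketch below assumes the score is the plain mean of the member genes' expression in each spot; createMetafeats may use a different summary or rescaling. The matrix, gene names and cluster assignments are toy values, with `gene_clusters` mirroring the named-vector layout of `cluster_genes` above.

```r
# toy genes-by-spots expression matrix
expr <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    10, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("spot1", "spot2", "spot3"))
)

# named vector: cluster id per gene (values are cluster ids, names are gene ids)
gene_clusters <- c(g1 = 1, g2 = 1, g3 = 2)

# average the member genes of each cluster within every spot
metagene <- t(sapply(split(names(gene_clusters), gene_clusters), function(genes) {
  colMeans(expr[genes, , drop = FALSE])
}))
metagene   # rows are clusters, columns are spots
```

Each row of `metagene` can then be plotted per spot in the same way the cluster_metagene enrichment is plotted in Figure 8.15.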
8.5 Spatially informed clusters Get the top 30 genes per spatial co-expression cluster coexpr_dt <- data.table::data.table( genes = names(spat_cor_netw_DT$cor_clusters$spat_netw_clus), cluster = spat_cor_netw_DT$cor_clusters$spat_netw_clus) data.table::setorder(coexpr_dt, cluster) top30_coexpr_dt <- coexpr_dt[, head(.SD, 30) , by = cluster] spatial_genes <- top30_coexpr_dt$genes Re-calculate the clustering Use the spatial genes to recalculate the principal components, UMAP, network and clustering visium_brain <- runPCA(gobject = visium_brain, feats_to_use = spatial_genes, name = "custom_pca") visium_brain <- runUMAP(visium_brain, dim_reduction_name = "custom_pca", dimensions_to_use = 1:20, name = "custom_umap") visium_brain <- createNearestNetwork(gobject = visium_brain, dim_reduction_name = "custom_pca", dimensions_to_use = 1:20, k = 5, name = "custom_NN") visium_brain <- doLeidenCluster(gobject = visium_brain, network_name = "custom_NN", resolution = 0.15, n_iterations = 1000, name = "custom_leiden") Visualize Plot the spatial distribution of the Leiden clusters calculated based on the spatial genes. spatPlot2D(visium_brain, cell_color = "custom_leiden", point_size = 3) Figure 8.16: Spatial distribution of Leiden clusters calculated using spatial genes. Plot the UMAP and color the spots using the Leiden clusters calculated based on the spatial genes. plotUMAP(gobject = visium_brain, cell_color = "custom_leiden") Figure 8.17: UMAP plot, colors indicate the Leiden clusters calculated using spatial genes. 8.6 Spatial domains HMRF Hidden Markov Random Field (HMRF) models capture spatial dependencies and segment tissue regions based on shared gene expression patterns. Do HMRF with different betas on top 30 genes per spatial co-expression module This step may take several minutes to run. 
HMRF_spatial_genes <- doHMRF(gobject = visium_brain, expression_values = "scaled", spatial_genes = spatial_genes, k = 20, spatial_network_name = "spatial_network", betas = c(0, 10, 5), output_folder = "11_HMRF/") Add the HMRF results to the giotto object visium_brain <- addHMRF(gobject = visium_brain, HMRFoutput = HMRF_spatial_genes, k = 20, betas_to_add = c(0, 10, 20, 30, 40), hmrf_name = "HMRF") Visualize Plot the spatial distribution of the HMRF domains. spatPlot2D(gobject = visium_brain, cell_color = "HMRF_k20_b.40") Figure 8.18: Spatial distribution of HMRF domains. 8.7 Interactive tools We have integrated a shiny app in Giotto to interactively select regions of a spatial plot. Create a spatial plot brain_spatPlot <- spatPlot2D(gobject = visium_brain, cell_color = "leiden_clus", show_image = FALSE, return_plot = TRUE, point_size = 1) brain_spatPlot Run the Shiny app plotInteractivePolygons(brain_spatPlot) Figure 8.19: Shiny app using the visium brain sample. Select the regions of interest and save the coordinates polygon_coordinates <- plotInteractivePolygons(brain_spatPlot) Figure 8.20: Polygons selected using the interactive Shiny app. Transform the data.table or data.frame with coordinates into a Giotto polygon object giotto_polygons <- createGiottoPolygonsFromDfr(polygon_coordinates, name = "selections", calc_centroids = TRUE) Add the polygons to the Giotto object visium_brain <- addGiottoPolygons(gobject = visium_brain, gpolygons = list(giotto_polygons)) Add the corresponding polygon IDs to the cell metadata visium_brain <- addPolygonCells(visium_brain, polygon_name = "selections") Extract the coordinates and IDs from cells located within one or multiple regions of interest. 
getCellsFromPolygon(visium_brain, polygon_name = "selections", polygons = "polygon 1") If no polygon name is provided, the function will retrieve cells located within all polygons getCellsFromPolygon(visium_brain, polygon_name = "selections") Compare the expression levels of some genes of interest between the selected regions comparePolygonExpression(visium_brain, selected_feats = c("Stmn1", "Psd", "Ly6h")) Figure 8.21: Heatmap showing the z-scores of three genes per selected polygon. Calculate the top genes expressed within each region, then provide the result to compare polygons scran_results <- findMarkers_one_vs_all( visium_brain, spat_unit = "cell", feat_type = "rna", method = "scran", expression_values = "normalized", cluster_column = "selections", min_feats = 2) top_genes <- scran_results[, head(.SD, 2), by = "cluster"]$feats comparePolygonExpression(visium_brain, selected_feats = top_genes) Figure 8.22: Heatmap showing the z-scores of top scran genes per selected polygon. Compare the abundance of cell types between the selected regions compareCellAbundance(visium_brain) Figure 8.23: Heatmap showing the cell abundance per selected polygon. Use other columns within the cell metadata table to compare the cell type abundances compareCellAbundance(visium_brain, cell_type_column = "custom_leiden") Figure 8.24: Heatmap showing the Leiden clusters abundance per selected polygon. Use the spatPlot arguments to isolate and plot each region. spatPlot2D(visium_brain, cell_color = "leiden_clus", group_by = "selections", cow_n_col = 3, point_size = 2, show_legend = FALSE) Figure 8.25: Spatial distribution of Leiden clusters across the selected polygons. Color each cell by cluster, cell type or expression level. spatFeatPlot2D(visium_brain, expression_values = "scaled", group_by = "selections", feats = "Psd", point_size = 2) Figure 8.26: Spatial distribution of Psd scaled expression across the selected polygons. 
Plot again the polygons plotPolygons(visium_brain, polygon_name = "selections", x = brain_spatPlot) Figure 8.27: Spatial location of selected polygons. 8.8 Save the object saveGiotto(visium_brain, "results/02_session1/visium_brain_object") 8.9 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] shiny_1.8.1.1 Giotto_4.1.0 GiottoClass_0.3.3 loaded via a namespace (and not attached): [1] later_1.3.2 tibble_3.2.1 [3] R.oo_1.26.0 polyclip_1.10-7 [5] lifecycle_1.0.4 edgeR_4.2.1 [7] doParallel_1.0.17 lattice_0.22-6 [9] MASS_7.3-61 backports_1.5.0 [11] magrittr_2.0.3 sass_0.4.9 [13] limma_3.60.4 plotly_4.10.4 [15] rmarkdown_2.27 jquerylib_0.1.4 [17] yaml_2.3.9 metapod_1.12.0 [19] httpuv_1.6.15 sp_2.1-4 [21] reticulate_1.38.0 cowplot_1.1.3 [23] RColorBrewer_1.1-3 abind_1.4-5 [25] zlibbioc_1.50.0 quadprog_1.5-8 [27] GenomicRanges_1.56.1 purrr_1.0.2 [29] R.utils_2.12.3 BiocGenerics_0.50.0 [31] tweenr_2.0.3 circlize_0.4.16 [33] GenomeInfoDbData_1.2.12 IRanges_2.38.1 [35] S4Vectors_0.42.1 ggrepel_0.9.5 [37] irlba_2.3.5.1 terra_1.7-78 [39] dqrng_0.4.1 DelayedMatrixStats_1.26.0 [41] colorRamp2_0.1.0 codetools_0.2-20 [43] DelayedArray_0.30.1 scuttle_1.14.0 [45] ggforce_0.4.2 tidyselect_1.2.1 [47] shape_1.4.6.1 UCSC.utils_1.0.0 [49] farver_2.1.2 ScaledMatrix_1.12.0 [51] matrixStats_1.3.0 stats4_4.4.1 [53] GiottoData_0.2.12.0 jsonlite_1.8.8 [55] GetoptLong_1.0.5 BiocNeighbors_1.22.0 [57] progressr_0.14.0 iterators_1.0.14 [59] 
systemfonts_1.1.0 foreach_1.5.2 [61] dbscan_1.2-0 tools_4.4.1 [63] ragg_1.3.2 Rcpp_1.0.13 [65] glue_1.7.0 SparseArray_1.4.8 [67] xfun_0.46 MatrixGenerics_1.16.0 [69] GenomeInfoDb_1.40.1 dplyr_1.1.4 [71] withr_3.0.0 fastmap_1.2.0 [73] bluster_1.14.0 fansi_1.0.6 [75] digest_0.6.36 rsvd_1.0.5 [77] R6_2.5.1 mime_0.12 [79] textshaping_0.4.0 colorspace_2.1-0 [81] scattermore_1.2 Cairo_1.6-2 [83] gtools_3.9.5 R.methodsS3_1.8.2 [85] utf8_1.2.4 tidyr_1.3.1 [87] generics_0.1.3 data.table_1.15.4 [89] FNN_1.1.4 httr_1.4.7 [91] htmlwidgets_1.6.4 S4Arrays_1.4.1 [93] scatterpie_0.2.3 uwot_0.2.2 [95] pkgconfig_2.0.3 gtable_0.3.5 [97] ComplexHeatmap_2.20.0 GiottoVisuals_0.2.4 [99] SingleCellExperiment_1.26.0 XVector_0.44.0 [101] htmltools_0.5.8.1 bookdown_0.40 [103] clue_0.3-65 scales_1.3.0 [105] Biobase_2.64.0 GiottoUtils_0.1.10 [107] png_0.1-8 SpatialExperiment_1.14.0 [109] scran_1.32.0 ggfun_0.1.5 [111] knitr_1.48 rstudioapi_0.16.0 [113] reshape2_1.4.4 rjson_0.2.21 [115] checkmate_2.3.1 cachem_1.1.0 [117] GlobalOptions_0.1.2 stringr_1.5.1 [119] parallel_4.4.1 miniUI_0.1.1.1 [121] RcppZiggurat_0.1.6 pillar_1.9.0 [123] grid_4.4.1 vctrs_0.6.5 [125] promises_1.3.0 BiocSingular_1.20.0 [127] beachmat_2.20.0 xtable_1.8-4 [129] cluster_2.1.6 evaluate_0.24.0 [131] magick_2.8.4 cli_3.6.3 [133] locfit_1.5-9.10 compiler_4.4.1 [135] rlang_1.1.4 crayon_1.5.3 [137] labeling_0.4.3 plyr_1.8.9 [139] stringi_1.8.4 viridisLite_0.4.2 [141] deldir_2.0-4 BiocParallel_1.38.0 [143] munsell_0.5.1 lazyeval_0.2.2 [145] Matrix_1.7-0 sparseMatrixStats_1.16.0 [147] ggplot2_3.5.1 statmod_1.5.0 [149] SummarizedExperiment_1.34.0 Rfast_2.1.0 [151] memoise_2.0.1 igraph_2.0.3 [153] bslib_0.7.0 RcppParallel_5.1.8 "],["visium-hd.html", "9 Visium HD 9.1 Objective 9.2 Background 9.3 Data Ingestion 9.4 Hexbin 400 Giotto object 9.5 Hexbin 100 9.6 Hexbin 25 9.7 Database backend - Work in progress, but coming soon!", " 9 Visium HD Ruben Dries & Edward C. 
Ruiz August 6th 2024 9.1 Objective This tutorial demonstrates how to process Visium HD data at the highest 2 micron bin resolution by using flexible tiling and aggregation steps that are available in Giotto Suite. Notably, a similar strategy can be used for other spatial sequencing methods that operate at the subcellular level, including: - Stereo-seq - Seq-Scope - Open-ST The resulting datasets from all these technologies can be very large since they provide both a high spatial resolution and genome-wide capture of all transcripts. We will also discuss how data projection strategies can be used to alleviate heavy computational tasks such as PCA, UMAP, or clustering. This tutorial expects a general knowledge of common spatial analysis technologies that are available in Giotto Suite, such as those that have been discussed in the standard Visium tutorials (part I and part II). 9.2 Background 9.2.1 Visium HD Technology Figure 9.1: Overview of Visium HD. Source: 10X Genomics Visium HD is a spatial transcriptomics technology recently developed by 10X Genomics. Details about this platform are discussed on the official 10X Genomics Visium HD website and the preprint by Oliveira et al. 2024 on bioRxiv. Visium HD has a 2 micron bin size resolution. The default SpaceRanger pipeline from 10X Genomics also returns aggregated data at the 8 and 16 micron bin size. 9.2.2 Colorectal Cancer Sample Figure 9.2: Colorectal Cancer Overview. Source: 10X Genomics For this tutorial we will be using the publicly available Colorectal Cancer Visium HD dataset. Details about this dataset and a link to download the raw data can be found at the 10X Genomics website. 9.3 Data Ingestion 9.3.1 Visium HD output data format Figure 9.3: File structure of Visium HD data processed with spaceranger pipeline. Visium HD data processed with the spaceranger pipeline is organized in this format containing various files associated with the sample. 
The files highlighted in yellow are what we will be using to read in these datasets. Warning: the VisiumHD folder structure has very recently been updated and might be slightly different. 9.3.2 Mini Visium HD dataset For this workshop we will use a spatial subset and downsampled version of the original datasets. A VisiumHD folder similar to the original can be downloaded using the Zenodo link. Using this dataset will ensure that we will not run into major memory issues. library(Giotto) # set up paths data_path <- "data/02_session2/" save_dir <- "results/02_session2/" dir.create(save_dir, recursive = TRUE) # download the mini dataset and unzip options("timeout" = Inf) download.file( url = "https://zenodo.org/records/13226158/files/workshop_VisiumHD.zip?download=1", destfile = file.path(save_dir, "workshop_visiumHD.zip") ) unzip(zipfile = file.path(save_dir, "workshop_visiumHD.zip"), exdir = data_path) 9.3.3 Giotto Visium HD convenience function The easiest way to read in Visium HD data in Giotto is through our convenience function. This function will automatically read in the data at your desired resolution, align the images, and finally create a Giotto Object. # importVisiumHD() 9.3.4 Read in data manually However, for this tutorial we will illustrate how to create your own Giotto object in a step-by-step manner, which can also be applied to other similar technologies as discussed in the Objective section. 
9.3.4.1 Raw expression data expression_path <- file.path(data_path, '/Human_Colorectal_Cancer_workshop/square_002um/raw_feature_bc_matrix') expr_results <- get10Xmatrix(path_to_data = expression_path, gene_column_index = 1) 9.3.4.2 Tissue positions data tissue_positions_path <- file.path(data_path, '/Human_Colorectal_Cancer_workshop/square_002um/spatial/tissue_positions.parquet') tissue_positions <- data.table::as.data.table(arrow::read_parquet(tissue_positions_path)) 9.3.4.3 Merge expression and 2 micron position data # convert expression matrix to minimal data.frame or data.table object matrix_tile_dt <- data.table::as.data.table(Matrix::summary(expr_results)) genes <- expr_results@Dimnames[[1]] samples <- expr_results@Dimnames[[2]] matrix_tile_dt[, gene := genes[i]] matrix_tile_dt[, pixel := samples[j]] Figure 9.4: Genes expressed for each 2 µm pixel in the array dimensions. # merge data.table matrix and spatial coordinates to create input for Giotto Polygons expr_pos_data <- data.table::merge.data.table(matrix_tile_dt, tissue_positions, by.x = 'pixel', by.y = 'barcode') expr_pos_data <- expr_pos_data[,.(pixel, pxl_row_in_fullres, pxl_col_in_fullres, gene, x)] colnames(expr_pos_data) = c('pixel', 'x', 'y', 'gene', 'count') Figure 9.5: Genes expressed with count for each 2 µm pixel in the spatial dimensions. 9.4 Hexbin 400 Giotto object 9.4.1 create giotto points The giottoPoints object represents the spatial expression information for each transcript: - gene id - count or UMI - spatial pixel location (x, y) giotto_points = createGiottoPoints(x = expr_pos_data[,.(x, y, gene, pixel, count)]) 9.4.2 create giotto polygons 9.4.2.1 Tiling and aggregation The Visium HD data is organized in a grid format. We can aggregate the data into larger bins to reduce the resolution of the data. Giotto Suite can work with any type of polygon information and already provides ready-to-use options for binning data with squares, triangles, and hexagons. 
Here we will use a hexagon tessellation to aggregate the data into arbitrary bins. Figure 9.6: Hexagon properties # create giotto polygons, here we create hexagons hexbin400 <- tessellate(extent = ext(giotto_points), shape = 'hexagon', shape_size = 400, name = 'hex400') plot(hexbin400) Figure 9.7: Giotto polygon in a hexagon shape for overlapping visium HD expression data. 9.4.3 combine Giotto points and polygons to create Giotto object instrs = createGiottoInstructions( save_dir = save_dir, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE ) # gpoints provides spatial gene expression information # gpolygons provides spatial unit information (here = hexagon tiles) visiumHD = createGiottoObjectSubcellular(gpoints = list('rna' = giotto_points), gpolygons = list('hex400' = hexbin400), instructions = instrs) # create spatial centroids for each spatial unit (hexagon) visiumHD = addSpatialCentroidLocations(gobject = visiumHD, poly_info = 'hex400') Visualize the Giotto object. Make sure to set expand_counts = TRUE to expand the counts column. Each spatial bin can have multiple transcripts/UMIs. This is different compared to in situ technologies like seqFISH, MERFISH, Nanostring CosMx or Xenium. Figure 9.8: Schematic showing effect of expand counts and jitter. Show the giotto points (transcripts) and polygons (hexagons) together using spatInSituPlotPoints: feature_data = fDataDT(visiumHD) spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = feature_data$feat_ID[10:20]), show_legend = T, spat_unit = 'hex400', point_size = 0.25, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex400', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.1, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25)) Figure 9.9: Overlap of gene expression with the hex400 polygons. Each dot represents a single gene. 
Jitter is used to better visualize individual transcripts. You can set plot_method = scattermore or scattermost to convert high-resolution images to low(er) resolution rasterized images. It’s usually faster and will save on disk space. spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = feature_data$feat_ID[10:20]), show_legend = T, spat_unit = 'hex400', point_size = 0.25, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex400', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.1, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25), plot_method = 'scattermore') Figure 9.10: Overlap of gene expression with the hex400 polygons. Genes/transcripts are rasterized. Jitter is used to better visualize individual transcripts. 9.4.4 Process Giotto object 9.4.4.1 calculate overlap between points and polygons At the moment the giotto points (transcripts) and polygons (hexagons) are two separate layers of information. Here we will determine which transcripts overlap with which hexagons so that we can aggregate the gene expression information and convert this into a gene expression matrix (genes-by-hexagons) that can be used in default spatial pipelines. # calculate overlap between points and polygons visiumHD = calculateOverlap(visiumHD, spatial_info = 'hex400', feat_info = 'rna') showGiottoSpatialInfo(visiumHD) 9.4.4.2 convert overlap results to a gene-by-hexagon matrix # convert overlap results to bin by gene matrix visiumHD = overlapToMatrix(visiumHD, poly_info = 'hex400', feat_info = 'rna', name = 'raw') # this action will automatically create an active spatial unit, i.e. hexbin 400 activeSpatUnit(visiumHD) 9.4.4.3 default processing steps This part is similar to that described in the Visium tutorials (Part I and Part II). 
# filter on gene expression matrix visiumHD <- filterGiotto(visiumHD, expression_threshold = 1, feat_det_in_min_cells = 5, min_det_feats_per_cell = 25) # normalize and scale gene expression data visiumHD <- normalizeGiotto(visiumHD, scalefactor = 1000, verbose = T) # add cell and gene statistics visiumHD <- addStatistics(visiumHD) 9.4.4.3.1 visualize number of features At the centroid level. # each dot here represents a 200x200 aggregation of spatial barcodes (bin size 200) spatPlot2D(gobject = visiumHD, cell_color = "nr_feats", color_as_factor = F, point_size = 2.5) Figure 9.11: Number of features detected in each of the centroids. Using the spatial polygon (hexagon) tiles spatInSituPlotPoints(visiumHD, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex400', point_size = 0.1, show_polygon = TRUE, use_overlap = TRUE, polygon_feat_type = 'hex400', polygon_fill = 'nr_feats', polygon_fill_as_factor = F, polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.1) Figure 9.12: Number of features detected in each of the hex400 polygons. 9.4.4.4 Dimension reduction + clustering 9.4.4.4.1 Highly variable features + PCA visiumHD <- calculateHVF(visiumHD, zscore_threshold = 1) visiumHD <- runPCA(visiumHD, expression_values = 'normalized', feats_to_use = 'hvf') screePlot(visiumHD, ncp = 30) plotPCA(visiumHD) 9.4.4.4.2 UMAP reduction for visualization visiumHD <- runUMAP(visiumHD, dimensions_to_use = 1:14, n_threads = 10) plotUMAP(gobject = visiumHD, point_size = 1) 9.4.4.4.3 Create network based on expression similarity + graph partition cluster # sNN network (default) visiumHD <- createNearestNetwork(visiumHD, dimensions_to_use = 1:14, k = 5) ## leiden clustering #### visiumHD <- doLeidenClusterIgraph(visiumHD, resolution = 0.5, n_iterations = 1000, spat_unit = 'hex400') plotUMAP(gobject = visiumHD, cell_color = 'leiden_clus', point_size = 1.5, show_NN_network = F, edge_alpha = 0.05) Figure 9.13: Leiden clustering for the hex400 bins. 
spatInSituPlotPoints(visiumHD, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex400', point_size = 0.25, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex400', polygon_fill_as_factor = TRUE, polygon_fill = 'leiden_clus', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.14: Spat plot for hex400 bin colored by leiden clusters. 9.5 Hexbin 100 Observation: Hexbin 400 results in very coarse information about the tissue. The goal is to create a higher resolution bin (hex100), then add this to the Giotto object to compare the difference in resolution. 9.5.1 Standard subcellular pipeline Create a new spatial unit layer, e.g. with the tessellate function. Add the spatial units to the Giotto object. Calculate centroids (optional). Compute the overlap between transcript and polygon (hexagon) locations. Convert the overlap data into a gene-by-polygon matrix. hexbin100 <- tessellate(extent = ext(visiumHD), shape = 'hexagon', shape_size = 100, name = 'hex100') visiumHD = setPolygonInfo(gobject = visiumHD, x = hexbin100, name = 'hex100', initialize = T) visiumHD = addSpatialCentroidLocations(gobject = visiumHD, poly_info = 'hex100') Set active spatial unit. This can also be set manually in each function. activeSpatUnit(visiumHD) <- 'hex100' Let’s visualize the higher resolution hexagons. spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = feature_data$feat_ID[1:20]), show_legend = T, spat_unit = 'hex100', point_size = 0.1, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.2, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25)) Figure 9.15: Polygon overlay of hex100 bins over 2 µm pixel. Jitter applied to visualize individual features. 
visiumHD = calculateOverlap(visiumHD, spatial_info = 'hex100', feat_info = 'rna') visiumHD = overlapToMatrix(visiumHD, poly_info = 'hex100', feat_info = 'rna', name = 'raw') visiumHD <- filterGiotto(visiumHD, expression_threshold = 1, feat_det_in_min_cells = 10, min_det_feats_per_cell = 10) visiumHD <- normalizeGiotto(visiumHD, scalefactor = 1000, verbose = T) visiumHD <- addStatistics(visiumHD) Your Giotto object will have metadata for each spatial unit. pDataDT(visiumHD, spat_unit = 'hex100') pDataDT(visiumHD, spat_unit = 'hex400') ## dimension reduction #### # --------------------------- # visiumHD <- calculateHVF(visiumHD, zscore_threshold = 1) visiumHD <- runPCA(visiumHD, expression_values = 'normalized', feats_to_use = 'hvf') plotPCA(visiumHD) visiumHD <- runUMAP(visiumHD, dimensions_to_use = 1:14, n_threads = 10) # plot UMAP plotUMAP(gobject = visiumHD, point_size = 2) Figure 9.16: UMAP for the hex100 bin. # sNN network (default) visiumHD <- createNearestNetwork(visiumHD, dimensions_to_use = 1:14, k = 5) ## leiden clustering #### visiumHD <- doLeidenClusterIgraph(visiumHD, resolution = 0.2, n_iterations = 1000) plotUMAP(gobject = visiumHD, cell_color = 'leiden_clus', point_size = 1.5, show_NN_network = F, edge_alpha = 0.05) Figure 9.17: UMAP for the hex100 bin colored by Leiden clusters. spatInSituPlotPoints(visiumHD, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex100', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_fill_as_factor = TRUE, polygon_fill = 'leiden_clus', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.18: Spat plot for the hex100 bin colored by leiden clusters. This resolution definitely shows more promise to identify interesting spatial patterns. 9.5.2 Spatial expression patterns 9.5.2.1 Identify single genes Here we will use binSpect as a quick method to rank genes with high potential for spatial coherent expression patterns. 
featData = fDataDT(visiumHD) hvf_genes = featData[hvf == 'yes']$feat_ID visiumHD = createSpatialNetwork(visiumHD, name = 'kNN_network', spat_unit = 'hex100', method = 'kNN', k = 8) ranktest = binSpect(visiumHD, spat_unit = 'hex100', subset_feats = hvf_genes, bin_method = 'rank', calc_hub = FALSE, do_fisher_test = TRUE, spatial_network_name = 'kNN_network') Visualize top 2 ranked spatial genes per expression bin: set0 = ranktest[high_expr < 50][1:2]$feats set1 = ranktest[high_expr > 50 & high_expr < 100][1:2]$feats set2 = ranktest[high_expr > 100 & high_expr < 200][1:2]$feats set3 = ranktest[high_expr > 200 & high_expr < 400][1:2]$feats set4 = ranktest[high_expr > 400 & high_expr < 1000][1:2]$feats set5 = ranktest[high_expr > 1000][1:2]$feats spatFeatPlot2D(visiumHD, expression_values = 'scaled', feats = c(set0, set1, set2), gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cow_n_col = 2, point_size = 1) Figure 9.19: Spat feature plot showing gene expression for the top 2 ranked spatial genes per expression bin (<50, >50 and >100) across the hex100 bin. spatFeatPlot2D(visiumHD, expression_values = 'scaled', feats = c(set3, set4, set5), gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cow_n_col = 2, point_size = 1) Figure 9.20: Spat feature plot showing gene expression for the top 2 ranked spatial genes per expression bin (>200, >400 and >1000) across the hex100 bin. 9.5.2.2 Spatial co-expression modules Investigating individual genes is a good start, but here we would like to identify recurrent spatial expression patterns that are shared by spatial co-expression modules that might represent spatially organized biological processes. 
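To make the idea of a spatial co-expression module concrete, here is a minimal base R sketch on simulated data. It assumes a simple recipe (average each gene over its spatial neighbors, correlate the smoothed profiles, then cut a hierarchical clustering into modules); this is an illustration under those assumptions, not a description of how Giotto's functions are implemented. All object names (`expr`, `smooth`, `modules`) are hypothetical.

```r
set.seed(42)

# toy tissue: 100 spatial units along one axis, two opposing spatial patterns
n_cells <- 100
pos <- seq_len(n_cells)
left <- as.numeric(pos <= 50)
right <- as.numeric(pos > 50)

# 6 toy genes x 100 units: genes 1-3 follow the left pattern, genes 4-6 the right
expr <- rbind(
  t(replicate(3, 5 * left + rnorm(n_cells))),
  t(replicate(3, 5 * right + rnorm(n_cells)))
)
rownames(expr) <- paste0("gene", 1:6)

# smooth each gene over its spatial neighborhood (units within distance 2)
smooth <- t(apply(expr, 1, function(g) {
  vapply(pos, function(i) mean(g[abs(pos - i) <= 2]), numeric(1))
}))

# gene-by-gene correlation of the smoothed profiles
spat_cor <- cor(t(smooth))

# group genes into modules by hierarchical clustering on 1 - correlation
hc <- hclust(as.dist(1 - spat_cor), method = "average")
modules <- cutree(hc, k = 2)
modules
```

Genes that share a spatial pattern end up in the same module, which is the property that the metagene scores rely on.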
ext_spatial_genes = ranktest[adj.p.value < 0.001]$feats spat_cor_netw_DT = detectSpatialCorFeats(visiumHD, method = 'network', spatial_network_name = 'kNN_network', subset_feats = ext_spatial_genes) # cluster spatial genes spat_cor_netw_DT = clusterSpatialCorFeats(spat_cor_netw_DT, name = 'spat_netw_clus', k = 16) # visualize clusters heatmSpatialCorFeats(visiumHD, spatCorObject = spat_cor_netw_DT, use_clus_name = 'spat_netw_clus', heatmap_legend_param = list(title = NULL)) Figure 9.21: Heatmap showing spatially correlated genes split into 16 clusters. # create metagene enrichment score for clusters cluster_genes_DT = showSpatialCorFeats(spat_cor_netw_DT, use_clus_name = 'spat_netw_clus', show_top_feats = 1) cluster_genes = cluster_genes_DT$clus; names(cluster_genes) = cluster_genes_DT$feat_ID visiumHD = createMetafeats(visiumHD, expression_values = 'normalized', feat_clusters = cluster_genes, name = 'cluster_metagene') showGiottoSpatEnrichments(visiumHD) spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(1:4)), point_size = 1, cow_n_col = 2) Figure 9.22: Spat plot visualizing metagenes (1-4) based on spatially correlated genes visualized on the hex100 bin spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(5:8)), point_size = 1, cow_n_col = 2) Figure 9.23: Spat plot visualizing metagenes (5-8) based on spatially correlated genes visualized on the hex100 bin spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(9:12)), point_size = 1, cow_n_col = 2) Figure 9.24: Spat plot visualizing metagenes 
(9-12) based on spatially correlated genes visualized on the hex100 bin spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(13:16)), point_size = 1, cow_n_col = 2) Figure 9.25: Spat plot visualizing metagenes (13-16) based on spatially correlated genes visualized on the hex100 bin A simple follow-up analysis could be to perform gene set enrichment analysis on each spatial co-expression module. 9.5.2.3 Plot spatial gene groups Hack! Vendors of spatial technologies typically like to show very interesting spatial gene expression patterns. Here we will follow a similar strategy by selecting a balanced set of genes for each spatial co-expression module and then simply giving them the same color in the spatInSituPlotPoints function. balanced_genes = getBalancedSpatCoexpressionFeats(spatCorObject = spat_cor_netw_DT, maximum = 5) selected_feats = names(balanced_genes) # give genes from same cluster same color distinct_colors = getDistinctColors(n = 20) names(distinct_colors) = 1:20 my_colors = distinct_colors[balanced_genes] names(my_colors) = names(balanced_genes) spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = selected_feats), feats_color_code = my_colors, show_legend = F, spat_unit = 'hex100', point_size = 0.20, show_polygon = FALSE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.01, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25)) Figure 9.26: Coloring individual features based on the spatially correlated gene clusters. 9.6 Hexbin 25 The goal is to create a higher resolution bin (hex25) and add it to the Giotto object. We will aim to identify individual cell types and local neighborhood niches.
9.6.1 Subcellular workflow filter and normalization workflow visiumHD_subset = subsetGiottoLocs(gobject = visiumHD, x_min = 16000, x_max = 20000, y_min = 44250, y_max = 45500) Figure 9.27: Coloring individual features based on the spatially correlated gene clusters + subset rectangle. Plot visiumHD subset with hexbin100 polygons: spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex100', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_fill_as_factor = TRUE, polygon_fill = 'leiden_clus', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.28: Hexbin100 colored by leiden clustering results Plot visiumHD subset with selected gene features: spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = list('rna' = selected_feats), feats_color_code = my_colors, show_legend = F, spat_unit = 'hex100', point_size = 0.40, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.05, jitter = c(25,25)) Figure 9.29: Coloring individual features based on the spatially correlated gene clusters Create smaller hexbin25 tessellations: hexbin25 <- tessellate(extent = ext(visiumHD_subset@feat_info$rna), shape = 'hexagon', shape_size = 25, name = 'hex25') visiumHD_subset = setPolygonInfo(gobject = visiumHD_subset, x = hexbin25, name = 'hex25', initialize = T) showGiottoSpatialInfo(visiumHD_subset) visiumHD_subset = addSpatialCentroidLocations(gobject = visiumHD_subset, poly_info = 'hex25') activeSpatUnit(visiumHD_subset) <- 'hex25' spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = list('rna' = selected_feats), feats_color_code = my_colors, show_legend = F, spat_unit = 'hex25', point_size = 0.40, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex25', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.05, jitter = c(25,25)) Figure 9.30: xxx 
visiumHD_subset = calculateOverlap(visiumHD_subset, spatial_info = 'hex25', feat_info = 'rna') showGiottoSpatialInfo(visiumHD_subset) # convert overlap results to bin by gene matrix visiumHD_subset = overlapToMatrix(visiumHD_subset, poly_info = 'hex25', feat_info = 'rna', name = 'raw') visiumHD_subset <- filterGiotto(visiumHD_subset, expression_threshold = 1, feat_det_in_min_cells = 3, min_det_feats_per_cell = 5) activeSpatUnit(visiumHD_subset) # normalize visiumHD_subset <- normalizeGiotto(visiumHD_subset, scalefactor = 1000, verbose = T) # add statistics visiumHD_subset <- addStatistics(visiumHD_subset) feature_data = fDataDT(visiumHD_subset) visiumHD_subset <- calculateHVF(visiumHD_subset, zscore_threshold = 1) 9.6.2 Projections PCA projection from random subset. UMAP projection from random subset. cluster result projection from subsampled Giotto object + kNN voting 9.6.2.1 PCA with projection n_25_percent <- round(length(spatIDs(visiumHD_subset, 'hex25')) * 0.25) # pca projection on subset visiumHD_subset <- runPCAprojection( gobject = visiumHD_subset, spat_unit = "hex25", feats_to_use = 'hvf', name = 'pca.projection', set_seed = TRUE, seed_number = 12345, random_subset = n_25_percent ) showGiottoDimRed(visiumHD_subset) plotPCA(visiumHD_subset, dim_reduction_name = 'pca.projection') Figure 9.31: xxx 9.6.2.2 UMAP with projection # umap projection on subset visiumHD_subset <- runUMAPprojection( gobject = visiumHD_subset, spat_unit = "hex25", dim_reduction_to_use = 'pca', dim_reduction_name = "pca.projection", dimensions_to_use = 1:10, name = "umap.projection", random_subset = n_25_percent, n_neighbors = 10, min_dist = 0.005, n_threads = 4 ) showGiottoDimRed(visiumHD_subset) # plot UMAP, coloring cells/points based on nr_feats plotUMAP(gobject = visiumHD_subset, point_size = 1, dim_reduction_name = 'umap.projection') Figure 9.32: xxx 9.6.2.3 clustering with projection subsample Giotto object perform clustering (e.g. 
hierarchical clustering) project cluster results to full Giotto object using a kNN voting approach and a shared dimension reduction space (e.g. PCA) # subset to smaller giotto object set.seed(1234) subset_IDs = sample(x = spatIDs(visiumHD_subset, 'hex25'), size = n_25_percent) temp_gobject = subsetGiotto( gobject = visiumHD_subset, spat_unit = 'hex25', cell_ids = subset_IDs ) # hierarchical clustering temp_gobject = doHclust(gobject = temp_gobject, spat_unit = 'hex25', k = 8, name = 'sub_hclust', dim_reduction_to_use = 'pca', dim_reduction_name = 'pca.projection', dimensions_to_use = 1:10) # show umap dimPlot2D( gobject = temp_gobject, point_size = 2.5, spat_unit = 'hex25', dim_reduction_to_use = 'umap', dim_reduction_name = 'umap.projection', cell_color = 'sub_hclust' ) Figure 9.33: xxx # project clusterings back to full dataset visiumHD_subset <- doClusterProjection( target_gobject = visiumHD_subset, source_gobject = temp_gobject, spat_unit = "hex25", source_cluster_labels = "sub_hclust", reduction_method = 'pca', reduction_name = 'pca.projection', prob = FALSE, knn_k = 5, dimensions_to_use = 1:10 ) pDataDT(visiumHD_subset) dimPlot2D( gobject = visiumHD_subset, point_size = 1.5, spat_unit = 'hex25', dim_reduction_to_use = 'umap', dim_reduction_name = 'umap.projection', cell_color = 'knn_labels' ) Figure 9.34: xxx spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex25', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex25', polygon_fill_as_factor = TRUE, polygon_fill = 'knn_labels', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.35: xxx 9.6.3 Niche clustering Each cell will be clustered based on its neighboring cell type composition. Figure 9.36: Schematic for niche clustering. Originally from CODEX. Size of cellular niche is important and defines the tissue organization resolution. 
visiumHD_subset = createSpatialNetwork(visiumHD_subset, name = 'kNN_network', spat_unit = 'hex25', method = 'kNN', k = 6) pDataDT(visiumHD_subset) visiumHD_subset = calculateSpatCellMetadataProportions(gobject = visiumHD_subset, spat_unit = 'hex25', feat_type = 'rna', metadata_column = 'knn_labels', spat_network = 'kNN_network') prop_table = getSpatialEnrichment(visiumHD_subset, name = 'proportion', output = 'data.table') prop_matrix = GiottoUtils:::dt_to_matrix(prop_table) set.seed(1234) prop_kmeans = kmeans(x = prop_matrix, centers = 10, iter.max = 1000, nstart = 100) prop_kmeansDT = data.table::data.table(cell_ID = names(prop_kmeans$cluster), niche = prop_kmeans$cluster) visiumHD_subset = addCellMetadata(visiumHD_subset, new_metadata = prop_kmeansDT, by_column = T, column_cell_ID = 'cell_ID') pDataDT(visiumHD_subset) spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex25', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex25', polygon_fill_as_factor = TRUE, polygon_fill = 'niche', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.37: xxx 9.7 Database backend - Work in progress, but coming soon! Memory problems: - data ingestion - spatial operations - matrix operations - matrix and spatial geometry object sizes "],["xenium-1.html", "10 Xenium 10.1 Introduction to spatial dataset 10.2 Data preparation 10.3 Convenience function 10.4 Piecewise loading 10.5 Xenium Images 10.6 Spatial aggregation 10.7 Aggregate analyses workflow 10.8 Niche clustering 10.9 Cell proximity enrichment 10.10 Pseudovisium", " 10 Xenium Jiaji George Chen August 6th 2024 10.1 Introduction to spatial dataset This is the 10X Xenium FFPE Human Lung Cancer dataset. Xenium captures individual transcript detections with a spatial resolution of 100s of nanometers, providing an extremely highly resolved subcellular spatial dataset. 
This particular dataset also showcases their recent multimodal cell segmentation outputs. The Xenium Human Multi-Tissue and Cancer Panel (377 genes) was used. The exported data is from their Xenium Onboard Analysis v2.0.0 pipeline. The full data for this example can be found here. The relevant items are: Xenium Output Bundle (full) Supplemental: Post-Xenium H&E image (OME-TIFF) Supplemental: H&E Image Alignment File (CSV) Additional package requirements When working with this data and trying to open the parquet files, you will need arrow built with ZSTD support. See the datasets & packages section for specific install instructions. 10.1.1 Output directory structure ├── analysis.tar.gz ├── analysis.zarr.zip ├── analysis_summary.html ├── aux_outputs.tar.gz ├── transcripts.csv.gz ├── transcripts.parquet ├── transcripts.zarr.zip ├── cell_boundaries.csv.gz ├── cell_boundaries.parquet ├── nucleus_boundaries.csv.gz ├── nucleus_boundaries.parquet ├── cell_feature_matrix.tar.gz ├── cell_feature_matrix │ ├── barcodes.tsv.gz │ ├── features.tsv.gz │ └── matrix.mtx.gz ├── cell_feature_matrix.h5 ├── cell_feature_matrix.zarr.zip ├── cells.csv.gz ├── cells.parquet ├── cells.zarr.zip ├── experiment.xenium ├── gene_panel.json ├── metrics_summary.csv ├── morphology.ome.tif ├── morphology_focus │ ├── morphology_focus_0000.ome.tif │ ├── morphology_focus_0001.ome.tif │ ├── morphology_focus_0002.ome.tif │ └── morphology_focus_0003.ome.tif ├── Xenium_V1_humanLung_Cancer_FFPE_he_image.ome.tif └── Xenium_V1_humanLung_Cancer_FFPE_he_imagealignment.csv The above directory structure and naming are characteristic of Xenium v2.0 pipeline outputs. The only items that may not be exactly the same across all outputs are the morphology focus directory and the naming of the aligned image items. For the morphology focus images, you may have fewer images if the experiment did not include the multimodal cell segmentation.
As for the aligned images, alignment is usually done after the Xenium experiment concludes, and the images are added using Xenium Explorer. Naming and location of the aligned image (he_image.ome.tif) and associated alignment info he_imagealignment.csv are entirely up to the user. 10.1.2 Mini Xenium Dataset library(Giotto) # set up paths data_path <- "data/02_session3" save_dir <- "results/02_session3" dir.create(save_dir, recursive = TRUE) # download the mini dataset and unzip options("timeout" = Inf) download.file( url = "https://zenodo.org/records/13207308/files/workshop_xenium.zip?download=1", destfile = file.path(save_dir, "workshop_xenium.zip") ) # unzip the downloaded data unzip(zipfile = file.path(save_dir, "workshop_xenium.zip"), exdir = data_path) In order to speed up the steps of the workshop and make it locally runnable, we provide a subset of the full dataset. - Full: -16.039, 12342.984, -3511.515, -294.455 (xmin, xmax, ymin, ymax) - Mini: 6000, 7000, -2200, -1400 (xmin, xmax, ymin, ymax) Figure 10.1: Shown is the H&E aligned to the Xenium dataset with micron scaling. The blue bounds mark out the area provided as a mini dataset 10.2 Data preparation 10.2.1 Image conversion (may change) The first step is dealing with the image formats. Xenium generates ome.tif images which Giotto is currently not fully compatible with. So we convert them to normal tif images using ometif_to_tif() which works through the python tifffile package. The image files can then be loaded in downstream steps. These commented out steps are not needed for today since the mini dataset provides .tif images that have already been spatially aligned and converted. However, the code needed to do this is provided below. # image_paths <- list.files( # data_path, pattern = "morphology_focus|he_image.ome", # recursive = TRUE, full.names = TRUE # ) The ometif_to_tif() output_dir can be specified, but by default, it writes to a new subdirectory called tif_exports underneath the source image's directory.
Keep in mind that downstream image reading functions should point to wherever the tifs were exported. The code run today uses the filepaths that the mini dataset has. # lapply(image_paths, function(img) { # GiottoClass::ometif_to_tif(img, overwrite = TRUE) # }) We are also working on a method of directly accessing the ome.tifs for better compatibility in the future. 10.3 Convenience function Giotto has flexible methods for working with the Xenium outputs. createGiottoXeniumObject() will generate a giotto object in a single step when provided the output directory. The default behavior is to load: transcripts information, cell and nucleus boundaries, and feature metadata (gene_panel.json). For the full dataset (HPC): time: 1-2min | memory: 24GB ?createGiottoXeniumObject g <- createGiottoXeniumObject(xenium_dir = data_path) # set instructions for save directory and to save the plots to disk instructions(g, "save_dir") <- save_dir instructions(g, "save_plot") <- TRUE There are a lot of other parameters for additional or alternative items you can load. The next subsections will explain a couple of them. 10.3.1 Specific filepaths expression_path = , cell_metadata_path = , transcript_path = , bounds_path = , gene_panel_json_path = , The convenience function auto-detects filepaths based on the Xenium directory path and the preferred file formats: .parquet for tabular data (vs .csv) and .h5 for matrices when available (vs .mtx); .zarr is currently not supported. When you need to use a different file format or something is not in the expected output structure, you can supply a specific filepath to the convenience function using these parameters. 10.3.2 Quality value qv_threshold = 20 # default The Quality Value is a Phred-based 0-40 value that 10X provides for every detection in their transcripts output. Higher values mean higher confidence in the decoded transcript identity.
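To get a feel for what a Phred-based value means, the following base R sketch converts quality values to error probabilities. It assumes the standard Phred relation Q = -10 * log10(p_error); since 10X adjusts its raw Q-scores, treat the probabilities as approximate. The helper names and the toy table are hypothetical and not part of Giotto, and the inclusive cutoff (`>=`) is also an assumption.

```r
# Hypothetical helpers: convert between a Phred-scaled quality value and the
# implied probability of an incorrect call, assuming Q = -10 * log10(p_error)
qv_to_error_prob <- function(qv) 10^(-qv / 10)
error_prob_to_qv <- function(p) -10 * log10(p)

# under this assumption, QV 20 corresponds to a 1% error probability
qv <- c(10, 20, 30, 40)
data.frame(qv = qv, error_prob = qv_to_error_prob(qv))

# toy transcript table: keep detections at or above the default cutoff of 20
tx_toy <- data.frame(
  feat_ID = c("geneA", "geneB", "geneC", "geneD"),
  qv = c(12.5, 20, 33.1, 40)
)
tx_kept <- tx_toy[tx_toy$qv >= 20, ]
tx_kept
```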
By default 10X uses a cutoff of QV = 20 for transcripts to use downstream. Note: setting a value other than 20 will make the loaded dataset different from the 10X-provided expression matrix and cell metadata. QV Calculation The raw Q-score is based on how likely it is that an observed code is the codeword it gets mapped to versus a less likely codeword. The raw Q-score is then adjusted by binning the transcripts by Q-value and adjusting the exact Q per bin based on the proportion of Negative Control Codewords detected within. further info 10.3.3 Transcript type splitting feat_type = c( "rna", "NegControlProbe", "UnassignedCodeword", "NegControlCodeword" ), split_keyword = list( c("NegControlProbe"), c("UnassignedCodeword"), c("NegControlCodeword") ) There are 4 types of transcript detections that 10X reports with their v2.0 pipeline: Gene expression - These are the rna gene detections. Negative Control Codeword - (QC) Codewords that do not map to genes, but are in the codebook. Used to determine specificity of decoding algorithm. Negative Control Probe - (QC) Probes in panel but target non-biological sequences. Used to determine specificity of assay. Unassigned Codeword - (QC) Codewords that should not be used in the current panel. With V3 on their Xenium prime outputs, there is additionally: Genomic Control Codeword (QC) Probes for intergenic genomic DNA instead of transcripts. The main thing to watch out for is that the other probe types should be separated out from the Gene expression or rna feature type. How to deal with these different types of detections is easily adjustable. With the feat_type param you declare which categories/feat_types you want to split transcript detections into. Then with split_keyword, you provide a list of character vectors containing grep() terms to search for. Note that there are 4 feat_types declared in this set of defaults, but 3 items passed to split_keyword.
Any transcripts not matched by items in split_keyword, get categorized as the first provided feat_type (“rna”). 10.3.4 Centroids calculation Several Giotto operations require that a set of centroids are calculated for polygon spatial units. g <- addSpatialCentroidLocations(g, poly_info = "cell") g <- addSpatialCentroidLocations(g, poly_info = "nucleus") 10.3.5 Simple visualization spatInSituPlotPoints(g, polygon_feat_type = "cell", feats = list(rna = head(featIDs(g))), # must be named list use_overlap = FALSE, polygon_color = "cyan", polygon_line_size = 0.1 ) Figure 10.2: Simple subcellular plotting to check data 10.4 Piecewise loading Giotto also provides the importXenium() import utility that allows independent creation of compatible Giotto subobjects for more flexibility. x <- importXenium(data_path) force(x) Giotto <XeniumReader> dir : data/02_session3/ qv_cutoff : 20 filetype : transcripts -- parquet boundaries -- parquet expression -- h5 cell_meta -- parquet funs : load_transcripts() load_polys() load_cellmeta() load_featmeta() load_expression() load_image() load_aligned_image() create_gobject() 10.4.1 Load giottoPoints transcripts x$qv <- 20 # default tx <- x$load_transcripts() plot(tx[[1]]$rna, dens = TRUE) Figure 10.3: plot of Gene expression (rna) density force(tx[[1]]$rna) An object of class giottoPoints feat_type : "rna" Feature Information: class : SpatVector geometry : points dimensions : 479097, 10 (geometries, attributes) extent : 6000.001, 7000, -2200, -1400.012 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID transcript_id cell_id overlaps_nucleus z_location qv fov_name type : <chr> <chr> <chr> <int> <num> <num> <chr> values : FBLN1 281487861612869 mcnjadoe-1 0 19.32 40 B11 PDGFRB 281487861612872 mcnjbidl-1 1 18.75 40 B11 PDGFRB 281487861612873 mcnjbidl-1 1 18.74 40 B11 nucleus_distance codeword_index feat_ID_uniq <num> <int> <int> 0 334 1 0 289 2 0 289 3 rm(tx) # remove to save space 10.4.2 (optional) Loading pre-aggregated data Giotto can spatially aggregate the transcripts information based on a provided set of boundaries information; however, 10X also provides a pre-aggregated set of cell by feature information and metadata. These values may be slightly different from those calculated by Giotto's pipeline, and are not loaded by default. Some care needs to be taken when loading this information: The feat_type of the loaded expression information should be matched to the feat_type parameters passed to the convenience function. The qv_threshold used must be 20 since the 10X outputs are based on that cutoff. x$filetype$expression <- "mtx" # change to mtx instead of .h5 which is not in the mini dataset ex <- x$load_expression() featType(ex) [1] "rna" "Negative Control Probe" "Negative Control Codeword" [4] "Unassigned Codeword" The feature types here do not match what we established for the transcripts, so we can just change them. Another reason for changing them here is that the default names contain space characters, which are difficult to work with. force(g) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell nucleus features : rna NegControlProbe UnassignedCodeword NegControlCodeword [AGGREGATE INFO] spatial locations ---------------- [cell] raw [nucleus] raw featType(ex[[2]]) <- c("NegControlProbe") featType(ex[[3]]) <- c("NegControlCodeword") featType(ex[[4]]) <- c("UnassignedCodeword") Then we can just append them to the Giotto object.
Here we set up a second object called g2 since we will be using Giotto’s own aggregation method to generate the expression matrix later. g2 <- g # append the expression info g2 <- setGiotto(g2, ex) # load cell metadata cx <- x$load_cellmeta() g2 <- setGiotto(g2, cx) force(g2) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell nucleus features : rna NegControlProbe UnassignedCodeword NegControlCodeword [AGGREGATE INFO] expression ----------------------- [cell][rna] raw [cell][NegControlProbe] raw [cell][NegControlCodeword] raw [cell][UnassignedCodeword] raw spatial locations ---------------- [cell] raw [nucleus] raw spatInSituPlotPoints(g2, # polygon shading params polygon_fill = "cell_area", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", # polygon line params polygon_color = "grey", polygon_line_size = 0.1 ) spatInSituPlotPoints(g2, # polygon shading params polygon_fill = "transcript_counts", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", # polygon line params polygon_color = "grey", polygon_line_size = 0.1 ) Figure 10.4: Example plot using 10X metadata. Left is cell_area, right is transcript_counts rm(g2) # save space 10.5 Xenium Images Xenium outputs have several image outputs. For this dataset: morphology.ome.tif is a z-stacked image of the DAPI staining, with z levels separated as pages within the ome.tif. In this dataset, only pages 6 and 7 are really in focus. morphology_focus is a folder containing single-channel image(s), but with the original z information collapsed into a single in-focus layer. For all datasets, image 0000 will be DAPI staining, but if you have additional stains, such as the multimodal segmentation, they will also be here. These are the recommended immunofluorescence staining images to import. Xenium_V1_humanLung_Cancer_FFPE_he_image.ome.tif is an added on (in this case H&E) image with manual affine registration. 
10.5.1 Image metadata The morphology_focus directory may contain multiple images, but to learn more about them, we have to check the ome.tif xml metadata. With a normal dataset, you can use: `GiottoClass::ometif_metadata([filepath], node = "Channel")` on one of the morphology_focus images, but since the mini dataset images are pre-processed, there is only an exported .xml to explore. The output of the code chunk below is the same as that from calling ometif_metadata() and looking for the Channel node. img_xml_path <- file.path(data_path, "morphology_focus", "morphology_focus_0000.xml") omemeta <- xml2::read_xml(img_xml_path) res <- xml2::xml_find_all(omemeta, "//d1:Channel", ns = xml2::xml_ns(omemeta)) res <- Reduce(rbind, xml2::xml_attrs(res)) rownames(res) <- NULL res <- as.data.frame(res) force(res) ID Name SamplesPerPixel 1 Channel:0 DAPI 1 2 Channel:1 18S 1 3 Channel:2 ATP1A1/CD45/E-Cadherin 1 4 Channel:3 alphaSMA/Vimentin 1 10.5.2 Image loading morphology_focus images need to be scaled by the micron scaling factor. Aligned images need to first be affine transformed then scaled. The micron scaling factor can be found in the json-like experiment.xenium file under pixel_size (0.2125 for this dataset). Figure 10.5: Spatial extent/bounds of transcripts (red), immunofluorescence morphology focus images (blue), H&E aligned image (gold). Lower right shows the affine matrix for aligning the H&E These transforms are normally done automatically when using: # convenience function params load_images = list( img1 = "[img_path1.tif]", img2 = "[img_path2.tif]", img3 = "..." ), load_aligned_images = list( aligned_img = c( "[path to image.tif]", "[path to imagealignment.csv]" ) ) # importer params x$load_image(path = "[img_path1.tif]", name = "img1") x$load_image(path = "[img_path2.tif]", name = "img2") ... 
x$load_aligned_image( path = "[path to image.tif]", imagealignment_path = "[path to imagealignment.csv]", name = "aligned_img" ) Specifically for the aligned image, there is also read10xAffineImage() which has similar parameters, but also asks for the micron scaling factor. But for the mini dataset, the images are pre-processed and can be directly added. img_paths <- c( sprintf("data/02_session3/morphology_focus/morphology_focus_%04d.tif", 0:3), "data/02_session3/he_mini.tif" ) img_list <- createGiottoLargeImageList( img_paths, # naming is based on the channel metadata above names = c("DAPI", "18S", "ATP1A1/CD45/E-Cadherin", "alphaSMA/Vimentin", "HE"), use_rast_ext = TRUE, verbose = FALSE ) # make some images brighter img_list[[1]]@max_window <- 5000 img_list[[2]]@max_window <- 5000 img_list[[3]]@max_window <- 5000 # append images to gobject g <- setGiotto(g, img_list) # example plots spatInSituPlotPoints(g, show_image = TRUE, image_name = "HE", polygon_feat_type = "cell", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) spatInSituPlotPoints(g, show_image = TRUE, image_name = "DAPI", polygon_feat_type = "nucleus", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) spatInSituPlotPoints(g, show_image = TRUE, image_name = "18S", polygon_feat_type = "cell", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) spatInSituPlotPoints(g, show_image = TRUE, image_name = "ATP1A1/CD45/E-Cadherin", polygon_feat_type = "nucleus", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) Figure 10.6: H&E and Cell polys (top left), DAPI and nuclear polys (top right), 18S and cell polys (lower left), ATP1A1/CD45/E-Cadherin and nuclear polys (lower right) 10.6 Spatial aggregation First calculate the feat_info “rna” transcripts overlapped by the spatial_info “cell” polygons with calculateOverlap().
Then, the overlaps information (relationships between points and polygons that overlap them) gets converted into a count matrix with overlapToMatrix(). g <- calculateOverlap(g, spatial_info = "cell", feat_info = "rna" ) g <- overlapToMatrix(g) 10.7 Aggregate analyses workflow 10.7.1 Transcripts per cell g <- addStatistics(g) # this is going to fail because it looks for normalized g <- addStatistics(g, expression_values = "raw") cell_stats <- pDataDT(g) ggplot2::ggplot(cell_stats, ggplot2::aes(total_expr)) + ggplot2::geom_histogram(binwidth = 5) Figure 10.7: Histogram of detections per cell 10.7.2 Filtering # very permissive filtering. Mainly for removing 0 values g <- filterGiotto(g, expression_threshold = 1, feat_det_in_min_cells = 1, min_det_feats_per_cell = 5 ) Feature type: rna Number of cells removed: 143 out of 7655 Number of feats removed: 0 out of 377 10.7.3 Normalization g <- normalizeGiotto(g) # overwrite original results with those for normalized values g <- addStatistics(g) spatInSituPlotPoints(g, polygon_fill = "nr_feats", polygon_fill_gradient_style = "sequential", polygon_fill_as_factor = FALSE ) spatInSituPlotPoints(g, polygon_fill = "total_expr", polygon_fill_gradient_style = "sequential", polygon_fill_as_factor = FALSE ) Figure 10.8: nr_feats - Number of different gene species detected per cell (left), total_expr - total detections per cell (right) When there are a lot of features, we would also select only the interesting highly variable features so that downstream dimension reduction has more meaningful separation. Here we skip HVF detection since there are only 377 genes. 10.7.4 Dimension Reduction Dimensional reduction of expression space to visualize expressional differences between cells and help with clustering. g <- runPCA(g, feats_to_use = NULL) # feats_to_use = NULL since there are no HVFs calculated. Use all genes. 
screePlot(g, ncp = 30) Figure 10.9: Plot of variance explained in the first 30 out of 100 principal components calculated g <- runUMAP(g, dimensions_to_use = seq(15), n_neighbors = 40 # default ) plotPCA(g) plotUMAP(g) Figure 10.10: PCA plot showing the first 2 PCs (left), UMAP generated from first 15 PCs (right) 10.7.5 Clustering g <- createNearestNetwork(g, dimensions_to_use = seq(15), k = 40 ) # takes roughly 1 min to run g <- doLeidenCluster(g) plotPCA_3D(g, cell_color = "leiden_clus", point_size = 1 ) plotUMAP(g, cell_color = "leiden_clus", point_size = 0.1, point_shape = "no_border" ) Figure 10.11: 3D plot showing first PCs with leiden clustering annotations (left), UMAP plot showing leiden clustering results (right) spatInSituPlotPoints(g, polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, polygon_alpha = 1, show_image = TRUE, image_name = "HE" ) Figure 10.12: Spatial plot with leiden clustering annotations. 10.8 Niche clustering Building on top of these leiden annotations, we can define spatial niche signatures based on which leiden types are often found together. 10.8.1 Spatial network First a spatial network must be generated so that spatial relationships between cells can be understood. g <- createSpatialNetwork(g, method = "Delaunay" ) spatPlot2D(g, point_shape = "no_border", show_network = TRUE, point_size = 0.1, point_alpha = 0.5, network_color = "grey" ) Figure 10.13: Delaunay spatial network 10.8.2 Niche calculation Calculate a proportion table for a cell metadata table for all the spatial neighbors of each cell. This means that with each cell established as the center of its local niche, the enrichment of each leiden cluster label is found for that local niche. 
The results are stored as a new spatial enrichment entry called “leiden_niche” g <- calculateSpatCellMetadataProportions(g, spat_network = "Delaunay_network", metadata_column = "leiden_clus", name = "leiden_niche" ) 10.8.3 k-means clustering based on niche signature # retrieve the niche info prop_table <- getSpatialEnrichment(g, name = "leiden_niche", output = "data.table") # convert to matrix prop_matrix <- GiottoUtils::dt_to_matrix(prop_table) # perform kmeans clustering set.seed(1234) # make kmeans clustering reproducible prop_kmeans <- kmeans( x = prop_matrix, centers = 7, # controls how many clusters will be formed iter.max = 1000, nstart = 100 ) prop_kmeansDT = data.table::data.table( cell_ID = names(prop_kmeans$cluster), niche = prop_kmeans$cluster ) # return kmeans clustering on niche to gobject g <- addCellMetadata(g, new_metadata = prop_kmeansDT, by_column = TRUE, column_cell_ID = "cell_ID" ) # visualize niches spatInSituPlotPoints(g, show_image = TRUE, image_name = "HE", polygon_fill = "niche", # polygon_fill_code = getColors("Accent", 8), polygon_alpha = 1, polygon_fill_as_factor = TRUE ) # visualize niche makeup cellmeta <- pDataDT(g) ggplot2::ggplot( cellmeta, ggplot2::aes(fill = as.character(leiden_clus), y = 1, x = as.character(niche))) + ggplot2::geom_bar(position = "fill", stat = "identity") + ggplot2::scale_fill_manual(values = c( "#E7298A", "#FFED6F", "#80B1D3", "#E41A1C", "#377EB8", "#A65628", "#4DAF4A", "#D9D9D9", "#FF7F00", "#BC80BD", "#666666", "#B3DE69") ) Figure 10.14: Leiden annotation-based spatial niches Figure 10.15: Stacked barplot of leiden annotation composition by niche. Coloring is matched to that of the previous spatial plot with leiden clustering annotations 10.9 Cell proximity enrichment Using a spatial network, determine if there is an enrichment or depletion between annotation types by calculating the observed over the expected frequency of interactions. 
# uses a lot of memory leiden_prox <- cellProximityEnrichment(g, cluster_column = "leiden_clus", spatial_network_name = "Delaunay_network", adjust_method = "fdr", number_of_simulations = 2000 ) cellProximityBarplot(g, CPscore = leiden_prox, min_orig_ints = 5, # minimum original cell-cell interactions min_sim_ints = 5 # minimum simulated cell-cell interactions ) Figure 10.16: Cell-cell interaction enrichments and depletions (left). Number of interactions of each type found (right) Most enrichments are self-self interactions, which is expected. However, 6–8 and 2–9 stand out as being hetero interactions that are enriched with a large number of interactions. We can take a closer look by plotting these annotation pairs with colors that stand out. # set up colors other_cell_color <- rep("grey", 12) int_6_8 <- int_2_9 <- other_cell_color int_6_8[c(6, 8)] <- c("orange", "cornflowerblue") int_2_9[c(2, 9)] <- c("orange", "cornflowerblue") spatInSituPlotPoints(g, polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, polygon_fill_code = int_6_8, polygon_line_size = 0.1, polygon_alpha = 1, show_image = TRUE, image_name = "HE" ) spatInSituPlotPoints(g, polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, polygon_fill_code = int_2_9, polygon_line_size = 0.1, show_image = TRUE, polygon_alpha = 1, image_name = "HE" ) Figure 10.17: Spatial plot of enriched leiden annotation 6 to 8 interactions Figure 10.18: Spatial plot of enriched leiden annotation 2 to 9 interactions 10.10 Pseudovisium Another thing we can do is create a “pseudovisium” dataset by tessellating across this dataset using the same layout and resolution as a Visium capture array. makePseudoVisium() generates a Visium array of circular polygons across the spatial extent provided. 
Here we use ext() with the prefer arg pointing to the polygon and points data and all_data = TRUE, meaning that the combined spatial extent of those two data types will be returned, giving a good measure of where all the data in the object is at the moment. micron_size = 1 since the Xenium data is already scaled to microns. pvis <- makePseudoVisium( extent = ext(g, prefer = c("polygon", "points"), all_data = TRUE), # all_data = TRUE is the default micron_size = 1 ) g <- setGiotto(g, pvis) g <- addSpatialCentroidLocations(g, poly_info = "pseudo_visium") plot(pvis) Figure 10.19: Pseudovisium spot geometries generated by makePseudoVisium() 10.10.1 Pseudovisium aggregation and workflow Make “pseudo_visium” the new default spatial unit then proceed with aggregation and usual aggregate workflow. activeSpatUnit(g) <- "pseudo_visium" g <- calculateOverlap(g, spatial_info = "pseudo_visium", feat_info = "rna" ) g <- overlapToMatrix(g) g <- filterGiotto(g, expression_threshold = 1, feat_det_in_min_cells = 1, min_det_feats_per_cell = 100 ) g <- normalizeGiotto(g) g <- addStatistics(g) spatInSituPlotPoints(g, show_image = TRUE, image_name = "HE", polygon_feat_type = "pseudo_visium", polygon_fill = "total_expr", polygon_fill_gradient_style = "sequential" ) Figure 10.20: Pseudo visium total detections per spot g <- runPCA(g, feats_to_use = NULL) g <- runUMAP(g, dimensions_to_use = seq(15), n_neighbors = 15 ) g <- createNearestNetwork(g, dimensions_to_use = seq(15), k = 15 ) g <- doLeidenCluster(g, resolution = 1.5) # plots plotPCA(g, cell_color = "leiden_clus", point_size = 2) plotUMAP(g, cell_color = "leiden_clus", point_size = 2) spatInSituPlotPoints(g, polygon_feat_type = "pseudo_visium", polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE ) spatInSituPlotPoints(g, polygon_feat_type = "pseudo_visium", polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, show_image = TRUE, image_name = "HE" ) Figure 10.21: Leiden clustering in PCA (top left) and UMAP (top right) 
spaces, and in spatial plot with no image (bottom left), and with image (bottom right) "],["spatial-proteomics-multiplexed-immunofluorescence.html", "11 Spatial proteomics: Multiplexed Immunofluorescence 11.1 Spatial Proteomics Technologies 11.2 Raw data type coming out of different technologies 11.3 Cell Segmentation file to get single cell level protein expression 11.4 Create a Giotto Object using list of giotto large images and polygons 11.5 Session info", " 11 Spatial proteomics: Multiplexed Immunofluorescence Junxiang Xu August 6th 2024 Before you start, this tutorial contains an optional part to run image segmentation using Giotto wrapper of Cellpose. If you plan to use that function, please restart your R session, as we will need to activate a new Giotto python environment. The environment is also compatible with other Giotto functions. We will also need to install the Cellpose-supported Giotto environment if you haven’t done so already. #Install the Giotto Environment with Cellpose, note that we only need to do it once reticulate::conda_create(envname = "giotto_cellpose", python_version = 3.8) #.rs.restartR() reticulate::use_condaenv("giotto_cellpose") reticulate::py_install( pip = TRUE, envname = "giotto_cellpose", packages = c( "pandas", "networkx", "python-igraph", "leidenalg", "scikit-learn", "cellpose", "smfishhmrf", "tifffile", "scikit-image" ) ) #.rs.restartR() Now, activate the Giotto python environment. #.rs.restartR() # Activate the Giotto python environment of your choice GiottoClass::set_giotto_python_path("giotto_cellpose") # Check if cellpose was successfully installed GiottoUtils::package_check("cellpose", repository = "pip") 11.1 Spatial Proteomics Technologies This tutorial is aimed at analyzing spatially resolved multiplexed immunofluorescence data. It is compatible with different kinds of image-based spatial proteomics data, such as Akoya (CODEX), CyCIF, IMC, MIBI, and Lunaphore (seqIF). 
Note that this tutorial will focus on starting directly with the intensity data (image), not the decoded count matrix. This is the example dataset from the official Lunaphore website, and we are using one small cropped area as an example. This is an overview of what a subset of the data looks like. 11.2 Raw data type coming out of different technologies 11.2.1 Use ome.tiff as an example output data to begin with OME-TIFF (Open Microscopy Environment Tagged Image File Format) is a file format designed to include detailed metadata and to support multi-dimensional image data. This is a common output file format for spatial proteomics platforms such as Lunaphore. library(Giotto) instrs <- createGiottoInstructions(save_dir = file.path(getwd(),"/img/02_session4/"), save_plot = TRUE, show_plot = TRUE, python_path = "giotto_cellpose") options(timeout = Inf) data_dir <- "data/02_session4" destfile <- file.path(data_dir, "Lunaphore.zip") if (!dir.exists(data_dir)) { dir.create(data_dir, recursive = TRUE) } download.file("https://zenodo.org/records/13175721/files/Lunaphore.zip?download=1", destfile = destfile) unzip(file.path(data_dir, "/Lunaphore.zip"), exdir = data_dir) list.files(file.path(data_dir, "/Lunaphore")) We provide a way to extract metadata information directly from ome.tiffs. Please note that different platforms may store metadata such as channel information in a different format, so we will probably need to change the node names of the ome-XML. img_path <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff") img_meta <- ometif_metadata(img_path, node = "Channel", output = "data.frame") img_meta However, sometimes a simple ometiff file manipulation like cropping could result in a loss of ome-XML information from the ome.tiff file. In that case, we can use a different strategy to parse the xml information separately and get channel information from it. 
## Get channel information Luna <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff") xmldata <- xml2::read_xml(file.path(data_dir,"/Lunaphore/Lunaphore_sample_metadata.xml")) node <- xml2::xml_find_all(xmldata, "//d1:Channel", ns = xml2::xml_ns(xmldata)) channel_df <- as.data.frame(Reduce("rbind", xml2::xml_attrs(node))) channel_df 11.2.2 Use single channel images as an example output data to begin with Some platforms may also deconvolute and output gray scale single channel images. We can also create single channel images from ome.tiffs; these will be of the same format as the single channel gray scale images that such platforms provide. With the single channel images, we can create a GiottoLargeImage and see what it looks like. # Create multichannel raster and extract each single channels Luna_terra <- terra::rast(Luna) names(Luna_terra) <- channel_df$Name gimg_DAPI <- createGiottoLargeImage(Luna_terra[[1]], negative_y = FALSE, flip_vertical = TRUE) plot(gimg_DAPI) Extract and save the raster image for future use. single_channel_dir <- file.path(data_dir, "/Lunaphore/single_channels/") if (!dir.exists(single_channel_dir)) { dir.create(single_channel_dir, recursive = TRUE) } for (i in 1:nrow(channel_df)){ single_channel <- terra::subset(Luna_terra, i) terra::writeRaster(single_channel, filename = paste0(single_channel_dir,names(single_channel),".tiff"), overwrite = TRUE) } Create a list of GiottoLargeImages using single channel rasters. file_names <- list.files(single_channel_dir, full.names = TRUE) image_names <- sub("\\\\.tiff$", "", list.files(single_channel_dir)) gimg_list <- createGiottoLargeImageList(raster_objects = file_names, names = image_names, negative_y = FALSE, flip_vertical = TRUE) names(gimg_list) <- image_names plot(gimg_list[["Vimentin"]]) 11.3 Cell Segmentation file to get single cell level protein expression Cell segmentation is necessary to generate single cell level protein expression. 
Currently, there are multiple algorithms to generate segmentations from images, and their outputs can differ. For that purpose, Giotto provides createGiottoPolygonsFromMask(), createGiottoPolygonsFromDfr(), createGiottoPolygonsFromGeoJSON() to load different file types into the giottoPolygon class. 11.3.1 Using segmentation output file from DeepCell(mesmer) as an example. We collapsed several different channels to create a pseudo-membrane staining channel (“nuc_and_bound.tif” provided here), and used that as an input for the deepcell mesmer segmentation pipeline. We can load the output mask into a giottoPolygon via a convenience function. gpoly_mesmer <- createGiottoPolygonsFromMask( file.path(data_dir, "/Lunaphore/whole_cell_mask.tif"), shift_horizontal_step = FALSE, shift_vertical_step = FALSE, flip_vertical = TRUE, calc_centroids = TRUE) plot(gpoly_mesmer) We can also zoom in to check how the segmentation looks. zoom <- c(2000,2500,2000,2500) plot(gimg_DAPI, ext = zoom) plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom) 11.3.2 Using Giotto wrapper of Cellpose to perform segmentation Here, we create a mini example by cropping the image to a smaller area. Note that crop() is probably easier to use for directly cropping an image, unless the image is stored inside a giotto object. gimg_cropped <- cropGiottoLargeImage(giottoLargeImage = gimg_DAPI, crop_extent = terra::ext(zoom)) writeGiottoLargeImage(gimg_cropped, filename = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"), overwrite = TRUE) #Create a giotto image to evaluate segmentation gimg_for_cellpose <- createGiottoLargeImage( file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"), negative_y = FALSE) Now we can run the cellpose segmentation. 
We can provide different parameters for the cellpose inference model (flow_threshold, cellprob_threshold, etc.). In practice, the batch size represents how many 224x224 images are processed in parallel: increasing it raises the RAM/VRAM requirement, while lowering it increases the run time. For more information, please refer to the cellpose website. doCellposeSegmentation(image_dir = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"), mask_output = file.path(data_dir, "/Lunaphore/giotto_cellpose_seg.tiff"), channel_1 = 0, channel_2 = 0, model_name = "cyto3", batch_size = 12) cpoly <- createGiottoPolygonsFromMask(file.path(data_dir,"/Lunaphore/giotto_cellpose_seg.tiff"), shift_horizontal_step = FALSE, shift_vertical_step = FALSE, flip_vertical = TRUE) plot(gimg_for_cellpose) plot(cpoly, add = TRUE, border = "red") 11.4 Create a Giotto Object using list of giotto large images and polygons You will need to have: list of giotto images giottoPolygon created from segmentation Lunaphore_giotto <- createGiottoObjectSubcellular(gpolygons = list("cell" = gpoly_mesmer), images = gimg_list, instructions = instrs) Lunaphore_giotto 11.4.1 Overlap to matrix calculateOverlap() and overlapToMatrix() are used to overlap the image intensity values with the cell polygons and convert the result into an expression matrix. Lunaphore_giotto <- calculateOverlap(Lunaphore_giotto, spatial_info = "cell", image_names = names(gimg_list)) Lunaphore_giotto <- overlapToMatrix(x = Lunaphore_giotto, type ="intensity", poly_info = "cell", feat_info = "protein", aggr_function = "sum") showGiottoExpression(Lunaphore_giotto) 11.4.2 Manipulate Expression information For IF data, DAPI is usually only used to stain nuclei, so its intensity values usually do not carry meaningful information for distinguishing cell types. Something similar can happen when a platform uses a reference channel, such as TRITC or Cy5, to adjust signal calling. These images will be loaded but need to be removed from the expression profile. 
Therefore, we can extract the feature expression matrix, filter out the DAPI information, and write it back to the Giotto object. expr_mtx <- getExpression(Lunaphore_giotto, values = "raw", output = "matrix") filtered_expr_mtx <- expr_mtx[rownames(expr_mtx) != "DAPI",] Lunaphore_giotto <- setExpression(Lunaphore_giotto, feat_type = "protein", x = createExprObj(filtered_expr_mtx), name = "raw") showGiottoExpression(Lunaphore_giotto) 11.4.3 Rescale polygons rescalePolygons() provides a quick way to change the polygon size, which can affect the expression measured for each cell. Rerunning calculateOverlap() and overlapToMatrix() on the rescaled polygons can change the downstream analysis. Lunaphore_giotto <- rescalePolygons(gobject = Lunaphore_giotto, poly_info = "cell", name = "smallcell", fx = 0.7, fy = 0.7, calculate_centroids = TRUE) smallpoly <- getPolygonInfo(Lunaphore_giotto, polygon_name = "smallcell") plot(gimg_DAPI, ext = zoom) plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom) plot(smallpoly, add = TRUE, border = "red", ext = zoom) 11.4.4 Perform clustering and differential expression The Giotto object can then go through the standard analysis pipeline: normalization, dimension reduction, and clustering. Lunaphore_giotto <- normalizeGiotto(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein") Lunaphore_giotto <- addStatistics(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein") Lunaphore_giotto <- runPCA(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", scale_unit = FALSE, center = FALSE, ncp = 20, feats_to_use = NULL, set_seed = TRUE) screePlot(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", show_plot = TRUE) Due to the limited number of total features we have, Leiden clustering generally does not work as well as k-means or hierarchical clustering. Here we can use hierarchical clustering to do a quick check. 
Lunaphore_giotto <- runUMAP(gobject = Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:5, set_seed = TRUE) Lunaphore_giotto <- createNearestNetwork(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:5) Lunaphore_giotto <- doHclust(Lunaphore_giotto, k = 8, dim_reduction_to_use = "cells", spat_unit = "cell", feat_type = "protein") spatInSituPlotPoints(gobject = Lunaphore_giotto, spat_unit = "cell", polygon_feat_type = "cell", show_polygon = TRUE, feat_type = "protein", feats = NULL, polygon_fill = "hclust", polygon_fill_as_factor = TRUE, polygon_line_size = 0, image_name = "CD68", show_image = TRUE, return_plot = TRUE, polygon_color = "black", background_color = "white") Then we can check the heatmap of protein expression and determine the first round of cluster annotation. cluster_column <- "hclust" plotMetaDataHeatmap(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", expression_values = "raw", metadata_cols = cluster_column, selected_feats = names(gimg_list), y_text_size = 8, show_values = "zscores_rescaled") 11.4.5 Give the cluster an annotation based on expression values annotation <- c("B_cell", "Macrophage", "T_cell", "stromal", "epithelial", "DC" , "Fibroblast", "endothelial") names(annotation) <- 1:8 Lunaphore_giotto <- annotateGiotto(Lunaphore_giotto, cluster_column = "hclust", annotation_vector = annotation, name = "cell_types") 11.4.6 Spatial network This is to create a cellular neighborhood based on nearest neighbor of physical distance. Lunaphore_giotto <- createSpatialNetwork(Lunaphore_giotto) spatPlot2D(Lunaphore_giotto, show_network = TRUE, network_color = "blue", point_size = 1.5, cell_color = "hclust") 11.4.7 Cell Neighborhood: Cell-Type/Cell-Type Interactions This is using cellProximityEnrichment() to statistically identify cell type interactions. 
cell_proximities <- cellProximityEnrichment(gobject = Lunaphore_giotto, cluster_column = "cell_types", spatial_network_name = "Delaunay_network", adjust_method = "fdr", number_of_simulations = 2000) ## barplot cellProximityBarplot(gobject = Lunaphore_giotto, CPscore = cell_proximities, min_orig_ints = 5, min_sim_ints = 5) ## network cellProximityNetwork(gobject = Lunaphore_giotto, CPscore = cell_proximities, remove_self_edges = TRUE, only_show_enrichment_edges = FALSE) 11.5 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Giotto_4.1.0 GiottoClass_0.3.4 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.8 [4] magrittr_2.0.3 magick_2.8.4 farver_2.1.2 [7] rmarkdown_2.27 zlibbioc_1.50.0 ragg_1.3.2 [10] vctrs_0.6.5 memoise_2.0.1 GiottoUtils_0.1.10 [13] terra_1.7-78 htmltools_0.5.8.1 S4Arrays_1.4.1 [16] raster_3.6-26 SparseArray_1.4.8 sass_0.4.9 [19] bslib_0.8.0 KernSmooth_2.23-24 htmlwidgets_1.6.4 [22] plyr_1.8.9 plotly_4.10.4 cachem_1.1.0 [25] igraph_2.0.3 lifecycle_1.0.4 pkgconfig_2.0.3 [28] rsvd_1.0.5 Matrix_1.7-0 R6_2.5.1 [31] fastmap_1.2.0 GenomeInfoDbData_1.2.12 MatrixGenerics_1.16.0 [34] digest_0.6.36 colorspace_2.1-1 S4Vectors_0.42.1 [37] irlba_2.3.5.1 textshaping_0.4.0 GenomicRanges_1.56.1 [40] beachmat_2.20.0 labeling_0.4.3 fansi_1.0.6 [43] httr_1.4.7 polyclip_1.10-7 abind_1.4-5 [46] compiler_4.4.1 proxy_0.4-27 withr_3.0.1 [49] backports_1.5.0 
BiocParallel_1.38.0 viridis_0.6.5 [52] DBI_1.2.3 highr_0.11 ggforce_0.4.2 [55] MASS_7.3-61 DelayedArray_0.30.1 rjson_0.2.21 [58] classInt_0.4-10 gtools_3.9.5 GiottoVisuals_0.2.4 [61] tools_4.4.1 units_0.8-5 glue_1.7.0 [64] dbscan_1.2-0 grid_4.4.1 sf_1.0-16 [67] checkmate_2.3.2 reshape2_1.4.4 generics_0.1.3 [70] gtable_0.3.5 class_7.3-22 tidyr_1.3.1 [73] data.table_1.15.4 BiocSingular_1.20.0 tidygraph_1.3.1 [76] ScaledMatrix_1.12.0 sp_2.1-4 xml2_1.3.6 [79] utf8_1.2.4 XVector_0.44.0 BiocGenerics_0.50.0 [82] RcppAnnoy_0.0.22 ggrepel_0.9.5 pillar_1.9.0 [85] stringr_1.5.1 dplyr_1.1.4 tweenr_2.0.3 [88] lattice_0.22-6 deldir_2.0-4 tidyselect_1.2.1 [91] SingleCellExperiment_1.26.0 knitr_1.48 gridExtra_2.3 [94] bookdown_0.40 IRanges_2.38.1 SummarizedExperiment_1.34.0 [97] scattermore_1.2 stats4_4.4.1 xfun_0.46 [100] graphlayouts_1.1.1 Biobase_2.64.0 matrixStats_1.3.0 [103] stringi_1.8.4 UCSC.utils_1.0.0 lazyeval_0.2.2 [106] yaml_2.3.10 evaluate_0.24.0 codetools_0.2-20 [109] ggraph_2.2.1 tibble_3.2.1 BiocManager_1.30.23 [112] colorRamp2_0.1.0 cli_3.6.3 uwot_0.2.2 [115] reticulate_1.38.0 systemfonts_1.1.0 jquerylib_0.1.4 [118] munsell_0.5.1 Rcpp_1.0.13 GenomeInfoDb_1.40.1 [121] png_0.1-8 parallel_4.4.1 ggplot2_3.5.1 [124] exactextractr_0.10.0 SpatialExperiment_1.14.0 viridisLite_0.4.2 [127] scales_1.3.0 e1071_1.7-14 purrr_1.0.2 [130] crayon_1.5.3 rlang_1.1.4 cowplot_1.1.3 "],["working-with-multiple-samples.html", "12 Working with multiple samples 12.1 Objective 12.2 Background 12.3 Create individual giotto objects 12.4 Extracting the downloaded files 12.5 Join Giotto Objects 12.6 Visualizing combined datasets 12.7 Splitting combined dataset 12.8 Analyzing joined objects 12.9 Perform Harmony and default workflows", " 12 Working with multiple samples Jeff Sheridan August 7th 2024 12.1 Objective Giotto enables the grouping of multiple objects into a single object for combined analysis. 
Grouping objects can be used to ensure normalization is consistent across datasets, allowing us to compare datasets directly. Datasets can be spatially distributed across the x, y, or z axes, allowing for the creation of 3D datasets using the z-plane or the analysis of grouped datasets, such as multiple replicates or similar samples. While it’s possible to integrate multiple datasets, batch effects and differences between samples can hinder effective integration. In such cases, more sophisticated methods may be needed to successfully integrate and cluster samples as a unified dataset. One example of an advanced integration technique is Harmony, which will be discussed in more detail later in this tutorial. This tutorial will demonstrate the integration of two Visium datasets, examining the results before and after Harmony integration. 12.2 Background 12.2.1 Dataset For this tutorial we will be using two prostate visium datasets produced by 10X Genomics, one an Adenocarcinoma with Invasive Carcinoma and the other a normal prostate sample. 12.2.2 Visium technology Figure 12.1: Overview of Visium. Source: 10X Genomics. Visium by 10x Genomics is a spatial gene expression platform that allows for the mapping of gene expression to high-resolution histology through RNA sequencing. The process involves placing a tissue section on a specially prepared slide with an array of barcoded spots, which are 55 µm in diameter with a spot to spot distance of 100 µm. Each spot contains unique barcodes that capture the mRNA from the tissue section, preserving the spatial information. After the tissue is imaged and RNA is captured, the mRNA is sequenced, and the data is mapped back to the tissue’s spatial coordinates. This technology is particularly useful in understanding complex tissue environments, such as tumors, by providing insights into how gene expression varies across different regions. 
12.3 Create individual giotto objects 12.3.1 Download the data You need to download the expression matrix and spatial information by running these commands: data_dir <- "data/03_session1" dir.create(file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer"), showWarnings = FALSE, recursive = TRUE) # Spatial data adenocarcinoma prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_spatial.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_spatial.tar.gz")) # Download matrix adenocarcinoma prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_raw_feature_bc_matrix.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_raw_feature_bc_matrix.tar.gz")) dir.create(file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate"), showWarnings = FALSE, recursive = TRUE) # Spatial data normal prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz")) # Download matrix normal prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_raw_feature_bc_matrix.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_raw_feature_bc_matrix.tar.gz")) 12.4 Extracting the downloaded files # The adenocarcinoma sample untar(tarfile = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_spatial.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer")) untar(tarfile = 
file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_raw_feature_bc_matrix.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer")) # The normal prostate sample untar(tarfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate")) untar(tarfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_raw_feature_bc_matrix.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate")) 12.4.1 Create giotto instructions We must first create instructions for our Giotto object. This will tell the object where to save outputs, whether to show or return plots, and the python path. Specifying the python path is often not required as Giotto will identify the relevant python environment, but might be required in some instances. library(Giotto) save_dir <- "results/03_session1" instrs <- createGiottoInstructions(save_dir = save_dir, save_plot = TRUE, show_plot = TRUE, python_path = NULL) 12.4.2 Load visium data into Giotto We next need to read in the data for the Giotto object. To do this we will use the createGiottoVisiumObject() convenience function. This requires us to specify the directory that contains the visium data output from 10X Genomics’s Spaceranger. We also specify the expression data to use (raw or filtered) as well as the image to align. Spaceranger outputs two images, a low and high resolution image. 
## Healthy prostate N_pros <- createGiottoVisiumObject( visium_dir = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate"), expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instrs ) ## Adenocarcinoma C_pros <- createGiottoVisiumObject( visium_dir = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer"), expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instrs ) We can see that the gobject contains information for the cells (polygon and spatial units), the RNA expression (raw) and the relevant image. Figure 12.2: Structure of Giotto object containing a single dataset. 12.4.3 Healthy prostate tissue coverage Aligning the Visium spots to the tissue using the fiducials that border the capture area enables the identification of spots containing expression data from the tissue. These spots can be visualized using the spatPlot2D function by setting the cell_color parameter to “in_tissue”. spatPlot2D(gobject = N_pros, cell_color = "in_tissue", show_image = TRUE, point_size = 2.5, cell_color_code = c("black", "red"), point_alpha = 0.5, save_param = list(save_name = "03_ses1_normal_prostate_tissue")) Figure 12.3: Tissue coverage for the normal prostate sample. 12.4.4 Adenocarcinoma prostate tissue coverage spatPlot2D(gobject = C_pros, cell_color = "in_tissue", show_image = TRUE, point_size = 2.5, cell_color_code = c("black", "red"), point_alpha = 0.5, save_param = list(save_name = "03_ses1_adeno_prostate_tissue")) Figure 12.4: Tissue coverage for the adenocarcinoma prostate sample. 12.4.5 Showing the data structure for the individual objects # Printing the file structure for the individual datasets print(head(pDataDT(N_pros))) print(N_pros) 12.5 Join Giotto Objects To join objects together we can use the joinGiottoObjects() function. For this we need to supply a list of objects as well as the names for each of these objects. 
We can also specify the x and y padding to separate the objects in space or the Z position for 3D datasets. If the x_shift is set to NULL then the total shift will be guessed from the Giotto image. combined_pros <- joinGiottoObjects(gobject_list = list(N_pros, C_pros), gobject_names = c("NP", "CP"), join_method = "shift", x_padding = 1000) # Printing the file structure for the joined dataset print(head(pDataDT(combined_pros))) print(combined_pros) From the joined data we can see the same information that was present in the single dataset objects as well as the addition of another image. The images are renamed from “image” to include the object name in the image name e.g. “NP-image”. We can also see in the cell metadata that there is a new column “list_ID” that contains the original object names. The cell_ID column also has the original object name prepended to each cell ID e.g. “NP-AAACAACGAATAGTTC-1”. Figure 12.5: Structure of Giotto object containing two datasets (left) and cell metadata (right). Note the addition of multiple images and the addition of the list_ID column to define the dataset. 12.6 Visualizing combined datasets The combined dataset can either be visualized in the same space or in two separate plots through the group_by variable. To show images, both the show_image variable and the image_name variable containing both image names need to be used. 12.6.1 Visualizing in the same plot Due to the x_padding provided when joining the objects, each of the datasets can be visualized in the same plotting area. We can see below the normal prostate sample on the left and the adenocarcinoma prostate sample on the right. By setting the show_image parameter and supplying both of the image names (“NP-image”, “CP-image”), we can also include the relevant images within the same plot. 
spatPlot2D(gobject = combined_pros, cell_color = "in_tissue", cell_color_code = c("black", "red"), show_image = TRUE, image_name = c("NP-image", "CP-image"), point_size = 1, point_alpha = 0.5, save_param = list(save_name = "03_ses1_combined_tissue")) Figure 12.6: Visualizing the Visium spots that overlap tissue in normal prostate (left) and adenocarcinoma samples (right) within the same plot. 12.6.2 Visualizing on separate plots If we want to visualize the datasets in separate plots we can supply the “group_by” variable. Below we group the data by “list_ID”, which corresponds to each dataset. We can specify the number of columns through the “cow_n_col” variable. spatPlot2D(gobject = combined_pros, cell_color = "in_tissue", cell_color_code = c("black", "pink"), show_image = TRUE, image_name = c("NP-image", "CP-image"), group_by = "list_ID", point_alpha = 0.5, point_size = 0.5, cow_n_col = 1, save_param = list(save_name = "03_ses1_combined_tissue_group")) Figure 12.7: Visualizing the Visium spots that overlap tissue in normal prostate (left) and adenocarcinoma samples (right) in separate plots. 12.7 Splitting combined dataset If needed, it’s possible to split the joined object back into individual objects by subsetting on the cell metadata as shown below. # Getting the cell information combined_cells <- pDataDT(combined_pros) np_cells <- combined_cells[list_ID == "NP"] np_split <- subsetGiotto(combined_pros, cell_ids = np_cells$cell_ID, poly_info = np_cells$cell_ID, spat_unit = ":all:") spatPlot2D(gobject = np_split, cell_color = "in_tissue", cell_color_code = c("black", "red"), show_image = TRUE, point_alpha = 0.5, point_size = 0.5, save_param = list(save_name = "03_ses1_split_object")) Figure 12.8: Structure of Giotto object containing two datasets (left) and cell metadata on the left. Note the addition of multiple images and the addition of the list_ID column to define the dataset. 
12.8 Analyzing joined objects 12.8.1 Normalization and adding statistics Now that the objects have been joined we can analyze the object as if it were a single object. This means all of the analyses will be performed in parallel. Therefore, all of the filtering and normalization will be identical between datasets, retaining the ability for direct comparisons between datasets. # subset on in-tissue spots metadata <- pDataDT(combined_pros) in_tissue_barcodes <- metadata[in_tissue == 1]$cell_ID combined_pros <- subsetGiotto(combined_pros, cell_ids = in_tissue_barcodes) ## filter combined_pros <- filterGiotto(gobject = combined_pros, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 500, expression_values = "raw", verbose = TRUE) ## normalize combined_pros <- normalizeGiotto(gobject = combined_pros, scalefactor = 6000) ## add gene & cell statistics combined_pros <- addStatistics(gobject = combined_pros, expression_values = "raw") ## visualize spatPlot2D(gobject = combined_pros, cell_color = "nr_feats", color_as_factor = FALSE, point_size = 1, show_image = TRUE, image_name = c("NP-image", "CP-image"), save_param = list(save_name = "ses3_1_feat_expression")) After running the addStatistics() function on the joined object we can see the relative expression for each spot in both samples. Figure 12.9: Unique feature expression for Visium spots for both prostate samples. 12.8.2 Clustering the datasets Since we shifted the objects within space, the spatial networks for each dataset will remain separate, provided that the distance limit for neighbors is smaller than the gap between the datasets. 
However, the individual spot clustering will be performed on all spots from both datasets as if they were a single object, meaning that the same cell types between objects should be clustered together. ## PCA ## combined_pros <- calculateHVF(gobject = combined_pros) combined_pros <- runPCA(gobject = combined_pros, center = TRUE, scale_unit = TRUE) ## cluster and run UMAP ## # sNN network (default) combined_pros <- createNearestNetwork(gobject = combined_pros, dim_reduction_to_use = "pca", dim_reduction_name = "pca", dimensions_to_use = 1:10, k = 15) # Leiden clustering combined_pros <- doLeidenCluster(gobject = combined_pros, resolution = 0.2, n_iterations = 200) # UMAP combined_pros <- runUMAP(combined_pros) 12.8.3 Visualizing spatial location of clusters We can visualize the clusters determined through Leiden clustering on both of the datasets within the same plot. spatDimPlot2D(gobject = combined_pros, cell_color = "leiden_clus", show_image = TRUE, image_name = c("NP-image", "CP-image"), save_param = list(save_name = "ses3_1_leiden_clus")) Figure 12.10: UMAP (top) for both samples colored by Leiden clusters visualized in a spatial plot (bottom) for the normal prostate (left) and the adenocarcinoma prostate sample (right). 12.8.4 Visualizing tissue contribution to clusters We can also color the UMAP to visualize the contribution from each tissue in the UMAP. To do this we color the UMAP by “list_ID” rather than “leiden_clus”. If the cell types shared by both samples cluster together, then we would expect each cluster to contain the cell color of both samples. However, we can see that the samples are clustered distinctly within the UMAP. This indicates that the cell types shared between both samples are found within different clusters, suggesting that more complex integration techniques might be required for these samples. 
spatDimPlot2D(gobject = combined_pros, cell_color = "list_ID", show_image = TRUE, image_name = c("NP-image", "CP-image"), save_param = list(save_name = "ses3_1_tissue_contribution")) Figure 12.11: Tissue contribution for Leiden clustering for the normal prostate (left) and the adenocarcinoma prostate sample (right). 12.9 Perform Harmony and default workflows Figure 12.12: Overview of how Harmony aligns multiple datasets. First cluster cells, then get the centroids and apply a dataset correction factor, then move cells based on the soft cluster membership. (Korsunsky et al. 2019) We can use Harmony to integrate multiple datasets, grouping equivalent cell types between samples. Harmony is an algorithm that iteratively adjusts cell coordinates in a reduced-dimensional space to correct for dataset-specific effects. It uses fuzzy clustering to assign cells to multiple clusters, calculates dataset-specific correction factors, and applies these corrections to each cell, repeating the process until the influence of the dataset diminishes. Performing Harmony only affects the PCA space and does not alter gene expression. Before running Harmony we need to run the PCA function or set “do_pca” to TRUE. We ran this above so do not need to perform this step. Harmony will default to attempting 10 rounds of integration. Not all samples will need the full 10 and will finish accordingly. The following dataset should converge after 5 iterations. Harmony variables: theta: A parameter that controls the diversity within clusters, with higher values leading to more diverse clusters and a value of zero not encouraging any diversity. sigma: Determines the width of soft k-means clusters, with larger values allowing cells to belong to more clusters and smaller values making the clustering approach more rigid. lambda: A penalty parameter for ridge regression that helps prevent overcorrection, where larger values offer more protection, and it can be automatically estimated if set to NULL. 
nclust: Specifies the number of clusters in the model. library(harmony) ## run harmony integration combined_pros <- runGiottoHarmony(combined_pros, vars_use = "list_ID", do_pca = FALSE, sigma = 0.1, theta = 2, lambda = 1, nclust = NULL) After running the Harmony function successfully we can see that the outputted gobject has a new dim reduction named “harmony”. We can use this for all subsequent spatial steps. Figure 12.13: Data structure of the gobject after running Harmony integration. 12.9.1 Clustering harmonized object We can now perform the same clustering steps as before, but using the “harmony” dim reduction rather than PCA. We will also create new UMAP and nearest network data for the gobject, named differently from before to preserve the original analyses. Reusing the same names would overwrite the original analysis. ## sNN network (default) combined_pros <- createNearestNetwork(gobject = combined_pros, dim_reduction_to_use = "harmony", dim_reduction_name = "harmony", name = "NN.harmony", dimensions_to_use = 1:10, k = 15) ## Leiden clustering combined_pros <- doLeidenCluster(gobject = combined_pros, network_name = "NN.harmony", resolution = 0.2, n_iterations = 1000, name = "leiden_harmony") # UMAP dimension reduction combined_pros <- runUMAP(combined_pros, dim_reduction_name = "harmony", dim_reduction_to_use = "harmony", name = "umap_harmony") spatDimPlot2D(gobject = combined_pros, dim_reduction_to_use = "umap", dim_reduction_name = "umap_harmony", cell_color = "leiden_harmony", show_image = TRUE, image_name = c("NP-image", "CP-image"), spat_point_size = 1, save_param = list(save_name = "leiden_clustering_harmony")) We can see a different UMAP and clustering from those seen in the original steps above. We can again map these onto the tissue spots and see where the clusters are spatially. Figure 12.14: Leiden clustering after Harmony was performed for the normal prostate (left) and the adenocarcinoma prostate sample (right). 
12.9.2 Visualizing the tissue contribution We can see that after performing Harmony the spots from the two tissue samples are now clustered together. There is still a cluster that is unique to the adenocarcinoma sample; however, this is expected, as it represents the Visium spots that cover the tumor regions of the tissue, which are not found in the normal tissue. spatDimPlot2D(gobject = combined_pros, dim_reduction_to_use = "umap", dim_reduction_name = "umap_harmony", cell_color = "list_ID", save_plot = TRUE, save_param = list(save_name = "leiden_clustering_harmony_contribution")) Figure 12.15: Tissue contribution for Leiden clustering after Harmony for the normal prostate (left) and the adenocarcinoma prostate sample (right). "],["spatial-multi-modal-analysis.html", "13 Spatial multi-modal analysis 13.1 Overview 13.2 Spatial manipulation 13.3 Examples of the simple transforms with a giottoPolygon 13.4 Affine transforms 13.5 Image transforms 13.6 The practical usage of multi-modality co-registration", " 13 Spatial multi-modal analysis George Chen Junxiang Xu August 7th 2024 13.1 Overview Spatial multimodal datasets are created when there is more than one modality available for a single piece of tissue. One way that these datasets can be assembled is by performing multiple spatial assays on closely adjacent tissue sections or ideally the same section. However, for these datasets, in addition to the usual expression space integration, we must also first spatially align them. 13.2 Spatial manipulation Performing spatial analyses across any two sections of tissue from the same block requires the data to be spatially aligned into a common coordinate space. Minute differences during the sectioning process, from the cutting motion to how long an FFPE section was floated, can result in even neighboring sections being distorted when compared side-by-side. 
These differences make it difficult to assemble multislice and/or cross platform multimodal datasets into a cohesive 3D volume. The solution for this is to perform registration across either the dataset images or expression information. Based on the registration results, both the raster images and vector feature and polygon information can be aligned into a continuous whole. Ideally this registration will be a free deformation based on sets of control points or a deformation matrix, however affine transforms already provide a good approximation. In either case, the transform or deformation applied must work in the same way across both raster and vector information. Giotto provides spatial classes and methods for easy manipulation of data with 2D affine transformations. These functionalities are all available from GiottoClass. 13.2.1 Spatial transforms: We support simple transformations and more complex affine transformations which can be used to combine and encode more than one simple transform. spatShift() - translations spin() - rotations (degrees) rescale() - scaling flip() - flip vertical or horizontal across arbitrary lines t() - transpose shear() - shear transform affine() - affine matrix transform 13.2.2 Spatial utilities: Helpful functions for use alongside these spatial transforms are ext() for finding the spatial bounding box of where your data object is, crop() for cutting out a spatial region of the data, and plot() for terra/base plots of the data. ext() - spatial extent or bounding box crop() - cut out a spatial region of the data plot() - plot a spatial object 13.2.3 Spatial classes: Giotto’s spatial subobjects respond to the above functions. The Giotto object itself can also be affine transformed. 
spatLocsObj - xy centroids spatialNetworkObj - spatial networks between centroids giottoPoints - xy feature point detections giottoPolygon - spatial polygons giottoImage (mostly deprecated) - magick-based images giottoLargeImage/giottoAffineImage - terra-based images affine2d - affine matrix container giotto - giotto analysis object # load in data library(Giotto) g <- GiottoData::loadGiottoMini("vizgen") activeSpatUnit(g) <- "aggregate" gpoly <- getPolygonInfo(g, return_giottoPolygon = TRUE) gimg <- getGiottoImage(g) 13.3 Examples of the simple transforms with a giottoPolygon rain <- rainbow(nrow(gpoly)) line_width <- 0.3 # par to setup the grid plotting layout p <- par(no.readonly = TRUE) par(mfrow=c(3,3)) gpoly |> plot(main = "no transform", col = rain, lwd = line_width) gpoly |> spatShift(dx = 1000) |> plot(main = "spatShift(dx = 1000)", col = rain, lwd = line_width) gpoly |> spin(45) |> plot(main = "spin(45)", col = rain, lwd = line_width) gpoly |> rescale(fx = 10, fy = 6) |> plot(main = "rescale(fx = 10, fy = 6)", col = rain, lwd = line_width) gpoly |> flip(direction = "vertical") |> plot(main = "flip()", col = rain, lwd = line_width) gpoly |> t() |> plot(main = "t()", col = rain, lwd = line_width) gpoly |> shear(fx = 0.5) |> plot(main = "shear(fx = 0.5)", col = rain, lwd = line_width) par(p) 13.4 Affine transforms The above transforms are all simple to understand in how they work, but you can imagine that performing them in sequence on your dataset can be computationally expensive. Luckily, the above operations are all affine transformations, and they can be condensed into a single step. Affine transforms are those where the x and y values undergo a linear transform, optionally followed by a translation. In 2D, these transforms can all be represented as a 2x2 matrix, or 2x3 if the xy translation values are included. To perform the linear transform, the xy coordinates just need to be matrix multiplied by the 2x2 affine matrix. The resulting values should then be added to the translate values. 
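As a minimal base R sketch of the matrix form described here: a 2x2 matrix is applied to xy coordinates and the translation values are added afterwards. The toy points, the 30 degree angle, and the names `pts`, `rot`, `scl`, `shift`, and `apply_affine` are illustrative assumptions and not part of Giotto. The sketch uses the column-vector convention, which may differ from how Giotto orders the product internally. The last lines check that multiplying two matrices first gives the same result as applying them one after the other.

```r
# Toy xy coordinates, one point per row (illustrative values only)
pts <- matrix(c(0, 0,
                1, 0,
                1, 1),
              ncol = 2, byrow = TRUE)

theta <- 30 * pi / 180                       # 30 degree rotation, in radians
rot <- matrix(c(cos(theta), -sin(theta),
                sin(theta),  cos(theta)),
              nrow = 2, byrow = TRUE)        # 2x2 rotation matrix
scl <- diag(c(2, 3))                         # 2x2 scaling matrix (x by 2, y by 3)
shift <- c(10, -5)                           # translation in x and y

# Hypothetical helper: multiply each coordinate by the 2x2 matrix,
# then add the translation values
apply_affine <- function(xy, lin, shift = c(0, 0)) {
  out <- t(lin %*% t(xy))
  sweep(out, 2, shift, "+")
}

# Rotate, then scale, in two separate steps
two_steps <- apply_affine(apply_affine(pts, rot), scl)

# Same result in one step using the matrix product (later transform on the left)
one_step <- apply_affine(pts, scl %*% rot)

all.equal(two_steps, one_step)

# Rotation followed by a translation
rotated_shifted <- apply_affine(pts, rot, shift)
```

Note that the order of the product matters: `scl %*% rot` rotates first and scales second, while `rot %*% scl` would generally give a different result.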
Due to the nature of matrix multiplication, you can simply multiply the affine matrices with each other and when the xy coordinates are multiplied by the resulting matrix, it performs both linear transforms in the same step. Giotto provides a utility affine2d S4 class that can be created from any affine matrix and responds to the affine transform functions to simplify this accumulation of simple transforms. Once done, the affine2d can be applied to spatial objects in a single step using affine() in the same way that you would use a matrix. # create affine2d aff <- affine() # when called without params, this is the same as affine(diag(c(1, 1))) The affine2d object also has an anchor spatial extent, which is used in calculations of the translation values. An affine2d is generated with a default extent, but a specific one matching that of the object you are manipulating (such as that of the giottoPolygon) should be set. aff@anchor <- ext(gpoly) aff <- initialize(aff) # append several simple transforms aff <- aff |> spatShift(dx = 1000) |> spin(45, x0 = 0, y0 = 0) |> # without the x0, y0 params, the extent center is used rescale(10, x0 = 0, y0 = 0) |> # without the x0, y0 params, the extent center is used flip(direction = "vertical") |> t() |> shear(fx = 0.5) force(aff) <affine2d> anchor : 6399.24384990901, 6903.24298517207, -5152.38959073896, -4694.86823300896 (xmin, xmax, ymin, ymax) rotate : -0.785398163397448 (rad) shear : 0.5, 0 (x, y) scale : 10, 10 (x, y) translate : 963.028150700062, 7071.06781186548 (x, y) The show() function displays some information about the stored affine transform, including a set of decomposed simple transformations. You can then plot the affine object and see a projection of the spatial transform where blue is the starting position and red is the end. plot(aff) We can then apply the affine transforms to the giottoPolygon to see that it is indeed in the location and orientation that the projection suggests. 
gpoly |> affine(aff) |> plot(main = "affine()", col = rain, lwd = line_width) 13.5 Image transforms Giotto uses giottoLargeImage as the core image class, which is based on terra SpatRaster. Images are not loaded into memory when the object is generated; instead, an amount of regular sampling appropriate to the requested zoom level is performed at the time of plotting. spatShift() and rescale() operations are supported by terra SpatRaster, and we inherit those functionalities. spin(), flip(), t(), shear(), affine() operations will coerce giottoLargeImage to giottoAffineImage, which is much the same, except it contains an affine2d object that tracks spatial manipulations performed, so that they can be applied through magick::image_distort() processing after sampled values are pulled into memory. giottoAffineImage also has alternative ext() and crop() methods so that those operations respect both the expected post-affine space and un-transformed source image. # affine transform of image info matches with polygon info gimg |> affine(aff) |> plot() gpoly |> affine(aff) |> plot(add = TRUE, border = "cyan", lwd = 0.3) # affine of the giotto object g |> affine(aff) |> spatInSituPlotPoints( show_image = TRUE, feats = list(rna = c("Adgrl1", "Gfap", "Ntrk3", "Slc17a7")), feats_color_code = rainbow(4), polygon_color = "cyan", polygon_line_size = 0.1, point_size = 0.1, use_overlap = FALSE ) Currently giotto image objects are not fully compatible with .ome.tif files. terra, which relies on GDAL drivers for image loading, can open some .ome.tif images with the GTiff driver, but fails when certain compressions (notably JPEG 2000, as used by 10x for their single-channel stains) are used. 13.6 The practical usage of multi-modality co-registration 13.6.1 Example dataset: Xenium Breast Cancer pre-release pack 10X Genomics released a comprehensive dataset in 2022. 
To capture spatial structure by combining complementary spatial resolutions and modalities across different assays, they provided a dataset with Xenium in situ transcriptomics data, together with Visium on closely adjacent sections. Additional IF staining was also performed on the Xenium slides. For more information, please refer to the pre-release dataset page as well as the publication. Visium: H&E histology; 55um spot level expression with transcriptome coverage. Xenium: H&E histology; IF image staining DAPI, HER2 and CD20; in situ transcripts; corresponding centroid locations. The goal of creating this multi-modal dataset is to register all the modalities listed above to the same coordinate system as the Xenium in situ transcripts, because its coordinates correspond to micron distances. library(Giotto) instrs <- createGiottoInstructions(save_dir = file.path(getwd(),'/img/03_session2/'), save_plot = TRUE, show_plot = TRUE) options(timeout = 999999) download_dir <-file.path(getwd(),'/data/03_session2/') destfile <- file.path(download_dir,'Multimodal_registration.zip') if (!dir.exists(download_dir)) { dir.create(download_dir, recursive = TRUE) } download.file('https://zenodo.org/records/13208139/files/Multimodal_registration.zip?download=1', destfile = destfile) unzip(paste0(download_dir,'/Multimodal_registration.zip'), exdir = download_dir) Xenium_dir <- paste0(download_dir,'/Xenium/') Visium_dir <- paste0(download_dir,'/Visium/') 13.6.2 Target Coordinate system Xenium transcripts, polygon information and corresponding centroids are output from the Xenium instrument and share the same coordinate system in the raw output. We can start by checking the centroid information as a representation of the target coordinate system. 
xen_cell_df <- read.csv(paste0(Xenium_dir,"/cells.csv.gz")) xen_cell_pl <- ggplot2::ggplot() + ggplot2::geom_point(data = xen_cell_df, ggplot2::aes(x = x_centroid , y = y_centroid),size = 1e-150,color = 'orange') + ggplot2::theme_classic() xen_cell_pl 13.6.3 Visium to register Load the Visium directory using the Giotto convenience function. Note that here we are using “tissue_hires_image.png” as the image to plot. With the convenience function, the hires image and scale factor stored in the spaceranger output are used for automatic alignment during creation of the Giotto Visium object. spatPlot2D() randomly samples pixels from the image you provide, so supplying the microscopy image as the image input for createGiottoVisiumObject() will improve the visual quality for downstream registration. G_visium <- createGiottoVisiumObject(visium_dir = Visium_dir, gene_column_index = 2, png_name = 'tissue_hires_image.png', instructions = NULL) # In the meantime, calculate statistics for easier plot showing G_visium <- normalizeGiotto(G_visium) G_visium <- addStatistics(G_visium) V_origin <- spatPlot2D(G_visium,show_image = T,point_size = 0,return_plot = T) V_origin The Visium object needs to be transformed to the same orientation as the target coordinate system, so we perform the first transform. # create affine2d aff <- affine(diag(c(1,1))) aff <- aff |> spin(90) |> flip(direction = "horizontal") force(aff) # Apply the transform V_tansformed <- affine(G_visium,aff) spatplot_to_register <- spatPlot2D(V_tansformed,show_image = T,point_size = 0,return_plot = T) spatplot_to_register Landmarks are a set of points that mark the same locations in two different sources. They are very helpful as anchors for creating an affine transformation: after the affine transformation, the source landmarks should be as close to the target landmarks as possible. 
Since images from different modalities can share similar morphology, the easiest way is to pin landmarks at the morphological features shared between images. Giotto provides an interactive landmark selection tool to pin landmarks. The two input plots can each be generated from a ggplot object, a GiottoLargeImage object, or a path to an image you want to register. Note that if you directly provide an image path, you will need to create a separate GiottoLargeImage to perform the transformation, and make sure the GiottoLargeImage has the same coordinate system as shown in the shiny app. landmarks <- interactiveLandmarkSelection(spatplot_to_register, xen_cell_pl) Now, use the landmarks to estimate the transformation matrix needed, and to register the Giotto Visium object to the target coordinate system. For reproducibility, the landmarks used in the chunk below are loaded from a saved result. landmarks<- readRDS(paste0(Xenium_dir,'/Visium_to_Xen_Landmarks.rds')) affine_mtx <- calculateAffineMatrixFromLandmarks(landmarks[[1]],landmarks[[2]]) V_final <- affine(G_visium,affine_mtx %*% aff@affine) spatplot_final <- spatPlot2D(V_final,show_image = T,point_size = 0,show_plot = F) spatplot_final + ggplot2::geom_point(data = xen_cell_df, ggplot2::aes(x = x_centroid , y = y_centroid),size = 1e-150,color = 'orange') + ggplot2::theme_classic() 13.6.3.1 Create Pseudo Visium dataset for comparison Giotto provides a way to create different shapes at given locations, and we can use that to create pseudo-Visium polygons to aggregate transcripts or image intensities. To do that, we will need the centroid locations, which can be obtained using getSpatialLocations(), and also the radius information to create circles. We know that the Visium center to center distance is 100um and the spot diameter is 55um, thus we can estimate the radius from the center to center distance. We can use a spatial network created by nearest neighbor = 2 to capture the distance. 
V_final <- createSpatialNetwork(V_final, k = 1) spat_network <- getSpatialNetwork(V_final,output = 'networkDT') spatPlot2D(V_final, show_network = T, network_color = 'blue', point_size = 1) center_to_center <- min(spat_network$distance) radius <- center_to_center*55/200 Now we get the pseudo Visium polygons Visium_centroid <- getSpatialLocations(V_final,output = 'data.table') stamp_dt <- circleVertices(radius = radius, npoints = 100) pseudo_visium_dt <- polyStamp(stamp_dt, Visium_centroid) pseudo_visium_poly <- createGiottoPolygonsFromDfr(pseudo_visium_dt,calc_centroids = T) plot(pseudo_visium_poly) Create the Xenium object with the pseudo Visium polygons. To save run time, the example shown here uses only the MS4A1 and ERBB2 genes to create Giotto points xen_transcripts <- data.table::fread(paste0(Xenium_dir,'/Xen_2_genes.csv.gz')) gpoints <- createGiottoPoints(xen_transcripts) Xen_obj <-createGiottoObjectSubcellular(gpoints = list('rna' = gpoints), gpolygons = list('visium' = pseudo_visium_poly)) Get gene expression information by overlapping polygons with points Xen_obj <- calculateOverlap(Xen_obj, feat_info = 'rna', spatial_info = 'visium') Xen_obj <- overlapToMatrix(x = Xen_obj, type = "point", poly_info = "visium", feat_info = "rna", aggr_function = "sum") Manipulate the expression for plotting Xen_obj <- filterGiotto(Xen_obj, feat_type = 'rna', spat_unit = 'visium', expression_threshold = 1, feat_det_in_min_cells = 0, min_det_feats_per_cell = 1) tmp_exprs <- getExpression(Xen_obj, feat_type = 'rna', spat_unit = 'visium', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'rna', spat_unit = 'visium', name = 'plot') spatFeatPlot2D(Xen_obj, point_size = 3.5, expression_values = 'plot', show_image = F, feats = 'ERBB2') Subset the registered Visium and plot the same gene #get the extent of giotto points, xmin, xmax, ymin, ymax subset_extent <- ext(gpoints@spatVector) sub_visium <- subsetGiottoLocs(V_final, x_min = subset_extent[1], 
x_max = subset_extent[2], y_min = subset_extent[3], y_max = subset_extent[4]) spatFeatPlot2D(sub_visium, point_size = 2, expression_values = 'scaled', show_image = F, feats = 'ERBB2') 13.6.4 Register post-Xenium H&E and IF image For Xenium instrument output, Giotto provides a convenience function to load the output from Xenium ranger. Note that 10X created the affine image alignment file by applying rotation and scale at (0,0) of the top left corner, with translation last. Thus, it will look different from the affine matrix created from landmarks above. In this example, we used a 0.05X compressed ometiff, and the alignment file was likewise created by first rescaling by 20X and then applying the affine matrix provided by 10X Genomics. HE_xen <- read10xAffineImage(file = paste0(Xenium_dir, "/HE_ome_compressed.tiff"), imagealignment_path = paste0(Xenium_dir,"/Xenium_he_imagealignment.csv"), micron = 0.2125) plot(HE_xen) The image is still anchored at the top left corner, so we flip the image to align it with the target coordinate system. We can also save the transformed image raster by resampling all pixels from the original image and writing the result to a file on disk for future use. HE_xen <- HE_xen |> flip(direction = "vertical") gimg_rast <- HE_xen@funs$realize_magick(size = prod(dim(HE_xen))) plot(gimg_rast) #terra::writeRaster(gimg_rast@raster_object, filename = output, gdal = "COG" # save as GeoTIFF with extent info) Now we can check the registration results. 
GiottoVisuals provides a function to add a giottoLargeImage to a ggplot object so that additional ggplot layers can be plotted on top gg <- ggplot2::ggplot() pl <- GiottoVisuals::gg_annotation_raster(gg,gimg_rast) pl + ggplot2::geom_smooth() + ggplot2::geom_point(data = xen_cell_df, ggplot2::aes(x = x_centroid , y = y_centroid),size = 1e-150,color = 'orange') + ggplot2::theme_classic() 13.6.4.1 Add registered image information and compare RNA vs protein expression With the strategy described above, affine transformed images can be saved and used for quantitative analysis. Here, we can treat the IF images with the same strategy used for spatial proteomics data CD20_gimg <- createGiottoLargeImage(paste0(Xenium_dir,'/CD20_registered.tiff'), use_rast_ext = T,name = 'CD20') HER2_gimg <- createGiottoLargeImage(paste0(Xenium_dir,'/HER2_registered.tiff'), use_rast_ext = T,name = 'HER2') Xen_obj <- addGiottoLargeImage(gobject = Xen_obj, largeImages = list('CD20' = CD20_gimg,'HER2' = HER2_gimg)) Get the cell polygons, as Xenium and IF are both at subcellular resolution cellpoly_dt <- data.table::fread(paste0(Xenium_dir,'/cell_boundaries.csv.gz')) colnames(cellpoly_dt) <- c('poly_ID','x','y') cellpoly <- createGiottoPolygonsFromDfr(cellpoly_dt) Xen_obj <- addGiottoPolygons(Xen_obj,gpolygons = list('cell' = cellpoly)) Compute the gene expression matrix by overlaying the cell polygons and giotto points. 
Xen_obj <- calculateOverlap(Xen_obj, feat_info = 'rna', spatial_info = 'cell') Xen_obj <- overlapToMatrix(x = Xen_obj, type = "point", poly_info = "cell", feat_info = "rna", aggr_function = "sum") tmp_exprs <- getExpression(Xen_obj, feat_type = 'rna', spat_unit = 'cell', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'rna', spat_unit = 'cell', name = 'plot') spatFeatPlot2D(Xen_obj, feat_type = 'rna', expression_values = 'plot', spat_unit = 'cell', feats = 'ERBB2', point_size = 0.05) Now we overlay the HER2 expression from the raster image with the cell polygons. Xen_obj <- calculateOverlap(Xen_obj, spatial_info = 'cell', image_names = c('HER2','CD20')) Xen_obj <- overlapToMatrix(x = Xen_obj, type = "intensity", poly_info = "cell", feat_info = "protein", aggr_function = "sum") tmp_exprs <- getExpression(Xen_obj, feat_type = 'protein', spat_unit = 'cell', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'protein', spat_unit = 'cell', name = 'plot') spatFeatPlot2D(Xen_obj, feat_type = 'protein', expression_values = 'plot', spat_unit = 'cell', feats = 'HER2', point_size = 0.05) We can also overlay the protein expression to Visium spots Xen_obj <- calculateOverlap(Xen_obj, spatial_info = 'visium', image_names = c('HER2','CD20')) Xen_obj <- overlapToMatrix(x = Xen_obj, type = "intensity", poly_info = "visium", feat_info = "protein", aggr_function = "sum") Xen_obj <- filterGiotto(Xen_obj, feat_type = 'protein', spat_unit = 'visium', expression_threshold = 1, feat_det_in_min_cells = 0, min_det_feats_per_cell = 1) tmp_exprs <- getExpression(Xen_obj, feat_type = 'protein', spat_unit = 'visium', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'protein', spat_unit = 'visium', name = 'plot') spatFeatPlot2D(Xen_obj, feat_type = 'protein', expression_values = 'plot', spat_unit = 'visium', feats = 'HER2', point_size = 2) 
13.6.5 Automatic alignment via SIFT feature descriptor matching and affine transformation Pinning landmarks or using compounded affine transforms to register images usually provides good initial registration results. However, recording landmarks or manually combining transformations requires a lot of manual effort, which becomes impractical when there is a large number of images to register. As long as accurate landmarks are available, registration is easy to perform automatically. Here we provide a wrapper function for the scale-invariant feature transform (SIFT). SIFT will first identify the extreme points in different scale spaces from the paired images, then use a brute-force approach to match the points. The matched points can then be used to estimate the transform and warp the image. The major drawback is that as the image dimensions grow, the computing time increases sharply. Here, we provide an example with two compressed images to show the automatic alignment pipeline. HE <- createGiottoLargeImage(paste0(Xenium_dir,'/mini_HE.png'),negative_y = F) plot(HE) IF <- createGiottoLargeImage(paste0(Xenium_dir,'/mini_IF.tif'),negative_y = F,flip_horizontal = T) terra::plotRGB(IF@raster_object,r=1, g=2, b=3, stretch="lin") Now, we can use the automated transformation pipeline. Note that we start with paths to the images and run preprocessImageToMatrix() first to meet the input requirements of the estimateAutomatedImageRegistrationWithSIFT() function. The images will be preprocessed to grayscale. For that purpose, we use the DAPI channel from the mini IF image, and set invert = T for the mini HE image so that the grayscale HE will have higher values for high intensity pixels. The function will output an estimation of the transform. 
estimation <- estimateAutomatedImageRegistrationWithSIFT(x = preprocessImageToMatrix(paste0(Xenium_dir,'/mini_IF.tif'), flip_horizontal = TRUE, use_single_channel = TRUE, single_channel_number = 3), y = preprocessImageToMatrix(paste0(Xenium_dir,'/mini_HE.png'), invert = TRUE), plot_match = TRUE, max_ratio = 0.5,estimate_fun = 'Projective') Using the estimation, we can quickly visualize the transformation. mtx <- as.matrix(estimation$params) transformed <- affine(IF, mtx) To_see_overlay <- transformed@funs$realize_magick(size = 2e6) plot(HE) plot(To_see_overlay@raster_object[[2]], add=TRUE, alpha=0.5) 13.6.6 Final Notes Image registration is becoming crucial for spatial multimodal analysis. The methods included here are not the only ways to register images, and each of them may have drawbacks that prevent a good alignment. Multiple tools with different strategies are emerging in the field, including easier landmark detection, deformable transformation, and spatial pattern matching. Some of them provide transformed images or coordinates that can be loaded directly into Giotto as a multimodal object using a standard pipeline. "],["multi-omics-integration.html", "14 Multi-omics integration 14.1 The CytAssist technology 14.2 Introduction to the spatial dataset 14.3 Download dataset 14.4 Create the Giotto object 14.5 Subset on spots that were covered by tissue 14.6 RNA processing 14.7 Protein processing 14.8 Multi-omics integration 14.9 Session info", " 14 Multi-omics integration Joselyn Cristina Chávez Fuentes August 7th 2024 14.1 The CytAssist technology The Visium CytAssist Spatial Gene and Protein Expression assay is designed to introduce simultaneous Gene Expression and Protein Expression analysis to FFPE samples processed with Visium CytAssist. The assay uses NGS to measure the abundance of oligo-tagged antibodies with spatial resolution, in addition to the whole transcriptome and a morphological image. Figure 14.1: CytAssist multi-omics diagram. 
Source: 10X genomics. The 10X human immune cell profiling panel features 35 antibodies from Abcam and Biolegend, and includes cell surface and intracellular targets. The RNA probes hybridize to ~18,000 genes, or RNA targets, within the tissue section to achieve whole transcriptome gene expression profiling. The remaining steps, starting with probe extension, follow the standard Visium workflow outside of the instrument. Figure 14.2: CytAssist workflow. Source: 10X genomics. 14.2 Introduction to the spatial dataset The Human Tonsil (FFPE) dataset was obtained from 10X Genomics. The tissue was sectioned as described in Visium CytAssist Spatial Gene and Protein Expression for FFPE – Tissue Preparation Guide (CG000660). 5 µm tissue sections were placed on Superfrost glass slides, deparaffinized, H&E stained (CG000658) and coverslipped. Sections were imaged, decoverslipped, followed by decrosslinking per the Staining Demonstrated Protocol (CG000658). More information about this dataset can be found here. 14.3 Download dataset You need to download the expression matrix and spatial information by running these commands: dir.create("data/03_session3", recursive = TRUE) download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FFPE_Protein_Expression_Human_Tonsil/CytAssist_FFPE_Protein_Expression_Human_Tonsil_raw_feature_bc_matrix.tar.gz", destfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_raw_feature_bc_matrix.tar.gz") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FFPE_Protein_Expression_Human_Tonsil/CytAssist_FFPE_Protein_Expression_Human_Tonsil_spatial.tar.gz", destfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_spatial.tar.gz") After downloading, extract the tar.gz archives. You should get the “raw_feature_bc_matrix” and “spatial” folders inside “data/03_session3/”. 
untar(tarfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_raw_feature_bc_matrix.tar.gz", exdir = "data/03_session3") untar(tarfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_spatial.tar.gz", exdir = "data/03_session3") 14.4 Create the Giotto object The minimum requirements are: a matrix with expression information (or the path to it) and x,y(,z) coordinates for cells or spots (or the path to them). createGiottoVisiumObject() will automatically detect both RNA and Protein modalities in the expression matrix and will create a multi-omics Giotto object. library(Giotto) ## Set instructions results_folder <- "results/03_session3/" python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE, python_path = python_path ) # Provide the path to the data folder data_path <- "data/03_session3/" # Create object directly from the data folder visium_tonsil <- createGiottoVisiumObject( visium_dir = data_path, expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instructions ) Print the information of the object; note that both rna and protein are listed in the expression slot. visium_tonsil 14.5 Subset on spots that were covered by tissue spatPlot2D( gobject = visium_tonsil, cell_color = "in_tissue", point_size = 2, cell_color_code = c("0" = "lightgrey", "1" = "blue"), show_image = TRUE, image_name = "image" ) Figure 14.3: Spatial plot of the CytAssist human tonsil sample, color indicates whether the spot is in tissue (1) or not (0). Use the metadata table to identify the spots corresponding to the tissue area, given by the “in_tissue” column. Then use the spot IDs to subset the Giotto object. 
metadata <- getCellMetadata(gobject = visium_tonsil, output = "data.table") in_tissue_barcodes <- metadata[in_tissue == 1]$cell_ID visium_tonsil <- subsetGiotto(visium_tonsil, cell_ids = in_tissue_barcodes) 14.6 RNA processing Run the filtering, normalization, and statistics steps using only the RNA features. visium_tonsil <- filterGiotto( gobject = visium_tonsil, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1000, expression_values = "raw", verbose = TRUE) visium_tonsil <- normalizeGiotto(gobject = visium_tonsil, scalefactor = 6000, verbose = TRUE) visium_tonsil <- addStatistics(gobject = visium_tonsil) Dimension reduction Identify the highly variable features using the RNA features, then calculate the principal components based on the HVFs. visium_tonsil <- calculateHVF(gobject = visium_tonsil) visium_tonsil <- runPCA(gobject = visium_tonsil) Clustering Calculate the UMAP, tSNE, and shared nearest neighbor network using the first 10 principal components for the RNA modality. visium_tonsil <- runUMAP(visium_tonsil, dimensions_to_use = 1:10) visium_tonsil <- runtSNE(visium_tonsil, dimensions_to_use = 1:10) visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, dimensions_to_use = 1:10, k = 30) Calculate the RNA-based Leiden clusters. visium_tonsil <- doLeidenCluster(gobject = visium_tonsil, resolution = 1, n_iterations = 1000) Visualization Plot the RNA-based UMAP with the corresponding RNA-based Leiden cluster per spot. plotUMAP(gobject = visium_tonsil, cell_color = "leiden_clus", show_NN_network = TRUE, point_size = 2) Figure 14.4: RNA UMAP, color indicates the RNA-based Leiden clusters. Plot the spatial distribution of the RNA-based Leiden cluster per spot. spatPlot2D(gobject = visium_tonsil, show_image = TRUE, cell_color = "leiden_clus", point_size = 3) Figure 14.5: Spatial distribution of RNA-based Leiden clusters. 14.7 Protein processing Run the filtering, normalization, and statistics steps for the protein modality. 
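As a rough illustration of what a library-size normalization with a scale factor does to a count matrix in either modality, here is a minimal base R sketch on toy data. It is not the internal code of normalizeGiotto(): the toy matrix is made up, and the choice of a base-2 logarithm with an offset of 1 is an assumption, so the function documentation should be checked for the actual defaults and any additional scaling steps.

```r
# Toy raw count matrix (made up): features in rows, spots in columns
raw <- matrix(c(10, 0,  5,
                 2, 8,  0,
                 0, 4, 15),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("feat", 1:3), paste0("spot", 1:3)))

scalefactor <- 6000

# 1. Library-size normalization: rescale each spot so its counts sum to scalefactor
lib_size <- colSums(raw)
norm <- sweep(raw, 2, lib_size, "/") * scalefactor

# 2. Log transform with an offset of 1 (base 2 is an assumption in this sketch)
lognorm <- log2(norm + 1)

colSums(norm)  # every spot now has the same total
```

The point of the rescaling is that spots with very different sequencing depth become comparable before the log transform compresses the dynamic range.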
visium_tonsil <- filterGiotto(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1, expression_values = "raw", verbose = TRUE) visium_tonsil <- normalizeGiotto(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", scalefactor = 6000, verbose = TRUE) visium_tonsil <- addStatistics(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein") Dimension reduction Calculate the principal components using all the proteins available in the dataset. visium_tonsil <- runPCA(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein") Clustering Calculate the UMAP, tSNE, and shared nearest neighbors network using the first 10 principal components for the Protein modality. visium_tonsil <- runUMAP(visium_tonsil, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:10) visium_tonsil <- runtSNE(visium_tonsil, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:10) visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:10, k = 30) Calculate the Protein-based Leiden clusters. visium_tonsil <- doLeidenCluster(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", resolution = 1, n_iterations = 1000) Visualization Plot the Protein UMAP and color the spots using the Protein-based Leiden clusters. plotUMAP(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", cell_color = "leiden_clus", show_NN_network = TRUE, point_size = 2) Figure 14.6: Protein UMAP, color indicates the Protein-based Leiden clusters. Plot the spatial distribution of the Protein-based Leiden clusters. spatPlot2D(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", show_image = TRUE, cell_color = "leiden_clus", point_size = 3) Figure 14.7: Spatial distribution of Protein-based Leiden clusters. 
14.8 Multi-omics integration Calculate kNN Calculate the k nearest neighbors network for each modality (RNA and Protein), using the first 10 principal components of each feature type. ## RNA modality visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, type = "kNN", dimensions_to_use = 1:10, k = 20) ## Protein modality visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", type = "kNN", dimensions_to_use = 1:10, k = 20) Run WNN Run the Weighted Nearest Neighbor analysis to weight the contribution of each feature type per spot. The results will be saved in the multiomics slot of the giotto object. visium_tonsil <- runWNN(visium_tonsil, spat_unit = "cell", modality_1 = "rna", modality_2 = "protein", pca_name_modality_1 = "pca", pca_name_modality_2 = "protein.pca", k = 20, integrated_feat_type = NULL, matrix_result_name = NULL, w_name_modality_1 = NULL, w_name_modality_2 = NULL, verbose = TRUE) Run Integrated umap Calculate the UMAP using the weights of each feature per spot. visium_tonsil <- runIntegratedUMAP(visium_tonsil, modality1 = "rna", modality2 = "protein", spread = 7, min_dist = 1, force = FALSE) Calculate integrated clusters Calculate the multiomics-based Leiden clusters using the weights of each feature per spot. visium_tonsil <- doLeidenCluster(gobject = visium_tonsil, spat_unit = "cell", feat_type = "rna", nn_network_to_use = "kNN", network_name = "integrated_kNN", name = "integrated_leiden_clus", resolution = 1) Visualize the integrated UMAP Plot the integrated UMAP and color the spots using the integrated Leiden clusters. plotUMAP(gobject = visium_tonsil, spat_unit = "cell", feat_type = "rna", cell_color = "integrated_leiden_clus", dim_reduction_name = "integrated.umap", point_size = 2, title = "Integrated UMAP using Integrated Leiden clusters") Figure 14.8: Integrated UMAP. Color represents the integrated Leiden clusters. 
Visualize spatial plot with integrated clusters Plot the spatial distribution of the integrated Leiden clusters. spatPlot2D(visium_tonsil, spat_unit = "cell", feat_type = "rna", cell_color = "integrated_leiden_clus", point_size = 3, show_image = TRUE, title = "Integrated Leiden clustering") Figure 14.9: Spatial distribution of the integrated Leiden clusters. 14.9 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Giotto_4.1.0 GiottoClass_0.3.3 loaded via a namespace (and not attached): [1] colorRamp2_0.1.0 deldir_2.0-4 [3] rlang_1.1.4 magrittr_2.0.3 [5] RcppAnnoy_0.0.22 GiottoUtils_0.1.10 [7] matrixStats_1.3.0 compiler_4.4.1 [9] png_0.1-8 systemfonts_1.1.0 [11] vctrs_0.6.5 reshape2_1.4.4 [13] stringr_1.5.1 pkgconfig_2.0.3 [15] SpatialExperiment_1.14.0 crayon_1.5.3 [17] fastmap_1.2.0 backports_1.5.0 [19] magick_2.8.4 XVector_0.44.0 [21] labeling_0.4.3 utf8_1.2.4 [23] rmarkdown_2.27 UCSC.utils_1.0.0 [25] ragg_1.3.2 purrr_1.0.2 [27] xfun_0.46 beachmat_2.20.0 [29] zlibbioc_1.50.0 GenomeInfoDb_1.40.1 [31] jsonlite_1.8.8 DelayedArray_0.30.1 [33] BiocParallel_1.38.0 terra_1.7-78 [35] irlba_2.3.5.1 parallel_4.4.1 [37] R6_2.5.1 stringi_1.8.4 [39] RColorBrewer_1.1-3 reticulate_1.38.0 [41] parallelly_1.37.1 GenomicRanges_1.56.1 [43] scattermore_1.2 Rcpp_1.0.13 [45] bookdown_0.40 SummarizedExperiment_1.34.0 [47] knitr_1.48 future.apply_1.11.2 [49] R.utils_2.12.3 IRanges_2.38.1 [51] Matrix_1.7-0 igraph_2.0.3 [53] 
tidyselect_1.2.1 rstudioapi_0.16.0 [55] abind_1.4-5 yaml_2.3.9 [57] codetools_0.2-20 listenv_0.9.1 [59] lattice_0.22-6 tibble_3.2.1 [61] plyr_1.8.9 Biobase_2.64.0 [63] withr_3.0.0 Rtsne_0.17 [65] evaluate_0.24.0 future_1.33.2 [67] pillar_1.9.0 MatrixGenerics_1.16.0 [69] checkmate_2.3.1 stats4_4.4.1 [71] plotly_4.10.4 generics_0.1.3 [73] dbscan_1.2-0 sp_2.1-4 [75] S4Vectors_0.42.1 ggplot2_3.5.1 [77] munsell_0.5.1 scales_1.3.0 [79] globals_0.16.3 gtools_3.9.5 [81] glue_1.7.0 lazyeval_0.2.2 [83] tools_4.4.1 GiottoVisuals_0.2.4 [85] data.table_1.15.4 ScaledMatrix_1.12.0 [87] cowplot_1.1.3 grid_4.4.1 [89] tidyr_1.3.1 colorspace_2.1-0 [91] SingleCellExperiment_1.26.0 GenomeInfoDbData_1.2.12 [93] BiocSingular_1.20.0 rsvd_1.0.5 [95] cli_3.6.3 textshaping_0.4.0 [97] fansi_1.0.6 S4Arrays_1.4.1 [99] viridisLite_0.4.2 dplyr_1.1.4 [101] uwot_0.2.2 gtable_0.3.5 [103] R.methodsS3_1.8.2 digest_0.6.36 [105] BiocGenerics_0.50.0 SparseArray_1.4.8 [107] ggrepel_0.9.5 farver_2.1.2 [109] rjson_0.2.21 htmlwidgets_1.6.4 [111] htmltools_0.5.8.1 R.oo_1.26.0 [113] lifecycle_1.0.4 httr_1.4.7 "],["interoperability-with-other-frameworks.html", "15 Interoperability with other frameworks 15.1 Load Giotto object 15.2 Seurat 15.3 SpatialExperiment 15.4 AnnData 15.5 Create mini Vizgen object", " 15 Interoperability with other frameworks Iqra August 7th 2024 Giotto facilitates seamless interoperability with various tools, including Seurat, AnnData, and SpatialExperiment. Below is a comprehensive tutorial on how Giotto interoperates with these other tools. 15.1 Load Giotto object To begin demonstrating the interoperability of a Giotto object with other frameworks, we first load the required libraries and a Giotto mini object. We then proceed with the conversion process: library(Giotto) library(GiottoData) Load a Giotto mini Visium object, which will be used for demonstrating interoperability. 
gobject <- GiottoData::loadGiottoMini("visium") 15.2 Seurat Giotto Suite provides interoperability between Seurat and Giotto, supporting both older and newer versions of Seurat objects. The four tailored functions are giottoToSeuratV4() and seuratToGiottoV4() for older versions, and giottoToSeuratV5() and seuratToGiottoV5() for Seurat v5, which includes subcellular and image information. These functions map Giotto’s metadata, dimension reductions, spatial locations, and images to the corresponding slots in Seurat. 15.2.1 Conversion of Giotto Object to Seurat Object To convert a Giotto object to a Seurat V5 object, we first load the required libraries and then use the giottoToSeuratV5() function. library(Seurat) library(SeuratData) library(ggplot2) library(patchwork) library(dplyr) Now we convert the Giotto object to a Seurat V5 object and create violin and spatial feature plots to visualize the RNA count data. gToS <- giottoToSeuratV5(gobject = gobject, spat_unit = "cell") plot1 <- VlnPlot(gToS, features = "nCount_rna", pt.size = 0.1) + NoLegend() plot2 <- SpatialFeaturePlot(gToS, features = "nCount_rna", pt.size.factor = 2) + theme(legend.position = "right") wrap_plots(plot1, plot2) 15.2.1.1 Apply SCTransform We apply the SCTransform() function to transform the data in the RNA assay. gToS <- SCTransform(gToS, assay = "rna", verbose = FALSE) 15.2.1.2 Dimension Reduction We perform Principal Component Analysis (PCA), find neighbors, and run UMAP for dimensionality reduction and clustering on the transformed Seurat object: gToS <- RunPCA(gToS, assay = "SCT") gToS <- FindNeighbors(gToS, reduction = "pca", dims = 1:30) gToS <- RunUMAP(gToS, reduction = "pca", dims = 1:30) 15.2.2 Conversion of Seurat object Back to Giotto Object To convert the Seurat object back to a Giotto object, we use the function seuratToGiottoV5(), specifying the spatial assay, the dimension reductions, and the spatial and nearest neighbor networks. 
giottoFromSeurat <- seuratToGiottoV5(sobject = gToS, spatial_assay = "rna", dim_reduction = c("pca", "umap"), sp_network = "Delaunay_network", nn_network = c("sNN.pca", "custom_NN" )) 15.2.2.1 Clustering and Plotting UMAP Here we perform k-means clustering on the PCA results carried over from the Seurat object and visualize the clusters on the UMAP: ## k-means clustering giottoFromSeurat <- doKmeans(gobject = giottoFromSeurat, dim_reduction_to_use = "pca") # Plot UMAP post-clustering to visualize kmeans graph2 <- Giotto::plotUMAP( gobject = giottoFromSeurat, cell_color = "kmeans", show_NN_network = TRUE, point_size = 2.5 ) 15.2.2.2 Spatial CoExpression We can also use the binSpect() function to analyze spatial co-expression using the Delaunay spatial network from the Seurat object, and then visualize the spatial co-expression using the heatmSpatialCorFeats() function: ranktest <- binSpect(giottoFromSeurat, bin_method = "rank", calc_hub = TRUE, hub_min_int = 5, spatial_network_name = "Delaunay_network") ext_spatial_genes <- ranktest[1:300,]$feats spat_cor_netw_DT <- detectSpatialCorFeats( giottoFromSeurat, method = "network", spatial_network_name = "Delaunay_network", subset_feats = ext_spatial_genes) top10_genes <- showSpatialCorFeats(spat_cor_netw_DT, feats = "Dsp", show_top_feats = 10) spat_cor_netw_DT <- clusterSpatialCorFeats(spat_cor_netw_DT, name = "spat_netw_clus", k = 7) heatmSpatialCorFeats( giottoFromSeurat, spatCorObject = spat_cor_netw_DT, use_clus_name = "spat_netw_clus", heatmap_legend_param = list(title = NULL), save_plot = TRUE, show_plot = TRUE, return_plot = FALSE, save_param = list(base_height = 6, base_width = 8, units = 'cm')) 15.3 SpatialExperiment For the Bioconductor group of packages, the SpatialExperiment data container handles data from spatial-omics experiments, including spatial coordinates, images, and metadata. Giotto Suite provides giottoToSpatialExperiment() and spatialExperimentToGiotto(), mapping Giotto’s slots to the corresponding SpatialExperiment slots. 
Since SpatialExperiment can only store one spatial unit at a time, giottoToSpatialExperiment() returns a list of SpatialExperiment objects, each representing a distinct spatial unit. To start the conversion of a Giotto mini Visium object to a SpatialExperiment object, we first load the required libraries. library(SpatialExperiment) library(ggspavis) library(pheatmap) library(scater) library(scran) library(nnSVG) 15.3.1 Convert Giotto Object to SpatialExperiment Object To convert the Giotto object to a SpatialExperiment object, we use the giottoToSpatialExperiment() function. gspe <- giottoToSpatialExperiment(gobject) The conversion function returns a separate SpatialExperiment object for each spatial unit. We select the first object for downstream use: spe <- gspe[[1]] 15.3.1.1 Identify top spatially variable genes with nnSVG We employ the nnSVG package to identify the top spatially variable genes in our SpatialExperiment object. Covariates can be added to our model; in this example, we use Leiden clustering labels as a covariate: # One of the assays should be "logcounts" # We rename the normalized assay to "logcounts" assayNames(spe)[[2]] <- "logcounts" # Create model matrix for leiden clustering labels X <- model.matrix(~ colData(spe)$leiden_clus) dim(X) Run nnSVG This step will take several minutes to run spe <- nnSVG(spe, X = X) # Show top 10 features rowData(spe)[order(rowData(spe)$rank)[1:10], ]$feat_ID 15.3.2 Conversion of SpatialExperiment object back to Giotto We then convert the processed SpatialExperiment object back into a Giotto object for further downstream analysis using the Giotto suite. This is done using the spatialExperimentToGiotto function, where we explicitly specify the spatial network from the SpatialExperiment object. 
giottoFromSPE <- spatialExperimentToGiotto(spe = spe, python_path = NULL, sp_network = "Delaunay_network") print(giottoFromSPE) 15.3.2.1 Plotting top genes from nnSVG in Giotto Now we use the converted Giotto object to visualize, within Giotto, the genes previously identified in the SpatialExperiment object with the nnSVG package. ext_spatial_genes <- getFeatureMetadata(giottoFromSPE, output = "data.table") ext_spatial_genes <- ext_spatial_genes[order(ext_spatial_genes$rank)[1:10], ]$feat_ID spatFeatPlot2D(giottoFromSPE, expression_values = "scaled_rna_cell", feats = ext_spatial_genes[1:4], point_size = 2) 15.4 AnnData The anndataToGiotto() and giottoToAnnData() functions map the slots of the Giotto object to the corresponding locations in a Squidpy-flavored AnnData object. Specifically, Giotto’s expression slot maps to adata.X, spatial_locs to adata.obsm, cell_metadata to adata.obs, feat_metadata to adata.var, dimension_reduction to adata.obsm, and nn_network and spat_network to adata.obsp. Images are currently not mapped between the two classes. Notably, Giotto stores expression matrices within separate spatial units and feature types, while AnnData does not support this hierarchical data storage. Consequently, multiple AnnData objects are created from a Giotto object when there are multiple spatial unit and feature type pairs. 15.4.1 Load Required Libraries To start, we need to load the necessary libraries, including reticulate for interfacing with Python. library(reticulate) 15.4.2 Specify Path for Results First, we specify the directory where the results will be saved. Additionally, we retrieve and update Giotto instructions. 
# Specify path to which results may be saved results_directory <- "results/03_session4/giotto_anndata_conversion/" instrs <- showGiottoInstructions(gobject) mini_gobject <- replaceGiottoInstructions(gobject = gobject, instructions = instrs) 15.4.2.1 Create Default kNN Network We will create a k-nearest neighbor (kNN) network using mostly default parameters. gobject <- createNearestNetwork(gobject = gobject, spat_unit = "cell", feat_type = "rna", type = "kNN", dim_reduction_to_use = "umap", dim_reduction_name = "umap", k = 15, name = "kNN.umap") 15.4.3 Giotto To AnnData To convert the Giotto object to AnnData, we use Giotto’s giottoToAnnData() function. gToAnnData <- giottoToAnnData(gobject = gobject, save_directory = results_directory) Next, we import scanpy and perform a series of preprocessing steps on the AnnData object. scanpy <- import("scanpy") adata <- scanpy$read_h5ad("results/03_session4/giotto_anndata_conversion/cell_rna_converted_gobject.h5ad") # Normalize total counts per cell scanpy$pp$normalize_total(adata, target_sum=1e4) # Log-transform the data scanpy$pp$log1p(adata) # Perform PCA scanpy$pp$pca(adata, n_comps=40L) # Compute the neighborhood graph scanpy$pp$neighbors(adata, n_neighbors=10L, n_pcs=40L) # Run UMAP scanpy$tl$umap(adata) # Save the processed AnnData object adata$write("results/03_session4/cell_rna_converted_gobject2.h5ad") processed_file_path <- "results/03_session4/cell_rna_converted_gobject2.h5ad" 15.4.4 Convert AnnData to Giotto Finally, we convert the processed AnnData object back into a Giotto object for further analysis using Giotto. giottoFromAnndata <- anndataToGiotto(anndata_path = processed_file_path) 15.4.4.1 UMAP Visualization Now we use the plotUMAP() function to plot the UMAP that was calculated with Scanpy on the AnnData object. 
Giotto::plotUMAP( gobject = giottoFromAnndata, dim_reduction_name = "umap.ad", cell_color = "leiden_clus", point_size = 3 ) 15.5 Create mini Vizgen object mini_gobject <- loadGiottoMini(dataset = "vizgen", python_path = NULL) mini_gobject <- replaceGiottoInstructions(gobject = mini_gobject, instructions = instrs) mini_gobject <- createNearestNetwork(gobject = mini_gobject, spat_unit = "aggregate", feat_type = "rna", type = "kNN", dim_reduction_to_use = "umap", dim_reduction_name = "umap", k = 6, name = "new_network") Since we have multiple spat_unit and feat_type pairs, this function will create multiple .h5ad files, with their names returned. Non-default nearest or spatial network names will have their key_added terms recorded and saved in corresponding .txt files; refer to the documentation for details. anndata_conversions <- giottoToAnnData(gobject = mini_gobject, save_directory = results_directory, python_path = NULL) "],["interoperability-with-isolated-tools.html", "16 Interoperability with isolated tools 16.1 Spatial niche trajectory analysis (ONTraC) 16.2 Session info", " 16 Interoperability with isolated tools Wen Wang August 7th 2024 16.1 Spatial niche trajectory analysis (ONTraC) 16.1.1 Introduction to ONTraC ONTraC (Ordered Niche Trajectory Construction) is a niche-centered, machine learning method for constructing spatially continuous trajectories. ONTraC differs from existing tools in that it treats a niche, rather than an individual cell, as the basic unit for spatial trajectory analysis. In this context, we define niche as a multicellular, spatially localized region where different cell types may coexist and interact with each other. ONTraC seamlessly integrates cell-type composition and spatial information by using the graph neural network modeling framework. Its output, which is called the niche trajectory, can be viewed as a one dimensional representation of the tissue microenvironment continuum. 
By disentangling cell-level and niche-level properties, niche trajectory analysis provides a coherent framework to study coordinated responses from all the cells in association with continuous tissue microenvironment variations. ONTraC paper ONTraC GitHub repository PPT 16.1.2 Introduction to MERFISH MERFISH is a massively multiplexed single-molecule imaging technology for spatially resolved transcriptomics capable of simultaneously measuring the copy number and spatial distribution of hundreds to tens of thousands of RNA species in individual cells. For further information, please visit the official website. 16.1.3 Settings options(timeout=Inf) # In case of network interruptions data_path <- file.path("data","03_session5") dir.create(data_path, recursive=TRUE) results_folder <- file.path("results","03_session5") dir.create(results_folder, recursive=TRUE) 16.1.4 Dataset This is a MERFISH mouse motor cortex dataset comprising 61 tissue sections and containing approximately 280,000 cells characterised by a 258-gene panel. The study identified three classes of cells (glutamatergic, GABAergic, and non-neuronal), which were further clustered into 23 annotated subclass-level cell types plus 1 other subclass. Pseudotime-based methods can generate one-dimensional coordinates for specific lineages but lack the ability to generate trajectories for whole samples. By moving the focus from the cell to the niche (local microenvironment), ONTraC can generate niche trajectories for whole samples and map the NT score to each cell. 16.1.4.1 Dataset download The MERFISH mouse motor cortex data to run this tutorial can be found here. You need to download the processed expression, metadata, and cell segmentation information by running the commands below. Note 1: there are 61 slices here; we run on two of them to save time. Note 2: because the network can be unstable, the download may be interrupted. We recommend downloading these data in advance or downloading the processed Giotto object from Zenodo. 
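Because interrupted downloads are a known risk here, one option is to wrap each download in a small retry loop. The base R sketch below is illustrative only: with_retry is a hypothetical helper, not a Giotto or base R function, and the demonstration uses a stand-in function that fails twice instead of a real network call.

```r
# Hypothetical helper: call fun() up to max_tries times before giving up.
with_retry <- function(fun, max_tries = 3, wait = 0) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < max_tries) Sys.sleep(wait)   # pause before the next attempt
  }
  stop("All ", max_tries, " attempts failed")
}

# Stand-in for a download that is interrupted twice, then succeeds
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("simulated network interruption")
  "downloaded"
}

status <- with_retry(flaky)
```

In practice the wrapped function would be something like function() download.file(url, destfile). Note that download.file() can signal some failures as warnings with a non-zero return value rather than as errors, so a real wrapper may also need to check the return value.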
download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/counts.h5ad", destfile = file.path(data_path,"counts.h5ad")) download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/cell_labels.csv", destfile = file.path(data_path,"cell_labels.csv")) download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/segmented_cells_mouse2sample1.csv", destfile = file.path(data_path,"segmented_cells_mouse2sample1.csv")) download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/segmented_cells_mouse2sample6.csv", destfile = file.path(data_path,"segmented_cells_mouse2sample6.csv")) 16.1.5 Create the Giotto object library(Giotto) library(reticulate) ## Set instructions python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE, python_path = python_path ) ## create Giotto object from expression counts. This file contains 61 slices here. 
giotto_all_slices_obj <- anndataToGiotto(file.path(data_path, "counts.h5ad")) ## load meta_data meta_df <- read.csv(file.path(data_path, "cell_labels.csv"), colClasses = "character") # as the cell IDs are 30 digit numbers, set the type as character to avoid the limitation of R in handling larger integers colnames(meta_df)[[1]] <- "cell_ID" ### we use two slices here to speed up slice1_cells <- meta_df[meta_df$slice_id == "mouse2_slice229",]$cell_ID slice2_cells <- meta_df[meta_df$slice_id == "mouse2_slice300",]$cell_ID selected_cells <- c(slice1_cells, slice2_cells) ## subset giotto obj by cell ID giotto_slice1_obj <- subsetGiotto(gobject = giotto_all_slices_obj, cell_ids = slice1_cells) giotto_slice2_obj <- subsetGiotto(gobject = giotto_all_slices_obj, cell_ids = slice2_cells) ## add cell metadata giotto_slice1_obj <- addCellMetadata(gobject = giotto_slice1_obj, new_metadata = meta_df, by_column = TRUE) giotto_slice2_obj <- addCellMetadata(gobject = giotto_slice2_obj, new_metadata = meta_df, by_column = TRUE) ## cell segmentation. Calculate center (median of vertices) of each cell. 
segments_1_df <- read.csv(file.path(data_path, "segmented_cells_mouse2sample1.csv"), row.names=1, colClasses = "character") # as the cell IDs are 30 digit numbers, set the type as character to avoid the limitation of R in handling larger integers segments_2_df <- read.csv(file.path(data_path, "segmented_cells_mouse2sample6.csv"), row.names=1, colClasses = "character") # as the cell IDs are 30 digit numbers, set the type as character to avoid the limitation of R in handling larger integers segments_df <- rbind(segments_1_df, segments_2_df) loc.use <- segments_df[selected_cells,] loc.x <- grep("boundaryX_",colnames(loc.use),value = T) loc.y <- grep("boundaryY_",colnames(loc.use),value = T) centr.x <- apply(loc.use[,loc.x],1,function(x){ temp <- lapply(x,function(y){ as.numeric(unlist(strsplit(y,", "))) }) return (median(unname(unlist(temp)))) }) centr.y <- apply(loc.use[,loc.y],1,function(x){ temp <- lapply(x,function(y){ as.numeric(unlist(strsplit(y,", "))) }) return (median(unname(unlist(temp)))) }) ## create spatial locations object spatial_locs_df <- data.frame(cell_ID = selected_cells, sdimx = centr.x, sdimy = centr.y) spatial_locs_slice1_df <- spatial_locs_df[slice1_cells,] spatial_locs_slice2_df <- spatial_locs_df[slice2_cells,] spat_locs_slice1_obj <- readSpatLocsData(data_list = spatial_locs_slice1_df) spat_locs_slice2_obj <- readSpatLocsData(data_list = spatial_locs_slice2_df) ## add spatial location info giotto_slice1_obj <- setSpatialLocations(gobject = giotto_slice1_obj, x = spat_locs_slice1_obj) giotto_slice2_obj <- setSpatialLocations(gobject = giotto_slice2_obj, x = spat_locs_slice2_obj) ## merge two giotto objects together giotto_obj <- joinGiottoObjects(gobject_list = list(giotto_slice1_obj, giotto_slice2_obj), gobject_names = c("mouse2_slice229", "mouse2_slice300"), # name for each samples join_method = "z_stack") ## save giotto obj # saveGiotto saveGiotto(gobject = giotto_obj, foldername = "gobject", dir=results_folder) If you facing network issue 
when downloading the raw dataset. Please download the processing giotto obj from Zenodo, unzip and move it to results folder giotto_obj <- loadGiotto(path_to_folder = file.path(results_folder, "gobject")) 16.1.5.1 Spatial distribution of cell type spatPlot2D(giotto_obj, group_by = "slice_id", cell_color = "subclass", point_size = 1, point_border_stroke = NA, legend_text = 6) # We skip the processing process here to save time and use the given cell type # annotation directly ONTraC_input <- getONTraCv1Input(gobject = giotto_obj, cell_type = "subclass", output_path = results_folder, spat_unit = "cell", feat_type = "rna", verbose = TRUE) head(ONTraC_input) # Cell_ID Sample x y Cell_Type # <chr> <chr> <dbl> <dbl> <chr> # mouse2_slice229-100101435705986292663283283043431511315 mouse2_slice229 -4828.728 -2203.4502 L6 CT # mouse2_slice229-100104370212612969023746137269354247741 mouse2_slice229 -5405.400 -995.6467 OPC # mouse2_slice229-100128078183217482733448056590230529739 mouse2_slice229 -5731.403 -1071.1735 L2/3 IT # mouse2_slice229-100209662400867003194056898065587980841 mouse2_slice229 -5468.113 -1286.2465 Oligo # mouse2_slice229-100218038012295593766653119076639444055 mouse2_slice229 -6399.986 -959.7440 L2/3 IT # mouse2_slice229-100252992997994275968450436343196667192 mouse2_slice229 -6637.847 -1659.6237 Astro 16.1.6 Perform spatial niche trajectory analysis using ONTraC 16.1.6.1 ONTraC Installation You could run ONTraC on your own laptop or on an HPC with an NVIDIA GPU node. It will run for less than 10 minutes on this example dataset. For larger datasets, running on an NVIDIA GPU is recommended, otherwise it will take a long time. source ~/.bash_profile conda create -y -n ONTraC python=3.11 conda activate ONTraC pip install ONTraC 16.1.6.2 Running ONTraC This step will take several minutes to run. 
source ~/.bash_profile conda activate ONTraC ONTraC -d results/03_session5/ONTraC_dataset_input.csv --preprocessing-dir results/03_session5/preprocessing_dir --GNN-dir results/03_session5/GNN_dir --NTScore-dir results/03_session5/NTScore_dir --device cuda --epochs 1000 -s 42 --patience 100 --min-delta 0.001 --min-epochs 50 --lr 0.03 --hidden-feats 4 -k 6 --modularity-loss-weight 0.3 --purity-loss-weight 300 --regularization-loss-weight 0.3 --beta 0.03 2>&1 | tee results/03_session5/merfish_subset.log 16.1.7 Visualization 16.1.7.1 Load ONTraC results giotto_obj <- loadOntraCResults(gobject = giotto_obj, ontrac_results_dir = results_folder) The NTScore and binarized niche cluster info were stored in cell metadata head(pDataDT(giotto_obj, spat_unit = "cell", feat_type = "rna")) # cell_ID sample_id slice_id class_label subclass label list_ID NicheCluster NTScore # <char> <char> <char> <char> <char> <char> <char> <int> <num> # 1: mouse2_slice229-100101435705986292663283283043431511315 mouse2_sample6 mouse2_slice229 Glutamatergic L6 CT L6_CT_5 mouse2_slice229 3 0.2002081 # 2: mouse2_slice229-100104370212612969023746137269354247741 mouse2_sample6 mouse2_slice229 Other OPC OPC mouse2_slice229. 1 0.7999791 # 3: mouse2_slice229-100128078183217482733448056590230529739 mouse2_sample6 mouse2_slice229 Glutamatergic L2/3 IT L23_IT_4 mouse2_slice229 1 0.7662198 # 4: mouse2_slice229-100209662400867003194056898065587980841 mouse2_sample6 mouse2_slice229 Other Oligo Oligo_1 mouse2_slice229 5 0.6010420 # 5: mouse2_slice229-100218038012295593766653119076639444055 mouse2_sample6 mouse2_slice229 Glutamatergic L2/3 IT L23_IT_4 mouse2_slice229 1 0.7132024 # 6: mouse2_slice229-100252992997994275968450436343196667192 mouse2_sample6 mouse2_slice229 Other Astro Astro_2 mouse2_slice229 3 0.1980136 The probability matrix of each cell assigned to each niche cluster and connectivity between niche cluster were stored here. 
GiottoClass::list_expression(giotto_obj) # spat_unit feat_type name # <char> <char> <char> # 1: cell rna raw # 2: cell niche cluster prob # 3: niche cluster connectivity normalized 16.1.7.2 Niche cluster probability distribution spatFeatPlot2D(gobject = giotto_obj, spat_unit = "cell", feat_type = "niche cluster", expression_values = "prob", group_by = "list_ID", feats = rownames(giotto_obj@expression$cell$`niche cluster`$prob), point_border_col = "gray" ) 16.1.7.3 Binarized niche cluster for each cell spatPlot2D(giotto_obj, spat_unit = "cell", group_by = "slice_id", cell_color = "NicheCluster", color_as_factor = TRUE, point_size = 1, point_border_stroke = NA) 16.1.7.4 Niche cluster spatial connectivity set.seed(42) # fix the node positions plotNicheClusterConnectivity(gobject = giotto_obj) 16.1.7.5 NT (niche trajectory) score spatPlot2D(gobject = giotto_obj, spat_unit = "cell", feat_type = "rna", group_by = "slice_id", cell_color = "NTScore", color_as_factor = FALSE, cell_color_gradient = "turbo", point_size = 1, point_border_stroke = NA ) We could change the direction of NT scores here. 
giotto_obj@cell_metadata$cell$rna$NTScore <- 1 - giotto_obj@cell_metadata$cell$rna$NTScore spatPlot2D(gobject = giotto_obj, spat_unit = "cell", feat_type = "rna", group_by = "slice_id", cell_color = "NTScore", color_as_factor = FALSE, cell_color_gradient = "turbo", point_size = 1, point_border_stroke = NA ) plotCellTypeNTScore(gobject = giotto_obj, cell_type = "subclass", values = "NTScore", spat_unit = "cell", feat_type = "rna") 16.1.7.6 Cell type composition within niche cluster plotCTCompositionInNicheCluster(gobject = giotto_obj, cell_type = "subclass") 16.2 Session info sessionInfo() # R version 4.4.0 (2024-04-24) # Platform: aarch64-apple-darwin20 # Running under: macOS Ventura 13.6.6 # # Matrix products: default # BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib # LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 # # locale: # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 # # time zone: America/New_York # tzcode source: internal # # attached base packages: # [1] stats graphics grDevices utils datasets methods base # # other attached packages: # [1] ggraph_2.2.1 ggplot2_3.5.1 reticulate_1.37.0 Giotto_4.1.0 GiottoClass_0.3.2 # # loaded via a namespace (and not attached): # [1] tidyselect_1.2.1 viridisLite_0.4.2 dplyr_1.1.4 farver_2.1.2 GiottoVisuals_0.2.4 viridis_0.6.5 fastmap_1.2.0 lazyeval_0.2.2 tweenr_2.0.3 digest_0.6.35 lifecycle_1.0.4 # [12] terra_1.7-78 magrittr_2.0.3 dbscan_1.1-12 compiler_4.4.0 rlang_1.1.4 tools_4.4.0 igraph_2.0.3 utf8_1.2.4 yaml_2.3.8 data.table_1.15.4 knitr_1.47 # [23] labeling_0.4.3 graphlayouts_1.1.1 htmlwidgets_1.6.4 sp_2.1-4 plyr_1.8.9 RColorBrewer_1.1-3 withr_3.0.0 purrr_1.0.2 grid_4.4.0 polyclip_1.10-6 fansi_1.0.6 # [34] colorspace_2.1-0 scales_1.3.0 gtools_3.9.5 MASS_7.3-60.2 cli_3.6.2 rmarkdown_2.27 generics_0.1.3 rstudioapi_0.16.0 httr_1.4.7 reshape2_1.4.4 cachem_1.1.0 # [45] 
ggforce_0.4.2 stringr_1.5.1 parallel_4.4.0 matrixStats_1.3.0 vctrs_0.6.5 Matrix_1.7-0 jsonlite_1.8.8 bookdown_0.40 ggrepel_0.9.5 scattermore_1.2 magick_2.8.3 # [56] GiottoUtils_0.1.10 plotly_4.10.4 tidyr_1.3.1 glue_1.7.0 codetools_0.2-20 cowplot_1.1.3 stringi_1.8.4 gtable_0.3.5 deldir_2.0-4 munsell_0.5.1 tibble_3.2.1 # [67] pillar_1.9.0 htmltools_0.5.8.1 R6_2.5.1 tidygraph_1.3.1 evaluate_0.24.0 lattice_0.22-6 png_0.1-8 backports_1.5.0 memoise_2.0.1 Rcpp_1.0.12 gridExtra_2.3 # [78] checkmate_2.3.1 colorRamp2_0.1.0 xfun_0.44 pkgconfig_2.0.3 "],["interactivity-with-the-rspatial-ecosystem.html", "17 Interactivity with the R/Spatial ecosystem 17.1 Visium technology 17.2 Gene expression interpolation through kriging 17.3 Downloading the dataset 17.4 Extracting the files 17.5 Downloading giotto object and nuclei segmentation 17.6 Importing visium data 17.7 Performing kriging 17.8 Adding cell polygons to Giotto object 17.9 Reading in larger dataset 17.10 Analyzing interpolated features", " 17 Interactivity with the R/Spatial ecosystem Jeff Sheridan August 7th 2024 17.1 Visium technology Figure 17.1: Overview of Visium. Source: 10X Genomics. Visium by 10x Genomics is a spatial gene expression platform that allows for the mapping of gene expression to high-resolution histology through RNA sequencing. The process involves placing a tissue section on a specially prepared slide with an array of barcoded spots, which are 55 µm in diameter with a spot to spot distance of 100 µm. Each spot contains unique barcodes that capture the mRNA from the tissue section, preserving the spatial information. After the tissue is imaged and RNA is captured, the mRNA is sequenced, and the data is mapped back to the tissue’s spatial coordinates. This technology is particularly useful in understanding complex tissue environments, such as tumors, by providing insights into how gene expression varies across different regions. 
17.2 Gene expression interpolation through kriging In low resolution spatial data each measurement spot typically covers multiple cells, making it difficult to delineate the contribution of individual cells to gene expression. Using a process called kriging we can interpolate gene expression from low resolution datasets and map it to the single cell level. Kriging is a spatial interpolation technique that estimates unknown values at specific locations by weighting nearby known values based on distance and spatial trends. It uses a model that accounts for both the distance between points and the overall pattern in the data when making predictions. By taking discrete measurement spots, such as those used for Visium, we can interpolate gene expression to a finer scale using kriging. 17.2.1 Dataset For this tutorial we’ll be using the mouse brain dataset described in section 6. Visium datasets require a high resolution H&E or IF image to align spots to. Using these images we can identify individual nuclei and cells to be used for kriging. Identifying nuclei is outside the scope of the current tutorial but is required to perform kriging. 17.2.2 Generating a geojson file of nuclei location For the following sections we will need to create a geojson that contains polygon information for the nuclei in the sample. We provide this file at the following link; for your own datasets, however, it will need to be generated outside of Giotto. A tutorial for this using QuPath can be found here. 17.3 Downloading the dataset We first need to import a dataset that we want to perform kriging on. 
data_directory <- "data/03_session6" dir.create(data_directory, showWarnings = F) download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz", destfile = file.path(data_directory, "V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz")) download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz", destfile = file.path(data_directory, "V1_Adult_Mouse_Brain_spatial.tar.gz")) 17.4 Extracting the files untar(tarfile = file.path(data_directory, "V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz"), exdir = data_directory) untar(tarfile = file.path(data_directory, "V1_Adult_Mouse_Brain_spatial.tar.gz"), exdir = data_directory) 17.5 Downloading giotto object and nuclei segmentation We will need nuclei/cell segmentations to perform the kriging. Later in the tutorial we’ll also be using a pre-made giotto object. Download them using the following: destfile <- file.path(data_directory, "subcellular_gobject.zip") options(timeout = Inf) # Needed to download large files download.file("https://zenodo.org/records/13144556/files/Day3_Session6.zip?download=1", destfile = destfile) unzip(file.path(data_directory, "subcellular_gobject.zip"), exdir = data_directory) 17.6 Importing visium data We’re going to begin by creating a Giotto object for the visium mouse brain dataset. This tutorial won’t go into detail about each of these steps as these have been covered for this dataset in section 6. To get the best results when performing gene expression interpolation we need to identify spatially distinct genes. Therefore, we need to perform nearest neighbor to create a spatial network. If you have a Giotto object from day 1 session 5, feel free to load that in and skip this first step. 
library(Giotto) save_directory <- "results/03_session6" visium_save_directory <- file.path(save_directory, "visium_mouse_brain") subcell_save_directory <- file.path(save_directory, "pseudo_subcellular/") instrs <- createGiottoInstructions(show_plot = TRUE, save_plot = TRUE, save_dir = visium_save_directory) v_brain <- createGiottoVisiumObject(data_directory, gene_column_index = 2, instructions = instrs) # Subset to in tissue only cm <- pDataDT(v_brain) in_tissue_barcodes <- cm[in_tissue == 1]$cell_ID v_brain <- subsetGiotto(v_brain, cell_ids = in_tissue_barcodes) # Filter v_brain <- filterGiotto(gobject = v_brain, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1000, expression_values = "raw") # Normalize v_brain <- normalizeGiotto(gobject = v_brain, scalefactor = 6000, verbose = TRUE) # Add stats v_brain <- addStatistics(gobject = v_brain) # ID HVF v_brain <- calculateHVF(gobject = v_brain, method = "cov_loess") fm <- fDataDT(v_brain) hv_feats <- fm[hvf == "yes" & perc_cells > 3 & mean_expr_det > 0.4]$feat_ID # Dimension Reductions v_brain <- runPCA(gobject = v_brain, feats_to_use = hv_feats) v_brain <- runUMAP(v_brain, dimensions_to_use = 1:10, n_neighbors = 15, set_seed = TRUE) # NN Network v_brain <- createNearestNetwork(gobject = v_brain, dimensions_to_use = 1:10, k = 15) # Leiden Cluster v_brain <- doLeidenCluster(gobject = v_brain, resolution = 0.4, n_iterations = 1000, set_seed = TRUE) # Spatial Network (kNN) v_brain <- createSpatialNetwork(gobject = v_brain, method = "kNN", k = 5, maximum_distance_knn = 400, name = "spatial_network") spatPlot2D(gobject = v_brain, spat_unit = "cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 1.5, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_1_vis_spat")) Here we can see the clustering of the regular visium spots is able to identify distinct regions of the mouse brain. 
Figure 17.2: Mouse brain spatial plot showing leiden clustering 17.6.1 Identifying spatially organized features We need to identify genes to be used for interpolation. This works best with genes that are spatially distinct. To identify these genes we’ll use binSpect(). For this tutorial we’ll only use the top 15 spatially distinct genes. The more genes used for interpolation the longer the analysis will take. When running this for your own datasets you should use more genes. We are only using 15 here to minimize analysis time. # Spatially Variable Features ranktest <- binSpect(v_brain, bin_method = "rank", calc_hub = TRUE, hub_min_int = 5, spatial_network_name = "spatial_network", do_parallel = TRUE, cores = 8) #not able to provide a seed number, so do not set one # Getting the top 15 spatially organized genes ext_spatial_features <- ranktest[1:15,]$feats 17.7 Performing kriging 17.7.1 Interpolating features Now we can perform gene expression interpolation. This involves creating a raster image for the gene expression of each of the selected genes. The steps from here can be time consuming and require large amounts of memory. We will only be analyzing 15 genes to show the process of expression interpolation. For clustering and other analyses more genes are required. future::plan(future::multisession()) # comment out for single threading v_brain <- interpolateFeature(v_brain, spat_unit = "cell", feat_type = "rna", ext = ext(v_brain), feats = ext_spatial_features, overwrite = TRUE) print(v_brain) Figure 17.3: Giotto object after interpolating features. Addition of images for each interpolated feature (left) and an example of rasterized gene expression image (right). For each gene that we interpolate a raster image is exported based on the gene expression. Shown below is an example of an output for the gene Pantr1. 
Figure 17.4: Raster of gene expression interpolation for Pantr1 17.8 Adding cell polygons to Giotto object 17.8.1 Read in the poly information First we need to read in the geojson file that contains the cell polygons that we’ll interpolate gene expression onto. These will then be added to the Giotto object as a new polygon object. This won’t affect the visium polygons. Both polygons will be stored within the same Giotto object. # Read in the data stardist_cell_poly_path <- file.path(data_directory, "segmentations/stardist_only_cell_bounds.geojson") stardist_cell_gpoly <- createGiottoPolygonsFromGeoJSON(GeoJSON = stardist_cell_poly_path, name = "stardist_cell", calc_centroids = TRUE) stardist_cell_gpoly <- flip(stardist_cell_gpoly) 17.8.2 Visualizing polygons Below we can see a visualization of the polygons for the visium spots and for the nuclei we identified from the H&E image. The visium dataset has 2698 spots compared to the 36694 nuclei we identified. Just using the visium spots we’re therefore losing a lot of the spatial data for individual cells. Because the nuclei polygons are far more numerous and correspond directly to the tissue, they alone give a much clearer view of the actual structure of the mouse brain. plot(getPolygonInfo(v_brain)) plot(stardist_cell_gpoly, max_poly = 1e6) Figure 17.5: Mouse brain cell polygons from the visium dataset Figure 17.6: Mouse brain cell polygons with artifacts removed and flipped 17.8.3 Showing Giotto object prior to polygon addition Before we add the polygons we can see the gobject contains “cell” as a spatial unit and a polygon. print(v_brain) Figure 17.7: Giotto object before adding subcellular polygons. 17.8.4 Adding polygons to giotto object After we add the nuclei polygons we can see that a new polygon name, “stardist_cell”, has been added to the gobject. 
v_brain <- addGiottoPolygons(v_brain, gpolygons = list("stardist_cell" = stardist_cell_gpoly)) print(v_brain) Figure 17.8: Giotto object after adding subcellular polygons. 17.8.5 Check polygon information We can now see the addition of the new polygons under the name “stardist_cell”. Each of the new polygons is given a unique poly_ID as shown below. Each polygon is also added into the same space as the original visium spots and therefore lines up with the same image. poly_info <- getPolygonInfo(v_brain, polygon_name = "stardist_cell") print(poly_info) Figure 17.9: Polygon information for stardist_cell. 17.8.6 Expression overlap The raster we created above gives the gene expression in a graphical form. We next need to determine how that relates to the nuclei location. To determine that we will calculate the overlap of the rasterized gene expression image to the polygons supplied earlier. This step also takes longer as more genes are provided. For large datasets please allow up to multiple hours for these steps to run. v_brain <- calculateOverlapPolygonImages(gobject = v_brain, name_overlap = "rna", spatial_info = "stardist_cell", image_names = ext_spatial_features) v_brain <- Giotto::overlapToMatrix(x = v_brain, poly_info = "stardist_cell", feat_info = "rna", aggr_function = "sum", type="intensity") After performing the overlap we now have expression data for each gene provided. This can be seen below where we see the interpolated gene expression for genes in each of the nuclei we identified. Figure 17.10: Gene expression for cells based on interpolation. 17.9 Reading in larger dataset For better results more genes are required. The above data used only 15 genes. We will now read in a dataset that has 1500 interpolated genes and use this for the remainder of the tutorial. If you haven’t downloaded this dataset please download it here. 
v_brain <- loadGiotto(file.path(data_directory, "subcellular_gobject")) 17.10 Analyzing interpolated features 17.10.1 Filter and normalization Now that we have a valid spat unit and gene expression data for each of the provided genes we can perform the same analyses we used for the regular visium data. Please note that, due to the differences in cell number, the values used for the current analysis aren’t identical to those used in the visium analysis. v_brain <- filterGiotto(gobject = v_brain, spat_unit = "stardist_cell", expression_values = "raw", expression_threshold = 1, feat_det_in_min_cells = 0, min_det_feats_per_cell = 1) v_brain <- normalizeGiotto(gobject = v_brain, spat_unit = "stardist_cell", scalefactor = 6000, verbose = TRUE) 17.10.2 Visualizing gene expression from interpolated expression Since we have the gene expression information for both the visium and the interpolated gene expression we can visualize gene expression for both from the same Giotto object. We will look at the expression for two genes “Sparc” and “Pantr1” for both the visium and interpolated data. 
spatFeatPlot2D(v_brain, spat_unit = "cell", gradient_style = "sequential", cell_color_gradient = "Geyser", feats = "Sparc", point_size = 2, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_sparc_vis")) spatFeatPlot2D(v_brain, spat_unit = "stardist_cell", gradient_style = "sequential", cell_color_gradient = "Geyser", feats = "Sparc", point_size = 0.6, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_sparc")) spatFeatPlot2D(v_brain, spat_unit = "cell", gradient_style = "sequential", feats = "Pantr1", cell_color_gradient = "Geyser", point_size = 2, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_pantr1_vis")) spatFeatPlot2D(v_brain, spat_unit = "stardist_cell", gradient_style = "sequential", cell_color_gradient = "Geyser", feats = "Pantr1", point_size = 0.6, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_pantr1")) Below we can see the gene expression for both datatypes. With the interpolated gene expression we’re able to get a better idea as to the cells that are expressing each of the genes. This is especially clear with Pantr1, which clearly localizes to the pyramidal layer. Figure 17.11: Gene expression for visium (left) and interpolated (right) expression for Sparc (top) and Pantr1 (bottom). 
17.10.3 Run PCA v_brain <- runPCA(gobject = v_brain, spat_unit = "stardist_cell", expression_values = "normalized", feats_to_use = NULL) 17.10.4 Clustering # UMAP v_brain <- runUMAP(v_brain, spat_unit = "stardist_cell", dimensions_to_use = 1:15, n_neighbors = 1000, min_dist = 0.001, spread = 1) # NN Network v_brain <- createNearestNetwork(gobject = v_brain, spat_unit = "stardist_cell", dimensions_to_use = 1:10, feats_to_use = hv_feats, expression_values = "normalized", k = 70) v_brain <- doLeidenCluster(gobject = v_brain, spat_unit = "stardist_cell", resolution = 0.15, n_iterations = 100, partition_type = "RBConfigurationVertexPartition") plotUMAP(v_brain, spat_unit = "stardist_cell", cell_color = "leiden_clus") Figure 17.12: UMAP for stardist_cell based on the 1500 interpolated gene expressions. Colored based on leiden clustering. 17.10.5 Visualizing clustering Visualizing the clustering for both the visium dataset and the interpolated dataset, we see similar clusters. However, with the interpolated dataset we are able to see finer detail for each cluster. spatPlot2D(gobject = v_brain, spat_unit = "cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 0.5, point_shape = "no_border", background_color = "black", save_plot = FALSE, show_legend = TRUE) spatPlot2D(gobject = v_brain, spat_unit = "stardist_cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 0.1, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_subcell_spat")) Figure 17.13: Spatial plots showing leiden clustering mapped onto the base visium spots (left) and individual nuclei through interpolation (right) 17.10.6 Cropping objects We are also able to crop both spat units simultaneously to zoom in on specific regions of the tissue such as seen below. 
v_brain_crop <- subsetGiottoLocs(gobject = v_brain, spat_unit = ":all:", x_min = 4000, x_max = 7000, y_min = -6500, y_max = -3500, z_max = NULL, z_min = NULL) spatPlot2D(gobject = v_brain_crop, spat_unit = "cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 2, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_vis_spat_crop")) spatPlot2D(gobject = v_brain_crop, spat_unit = "stardist_cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 0.1, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_subcell_spat_crop")) Figure 17.14: Spatial plots showing leiden clustering mapped onto the base visium spots (left) and individual nuclei through interpolation (right) "],["contributing-to-giotto.html", "18 Contributing to Giotto 18.1 Contribution guideline", " 18 Contributing to Giotto Jiaji George Chen August 7th 2024 save_dir <- "~/Documents/GitHub/giotto_workshop_2024/img/03_session7" 18.1 Contribution guideline https://drieslab.github.io/Giotto_website/CONTRIBUTING.html "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
+[["index.html", "Workshop: Spatial multi-omics data analysis with Giotto Suite 1 Giotto Suite Workshop 2024 1.1 Instructors 1.2 Topics and Schedule: 1.3 License", " Workshop: Spatial multi-omics data analysis with Giotto Suite Ruben Dries, Jiaji George Chen, Joselyn Cristina Chávez-Fuentes, Junxiang Xu ,Edward Ruiz, Jeff Sheridan, Iqra Amin, Wen Wang 1 Giotto Suite Workshop 2024 Workshop: Spatial multi-omics data analysis with Giotto Suite Github repo: https://github.com/drieslab/giotto_workshop_2024/ Giotto Suite Website: http://www.giottosuite.com Twitter/X: https://x.com/GiottoSpatial Code repo: https://github.com/drieslab/Giotto Issues page: https://github.com/drieslab/Giotto/issues Discussions page: https://github.com/drieslab/Giotto/discussions 1.1 Instructors Ruben Dries: Assistant Professor of Medicine at Boston University Joselyn Cristina Chávez Fuentes: Postdoctoral fellow at Icahn School of Medicine at Mount Sinai Jiaji George Chen: Ph.D. student at Boston University Junxiang Xu: Ph.D. student at Boston University Edward C. Ruiz: Ph.D. 
student at Boston University Jeff Sheridan: Postdoctoral fellow at Boston University Iqra Amin: Bioinformatician at Boston University Wen Wang: Postdoctoral fellow at Icahn School of Medicine at Mount Sinai 1.2 Topics and Schedule: Day 1: Introduction Spatial omics technologies Spatial sequencing Spatial in situ Spatial proteomics spatial other: ATAC-seq, lipidomics, etc Introduction to the Giotto package Ecosystem Installation + python environment Giotto instructions Data formatting and Pre-processing Creating a Giotto object From matrix + locations From subcellular raw data (transcripts or images) + polygons Using convenience functions for popular technologies (Vizgen, Xenium, CosMx, …) Spatial plots Subsetting: Based on IDs Based on locations Visualizations Introduction to spatial multi-modal dataset (10X Genomics breast cancer) and goal for the next days Quality control Statistics Normalization Feature selection: Highly Variable Features: loess regression binned pearson residuals Spatial variable genes Dimension Reduction PCA UMAP/t-SNE Visualizations Clustering Non-spatial k-means Hierarchical clustering Leiden/Louvain Spatial Spatial variable genes Spatial co-expression modules Day 2: Spatial Data Analysis Spatial sequencing based technology: Visium Differential expression Enrichment & Deconvolution PAGE/Rank SpatialDWLS Visualizations Interactive tools Spatial expression patterns Spatial variable genes Spatial co-expression modules Spatial HMRF Spatial sequencing based technology: Visium HD Tiling and aggregation Scalability (duckdb) and projection functions Spatial expression patterns Spatial co-expression module Spatial in situ technology: Xenium Read in raw data Transcript coordinates Polygon coordinates Visualizations Overlap txs & polygons Typical aggregated workflow Feature/molecule specific analysis Visualizations Transcript enrichment GSEA Spatial location analysis Spatial cell type co-localization analysis Spatial niche analysis Spatial niche 
trajectory analysis Visualizations Spatial proteomics: multiplex IF Read in raw data Intensity data (IF or any other image) Polygon coordinates Visualizations Overlap intensity & workflows Typical aggregated workflow Visualizations Day 3: Advanced Tutorials Multiple samples Create individual giotto objects Join Giotto Objects Perform Harmony and default workflows Visualizations Spatial multi-modal Co-registration of datasets Examples in giotto suite manuscript Multi-omics integration Example in giotto suite manuscript Interoperability w/ other frameworks AnnData/SpatialData SpatialExperiment Seurat Interoperability w/ isolated tools Spatial niche trajectory analysis Interactivity with the R/Spatial ecosystem Kriging Contributing to Giotto 1.3 License This material has a Creative Commons Attribution-ShareAlike 4.0 International License. To get more information about this license, visit http://creativecommons.org/licenses/by-sa/4.0/ "],["datasets-packages.html", "2 Datasets & Packages 2.1 Datasets to download 2.2 Needed packages", " 2 Datasets & Packages 2.1 Datasets to download Here we provide links to the original datasets that were used for this workshop. Some of the datasets were modified (e.g. downsampled or subsetted) for the purpose of this workshop. 
You can download them from their original source or download all of them - including intermediate files - from the following Zenodo repository: 2.1.1 Zenodo repository https://zenodo.org/communities/gw2024/ 2.1.2 10X Genomics Visium Mouse Brain Section (Coronal) dataset https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Adult_Mouse_Brain 2.1.3 10X Genomics Visium HD: FFPE Human Colon Cancer https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-human-crc 2.1.4 10X Genomics multi-modal dataset https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast 2.1.5 10X Genomics multi-omics Visium CytAssist Human Tonsil dataset https://www.10xgenomics.com/resources/datasets/gene-protein-expression-library-of-human-tonsil-cytassist-ffpe-2-standard 2.1.6 10X Genomics Human Prostate Cancer Adenocarcinoma with Invasive Carcinoma (FFPE) https://www.10xgenomics.com/datasets/human-prostate-cancer-adenocarcinoma-with-invasive-carcinoma-ffpe-1-standard-1-3-0 2.1.7 10X Genomics Normal Human Prostate (FFPE) https://www.10xgenomics.com/datasets/normal-human-prostate-ffpe-1-standard-1-3-0 2.1.8 Xenium https://www.10xgenomics.com/datasets/preview-data-ffpe-human-lung-cancer-with-xenium-multimodal-cell-segmentation-1-standard 2.1.9 MERFISH cortex dataset https://doi.brainimagelibrary.org/doi/10.35077/g.21 2.1.10 Lunaphore IF dataset https://zenodo.org/records/13175721 2.2 Needed packages To run all the tutorials from this Giotto Suite workshop you will need to install additional R and Python packages. Here we provide detailed instructions and discuss some common difficulties with installing these packages. The easiest way would be to copy each code snippet into your R/Rstudio Console using a fresh R session. 
2.2.1 CRAN dependencies: cran_dependencies <- c("BiocManager", "devtools", "pak") install.packages(cran_dependencies, Ncpus = 4) 2.2.2 terra installation terra may have some additional steps when installing depending on which system you are on. Please see the terra repo for specifics. Installations of the CRAN release on Windows and Mac are expected to be simple, only requiring the code below. For Linux, there are several prerequisite installs: GDAL (>= 2.2.3), GEOS (>= 3.4.0), PROJ (>= 4.9.3), sqlite3 On our AlmaLinux 8 HPC, the following versions have been working well: gdal/3.6.4 geos/3.11.1 proj/9.2.0 sqlite3/3.37.2 install.packages("terra") 2.2.3 Matrix installation !! FOR R VERSIONS LOWER THAN 4.4.0 !! Giotto requires Matrix 1.6-2 or greater, but when installing Giotto with pak on an R version lower than 4.4.0, the installation can fail asking for R 4.5 which doesn’t exist yet. We can solve this by installing the 1.6-5 version directly by un-commenting and running the line below. # devtools::install_version("Matrix", version = "1.6-5") 2.2.4 Rtools installation Before installing Giotto on a windows PC please make sure to install the relevant version of Rtools. If you have a Mac or linux PC, or have already installed Rtools, please ignore this step. 2.2.5 Giotto installation pak::pak("drieslab/Giotto") pak::pak("drieslab/GiottoData") 2.2.6 irlba install Reinstall irlba from source. Avoids the common function 'as_cholmod_sparse' not provided by package 'Matrix' error. See this issue for more info. install.packages("irlba", type = "source") 2.2.7 arrow install arrow is a suggested package that we use here to open parquet files. The parquet files that 10X provides use zstd compression which the default arrow installation may not provide. 
has_arrow <- requireNamespace("arrow", quietly = TRUE) zstd <- TRUE if (has_arrow) { zstd <- arrow::arrow_info()$capabilities[["zstd"]] } if (!has_arrow || !zstd) { Sys.setenv(ARROW_WITH_ZSTD = "ON") install.packages(c("assertthat", "bit64")) install.packages("arrow", repos = c("https://apache.r-universe.dev")) } 2.2.8 Bioconductor dependencies: bioc_dependencies <- c( "scran", "ComplexHeatmap", "SpatialExperiment", "ggspavis", "scater", "nnSVG" ) 2.2.9 CRAN packages: needed_packages_cran <- c( "dplyr", "gstat", "hdf5r", "miniUI", "shiny", "xml2", "future", "future.apply", "exactextractr", "tidyr", "viridis", "quadprog", "Rfast", "pheatmap", "patchwork", "Seurat", "harmony", "scatterpie", "R.utils", "qs" ) pak::pkg_install(c(bioc_dependencies, needed_packages_cran)) 2.2.10 Packages from GitHub github_packages <- c( "satijalab/seurat-data" ) pak::pkg_install(github_packages) 2.2.11 Python environments # default giotto environment Giotto::installGiottoEnvironment() reticulate::py_install( pip = TRUE, envname = 'giotto_env', packages = c( "scanpy" ) ) # install another environment with py 3.8 for cellpose reticulate::conda_create(envname = "giotto_cellpose", python_version = 3.8) #.rs.restartR() reticulate::use_condaenv('giotto_cellpose') reticulate::py_install( pip = TRUE, envname = 'giotto_cellpose', packages = c( "pandas", "networkx", "python-igraph", "leidenalg", "scikit-learn", "cellpose", "smfishhmrf", 'tifffile', 'scikit-image' ) ) "],["spatial-omics-technologies.html", "3 Spatial omics technologies 3.1 Presentation 3.2 Short summary", " 3 Spatial omics technologies Ruben Dries August 5th 2024 3.1 Presentation 3.2 Short summary 3.2.1 Why do we need spatial omics technologies? Spatial omics allows us to examine the role of one or more cells within their normal context. This spatial context is typically organized at multiple length scales, and considers both adjacent neighboring cells and larger levels of tissue organization. 
Figure 3.1: Capturing tissue complexity with RNA-seq, scRNAseq, and Spatial Omics 3.2.2 What is spatial omics? Spatial omics is typically a combination of spatial sequencing and/or imaging together with understanding the obtained results through spatial data science. Figure 3.2: Spatial Omics Constituents 3.2.3 What are the main spatial omics technologies? The large majority - and most popular or accessible - spatial technologies are: - spatial antibody-multiplex proteomics - spatial multiplex in situ hybridization (ISH)-based transcriptomics - spatial sequencing-based transcriptomics Figure 3.3: Lewis et al. Nat Meth Review. Characteristics of spatial omics technologies 3.2.4 Other Spatial omics: ATAC-seq, CUT&Tag, lipidomics, etc A growing number of other spatial technologies exist that profile different types of molecular analytes. One example is using a deterministic barcoding approach (Rong Fan’s group) to explore open (ATAC-seq) or modified (CUT&Tag) chromatin in a spatially aware manner. Figure 3.4: Vandereyken et al. Nat Rev Genetics. Spatial deterministic barcoding for ATAC-seq and CUT&tag 3.2.5 What are the different types of spatial downstream analyses? There is a large and diverse set of downstream spatial data analyses that take different available data types and formats as input. Figure 3.5: Dries, R. et al. Genome Res. Downstream analysis in spatial data analysis. "],["introduction-to-the-giotto-package.html", "4 Introduction to the Giotto package 4.1 Presentation 4.2 Ecosystem 4.3 Installation + python environment 4.4 Giotto instructions", " 4 Introduction to the Giotto package Ruben Dries & Jiaji George Chen August 5th 2024 4.1 Presentation 4.2 Ecosystem Giotto Suite is a modular ecosystem of individual R packages that each provide different functionality and that together provide users with a fully integrated spatial multi-omics workflow. 
Figure 4.1: Overview of the modular Giotto Suite ecosystem Each package also has its own website: - GiottoUtils: https://drieslab.github.io/GiottoUtils/ - GiottoClass: https://drieslab.github.io/GiottoClass/ - GiottoData: https://drieslab.github.io/GiottoData/ - GiottoVisuals: https://drieslab.github.io/GiottoVisuals/ More information is available at https://drieslab.github.io/Giotto_website/articles/ecosystem.html 4.3 Installation + python environment 4.3.1 Giotto installation Giotto Suite is currently installable only from GitHub, but we are actively working on getting it into a major repository. Much of this is already covered in Section 2.2, but the highlights are: 4.3.1.1 System prerequisites for Windows, Rtools needs to be installed a major dependency terra needs GDAL (>= 2.2.3), GEOS (>= 3.4.0), PROJ (>= 4.9.3), sqlite3 on Linux 4.3.1.2 Installation of released version To install the currently released version of Giotto in a single step: pak::pak("drieslab/Giotto") This should automatically install all the Giotto dependencies and other Giotto module packages (main branch). 4.3.1.3 Installation of dev branch Giotto packages pak tends to forcibly install all dependencies, which can cause issues when working with multiple dev branch packages. You can install dev branch versions by using devtools::install_github() instead. Core module dev branches: \"drieslab/Giotto@suite_dev\" \"drieslab/GiottoVisuals@dev\" \"drieslab/GiottoClass@dev\" \"drieslab/GiottoUtils@dev\" devtools::install_github("drieslab/GiottoClass@dev") 4.3.1.4 Common install issues If installing on an R version earlier than 4.4, pak can throw errors when installing Matrix. To get around this, install Matrix v1.6-5 first; installing Giotto with pak should then work. devtools::install_version("Matrix", version = "1.6-5") If you come across the function 'as_cholmod_sparse' not provided by package 'Matrix' error when running Giotto, reinstalling irlba from source may resolve it. 
install.packages("irlba", type = "source") 4.3.2 Python environment 4.3.2.1 Default installation In order to make use of python packages, the first thing to do after installing Giotto for the first time is to create a giotto python environment. Giotto provides the following as a convenience wrapper around reticulate functions to setup a default environment. library(Giotto) installGiottoEnvironment() Two things are needed for python to work: A conda (e.g. miniconda or anaconda) installation which is the package and environment management system. Independent environment(s) with specific versions of the python language and associated python packages. installGiottoEnvironment() checks both and will install miniconda using reticulate if necessary. If a specific conda binary already exists that you want to use, the conda param can be set, or you can set the reticulate option options(\"reticulate.conda_binary\" = \"[conda path]\") or Sys.setenv(\"RETICULATE_CONDA\" = \"[conda path]\"). After ensuring the conda binary exists, the default Giotto environment is installed which is a python 3.10.2 environment named ‘giotto_env’. It will contain several default packages that Giotto installs: “pandas==1.5.1” “networkx==2.8.8” “python-igraph==0.10.2” “leidenalg==0.9.0” “python-louvain==0.16” “python.app==1.4” (if needed) “scikit-learn==1.1.3” 4.3.2.2 Custom installs Custom python environments can be made by first setting up a new environment and establishing the name and python version to use. reticulate::conda_create(envname = "[name of env]", python_version = ???) Following that, one or more python packages to install can be added to the environment. reticulate::py_install( pip = TRUE, envname = '[name of env]', packages = c( "package1", "package2", "..." ) ) Once an environment has been set up, Giotto can hook into it. 4.3.2.3 Using a specific environment When using python through reticulate, R only allows one environment to be activated per session. 
Once a session has loaded a python environment, it can no longer switch to another one. Giotto activates a python environment when any of the following happens: a giotto object is created giottoInstructions are created (createGiottoInstructions()) GiottoClass::set_giotto_python_path() is called (most straightforward) Which environment is activated is based on a set of 5 defaults in decreasing priority. User provided (when python_path param is given. Either a full filepath or an env name are accepted.) Any provided path or envname set in options options(\"giotto.py_path\" = \"[path to env or envname]\") Default expected giotto environment location based on reticulate::miniconda_path() Envname \"giotto_env\" System default python environment Method 2 is most recommended when there is a non-standard python environment to regularly use with Giotto. You would run file.edit(\"~/.Rprofile\") and then add options(\"giotto.py_path\" = \"[path to env or envname]\") as a line so that it is automatically set at the start of each session. If a specific environment should only be used a couple times then method 1 is easiest: GiottoClass::set_giotto_python_path(python_path = "[path to env or envname]") To check which conda environments exist on your machine: reticulate::conda_list() Once an environment is activated, you can check more details and ensure that it is the one you are expecting by running: reticulate::py_config() 4.4 Giotto instructions Giotto uses giottoInstructions in order to set a behavior for a particular giotto object. Most commonly used are: python_path - when set, will activate a python environment save_dir - save directory to use. Usually for plots generated. This can help speed things up since the viewer no longer has to render. save_plot - whether to save plots to the save_dir return_plot - whether to return the plot objects. 
When FALSE, only NULL is returned show_plot - whether to show the plot in the viewer These objects are created with createGiottoInstructions() and the created objects can be edited afterwards using the instructions() generic function. library(Giotto) save_dir <- "results/01_session2/" # this call will also initialize the python env instrs <- createGiottoInstructions( save_dir = save_dir, # working directory is the default show_plot = FALSE, save_plot = TRUE, return_plot = FALSE, python_path = NULL # when NULL, this calls GiottoClass::set_giotto_python_path() to get the default ) force(instrs) Giotto object creation functions all have an instructions param for passing in instructions objects. giotto objects will also respond to the instructions() generic. test <- giotto(instructions = instrs) # passing NULL instead will also generate a default instructions object # example plot g <- GiottoData::loadGiottoMini("visium") instructions(g) <- instrs instructions(g, "show_plot") # instructions say not to plot to viewer spatPlot2D(g, show_image = TRUE, image_name = "image") # instead it will directly write to the results folder As an example, you can also set individual instructions instructions(g, "show_plot") <- TRUE spatPlot2D(g, show_image = TRUE, image_name = "image") Figure 4.2: example image output "],["data-formatting-and-pre-processing.html", "5 Data formatting and Pre-processing 5.1 Data formats 5.2 Pre-processing 5.3 Subobject utility functions", " 5 Data formatting and Pre-processing Jiaji George Chen August 5th 2024 5.1 Data formats There are many kinds of outputs and data formats that are currently being used in the spatial omics field for storage and dissemination of information. The following are some that we commonly work with. For Giotto, much of the data wrangling task is to get the information read in from these formats into R native formats and wrapped as Giotto subobjects. 
The subobjects then enforce formatting and allow the data types to behave as building blocks of the giotto object 5.1.1 General formats .csv/.tsv are standard delimited filetypes, where the values are separated by commas (.csv), tabs (.tsv). These can be read in with a wide array of functions and packages: utils::read.delim(), readr::read_delim(), data.table::fread() etc. They are easy to use, but large files are hard to scan through. 5.1.2 Matrix formats 10X regularly provides their cell feature counts matrices in both the .mtx (matrix market or MM) and .h5 formats. The MM formats come in a zipped folder. Within, the structure is usually ├── barcodes.tsv.gz ├── features.tsv.gz └── matrix.mtx.gz MM format by itself does not carry dimnames so they are stored in .tsv files for the barcodes (cells/observations) and features. barcodes.tsv.gz from a Xenium dataset V1 <char> 1: aaaadpbp-1 2: aaaaficg-1 3: aaabbaka-1 4: aaabbjoo-1 5: aaablchg-1 --- 162250: ojaaphhh-1 162251: ojabeldf-1 162252: ojacfbid-1 162253: ojacfhhg-1 162254: ojacpeii-1 features.tsv.gz from a Xenium dataset V1 V2 V3 <char> <char> <char> 1: ENSG00000121270 ABCC11 Gene Expression 2: ENSG00000130234 ACE2 Gene Expression 3: ENSG00000213088 ACKR1 Gene Expression 4: ENSG00000107796 ACTA2 Gene Expression 5: ENSG00000163017 ACTG2 Gene Expression --- 537: UnassignedCodeword_0495 UnassignedCodeword_0495 Unassigned Codeword 538: UnassignedCodeword_0496 UnassignedCodeword_0496 Unassigned Codeword 539: UnassignedCodeword_0497 UnassignedCodeword_0497 Unassigned Codeword 540: UnassignedCodeword_0498 UnassignedCodeword_0498 Unassigned Codeword 541: UnassignedCodeword_0499 UnassignedCodeword_0499 Unassigned Codeword The matrix.mtx file then contains the actual sparse matrix values in triplet format. The .h5 format is very similar, except that it is a hierarchical format that contains all three of these items in the same file. 
Giotto provides get10Xmatrix() and get10Xmatrix_h5() as convenient functions to open these exports and read them in as one or more Matrix sparse representations. 5.1.3 Tabular formats .parquet is a great format for storing large amounts of table information and providing fast access to only portions of the data at a time. 10X is using this format for things such as the table of all transcript detections in Xenium or the polygons. They can be opened and worked with using arrow and dplyr verbs. Currently, giotto extracts information from these files and then converts them to in-memory data.tables or terra SpatVectors depending on what data they contain. 5.1.4 Spatial formats .shp and .geojson are common formats for polygon and point data. They are commonly used as exports from segmentation software such as QuPath. GiottoClass::createGiottoPolygon() and the more specific createGiottoPolygonsFromGeoJSON() can be used for reading these in. library(GiottoClass) shp <- system.file("extdata/toy_poly.shp", package = "GiottoClass") gpoly <- createGiottoPolygon(shp, name = "test") plot(gpoly) Figure 5.1: Plot of giottoPolygon from .shp 5.1.5 Mask files .tif files can be used as mask files where the integer values of the image encode where an annotation is. createGiottoPolygonsFromMask() guesses whether the image is a single value or multi value mask. NanoString CosMx is one example of a platform that distributes the polygon information through a series of mask files. m <- system.file("extdata/toy_mask_multi.tif", package = "GiottoClass") plot(terra::rast(m), col = grDevices::hcl.colors(7)) Figure 5.2: Example mask image. 
Integer values are shown as different colors gp <- createGiottoPolygon( m, flip_vertical = FALSE, flip_horizontal = FALSE, shift_horizontal_step = FALSE, shift_vertical_step = FALSE, ID_fmt = "id_test_%03d", name = "test" ) force(gp) An object of class giottoPolygon spat_unit : "test" Spatial Information: class : SpatVector geometry : polygons dimensions : 7, 1 (geometries, attributes) extent : 3, 27, 1.04, 11.96 (xmin, xmax, ymin, ymax) coord. ref. : centroids : NULL overlaps : NULL plot(gp, col = grDevices::hcl.colors(7)) Figure 5.3: giottoPolygon from mask image. Identical coloring order implies that encoded IDs have been properly imported. For situations where all pixel values are the same, but not touching indicates different annotations: m2 <- system.file("extdata/toy_mask_single.tif", package = "GiottoClass") plot(terra::rast(m2), col = grDevices::hcl.colors(7)) Figure 5.4: Example mask image with only 1 value gpoly1 <- createGiottoPolygonsFromMask( m2, flip_vertical = FALSE, flip_horizontal = FALSE, shift_horizontal_step = FALSE, shift_vertical_step = FALSE, ID_fmt = "id_test_%03d", name = "multi_test" ) plot(gpoly1, col = grDevices::hcl.colors(7)) Figure 5.5: giottoPolygon from single value mask 5.1.6 images Most images are openable using createGiottoLargeImage() which wraps terra::rast(). This allows compatibility with most common image types. Recent and non-geospatially related image formats are not well supported however. One example is ome.tif which 10X uses for large image exports from Xenium. For these, we use ometif_to_tif() to convert them into normal .tif files using the python tifffile package. ometif_metadata() can be used to extract and access the associated ome xml image metadata. 5.1.7 jsonlike formats jsonlike formats are ones that can be read in with jsonlite::read_json() and then coerced into list-like or tabular structures. 10X uses these .json to report the scalefactors information in Visium datasets. 
The .xenium file format is also openable as a json-like. 5.1.8 Hierarchical formats There are many types of data in spatial-omics analysis. Hierarchical formats afford both a way to organize complex multi-type data and also to store and distribute them. In R, these can be opened with either hdf5r on CRAN or rhdf5 on BioConductor. The complex nature of these formats and also the fact they are just a storage format and not an organizational specification means that what data and how it is stored and represented can often be very different. .gef and .bgef which StereoSeq exports are .hdf5-like formats. .h5ad is a specific flavor of these file formats where they follow the AnnData framework so that there is more common structure in how datasets are stored. Giotto provides anndataToGiotto() and giottoToAnnData() interoperability functions for interconverting. .zarr is another hierarchical storage structure, however currently the R-native support is still being developed. 5.2 Pre-processing The most common types of raw data needed for a Giotto object are expression matrices, centroids information, spatial feature points, polygons. Evaluation of input data and conversion to compatible formats happens inside the create* functions that Giotto exports. There is one of these for each of the subobject classes. 5.2.1 Expression matrix Not much processing is needed for matrices. All that is needed is a data type that is coercible to matrix (or Matrix classes). Dimnames should be added. Columns should be cells or observations. Rows should be features or variables. m <- matrix(sample(c(rep(1, 10), rep(0, 90))), nrow = 10) rownames(m) <- sprintf("feat_%02d", seq(10)) colnames(m) <- sprintf("cell_%02d", seq(10)) x <- createExprObj(m) An object of class exprObj : "test" spat_unit : "cell" feat_type : "rna" contains: 10 x 10 sparse Matrix of class "dgCMatrix" feat_01 . 1 1 1 . . . . . . feat_02 . . . . . . . . . . feat_03 . . . . . 1 . . . . feat_04 1 . . . 1 . . . . . 
........suppressing 2 rows in show(); maybe adjust options(max.print=, width=) feat_07 . 1 . . . 1 . . . 1 feat_08 . . . . . . . . . . feat_09 . . . . . . . . . . feat_10 . . . . . . . . . . First four colnames: cell_01 cell_02 cell_03 cell_04 5.2.2 Spatial locations For Giotto centroid locations, a tabular data.frame-like format is required. The first non-numeric column found will be set as the cell_ID. The numeric columns will then be kept as coordinates information. set.seed(1234) xy <- data.frame( a = as.character(seq(100)), b = rnorm(100), c = rnorm(100) ) sl_xy <- createSpatLocsObj(xy) plot(sl_xy) Figure 5.6: Plot of spatLocsObj created from xy information set.seed(1234) xyz <- data.frame( a = as.character(seq(100)), b = rnorm(100), c = rnorm(100), d = rnorm(100) ) sl_xyz <- createSpatLocsObj(xyz) plot(sl_xyz) Figure 5.7: Plot of spatLocsObj created from xy and z information 5.2.3 giottoPoints giottoPoints are very similar. These subobjects wrap a terra SpatVector object and if tabular data is provided, what is needed are x, y, and feature ID. Additional columns are kept as metadata information. set.seed(1234) tx <- data.frame( id = sprintf("gene_%05d", seq(1e4)), x = rnorm(1e4), y = rnorm(1e4), meta = sprintf("metadata_%05d", seq(1e4)) ) gpoints <- createGiottoPoints(tx) plot(gpoints, raster = FALSE) plot(gpoints, dens = TRUE) An object of class giottoPoints feat_type : "rna" Feature Information: class : SpatVector geometry : points dimensions : 10000, 3 (geometries, attributes) extent : -3.396064, 3.618107, -4.126628, 3.727291 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID meta feat_ID_uniq type : <chr> <chr> <int> values : gene_00001 metadata_00001 1 gene_00002 metadata_00002 2 gene_00003 metadata_00003 3 Figure 5.8: giottoPoints plotted without rasterization (left), with rasterization and colored by density (right) 5.2.4 giottoPolygon Polygon information is often provided as a known spatial format or as image masks, which can be read in as shown earlier. However, they can also be provided as numerical values. This is the case for Vizgen MERSCOPE and 10X Xenium outputs, both of which now use .parquet to provide cell barcodes and xy vertices associated with them. set.seed(1234) hex <- hexVertices(radius = 1) spatlocs <- data.table::data.table( sdimx = rnorm(10, mean = 5, sd = 20), sdimy = rnorm(10, mean = 5, sd = 20), cell_ID = paste0("spot_", seq_len(10)) ) random_hex <- polyStamp(hex, spatlocs) random_hex_poly <- createGiottoPolygon(random_hex) plot(random_hex_poly) Figure 5.9: giottoPolygon created from ID and vertices 5.3 Subobject utility functions The giotto object is hierarchically organized first by slots that define their subobject/information type, then usually by which spatial unit and feature type information they contain. Lastly, they have specific object names. This makes the object very manually explorable. Most of the subobjects are tagged with metadata information that allows them to find their place within this nesting, and there are also common functions that giotto subobjects respond to. 5.3.1 IDs spatIDs() and featIDs() are used to find the spatial or feature IDs of an object. 
spatIDs(sl_xy) [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" [13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" [25] "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" [37] "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" [49] "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60" [61] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" [73] "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84" [85] "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" [97] "97" "98" "99" "100" spatIDs(gpoly) "a" "b" "c" "d" "e" "f" "g" head(featIDs(gpoints)) "gene_00001" "gene_00002" "gene_00003" "gene_00004" "gene_00005" "gene_00006" 5.3.2 Bracket subsetting and extraction Most of the subobjects also respond to indexing with [, but since many of them are wrappers around an underlying data structure, empty [ calls will drop the object to the contained data structure gpoly[1:2] An object of class giottoPolygon spat_unit : "test" Spatial Information: class : SpatVector geometry : polygons dimensions : 2, 2 (geometries, attributes) extent : 3.015771, 12, 1.003947, 6.996053 (xmin, xmax, ymin, ymax) coord. ref. : names : poly_ID idx type : <chr> <int> values : a 10 b 9 centroids : NULL overlaps : NULL gpoly[c("a", "e")] An object of class giottoPolygon spat_unit : "test" Spatial Information: class : SpatVector geometry : polygons dimensions : 2, 2 (geometries, attributes) extent : 3.015771, 27, 1.003947, 6.996053 (xmin, xmax, ymin, ymax) coord. ref. : names : poly_ID idx type : <chr> <int> values : a 10 e 6 gpoints[] class : SpatVector geometry : points dimensions : 10000, 3 (geometries, attributes) extent : -3.396064, 3.618107, -4.126628, 3.727291 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID meta feat_ID_uniq type : <chr> <chr> <int> values : gene_00001 metadata_00001 1 gene_00002 metadata_00002 2 gene_00003 metadata_00003 3 5.3.3 Nesting metadata generics spatUnit(), featType(), objName(), prov() are all generics that act on the metadata of the subobjects. They work both to access and replace the information. featType(x) [1] "rna" objName(x) <- "raw2" spatUnit(x) <- "aggregate" force(x) An object of class exprObj : "raw2" spat_unit : "aggregate" feat_type : "rna" contains: 10 x 10 sparse Matrix of class "dgCMatrix" feat_01 . 1 1 1 . . . . . . feat_02 . . . . . . . . . . feat_03 . . . . . 1 . . . . feat_04 1 . . . 1 . . . . . ........suppressing 2 rows in show(); maybe adjust options(max.print=, width=) feat_07 . 1 . . . 1 . . . 1 feat_08 . . . . . . . . . . feat_09 . . . . . . . . . . feat_10 . . . . . . . . . . First four colnames: cell_01 cell_02 cell_03 cell_04 5.3.4 Appending to a giotto object Subobjects are formatted for Giotto and can be added directly to the giotto object using the setGiotto() generic. 
# initialize an empty object g <- giotto() g <- setGiotto(g, x) force(g) An object of class giotto >Active spat_unit: aggregate >Active feat_type: rna [SUBCELLULAR INFO] [AGGREGATE INFO] expression ----------------------- [aggregate][rna] raw2 Use objHistory() to see steps and params used "],["creating-a-giotto-object.html", "6 Creating a Giotto object 6.1 Overview 6.2 GiottoData modular package 6.3 From matrix + locations 6.4 From subcellular raw data (transcripts or images) + polygons 6.5 From piece-wise 6.6 Using convenience functions for popular technologies (Vizgen, Xenium, CosMx, …) 6.7 Plotting 6.8 Subsetting", " 6 Creating a Giotto object Jiaji George Chen August 5th 2024 6.1 Overview The minimal amount of raw data needed to put together a fully functional giotto object are either of the following: spatial coordinates (centroids) and expression matrix information spatial feature information (points or image intensity values) and spatial annotations to aggregate that feature information with (polygons/mask). You can either use the create* style functions introduced in the previous session and build up the object piecewise or you can use the giotto object constructor functions createGiottoObject() and createGiottoObjectSubcellular() 6.2 GiottoData modular package We can showcase the construction of objects by pulling some raw data from the GiottoData package. A dataset was loaded from here earlier in the previous section, but to formally introduce it, this package contains mini datasets and also download links to other publicly available datasets. It helps with prototyping and development and also making reproducible examples. The mini examples from popular platform datasets can also help give an understanding of what their data is like and how Giotto represents them. 6.3 From matrix + locations For this, we will load some visium expression information and spatial locations. 
library(Giotto) # function to get a filepath from GiottoData mini_vis_raw <- function(x) { system.file( package = "GiottoData", file.path("Mini_datasets", "Visium", "Raw", x) ) } mini_vis_expr <- mini_vis_raw("visium_DG_expr.txt.gz") |> data.table::fread() |> GiottoUtils::dt_to_matrix() mini_vis_expr[seq(5), seq(5)] 5 x 5 sparse Matrix of class "dgCMatrix" AAAGGGATGTAGCAAG-1 AAATGGCATGTCTTGT-1 AAATGGTCAATGTGCC-1 AAATTAACGGGTAGCT-1 AACAACTGGTAGTTGC-1 Gna12 1 2 1 1 9 Ccnd2 . 1 1 . . Btbd17 . 1 1 1 . Sox9 . . . . . Sez6 . 1 4 3 . mini_vis_slocs <- mini_vis_raw("visium_DG_locs.txt") |> data.table::fread() head(mini_vis_slocs) V1 V2 <int> <int> 1: 5477 -4125 2: 5959 -2808 3: 4720 -5202 4: 5202 -5322 5: 4101 -4604 6: 5821 -3047 With these two pieces of data, we can make a fully working giotto object. The spatial locations are missing cell_ID names, but they will be detected from the expression information. mini_vis <- createGiottoObject( expression = mini_vis_expr, spatial_locs = mini_vis_slocs ) instructions(mini_vis, "return_plot") <- FALSE # set return_plot = FALSE otherwise we will get duplicate outputs in code chunks For a simple example plot: spatFeatPlot2D(mini_vis, feats = c("Gna12", "Gfap"), expression_values = "raw", point_size = 2.5, gradient_style = "sequential", background_color = "black" ) Figure 6.1: Example spatial feature plot to show functioning object 6.4 From subcellular raw data (transcripts or images) + polygons You can also make giotto objects starting from raw spatial feature information and annotations that give them spatial context. 
# function to get a filepath from GiottoData mini_viz_raw <- function(x) { system.file( package = "GiottoData", file.path("Mini_datasets", "Vizgen", "Raw", x) ) } mini_viz_dt <- mini_viz_raw(file.path("cell_boundaries", "z0_polygons.gz")) |> data.table::fread() mini_viz_poly <- createGiottoPolygon(mini_viz_dt) force(mini_viz_poly) An object of class giottoPolygon spat_unit : "cell" Spatial Information: class : SpatVector geometry : polygons dimensions : 498, 1 (geometries, attributes) extent : 6399.244, 6903.243, -5152.39, -4694.868 (xmin, xmax, ymin, ymax) coord. ref. : names : poly_ID type : <chr> values : 40951783403982682273285375368232495429 240649020551054330404932383065726870513 274176126496863898679934791272921588227 centroids : NULL overlaps : NULL plot(mini_viz_poly) Figure 6.2: Example MERSCOPE polygons loaded from vertex info mini_viz_tx <- mini_viz_raw("vizgen_transcripts.gz") |> data.table::fread() mini_viz_tx[, global_y := -global_y] # flip values to match polys viz_gpoints <- createGiottoPoints(mini_viz_tx) force(viz_gpoints) An object of class giottoPoints feat_type : "rna" Feature Information: class : SpatVector geometry : points dimensions : 80343, 3 (geometries, attributes) extent : 6400.037, 6900.032, 4699.979, 5149.983 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID global_z feat_ID_uniq type : <chr> <int> <int> values : Mlc1 0 1 Gprc5b 0 2 Gfap 0 3 plot(viz_gpoints) Figure 6.3: Example mini MERSCOPE transcripts data mini_viz <- createGiottoObjectSubcellular( gpolygons = mini_viz_poly, gpoints = viz_gpoints ) instructions(mini_viz, "return_plot") <- FALSE force(mini_viz) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell features : rna [AGGREGATE INFO] Use objHistory() to see steps and params used # calculate centroids mini_viz <- addSpatialCentroidLocations(mini_viz) # create aggregated information mini_viz <- calculateOverlap(mini_viz) mini_viz <- overlapToMatrix(mini_viz) spatFeatPlot2D( mini_viz, feats = c("Grm4", "Gfap"), expression_values = "raw", point_size = 2.5, gradient_style = "sequential", background_color = "black" ) Figure 6.4: Example mini MERSCOPE aggregated feature counts 6.5 From piece-wise You can also piece-wise assemble an object independently of one of the 2 previously shown convenience functions. g <- giotto() # initialize empty gobject g <- setGiotto(g, mini_viz_poly) g <- setGiotto(g, viz_gpoints) force(g) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell features : rna [AGGREGATE INFO] Use objHistory() to see steps and params used This is essentially the same object as the one created through createGiottoObjectSubcellular() earlier. 6.6 Using convenience functions for popular technologies (Vizgen, Xenium, CosMx, …) There are also several convenience functions we provide for loading in data from popular platforms. These functions take care of reading the expected output folder structures, auto-detecting where needed data items are, formatting items for ingestion, then object creation. Many of these will be touched on later during other sessions. 
createGiottoVisiumObject() createGiottoVisiumHDObject() createGiottoXeniumObject() createGiottoCosMxObject() createGiottoMerscopeObject() 6.7 Plotting 6.7.1 Subobject plotting Giotto has several spatial plotting functions. At the lowest level, you directly call plot() on several subobjects in order to see what they look like, particularly the ones containing spatial info. Here we load several mini subobjects which are taken from the vizgen MERSCOPE mini dataset. To see which mini objects are available for independent loading with GiottoData::loadSubObjectMini(), you can run GiottoData::listSubobjectMini() gpoints <- GiottoData::loadSubObjectMini("giottoPoints") plot(gpoints) plot(gpoints, dens = TRUE, col = getColors("magma", 255)) plot(gpoints, raster = FALSE) plot(gpoints, feats = c("Grm4", "Gfap")) Figure 6.5: giottoPoints plots. Rasterized (top left), Rasterized and colored with ‘magma’ color scale by density (top right), Non-rasterized (bottom left), Plotting specifically 2 features (bottom right) gpoly <- GiottoData::loadSubObjectMini("giottoPolygon") plot(gpoly) plot(gpoly, type = "centroid") plot(gpoly, max_poly = 10) Figure 6.6: giottoPolygon plots. default (left), plotting centroids (middle), auto changing to centroids after there are more polygons to plot than max_poly param (right) spatlocs <- GiottoData::loadSubObjectMini("spatLocsObj") plot(spatlocs) Figure 6.7: Plot of spatLocsObj spatnet <- GiottoData::loadSubObjectMini("spatialNetworkObj") plot(spatnet) Figure 6.8: Plot of spatialNetworkObj pca <- GiottoData::loadSubObjectMini("dimObj") plot(pca, dims = c(3,10)) Figure 6.9: Plot of PCA dimObj showing the 3rd and 10th PCs 6.7.2 Additive subobject plotting These base plotting functions inherit from terra::plot(). They can be used additively with more than one object. 
gimg <- GiottoData::loadSubObjectMini("giottoLargeImage") plot(gimg, col = getMonochromeColors("#5FAFFF")) plot(gpoly, border = "maroon", lwd = 0.5, add = TRUE) Figure 6.10: Plot image with monochrome color scaling with added polygon borders 6.7.3 Giotto object plotting Giotto also has several ggplot2-based plotting functions that work on the whole giotto object. Here we load the vizgen mini dataset from GiottoData which contains a lot of worked through data. 6.7.3.1 Giotto spatial plot functions spatPlot() - standard centroid-based plotting geared towards metadata plotting g <- GiottoData::loadGiottoMini("vizgen") activeSpatUnit(g) <- "aggregate" # set default spat_unit to the one with lots of results force(g) An object of class giotto >Active spat_unit: aggregate >Active feat_type: rna [SUBCELLULAR INFO] polygons : z0 z1 aggregate features : rna [AGGREGATE INFO] expression ----------------------- [z0][rna] raw [z1][rna] raw [aggregate][rna] raw normalized scaled pearson spatial locations ---------------- [z0] raw [z1] raw [aggregate] raw spatial networks ----------------- [aggregate] Delaunay_network kNN_network spatial enrichments -------------- [aggregate][rna] cluster_metagene dim reduction -------------------- [aggregate][rna] pca umap tsne nearest neighbor networks -------- [aggregate][rna] sNN.pca attached images ------------------ images : 4 items... Use objHistory() to see steps and params used spatPlot2D(g) What metadata do we have in this mini object? 
pDataDT(g) cell_ID nr_feats perc_feats total_expr leiden_clus <char> <int> <num> <num> <num> 1: 240649020551054330404932383065726870513 5 1.483680 49.40986 2 2: 274176126496863898679934791272921588227 27 8.011869 191.50684 2 3: 323754550002953984063006506310071917306 23 6.824926 173.86955 4 4: 87260224659312905497866017323180367450 37 10.979228 246.04928 5 5: 17817477728742691260808256980746537959 18 5.341246 142.44520 4 --- 458: 6380671372744430258754116433861320161 54 16.023739 339.24383 2 459: 75286702783716447443887872812098770697 45 13.353116 286.81011 1 460: 9677424102111816817518421117250891895 30 8.902077 211.71790 2 461: 17685062374745280598492217386845129350 5 1.483680 48.99550 2 462: 32422253415776258079819139802733069941 12 3.560831 102.52805 2 louvain_clus <num> 1: 0 2: 3 3: 8 4: 6 5: 7 --- 458: 0 459: 23 460: 3 461: 14 462: 0 We have some expression count statistics and clustering annotations already present in the object spatPlot2D(g, cell_color = "leiden_clus") spatPlot2D(g, cell_color = "leiden_clus", show_image = TRUE, image_name = "dapi_z0") spatPlot2D(g, cell_color = "total_expr", color_as_factor = FALSE, gradient_style = "sequential") spatPlot2D(g, cell_color = "leiden_clus", group_by = "leiden_clus") Figure 6.11: Spatial plots spatCellPlot() - centroid-based plotting for spatial enrichment values We have a cluster_metagene enrichment already made in the object that is a numerical measure of how much each of the cells map to the leiden clusters we have above spatCellPlot2D(g, spat_enr_names = "cluster_metagene", cell_annotation_values = as.character(1:5)) Figure 6.12: Spatial cell plot of cluster_metagene spatial enrichments spatCellPlot2D(g, spat_enr_names = "cluster_metagene", cell_annotation_values = as.character(1:5), cell_color_gradient = "magma", background_color = "black") Figure 6.13: Spatial cell plot of cluster_metagene spatial enrichments spatFeatPlot() - centroid-based plotting for feature expression plotting spatFeatPlot2D(g, feats 
= c("Flt4", "Mertk"), point_size = 2, expression_values = "scaled") Figure 6.14: Spatial feature expression plot of normalized Flt4 (left) and Mertk expression (right) spatInSituPlotPoints() - subcellular plotting with support for transcript points and polygons spatInSituPlotPoints(g, feats = list(rna = c("Flt4", "Mertk", "Gfap")), # this should be a named list point_size = 0.5, polygon_fill = "total_expr", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", polygon_alpha = 0.5, plot_last = "points", show_image = TRUE ) # without overlaps spatInSituPlotPoints(g, feats = list(rna = c("Flt4", "Mertk", "Gfap")), # this should be a named list point_size = 0.5, use_overlap = FALSE, polygon_fill = "total_expr", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", polygon_alpha = 0.5, plot_last = "points", show_image = TRUE ) Figure 6.15: Points and polygons subcellular plot with 3 transcript species plotted, polygons colored as number of detected transcripts, and dapi image plotted. Left is with only the points overlapped by polygons, right is with all points 6.7.3.2 Giotto expression space plot functions dimPlot() - dimension reduction plotting Also has more specific functions for PCA plotPCA(), UMAP plotUMAP(), tSNE plotTSNE() results. dimPlot(g, dim_reduction_name = "umap", dim_reduction_to_use = "umap", cell_color = "leiden_clus") Figure 6.16: UMAP projection with leiden clustering colors 6.7.3.3 Giotto common plotting args gradient_style - Should the gradient be of ‘divergent’ or ‘sequential’ styles? color_as_factor - Is annotation value a numerical or factor/categorical based item to plot. cell_color_code - What color mapping to provide cell_color - What column of information to use when plotting (metadata, expression, etc.) point_shape - Either ‘border’ or ‘no_border’ to draw on the points. 
6.8 Subsetting 6.8.1 ID subsetting Subset the giotto object to 300 randomly sampled cell IDs cx <- pDataDT(g) nrow(cx) [1] 462 ex <- getExpression(g) dim(ex) [1] 337 462 instructions(g, "cell_color_c_pal") <- "viridis" instructions(g, "poly_color_c_pal") <- "viridis" set.seed(1234) gsubset <- subsetGiotto(g, cell_ids = sample(spatIDs(g), 300)) cx_sub <- pDataDT(gsubset) nrow(cx_sub) [1] 300 spatPlot(g, cell_color = "total_expr", color_as_factor = FALSE, background_color = "black") spatPlot(gsubset, cell_color = "total_expr", color_as_factor = FALSE, background_color = "black") Figure 6.17: plot showing starting object (left) and subset object (right) 6.8.2 Coordinate-based subsetting gsubsetlocs <- subsetGiottoLocs(g, x_min = 6500, x_max = 6700, poly_info = "aggregate" ) spatPlot(gsubsetlocs, cell_color = "total_expr", color_as_factor = FALSE, background_color = "black") spatInSituPlotPoints(gsubsetlocs, polygon_fill = "total_expr", polygon_fill_as_factor = FALSE) Figure 6.18: plot showing starting object (left) and subset object (right) "],["visium-part-i.html", "7 Visium Part I 7.1 The Visium technology 7.2 Introduction to the spatial dataset 7.3 Download dataset 7.4 Create the Giotto object 7.5 Subset on spots that were covered by tissue 7.6 Quality control 7.7 Filtering 7.8 Normalization 7.9 Feature selection 7.10 Dimension Reduction 7.11 Clustering 7.12 Save the object 7.13 Session info", " 7 Visium Part I Joselyn Cristina Chávez Fuentes August 5th 2024 7.1 The Visium technology Visium allows you to perform spatial transcriptomics, which combines histological information with whole transcriptome gene expression profiles (fresh frozen or FFPE) to provide you with spatially resolved gene expression. Figure 7.1: Visium workflow. 
Source: 10X Genomics You can use standard fixation and staining techniques, including hematoxylin and eosin (H&E) staining, to visualize tissue sections on slides using a brightfield microscope and immunofluorescence (IF) staining to visualize protein detection in tissue sections on slides using a fluorescent microscope. 7.2 Introduction to the spatial dataset The Visium fresh frozen mouse brain tissue (Strain C57BL/6) dataset was obtained from 10X Genomics. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols - Tissue Preparation Guide (Demonstrated Protocol CG000240). Tissue sections of 10 µm thickness from a slice of the coronal plane were placed on Visium Gene Expression Slides. You can find more information about this sample here 7.3 Download dataset You need to download the expression matrix and spatial information by running these commands: dir.create("data/01_session5") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz", destfile = "data/01_session5/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz", destfile = "data/01_session5/V1_Adult_Mouse_Brain_spatial.tar.gz") After downloading, extract the tar.gz archives. You should get the “raw_feature_bc_matrix” and “spatial” folders inside “data/01_session5/”. untar(tarfile = "data/01_session5/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz", exdir = "data/01_session5") untar(tarfile = "data/01_session5/V1_Adult_Mouse_Brain_spatial.tar.gz", exdir = "data/01_session5") 7.4 Create the Giotto object createGiottoVisiumObject() will look for the standardized file organization from the Visium technology in the data folder and will automatically load the expression and spatial information to create the Giotto object. 
library(Giotto) ## Set instructions results_folder <- "results/01_session5" python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE, python_path = python_path ) ## Provide the path to the visium folder data_path <- "data/01_session5" ## Create object directly from the visium folder visium_brain <- createGiottoVisiumObject( visium_dir = data_path, expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instructions ) 7.5 Subset on spots that were covered by tissue Use the metadata column “in_tissue” to highlight the spots corresponding to the tissue area. spatPlot2D( gobject = visium_brain, cell_color = "in_tissue", point_size = 2, cell_color_code = c("0" = "lightgrey", "1" = "blue"), show_image = TRUE) Figure 7.2: Spatial plot of the Visium mouse brain sample, color indicates whether the spot is in tissue (1) or not (0). Use the same metadata column “in_tissue” to subset the object and keep only the spots corresponding to the tissue area. metadata <- getCellMetadata(gobject = visium_brain, output = "data.table") in_tissue_barcodes <- metadata[in_tissue == 1]$cell_ID visium_brain <- subsetGiotto(gobject = visium_brain, cell_ids = in_tissue_barcodes) 7.6 Quality control Statistics Use the function addStatistics() to count the number of features per spot. The statistics information will be stored in the metadata table under the new column “nr_feats”. Then, use this column to visualize the number of features per spot across the sample. visium_brain_statistics <- addStatistics(gobject = visium_brain, expression_values = "raw") ## visualize spatPlot2D(gobject = visium_brain_statistics, cell_color = "nr_feats", color_as_factor = FALSE) Figure 7.3: Spatial distribution of features per spot. filterDistributions() creates a histogram to show the distribution of features per spot across the sample. 
filterDistributions(gobject = visium_brain_statistics, detection = "cells") Figure 7.4: Distribution of features per spot. When setting the detection = “feats”, the histogram shows the distribution of cells with certain numbers of features across the sample. filterDistributions(gobject = visium_brain_statistics, detection = "feats") Figure 7.5: Distribution of cells with different features per spot. filterCombinations() may be used to test how different filtering parameters will affect the number of cells and features in the filtered data: filterCombinations(gobject = visium_brain_statistics, expression_thresholds = c(1, 2, 3), feat_det_in_min_cells = c(50, 100, 200), min_det_feats_per_cell = c(500, 1000, 1500)) Figure 7.6: Number of spots and features filtered when using multiple feat_det_in_min_cells and min_det_feats_per_cell combinations. 7.7 Filtering Use the arguments feat_det_in_min_cells and min_det_feats_per_cell to set the minimal number of cells where an individual feature must be detected and the minimal number of features per spot/cell, respectively, to filter the giotto object. All the features and cells under those thresholds will be removed from the sample. visium_brain <- filterGiotto( gobject = visium_brain, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1000, expression_values = "raw", verbose = TRUE ) Feature type: rna Number of cells removed: 4 out of 2702 Number of feats removed: 7311 out of 22125 7.8 Normalization Use scalefactor to set the scale factor to use after library size normalization. The default value is 6000, but you can use a different one. visium_brain <- normalizeGiotto( gobject = visium_brain, scalefactor = 6000, verbose = TRUE ) Calculate the normalized number of features per spot and save the statistics in the metadata table. 
visium_brain <- addStatistics(gobject = visium_brain) ## visualize spatPlot2D(gobject = visium_brain, cell_color = "nr_feats", color_as_factor = FALSE) Figure 7.7: Spatial distribution of the number of features per spot. 7.9 Feature selection 7.9.1 Highly Variable Features: Calculating Highly Variable Features (HVF) is necessary to identify genes (or features) that display significant variability across the spots. There are a few methods to choose from depending on the underlying distribution of the data: loess regression is used when the relationship between mean expression and variance is non-linear or can be described by a non-parametric model. visium_brain <- calculateHVF(gobject = visium_brain, method = "cov_loess", save_plot = TRUE, default_save_name = "HVFplot_loess") Figure 7.8: Covariance of HVFs using the loess method. pearson residuals are used for variance stabilization (to account for technical noise) and highlighting overdispersed genes. visium_brain <- calculateHVF(gobject = visium_brain, method = "var_p_resid", save_plot = TRUE, default_save_name = "HVFplot_pearson") Figure 7.9: Variance of HVFs using the pearson residuals method. binned (covariance groups) are used when gene expression variability differs across expression levels or spatial regions, without assuming a specific relationship between mean expression and variance. This is the default method in the calculateHVF() function. visium_brain <- calculateHVF(gobject = visium_brain, method = "cov_groups", save_plot = TRUE, default_save_name = "HVFplot_binned") Figure 7.10: Covariance of HVFs using the binned method. 7.10 Dimension Reduction 7.10.1 PCA Principal Components Analysis (PCA) is applied to reduce the dimensionality of gene expression data by transforming it into principal components, which are linear combinations of genes ranked by the variance they explain, with the first components capturing the most variance. 
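To make the PCA description above concrete, here is a minimal base-R sketch on simulated data. It does not reproduce the internals of runPCA(); the matrix `expr`, the injected `signal`, and all object names are assumptions made for illustration. It shows that each principal component is a linear combination of genes and that components are ordered by the variance they explain, which is what a scree plot displays.

```r
# Minimal base-R sketch on simulated data (assumption: NOT Giotto's runPCA() internals)
set.seed(42)
n_spots <- 100
n_genes <- 20

# hypothetical normalized expression matrix: spots in rows, genes in columns
expr <- matrix(
  rnorm(n_spots * n_genes),
  nrow = n_spots, ncol = n_genes,
  dimnames = list(paste0("spot_", seq_len(n_spots)),
                  paste0("gene_", seq_len(n_genes)))
)

# inject a shared signal into the first five genes so that one component dominates
signal <- rnorm(n_spots, sd = 3)
expr[, 1:5] <- expr[, 1:5] + signal

# PCA on centered and scaled genes
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# percentage of variance explained per component (the quantity shown in a scree plot)
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# loadings: the weight of each gene in the linear combination that defines PC1
pc1_loadings <- pca$rotation[, "PC1"]

# coordinates of each spot on the first two components
pc_scores <- pca$x[, 1:2]

print(round(head(var_explained), 2))
```

The first genes carry the injected signal, so they should receive the largest absolute loadings on PC1; on real data, the same reasoning explains why restricting PCA to highly variable features concentrates the signal in the first components.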
runPCA() will look for the previous calculation of highly variable features, stored as a column in the feature metadata. If the HVF labels are not found in the giotto object, then runPCA() will use all the features available in the sample to calculate the Principal Components. visium_brain <- runPCA(gobject = visium_brain) You can also use specific features for the Principal Components calculation, by passing a vector of features in the “feats_to_use” argument. my_features <- head(getFeatureMetadata(visium_brain, output = "data.table")$feat_ID, 1000) visium_brain <- runPCA(gobject = visium_brain, feats_to_use = my_features, name = "custom_pca") Visualization Create a screeplot to visualize the percentage of variance explained by each component. screePlot(gobject = visium_brain, ncp = 30) Figure 7.11: Screeplot showing the variance explained per principal component. Visualize the PCA calculated using the HVFs. plotPCA(gobject = visium_brain) Figure 7.12: PCA plot using HVFs. Visualize the custom PCA calculated using the vector of features. plotPCA(gobject = visium_brain, dim_reduction_name = "custom_pca") Figure 7.13: PCA using custom features. Unlike PCA, Uniform Manifold Approximation and Projection (UMAP) and t-Stochastic Neighbor Embedding (t-SNE) do not assume linearity. After running PCA, UMAP or t-SNE allows you to visualize the dataset in 2D. 7.10.2 UMAP visium_brain <- runUMAP(visium_brain, dimensions_to_use = 1:10) Visualization plotUMAP(gobject = visium_brain) Figure 7.14: UMAP using the first 10 principal components. 7.10.3 t-SNE visium_brain <- runtSNE(gobject = visium_brain, dimensions_to_use = 1:10) Visualization plotTSNE(gobject = visium_brain) Figure 7.15: tSNE using the first 10 principal components. 
7.11 Clustering Create an sNN network (default) visium_brain <- createNearestNetwork(gobject = visium_brain, dimensions_to_use = 1:10, k = 15) Create a kNN network visium_brain <- createNearestNetwork(gobject = visium_brain, dimensions_to_use = 1:10, k = 15, type = "kNN") 7.11.1 Calculate Leiden clustering Use the previously calculated shared nearest neighbors to create clusters. The default resolution is 1, but you can decrease the value to avoid over-clustering. visium_brain <- doLeidenCluster(gobject = visium_brain, resolution = 0.4, n_iterations = 1000) Visualization plotPCA(gobject = visium_brain, cell_color = "leiden_clus") Figure 7.16: PCA plot, colors indicate the Leiden clusters. Use the cluster IDs to visualize the clusters in the UMAP space. plotUMAP(gobject = visium_brain, cell_color = "leiden_clus", show_NN_network = FALSE, point_size = 2.5) Figure 7.17: UMAP plot, colors indicate the Leiden clusters. Set the argument “show_NN_network = TRUE” to visualize the connections between spots. plotUMAP(gobject = visium_brain, cell_color = "leiden_clus", show_NN_network = TRUE, point_size = 2.5) Figure 7.18: UMAP showing the nearest network. Use the cluster IDs to visualize the clusters on the tSNE. plotTSNE(gobject = visium_brain, cell_color = "leiden_clus", point_size = 2.5) Figure 7.19: tSNE plot, colors indicate the Leiden clusters. Set the argument “show_NN_network = TRUE” to visualize the connections between spots. plotTSNE(gobject = visium_brain, cell_color = "leiden_clus", point_size = 2.5, show_NN_network = TRUE) Figure 7.20: tSNE showing the nearest network. Use the cluster IDs to visualize their spatial location. spatPlot2D(visium_brain, cell_color = "leiden_clus", point_size = 3) Figure 7.21: Spatial plot, colors indicate the Leiden clusters. 7.11.2 Calculate Louvain clustering Louvain is an alternative clustering method, used to detect communities in large networks. 
visium_brain <- doLouvainCluster(visium_brain) spatPlot2D(visium_brain, cell_color = "louvain_clus") Figure 7.22: Spatial plot, colors indicate the Louvain clusters. You can find more information about the differences between the Leiden and Louvain methods in this paper: From Louvain to Leiden: guaranteeing well-connected communities, 2019 7.12 Save the object saveGiotto(visium_brain, "results/01_session5/visium_brain_object") 7.13 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Giotto_4.1.0 GiottoClass_0.3.3 loaded via a namespace (and not attached): [1] colorRamp2_0.1.0 deldir_2.0-4 [3] rlang_1.1.4 magrittr_2.0.3 [5] GiottoUtils_0.1.10 matrixStats_1.3.0 [7] compiler_4.4.1 png_0.1-8 [9] systemfonts_1.1.0 vctrs_0.6.5 [11] reshape2_1.4.4 stringr_1.5.1 [13] pkgconfig_2.0.3 SpatialExperiment_1.14.0 [15] crayon_1.5.3 fastmap_1.2.0 [17] backports_1.5.0 magick_2.8.4 [19] XVector_0.44.0 labeling_0.4.3 [21] utf8_1.2.4 rmarkdown_2.27 [23] UCSC.utils_1.0.0 ragg_1.3.2 [25] purrr_1.0.2 xfun_0.46 [27] beachmat_2.20.0 zlibbioc_1.50.0 [29] GenomeInfoDb_1.40.1 jsonlite_1.8.8 [31] DelayedArray_0.30.1 BiocParallel_1.38.0 [33] terra_1.7-78 irlba_2.3.5.1 [35] parallel_4.4.1 R6_2.5.1 [37] stringi_1.8.4 RColorBrewer_1.1-3 [39] reticulate_1.38.0 parallelly_1.37.1 [41] GenomicRanges_1.56.1 scattermore_1.2 [43] Rcpp_1.0.13 bookdown_0.40 [45] SummarizedExperiment_1.34.0 knitr_1.48 [47] future.apply_1.11.2 R.utils_2.12.3 [49] FNN_1.1.4 
IRanges_2.38.1 [51] Matrix_1.7-0 igraph_2.0.3 [53] tidyselect_1.2.1 rstudioapi_0.16.0 [55] abind_1.4-5 yaml_2.3.9 [57] codetools_0.2-20 listenv_0.9.1 [59] lattice_0.22-6 tibble_3.2.1 [61] plyr_1.8.9 Biobase_2.64.0 [63] withr_3.0.0 Rtsne_0.17 [65] evaluate_0.24.0 future_1.33.2 [67] pillar_1.9.0 MatrixGenerics_1.16.0 [69] checkmate_2.3.1 stats4_4.4.1 [71] plotly_4.10.4 generics_0.1.3 [73] dbscan_1.2-0 sp_2.1-4 [75] S4Vectors_0.42.1 ggplot2_3.5.1 [77] munsell_0.5.1 scales_1.3.0 [79] globals_0.16.3 gtools_3.9.5 [81] glue_1.7.0 lazyeval_0.2.2 [83] tools_4.4.1 GiottoVisuals_0.2.4 [85] data.table_1.15.4 ScaledMatrix_1.12.0 [87] cowplot_1.1.3 grid_4.4.1 [89] tidyr_1.3.1 colorspace_2.1-0 [91] SingleCellExperiment_1.26.0 GenomeInfoDbData_1.2.12 [93] BiocSingular_1.20.0 rsvd_1.0.5 [95] cli_3.6.3 textshaping_0.4.0 [97] fansi_1.0.6 S4Arrays_1.4.1 [99] viridisLite_0.4.2 dplyr_1.1.4 [101] uwot_0.2.2 gtable_0.3.5 [103] R.methodsS3_1.8.2 digest_0.6.36 [105] BiocGenerics_0.50.0 SparseArray_1.4.8 [107] ggrepel_0.9.5 farver_2.1.2 [109] rjson_0.2.21 htmlwidgets_1.6.4 [111] htmltools_0.5.8.1 R.oo_1.26.0 [113] lifecycle_1.0.4 httr_1.4.7 "],["visium-part-ii.html", "8 Visium Part II 8.1 Load the object 8.2 Differential expression 8.3 Enrichment & Deconvolution 8.4 Spatial expression patterns 8.5 Spatially informed clusters 8.6 Spatial domains HMRF 8.7 Interactive tools 8.8 Save the object 8.9 Session info", " 8 Visium Part II Joselyn Cristina Chávez Fuentes August 6th 2024 8.1 Load the object library(Giotto) visium_brain <- loadGiotto("results/01_session5/visium_brain_object") 8.2 Differential expression 8.2.1 Gini markers The Gini method identifies genes that are very selectively expressed in a specific cluster, however not always expressed in all cells of that cluster. In other words, highly specific but not necessarily sensitive at the single-cell level. Calculate the top marker genes per cluster using the gini method. 
gini_markers <- findMarkers_one_vs_all(gobject = visium_brain, method = "gini", expression_values = "normalized", cluster_column = "leiden_clus", min_feats = 10) topgenes_gini <- gini_markers[, head(.SD, 2), by = "cluster"]$feats Visualize Plot the normalized expression distribution of the top expressed genes. violinPlot(visium_brain, feats = unique(topgenes_gini), cluster_column = "leiden_clus", strip_text = 6, strip_position = "right", save_param = list(base_width = 5, base_height = 30)) Figure 8.1: Violin plot showing the top gini genes normalized expression. Use the cluster IDs to create a heatmap with the normalized expression of the top expressed genes per cluster. plotMetaDataHeatmap(visium_brain, selected_feats = unique(topgenes_gini), metadata_cols = "leiden_clus", x_text_size = 10, y_text_size = 10) Figure 8.2: Heatmap showing the top gini genes normalized expression per Leiden cluster. Visualize the scaled expression spatial distribution of the top expressed genes across the sample. dimFeatPlot2D(visium_brain, expression_values = "scaled", feats = sort(unique(topgenes_gini)), cow_n_col = 5, point_size = 1, save_param = list(base_width = 15, base_height = 20)) Figure 8.3: Spatial distribution of the top gini genes scaled expression. 8.2.2 Scran markers The Scran method is preferred for robust differential expression analysis, especially when addressing technical variability or differences in sequencing depth across spatial locations. Calculate the top marker genes per cluster using the scran method scran_markers <- findMarkers_one_vs_all(gobject = visium_brain, method = "scran", expression_values = "normalized", cluster_column = "leiden_clus", min_feats = 10) topgenes_scran <- scran_markers[, head(.SD, 2), by = "cluster"]$feats Visualize Plot the normalized expression distribution of the top expressed genes. 
violinPlot(visium_brain, feats = unique(topgenes_scran), cluster_column = "leiden_clus", strip_text = 6, strip_position = "right", save_param = list(base_width = 5, base_height = 30)) Figure 8.4: Violin plot of the top scran genes normalized expression. Use the cluster IDs to create a heatmap with the normalized expression of the top expressed genes per cluster. plotMetaDataHeatmap(visium_brain, selected_feats = unique(topgenes_scran), metadata_cols = "leiden_clus", x_text_size = 10, y_text_size = 10) Figure 8.5: Heatmap showing the top scran genes normalized expression per Leiden cluster. Visualize the scaled expression spatial distribution of the top expressed genes across the sample. dimFeatPlot2D(visium_brain, expression_values = "scaled", feats = sort(unique(topgenes_scran)), cow_n_col = 5, point_size = 1, save_param = list(base_width = 20, base_height = 20)) Figure 8.6: Spatial distribution of the top scran genes scaled expression. In practice, it is often beneficial to apply both Gini and Scran methods and compare results for a more complete understanding of differential gene expression across clusters. 8.3 Enrichment & Deconvolution Visium spatial transcriptomics does not provide single-cell resolution, making cell type annotation a harder problem. Giotto provides several ways to calculate enrichment of specific cell-type signature gene lists. 
Download the single-cell dataset GiottoData::getSpatialDataset(dataset = "scRNA_mouse_brain", directory = "data/02_session1") Create the single-cell object and run the normalization step results_folder <- "results/02_session1" python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, python_path = python_path ) sc_expression <- "data/02_session1/brain_sc_expression_matrix.txt.gz" sc_metadata <- "data/02_session1/brain_sc_metadata.csv" giotto_SC <- createGiottoObject(expression = sc_expression, instructions = instructions) giotto_SC <- addCellMetadata(giotto_SC, new_metadata = data.table::fread(sc_metadata)) giotto_SC <- normalizeGiotto(giotto_SC) 8.3.1 PAGE/Rank Parametric Analysis of Gene Set Enrichment (PAGE) and Rank enrichment both aim to determine whether a predefined set of genes show statistically significant differences in expression compared to other genes in the dataset. Calculate the cell type markers markers_scran <- findMarkers_one_vs_all(gobject = giotto_SC, method = "scran", expression_values = "normalized", cluster_column = "Class", min_feats = 3) top_markers <- markers_scran[, head(.SD, 10), by = "cluster"] celltypes <- levels(factor(markers_scran$cluster)) Create the signature matrix sign_list <- list() for (i in 1:length(celltypes)){ sign_list[[i]] = top_markers[which(top_markers$cluster == celltypes[i]),]$feats } sign_matrix <- makeSignMatrixPAGE(sign_names = celltypes, sign_list = sign_list) Run the enrichment test with PAGE visium_brain <- runPAGEEnrich(gobject = visium_brain, sign_matrix = sign_matrix) Visualize Create a heatmap showing the enrichment of cell types (from the single-cell data annotation) in the spatial dataset clusters. 
cell_types_PAGE <- colnames(sign_matrix) plotMetaDataCellsHeatmap(gobject = visium_brain, metadata_cols = "leiden_clus", value_cols = cell_types_PAGE, spat_enr_names = "PAGE", x_text_size = 8, y_text_size = 8) Figure 8.7: Cell types enrichment per Leiden cluster, identified using the PAGE method. Plot the spatial distribution of the cell types. spatCellPlot2D(gobject = visium_brain, spat_enr_names = "PAGE", cell_annotation_values = cell_types_PAGE, cow_n_col = 3, coord_fix_ratio = 1, point_size = 1, show_legend = TRUE) Figure 8.8: Spatial distribution of cell types identified using the PAGE method. 8.3.2 SpatialDWLS Spatial Dampened Weighted Least Squares (DWLS) estimates the proportions of different cell types across spots in a tissue. Create the signature matrix sign_matrix <- makeSignMatrixDWLSfromMatrix( matrix = getExpression(giotto_SC, values = "normalized", output = "matrix"), cell_type = pDataDT(giotto_SC)$Class, sign_gene = top_markers$feats) Run the DWLS Deconvolution This step may take a couple of minutes to run. visium_brain <- runDWLSDeconv(gobject = visium_brain, sign_matrix = sign_matrix) Visualize Plot the DWLS deconvolution result as pie plots showing the proportion of each cell type per spot. spatDeconvPlot(visium_brain, show_image = FALSE, radius = 50, save_param = list(save_name = "8_spat_DWLS_pie_plot")) Figure 8.9: Spatial deconvolution plot showing the proportion of cell types per spot, identified using the DWLS method. 8.4 Spatial expression patterns 8.4.1 Spatial variable genes Create a spatial network visium_brain <- createSpatialNetwork(gobject = visium_brain, method = "kNN", k = 6, maximum_distance_knn = 400, name = "spatial_network") spatPlot2D(gobject = visium_brain, show_network= TRUE, network_color = "blue", spatial_network_name = "spatial_network") Figure 8.10: Spatial network across spots in the Visium mouse sample. 
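As a rough illustration of what a kNN spatial network with a distance cutoff looks like, the following base-R sketch builds one from simulated coordinates. It is not the implementation behind createSpatialNetwork(); the data frame `coords`, the column names, and the variables `k` and `max_dist` are assumptions chosen to mirror the `k = 6` and `maximum_distance_knn = 400` settings used above.

```r
# Minimal base-R sketch (assumption: simulated coordinates, not Giotto internals)
set.seed(1)
coords <- data.frame(
  cell_ID = paste0("spot_", 1:30),
  sdimx = runif(30, 0, 1000),
  sdimy = runif(30, 0, 1000)
)

k <- 6          # number of nearest neighbors to consider per spot
max_dist <- 400 # neighbors farther away than this are dropped

# pairwise Euclidean distances between all spots
d <- as.matrix(dist(coords[, c("sdimx", "sdimy")]))
dimnames(d) <- list(coords$cell_ID, coords$cell_ID)

# for each spot: take its k closest spots, then apply the distance cutoff
edges <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
  di <- d[i, -i]              # distances to every other spot (names are kept)
  nn <- sort(di)[seq_len(k)]  # the k nearest neighbors
  nn <- nn[nn <= max_dist]    # enforce the maximum distance
  if (length(nn) == 0) return(NULL)
  data.frame(from = coords$cell_ID[i], to = names(nn), distance = unname(nn))
}))

head(edges)
```

Each row of `edges` is one directed connection between two spots. The distance cutoff keeps spots at the tissue border from being linked to far-away neighbors just to reach `k` connections.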
Rank binarization Rank the genes on the spatial dataset according to whether they exhibit a spatial pattern or not. This step may take a few minutes to run. ranktest <- binSpect(visium_brain, bin_method = "rank", calc_hub = TRUE, hub_min_int = 5, spatial_network_name = "spatial_network") Visualize top results Plot the scaled expression of genes with the highest probability of being spatial genes. spatFeatPlot2D(visium_brain, expression_values = "scaled", feats = ranktest$feats[1:6], cow_n_col = 2, point_size = 1) Figure 8.11: Spatial distribution of the top spatial genes scaled expression. 8.4.2 Spatial co-expression modules Cluster the top 500 spatial genes into 20 clusters ext_spatial_genes <- ranktest[1:500,]$feats Use the detectSpatialCorFeats() function to calculate pairwise distances between genes. spat_cor_netw_DT <- detectSpatialCorFeats( visium_brain, method = "network", spatial_network_name = "spatial_network", subset_feats = ext_spatial_genes) Identify most similar spatially correlated genes for one gene top10_genes <- showSpatialCorFeats(spat_cor_netw_DT, feats = "Mbp", show_top_feats = 10) Visualize Plot the scaled expression of the 3 genes with most similar spatial patterns to Mbp. spatFeatPlot2D(visium_brain, expression_values = "scaled", feats = top10_genes$variable[1:4], point_size = 1.5) Figure 8.12: Spatial distribution of the scaled expression of 3 genes with similar spatial pattern to Mbp. Cluster spatial genes spat_cor_netw_DT <- clusterSpatialCorFeats(spat_cor_netw_DT, name = "spat_netw_clus", k = 20) Visualize clusters Plot the correlation of the top 500 spatial genes with their assigned cluster. heatmSpatialCorFeats(visium_brain, spatCorObject = spat_cor_netw_DT, use_clus_name = "spat_netw_clus", heatmap_legend_param = list(title = NULL)) Figure 8.13: Correlations heatmap between spatial genes and correlated clusters. 
Rank spatially correlated clusters and show genes for selected clusters netw_ranks <- rankSpatialCorGroups( visium_brain, spatCorObject = spat_cor_netw_DT, use_clus_name = "spat_netw_clus") Plot the correlation and number of spatial genes in each cluster. top_netw_spat_cluster <- showSpatialCorFeats(spat_cor_netw_DT, use_clus_name = "spat_netw_clus", selected_clusters = 6, show_top_feats = 1) Figure 8.14: Ranking of spatially correlated groups. Size indicates the number of spatial genes per group. Create the metagene enrichment score per co-expression cluster cluster_genes_DT <- showSpatialCorFeats(spat_cor_netw_DT, use_clus_name = "spat_netw_clus", show_top_feats = 1) cluster_genes <- cluster_genes_DT$clus names(cluster_genes) <- cluster_genes_DT$feat_ID visium_brain <- createMetafeats(visium_brain, feat_clusters = cluster_genes, name = "cluster_metagene") Plot the spatial distribution of the metagene enrichment scores of each spatial co-expression cluster. spatCellPlot(visium_brain, spat_enr_names = "cluster_metagene", cell_annotation_values = netw_ranks$clusters, point_size = 1, cow_n_col = 5) Figure 8.15: Spatial distribution of metagene enrichment scores per co-expression cluster. 
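The idea of a metagene score per co-expression cluster can be sketched in base R as the average expression of the genes assigned to each cluster. This is only an illustration on simulated counts, assuming a plain mean; createMetafeats() may apply a different statistic or additional rescaling. The matrix `expr` and the gene and spot names are hypothetical, while `cluster_genes` follows the same layout as in the code above (values are cluster IDs, names are feature IDs).

```r
# Minimal base-R sketch (assumption: a metagene is a plain per-cluster mean;
# Giotto's createMetafeats() may compute or rescale it differently)
set.seed(7)

# hypothetical expression matrix: genes in rows, spots in columns
expr <- matrix(
  rpois(8 * 5, lambda = 5),
  nrow = 8,
  dimnames = list(paste0("gene_", 1:8), paste0("spot_", 1:5))
)

# named vector: names are gene IDs, values are co-expression cluster IDs
cluster_genes <- c(gene_1 = 1, gene_2 = 1, gene_3 = 1,
                   gene_4 = 2, gene_5 = 2, gene_6 = 2,
                   gene_7 = 3, gene_8 = 3)

# group gene IDs by cluster, then average their expression within each spot
genes_by_cluster <- split(names(cluster_genes), cluster_genes)
metagene <- sapply(genes_by_cluster, function(g) {
  colMeans(expr[g, , drop = FALSE])
})

# result: one row per spot, one column per co-expression cluster
metagene
```

Plotting one column of `metagene` over the spot coordinates corresponds to one panel of a spatial metagene plot: spots where the genes of a cluster are jointly high receive a high score.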
8.5 Spatially informed clusters Get the top 30 genes per spatial co-expression cluster coexpr_dt <- data.table::data.table( genes = names(spat_cor_netw_DT$cor_clusters$spat_netw_clus), cluster = spat_cor_netw_DT$cor_clusters$spat_netw_clus) data.table::setorder(coexpr_dt, cluster) top30_coexpr_dt <- coexpr_dt[, head(.SD, 30) , by = cluster] spatial_genes <- top30_coexpr_dt$genes Re-calculate the clustering Use the spatial genes to recalculate the principal components, UMAP, network, and clustering visium_brain <- runPCA(gobject = visium_brain, feats_to_use = spatial_genes, name = "custom_pca") visium_brain <- runUMAP(visium_brain, dim_reduction_name = "custom_pca", dimensions_to_use = 1:20, name = "custom_umap") visium_brain <- createNearestNetwork(gobject = visium_brain, dim_reduction_name = "custom_pca", dimensions_to_use = 1:20, k = 5, name = "custom_NN") visium_brain <- doLeidenCluster(gobject = visium_brain, network_name = "custom_NN", resolution = 0.15, n_iterations = 1000, name = "custom_leiden") Visualize Plot the spatial distribution of the Leiden clusters calculated based on the spatial genes. spatPlot2D(visium_brain, cell_color = "custom_leiden", point_size = 3) Figure 8.16: Spatial distribution of Leiden clusters calculated using spatial genes. Plot the UMAP and color the spots using the Leiden clusters calculated based on the spatial genes. plotUMAP(gobject = visium_brain, cell_color = "custom_leiden") Figure 8.17: UMAP plot, colors indicate the Leiden clusters calculated using spatial genes. 8.6 Spatial domains HMRF Hidden Markov Random Field (HMRF) models capture spatial dependencies and segment tissue regions based on shared gene expression patterns. Run HMRF with different betas on the top 30 genes per spatial co-expression module. This step may take several minutes to run. 
HMRF_spatial_genes <- doHMRF(gobject = visium_brain, expression_values = "scaled", spatial_genes = spatial_genes, k = 20, spatial_network_name = "spatial_network", betas = c(0, 10, 5), output_folder = "11_HMRF/") Add the HMRF results to the giotto object visium_brain <- addHMRF(gobject = visium_brain, HMRFoutput = HMRF_spatial_genes, k = 20, betas_to_add = c(0, 10, 20, 30, 40), hmrf_name = "HMRF") Visualize Plot the spatial distribution of the HMRF domains. spatPlot2D(gobject = visium_brain, cell_color = "HMRF_k20_b.40") Figure 8.18: Spatial distribution of HMRF domains. 8.7 Interactive tools We have integrated a shiny app in Giotto to interactively select regions of a spatial plot. Create a spatial plot brain_spatPlot <- spatPlot2D(gobject = visium_brain, cell_color = "leiden_clus", show_image = FALSE, return_plot = TRUE, point_size = 1) brain_spatPlot Run the Shiny app plotInteractivePolygons(brain_spatPlot) Figure 8.19: Shiny app using the visium brain sample. Select the regions of interest and save the coordinates polygon_coordinates <- plotInteractivePolygons(brain_spatPlot) Figure 8.20: Polygons selected using the interactive Shiny app. Transform the data.table or data.frame with coordinates into a Giotto polygon object giotto_polygons <- createGiottoPolygonsFromDfr(polygon_coordinates, name = "selections", calc_centroids = TRUE) Add the polygons to the Giotto object visium_brain <- addGiottoPolygons(gobject = visium_brain, gpolygons = list(giotto_polygons)) Add the corresponding polygon IDs to the cell metadata visium_brain <- addPolygonCells(visium_brain, polygon_name = "selections") Extract the coordinates and IDs from cells located within one or multiple regions of interest. 
getCellsFromPolygon(visium_brain, polygon_name = "selections", polygons = "polygon 1") If no polygon name is provided, the function will retrieve cells located within all polygons getCellsFromPolygon(visium_brain, polygon_name = "selections") Compare the expression levels of some genes of interest between the selected regions comparePolygonExpression(visium_brain, selected_feats = c("Stmn1", "Psd", "Ly6h")) Figure 8.21: Heatmap showing the z-scores of three genes per selected polygon. Calculate the top genes expressed within each region, then provide the result to compare polygons scran_results <- findMarkers_one_vs_all( visium_brain, spat_unit = "cell", feat_type = "rna", method = "scran", expression_values = "normalized", cluster_column = "selections", min_feats = 2) top_genes <- scran_results[, head(.SD, 2), by = "cluster"]$feats comparePolygonExpression(visium_brain, selected_feats = top_genes) Figure 8.22: Heatmap showing the z-scores of top scran genes per selected polygon. Compare the abundance of cell types between the selected regions compareCellAbundance(visium_brain) Figure 8.23: Heatmap showing the cell abundance per selected polygon. Use other columns within the cell metadata table to compare the cell type abundances compareCellAbundance(visium_brain, cell_type_column = "custom_leiden") Figure 8.24: Heatmap showing the Leiden clusters abundance per selected polygon. Use the spatPlot arguments to isolate and plot each region. spatPlot2D(visium_brain, cell_color = "leiden_clus", group_by = "selections", cow_n_col = 3, point_size = 2, show_legend = FALSE) Figure 8.25: Spatial distribution of Leiden clusters across the selected polygons. Color each cell by cluster, cell type or expression level. spatFeatPlot2D(visium_brain, expression_values = "scaled", group_by = "selections", feats = "Psd", point_size = 2) Figure 8.26: Spatial distribution of Psd scaled expression across the selected polygons. 
Plot again the polygons plotPolygons(visium_brain, polygon_name = "selections", x = brain_spatPlot) Figure 8.27: Spatial location of selected polygons. 8.8 Save the object saveGiotto(visium_brain, "results/02_session1/visium_brain_object") 8.9 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] shiny_1.8.1.1 Giotto_4.1.0 GiottoClass_0.3.3 loaded via a namespace (and not attached): [1] later_1.3.2 tibble_3.2.1 [3] R.oo_1.26.0 polyclip_1.10-7 [5] lifecycle_1.0.4 edgeR_4.2.1 [7] doParallel_1.0.17 lattice_0.22-6 [9] MASS_7.3-61 backports_1.5.0 [11] magrittr_2.0.3 sass_0.4.9 [13] limma_3.60.4 plotly_4.10.4 [15] rmarkdown_2.27 jquerylib_0.1.4 [17] yaml_2.3.9 metapod_1.12.0 [19] httpuv_1.6.15 sp_2.1-4 [21] reticulate_1.38.0 cowplot_1.1.3 [23] RColorBrewer_1.1-3 abind_1.4-5 [25] zlibbioc_1.50.0 quadprog_1.5-8 [27] GenomicRanges_1.56.1 purrr_1.0.2 [29] R.utils_2.12.3 BiocGenerics_0.50.0 [31] tweenr_2.0.3 circlize_0.4.16 [33] GenomeInfoDbData_1.2.12 IRanges_2.38.1 [35] S4Vectors_0.42.1 ggrepel_0.9.5 [37] irlba_2.3.5.1 terra_1.7-78 [39] dqrng_0.4.1 DelayedMatrixStats_1.26.0 [41] colorRamp2_0.1.0 codetools_0.2-20 [43] DelayedArray_0.30.1 scuttle_1.14.0 [45] ggforce_0.4.2 tidyselect_1.2.1 [47] shape_1.4.6.1 UCSC.utils_1.0.0 [49] farver_2.1.2 ScaledMatrix_1.12.0 [51] matrixStats_1.3.0 stats4_4.4.1 [53] GiottoData_0.2.12.0 jsonlite_1.8.8 [55] GetoptLong_1.0.5 BiocNeighbors_1.22.0 [57] progressr_0.14.0 iterators_1.0.14 [59] 
systemfonts_1.1.0 foreach_1.5.2 [61] dbscan_1.2-0 tools_4.4.1 [63] ragg_1.3.2 Rcpp_1.0.13 [65] glue_1.7.0 SparseArray_1.4.8 [67] xfun_0.46 MatrixGenerics_1.16.0 [69] GenomeInfoDb_1.40.1 dplyr_1.1.4 [71] withr_3.0.0 fastmap_1.2.0 [73] bluster_1.14.0 fansi_1.0.6 [75] digest_0.6.36 rsvd_1.0.5 [77] R6_2.5.1 mime_0.12 [79] textshaping_0.4.0 colorspace_2.1-0 [81] scattermore_1.2 Cairo_1.6-2 [83] gtools_3.9.5 R.methodsS3_1.8.2 [85] utf8_1.2.4 tidyr_1.3.1 [87] generics_0.1.3 data.table_1.15.4 [89] FNN_1.1.4 httr_1.4.7 [91] htmlwidgets_1.6.4 S4Arrays_1.4.1 [93] scatterpie_0.2.3 uwot_0.2.2 [95] pkgconfig_2.0.3 gtable_0.3.5 [97] ComplexHeatmap_2.20.0 GiottoVisuals_0.2.4 [99] SingleCellExperiment_1.26.0 XVector_0.44.0 [101] htmltools_0.5.8.1 bookdown_0.40 [103] clue_0.3-65 scales_1.3.0 [105] Biobase_2.64.0 GiottoUtils_0.1.10 [107] png_0.1-8 SpatialExperiment_1.14.0 [109] scran_1.32.0 ggfun_0.1.5 [111] knitr_1.48 rstudioapi_0.16.0 [113] reshape2_1.4.4 rjson_0.2.21 [115] checkmate_2.3.1 cachem_1.1.0 [117] GlobalOptions_0.1.2 stringr_1.5.1 [119] parallel_4.4.1 miniUI_0.1.1.1 [121] RcppZiggurat_0.1.6 pillar_1.9.0 [123] grid_4.4.1 vctrs_0.6.5 [125] promises_1.3.0 BiocSingular_1.20.0 [127] beachmat_2.20.0 xtable_1.8-4 [129] cluster_2.1.6 evaluate_0.24.0 [131] magick_2.8.4 cli_3.6.3 [133] locfit_1.5-9.10 compiler_4.4.1 [135] rlang_1.1.4 crayon_1.5.3 [137] labeling_0.4.3 plyr_1.8.9 [139] stringi_1.8.4 viridisLite_0.4.2 [141] deldir_2.0-4 BiocParallel_1.38.0 [143] munsell_0.5.1 lazyeval_0.2.2 [145] Matrix_1.7-0 sparseMatrixStats_1.16.0 [147] ggplot2_3.5.1 statmod_1.5.0 [149] SummarizedExperiment_1.34.0 Rfast_2.1.0 [151] memoise_2.0.1 igraph_2.0.3 [153] bslib_0.7.0 RcppParallel_5.1.8 "],["visium-hd.html", "9 Visium HD 9.1 Objective 9.2 Background 9.3 Data Ingestion 9.4 Hexbin 400 Giotto object 9.5 Hexbin 100 9.6 Hexbin 25 9.7 Database backend - Work in progress, but coming soon!", " 9 Visium HD Ruben Dries & Edward C. 
Ruiz August 6th 2024 9.1 Objective This tutorial demonstrates how to process Visium HD data at the highest 2 micron bin resolution by using flexible tiling and aggregation steps that are available in Giotto Suite. Notably, a similar strategy can be used for other spatial sequencing methods that operate at the subcellular level, including: - Stereo-seq - Seq-Scope - Open-ST The resulting datasets from all these technologies can be very large since they provide both a high spatial resolution and genome-wide capture of all transcripts. We will also discuss how data projection strategies can be used to alleviate heavy computational tasks such as PCA, UMAP, or clustering. This tutorial expects a general knowledge of common spatial analysis technologies that are available in Giotto Suite, such as those that have been discussed in the standard Visium tutorials (part I and part II). 9.2 Background 9.2.1 Visium HD Technology Figure 9.1: Overview of Visium HD. Source: 10X Genomics Visium HD is a spatial transcriptomics technology recently developed by 10X Genomics. Details about this platform are discussed on the official 10X Genomics Visium HD website and the preprint by Oliveira et al. 2024 on bioRxiv. Visium HD has a 2 micron bin size resolution. The default SpaceRanger pipeline from 10X Genomics also returns aggregated data at the 8 and 16 micron bin size. 9.2.2 Colorectal Cancer Sample Figure 9.2: Colorectal Cancer Overview. Source: 10X Genomics For this tutorial we will be using the publicly available Colorectal Cancer Visium HD dataset. Details about this dataset and a link to download the raw data can be found at the 10X Genomics website. 9.3 Data Ingestion 9.3.1 Visium HD output data format Figure 9.3: File structure of Visium HD data processed with spaceranger pipeline. Visium HD data processed with the spaceranger pipeline is organized in this format containing various files associated with the sample. 
The files highlighted in yellow are what we will be using to read in these datasets. Warning: the VisiumHD folder structure has very recently been updated and might be slightly different. 9.3.2 Mini Visium HD dataset For this workshop we will use a spatial subset and downsampled version of the original datasets. A VisiumHD folder similar to the original can be downloaded using the Zenodo link. Using this dataset will ensure that we will not run into major memory issues. library(Giotto) # set up paths data_path <- "data/02_session2/" save_dir <- "results/02_session2/" dir.create(save_dir, recursive = TRUE) # download the mini dataset and untar options("timeout" = Inf) download.file( url = "https://zenodo.org/records/13226158/files/workshop_VisiumHD.zip?download=1", destfile = file.path(save_dir, "workshop_visiumHD.zip") ) untar(tarfile = file.path(save_dir, "workshop_visiumHD.zip"), exdir = data_path) 9.3.3 Giotto Visium HD convenience function The easiest way to read in Visium HD data in Giotto is through our convenience function. This function will automatically read in the data at your desired resolution, align the images, and finally create a Giotto Object. # importVisiumHD() 9.3.4 Read in data manually However, for this tutorial we will illustrate how to create your own Giotto object in a step-by-step manner, which can also be applied to other similar technologies as discussed in the Objective section. 
9.3.4.1 Raw expression data expression_path <- file.path(data_path, '/Human_Colorectal_Cancer_workshop/square_002um/raw_feature_bc_matrix') expr_results <- get10Xmatrix(path_to_data = expression_path, gene_column_index = 1) 9.3.4.2 Tissue positions data tissue_positions_path <- file.path(data_path, '/Human_Colorectal_Cancer_workshop/square_002um/spatial/tissue_positions.parquet') tissue_positions <- data.table::as.data.table(arrow::read_parquet(tissue_positions_path)) 9.3.4.3 Merge expression and 2 micron position data # convert expression matrix to minimal data.frame or data.table object matrix_tile_dt <- data.table::as.data.table(Matrix::summary(expr_results)) genes <- expr_results@Dimnames[[1]] samples <- expr_results@Dimnames[[2]] matrix_tile_dt[, gene := genes[i]] matrix_tile_dt[, pixel := samples[j]] Figure 9.4: Genes expressed for each 2 µm pixel in the array dimensions. # merge data.table matrix and spatial coordinates to create input for Giotto Polygons expr_pos_data <- data.table::merge.data.table(matrix_tile_dt, tissue_positions, by.x = 'pixel', by.y = 'barcode') expr_pos_data <- expr_pos_data[,.(pixel, pxl_row_in_fullres, pxl_col_in_fullres, gene, x)] colnames(expr_pos_data) = c('pixel', 'x', 'y', 'gene', 'count') Figure 9.5: Genes expressed with count for each 2 µm pixel in the spatial dimensions. 9.4 Hexbin 400 Giotto object 9.4.1 create giotto points The giottoPoints object represents the spatial expression information for each transcript: - gene id - count or UMI - spatial pixel location (x, y) giotto_points = createGiottoPoints(x = expr_pos_data[,.(x, y, gene, pixel, count)]) 9.4.2 create giotto polygons 9.4.2.1 Tiling and aggregation The Visium HD data is organized in a grid format. We can aggregate the data into larger bins to reduce the resolution of the data. Giotto Suite can work with any type of polygon information and already provides ready-to-use options for binning data with squares, triangles, and hexagons. 
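The idea of aggregating 2 micron pixels into larger bins can be sketched with base R on a toy table. This example uses square bins obtained by integer division of the coordinates, not the hexagon tessellation used in the tutorial; the `toy` table and `bin_size` value are hypothetical and only mimic the column layout of expr_pos_data.

```r
# Toy long-format table mimicking expr_pos_data: one row per pixel/gene pair
toy <- data.frame(
  x     = c(1, 3, 12, 14, 2, 18),
  y     = c(2, 4, 3, 1, 15, 17),
  gene  = c("g1", "g1", "g1", "g2", "g2", "g2"),
  count = c(1, 2, 1, 3, 1, 2)
)

bin_size <- 10  # hypothetical bin width, in the same units as x and y

# Assign each pixel to a square bin via integer division of its coordinates
toy$bin <- paste(floor(toy$x / bin_size), floor(toy$y / bin_size), sep = "_")

# Sum the counts per gene and bin, giving a gene-by-bin table
agg <- xtabs(count ~ gene + bin, data = toy)

print(agg)
```

A larger `bin_size` gives fewer, denser bins; a smaller one keeps more spatial detail at the cost of sparser counts per bin, which is the same trade-off explored with the hex400, hex100, and hex25 tiles.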
Here we will use a hexagon tessellation to aggregate the data into arbitrary bins. Figure 9.6: Hexagon properties # create giotto polygons, here we create hexagons hexbin400 <- tessellate(extent = ext(giotto_points), shape = 'hexagon', shape_size = 400, name = 'hex400') plot(hexbin400) Figure 9.7: Giotto polygon in a hexagon shape for overlapping Visium HD expression data. 9.4.3 combine Giotto points and polygons to create Giotto object instrs = createGiottoInstructions( save_dir = save_dir, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE ) # gpoints provides spatial gene expression information # gpolygons provides spatial unit information (here = hexagon tiles) visiumHD = createGiottoObjectSubcellular(gpoints = list('rna' = giotto_points), gpolygons = list('hex400' = hexbin400), instructions = instrs) # create spatial centroids for each spatial unit (hexagon) visiumHD = addSpatialCentroidLocations(gobject = visiumHD, poly_info = 'hex400') Visualize the Giotto object. Make sure to set expand_counts = TRUE to expand the counts column. Each spatial bin can have multiple transcripts/UMIs. This is different compared to in situ technologies like seqFISH, MERFISH, Nanostring CosMx or Xenium. Figure 9.8: Schematic showing effect of expand counts and jitter. Show the giotto points (transcripts) and polygons (hexagons) together using spatInSituPlotPoints: feature_data = fDataDT(visiumHD) spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = feature_data$feat_ID[10:20]), show_legend = T, spat_unit = 'hex400', point_size = 0.25, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex400', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.1, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25)) Figure 9.9: Overlap of gene expression with the hex400 polygons. Each dot represents a single gene. 
Jitter is used to better visualize individual transcripts. You can set plot_method = scattermore or scattermost to convert high-resolution images to low(er) resolution rasterized images. It’s usually faster and will save on disk space. spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = feature_data$feat_ID[10:20]), show_legend = T, spat_unit = 'hex400', point_size = 0.25, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex400', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.1, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25), plot_method = 'scattermore') Figure 9.10: Overlap of gene expression with the hex400 polygons. Genes/transcripts are rasterized. Jitter is used to better visualize individual transcripts. 9.4.4 Process Giotto object 9.4.4.1 calculate overlap between points and polygons At the moment the giotto points (transcripts) and polygons (hexagons) are two separate layers of information. Here we will determine which transcripts overlap with which hexagons so that we can aggregate the gene expression information and convert this into a gene expression matrix (genes-by-hexagons) that can be used in default spatial pipelines. # calculate overlap between points and polygons visiumHD = calculateOverlap(visiumHD, spatial_info = 'hex400', feat_info = 'rna') showGiottoSpatialInfo(visiumHD) 9.4.4.2 convert overlap results to a gene-by-hexagon matrix # convert overlap results to bin by gene matrix visiumHD = overlapToMatrix(visiumHD, poly_info = 'hex400', feat_info = 'rna', name = 'raw') # this action will automatically create an active spatial unit, i.e. hexbin 400 activeSpatUnit(visiumHD) 9.4.4.3 default processing steps This part is similar to that described in the Visium tutorials (Part I and Part II). 
# filter on gene expression matrix visiumHD <- filterGiotto(visiumHD, expression_threshold = 1, feat_det_in_min_cells = 5, min_det_feats_per_cell = 25) # normalize and scale gene expression data visiumHD <- normalizeGiotto(visiumHD, scalefactor = 1000, verbose = T) # add cell and gene statistics visiumHD <- addStatistics(visiumHD) 9.4.4.3.1 visualize number of features At the centroid level. # each dot here represents a 200x200 aggregation of spatial barcodes (bin size 200) spatPlot2D(gobject = visiumHD, cell_color = "nr_feats", color_as_factor = F, point_size = 2.5) Figure 9.11: Number of features detected in each of the centroids. Using the spatial polygon (hexagon) tiles spatInSituPlotPoints(visiumHD, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex400', point_size = 0.1, show_polygon = TRUE, use_overlap = TRUE, polygon_feat_type = 'hex400', polygon_fill = 'nr_feats', polygon_fill_as_factor = F, polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.1) Figure 9.12: Number of features detected in each of the hex400 polygons. 9.4.4.4 Dimension reduction + clustering 9.4.4.4.1 Highly variable features + PCA visiumHD <- calculateHVF(visiumHD, zscore_threshold = 1) visiumHD <- runPCA(visiumHD, expression_values = 'normalized', feats_to_use = 'hvf') screePlot(visiumHD, ncp = 30) plotPCA(visiumHD) 9.4.4.4.2 UMAP reduction for visualization visiumHD <- runUMAP(visiumHD, dimensions_to_use = 1:14, n_threads = 10) plotUMAP(gobject = visiumHD, point_size = 1) 9.4.4.4.3 Create network based on expression similarity + graph partition cluster # sNN network (default) visiumHD <- createNearestNetwork(visiumHD, dimensions_to_use = 1:14, k = 5) ## leiden clustering #### visiumHD <- doLeidenClusterIgraph(visiumHD, resolution = 0.5, n_iterations = 1000, spat_unit = 'hex400') plotUMAP(gobject = visiumHD, cell_color = 'leiden_clus', point_size = 1.5, show_NN_network = F, edge_alpha = 0.05) Figure 9.13: Leiden clustering for the hex400 bins. 
spatInSituPlotPoints(visiumHD, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex400', point_size = 0.25, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex400', polygon_fill_as_factor = TRUE, polygon_fill = 'leiden_clus', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.14: Spat plot for hex400 bin colored by leiden clusters. 9.5 Hexbin 100 Observation: Hexbin 400 results in very coarse information about the tissue. The goal is to create a higher resolution bin (hex100), then add this to the Giotto object to compare the difference in resolution. 9.5.1 Standard subcellular pipeline Create new spatial unit layer, e.g. with tessellate function Add spatial units to Giotto object Calculate centroids (optional) Compute overlap between transcript and polygon (hexagon) locations. Convert overlap data into a gene-by-polygon matrix hexbin100 <- tessellate(extent = ext(visiumHD), shape = 'hexagon', shape_size = 100, name = 'hex100') visiumHD = setPolygonInfo(gobject = visiumHD, x = hexbin100, name = 'hex100', initialize = T) visiumHD = addSpatialCentroidLocations(gobject = visiumHD, poly_info = 'hex100') Set active spatial unit. This can also be set manually in each function. activeSpatUnit(visiumHD) <- 'hex100' Let’s visualize the higher resolution hexagons. spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = feature_data$feat_ID[1:20]), show_legend = T, spat_unit = 'hex100', point_size = 0.1, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.2, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25)) Figure 9.15: Polygon overlay of hex100 bins over 2 µm pixel. Jitter applied to visualize individual features. 
visiumHD = calculateOverlap(visiumHD, spatial_info = 'hex100', feat_info = 'rna') visiumHD = overlapToMatrix(visiumHD, poly_info = 'hex100', feat_info = 'rna', name = 'raw') visiumHD <- filterGiotto(visiumHD, expression_threshold = 1, feat_det_in_min_cells = 10, min_det_feats_per_cell = 10) visiumHD <- normalizeGiotto(visiumHD, scalefactor = 1000, verbose = T) visiumHD <- addStatistics(visiumHD) Your Giotto object will have metadata for each spatial unit. pDataDT(visiumHD, spat_unit = 'hex100') pDataDT(visiumHD, spat_unit = 'hex400') ## dimension reduction #### # --------------------------- # visiumHD <- calculateHVF(visiumHD, zscore_threshold = 1) visiumHD <- runPCA(visiumHD, expression_values = 'normalized', feats_to_use = 'hvf') plotPCA(visiumHD) visiumHD <- runUMAP(visiumHD, dimensions_to_use = 1:14, n_threads = 10) # plot UMAP, coloring cells/points based on nr_feats plotUMAP(gobject = visiumHD, point_size = 2) Figure 9.16: UMAP for the hex100 bin. # sNN network (default) visiumHD <- createNearestNetwork(visiumHD, dimensions_to_use = 1:14, k = 5) ## leiden clustering #### visiumHD <- doLeidenClusterIgraph(visiumHD, resolution = 0.2, n_iterations = 1000) plotUMAP(gobject = visiumHD, cell_color = 'leiden_clus', point_size = 1.5, show_NN_network = F, edge_alpha = 0.05) Figure 9.17: UMAP for the hex100 bin colored by Leiden clusters. spatInSituPlotPoints(visiumHD, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex100', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_fill_as_factor = TRUE, polygon_fill = 'leiden_clus', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.18: Spat plot for the hex100 bin colored by leiden clusters. This resolution definitely shows more promise to identify interesting spatial patterns. 9.5.2 Spatial expression patterns 9.5.2.1 Identify single genes Here we will use binSpect as a quick method to rank genes with high potential for spatially coherent expression patterns. 
featData = fDataDT(visiumHD) hvf_genes = featData[hvf == 'yes']$feat_ID visiumHD = createSpatialNetwork(visiumHD, name = 'kNN_network', spat_unit = 'hex100', method = 'kNN', k = 8) ranktest = binSpect(visiumHD, spat_unit = 'hex100', subset_feats = hvf_genes, bin_method = 'rank', calc_hub = FALSE, do_fisher_test = TRUE, spatial_network_name = 'kNN_network') Visualize top 2 ranked spatial genes per expression bin: set0 = ranktest[high_expr < 50][1:2]$feats set1 = ranktest[high_expr > 50 & high_expr < 100][1:2]$feats set2 = ranktest[high_expr > 100 & high_expr < 200][1:2]$feats set3 = ranktest[high_expr > 200 & high_expr < 400][1:2]$feats set4 = ranktest[high_expr > 400 & high_expr < 1000][1:2]$feats set5 = ranktest[high_expr > 1000][1:2]$feats spatFeatPlot2D(visiumHD, expression_values = 'scaled', feats = c(set0, set1, set2), gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cow_n_col = 2, point_size = 1) Figure 9.19: Spat feature plot showing gene expression for the top 2 ranked spatial genes per expression bin (<50, >50 and >100) across the hex100 bin. spatFeatPlot2D(visiumHD, expression_values = 'scaled', feats = c(set3, set4, set5), gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cow_n_col = 2, point_size = 1) Figure 9.20: Spat feature plot showing gene expression for the top 2 ranked spatial genes per expression bin (>200, >400 and >1000) across the hex100 bin. 9.5.2.2 Spatial co-expression modules Investigating individual genes is a good start, but here we would like to identify recurrent spatial expression patterns that are shared by spatial co-expression modules that might represent spatially organized biological processes. 
ext_spatial_genes = ranktest[adj.p.value < 0.001]$feats spat_cor_netw_DT = detectSpatialCorFeats(visiumHD, method = 'network', spatial_network_name = 'kNN_network', subset_feats = ext_spatial_genes) # cluster spatial genes spat_cor_netw_DT = clusterSpatialCorFeats(spat_cor_netw_DT, name = 'spat_netw_clus', k = 16) # visualize clusters heatmSpatialCorFeats(visiumHD, spatCorObject = spat_cor_netw_DT, use_clus_name = 'spat_netw_clus', heatmap_legend_param = list(title = NULL)) Figure 9.21: Heatmap showing spatially correlated genes split into 16 clusters. # create metagene enrichment score for clusters cluster_genes_DT = showSpatialCorFeats(spat_cor_netw_DT, use_clus_name = 'spat_netw_clus', show_top_feats = 1) cluster_genes = cluster_genes_DT$clus; names(cluster_genes) = cluster_genes_DT$feat_ID visiumHD = createMetafeats(visiumHD, expression_values = 'normalized', feat_clusters = cluster_genes, name = 'cluster_metagene') showGiottoSpatEnrichments(visiumHD) spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(1:4)), point_size = 1, cow_n_col = 2) Figure 9.22: Spat plot visualizing metagenes (1-4) based on spatially correlated genes visualized on the hex100 bin spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(5:8)), point_size = 1, cow_n_col = 2) Figure 9.23: Spat plot visualizing metagenes (5-8) based on spatially correlated genes visualized on the hex100 bin spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(9:12)), point_size = 1, cow_n_col = 2) Figure 9.24: Spat plot visualizing metagenes 
(9-12) based on spatially correlated genes visualized on the hex100 bin spatCellPlot(visiumHD, spat_enr_names = 'cluster_metagene', gradient_style = "sequential", cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'), cell_annotation_values = as.character(c(13:16)), point_size = 1, cow_n_col = 2) Figure 9.25: Spat plot visualizing metagenes (13-16) based on spatially correlated genes visualized on the hex100 bin A simple follow-up analysis could be to perform gene set enrichment analysis on each spatial co-expression module. 9.5.2.3 Plot spatial gene groups Hack! Vendors of spatial technologies typically like to show very interesting spatial gene expression patterns. Here we will follow a similar strategy by selecting a balanced set of genes for each spatial co-expression module and then simply giving them the same color in the spatInSituPlotPoints function. balanced_genes = getBalancedSpatCoexpressionFeats(spatCorObject = spat_cor_netw_DT, maximum = 5) selected_feats = names(balanced_genes) # give genes from same cluster same color distinct_colors = getDistinctColors(n = 20) names(distinct_colors) = 1:20 my_colors = distinct_colors[balanced_genes] names(my_colors) = names(balanced_genes) spatInSituPlotPoints(visiumHD, show_image = F, feats = list('rna' = selected_feats), feats_color_code = my_colors, show_legend = F, spat_unit = 'hex100', point_size = 0.20, show_polygon = FALSE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.01, expand_counts = TRUE, count_info_column = 'count', jitter = c(25,25)) Figure 9.26: Coloring individual features based on the spatially correlated gene clusters. 9.6 Hexbin 25 The goal is to create a higher resolution bin (hex25) and add it to the Giotto object. We will aim to identify individual cell types and local neighborhood niches. 
9.6.1 Subcellular workflow filter and normalization workflow visiumHD_subset = subsetGiottoLocs(gobject = visiumHD, x_min = 16000, x_max = 20000, y_min = 44250, y_max = 45500) Figure 9.27: Coloring individual features based on the spatially correlated gene clusters + subset rectangle. Plot visiumHD subset with hexbin100 polygons: spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex100', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_fill_as_factor = TRUE, polygon_fill = 'leiden_clus', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.28: Hexbin100 colored by leiden clustering results Plot visiumHD subset with selected gene features: spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = list('rna' = selected_feats), feats_color_code = my_colors, show_legend = F, spat_unit = 'hex100', point_size = 0.40, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex100', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.05, jitter = c(25,25)) Figure 9.29: Coloring individual features based on the spatially correlated gene clusters Create smaller hexbin25 tessellations: hexbin25 <- tessellate(extent = ext(visiumHD_subset@feat_info$rna), shape = 'hexagon', shape_size = 25, name = 'hex25') visiumHD_subset = setPolygonInfo(gobject = visiumHD_subset, x = hexbin25, name = 'hex25', initialize = T) showGiottoSpatialInfo(visiumHD_subset) visiumHD_subset = addSpatialCentroidLocations(gobject = visiumHD_subset, poly_info = 'hex25') activeSpatUnit(visiumHD_subset) <- 'hex25' spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = list('rna' = selected_feats), feats_color_code = my_colors, show_legend = F, spat_unit = 'hex25', point_size = 0.40, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex25', polygon_bg_color = NA, polygon_color = 'white', polygon_line_size = 0.05, jitter = c(25,25)) Figure 9.30: xxx 
visiumHD_subset = calculateOverlap(visiumHD_subset, spatial_info = 'hex25', feat_info = 'rna') showGiottoSpatialInfo(visiumHD_subset) # convert overlap results to bin by gene matrix visiumHD_subset = overlapToMatrix(visiumHD_subset, poly_info = 'hex25', feat_info = 'rna', name = 'raw') visiumHD_subset <- filterGiotto(visiumHD_subset, expression_threshold = 1, feat_det_in_min_cells = 3, min_det_feats_per_cell = 5) activeSpatUnit(visiumHD_subset) # normalize visiumHD_subset <- normalizeGiotto(visiumHD_subset, scalefactor = 1000, verbose = T) # add statistics visiumHD_subset <- addStatistics(visiumHD_subset) feature_data = fDataDT(visiumHD_subset) visiumHD_subset <- calculateHVF(visiumHD_subset, zscore_threshold = 1) 9.6.2 Projections PCA projection from random subset. UMAP projection from random subset. cluster result projection from subsampled Giotto object + kNN voting 9.6.2.1 PCA with projection n_25_percent <- round(length(spatIDs(visiumHD_subset, 'hex25')) * 0.25) # pca projection on subset visiumHD_subset <- runPCAprojection( gobject = visiumHD_subset, spat_unit = "hex25", feats_to_use = 'hvf', name = 'pca.projection', set_seed = TRUE, seed_number = 12345, random_subset = n_25_percent ) showGiottoDimRed(visiumHD_subset) plotPCA(visiumHD_subset, dim_reduction_name = 'pca.projection') Figure 9.31: xxx 9.6.2.2 UMAP with projection # umap projection on subset visiumHD_subset <- runUMAPprojection( gobject = visiumHD_subset, spat_unit = "hex25", dim_reduction_to_use = 'pca', dim_reduction_name = "pca.projection", dimensions_to_use = 1:10, name = "umap.projection", random_subset = n_25_percent, n_neighbors = 10, min_dist = 0.005, n_threads = 4 ) showGiottoDimRed(visiumHD_subset) # plot UMAP, coloring cells/points based on nr_feats plotUMAP(gobject = visiumHD_subset, point_size = 1, dim_reduction_name = 'umap.projection') Figure 9.32: xxx 9.6.2.3 clustering with projection subsample Giotto object perform clustering (e.g. 
hierarchical clustering) project cluster results to full Giotto object using a kNN voting approach and a shared dimension reduction space (e.g. PCA) # subset to smaller giotto object set.seed(1234) subset_IDs = sample(x = spatIDs(visiumHD_subset, 'hex25'), size = n_25_percent) temp_gobject = subsetGiotto( gobject = visiumHD_subset, spat_unit = 'hex25', cell_ids = subset_IDs ) # hierarchical clustering temp_gobject = doHclust(gobject = temp_gobject, spat_unit = 'hex25', k = 8, name = 'sub_hclust', dim_reduction_to_use = 'pca', dim_reduction_name = 'pca.projection', dimensions_to_use = 1:10) # show umap dimPlot2D( gobject = temp_gobject, point_size = 2.5, spat_unit = 'hex25', dim_reduction_to_use = 'umap', dim_reduction_name = 'umap.projection', cell_color = 'sub_hclust' ) Figure 9.33: xxx # project clusterings back to full dataset visiumHD_subset <- doClusterProjection( target_gobject = visiumHD_subset, source_gobject = temp_gobject, spat_unit = "hex25", source_cluster_labels = "sub_hclust", reduction_method = 'pca', reduction_name = 'pca.projection', prob = FALSE, knn_k = 5, dimensions_to_use = 1:10 ) pDataDT(visiumHD_subset) dimPlot2D( gobject = visiumHD_subset, point_size = 1.5, spat_unit = 'hex25', dim_reduction_to_use = 'umap', dim_reduction_name = 'umap.projection', cell_color = 'knn_labels' ) Figure 9.34: xxx spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex25', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex25', polygon_fill_as_factor = TRUE, polygon_fill = 'knn_labels', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.35: xxx 9.6.3 Niche clustering Each cell will be clustered based on its neighboring cell type composition. Figure 9.36: Schematic for niche clustering. Originally from CODEX. Size of cellular niche is important and defines the tissue organization resolution. 
visiumHD_subset = createSpatialNetwork(visiumHD_subset, name = 'kNN_network', spat_unit = 'hex25', method = 'kNN', k = 6) pDataDT(visiumHD_subset) visiumHD_subset = calculateSpatCellMetadataProportions(gobject = visiumHD_subset, spat_unit = 'hex25', feat_type = 'rna', metadata_column = 'knn_labels', spat_network = 'kNN_network') prop_table = getSpatialEnrichment(visiumHD_subset, name = 'proportion', output = 'data.table') prop_matrix = GiottoUtils:::dt_to_matrix(prop_table) set.seed(1234) prop_kmeans = kmeans(x = prop_matrix, centers = 10, iter.max = 1000, nstart = 100) prop_kmeansDT = data.table::data.table(cell_ID = names(prop_kmeans$cluster), niche = prop_kmeans$cluster) visiumHD_subset = addCellMetadata(visiumHD_subset, new_metadata = prop_kmeansDT, by_column = T, column_cell_ID = 'cell_ID') pDataDT(visiumHD_subset) spatInSituPlotPoints(visiumHD_subset, show_image = F, feats = NULL, show_legend = F, spat_unit = 'hex25', point_size = 0.5, show_polygon = TRUE, use_overlap = FALSE, polygon_feat_type = 'hex25', polygon_fill_as_factor = TRUE, polygon_fill = 'niche', polygon_color = 'black', polygon_line_size = 0.3) Figure 9.37: xxx 9.7 Database backend - Work in progress, but coming soon! Memory problems: - data ingestion - spatial operations - matrix operations - matrix and spatial geometry object sizes "],["xenium-1.html", "10 Xenium 10.1 Introduction to spatial dataset 10.2 Data preparation 10.3 Convenience function 10.4 Piecewise loading 10.5 Xenium Images 10.6 Spatial aggregation 10.7 Aggregate analyses workflow 10.8 Niche clustering 10.9 Cell proximity enrichment 10.10 Pseudovisium", " 10 Xenium Jiaji George Chen August 6th 2024 10.1 Introduction to spatial dataset This is the 10X Xenium FFPE Human Lung Cancer dataset. Xenium captures individual transcript detections with a spatial resolution of 100s of nanometers, providing an extremely highly resolved subcellular spatial dataset. 
This particular dataset also showcases their recent multimodal cell segmentation outputs. The Xenium Human Multi-Tissue and Cancer Panel (377 genes) was used. The exported data is from their Xenium Onboard Analysis v2.0.0 pipeline. The full data for this example can be found here: here The relevant items are: Xenium Output Bundle (full) Supplemental: Post-Xenium H&E image (OME-TIFF) Supplemental: H&E Image Alignment File (CSV) Additional package requirements When working with this data and trying to open the parquet files, you will need arrow built with ZSTD support. See the datasets & packages section for specific install instructions. 10.1.1 Output directory structure ├── analysis.tar.gz ├── analysis.zarr.zip ├── analysis_summary.html ├── aux_outputs.tar.gz ├── transcripts.csv.gz ├── transcripts.parquet ├── transcripts.zarr.zip ├── cell_boundaries.csv.gz ├── cell_boundaries.parquet ├── nucleus_boundaries.csv.gz ├── nucleus_boundaries.parquet ├── cell_feature_matrix.tar.gz ├── cell_feature_matrix │ ├── barcodes.tsv.gz │ ├── features.tsv.gz │ └── matrix.mtx.gz ├── cell_feature_matrix.h5 ├── cell_feature_matrix.zarr.zip ├── cells.csv.gz ├── cells.parquet ├── cells.zarr.zip ├── experiment.xenium ├── gene_panel.json ├── metrics_summary.csv ├── morphology.ome.tif ├── morphology_focus │ ├── morphology_focus_0000.ome.tif │ ├── morphology_focus_0001.ome.tif │ ├── morphology_focus_0002.ome.tif │ ├── morphology_focus_0003.ome.tif ├── Xenium_V1_humanLung_Cancer_FFPE_he_image.ome.tif └── Xenium_V1_humanLung_Cancer_FFPE_he_imagealignment.csv The above directory structuring and naming is characteristic of Xenium v2.0 pipeline outputs. The only items that may not be exactly the same across all outputs are the morphology focus directory and the naming of the aligned image items. For the morphology focus images, you may have fewer images if the experiment did not include the multimodal cell segmentation.
As for the aligned images, this is usually done after the Xenium experiment concludes and is added on using Xenium Explorer. Naming and location of the aligned image (he_image.ome.tif) and associated alignment info he_imagealignment.csv are entirely up to the user. 10.1.2 Mini Xenium Dataset library(Giotto) # set up paths data_path <- "data/02_session3" save_dir <- "results/02_session3" dir.create(save_dir, recursive = TRUE) # download the mini dataset and untar options("timeout" = Inf) download.file( url = "https://zenodo.org/records/13207308/files/workshop_xenium.zip?download=1", destfile = file.path(save_dir, "workshop_xenium.zip") ) # untar the downloaded data untar(tarfile = file.path(save_dir, "workshop_xenium.zip"), exdir = data_path) In order to speed up the steps of the workshop and make it locally runnable, we provide a subset of the full dataset. - Full: -16.039, 12342.984, -3511.515, -294.455 (xmin, xmax, ymin, ymax) - Mini: 6000, 7000, -2200, -1400 (xmin, xmax, ymin, ymax) Figure 10.1: Shown is the H&E aligned to the Xenium dataset with micron scaling. The blue bounds mark out the area provided as a mini dataset 10.2 Data preparation 10.2.1 Image conversion (may change) The first step is dealing with the image formats. Xenium generates ome.tif images which Giotto is currently not fully compatible with. So we convert them to normal tif images using ometif_to_tif() which works through the python tifffile package. The image files can then be loaded in downstream steps. These commented out steps are not needed for today since the mini dataset provides .tif images that have already been spatially aligned and converted. However, the code needed to do this is provided below. # image_paths <- list.files( # data_path, pattern = "morphology_focus|he_image.ome", # recursive = TRUE, full.names = TRUE # ) The output_dir of ometif_to_tif() can be specified, but by default, it writes to a new subdirectory called tif_exports underneath the source image's directory.
Keep in mind that downstream image reading functions should point to wherever the tifs were exported. The code run today uses the filepaths of the mini dataset. # lapply(image_paths, function(img) { # GiottoClass::ometif_to_tif(img, overwrite = TRUE) # }) We are also working on a method of directly accessing the ome.tifs for better compatibility in the future. 10.3 Convenience function Giotto has flexible methods for working with the Xenium outputs. The createGiottoXeniumObject() function generates a giotto object in a single step when provided the output directory. The default behavior is to load: transcripts information cell and nucleus boundaries feature metadata (gene_panel.json) For the full dataset (HPC): time: 1-2min | memory: 24GB ?createGiottoXeniumObject g <- createGiottoXeniumObject(xenium_dir = data_path) # set instructions for save directory and to save the plots to disk instructions(g, "save_dir") <- save_dir instructions(g, "save_plot") <- TRUE There are a lot of other parameters for additional or alternative items you can load. The next subsections will explain a couple of them. 10.3.1 Specific filepaths expression_path = , cell_metadata_path = , transcript_path = , bounds_path = , gene_panel_json_path = , The convenience function auto-detects filepaths based on the Xenium directory path and the preferred file formats: .parquet for tabular (vs .csv) .h5 for matrix over other formats when available (vs .mtx) .zarr is currently not supported. When you need to use a different file format or something is not in the expected output structure, you can supply a specific filepath to the convenience function using these parameters. 10.3.2 Quality value qv_threshold = 20 # default The Quality Value is a Phred-based 0-40 value that 10X provides for every detection in their transcripts output. Higher values mean higher confidence in the decoded transcript identity.
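To make the Phred scale concrete, here is a minimal base R sketch. It assumes the QV follows the standard Phred convention (Q = -10 * log10(p), with p the estimated probability of a wrong call), which the text implies but does not spell out. `qv_to_error_prob()` is a hypothetical helper written for this illustration, not a Giotto or 10X function.

```r
# Hypothetical helper (not part of Giotto): convert a Phred-scaled quality
# value to the estimated probability that the call is wrong, p = 10^(-Q/10)
qv_to_error_prob <- function(qv) {
  10^(-qv / 10)
}

qv <- c(10, 20, 30, 40)
err <- qv_to_error_prob(qv)

# side-by-side view of quality values and their error probabilities
data.frame(qv = qv, error_prob = err)
```

Under that assumption, a QV of 20 corresponds to an estimated 1 in 100 chance that the decoded identity is wrong.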
By default 10X uses a cutoff of QV = 20 for transcripts to use downstream. Note: setting a value other than 20 will make the loaded dataset different from the 10X-provided expression matrix and cell metadata. QV Calculation Raw Q-score based on how likely it is that an observed code is the codeword that it gets mapped to vs a less likely codeword. Adjustment of raw Q-score by binning the transcripts by Q-value then adjusting the exact Q per bin based on proportion of Negative Control Codewords detected within. further info 10.3.3 Transcript type splitting feat_type = c( "rna", "NegControlProbe", "UnassignedCodeword", "NegControlCodeword" ), split_keyword = list( c("NegControlProbe"), c("UnassignedCodeword"), c("NegControlCodeword") ) There are 4 types of transcript detections that 10X reports with their v2.0 pipeline: Gene expression - These are the rna gene detections. Negative Control Codeword - (QC) Codewords that do not map to genes, but are in the codebook. Used to determine specificity of decoding algorithm. Negative Control Probe - (QC) Probes in panel but target non-biological sequences. Used to determine specificity of assay. Unassigned Codeword - (QC) Codewords that should not be used in the current panel. With V3 on their Xenium prime outputs, there is additionally: Genomic Control Codeword - (QC) Probes for intergenic genomic DNA instead of transcripts. The main thing to watch out for is that the other probe types should be separated out from the Gene expression or rna feature type. How to deal with these different types of detections is easily adjustable. With the feat_type param you declare which categories/feat_types you want to split transcript detections into. Then with split_keyword, you provide a list of character vectors containing grep() terms to search for. Note that there are 4 feat_types declared in this set of defaults, but 3 items passed to split_keyword.
Any transcripts not matched by items in split_keyword, get categorized as the first provided feat_type (“rna”). 10.3.4 Centroids calculation Several Giotto operations require that a set of centroids are calculated for polygon spatial units. g <- addSpatialCentroidLocations(g, poly_info = "cell") g <- addSpatialCentroidLocations(g, poly_info = "nucleus") 10.3.5 Simple visualization spatInSituPlotPoints(g, polygon_feat_type = "cell", feats = list(rna = head(featIDs(g))), # must be named list use_overlap = FALSE, polygon_color = "cyan", polygon_line_size = 0.1 ) Figure 10.2: Simple subcellular plotting to check data 10.4 Piecewise loading Giotto also provides the importXenium() import utility that allows independent creation of compatible Giotto subobjects for more flexibility. x <- importXenium(data_path) force(x) Giotto <XeniumReader> dir : data/02_session3/ qv_cutoff : 20 filetype : transcripts -- parquet boundaries -- parquet expression -- h5 cell_meta -- parquet funs : load_transcripts() load_polys() load_cellmeta() load_featmeta() load_expression() load_image() load_aligned_image() create_gobject() 10.4.1 Load giottoPoints transcripts x$qv <- 20 # default tx <- x$load_transcripts() plot(tx[[1]]$rna, dens = TRUE) Figure 10.3: plot of Gene expression (rna) density force(tx[[1]]$rna) An object of class giottoPoints feat_type : "rna" Feature Information: class : SpatVector geometry : points dimensions : 479097, 10 (geometries, attributes) extent : 6000.001, 7000, -2200, -1400.012 (xmin, xmax, ymin, ymax) coord. ref. 
: names : feat_ID transcript_id cell_id overlaps_nucleus z_location qv fov_name type : <chr> <chr> <chr> <int> <num> <num> <chr> values : FBLN1 281487861612869 mcnjadoe-1 0 19.32 40 B11 PDGFRB 281487861612872 mcnjbidl-1 1 18.75 40 B11 PDGFRB 281487861612873 mcnjbidl-1 1 18.74 40 B11 nucleus_distance codeword_index feat_ID_uniq <num> <int> <int> 0 334 1 0 289 2 0 289 3 rm(tx) # remove to save space 10.4.2 (optional) Loading pre-aggregated data Giotto can spatially aggregate the transcripts information based on a provided set of boundaries information; however, 10X also provides a pre-aggregated set of cell by feature information and metadata. These values may be slightly different from those calculated by Giotto's pipeline, and are not loaded by default. Some care needs to be taken when loading this information: The feat_type of the loaded expression information should match the feat_type parameters passed to the convenience function. The qv_threshold used must be 20 since the 10X outputs are based on that cutoff. x$filetype$expression <- "mtx" # change to mtx instead of .h5 which is not in the mini dataset ex <- x$load_expression() featType(ex) [1] "rna" "Negative Control Probe" "Negative Control Codeword" [4] "Unassigned Codeword" The feature types here do not match what we established for the transcripts, so we can just change them. Another reason for changing them here is that the default names contain spaces, which are difficult to work with. force(g) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell nucleus features : rna NegControlProbe UnassignedCodeword NegControlCodeword [AGGREGATE INFO] spatial locations ---------------- [cell] raw [nucleus] raw featType(ex[[2]]) <- c("NegControlProbe") featType(ex[[3]]) <- c("NegControlCodeword") featType(ex[[4]]) <- c("UnassignedCodeword") Then we can just append them to the Giotto object.
Here we set up a second object called g2 since we will be using Giotto’s own aggregation method to generate the expression matrix later. g2 <- g # append the expression info g2 <- setGiotto(g2, ex) # load cell metadata cx <- x$load_cellmeta() g2 <- setGiotto(g2, cx) force(g2) An object of class giotto >Active spat_unit: cell >Active feat_type: rna [SUBCELLULAR INFO] polygons : cell nucleus features : rna NegControlProbe UnassignedCodeword NegControlCodeword [AGGREGATE INFO] expression ----------------------- [cell][rna] raw [cell][NegControlProbe] raw [cell][NegControlCodeword] raw [cell][UnassignedCodeword] raw spatial locations ---------------- [cell] raw [nucleus] raw spatInSituPlotPoints(g2, # polygon shading params polygon_fill = "cell_area", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", # polygon line params polygon_color = "grey", polygon_line_size = 0.1 ) spatInSituPlotPoints(g2, # polygon shading params polygon_fill = "transcript_counts", polygon_fill_as_factor = FALSE, polygon_fill_gradient_style = "sequential", # polygon line params polygon_color = "grey", polygon_line_size = 0.1 ) Figure 10.4: Example plot using 10X metadata. Left is cell_area, right is transcript_counts rm(g2) # save space 10.5 Xenium Images Xenium outputs have several image outputs. For this dataset: morphology.ome.tif is a z-stacked image of the DAPI staining, with z levels separated as pages within the ome.tif. In this dataset, only pages 6 and 7 are really in focus. morphology_focus is a folder containing single-channel image(s), but with the original z information collapsed into a single in-focus layer. For all datasets, image 0000 will be DAPI staining, but if you have additional stains, such as the multimodal segmentation, they will also be here. These are the recommended immunofluorescence staining images to import. Xenium_V1_humanLung_Cancer_FFPE_he_image.ome.tif is an added on (in this case H&E) image with manual affine registration. 
10.5.1 Image metadata The morphology_focus directory may contain multiple images, but to learn more about them, we have to check the ome.tif xml metadata. With a normal dataset, you can use: GiottoClass::ometif_metadata([filepath], node = "Channel") on one of the morphology_focus images, but since the mini dataset images are pre-processed, there is only an exported .xml to explore. The output of the code chunk below is the same as that from calling ometif_metadata() and looking for the Channel node. img_xml_path <- file.path(data_path, "morphology_focus", "morphology_focus_0000.xml") omemeta <- xml2::read_xml(img_xml_path) res <- xml2::xml_find_all(omemeta, "//d1:Channel", ns = xml2::xml_ns(omemeta)) res <- Reduce(rbind, xml2::xml_attrs(res)) rownames(res) <- NULL res <- as.data.frame(res) force(res) ID Name SamplesPerPixel 1 Channel:0 DAPI 1 2 Channel:1 18S 1 3 Channel:2 ATP1A1/CD45/E-Cadherin 1 4 Channel:3 alphaSMA/Vimentin 1 10.5.2 Image loading morphology_focus images need to be scaled by the micron scaling factor. Aligned images need to first be affine transformed then scaled. The micron scaling factor can be found in the json-like experiment.xenium file under pixel_size (0.2125 for this dataset). Figure 10.5: Spatial extent/bounds of transcripts (red), immunofluorescence morphology focus images (blue), H&E aligned image (gold). Lower right shows the affine matrix for aligning the H&E These transforms are normally done automatically when using: # convenience function params load_images = list( img1 = "[img_path1.tif]", img2 = "[img_path2.tif]", img3 = "..." ), load_aligned_images = list( aligned_img = c( "[path to image.tif]", "[path to imagealignment.csv]" ) ) # importer params x$load_image(path = "[img_path1.tif]", name = "img1") x$load_image(path = "[img_path2.tif]", name = "img2") ...
x$load_aligned_image( path = "[path to image.tif]", imagealignment_path = "[path to imagealignment.csv]", name = "aligned_img" ) Specifically for the aligned image, there is also read10xAffineImage() which has similar parameters, but also asks for the micron scaling factor. But for the mini dataset, the images are pre-processed and can be directly added. img_paths <- c( sprintf("data/02_session3/morphology_focus/morphology_focus_%04d.tif", 0:3), "data/02_session3/he_mini.tif" ) img_list <- createGiottoLargeImageList( img_paths, # naming is based on the channel metadata above names = c("DAPI", "18S", "ATP1A1/CD45/E-Cadherin", "alphaSMA/Vimentin", "HE"), use_rast_ext = TRUE, verbose = FALSE ) # make some images brighter img_list[[1]]@max_window <- 5000 img_list[[2]]@max_window <- 5000 img_list[[3]]@max_window <- 5000 # append images to gobject g <- setGiotto(g, img_list) # example plots spatInSituPlotPoints(g, show_image = TRUE, image_name = "HE", polygon_feat_type = "cell", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) spatInSituPlotPoints(g, show_image = TRUE, image_name = "DAPI", polygon_feat_type = "nucleus", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) spatInSituPlotPoints(g, show_image = TRUE, image_name = "18S", polygon_feat_type = "cell", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) spatInSituPlotPoints(g, show_image = TRUE, image_name = "ATP1A1/CD45/E-Cadherin", polygon_feat_type = "nucleus", polygon_color = "cyan", polygon_line_size = 0.1, polygon_alpha = 0 ) Figure 10.6: H&E and Cell polys (top left), DAPI and nuclear polys (top right), 18S and cell polys (lower left), ATP1A1/CD45/E-Cadherin and nuclear polys (lower right) 10.6 Spatial aggregation First calculate the feat_info “rna” transcripts overlapped by the spatial_info “cell” polygons with calculateOverlap().
Then, the overlaps information (relationships between points and polygons that overlap them) gets converted into a count matrix with overlapToMatrix(). g <- calculateOverlap(g, spatial_info = "cell", feat_info = "rna" ) g <- overlapToMatrix(g) 10.7 Aggregate analyses workflow 10.7.1 Transcripts per cell g <- addStatistics(g) # this is going to fail because it looks for normalized g <- addStatistics(g, expression_values = "raw") cell_stats <- pDataDT(g) ggplot2::ggplot(cell_stats, ggplot2::aes(total_expr)) + ggplot2::geom_histogram(binwidth = 5) Figure 10.7: Histogram of detections per cell 10.7.2 Filtering # very permissive filtering. Mainly for removing 0 values g <- filterGiotto(g, expression_threshold = 1, feat_det_in_min_cells = 1, min_det_feats_per_cell = 5 ) Feature type: rna Number of cells removed: 143 out of 7655 Number of feats removed: 0 out of 377 10.7.3 Normalization g <- normalizeGiotto(g) # overwrite original results with those for normalized values g <- addStatistics(g) spatInSituPlotPoints(g, polygon_fill = "nr_feats", polygon_fill_gradient_style = "sequential", polygon_fill_as_factor = FALSE ) spatInSituPlotPoints(g, polygon_fill = "total_expr", polygon_fill_gradient_style = "sequential", polygon_fill_as_factor = FALSE ) Figure 10.8: nr_feats - Number of different gene species detected per cell (left), total_expr - total detections per cell (right) When there are a lot of features, we would also select only the interesting highly variable features so that downstream dimension reduction has more meaningful separation. Here we skip HVF detection since there are only 377 genes. 10.7.4 Dimension Reduction Dimensional reduction of expression space to visualize expressional differences between cells and help with clustering. g <- runPCA(g, feats_to_use = NULL) # feats_to_use = NULL since there are no HVFs calculated. Use all genes. 
screePlot(g, ncp = 30) Figure 10.9: Plot of variance explained in the first 30 out of 100 principal components calculated g <- runUMAP(g, dimensions_to_use = seq(15), n_neighbors = 40 # default ) plotPCA(g) plotUMAP(g) Figure 10.10: PCA plot showing the first 2 PCs (left), UMAP generated from first 15 PCs (right) 10.7.5 Clustering g <- createNearestNetwork(g, dimensions_to_use = seq(15), k = 40 ) # takes roughly 1 min to run g <- doLeidenCluster(g) plotPCA_3D(g, cell_color = "leiden_clus", point_size = 1 ) plotUMAP(g, cell_color = "leiden_clus", point_size = 0.1, point_shape = "no_border" ) Figure 10.11: 3D plot showing first PCs with leiden clustering annotations (left), UMAP plot showing leiden clustering results (right) spatInSituPlotPoints(g, polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, polygon_alpha = 1, show_image = TRUE, image_name = "HE" ) Figure 10.12: Spatial plot with leiden clustering annotations. 10.8 Niche clustering Building on top of these leiden annotations, we can define spatial niche signatures based on which leiden types are often found together. 10.8.1 Spatial network First a spatial network must be generated so that spatial relationships between cells can be understood. g <- createSpatialNetwork(g, method = "Delaunay" ) spatPlot2D(g, point_shape = "no_border", show_network = TRUE, point_size = 0.1, point_alpha = 0.5, network_color = "grey" ) Figure 10.13: Delaunay spatial network 10.8.2 Niche calculation Calculate a proportion table of a cell metadata column across all the spatial neighbors of each cell. This means that with each cell established as the center of its local niche, the enrichment of each leiden cluster label is found for that local niche.
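To make the proportion calculation concrete, here is a toy base R sketch that is independent of Giotto. The cell names, labels and edges are made up for illustration. The sketch counts only the neighbors of each center cell, and whether Giotto also counts the center cell itself is not stated here, so treat that detail as an assumption.

```r
# Toy data (made up): 5 cells with cluster labels and an undirected network
labels <- c(a = "1", b = "1", c = "2", d = "2", e = "3")
edges <- data.frame(
  from = c("a", "a", "b", "c", "d"),
  to   = c("b", "c", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Make the network symmetric so each cell sees all of its neighbors
both <- rbind(
  edges,
  data.frame(from = edges$to, to = edges$from, stringsAsFactors = FALSE)
)

# Count neighbor labels per center cell, then convert to row proportions
counts <- table(
  center = factor(both$from, levels = names(labels)),
  neighbor_label = factor(labels[both$to], levels = sort(unique(labels)))
)
niche_props <- prop.table(counts, margin = 1)
niche_props
```

Each row is the niche signature of one cell and sums to 1. In the Giotto call that follows, this calculation is handled by calculateSpatCellMetadataProportions().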
The results are stored as a new spatial enrichment entry called “leiden_niche” g <- calculateSpatCellMetadataProportions(g, spat_network = "Delaunay_network", metadata_column = "leiden_clus", name = "leiden_niche" ) 10.8.3 k-means clustering based on niche signature # retrieve the niche info prop_table <- getSpatialEnrichment(g, name = "leiden_niche", output = "data.table") # convert to matrix prop_matrix <- GiottoUtils::dt_to_matrix(prop_table) # perform kmeans clustering set.seed(1234) # make kmeans clustering reproducible prop_kmeans <- kmeans( x = prop_matrix, centers = 7, # controls how many clusters will be formed iter.max = 1000, nstart = 100 ) prop_kmeansDT = data.table::data.table( cell_ID = names(prop_kmeans$cluster), niche = prop_kmeans$cluster ) # return kmeans clustering on niche to gobject g <- addCellMetadata(g, new_metadata = prop_kmeansDT, by_column = TRUE, column_cell_ID = "cell_ID" ) # visualize niches spatInSituPlotPoints(g, show_image = TRUE, image_name = "HE", polygon_fill = "niche", # polygon_fill_code = getColors("Accent", 8), polygon_alpha = 1, polygon_fill_as_factor = TRUE ) # visualize niche makeup cellmeta <- pDataDT(g) ggplot2::ggplot( cellmeta, ggplot2::aes(fill = as.character(leiden_clus), y = 1, x = as.character(niche))) + ggplot2::geom_bar(position = "fill", stat = "identity") + ggplot2::scale_fill_manual(values = c( "#E7298A", "#FFED6F", "#80B1D3", "#E41A1C", "#377EB8", "#A65628", "#4DAF4A", "#D9D9D9", "#FF7F00", "#BC80BD", "#666666", "#B3DE69") ) Figure 10.14: Leiden annotation-based spatial niches Figure 10.15: Stacked barplot of leiden annotation composition by niche. Coloring is matched to that of the previous spatial plot with leiden clustering annotations 10.9 Cell proximity enrichment Using a spatial network, determine if there is an enrichment or depletion between annotation types by calculating the observed over the expected frequency of interactions. 
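The observed over expected idea can be sketched in base R with a label permutation test on a toy network. This is only an illustration under stated assumptions: the cells, labels and edges are made up, the pseudocount of 1 is our choice, and Giotto's actual scoring and multiple testing adjustment may differ.

```r
set.seed(1)

# Toy data (made up): 6 cells, 2 cluster labels, 6 network edges
labels <- c(a = "1", b = "1", c = "1", d = "2", e = "2", f = "2")
edges <- data.frame(
  from = c("a", "b", "a", "d", "e", "c"),
  to   = c("b", "c", "c", "e", "f", "d"),
  stringsAsFactors = FALSE
)

# Count edges per unordered label pair, e.g. "1--2"
count_pairs <- function(lab) {
  l1 <- unname(lab[edges$from])
  l2 <- unname(lab[edges$to])
  pair <- paste(pmin(l1, l2), pmax(l1, l2), sep = "--")
  table(factor(pair, levels = c("1--1", "1--2", "2--2")))
}

observed <- count_pairs(labels)

# Expected counts: shuffle labels across cells while keeping the network fixed
n_sim <- 200
sims <- replicate(n_sim, {
  shuffled <- setNames(sample(unname(labels)), names(labels))
  as.numeric(count_pairs(shuffled))
})
expected <- rowMeans(sims)

# log2 observed over expected; the pseudocount avoids division by zero
enrichment <- log2((as.numeric(observed) + 1) / (expected + 1))
names(enrichment) <- names(observed)
enrichment
```

Positive values indicate label pairs that touch more often than expected by chance, and negative values indicate depletion.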
# uses a lot of memory leiden_prox <- cellProximityEnrichment(g, cluster_column = "leiden_clus", spatial_network_name = "Delaunay_network", adjust_method = "fdr", number_of_simulations = 2000 ) cellProximityBarplot(g, CPscore = leiden_prox, min_orig_ints = 5, # minimum original cell-cell interactions min_sim_ints = 5 # minimum simulated cell-cell interactions ) Figure 10.16: Cell-cell interaction enrichments and depletions (left). Number of interactions of each type found (right) Most enrichments are self-self interactions, which is expected. However, 6–8 and 2–9 stand out as being hetero interactions that are enriched with a large number of interactions. We can take a closer look by plotting these annotation pairs with colors that stand out. # set up colors other_cell_color <- rep("grey", 12) int_6_8 <- int_2_9 <- other_cell_color int_6_8[c(6, 8)] <- c("orange", "cornflowerblue") int_2_9[c(2, 9)] <- c("orange", "cornflowerblue") spatInSituPlotPoints(g, polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, polygon_fill_code = int_6_8, polygon_line_size = 0.1, polygon_alpha = 1, show_image = TRUE, image_name = "HE" ) spatInSituPlotPoints(g, polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, polygon_fill_code = int_2_9, polygon_line_size = 0.1, show_image = TRUE, polygon_alpha = 1, image_name = "HE" ) Figure 10.17: Spatial plot of enriched leiden annotation 6 to 8 interactions Figure 10.18: Spatial plot of enriched leiden annotation 2 to 9 interactions 10.10 Pseudovisium Another thing we can do is create a “pseudovisium” dataset by tessellating across this dataset using the same layout and resolution as a Visium capture array. makePseudoVisium() generates a Visium array of circular polygons across the spatial extent provided. 
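As a rough picture of what such an array looks like, the base R sketch below lays out hexagonally packed spot centers over the mini dataset extent. The geometry values (55 micron spots, 100 micron center-to-center spacing) are the commonly cited Visium layout and are an assumption here. This is not how makePseudoVisium() is implemented, and its defaults may differ.

```r
# Assumed Visium geometry in microns: 55 um spots, 100 um center-to-center
spot_diameter <- 55
center_dist <- 100

# Extent of the mini dataset (xmin, xmax, ymin, ymax)
xmin <- 6000; xmax <- 7000; ymin <- -2200; ymax <- -1400

# Hexagonal packing: rows are sqrt(3)/2 * center_dist apart and
# every other row is shifted by half the center distance
row_gap <- center_dist * sqrt(3) / 2
ys <- seq(ymin, ymax, by = row_gap)
centers <- do.call(rbind, lapply(seq_along(ys), function(i) {
  offset <- if (i %% 2 == 0) center_dist / 2 else 0
  xs <- seq(xmin + offset, xmax, by = center_dist)
  data.frame(x = xs, y = ys[i])
}))

nrow(centers)
head(centers)
```

Each center would then be buffered by half of spot_diameter to obtain a circular spot polygon.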
Here we use ext() with the prefer arg pointing to the polygon and points data and all_data = TRUE, meaning that the combined spatial extent of those two data types will be returned, giving a good measure of where all the data in the object is at the moment. micron_size = 1 since the Xenium data is already scaled to microns. pvis <- makePseudoVisium( extent = ext(g, prefer = c("polygon", "points"), all_data = TRUE), # all_data = TRUE is the default micron_size = 1 ) g <- setGiotto(g, pvis) g <- addSpatialCentroidLocations(g, poly_info = "pseudo_visium") plot(pvis) Figure 10.19: Pseudovisium spot geometries generated by makePseudoVisium() 10.10.1 Pseudovisium aggregation and workflow Make “pseudo_visium” the new default spatial unit then proceed with aggregation and usual aggregate workflow. activeSpatUnit(g) <- "pseudo_visium" g <- calculateOverlap(g, spatial_info = "pseudo_visium", feat_info = "rna" ) g <- overlapToMatrix(g) g <- filterGiotto(g, expression_threshold = 1, feat_det_in_min_cells = 1, min_det_feats_per_cell = 100 ) g <- normalizeGiotto(g) g <- addStatistics(g) spatInSituPlotPoints(g, show_image = TRUE, image_name = "HE", polygon_feat_type = "pseudo_visium", polygon_fill = "total_expr", polygon_fill_gradient_style = "sequential" ) Figure 10.20: Pseudo visium total detections per spot g <- runPCA(g, feats_to_use = NULL) g <- runUMAP(g, dimensions_to_use = seq(15), n_neighbors = 15 ) g <- createNearestNetwork(g, dimensions_to_use = seq(15), k = 15 ) g <- doLeidenCluster(g, resolution = 1.5) # plots plotPCA(g, cell_color = "leiden_clus", point_size = 2) plotUMAP(g, cell_color = "leiden_clus", point_size = 2) spatInSituPlotPoints(g, polygon_feat_type = "pseudo_visium", polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE ) spatInSituPlotPoints(g, polygon_feat_type = "pseudo_visium", polygon_fill = "leiden_clus", polygon_fill_as_factor = TRUE, show_image = TRUE, image_name = "HE" ) Figure 10.21: Leiden clustering in PCA (top left) and UMAP (top right) 
spaces, and in spatial plot with no image (bottom left), and with image (bottom right) "],["spatial-proteomics-multiplexed-immunofluorescence.html", "11 Spatial proteomics: Multiplexed Immunofluorescence 11.1 Spatial Proteomics Technologies 11.2 Raw data type coming out of different technologies 11.3 Cell Segmentation file to get single cell level protein expression 11.4 Create a Giotto Object using list of giotto large images and polygons 11.5 Session info", " 11 Spatial proteomics: Multiplexed Immunofluorescence Junxiang Xu August 6th 2024 Before you start, this tutorial contains an optional part to run image segmentation using the Giotto wrapper of Cellpose. If you plan to use that function, please restart your R session as we will need to activate a new Giotto python environment. The environment is also compatible with other Giotto functions. We will also need to install the Cellpose supported Giotto environment if you haven’t done so. #Install the Giotto Environment with Cellpose, note that we only need to do it once reticulate::conda_create(envname = "giotto_cellpose", python_version = 3.8) #.rs.restartR() reticulate::use_condaenv("giotto_cellpose") reticulate::py_install( pip = TRUE, envname = "giotto_cellpose", packages = c( "pandas", "networkx", "python-igraph", "leidenalg", "scikit-learn", "cellpose", "smfishhmrf", "tifffile", "scikit-image" ) ) #.rs.restartR() Now, activate the Giotto python environment. #.rs.restartR() # Activate the Giotto python environment of your choice GiottoClass::set_giotto_python_path("giotto_cellpose") # Check if cellpose was successfully installed GiottoUtils::package_check("cellpose", repository = "pip") 11.1 Spatial Proteomics Technologies This tutorial is aimed at analyzing spatially resolved multiplexed immunofluorescence data. It is compatible with different kinds of image based spatial proteomics data, such as Akoya(CODEX), CyCIF, IMC, MIBI, and Lunaphore(seqIF).
Note that this tutorial will focus on starting directly with the intensity data (image), not the decoded count matrix. This is the example Lunaphore dataset from the official Lunaphore website, and we are using one small cropped area as an example. This is an overview of what a subset of the data looks like. 11.2 Raw data type coming out of different technologies 11.2.1 Use ome.tiff as an example output data to begin with OME-TIFF (Open Microscopy Environment Tagged Image File Format) is a file format designed to carry detailed metadata and to support multi-dimensional image data. This is a common output file format for spatial proteomics platforms such as Lunaphore. library(Giotto) instrs <- createGiottoInstructions(save_dir = file.path(getwd(),"/img/02_session4/"), save_plot = TRUE, show_plot = TRUE, python_path = "giotto_cellpose") options(timeout = Inf) data_dir <- "data/02_session4" destfile <- file.path(data_dir, "Lunaphore.zip") if (!dir.exists(data_dir)) { dir.create(data_dir, recursive = TRUE) } download.file("https://zenodo.org/records/13175721/files/Lunaphore.zip?download=1", destfile = destfile) unzip(file.path(data_dir, "/Lunaphore.zip"), exdir = data_dir) list.files(file.path(data_dir, "/Lunaphore")) We provide a way to extract metadata directly from ome.tiffs. Please note that different platforms may store metadata such as channel information in a different format, so we may need to change the node names of the OME-XML. img_path <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff") img_meta <- ometif_metadata(img_path, node = "Channel", output = "data.frame") img_meta However, sometimes a simple ome.tiff file manipulation like cropping can result in a loss of OME-XML information from the ome.tiff file. In that case, we can use a different strategy: parse the xml information separately and get channel information from it. 
## Get channel information Luna <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff") xmldata <- xml2::read_xml(file.path(data_dir,"/Lunaphore/Lunaphore_sample_metadata.xml")) node <- xml2::xml_find_all(xmldata, "//d1:Channel", ns = xml2::xml_ns(xmldata)) channel_df <- as.data.frame(Reduce("rbind", xml2::xml_attrs(node))) channel_df 11.2.2 Use single channel images as an example output data to begin with Some platforms may also deconvolute and output gray scale single channel images. And we can create single channel images from ome.tiffs, the single channel images will be of the same format if the platform provide single channel gray scale images. With the single channel images, we can create a GiottoLargeImage and see what it looks like. # Create multichannel raster and extract each single channels Luna_terra <- terra::rast(Luna) names(Luna_terra) <- channel_df$Name gimg_DAPI <- createGiottoLargeImage(Luna_terra[[1]], negative_y = FALSE, flip_vertical = TRUE) plot(gimg_DAPI) Extract and save the raster image for future use. single_channel_dir <- file.path(data_dir, "/Lunaphore/single_channels/") if (!dir.exists(single_channel_dir)) { dir.create(single_channel_dir, recursive = TRUE) } for (i in 1:nrow(channel_df)){ single_channel <- terra::subset(Luna_terra, i) terra::writeRaster(single_channel, filename = paste0(single_channel_dir,names(single_channel),".tiff"), overwrite = TRUE) } Create a list of GiottoLargeImages using single channel rasters. file_names <- list.files(single_channel_dir, full.names = TRUE) image_names <- sub("\\\\.tiff$", "", list.files(single_channel_dir)) gimg_list <- createGiottoLargeImageList(raster_objects = file_names, names = image_names, negative_y = FALSE, flip_vertical = TRUE) names(gimg_list) <- image_names plot(gimg_list[["Vimentin"]]) 11.3 Cell Segmentation file to get single cell level protein expression Cell segmentation is necessary to generate single cell level protein expression. 
Currently, there are multiple algorithms to generate segmentations from images, and their output formats can differ. For that purpose, Giotto provides createGiottoPolygonsFromMask(), createGiottoPolygonsFromDfr(), and createGiottoPolygonsFromGeoJSON() to load different types of files into the giottoPolygon class. 11.3.1 Using segmentation output file from DeepCell(mesmer) as an example. We collapsed several different channels to create a pseudo membrane staining channel (“nuc_and_bound.tif” provided here), and used that as an input for the DeepCell mesmer segmentation pipeline. We can load the output mask into a giottoPolygon via a convenience function. gpoly_mesmer <- createGiottoPolygonsFromMask( file.path(data_dir, "/Lunaphore/whole_cell_mask.tif"), shift_horizontal_step = FALSE, shift_vertical_step = FALSE, flip_vertical = TRUE, calc_centroids = TRUE) plot(gpoly_mesmer) We can also zoom in to check how the segmentation looks. zoom <- c(2000,2500,2000,2500) plot(gimg_DAPI, ext = zoom) plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom) 11.3.2 Using Giotto wrapper of Cellpose to perform segmentation Here, we create a mini example by cropping the image to a smaller area. Note that crop() is probably easier to use when cropping an image directly, unless the image is inside of a giotto object. gimg_cropped <- cropGiottoLargeImage(giottoLargeImage = gimg_DAPI, crop_extent = terra::ext(zoom)) writeGiottoLargeImage(gimg_cropped, filename = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"), overwrite = TRUE) #Create a giotto image to evaluate segmentation gimg_for_cellpose <- createGiottoLargeImage( file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"), negative_y = FALSE) Now we can run the cellpose segmentation. 
We can provide different parameters for the cellpose inference model (flow_threshold, cellprob_threshold, etc.). In practice, the batch size represents how many 224X224 images are calculated in parallel: increasing it will increase the RAM/VRAM requirement, and lowering it will increase the run time. For more information please refer to the cellpose website. doCellposeSegmentation(image_dir = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"), mask_output = file.path(data_dir, "/Lunaphore/giotto_cellpose_seg.tiff"), channel_1 = 0, channel_2 = 0, model_name = "cyto3", batch_size = 12) cpoly <- createGiottoPolygonsFromMask(file.path(data_dir,"/Lunaphore/giotto_cellpose_seg.tiff"), shift_horizontal_step = FALSE, shift_vertical_step = FALSE, flip_vertical = TRUE) plot(gimg_for_cellpose) plot(cpoly, add = TRUE, border = "red") 11.4 Create a Giotto Object using list of giotto large images and polygons You will need to have: list of giotto images giottoPolygon created from segmentation Lunaphore_giotto <- createGiottoObjectSubcellular(gpolygons = list("cell" = gpoly_mesmer), images = gimg_list, instructions = instrs) Lunaphore_giotto 11.4.1 Overlap to matrix calculateOverlap() and overlapToMatrix() are used to overlap the intensity values with Lunaphore_giotto <- calculateOverlap(Lunaphore_giotto, spatial_info = "cell", image_names = names(gimg_list)) Lunaphore_giotto <- overlapToMatrix(x = Lunaphore_giotto, type ="intensity", poly_info = "cell", feat_info = "protein", aggr_function = "sum") showGiottoExpression(Lunaphore_giotto) 11.4.2 Manipulate Expression information For IF data, DAPI staining is usually only used to stain nuclei, and the DAPI intensity value usually does not carry meaningful differences between cell types. Something similar can happen when a platform uses a reference channel, such as TRITC or Cy5, to adjust signal calling. These images will be loaded but need to be removed from the expression profile. 
Therefore, we could extract the feature expression matrix, filter the DAPI information and write it back to the Giotto Object expr_mtx <- getExpression(Lunaphore_giotto, values = "raw", output = "matrix") filtered_expr_mtx <- expr_mtx[rownames(expr_mtx) != "DAPI",] Lunaphore_giotto <- setExpression(Lunaphore_giotto, feat_type = "protein", x = createExprObj(filtered_expr_mtx), name = "raw") showGiottoExpression(Lunaphore_giotto) 11.4.3 Rescale polygons rescalePolygons() will provide a quick way to manipulate the polygon size and potentially affect the expression for each cell. redo the calculateOverlap() and overlapToMatrix() will potentially change the downstream analysis Lunaphore_giotto <- rescalePolygons(gobject = Lunaphore_giotto, poly_info = "cell", name = "smallcell", fx = 0.7, fy = 0.7, calculate_centroids = TRUE) smallpoly <- getPolygonInfo(Lunaphore_giotto, polygon_name = "smallcell") plot(gimg_DAPI, ext = zoom) plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom) plot(smallpoly, add = TRUE, border = "red", ext = zoom) 11.4.4 Perform clustering and differential expression The Giotto Object can then go through standard analysis pipeline normalization, dimensional reduction and clustering Lunaphore_giotto <- normalizeGiotto(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein") Lunaphore_giotto <- addStatistics(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein") Lunaphore_giotto <- runPCA(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", scale_unit = FALSE, center = FALSE, ncp = 20, feats_to_use = NULL, set_seed = TRUE) screePlot(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", show_plot = TRUE) Due to the limited number of total features we have, Leiden clustering generally does not work very well compared to Kmeans or hierarchical clustering. Here we can use hierarchical clustering to do a quick check. 
Lunaphore_giotto <- runUMAP(gobject = Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:5, set_seed = TRUE) Lunaphore_giotto <- createNearestNetwork(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:5) Lunaphore_giotto <- doHclust(Lunaphore_giotto, k = 8, dim_reduction_to_use = "cells", spat_unit = "cell", feat_type = "protein") spatInSituPlotPoints(gobject = Lunaphore_giotto, spat_unit = "cell", polygon_feat_type = "cell", show_polygon = TRUE, feat_type = "protein", feats = NULL, polygon_fill = "hclust", polygon_fill_as_factor = TRUE, polygon_line_size = 0, image_name = "CD68", show_image = TRUE, return_plot = TRUE, polygon_color = "black", background_color = "white") Then we can check the heatmap of protein expression and determine the first round of cluster annotation. cluster_column <- "hclust" plotMetaDataHeatmap(Lunaphore_giotto, spat_unit = "cell", feat_type = "protein", expression_values = "raw", metadata_cols = cluster_column, selected_feats = names(gimg_list), y_text_size = 8, show_values = "zscores_rescaled") 11.4.5 Give the cluster an annotation based on expression values annotation <- c("B_cell", "Macrophage", "T_cell", "stromal", "epithelial", "DC" , "Fibroblast", "endothelial") names(annotation) <- 1:8 Lunaphore_giotto <- annotateGiotto(Lunaphore_giotto, cluster_column = "hclust", annotation_vector = annotation, name = "cell_types") 11.4.6 Spatial network This is to create a cellular neighborhood based on nearest neighbor of physical distance. Lunaphore_giotto <- createSpatialNetwork(Lunaphore_giotto) spatPlot2D(Lunaphore_giotto, show_network = TRUE, network_color = "blue", point_size = 1.5, cell_color = "hclust") 11.4.7 Cell Neighborhood: Cell-Type/Cell-Type Interactions This is using cellProximityEnrichment() to statistically identify cell type interactions. 
cell_proximities <- cellProximityEnrichment(gobject = Lunaphore_giotto, cluster_column = "cell_types", spatial_network_name = "Delaunay_network", adjust_method = "fdr", number_of_simulations = 2000) ## barplot cellProximityBarplot(gobject = Lunaphore_giotto, CPscore = cell_proximities, min_orig_ints = 5, min_sim_ints = 5) ## network cellProximityNetwork(gobject = Lunaphore_giotto, CPscore = cell_proximities, remove_self_edges = TRUE, only_show_enrichment_edges = FALSE) 11.5 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Giotto_4.1.0 GiottoClass_0.3.4 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.8 [4] magrittr_2.0.3 magick_2.8.4 farver_2.1.2 [7] rmarkdown_2.27 zlibbioc_1.50.0 ragg_1.3.2 [10] vctrs_0.6.5 memoise_2.0.1 GiottoUtils_0.1.10 [13] terra_1.7-78 htmltools_0.5.8.1 S4Arrays_1.4.1 [16] raster_3.6-26 SparseArray_1.4.8 sass_0.4.9 [19] bslib_0.8.0 KernSmooth_2.23-24 htmlwidgets_1.6.4 [22] plyr_1.8.9 plotly_4.10.4 cachem_1.1.0 [25] igraph_2.0.3 lifecycle_1.0.4 pkgconfig_2.0.3 [28] rsvd_1.0.5 Matrix_1.7-0 R6_2.5.1 [31] fastmap_1.2.0 GenomeInfoDbData_1.2.12 MatrixGenerics_1.16.0 [34] digest_0.6.36 colorspace_2.1-1 S4Vectors_0.42.1 [37] irlba_2.3.5.1 textshaping_0.4.0 GenomicRanges_1.56.1 [40] beachmat_2.20.0 labeling_0.4.3 fansi_1.0.6 [43] httr_1.4.7 polyclip_1.10-7 abind_1.4-5 [46] compiler_4.4.1 proxy_0.4-27 withr_3.0.1 [49] backports_1.5.0 
BiocParallel_1.38.0 viridis_0.6.5 [52] DBI_1.2.3 highr_0.11 ggforce_0.4.2 [55] MASS_7.3-61 DelayedArray_0.30.1 rjson_0.2.21 [58] classInt_0.4-10 gtools_3.9.5 GiottoVisuals_0.2.4 [61] tools_4.4.1 units_0.8-5 glue_1.7.0 [64] dbscan_1.2-0 grid_4.4.1 sf_1.0-16 [67] checkmate_2.3.2 reshape2_1.4.4 generics_0.1.3 [70] gtable_0.3.5 class_7.3-22 tidyr_1.3.1 [73] data.table_1.15.4 BiocSingular_1.20.0 tidygraph_1.3.1 [76] ScaledMatrix_1.12.0 sp_2.1-4 xml2_1.3.6 [79] utf8_1.2.4 XVector_0.44.0 BiocGenerics_0.50.0 [82] RcppAnnoy_0.0.22 ggrepel_0.9.5 pillar_1.9.0 [85] stringr_1.5.1 dplyr_1.1.4 tweenr_2.0.3 [88] lattice_0.22-6 deldir_2.0-4 tidyselect_1.2.1 [91] SingleCellExperiment_1.26.0 knitr_1.48 gridExtra_2.3 [94] bookdown_0.40 IRanges_2.38.1 SummarizedExperiment_1.34.0 [97] scattermore_1.2 stats4_4.4.1 xfun_0.46 [100] graphlayouts_1.1.1 Biobase_2.64.0 matrixStats_1.3.0 [103] stringi_1.8.4 UCSC.utils_1.0.0 lazyeval_0.2.2 [106] yaml_2.3.10 evaluate_0.24.0 codetools_0.2-20 [109] ggraph_2.2.1 tibble_3.2.1 BiocManager_1.30.23 [112] colorRamp2_0.1.0 cli_3.6.3 uwot_0.2.2 [115] reticulate_1.38.0 systemfonts_1.1.0 jquerylib_0.1.4 [118] munsell_0.5.1 Rcpp_1.0.13 GenomeInfoDb_1.40.1 [121] png_0.1-8 parallel_4.4.1 ggplot2_3.5.1 [124] exactextractr_0.10.0 SpatialExperiment_1.14.0 viridisLite_0.4.2 [127] scales_1.3.0 e1071_1.7-14 purrr_1.0.2 [130] crayon_1.5.3 rlang_1.1.4 cowplot_1.1.3 "],["working-with-multiple-samples.html", "12 Working with multiple samples 12.1 Objective 12.2 Background 12.3 Create individual giotto objects 12.4 Extracting the downloaded files 12.5 Join Giotto Objects 12.6 Visualizing combined datasets 12.7 Splitting combined dataset 12.8 Analyzing joined objects 12.9 Perform Harmony and default workflows", " 12 Working with multiple samples Jeff Sheridan August 7th 2024 12.1 Objective Giotto enables the grouping of multiple objects into a single object for combined analysis. 
Grouping objects can be used to ensure normalization is consistent across datasets, allowing us to compare datasets directly. Datasets can be spatially distributed across the x, y, or z axes, allowing for the creation of 3D datasets using the z-plane or the analysis of grouped datasets, such as multiple replicates or similar samples. While it’s possible to integrate multiple datasets, batch effects and differences between samples can hinder effective integration. In such cases, more sophisticated methods may be needed to successfully integrate and cluster samples as a unified dataset. One example of an advanced integration technique is Harmony, which will be discussed in more detail later in this tutorial. This tutorial will demonstrate the integration of two Visium datasets, examining the results before and after Harmony integration. 12.2 Background 12.2.1 Dataset For this tutorial we will be using two prostate Visium datasets produced by 10X Genomics, one an Adenocarcinoma with Invasive Carcinoma and the other a normal prostate sample. 12.2.2 Visium technology Figure 12.1: Overview of Visium. Source: 10X Genomics. Visium by 10x Genomics is a spatial gene expression platform that allows for the mapping of gene expression to high-resolution histology through RNA sequencing. The process involves placing a tissue section on a specially prepared slide with an array of barcoded spots, which are 55 µm in diameter with a spot to spot distance of 100 µm. Each spot contains unique barcodes that capture the mRNA from the tissue section, preserving the spatial information. After the tissue is imaged and RNA is captured, the mRNA is sequenced, and the data is mapped back to the tissue’s spatial coordinates. This technology is particularly useful in understanding complex tissue environments, such as tumors, by providing insights into how gene expression varies across different regions. 
12.3 Create individual giotto objects 12.3.1 Download the data You need to download the expression matrix and spatial information by running these commands: data_dir <- "data/03_session1" dir.create(file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer"), showWarnings = FALSE, recursive = TRUE) # Spatial data adenocarcinoma prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_spatial.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_spatial.tar.gz")) # Download matrix adenocarcinoma prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_raw_feature_bc_matrix.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_raw_feature_bc_matrix.tar.gz")) dir.create(file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate"), showWarnings = FALSE, recursive = TRUE) # Spatial data normal prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz")) # Download matrix normal prostate download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_raw_feature_bc_matrix.tar.gz", destfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_raw_feature_bc_matrix.tar.gz")) 12.4 Extracting the downloaded files # The adenocarcinoma sample untar(tarfile = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_spatial.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer")) untar(tarfile = 
file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer/Visium_FFPE_Human_Prostate_Cancer_raw_feature_bc_matrix.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer")) # The normal prostate sample untar(tarfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate")) untar(tarfile = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_raw_feature_bc_matrix.tar.gz"), exdir = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate")) 12.4.1 Create giotto instructions We must first create instructions for our Giotto object. This will tell the object where to save outputs, whether to show or return plots, and the python path. Specifying the python path is often not required as Giotto will identify the relevant python environment, but might be required in some instances. library(Giotto) save_dir <- "results/03_session1" instrs <- createGiottoInstructions(save_dir = save_dir, save_plot = TRUE, show_plot = TRUE, python_path = NULL) 12.4.2 Load visium data into Giotto We next need to read in the data for the Giotto object. To do this we will use the createGiottoVisiumObject() convenience function. This requires us to specify the directory that contains the visium data output from 10X Genomics’s Spaceranger. We also specify the expression data to use (raw or filtered) as well as the image to align. Spaceranger outputs two images, a low and high resolution image. 
## Healthy prostate N_pros <- createGiottoVisiumObject( visium_dir = file.path(data_dir, "Visium_FFPE_Human_Normal_Prostate"), expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instrs ) ## Adenocarcinoma C_pros <- createGiottoVisiumObject( visium_dir = file.path(data_dir, "Visium_FFPE_Human_Prostate_Cancer"), expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instrs ) We can see that the gobject contains information for the cells (polygon and spatial units), the RNA expression (raw) and the relevant image. Figure 12.2: Structure of Giotto object containing a single dataset. 12.4.3 Healthy prostate tissue coverage Aligning the Visium spots to the tissue using the fiducials that border the capture area enables the identification of spots containing expression data from the tissue. These spots can be visualized using the spatPlot2D function by setting the cell_color parameter to “in_tissue”. spatPlot2D(gobject = N_pros, cell_color = "in_tissue", show_image = TRUE, point_size = 2.5, cell_color_code = c("black", "red"), point_alpha = 0.5, save_param = list(save_name = "03_ses1_normal_prostate_tissue")) Figure 12.3: Tissue coverage for the normal prostate sample. 12.4.4 Adenocarcinoma prostate tissue coverage spatPlot2D(gobject = C_pros, cell_color = "in_tissue", show_image = TRUE, point_size = 2.5, cell_color_code = c("black", "red"), point_alpha = 0.5, save_param = list(save_name = "03_ses1_adeno_prostate_tissue")) Figure 12.4: Tissue coverage for the adenocarcinoma prostate sample. 12.4.5 Showing the data structure for the individual objects # Printing the file structure for the individual datasets print(head(pDataDT(N_pros))) print(N_pros) 12.5 Join Giotto Objects To join objects together we can use the joinGiottoObjects() function. For this we need to supply a list of objects as well as the names for each of these objects. 
We can also specify the x and y padding to separate the objects in space or the Z position for 3D datasets. If the x_shift is set to NULL then the total shift will be guessed from the Giotto image. combined_pros <- joinGiottoObjects(gobject_list = list(N_pros, C_pros), gobject_names = c("NP", "CP"), join_method = "shift", x_padding = 1000) # Printing the file structure for the joined dataset print(head(pDataDT(combined_pros))) print(combined_pros) From the joined data we can see the same information that was present in the single dataset objects as well as the addition of another image. The images are renamed from “image” to include the object name in the image name e.g. “NP-image”. We can also see in the cell metadata that there is a new column “list_ID” that contains the original object names. The cell_ID column also has the original object name prepended to each cell ID e.g. “NP-AAACAACGAATAGTTC-1”. Figure 12.5: Structure of Giotto object containing two datasets (left) and cell metadata (right). Note the addition of multiple images and the addition of the list_ID column to define the dataset. 12.6 Visualizing combined datasets The combined dataset can either be visualized in the same space or in two separate plots through the group_by variable. To show images, both the show_image variable and the image_name variable containing both image names need to be used. 12.6.1 Visualizing in the same plot Due to the x_padding provided when joining the objects, each of the datasets can be visualized in the same plotting area. We can see below the normal prostate sample on the left and the adenocarcinoma prostate sample on the right. By setting the show_image parameter and supplying both of the image names (“NP-image”, “CP-image”), we can also include the relevant images within the same plot. 
spatPlot2D(gobject = combined_pros, cell_color = "in_tissue", cell_color_code = c("black", "red"), show_image = TRUE, image_name = c("NP-image", "CP-image"), point_size = 1, point_alpha = 0.5, save_param = list(save_name = "03_ses1_combined_tissue")) Figure 12.6: Vizualizing the visium spots that overlap tissue in normal prostate (left) and adenocarcinoma samples (right) within the same plot. 12.6.2 Visualizing on separate plots If we want to visualize the datasets in separate plots we can supply the “group_by” variable. Below we group the data by “list_ID”, which corresponds to each dataset. We can specify the number of columns through the “cow_n_col” variable. spatPlot2D(gobject = combined_pros, cell_color = "in_tissue", cell_color_code = c("black", "pink"), show_image = TRUE, image_name = c("NP-image", "CP-image"), group_by = "list_ID", point_alpha = 0.5, point_size = 0.5, cow_n_col = 1, save_param = list(save_name = "03_ses1_combined_tissue_group")) Figure 12.7: Vizualizing the visium spots that overlap tissue in normal prostate (left) and adenocarcinoma samples (right) in separate plots. 12.7 Splitting combined dataset If needed it’s possible to split the individual objects into single objects again through subsetting the cell metadata as shown below. # Getting the cell information combined_cells <- pDataDT(combined_pros) np_cells <- combined_cells[list_ID == "NP"] np_split <- subsetGiotto(combined_pros, cell_ids = np_cells$cell_ID, poly_info = np_cells$cell_ID, spat_unit = ":all:") spatPlot2D(gobject = np_split, cell_color = "in_tissue", cell_color_code = c("black", "red"), show_image = TRUE, point_alpha = 0.5, point_size = 0.5, save_param = list(save_name = "03_ses1_split_object")) Figure 12.8: Structure of Giotto object containing two datasets (left) and cell metadata on the left. Note the addition of multiple images and the addition of the list_ID column to define the dataset. 
12.8 Analyzing joined objects 12.8.1 Normalization and adding statistics Now that the objects have been joined we can analyze the object as if it was a single object. This means all of the analyses will be performed in parallel. Therefore, all of the filtering and normalization will be identical between datasets, retaining the ability for direct comparisons between datasets. # subset on in-tissue spots metadata <- pDataDT(combined_pros) in_tissue_barcodes <- metadata[in_tissue == 1]$cell_ID combined_pros <- subsetGiotto(combined_pros, cell_ids = in_tissue_barcodes) ## filter combined_pros <- filterGiotto(gobject = combined_pros, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 500, expression_values = "raw", verbose = TRUE) ## normalize combined_pros <- normalizeGiotto(gobject = combined_pros, scalefactor = 6000) ## add gene & cell statistics combined_pros <- addStatistics(gobject = combined_pros, expression_values = "raw") ## visualize spatPlot2D(gobject = combined_pros, cell_color = "nr_feats", color_as_factor = FALSE, point_size = 1, show_image = TRUE, image_name = c("NP-image", "CP-image"), save_param = list(save_name = "ses3_1_feat_expression")) After performing the addStatistics() function on both the datasets we can see the relative expression for each spot in both samples. Figure 12.9: Unique feat expression for visium spots for both prostate samples. 12.8.2 Clustering the datasets Since we shifted the objects within space the spatial networks for each dataset will remain separate, assuming that the lower limits for neighbors is smaller than the distance of each dataset. 
However, the individual spot clustering will be performed on all spots from both datasets as if they were a single object, meaning that the same cell types between objects should be clustered together. ## PCA ## combined_pros <- calculateHVF(gobject = combined_pros) combined_pros <- runPCA(gobject = combined_pros, center = TRUE, scale_unit = TRUE) ## cluster and run UMAP ## # sNN network (default) combined_pros <- createNearestNetwork(gobject = combined_pros, dim_reduction_to_use = "pca", dim_reduction_name = "pca", dimensions_to_use = 1:10, k = 15) # Leiden clustering combined_pros <- doLeidenCluster(gobject = combined_pros, resolution = 0.2, n_iterations = 200) # UMAP combined_pros <- runUMAP(combined_pros) 12.8.3 Visualizing spatial location of clusters We can visualize the clusters determined through Leiden clustering on both of the datasets within the same plot. spatDimPlot2D(gobject = combined_pros, cell_color = "leiden_clus", show_image = TRUE, image_name = c("NP-image", "CP-image"), save_param = list(save_name = "ses3_1_leiden_clus")) Figure 12.10: UMAP (top) for both samples colored by Leiden clusters visualized in a spatial plot (bottom) for the normal prostate (left) and the adenocarcinoma prostate sample (right). 12.8.4 Visualizing tissue contribution to clusters We can also color the UMAP to visualize the contribution from each tissue. To do this we color the UMAP by “list_ID” rather than “leiden_clus”. If the cell types shared between both samples cluster together, then we would expect each cluster to contain the cell color of both samples. However, we can see that the samples are clustered distinctly within the UMAP. This indicates that the cell types shared between both samples are found within different clusters, suggesting that more complex integration techniques might be required for these samples. 
spatDimPlot2D(gobject = combined_pros, cell_color = "list_ID", show_image = TRUE, image_name = c("NP-image", "CP-image"), save_param = list(save_name = "ses3_1_tissue_contribution")) Figure 12.11: Tissue contribution for Leiden clustering for the normal prostate (left) and the adenocarcinoma prostate sample (right). 12.9 Perform Harmony and default workflows Figure 12.12: Overview of how Harmony aligns multiple datasets. First cluster cells, then get the centroids and apply a dataset correction factor, then move cells based on the soft cluster membership. (Korsunsky et al. 2019) We can use Harmony to integrate multiple datasets, grouping equivalent cell types between samples. Harmony is an algorithm that iteratively adjusts cell coordinates in a reduced-dimensional space to correct for dataset-specific effects. It uses fuzzy clustering to assign cells to multiple clusters, calculates dataset-specific correction factors, and applies these corrections to each cell, repeating the process until the influence of the dataset diminishes. Performing Harmony only affects the PCA space and does not alter gene expression. Before running Harmony we need to run the PCA function or set “do_pca” to TRUE. We ran PCA above, so we do not need to perform this step again. Harmony will default to attempting 10 rounds of integration. Not all samples will need the full 10 rounds; Harmony stops early once it converges. The following dataset should converge after 5 iterations. Harmony variables: theta: A parameter that controls the diversity within clusters, with higher values leading to more diverse clusters and a value of zero not encouraging any diversity. sigma: Determines the width of soft k-means clusters, with larger values allowing cells to belong to more clusters and smaller values making the clustering approach more rigid. lambda: A penalty parameter for ridge regression that helps prevent overcorrection, where larger values offer more protection, and it can be automatically estimated if set to NULL. 
nclust: Specifies the number of clusters in the model. library(harmony) ## run harmony integration combined_pros <- runGiottoHarmony(combined_pros, vars_use = "list_ID", do_pca = FALSE, sigma = 0.1, theta = 2, lambda = 1, nclust = NULL) After running the Harmony function successfully we can see that the output gobject has a new dim reduction named “harmony”. We can use this for all subsequent spatial steps. Figure 12.13: Data structure of the gobject after running Harmony integration. 12.9.1 Clustering harmonized object We can now perform the same clustering steps as before, but using the “harmony” dim reduction rather than PCA. We will also create new UMAP and nearest network data for the gobject, named differently from before to preserve the original analyses. Reusing the same names would overwrite the original analysis. ## sNN network (default) combined_pros <- createNearestNetwork(gobject = combined_pros, dim_reduction_to_use = "harmony", dim_reduction_name = "harmony", name = "NN.harmony", dimensions_to_use = 1:10, k = 15) ## Leiden clustering combined_pros <- doLeidenCluster(gobject = combined_pros, network_name = "NN.harmony", resolution = 0.2, n_iterations = 1000, name = "leiden_harmony") # UMAP dimension reduction combined_pros <- runUMAP(combined_pros, dim_reduction_name = "harmony", dim_reduction_to_use = "harmony", name = "umap_harmony") spatDimPlot2D(gobject = combined_pros, dim_reduction_to_use = "umap", dim_reduction_name = "umap_harmony", cell_color = "leiden_harmony", show_image = TRUE, image_name = c("NP-image", "CP-image"), spat_point_size = 1, save_param = list(save_name = "leiden_clustering_harmony")) We can see a different UMAP and clustering from those seen in the original steps above. We can again map these onto the tissue spots and see where the clusters are spatially. Figure 12.14: Leiden clustering after Harmony was performed for the normal prostate (left) and the adenocarcinoma prostate sample (right). 
12.9.2 Visualizing the tissue contribution After performing Harmony, the spots from the two tissue samples now cluster together. There is still a cluster that is unique to the adenocarcinoma sample; however, this is expected, as it represents the Visium spots that cover the tumor regions of the tissue, which are not found in the normal tissue. spatDimPlot2D(gobject = combined_pros, dim_reduction_to_use = "umap", dim_reduction_name = "umap_harmony", cell_color = "list_ID", save_plot = TRUE, save_param = list(save_name = "leiden_clustering_harmony_contribution")) Figure 12.15: Tissue contribution for leiden clustering after harmony for the normal prostate (left) and the adenocarcinoma prostate sample (right). "],["spatial-multi-modal-analysis.html", "13 Spatial multi-modal analysis 13.1 Overview 13.2 Spatial manipulation 13.3 Examples of the simple transforms with a giottoPolygon 13.4 Affine transforms 13.5 Image transforms 13.6 The practical usage of multi-modality co-registration", " 13 Spatial multi-modal analysis George Chen Junxiang Xu August 7th 2024 13.1 Overview Spatial multimodal datasets are created when there is more than one modality available for a single piece of tissue. One way that these datasets can be assembled is by performing multiple spatial assays on closely adjacent tissue sections or ideally the same section. However, for these datasets, in addition to the usual expression space integration, we must also first spatially align them. 13.2 Spatial manipulation Performing spatial analyses across any two sections of tissue from the same block requires the data to be spatially aligned into a common coordinate space. Minute differences during the sectioning process, from the cutting motion to how long an FFPE section was floated, can result in even neighboring sections being distorted when compared side-by-side. 
These differences make it difficult to assemble multislice and/or cross platform multimodal datasets into a cohesive 3D volume. The solution for this is to perform registration across either the dataset images or expression information. Based on the registration results, both the raster images and vector feature and polygon information can be aligned into a continuous whole. Ideally this registration will be a free deformation based on sets of control points or a deformation matrix, however affine transforms already provide a good approximation. In either case, the transform or deformation applied must work in the same way across both raster and vector information. Giotto provides spatial classes and methods for easy manipulation of data with 2D affine transformations. These functionalities are all available from GiottoClass. 13.2.1 Spatial transforms: We support simple transformations and more complex affine transformations which can be used to combine and encode more than one simple transform. spatShift() - translations spin() - rotations (degrees) rescale() - scaling flip() - flip vertical or horizontal across arbitrary lines t() - transpose shear() - shear transform affine() - affine matrix transform 13.2.2 Spatial utilities: Helpful functions for use alongside these spatial transforms are ext() for finding the spatial bounding box of where your data object is, crop() for cutting out a spatial region of the data, and plot() for terra/base plots of the data. ext() - spatial extent or bounding box crop() - cut out a spatial region of the data plot() - plot a spatial object 13.2.3 Spatial classes: Giotto’s spatial subobjects respond to the above functions. The Giotto object itself can also be affine transformed. 
spatLocsObj - xy centroids spatialNetworkObj - spatial networks between centroids giottoPoints - xy feature point detections giottoPolygon - spatial polygons giottoImage (mostly deprecated) - magick-based images giottoLargeImage/giottoAffineImage - terra-based images affine2d - affine matrix container giotto - giotto analysis object # load in data library(Giotto) g <- GiottoData::loadGiottoMini("vizgen") activeSpatUnit(g) <- "aggregate" gpoly <- getPolygonInfo(g, return_giottoPolygon = TRUE) gimg <- getGiottoImage(g) 13.3 Examples of the simple transforms with a giottoPolygon rain <- rainbow(nrow(gpoly)) line_width <- 0.3 # par to setup the grid plotting layout p <- par(no.readonly = TRUE) par(mfrow=c(3,3)) gpoly |> plot(main = "no transform", col = rain, lwd = line_width) gpoly |> spatShift(dx = 1000) |> plot(main = "spatShift(dx = 1000)", col = rain, lwd = line_width) gpoly |> spin(45) |> plot(main = "spin(45)", col = rain, lwd = line_width) gpoly |> rescale(fx = 10, fy = 6) |> plot(main = "rescale(fx = 10, fy = 6)", col = rain, lwd = line_width) gpoly |> flip(direction = "vertical") |> plot(main = "flip()", col = rain, lwd = line_width) gpoly |> t() |> plot(main = "t()", col = rain, lwd = line_width) gpoly |> shear(fx = 0.5) |> plot(main = "shear(fx = 0.5)", col = rain, lwd = line_width) par(p) 13.4 Affine transforms The above transforms are all simple to understand in how they work, but you can imagine that performing them in sequence on your dataset can be computationally expensive. Luckily, the above operations are all affine transformations, and they can be condensed into a single step. Affine transforms are those where the x and y values undergo a linear transform, optionally followed by a translation. In 2D, these transforms can all be represented as a 2x2 matrix, or 2x3 if the xy translation values are included. To perform the linear transform, the xy coordinates just need to be matrix multiplied by the 2x2 affine matrix. The resulting values should then be added to the translate values. 
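The arithmetic described above can be sketched in plain base R. This is only an illustration under assumed values: the rotation angle, the translation, and the object names xy, lin and shift are made up for the example, no Giotto functions are involved, and storing one point per row is an assumption of this sketch rather than a statement about Giotto's internals.

```r
# Illustrative only: base R arithmetic, not Giotto's implementation.
# xy coordinates stored one point per row (assumed layout)
xy <- rbind(c(1, 0), c(0, 1), c(2, 2))

# 2x2 linear part: a 90 degree counter-clockwise rotation (assumed example)
theta <- pi / 2
lin <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), nrow = 2)

# translation values that are added after the linear transform
shift <- c(10, 5)

# points are rows, so multiply by the transpose of the 2x2 matrix,
# then add the translation to each column
xy_new <- sweep(xy %*% t(lin), 2, shift, `+`)
round(xy_new, 6)
```

Binding shift as a third column, cbind(lin, shift), gives the 2x3 form mentioned in the text.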
Due to the nature of matrix multiplication, you can simply multiply the affine matrices with each other and when the xy coordinates are multiplied by the resulting matrix, it performs both linear transforms in the same step. Giotto provides a utility affine2d S4 class that can be created from any affine matrix and responds to the affine transform functions to simplify this accumulation of simple transforms. Once done, the affine2d can be applied to spatial objects in a single step using affine() in the same way that you would use a matrix. # create affine2d aff <- affine() # when called without params, this is the same as affine(diag(c(1, 1))) The affine2d object also has an anchor spatial extent, which is used in calculations of the translation values. An affine2d is generated with a default extent, but a specific one matching that of the object you are manipulating (such as that of the giottoPolygon) should be set. aff@anchor <- ext(gpoly) aff <- initialize(aff) # append several simple transforms aff <- aff |> spatShift(dx = 1000) |> spin(45, x0 = 0, y0 = 0) |> # without the x0, y0 params, the extent center is used rescale(10, x0 = 0, y0 = 0) |> # without the x0, y0 params, the extent center is used flip(direction = "vertical") |> t() |> shear(fx = 0.5) force(aff) <affine2d> anchor : 6399.24384990901, 6903.24298517207, -5152.38959073896, -4694.86823300896 (xmin, xmax, ymin, ymax) rotate : -0.785398163397448 (rad) shear : 0.5, 0 (x, y) scale : 10, 10 (x, y) translate : 963.028150700062, 7071.06781186548 (x, y) The show() function displays some information about the stored affine transform, including a set of decomposed simple transformations. You can then plot the affine object and see a projection of the spatial transform where blue is the starting position and red is the end. plot(aff) We can then apply the affine transforms to the giottoPolygon to see that it is indeed in the location and orientation that the projection suggests. 
gpoly |> affine(aff) |> plot(main = "affine()", col = rain, lwd = line_width) 13.5 Image transforms Giotto uses giottoLargeImages as the core image class which is based on terra SpatRaster. Images are not loaded into memory when the object is generated and instead an amount of regular sampling appropriate to the zoom level requested is performed at time of plotting. spatShift() and rescale() operations are supported by terra SpatRaster, and we inherit those functionalities. spin(), flip(), t(), shear(), affine() operations will coerce giottoLargeImage to giottoAffineImage, which is much the same, except it contains an affine2d object that tracks spatial manipulations performed, so that they can be applied through magick::image_distort() processing after sampled values are pulled into memory. giottoAffineImage also has alternative ext() and crop() methods so that those operations respect both the expected post-affine space and un-transformed source image. # affine transform of image info matches with polygon info gimg |> affine(aff) |> plot() gpoly |> affine(aff) |> plot(add = TRUE, border = "cyan", lwd = 0.3) # affine of the giotto object g |> affine(aff) |> spatInSituPlotPoints( show_image = TRUE, feats = list(rna = c("Adgrl1", "Gfap", "Ntrk3", "Slc17a7")), feats_color_code = rainbow(4), polygon_color = "cyan", polygon_line_size = 0.1, point_size = 0.1, use_overlap = FALSE ) Currently giotto image objects are not fully compatible with .ome.tif files. terra, which relies on gdal drivers for image loading, will find that the GTiff driver opens some .ome.tif images, but fails when certain compressions (notably JPEG 2000 as used by 10x for their single-channel stains) are used. 13.6 The practical usage of multi-modality co-registration 13.6.1 Example dataset: Xenium Breast Cancer pre-release pack 10X Genomics released a comprehensive dataset in 2022. 
To capture spatial structure by complementing different spatial resolutions and modalities across different assays, they provided a dataset with Xenium in situ transcriptomics data, together with Visium on closely adjacent sections. Additional IF staining was also performed on the Xenium slides. For more information, please refer to the pre-release dataset page as well as the publication. Visium: H&E histology; 55um spot-level expression with transcriptome coverage. Xenium: H&E histology; IF image staining of DAPI, HER2 and CD20; in situ transcripts; corresponding centroid locations. The goal of creating this multi-modal dataset is to register all the modalities listed above to the same coordinate system as the Xenium in situ transcripts, since those coordinates represent micron distances. library(Giotto) instrs <- createGiottoInstructions(save_dir = file.path(getwd(),'/img/03_session2/'), save_plot = TRUE, show_plot = TRUE) options(timeout = 999999) download_dir <-file.path(getwd(),'/data/03_session2/') destfile <- file.path(download_dir,'Multimodal_registration.zip') if (!dir.exists(download_dir)) { dir.create(download_dir, recursive = TRUE) } download.file('https://zenodo.org/records/13208139/files/Multimodal_registration.zip?download=1', destfile = destfile) unzip(paste0(download_dir,'/Multimodal_registration.zip'), exdir = download_dir) Xenium_dir <- paste0(download_dir,'/Xenium/') Visium_dir <- paste0(download_dir,'/Visium/') 13.6.2 Target Coordinate system Xenium transcripts, polygon information and corresponding centroids are output from the Xenium instrument and are in the same coordinate system from the raw output. We can start with checking the centroid information as a representation of the target coordinate system. 
xen_cell_df <- read.csv(paste0(Xenium_dir,"/cells.csv.gz")) xen_cell_pl <- ggplot2::ggplot() + ggplot2::geom_point(data = xen_cell_df, ggplot2::aes(x = x_centroid , y = y_centroid),size = 1e-150,color = 'orange') + ggplot2::theme_classic() xen_cell_pl 13.6.3 Visium to register Load the Visium directory using the Giotto convenience function. Note that here we are using the “tissue_hires_image.png” as an image to plot. With the convenience function, the hires image and scale factor stored in the spaceranger output will be used for automatic alignment during creation of the Giotto Visium object. spatPlot2D() provided by Giotto randomly samples pixels from the image you provide, so providing a microscopy image as the image input for createGiottoVisiumObject() will improve the visual quality for downstream registration. G_visium <- createGiottoVisiumObject(visium_dir = Visium_dir, gene_column_index = 2, png_name = 'tissue_hires_image.png', instructions = NULL) # In the meantime, calculate statistics for easier plot showing G_visium <- normalizeGiotto(G_visium) G_visium <- addStatistics(G_visium) V_origin <- spatPlot2D(G_visium,show_image = T,point_size = 0,return_plot = T) V_origin The Visium object needs to be transformed to the same orientation as the target coordinate system, so we perform the first transform. # create affine2d aff <- affine(diag(c(1,1))) aff <- aff |> spin(90) |> flip(direction = "horizontal") force(aff) # Apply the transform V_transformed <- affine(G_visium,aff) spatplot_to_register <- spatPlot2D(V_transformed,show_image = T,point_size = 0,return_plot = T) spatplot_to_register Landmarks are a set of paired points that mark the same locations in two different sources. They are very helpful as anchors for creating an affine transformation. For example, after the affine transformation, source landmarks should be as close to target landmarks as possible. 
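One common way to make "as close as possible" concrete is an ordinary least-squares fit. The base R sketch below uses made-up landmark coordinates and a hypothetical helper, fit_affine(). It is not the implementation of any Giotto function, and the 3x3 matrix layout it returns is an assumption chosen for the illustration.

```r
# Hypothetical helper (not a Giotto function): least-squares 2D affine fit.
# source, target: numeric matrices with one landmark per row (columns x, y)
fit_affine <- function(source, target) {
  design <- cbind(source, 1)          # homogeneous coordinates: x, y, 1
  coefs <- qr.solve(design, target)   # 3x2 least-squares solution
  rbind(t(coefs), c(0, 0, 1))         # assemble a 3x3 affine matrix
}

# made-up landmarks: the target is the source scaled by 2 and shifted by (5, -3)
src <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
tgt <- src * 2 + matrix(c(5, -3), nrow = 4, ncol = 2, byrow = TRUE)

aff_mtx <- fit_affine(src, tgt)

# map the source landmarks and measure how far they land from the targets
mapped <- (cbind(src, 1) %*% t(aff_mtx))[, 1:2]
max(abs(mapped - tgt))
```

At least three non-collinear landmark pairs are needed for the fit to be determined. With hand-picked landmarks the residual will generally not be zero, and its size gives a rough indication of how well an affine transform describes the misalignment.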
Since images from different modalities can share similar morphology, the easiest way is to pin landmarks at the morphological features shared between images. Giotto provides an interactive landmark selection tool to pin landmarks. The two input plots can be generated from a ggplot object, a GiottoLargeImage object, or a path to an image you want to register. Note that if you directly provide an image path, you will need to create a separate GiottoLargeImage to perform the transformation, and make sure the GiottoLargeImage has the same coordinate system as shown in the shiny app. landmarks <- interactiveLandmarkSelection(spatplot_to_register, xen_cell_pl) Now, use the landmarks to estimate the transformation matrix needed, and to register the Giotto Visium Object to the target coordinate system. For reproducibility, the landmarks used in the chunk below are loaded from a saved result. landmarks<- readRDS(paste0(Xenium_dir,'/Visium_to_Xen_Landmarks.rds')) affine_mtx <- calculateAffineMatrixFromLandmarks(landmarks[[1]],landmarks[[2]]) V_final <- affine(G_visium,affine_mtx %*% aff@affine) spatplot_final <- spatPlot2D(V_final,show_image = T,point_size = 0,show_plot = F) spatplot_final + ggplot2::geom_point(data = xen_cell_df, ggplot2::aes(x = x_centroid , y = y_centroid),size = 1e-150,color = 'orange') + ggplot2::theme_classic() 13.6.3.1 Create Pseudo Visium dataset for comparison Giotto provides a way to create different shapes at given locations. We can use that to create pseudo-Visium polygons to aggregate transcripts or image intensities. To do that, we will need the centroid locations, which can be obtained using getSpatialLocations(), and also the radius information to create circles. We know that the Visium center-to-center distance is 100um and the spot diameter is 55um, thus we can estimate the radius from the center-to-center distance. And we can use a spatial network created by nearest neighbor = 2 to capture the distance. 
V_final <- createSpatialNetwork(V_final, k = 1) spat_network <- getSpatialNetwork(V_final,output = 'networkDT') spatPlot2D(V_final, show_network = T, network_color = 'blue', point_size = 1) center_to_center <- min(spat_network$distance) radius <- center_to_center*55/200 Now we get the Pseudo Visium polygons Visium_centroid <- getSpatialLocations(V_final,output = 'data.table') stamp_dt <- circleVertices(radius = radius, npoints = 100) pseudo_visium_dt <- polyStamp(stamp_dt, Visium_centroid) pseudo_visium_poly <- createGiottoPolygonsFromDfr(pseudo_visium_dt,calc_centroids = T) plot(pseudo_visium_poly) Create Xenium object with pseudo Visium polygon. To save run time, example shown here only have MS4A1 and ERBB2 genes to create Giotto points xen_transcripts <- data.table::fread(paste0(Xenium_dir,'/Xen_2_genes.csv.gz')) gpoints <- createGiottoPoints(xen_transcripts) Xen_obj <-createGiottoObjectSubcellular(gpoints = list('rna' = gpoints), gpolygons = list('visium' = pseudo_visium_poly)) Get gene expression information by overlapping polygon to points Xen_obj <- calculateOverlap(Xen_obj, feat_info = 'rna', spatial_info = 'visium') Xen_obj <- overlapToMatrix(x = Xen_obj, type = "point", poly_info = "visium", feat_info = "rna", aggr_function = "sum") Manipulate the expression for plotting Xen_obj <- filterGiotto(Xen_obj, feat_type = 'rna', spat_unit = 'visium', expression_threshold = 1, feat_det_in_min_cells = 0, min_det_feats_per_cell = 1) tmp_exprs <- getExpression(Xen_obj, feat_type = 'rna', spat_unit = 'visium', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'rna', spat_unit = 'visium', name = 'plot') spatFeatPlot2D(Xen_obj, point_size = 3.5, expression_values = 'plot', show_image = F, feats = 'ERBB2') Subset the registered Visium and plot same gene #get the extent of giotto points, xmin, xmax, ymin, ymax subset_extent <- ext(gpoints@spatVector) sub_visium <- subsetGiottoLocs(V_final, x_min = subset_extent[1], 
x_max = subset_extent[2], y_min = subset_extent[3], y_max = subset_extent[4]) spatFeatPlot2D(sub_visium, point_size = 2, expression_values = 'scaled', show_image = F, feats = 'ERBB2') 13.6.4 Register post-Xenium H&E and IF image For Xenium instrument output, Giotto provides a convenience function to load the Xenium ranger output. Note that 10X created the affine image alignment file by applying rotation and scale at (0,0) of the top left corner, with translation last. Thus, it will look different from the affine matrix created from landmarks above. In this example, we used a 0.05X compressed ometiff, and the alignment file was also created by first rescaling at 20X, then applying the affine matrix provided by 10X Genomics. HE_xen <- read10xAffineImage(file = paste0(Xenium_dir, "/HE_ome_compressed.tiff"), imagealignment_path = paste0(Xenium_dir,"/Xenium_he_imagealignment.csv"), micron = 0.2125) plot(HE_xen) The image is still on the top left corner, so we flip the image to make it align with the target coordinate system. We can also save the transformed image raster by resampling all pixels from the original image and writing the result to a file on disk for future use. HE_xen <- HE_xen |> flip(direction = "vertical") gimg_rast <- HE_xen@funs$realize_magick(size = prod(dim(HE_xen))) plot(gimg_rast) #terra::writeRaster(gimg_rast@raster_object, filename = output, gdal = "COG" # save as GeoTIFF with extent info) Now we can check the registration results. 
GiottoVisuals provides a function to add a giottoLargeImage to a ggplot object so that additional ggplot layers can be plotted on top of it. gg <- ggplot2::ggplot() pl <- GiottoVisuals::gg_annotation_raster(gg,gimg_rast) pl + ggplot2::geom_smooth() + ggplot2::geom_point(data = xen_cell_df, ggplot2::aes(x = x_centroid , y = y_centroid),size = 1e-150,color = 'orange') + ggplot2::theme_classic() 13.6.4.1 Add registered image information and compare RNA vs protein expression With the strategy described above, the affine transformed image can be saved and used for quantitative analysis. Here, we can use the same strategy as when dealing with spatial proteomics data for IF. CD20_gimg <- createGiottoLargeImage(paste0(Xenium_dir,'/CD20_registered.tiff'), use_rast_ext = T,name = 'CD20') HER2_gimg <- createGiottoLargeImage(paste0(Xenium_dir,'/HER2_registered.tiff'), use_rast_ext = T,name = 'HER2') Xen_obj <- addGiottoLargeImage(gobject = Xen_obj, largeImages = list('CD20' = CD20_gimg,'HER2' = HER2_gimg)) Get the cell polygons, as Xenium and IF are both subcellular resolution. cellpoly_dt <- data.table::fread(paste0(Xenium_dir,'/cell_boundaries.csv.gz')) colnames(cellpoly_dt) <- c('poly_ID','x','y') cellpoly <- createGiottoPolygonsFromDfr(cellpoly_dt) Xen_obj <- addGiottoPolygons(Xen_obj,gpolygons = list('cell' = cellpoly)) Compute the gene expression matrix by overlaying the cell polygons and giotto points. 
Xen_obj <- calculateOverlap(Xen_obj, feat_info = 'rna', spatial_info = 'cell') Xen_obj <- overlapToMatrix(x = Xen_obj, type = "point", poly_info = "cell", feat_info = "rna", aggr_function = "sum") tmp_exprs <- getExpression(Xen_obj, feat_type = 'rna', spat_unit = 'cell', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'rna', spat_unit = 'cell', name = 'plot') spatFeatPlot2D(Xen_obj, feat_type = 'rna', expression_values = 'plot', spat_unit = 'cell', feats = 'ERBB2', point_size = 0.05) Now we overlay the HER2 expression from the raster image with the cell polygons. Xen_obj <- calculateOverlap(Xen_obj, spatial_info = 'cell', image_names = c('HER2','CD20')) Xen_obj <- overlapToMatrix(x = Xen_obj, type = "intensity", poly_info = "cell", feat_info = "protein", aggr_function = "sum") tmp_exprs <- getExpression(Xen_obj, feat_type = 'protein', spat_unit = 'cell', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'protein', spat_unit = 'cell', name = 'plot') spatFeatPlot2D(Xen_obj, feat_type = 'protein', expression_values = 'plot', spat_unit = 'cell', feats = 'HER2', point_size = 0.05) We can also overlay the protein expression to Visium spots Xen_obj <- calculateOverlap(Xen_obj, spatial_info = 'visium', image_names = c('HER2','CD20')) Xen_obj <- overlapToMatrix(x = Xen_obj, type = "intensity", poly_info = "visium", feat_info = "protein", aggr_function = "sum") Xen_obj <- filterGiotto(Xen_obj, feat_type = 'protein', spat_unit = 'visium', expression_threshold = 1, feat_det_in_min_cells = 0, min_det_feats_per_cell = 1) tmp_exprs <- getExpression(Xen_obj, feat_type = 'protein', spat_unit = 'visium', output = 'matrix') Xen_obj <- setExpression(Xen_obj, x = createExprObj(log(tmp_exprs+1)), feat_type = 'protein', spat_unit = 'visium', name = 'plot') spatFeatPlot2D(Xen_obj, feat_type = 'protein', expression_values = 'plot', spat_unit = 'visium', feats = 'HER2', point_size = 2) 
13.6.5 Automatic alignment via SIFT feature descriptor matching and affine transformation Pinning landmarks or using compounded affine transforms to register images usually provides initial registration results. However, recording landmarks or manually combining transformations requires a lot of manual effort, and it becomes impractical when there is a large number of images to register. As long as accurate landmarks are provided, registration can easily be performed automatically. Here we provide a wrapper function for the scale-invariant feature transform (SIFT). SIFT will first identify the extreme points in different scale spaces from paired images, then use a brute-force approach to match the points. The matched points can then be used to estimate the transform and warp the image. The major drawback is that as the dimensions of the image grow, the computing time increases sharply. Here, we provide an example of two compressed images to show the automatic alignment pipeline. HE <- createGiottoLargeImage(paste0(Xenium_dir,'/mini_HE.png'),negative_y = F) plot(HE) IF <- createGiottoLargeImage(paste0(Xenium_dir,'/mini_IF.tif'),negative_y = F,flip_horizontal = T) terra::plotRGB(IF@raster_object,r=1, g=2, b=3, stretch="lin") Now, we can use the automated transformation pipeline. Note that we start with a path to the images and run preprocessImageToMatrix() first to meet the requirements of the estimateAutomatedImageRegistrationWithSIFT() function. The images will be preprocessed to grayscale. For that purpose, we use the DAPI channel from the mini IF image, and set invert = T for the mini HE image so that the grayscale HE will have higher values for high intensity pixels. The function will output an estimation of the transform. 
estimation <- estimateAutomatedImageRegistrationWithSIFT(x = preprocessImageToMatrix(paste0(Xenium_dir,'/mini_IF.tif'), flip_horizontal = T, use_single_channel = T, single_channel_number = 3), y = preprocessImageToMatrix(paste0(Xenium_dir,'/mini_HE.png'), invert = T), plot_match = T, max_ratio = 0.5,estimate_fun = 'Projective') Using the estimation, we can quickly visualize the transformation. mtx <- as.matrix(estimation$params) transformed <- affine(IF, mtx) To_see_overlay <- transformed@funs$realize_magick(size = 2e6) plot(HE) plot(To_see_overlay@raster_object[[2]], add=TRUE, alpha=0.5) 13.6.6 Final Notes Image registration is becoming crucial for spatial multi modal analysis. The methods included here are not the only ways to register images, and each of them may have drawbacks for a good alignment. Multiple tools are emerging in the field with different strategies, including easier landmark detection, deformable transformation, as well as matching spatial patterns, etc. Some of them provide transformed images or coordinates that can be directly loaded to Giotto as a multimodal object using a standard pipeline. "],["multi-omics-integration.html", "14 Multi-omics integration 14.1 The CytAssist technology 14.2 Introduction to the spatial dataset 14.3 Download dataset 14.4 Create the Giotto object 14.5 Subset on spots that were covered by tissue 14.6 RNA processing 14.7 Protein processing 14.8 Multi-omics integration 14.9 Session info", " 14 Multi-omics integration Joselyn Cristina Chávez Fuentes August 7th 2024 14.1 The CytAssist technology The Visium CytAssist Spatial Gene and Protein Expression assay is designed to introduce simultaneous Gene Expression and Protein Expression analysis to FFPE samples processed with Visium CytAssist. The assay uses NGS to measure the abundance of oligo-tagged antibodies with spatial resolution, in addition to the whole transcriptome and a morphological image. Figure 14.1: CytAssist multi-omics diagram. 
Source: 10X genomics. The 10X human immune cell profiling panel features 35 antibodies from Abcam and Biolegend, and includes cell surface and intracellular targets. The rna probes hybridize to ~18,000 genes, or RNA targets, within the tissue section to achieve whole transcriptome gene expression profiling. The remaining steps, starting with probe extension, follow the standard Visium workflow outside of the instrument. Figure 14.2: CytAssist workflow. Source: 10X genomics. 14.2 Introduction to the spatial dataset The Human Tonsil (FFPE) dataset was obtained from 10X Genomics. The tissue was sectioned as described in Visium CytAssist Spatial Gene and Protein Expression for FFPE – Tissue Preparation Guide (CG000660). 5 µm tissue sections were placed on Superfrost glass slides, deparaffinized, H&E stained (CG000658) and coverslipped. Sections were imaged, decoverslipped, followed by decrosslinking per the Staining Demonstrated Protocol (CG000658). More information about this dataset can be found here. 14.3 Download dataset You need to download the expression matrix and spatial information by running these commands: dir.create("data/03_session3") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FFPE_Protein_Expression_Human_Tonsil/CytAssist_FFPE_Protein_Expression_Human_Tonsil_raw_feature_bc_matrix.tar.gz", destfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_raw_feature_bc_matrix.tar.gz") download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FFPE_Protein_Expression_Human_Tonsil/CytAssist_FFPE_Protein_Expression_Human_Tonsil_spatial.tar.gz", destfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_spatial.tar.gz") After downloading, unzip the gz files. You should get the “raw_feature_bc_matrix” and “spatial” folders inside “data/03_session3/”. 
untar(tarfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_raw_feature_bc_matrix.tar.gz", exdir = "data/03_session3") untar(tarfile = "data/03_session3/CytAssist_FFPE_Protein_Expression_Human_Tonsil_spatial.tar.gz", exdir = "data/03_session3") 14.4 Create the Giotto object The minimum requirements are: a matrix with expression information (or the path to it) and x,y(,z) coordinates for cells or spots (or the path to them). createGiottoVisiumObject() will automatically detect both RNA and Protein modalities in the expression matrix and will create a multi-omics Giotto object. library(Giotto) ## Set instructions results_folder <- "results/03_session3/" python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE, python_path = python_path ) # Provide the path to the data folder data_path <- "data/03_session3/" # Create object directly from the data folder visium_tonsil <- createGiottoVisiumObject( visium_dir = data_path, expr_data = "raw", png_name = "tissue_lowres_image.png", gene_column_index = 2, instructions = instructions ) Print the information of the object; note that both rna and protein are listed in the expression slot. visium_tonsil 14.5 Subset on spots that were covered by tissue spatPlot2D( gobject = visium_tonsil, cell_color = "in_tissue", point_size = 2, cell_color_code = c("0" = "lightgrey", "1" = "blue"), show_image = TRUE, image_name = "image" ) Figure 14.3: Spatial plot of the CytAssist human tonsil sample, color indicates whether the spot is in tissue (1) or not (0). Use the metadata table to identify the spots corresponding to the tissue area, given by the “in_tissue” column. Then use the spot IDs to subset the giotto object. 
metadata <- getCellMetadata(gobject = visium_tonsil, output = "data.table") in_tissue_barcodes <- metadata[in_tissue == 1]$cell_ID visium_tonsil <- subsetGiotto(visium_tonsil, cell_ids = in_tissue_barcodes) 14.6 RNA processing Run the Filtering, normalization, and statistics steps using only the RNA feature. visium_tonsil <- filterGiotto( gobject = visium_tonsil, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1000, expression_values = "raw", verbose = TRUE) visium_tonsil <- normalizeGiotto(gobject = visium_tonsil, scalefactor = 6000, verbose = TRUE) visium_tonsil <- addStatistics(gobject = visium_tonsil) Dimension reduction Identify the highly variable features using the RNA features, then calculate the principal components based on the HVFs. visium_tonsil <- calculateHVF(gobject = visium_tonsil) visium_tonsil <- runPCA(gobject = visium_tonsil) Clustering Calculate the UMAP, tSNE, and shared nearest neighbor network using the first 10 principal components for the RNA modality. visium_tonsil <- runUMAP(visium_tonsil, dimensions_to_use = 1:10) visium_tonsil <- runtSNE(visium_tonsil, dimensions_to_use = 1:10) visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, dimensions_to_use = 1:10, k = 30) Calculate the RNA-based Leiden clusters. visium_tonsil <- doLeidenCluster(gobject = visium_tonsil, resolution = 1, n_iterations = 1000) Visualization Plot the RNA-based UMAP with the corresponding RNA-based Leiden cluster per spot. plotUMAP(gobject = visium_tonsil, cell_color = "leiden_clus", show_NN_network = TRUE, point_size = 2) Figure 14.4: RNA UMAP, color indicates the RNA-based Leiden clusters. Plot the spatial distribution of the RNA-based Leiden cluster per spot. spatPlot2D(gobject = visium_tonsil, show_image = TRUE, cell_color = "leiden_clus", point_size = 3) Figure 14.5: Spatial distribution of RNA-based Leiden clusters. 14.7 Protein processing Run the Filtering, normalization, and statistics steps for the protein modality. 
visium_tonsil <- filterGiotto(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1, expression_values = "raw", verbose = TRUE) visium_tonsil <- normalizeGiotto(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", scalefactor = 6000, verbose = TRUE) visium_tonsil <- addStatistics(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein") Dimension reduction Calculate the principal components using all the proteins available in the dataset. visium_tonsil <- runPCA(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein") Clustering Calculate the UMAP, tSNE, and shared nearest neighbors network using the first 10 principal components for the Protein modality. visium_tonsil <- runUMAP(visium_tonsil, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:10) visium_tonsil <- runtSNE(visium_tonsil, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:10) visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", dimensions_to_use = 1:10, k = 30) Calculate the Protein-based Leiden clusters. visium_tonsil <- doLeidenCluster(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", resolution = 1, n_iterations = 1000) Visualization Plot the Protein UMAP and color the spots using the Protein-based Leiden clusters. plotUMAP(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", cell_color = "leiden_clus", show_NN_network = TRUE, point_size = 2) Figure 14.6: Protein UMAP, color indicates the Protein-based Leiden clusters. Plot the spatial distribution of the Protein-based Leiden clusters. spatPlot2D(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", show_image = TRUE, cell_color = "leiden_clus", point_size = 3) Figure 14.7: Spatial distribution of Protein-based Leiden clusters. 
14.8 Multi-omics integration Calculate kNN Calculate the k nearest neighbors network for each modality (RNA and Protein), using the first 10 principal components of each feature type. ## RNA modality visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, type = "kNN", dimensions_to_use = 1:10, k = 20) ## Protein modality visium_tonsil <- createNearestNetwork(gobject = visium_tonsil, spat_unit = "cell", feat_type = "protein", type = "kNN", dimensions_to_use = 1:10, k = 20) Run WNN Run the Weighted Nearest Neighbor analysis to weight the contribution of each feature type per spot. The results will be saved in the multiomics slot of the giotto object. visium_tonsil <- runWNN(visium_tonsil, spat_unit = "cell", modality_1 = "rna", modality_2 = "protein", pca_name_modality_1 = "pca", pca_name_modality_2 = "protein.pca", k = 20, integrated_feat_type = NULL, matrix_result_name = NULL, w_name_modality_1 = NULL, w_name_modality_2 = NULL, verbose = TRUE) Run Integrated umap Calculate the UMAP using the weights of each feature per spot. visium_tonsil <- runIntegratedUMAP(visium_tonsil, modality1 = "rna", modality2 = "protein", spread = 7, min_dist = 1, force = FALSE) Calculate integrated clusters Calculate the multiomics-based Leiden clusters using the weights of each feature per spot. visium_tonsil <- doLeidenCluster(gobject = visium_tonsil, spat_unit = "cell", feat_type = "rna", nn_network_to_use = "kNN", network_name = "integrated_kNN", name = "integrated_leiden_clus", resolution = 1) Visualize the integrated UMAP Plot the integrated UMAP and color the spots using the integrated Leiden clusters. plotUMAP(gobject = visium_tonsil, spat_unit = "cell", feat_type = "rna", cell_color = "integrated_leiden_clus", dim_reduction_name = "integrated.umap", point_size = 2, title = "Integrated UMAP using Integrated Leiden clusters") Figure 14.8: Integrated UMAP. Color represents the integrated Leiden clusters. 
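The idea of weighting each modality per spot can be sketched with a toy example. The code below is only an illustration of combining two modality-specific similarity matrices with per-spot weights; it is not the algorithm used by `runWNN()`, and the embeddings, the `similarity()` helper, and the weights are all hypothetical (in a real analysis the weights are estimated from the data rather than set by hand).

```r
set.seed(1)
n_spots <- 5

# Toy embeddings standing in for the RNA and protein PCA spaces
rna_pca <- matrix(rnorm(n_spots * 3), nrow = n_spots)
protein_pca <- matrix(rnorm(n_spots * 3), nrow = n_spots)

# Hypothetical helper: turn Euclidean distances into similarities in (0, 1]
similarity <- function(embedding) exp(-as.matrix(dist(embedding)))

# Made-up per-spot weights; the two weights of a spot sum to 1
w_rna <- c(0.8, 0.6, 0.5, 0.3, 0.1)
w_protein <- 1 - w_rna

# Row i of the combined matrix is scaled by the weights of spot i
combined <- w_rna * similarity(rna_pca) + w_protein * similarity(protein_pca)

# k most similar spots for each spot, skipping the spot itself (first in order)
k <- 2
integrated_knn <- t(apply(combined, 1, function(s) {
  order(s, decreasing = TRUE)[2:(k + 1)]
}))
```

A spot with a high RNA weight finds its neighbors mostly through the RNA space, while a spot with a low RNA weight relies mostly on the protein space.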
Visualize spatial plot with integrated clusters Plot the spatial distribution of the integrated Leiden clusters. spatPlot2D(visium_tonsil, spat_unit = "cell", feat_type = "rna", cell_color = "integrated_leiden_clus", point_size = 3, show_image = TRUE, title = "Integrated Leiden clustering") Figure 14.9: Spatial distribution of the integrated Leiden clusters. 14.9 Session info sessionInfo() R version 4.4.1 (2024-06-14) Platform: aarch64-apple-darwin20 Running under: macOS Sonoma 14.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Giotto_4.1.0 GiottoClass_0.3.3 loaded via a namespace (and not attached): [1] colorRamp2_0.1.0 deldir_2.0-4 [3] rlang_1.1.4 magrittr_2.0.3 [5] RcppAnnoy_0.0.22 GiottoUtils_0.1.10 [7] matrixStats_1.3.0 compiler_4.4.1 [9] png_0.1-8 systemfonts_1.1.0 [11] vctrs_0.6.5 reshape2_1.4.4 [13] stringr_1.5.1 pkgconfig_2.0.3 [15] SpatialExperiment_1.14.0 crayon_1.5.3 [17] fastmap_1.2.0 backports_1.5.0 [19] magick_2.8.4 XVector_0.44.0 [21] labeling_0.4.3 utf8_1.2.4 [23] rmarkdown_2.27 UCSC.utils_1.0.0 [25] ragg_1.3.2 purrr_1.0.2 [27] xfun_0.46 beachmat_2.20.0 [29] zlibbioc_1.50.0 GenomeInfoDb_1.40.1 [31] jsonlite_1.8.8 DelayedArray_0.30.1 [33] BiocParallel_1.38.0 terra_1.7-78 [35] irlba_2.3.5.1 parallel_4.4.1 [37] R6_2.5.1 stringi_1.8.4 [39] RColorBrewer_1.1-3 reticulate_1.38.0 [41] parallelly_1.37.1 GenomicRanges_1.56.1 [43] scattermore_1.2 Rcpp_1.0.13 [45] bookdown_0.40 SummarizedExperiment_1.34.0 [47] knitr_1.48 future.apply_1.11.2 [49] R.utils_2.12.3 IRanges_2.38.1 [51] Matrix_1.7-0 igraph_2.0.3 [53] 
tidyselect_1.2.1 rstudioapi_0.16.0 [55] abind_1.4-5 yaml_2.3.9 [57] codetools_0.2-20 listenv_0.9.1 [59] lattice_0.22-6 tibble_3.2.1 [61] plyr_1.8.9 Biobase_2.64.0 [63] withr_3.0.0 Rtsne_0.17 [65] evaluate_0.24.0 future_1.33.2 [67] pillar_1.9.0 MatrixGenerics_1.16.0 [69] checkmate_2.3.1 stats4_4.4.1 [71] plotly_4.10.4 generics_0.1.3 [73] dbscan_1.2-0 sp_2.1-4 [75] S4Vectors_0.42.1 ggplot2_3.5.1 [77] munsell_0.5.1 scales_1.3.0 [79] globals_0.16.3 gtools_3.9.5 [81] glue_1.7.0 lazyeval_0.2.2 [83] tools_4.4.1 GiottoVisuals_0.2.4 [85] data.table_1.15.4 ScaledMatrix_1.12.0 [87] cowplot_1.1.3 grid_4.4.1 [89] tidyr_1.3.1 colorspace_2.1-0 [91] SingleCellExperiment_1.26.0 GenomeInfoDbData_1.2.12 [93] BiocSingular_1.20.0 rsvd_1.0.5 [95] cli_3.6.3 textshaping_0.4.0 [97] fansi_1.0.6 S4Arrays_1.4.1 [99] viridisLite_0.4.2 dplyr_1.1.4 [101] uwot_0.2.2 gtable_0.3.5 [103] R.methodsS3_1.8.2 digest_0.6.36 [105] BiocGenerics_0.50.0 SparseArray_1.4.8 [107] ggrepel_0.9.5 farver_2.1.2 [109] rjson_0.2.21 htmlwidgets_1.6.4 [111] htmltools_0.5.8.1 R.oo_1.26.0 [113] lifecycle_1.0.4 httr_1.4.7 "],["interoperability-with-other-frameworks.html", "15 Interoperability with other frameworks 15.1 Load Giotto object 15.2 Seurat 15.3 SpatialExperiment 15.4 AnnData 15.5 Create mini Vizgen object", " 15 Interoperability with other frameworks Iqra August 7th 2024 Giotto facilitates seamless interoperability with various tools, including Seurat, AnnData, and SpatialExperiment. Below is a comprehensive tutorial on how Giotto interoperates with these other tools. 15.1 Load Giotto object To begin demonstrating the interoperability of a Giotto object with other frameworks, we first load the required libraries and a Giotto mini object. We then proceed with the conversion process: library(Giotto) library(GiottoData) Load a Giotto mini Visium object, which will be used for demonstrating interoperability. 
gobject <- GiottoData::loadGiottoMini("visium") 15.2 Seurat Giotto Suite provides interoperability between Seurat and Giotto, supporting both older and newer versions of Seurat objects. The four tailored functions are giottoToSeuratV4() and seuratToGiottoV4() for older versions, and giottoToSeuratV5() and seuratToGiottoV5() for Seurat v5, which include subcellular and image information. These functions map Giotto’s metadata, dimension reductions, spatial locations, and images to the corresponding slots in Seurat. 15.2.1 Conversion of Giotto Object to Seurat Object To convert a Giotto object to a Seurat V5 object, we first load the required libraries and then use the giottoToSeuratV5() function. library(Seurat) library(SeuratData) library(ggplot2) library(patchwork) library(dplyr) Now we convert the Giotto object to a Seurat V5 object and create violin and spatial feature plots to visualize the RNA count data. gToS <- giottoToSeuratV5(gobject = gobject, spat_unit = "cell") plot1 <- VlnPlot(gToS, features = "nCount_rna", pt.size = 0.1) + NoLegend() plot2 <- SpatialFeaturePlot(gToS, features = "nCount_rna", pt.size.factor = 2) + theme(legend.position = "right") wrap_plots(plot1, plot2) 15.2.1.1 Apply SCTransform We apply the SCTransform() function to transform the data in the RNA assay. gToS <- SCTransform(gToS, assay = "rna", verbose = FALSE) 15.2.1.2 Dimension Reduction We perform Principal Component Analysis (PCA), find neighbors, and run UMAP for dimensionality reduction and clustering on the transformed Seurat object: gToS <- RunPCA(gToS, assay = "SCT") gToS <- FindNeighbors(gToS, reduction = "pca", dims = 1:30) gToS <- RunUMAP(gToS, reduction = "pca", dims = 1:30) 15.2.2 Conversion of Seurat object Back to Giotto Object To convert the Seurat object back to a Giotto object, we use the function seuratToGiottoV5(), specifying the spatial assay, dimensionality reduction techniques, and spatial and nearest neighbor networks. 
giottoFromSeurat <- seuratToGiottoV5(sobject = gToS, spatial_assay = "rna", dim_reduction = c("pca", "umap"), sp_network = "Delaunay_network", nn_network = c("sNN.pca", "custom_NN" )) 15.2.2.1 Clustering and Plotting UMAP Here we perform K-means clustering on the PCA results obtained from the Seurat object and plot the clusters on the UMAP: ## k-means clustering giottoFromSeurat <- doKmeans(gobject = giottoFromSeurat, dim_reduction_to_use = "pca") #Plot UMAP post-clustering to visualize kmeans graph2 <- Giotto::plotUMAP( gobject = giottoFromSeurat, cell_color = "kmeans", show_NN_network = TRUE, point_size = 2.5 ) 15.2.2.2 Spatial CoExpression We can also use the binSpect() function to analyze spatial co-expression using the Delaunay spatial network from the Seurat object, and then visualize the spatial co-expression using the heatmSpatialCorFeats() function: ranktest <- binSpect(giottoFromSeurat, bin_method = "rank", calc_hub = TRUE, hub_min_int = 5, spatial_network_name = "Delaunay_network") ext_spatial_genes <- ranktest[1:300,]$feats spat_cor_netw_DT <- detectSpatialCorFeats( giottoFromSeurat, method = "network", spatial_network_name = "Delaunay_network", subset_feats = ext_spatial_genes) top10_genes <- showSpatialCorFeats(spat_cor_netw_DT, feats = "Dsp", show_top_feats = 10) spat_cor_netw_DT <- clusterSpatialCorFeats(spat_cor_netw_DT, name = "spat_netw_clus", k = 7) heatmSpatialCorFeats( giottoFromSeurat, spatCorObject = spat_cor_netw_DT, use_clus_name = "spat_netw_clus", heatmap_legend_param = list(title = NULL), save_plot = TRUE, show_plot = TRUE, return_plot = FALSE, save_param = list(base_height = 6, base_width = 8, units = 'cm')) 15.3 SpatialExperiment For the Bioconductor group of packages, the SpatialExperiment data container handles data from spatial-omics experiments, including spatial coordinates, images, and metadata. Giotto Suite provides giottoToSpatialExperiment() and spatialExperimentToGiotto(), mapping Giotto’s slots to the corresponding SpatialExperiment slots. 
Since SpatialExperiment can only store one spatial unit at a time, giottoToSpatialExperiment() returns a list of SpatialExperiment objects, each representing a distinct spatial unit. To start the conversion of a Giotto mini Visium object to a SpatialExperiment object, we first load the required libraries. library(SpatialExperiment) library(ggspavis) library(pheatmap) library(scater) library(scran) library(nnSVG) 15.3.1 Convert Giotto Object to SpatialExperiment Object To convert the Giotto object to a SpatialExperiment object, we use the giottoToSpatialExperiment() function. gspe <- giottoToSpatialExperiment(gobject) The conversion function returns a separate SpatialExperiment object for each spatial unit. We select the first object for downstream use: spe <- gspe[[1]] 15.3.1.1 Identify top spatially variable genes with nnSVG We employ the nnSVG package to identify the top spatially variable genes in our SpatialExperiment object. Covariates can be added to our model; in this example, we use Leiden clustering labels as a covariate: # One of the assays should be "logcounts" # We rename the normalized assay to "logcounts" assayNames(spe)[[2]] <- "logcounts" # Create model matrix for leiden clustering labels X <- model.matrix(~ colData(spe)$leiden_clus) dim(X) Run nnSVG This step will take several minutes to run spe <- nnSVG(spe, X = X) # Show top 10 features rowData(spe)[order(rowData(spe)$rank)[1:10], ]$feat_ID 15.3.2 Conversion of SpatialExperiment object back to Giotto We then convert the processed SpatialExperiment object back into a Giotto object for further downstream analysis using the Giotto suite. This is done using the spatialExperimentToGiotto function, where we explicitly specify the spatial network from the SpatialExperiment object. 
giottoFromSPE <- spatialExperimentToGiotto(spe = spe, python_path = NULL, sp_network = "Delaunay_network") print(giottoFromSPE) 15.3.2.1 Plotting top genes from nnSVG in Giotto Now, using the converted Giotto object, we visualize in Giotto the genes previously identified with the nnSVG package in the SpatialExperiment object. ext_spatial_genes <- getFeatureMetadata(giottoFromSPE, output = "data.table") ext_spatial_genes <- ext_spatial_genes[order(ext_spatial_genes$rank)[1:10], ]$feat_ID spatFeatPlot2D(giottoFromSPE, expression_values = "scaled_rna_cell", feats = ext_spatial_genes[1:4], point_size = 2) 15.4 AnnData The anndataToGiotto() and giottoToAnnData() functions map the slots of the Giotto object to the corresponding locations in a Squidpy-flavored AnnData object. Specifically, Giotto’s expression slot maps to adata.X, spatial_locs to adata.obsm, cell_metadata to adata.obs, feat_metadata to adata.var, dimension_reduction to adata.obsm, and nn_network and spat_network to adata.obsp. Images are currently not mapped between the two classes. Notably, Giotto stores expression matrices within separate spatial units and feature types, while AnnData does not support this hierarchical data storage. Consequently, multiple AnnData objects are created from a Giotto object when there are multiple spatial unit and feature type pairs. 15.4.1 Load Required Libraries To start, we need to load the necessary libraries, including reticulate for interfacing with Python. library(reticulate) 15.4.2 Specify Path for Results First, we specify the directory where the results will be saved. Additionally, we retrieve and update Giotto instructions. 
# Specify path to which results may be saved results_directory <- "results/03_session4/giotto_anndata_conversion/" instrs <- showGiottoInstructions(gobject) mini_gobject <- replaceGiottoInstructions(gobject = gobject, instructions = instrs) 15.4.2.1 Create Default kNN Network We will create a k-nearest neighbor (kNN) network using mostly default parameters. gobject <- createNearestNetwork(gobject = gobject, spat_unit = "cell", feat_type = "rna", type = "kNN", dim_reduction_to_use = "umap", dim_reduction_name = "umap", k = 15, name = "kNN.umap") 15.4.3 Giotto To AnnData To convert the Giotto object to AnnData, we use Giotto’s giottoToAnnData() function. gToAnnData <- giottoToAnnData(gobject = gobject, save_directory = results_directory) Next, we import scanpy and perform a series of preprocessing steps on the AnnData object. scanpy <- import("scanpy") adata <- scanpy$read_h5ad("results/03_session4/giotto_anndata_conversion/cell_rna_converted_gobject.h5ad") # Normalize total counts per cell scanpy$pp$normalize_total(adata, target_sum=1e4) # Log-transform the data scanpy$pp$log1p(adata) # Perform PCA scanpy$pp$pca(adata, n_comps=40L) # Compute the neighborhood graph scanpy$pp$neighbors(adata, n_neighbors=10L, n_pcs=40L) # Run UMAP scanpy$tl$umap(adata) # Save the processed AnnData object adata$write("results/03_session4/cell_rna_converted_gobject2.h5ad") processed_file_path <- "results/03_session4/cell_rna_converted_gobject2.h5ad" 15.4.4 Convert AnnData to Giotto Finally, we convert the processed AnnData object back into a Giotto object for further analysis using Giotto. giottoFromAnndata <- anndataToGiotto(anndata_path = processed_file_path) 15.4.4.1 UMAP Visualization Now we use the GiottoVisuals::plotUMAP() function to plot the UMAP that was calculated using Scanpy on the AnnData object. 
Giotto::plotUMAP( gobject = giottoFromAnndata, dim_reduction_name = "umap.ad", cell_color = "leiden_clus", point_size = 3 ) 15.5 Create mini Vizgen object mini_gobject <- loadGiottoMini(dataset = "vizgen", python_path = NULL) mini_gobject <- replaceGiottoInstructions(gobject = mini_gobject, instructions = instrs) mini_gobject <- createNearestNetwork(gobject = mini_gobject, spat_unit = "aggregate", feat_type = "rna", type = "kNN", dim_reduction_to_use = "umap", dim_reduction_name = "umap", k = 6, name = "new_network") Since we have multiple spat_unit and feat_type pairs, this function will create multiple .h5ad files, with their names returned. Non-default nearest or spatial network names will have their key_added terms recorded and saved in corresponding .txt files; refer to the documentation for details. anndata_conversions <- giottoToAnnData(gobject = mini_gobject, save_directory = results_directory, python_path = NULL) "],["interoperability-with-isolated-tools.html", "16 Interoperability with isolated tools 16.1 Spatial niche trajectory analysis (ONTraC) 16.2 Session info", " 16 Interoperability with isolated tools Wen Wang August 7th 2024 16.1 Spatial niche trajectory analysis (ONTraC) 16.1.1 Introduction to ONTraC ONTraC (Ordered Niche Trajectory Construction) is a niche-centered, machine learning method for constructing spatially continuous trajectories. ONTraC differs from existing tools in that it treats a niche, rather than an individual cell, as the basic unit for spatial trajectory analysis. In this context, we define niche as a multicellular, spatially localized region where different cell types may coexist and interact with each other. ONTraC seamlessly integrates cell-type composition and spatial information by using the graph neural network modeling framework. Its output, which is called the niche trajectory, can be viewed as a one dimensional representation of the tissue microenvironment continuum. 
By disentangling cell-level and niche-level properties, niche trajectory analysis provides a coherent framework to study coordinated responses from all the cells in association with continuous tissue microenvironment variations. ONTraC paper ONTraC GitHub repository PPT 16.1.2 Introduction to MERFISH MERFISH is a massively multiplexed single-molecule imaging technology for spatially resolved transcriptomics capable of simultaneously measuring the copy number and spatial distribution of hundreds to tens of thousands of RNA species in individual cells. For further information, please visit the official website. 16.1.3 Settings options(timeout=Inf) # In case of network interrupt data_path <- file.path("data","03_session5") dir.create(data_path, recursive=T) results_folder <- file.path("results","03_session5") dir.create(results_folder, recursive=T) 16.1.4 Dataset This is a MERFISH mouse motor cortex dataset comprising 61 tissue sections and containing approximately 280,000 cells characterised by a 258-gene panel. The study identified 3 classes of cells (glutamatergic, GABAergic, and non-neuronal), which were further clustered into 23 annotated plus 1 other subclass-level cell types. Pseudotime-based methods can generate one-dimensional coordinates for specific lineages but lack the ability to generate trajectories for whole samples. By moving our focus from the cell to the niche (local microenvironment), ONTraC can generate niche trajectories for whole samples and map the NT score to each cell. 16.1.4.1 Dataset download The MERFISH mouse motor cortex data to run this tutorial can be found here. You need to download the processed expression, metadata, and cell segmentation information by running these commands: Note 1: there are 61 slices here; we run on two of them to save time. Note 2: because the network connection can be unstable, the download may be interrupted. We recommend downloading these data in advance or downloading the processed Giotto object from Zenodo. 
download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/counts.h5ad", destfile = file.path(data_path,"counts.h5ad")) download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/cell_labels.csv", destfile = file.path(data_path,"cell_labels.csv")) download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/segmented_cells_mouse2sample1.csv", destfile = file.path(data_path,"segmented_cells_mouse2sample1.csv")) download.file(url = "https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021/processed_data/segmented_cells_mouse2sample6.csv", destfile = file.path(data_path,"segmented_cells_mouse2sample6.csv")) 16.1.5 Create the Giotto object library(Giotto) library(reticulate) ## Set instructions python_path <- NULL instructions <- createGiottoInstructions( save_dir = results_folder, save_plot = TRUE, show_plot = FALSE, return_plot = FALSE, python_path = python_path ) ## create Giotto object from expression counts. This file contains 61 slices here. 
giotto_all_slices_obj <- anndataToGiotto(file.path(data_path, "counts.h5ad")) ## load meta_data meta_df <- read.csv(file.path(data_path, "cell_labels.csv"), colClasses = "character") # as the cell IDs are 30 digit numbers, set the type as character to avoid the limitation of R in handling larger integers colnames(meta_df)[[1]] <- "cell_ID" ### we use two slices here to speed up slice1_cells <- meta_df[meta_df$slice_id == "mouse2_slice229",]$cell_ID slice2_cells <- meta_df[meta_df$slice_id == "mouse2_slice300",]$cell_ID selected_cells <- c(slice1_cells, slice2_cells) ## subset giotto obj by cell ID giotto_slice1_obj <- subsetGiotto(gobject = giotto_all_slices_obj, cell_ids = slice1_cells) giotto_slice2_obj <- subsetGiotto(gobject = giotto_all_slices_obj, cell_ids = slice2_cells) ## add cell metadata giotto_slice1_obj <- addCellMetadata(gobject = giotto_slice1_obj, new_metadata = meta_df, by_column = TRUE) giotto_slice2_obj <- addCellMetadata(gobject = giotto_slice2_obj, new_metadata = meta_df, by_column = TRUE) ## cell segmentation. Calculate center (median of vertices) of each cell. 
segments_1_df <- read.csv(file.path(data_path, "segmented_cells_mouse2sample1.csv"), row.names=1, colClasses = "character") # as the cell IDs are 30 digit numbers, set the type as character to avoid the limitation of R in handling larger integers segments_2_df <- read.csv(file.path(data_path, "segmented_cells_mouse2sample6.csv"), row.names=1, colClasses = "character") # as the cell IDs are 30 digit numbers, set the type as character to avoid the limitation of R in handling larger integers segments_df <- rbind(segments_1_df, segments_2_df) loc.use <- segments_df[selected_cells,] loc.x <- grep("boundaryX_",colnames(loc.use),value = T) loc.y <- grep("boundaryY_",colnames(loc.use),value = T) centr.x <- apply(loc.use[,loc.x],1,function(x){ temp <- lapply(x,function(y){ as.numeric(unlist(strsplit(y,", "))) }) return (median(unname(unlist(temp)))) }) centr.y <- apply(loc.use[,loc.y],1,function(x){ temp <- lapply(x,function(y){ as.numeric(unlist(strsplit(y,", "))) }) return (median(unname(unlist(temp)))) }) ## create spatial locations object spatial_locs_df <- data.frame(cell_ID = selected_cells, sdimx = centr.x, sdimy = centr.y) spatial_locs_slice1_df <- spatial_locs_df[slice1_cells,] spatial_locs_slice2_df <- spatial_locs_df[slice2_cells,] spat_locs_slice1_obj <- readSpatLocsData(data_list = spatial_locs_slice1_df) spat_locs_slice2_obj <- readSpatLocsData(data_list = spatial_locs_slice2_df) ## add spatial location info giotto_slice1_obj <- setSpatialLocations(gobject = giotto_slice1_obj, x = spat_locs_slice1_obj) giotto_slice2_obj <- setSpatialLocations(gobject = giotto_slice2_obj, x = spat_locs_slice2_obj) ## merge two giotto objects together giotto_obj <- joinGiottoObjects(gobject_list = list(giotto_slice1_obj, giotto_slice2_obj), gobject_names = c("mouse2_slice229", "mouse2_slice300"), # name for each samples join_method = "z_stack") ## save giotto obj # saveGiotto saveGiotto(gobject = giotto_obj, foldername = "gobject", dir=results_folder) If you face network issues 
when downloading the raw dataset, please download the processed Giotto object from Zenodo, unzip it, and move it to the results folder. giotto_obj <- loadGiotto(path_to_folder = file.path(results_folder, "gobject")) 16.1.5.1 Spatial distribution of cell type spatPlot2D(giotto_obj, group_by = "slice_id", cell_color = "subclass", point_size = 1, point_border_stroke = NA, legend_text = 6) # We skip the processing steps here to save time and use the given cell type # annotation directly ONTraC_input <- getONTraCv1Input(gobject = giotto_obj, cell_type = "subclass", output_path = results_folder, spat_unit = "cell", feat_type = "rna", verbose = TRUE) head(ONTraC_input) # Cell_ID Sample x y Cell_Type # <chr> <chr> <dbl> <dbl> <chr> # mouse2_slice229-100101435705986292663283283043431511315 mouse2_slice229 -4828.728 -2203.4502 L6 CT # mouse2_slice229-100104370212612969023746137269354247741 mouse2_slice229 -5405.400 -995.6467 OPC # mouse2_slice229-100128078183217482733448056590230529739 mouse2_slice229 -5731.403 -1071.1735 L2/3 IT # mouse2_slice229-100209662400867003194056898065587980841 mouse2_slice229 -5468.113 -1286.2465 Oligo # mouse2_slice229-100218038012295593766653119076639444055 mouse2_slice229 -6399.986 -959.7440 L2/3 IT # mouse2_slice229-100252992997994275968450436343196667192 mouse2_slice229 -6637.847 -1659.6237 Astro 16.1.6 Perform spatial niche trajectory analysis using ONTraC 16.1.6.1 ONTraC Installation You can run ONTraC on your own laptop or on an HPC with an NVIDIA GPU node. It will run for less than 10 minutes on this example dataset. For larger datasets, running on an NVIDIA GPU is recommended; otherwise it will take a long time. source ~/.bash_profile conda create -y -n ONTraC python=3.11 conda activate ONTraC pip install ONTraC 16.1.6.2 Running ONTraC This step will take several minutes to run. 
source ~/.bash_profile conda activate ONTraC ONTraC -d results/03_session5/ONTraC_dataset_input.csv --preprocessing-dir results/03_session5/preprocessing_dir --GNN-dir results/03_session5/GNN_dir --NTScore-dir results/03_session5/NTScore_dir --device cuda --epochs 1000 -s 42 --patience 100 --min-delta 0.001 --min-epochs 50 --lr 0.03 --hidden-feats 4 -k 6 --modularity-loss-weight 0.3 --purity-loss-weight 300 --regularization-loss-weight 0.3 --beta 0.03 2>&1 | tee results/03_session5/merfish_subset.log 16.1.7 Visualization 16.1.7.1 Load ONTraC results giotto_obj <- loadOntraCResults(gobject = giotto_obj, ontrac_results_dir = results_folder) The NTScore and binarized niche cluster info were stored in cell metadata head(pDataDT(giotto_obj, spat_unit = "cell", feat_type = "rna")) # cell_ID sample_id slice_id class_label subclass label list_ID NicheCluster NTScore # <char> <char> <char> <char> <char> <char> <char> <int> <num> # 1: mouse2_slice229-100101435705986292663283283043431511315 mouse2_sample6 mouse2_slice229 Glutamatergic L6 CT L6_CT_5 mouse2_slice229 3 0.2002081 # 2: mouse2_slice229-100104370212612969023746137269354247741 mouse2_sample6 mouse2_slice229 Other OPC OPC mouse2_slice229. 1 0.7999791 # 3: mouse2_slice229-100128078183217482733448056590230529739 mouse2_sample6 mouse2_slice229 Glutamatergic L2/3 IT L23_IT_4 mouse2_slice229 1 0.7662198 # 4: mouse2_slice229-100209662400867003194056898065587980841 mouse2_sample6 mouse2_slice229 Other Oligo Oligo_1 mouse2_slice229 5 0.6010420 # 5: mouse2_slice229-100218038012295593766653119076639444055 mouse2_sample6 mouse2_slice229 Glutamatergic L2/3 IT L23_IT_4 mouse2_slice229 1 0.7132024 # 6: mouse2_slice229-100252992997994275968450436343196667192 mouse2_sample6 mouse2_slice229 Other Astro Astro_2 mouse2_slice229 3 0.1980136 The probability matrix of each cell assigned to each niche cluster and connectivity between niche cluster were stored here. 
GiottoClass::list_expression(giotto_obj) # spat_unit feat_type name # <char> <char> <char> # 1: cell rna raw # 2: cell niche cluster prob # 3: niche cluster connectivity normalized 16.1.7.2 Niche cluster probability distribution spatFeatPlot2D(gobject = giotto_obj, spat_unit = "cell", feat_type = "niche cluster", expression_values = "prob", group_by = "list_ID", feats = rownames(giotto_obj@expression$cell$`niche cluster`$prob), point_border_col = "gray" ) 16.1.7.3 Binarized niche cluster for each cell spatPlot2D(giotto_obj, spat_unit = "cell", group_by = "slice_id", cell_color = "NicheCluster", color_as_factor = TRUE, point_size = 1, point_border_stroke = NA) 16.1.7.4 Niche cluster spatial connectivity set.seed(42) # fix the node positions plotNicheClusterConnectivity(gobject = giotto_obj) 16.1.7.5 NT (niche trajectory) score spatPlot2D(gobject = giotto_obj, spat_unit = "cell", feat_type = "rna", group_by = "slice_id", cell_color = "NTScore", color_as_factor = FALSE, cell_color_gradient = "turbo", point_size = 1, point_border_stroke = NA ) We could change the direction of NT scores here. 
giotto_obj@cell_metadata$cell$rna$NTScore <- 1 - giotto_obj@cell_metadata$cell$rna$NTScore spatPlot2D(gobject = giotto_obj, spat_unit = "cell", feat_type = "rna", group_by = "slice_id", cell_color = "NTScore", color_as_factor = FALSE, cell_color_gradient = "turbo", point_size = 1, point_border_stroke = NA ) plotCellTypeNTScore(gobject = giotto_obj, cell_type = "subclass", values = "NTScore", spat_unit = "cell", feat_type = "rna") 16.1.7.6 Cell type composition within niche cluster plotCTCompositionInNicheCluster(gobject = giotto_obj, cell_type = "subclass") 16.2 Session info sessionInfo() # R version 4.4.0 (2024-04-24) # Platform: aarch64-apple-darwin20 # Running under: macOS Ventura 13.6.6 # # Matrix products: default # BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib # LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 # # locale: # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 # # time zone: America/New_York # tzcode source: internal # # attached base packages: # [1] stats graphics grDevices utils datasets methods base # # other attached packages: # [1] ggraph_2.2.1 ggplot2_3.5.1 reticulate_1.37.0 Giotto_4.1.0 GiottoClass_0.3.2 # # loaded via a namespace (and not attached): # [1] tidyselect_1.2.1 viridisLite_0.4.2 dplyr_1.1.4 farver_2.1.2 GiottoVisuals_0.2.4 viridis_0.6.5 fastmap_1.2.0 lazyeval_0.2.2 tweenr_2.0.3 digest_0.6.35 lifecycle_1.0.4 # [12] terra_1.7-78 magrittr_2.0.3 dbscan_1.1-12 compiler_4.4.0 rlang_1.1.4 tools_4.4.0 igraph_2.0.3 utf8_1.2.4 yaml_2.3.8 data.table_1.15.4 knitr_1.47 # [23] labeling_0.4.3 graphlayouts_1.1.1 htmlwidgets_1.6.4 sp_2.1-4 plyr_1.8.9 RColorBrewer_1.1-3 withr_3.0.0 purrr_1.0.2 grid_4.4.0 polyclip_1.10-6 fansi_1.0.6 # [34] colorspace_2.1-0 scales_1.3.0 gtools_3.9.5 MASS_7.3-60.2 cli_3.6.2 rmarkdown_2.27 generics_0.1.3 rstudioapi_0.16.0 httr_1.4.7 reshape2_1.4.4 cachem_1.1.0 # [45] 
ggforce_0.4.2 stringr_1.5.1 parallel_4.4.0 matrixStats_1.3.0 vctrs_0.6.5 Matrix_1.7-0 jsonlite_1.8.8 bookdown_0.40 ggrepel_0.9.5 scattermore_1.2 magick_2.8.3 # [56] GiottoUtils_0.1.10 plotly_4.10.4 tidyr_1.3.1 glue_1.7.0 codetools_0.2-20 cowplot_1.1.3 stringi_1.8.4 gtable_0.3.5 deldir_2.0-4 munsell_0.5.1 tibble_3.2.1 # [67] pillar_1.9.0 htmltools_0.5.8.1 R6_2.5.1 tidygraph_1.3.1 evaluate_0.24.0 lattice_0.22-6 png_0.1-8 backports_1.5.0 memoise_2.0.1 Rcpp_1.0.12 gridExtra_2.3 # [78] checkmate_2.3.1 colorRamp2_0.1.0 xfun_0.44 pkgconfig_2.0.3 "],["interactivity-with-the-rspatial-ecosystem.html", "17 Interactivity with the R/Spatial ecosystem 17.1 Visium technology 17.2 Gene expression interpolation through kriging 17.3 Downloading the dataset 17.4 Extracting the files 17.5 Downloading giotto object and nuclei segmentation 17.6 Importing visium data 17.7 Performing kriging 17.8 Adding cell polygons to Giotto object 17.9 Reading in larger dataset 17.10 Analyzing interpolated features", " 17 Interactivity with the R/Spatial ecosystem Jeff Sheridan August 7th 2024 17.1 Visium technology Figure 17.1: Overview of Visium. Source: 10X Genomics. Visium by 10x Genomics is a spatial gene expression platform that allows for the mapping of gene expression to high-resolution histology through RNA sequencing. The process involves placing a tissue section on a specially prepared slide with an array of barcoded spots, which are 55 µm in diameter with a spot-to-spot distance of 100 µm. Each spot contains unique barcodes that capture the mRNA from the tissue section, preserving the spatial information. After the tissue is imaged and RNA is captured, the mRNA is sequenced, and the data is mapped back to the tissue’s spatial coordinates. This technology is particularly useful in understanding complex tissue environments, such as tumors, by providing insights into how gene expression varies across different regions. 
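The spot geometry quoted above gives a feel for how much of the tissue is actually measured. The following is a back-of-the-envelope sketch, assuming the spots sit on a regular hexagonal lattice and that the 100 µm figure is the center-to-center distance; both are assumptions made for this illustration rather than statements from the text.

```r
# Spot geometry as described above
spot_diameter_um <- 55
centre_distance_um <- 100

# Area covered by a single circular spot
spot_area <- pi * (spot_diameter_um / 2)^2

# Area of the hexagonal tile that belongs to each spot in a hexagonal lattice
# (assumed layout): (sqrt(3) / 2) * d^2 for centre-to-centre distance d
tile_area <- (sqrt(3) / 2) * centre_distance_um^2

# Fraction of the tissue area that lies under a capture spot
covered_fraction <- spot_area / tile_area
round(covered_fraction, 3)
```

Under these assumptions only a little over a quarter of the tissue area lies directly under a spot, which is one reason interpolating expression between spots can be useful.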
17.2 Gene expression interpolation through kriging Low resolution spatial data typically covers multiple cells making it difficult to delineate the cell contribution to gene expression. Using a process called kriging we can interpolate gene expression and map it to the single cell level from low resolution datasets. Kriging is a spatial interpolation technique that estimates unknown values at specific locations by weighing nearby known values based on distance and spatial trends. It uses a model to account for both the distance between points and the overall pattern in the data to make accurate predictions. By taking discrete measurement spots, such as those used for visium, we can interpolate gene expression to a finer scale using kriging. 17.2.1 Dataset For this tutorial we’ll be using the mouse brain dataset described in section 6. Visium datasets require a high resolution H&E or IF image to align spots to. Using these images we can identify individual nuclei and cells to be used for kriging. Identifying nuclei is outside the scope of the current tutorial but is required to perform kriging. 17.2.2 Generating a geojson file of nuclei location For the following sections we will need to create a geojson that contains polygon information for the nuclei in the sample. We will be providing this in the following link, however when using for your own datasets this will need to be done outside of Giotto. A tutorial for this using qupath can be found here. 17.3 Downloading the dataset We first need to import a dataset that we want to perform kriging on. 
data_directory <- "data/03_session6" dir.create(data_directory, showWarnings = FALSE, recursive = TRUE) download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz", destfile = file.path(data_directory, "V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz")) download.file(url = "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz", destfile = file.path(data_directory, "V1_Adult_Mouse_Brain_spatial.tar.gz")) 17.4 Extracting the files untar(tarfile = file.path(data_directory, "V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz"), exdir = data_directory) untar(tarfile = file.path(data_directory, "V1_Adult_Mouse_Brain_spatial.tar.gz"), exdir = data_directory) 17.5 Downloading giotto object and nuclei segmentation We will need nuclei/cell segmentations to perform the kriging. Later in the tutorial we’ll also be using a pre-made giotto object. Download them using the following: destfile <- file.path(data_directory, "subcellular_gobject.zip") options(timeout = Inf) # Needed to download large files download.file("https://zenodo.org/records/13144556/files/Day3_Session6.zip?download=1", destfile = destfile) unzip(file.path(data_directory, "subcellular_gobject.zip"), exdir = data_directory) 17.6 Importing visium data We’re going to begin by creating a Giotto object for the visium mouse brain dataset. This tutorial won’t go into detail about each of these steps as these have been covered for this dataset in section 6. To get the best results when performing gene expression interpolation we need to identify spatially distinct genes. To do that, we first need to build a spatial network from each spot’s nearest neighbors. If you have a Giotto object from day 1 session 5, feel free to load that in and skip this first step. 
library(Giotto) save_directory <- "results/03_session6" visium_save_directory <- file.path(save_directory, "visium_mouse_brain") subcell_save_directory <- file.path(save_directory, "pseudo_subcellular/") instrs <- createGiottoInstructions(show_plot = TRUE, save_plot = TRUE, save_dir = visium_save_directory) v_brain <- createGiottoVisiumObject(data_directory, gene_column_index = 2, instructions = instrs) # Subset to in tissue only cm <- pDataDT(v_brain) in_tissue_barcodes <- cm[in_tissue == 1]$cell_ID v_brain <- subsetGiotto(v_brain, cell_ids = in_tissue_barcodes) # Filter v_brain <- filterGiotto(gobject = v_brain, expression_threshold = 1, feat_det_in_min_cells = 50, min_det_feats_per_cell = 1000, expression_values = "raw") # Normalize v_brain <- normalizeGiotto(gobject = v_brain, scalefactor = 6000, verbose = TRUE) # Add stats v_brain <- addStatistics(gobject = v_brain) # ID HVF v_brain <- calculateHVF(gobject = v_brain, method = "cov_loess") fm <- fDataDT(v_brain) hv_feats <- fm[hvf == "yes" & perc_cells > 3 & mean_expr_det > 0.4]$feat_ID # Dimension Reductions v_brain <- runPCA(gobject = v_brain, feats_to_use = hv_feats) v_brain <- runUMAP(v_brain, dimensions_to_use = 1:10, n_neighbors = 15, set_seed = TRUE) # NN Network v_brain <- createNearestNetwork(gobject = v_brain, dimensions_to_use = 1:10, k = 15) # Leiden Cluster v_brain <- doLeidenCluster(gobject = v_brain, resolution = 0.4, n_iterations = 1000, set_seed = TRUE) # Spatial Network (kNN) v_brain <- createSpatialNetwork(gobject = v_brain, method = "kNN", k = 5, maximum_distance_knn = 400, name = "spatial_network") spatPlot2D(gobject = v_brain, spat_unit = "cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 1.5, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_1_vis_spat")) Here we can see the clustering of the regular visium spots is able to identify distinct regions of the mouse brain. 
Figure 17.2: Mouse brain spatial plot showing leiden clustering 17.6.1 Identifying spatially organized features We need to identify genes to be used for interpolation. This works best with genes that are spatially distinct. To identify these genes we’ll use binSpect(). For this tutorial we’ll only use the top 15 spatially distinct genes. The more genes used for interpolation the longer the analysis will take. When running this for your own datasets you should use more genes. We are only using 15 here to minimize analysis time. # Spatially Variable Features ranktest <- binSpect(v_brain, bin_method = "rank", calc_hub = TRUE, hub_min_int = 5, spatial_network_name = "spatial_network", do_parallel = TRUE, cores = 8) #not able to provide a seed number, so do not set one # Getting the top 15 spatially organized genes ext_spatial_features <- ranktest[1:15,]$feats 17.7 Performing kriging 17.7.1 Interpolating features Now we can perform gene expression interpolation. This involves creating a raster image for the gene expression of each of the selected genes. The steps from here can be time consuming and require large amounts of memory. We will only be analyzing 15 genes to show the process of expression interpolation. For clustering and other analyses more genes are required. future::plan(future::multisession()) # comment out for single threading v_brain <- interpolateFeature(v_brain, spat_unit = "cell", feat_type = "rna", ext = ext(v_brain), feats = ext_spatial_features, overwrite = TRUE) print(v_brain) Figure 17.3: Giotto object after interpolating features. Addition of images for each interpolated feature (left) and an example of rasterized gene expression image (right). For each gene that we interpolate a raster image is exported based on the gene expression. Shown below is an example of an output for the gene Pantr1. 
Figure 17.4: Raster of gene expression interpolation for Pantr1 17.8 Adding cell polygons to Giotto object 17.8.1 Read in the poly information First we need to read in the geojson file that contains the cell polygons that we’ll interpolate gene expression onto. These will then be added to the Giotto object as a new polygon object. This won’t affect the visium polygons. Both polygons will be stored within the same Giotto object. # Read in the data stardist_cell_poly_path <- file.path(data_directory, "segmentations/stardist_only_cell_bounds.geojson") stardist_cell_gpoly <- createGiottoPolygonsFromGeoJSON(GeoJSON = stardist_cell_poly_path, name = "stardist_cell", calc_centroids = TRUE) stardist_cell_gpoly <- flip(stardist_cell_gpoly) 17.8.2 Visualizing polygons Below we can see a visualization of the polygons for the visium and the nuclei we identified from the H&E image. The visium dataset has 2698 spots compared to the 36694 nuclei we identified. Just using the visium spots we’re therefore losing a lot of the spatial data for individual cells. Because the nuclei polygons are far more numerous and correspond directly to the tissue, the polygons alone show the actual structure of the mouse brain more clearly. plot(getPolygonInfo(v_brain)) plot(stardist_cell_gpoly, max_poly = 1e6) Figure 17.5: Mouse brain cell polygons from the visium dataset Figure 17.6: Mouse brain cell polygons with artifacts removed and flipped 17.8.3 Showing Giotto object prior to polygon addition Before we add the polygons we can see the gobject contains “cell” as a spatial unit and a polygon. print(v_brain) Figure 17.7: Giotto object before adding subcellular polygons. 17.8.4 Adding polygons to giotto object After we add the nuclei polygons we can see that a new polygon name, “stardist_cell” has been added to the gobject. 
v_brain <- addGiottoPolygons(v_brain, gpolygons = list("stardist_cell" = stardist_cell_gpoly)) print(v_brain) Figure 17.8: Giotto object after adding subcellular polygons. 17.8.5 Check polygon information We can now see the addition of the new polygons under the name “stardist_cell”. Each of the new polygons is given a unique poly_ID as shown below. Each polygon is also placed in the same coordinate space as the original visium spots and therefore lines up with the same image. poly_info <- getPolygonInfo(v_brain, polygon_name = "stardist_cell") print(poly_info) Figure 17.9: Polygon information for stardist_cell. 17.8.6 Expression overlap The raster we created above gives the gene expression in a graphical form. We next need to determine how that relates to the nuclei location. To determine that we will calculate the overlap of the rasterized gene expression image to the polygons supplied earlier. This step takes longer as more genes are provided. For large datasets please allow up to several hours for these steps to run. v_brain <- calculateOverlapPolygonImages(gobject = v_brain, name_overlap = "rna", spatial_info = "stardist_cell", image_names = ext_spatial_features) v_brain <- Giotto::overlapToMatrix(x = v_brain, poly_info = "stardist_cell", feat_info = "rna", aggr_function = "sum", type="intensity") After performing the overlap we now have expression data for each gene provided. This can be seen below where we see the interpolated gene expression for genes in each of the nuclei we identified. Figure 17.10: Gene expression for cells based on interpolation. 17.9 Reading in larger dataset For better results more genes are required. The above data used only 15 genes. We will now read in a dataset that has 1500 interpolated genes and use this for the remainder of the tutorial. If you haven’t downloaded this dataset please download it here. 
v_brain <- loadGiotto(file.path(data_directory, "subcellular_gobject")) 17.10 Analyzing interpolated features 17.10.1 Filter and normalization Now that we have a valid spat unit and gene expression data for each of the provided genes, we can perform the same analyses we used for the regular visium data. Please note that, because of the difference in cell number, the values used for the current analysis aren’t identical to those used in the visium analysis. v_brain <- filterGiotto(gobject = v_brain, spat_unit = "stardist_cell", expression_values = "raw", expression_threshold = 1, feat_det_in_min_cells = 0, min_det_feats_per_cell = 1) v_brain <- normalizeGiotto(gobject = v_brain, spat_unit = "stardist_cell", scalefactor = 6000, verbose = TRUE) 17.10.2 Visualizing gene expression from interpolated expression Since we have the gene expression information for both the visium and the interpolated gene expression we can visualize gene expression for both from the same Giotto object. We will look at the expression for two genes “Sparc” and “Pantr1” for both the visium and interpolated data. 
spatFeatPlot2D(v_brain, spat_unit = "cell", gradient_style = "sequential", cell_color_gradient = "Geyser", feats = "Sparc", point_size = 2, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_sparc_vis")) spatFeatPlot2D(v_brain, spat_unit = "stardist_cell", gradient_style = "sequential", cell_color_gradient = "Geyser", feats = "Sparc", point_size = 0.6, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_sparc")) spatFeatPlot2D(v_brain, spat_unit = "cell", gradient_style = "sequential", feats = "Pantr1", cell_color_gradient = "Geyser", point_size = 2, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_pantr1_vis")) spatFeatPlot2D(v_brain, spat_unit = "stardist_cell", gradient_style = "sequential", cell_color_gradient = "Geyser", feats = "Pantr1", point_size = 0.6, save_plot = TRUE, show_image = TRUE, save_param = list(save_name = "03_ses6_pantr1")) Below we can see the gene expression for both datatypes. With the interpolated gene expression we’re able to get a better idea as to the cells that are expressing each of the genes. This is especially clear with Pantr1, which clearly localizes to the pyramidal layer. Figure 17.11: Gene expression for visium (left) and interpolated (right) expression for Sparc (top) and Pantr1 (bottom). 
17.10.3 Run PCA v_brain <- runPCA(gobject = v_brain, spat_unit = "stardist_cell", expression_values = "normalized", feats_to_use = NULL) 17.10.4 Clustering # UMAP v_brain <- runUMAP(v_brain, spat_unit = "stardist_cell", dimensions_to_use = 1:15, n_neighbors = 1000, min_dist = 0.001, spread = 1) # NN Network v_brain <- createNearestNetwork(gobject = v_brain, spat_unit = "stardist_cell", dimensions_to_use = 1:10, feats_to_use = hv_feats, expression_values = "normalized", k = 70) v_brain <- doLeidenCluster(gobject = v_brain, spat_unit = "stardist_cell", resolution = 0.15, n_iterations = 100, partition_type = "RBConfigurationVertexPartition") plotUMAP(v_brain, spat_unit = "stardist_cell", cell_color = "leiden_clus") Figure 17.12: UMAP for stardist_cell based on the 1500 interpolated gene expressions. Colored based on leiden clustering. 17.10.5 Visualizing clustering Visualizing the clustering for both the visium dataset and the interpolated dataset, we see similar clusters. However, with the interpolated dataset we are able to see finer detail for each cluster. spatPlot2D(gobject = v_brain, spat_unit = "cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 0.5, point_shape = "no_border", background_color = "black", save_plot = FALSE, show_legend = TRUE) spatPlot2D(gobject = v_brain, spat_unit = "stardist_cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 0.1, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_subcell_spat")) Figure 17.13: Spatial plots showing leiden clustering mapped onto the base visium spots (left) and individual nuclei through interpolation (right) 17.10.6 Cropping objects We are also able to crop both spat units simultaneously to zoom in on specific regions of the tissue such as seen below. 
v_brain_crop <- subsetGiottoLocs(gobject = v_brain, spat_unit = ":all:", x_min = 4000, x_max = 7000, y_min = -6500, y_max = -3500, z_max = NULL, z_min = NULL) spatPlot2D(gobject = v_brain_crop, spat_unit = "cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 2, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_vis_spat_crop")) spatPlot2D(gobject = v_brain_crop, spat_unit = "stardist_cell", cell_color = "leiden_clus", show_image = TRUE, point_size = 0.1, point_shape = "no_border", background_color = "black", show_legend = TRUE, save_plot = TRUE, save_param = list(save_name = "03_ses6_subcell_spat_crop")) Figure 17.14: Spatial plots showing leiden clustering mapped onto the base visium spots (left) and individual nuclei through interpolation (right) "],["contributing-to-giotto.html", "18 Contributing to Giotto 18.1 Contribution guideline 18.2 Coding Style 18.3 Stat functions 18.4 Auxiliary functions 18.5 Package Imports 18.6 Python code", " 18 Contributing to Giotto Jiaji George Chen August 7th 2024 18.1 Contribution guideline To be updated… https://drieslab.github.io/Giotto_website/CONTRIBUTING.html We welcome contributions or suggestions from other developers. Please contact us if you have questions or would like to discuss an addition or major modifications to the Giotto main code. The source code for Giotto Suite may be found on our GitHub repository. 18.2 Coding Style Following a particular programming style will help programmers read and understand source code conforming to the style, and help to avoid introducing errors. Here we present a small list of guidelines on what is considered good practice when writing R code for the Giotto package. Most of them are adapted from Bioconductor - coding style or Google’s R Style Guide. These guidelines are preferences and strongly encouraged! Naming Use camelCase for Giotto user-facing exported function names. 
(functionName()) Use snake_case for non-user-facing exported functions, which are essentially any functions not directly related to commonly used data processing, analysis, and visualization. (function_name()) Use . prefix and snake_case for internal non-exported functions. (.function_name()) Use snake_case for parameter names. Do not use . as a separator in function naming. (in the S3 class system, some(x) where x is class A will dispatch to some.A) Use of ` ` (space) characters Do not place a space before a comma, but always place one after a comma. This: a, b, c. Always use space around = when using named arguments to functions. This: somefunc(a = 1, b = 2). Use of symbols Do not use any non-ASCII characters unless provided as the escape code. For example: \\u00F6 for ö Beyond these guidelines, styler should be used in order to maintain code uniformity. 18.3 Stat functions Most Giotto commands can accept several matrix classes (DelayedMatrix, SparseM, Matrix or base matrix). To facilitate this we provide flexible wrappers that work on any type of matrix class. mean_flex: analogous to mean() rowSums_flex: analogous to rowSums() rowMeans_flex: analogous to rowMeans() colSums_flex: analogous to colSums() colMeans_flex: analogous to colMeans() t_flex: analogous to t() cor_flex: analogous to cor() 18.4 Auxiliary functions Giotto has a number of auxiliary or convenience functions that might help you to adapt your code or write new code for Giotto. We encourage you to use these small functions to maintain uniformity throughout the code. 
lapply_flex: analogous to lapply() and works for both windows and unix systems all_plots_save_function: compatible with Giotto instructions and helps to automatically save generated plots plot_output_handler: further wraps all_plots_save_function and includes handling for return_plot and show_plot and Giotto instructions checking determine_cores: to determine the number of cores to use if a user does not set this explicitly get_os: to identify the operating system update_giotto_params: will catch and store the parameters for each used command on a giotto object wrap_txt and wrap_msg: text and message formatting functions vmsg: framework for Giotto’s verbosity-flagged messages package_check: to check if a package exists, works for packages on CRAN, Bioconductor and Github The last function should be used within your contribution code. It has the additional benefit of telling the user how to install the package if it is not available. To keep the size of Giotto within limits we prefer not to add too many new dependencies. 18.5 Package Imports Giotto tracks packages and functions to import in a centralized manner. When adding code that requires functions from another package, add the roxygen tags to the package_imports.R file for that Giotto module. Getters and Setters Giotto stores information in different slots, which can be accessed through these getter and setter functions. They can be found in the accessors.R file. 
getCellMetadata(): Gets cell metadata setCellMetadata(): Sets cell metadata getFeatureMetadata(): Gets feature metadata setFeatureMetadata(): Sets feature metadata getExpression(): To select the expression matrix to use setExpression(): Sets a new expression matrix to the expression slot getSpatialLocations(): Get spatial locations to use setSpatialLocations(): Sets new spatial locations getDimReduction(): To select the dimension reduction values to use setDimReduction(): Sets new dimension reduction object getNearestNetwork(): To select the nearest neighbor network (kNN or sNN) to use setNearestNetwork(): Sets a new nearest neighbor network (kNN or sNN) getSpatialNetwork(): To select the spatial network to use setSpatialNetwork(): Sets a new spatial network getPolygonInfo(): Gets spatial polygon information setPolygonInfo(): Set new spatial polygon information getFeatureInfo(): Gets spatial feature information setFeatureInfo(): Sets new spatial feature information getSpatialEnrichment(): Gets spatial enrichment information setSpatialEnrichment(): Sets new spatial enrichment information getMultiomics(): Gets multiomics information setMultiomics(): Sets multiomics information 18.6 Python code To use Python code we prefer to create a python wrapper/functions around the python code, which can then be sourced by reticulate. As an example we show the basic principles of how we implemented the Leiden clustering algorithm. write python wrapper and store as python_leiden.py in /inst/python: import igraph as ig import leidenalg as la import pandas as pd import networkx as nx def python_leiden(df, partition_type, initial_membership=None, weights=None, n_iterations=2, seed=None, resolution_parameter = 1): # create networkx object Gx = nx.from_pandas_edgelist(df = df, source = 'from', target = 'to', edge_attr = 'weight') # get weight attribute myweights = nx.get_edge_attributes(Gx, 'weight') .... 
return(leiden_dfr) source python code with reticulate: python_leiden_function = system.file("python", "python_leiden.py", package = 'Giotto') reticulate::source_python(file = python_leiden_function) use python code as if R code: See doLeidenCluster for more detailed information. pyth_leid_result = python_leiden(df = network_edge_dt, partition_type = partition_type, initial_membership = init_membership, weights = 'weight', n_iterations = n_iterations, seed = seed_number, resolution_parameter = resolution) "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
diff --git a/spatial-multi-modal-analysis.html b/spatial-multi-modal-analysis.html
index be56652..a632e69 100644
--- a/spatial-multi-modal-analysis.html
+++ b/spatial-multi-modal-analysis.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
@@ -673,6 +678,7 @@
Affine transforms
+
To perform the linear transform, the xy coordinates just need to be matrix multiplied by the 2x2 affine matrix. The resulting values should then be added to the translate values.
Due to the nature of matrix multiplication, you can simply multiply the affine matrices with each other and when the xy coordinates are multiplied by the resulting matrix, it performs both linear transforms in the same step.
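As a rough sketch of why combining works, write the 2x2 linear part of each transform as $A_i$ and its translate values as $\mathbf{t}_i$. These symbols are introduced here only for illustration, and the derivation assumes coordinates are treated as column vectors; with row-vector coordinates the multiplication order is reversed. Applying transform 1 and then transform 2 to a coordinate $\mathbf{x}$ gives:

```latex
\begin{aligned}
\mathbf{x}'  &= A_1 \mathbf{x} + \mathbf{t}_1 \\
\mathbf{x}'' &= A_2 \mathbf{x}' + \mathbf{t}_2
              = (A_2 A_1)\,\mathbf{x} + \left(A_2 \mathbf{t}_1 + \mathbf{t}_2\right)
\end{aligned}
```

Under these assumptions the combined linear part is the single matrix $A_2 A_1$, so the order of multiplication matters. The first translation also passes through the second linear map before the second translation is added.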
diff --git a/spatial-omics-technologies.html b/spatial-omics-technologies.html
index 4292f97..74eddc7 100644
--- a/spatial-omics-technologies.html
+++ b/spatial-omics-technologies.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
diff --git a/spatial-proteomics-multiplexed-immunofluorescence.html b/spatial-proteomics-multiplexed-immunofluorescence.html
index b8755dc..180cea1 100644
--- a/spatial-proteomics-multiplexed-immunofluorescence.html
+++ b/spatial-proteomics-multiplexed-immunofluorescence.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
diff --git a/visium-hd.html b/visium-hd.html
index dfe1ef2..dd05333 100644
--- a/visium-hd.html
+++ b/visium-hd.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
diff --git a/visium-part-i.html b/visium-part-i.html
index a35aa14..171056b 100644
--- a/visium-part-i.html
+++ b/visium-part-i.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
diff --git a/visium-part-ii.html b/visium-part-ii.html
index 1869c19..a629b02 100644
--- a/visium-part-ii.html
+++ b/visium-part-ii.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
diff --git a/working-with-multiple-samples.html b/working-with-multiple-samples.html
index 90e3384..2ea925f 100644
--- a/working-with-multiple-samples.html
+++ b/working-with-multiple-samples.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto
diff --git a/xenium-1.html b/xenium-1.html
index 846cbc2..4899b58 100644
--- a/xenium-1.html
+++ b/xenium-1.html
@@ -556,6 +556,11 @@
18 Contributing to Giotto