-
Notifications
You must be signed in to change notification settings - Fork 7
/
02_session2.Rmd
1132 lines (871 loc) · 40.7 KB
/
02_session2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Visium HD
Ruben Dries & Edward C. Ruiz
August 6th 2024
## Objective {#objective}
This tutorial demonstrates how to process Visium HD data at the highest 2 micron bin resolution by using flexible tiling and aggregation steps that are available in [Giotto Suite](https://drieslab.github.io/Giotto_website/). Notably, a similar strategy can be used for other spatial sequencing methods that operate at the subcellular level, including:\
- Stereo-seq\
- [Seq-Scope](https://drieslab.github.io/Giotto_website/articles/seqscope_mouse_liver.html)\
- Open-ST
The resulting datasets from all these technologies can be very large since they provide both a high spatial resolution and genome-wide capture of *all* transcripts. We will also discuss how data projection strategies can be used to alleviate heavy computational tasks such as PCA, UMAP, or clustering.
This tutorial expects a general knowledge of common spatial analysis technologies that are available in Giotto Suite, such as those that have been discussed in the standard Visium tutorials ([part I](#Visium%20Part%20I) and [part II](Visium%20Part%20II)).
## Background
### Visium HD Technology
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Overview of Visium HD. Source: *10X Genomics*"}
knitr::include_graphics("img/02_session2/1.png")
```
Visium HD is a spatial transcriptomics technology recently developed by 10X Genomics. Details about this platform are discussed on the official [10X Genomics Visium HD website](https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression) and the preprint by [Oliveira *et al.* 2024](https://doi.org/10.1101/2024.06.04.597233) on *bioRxiv*.
Visium HD has a 2 micron bin size resolution. The default SpaceRanger pipeline from 10X Genomics also returns aggregated data at the 8 and 16 micron bin size.
### Colorectal Cancer Sample
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Colorectal Cancer Overview. Source: *10X Genomics*"}
knitr::include_graphics("img/02_session2/2.png")
```
For this tutorial we will be using the publicly available Colorectal Cancer Visium HD dataset.
Details about this dataset and a link to download the raw data can be found at the [10X Genomics website](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-human-crc).
## Data Ingestion
### Visium HD output data format
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="File structure of Visium HD data processed with spaceranger pipeline."}
knitr::include_graphics("img/02_session2/3.png")
```
Visium HD data processed with the spaceranger pipeline is organized in this format containing various files associated with the sample. The files highlighted in yellow are what we will be using to read in these datasets.
**Warning**: the VisiumHD folder structure has very recently been updated and might be slightly different.
### Mini Visium HD dataset
For this workshop we will use a spatial subset and downsampled version of the original datasets. A VisiumHD folder similar to the original can be downloaded using the Zenodo link. Using this dataset will ensure that we will not run into major memory issues.
```{r, eval = FALSE}
library(Giotto)
# set up paths
data_path <- "data/02_session2/"
save_dir <- "results/02_session2/"
dir.create(save_dir, recursive = TRUE)
# download the mini dataset and untar
options("timeout" = Inf)
download.file(
url = "https://zenodo.org/records/13226158/files/workshop_VisiumHD.zip?download=1",
destfile = file.path(save_dir, "workshop_visiumHD.zip")
)
untar(tarfile = file.path(save_dir, "workshop_visiumHD.zip"),
exdir = data_path)
```
### Giotto Visium HD convenience function
The easiest way to read in Visium HD data in Giotto is through our convenience function. This function will automatically read in the data at your desired resolution, align the images, and finally create a Giotto Object.
```{r, eval = FALSE}
# importVisiumHD()
```
### Read in data manually
However, for this tutorial we will illustrate how to create your own Giotto object in a step-by-step manner, which can also be applied to other similar technologies as discussed in the [Objective](#Objective) section.
#### Raw expression data
```{r, eval = FALSE}
expression_path <- file.path(data_path, '/Human_Colorectal_Cancer_workshop/square_002um/raw_feature_bc_matrix')
expr_results <- get10Xmatrix(path_to_data = expression_path,
gene_column_index = 1)
```
#### Tissue positions data
```{r, eval = FALSE}
tissue_positions_path <- file.path(data_path, '/Human_Colorectal_Cancer_workshop/square_002um/spatial/tissue_positions.parquet')
tissue_positions <- data.table::as.data.table(arrow::read_parquet(tissue_positions_path))
```
#### Merge expression and 2 micron position data
```{r, eval = FALSE}
# convert expression matrix to minimal data.frame or data.table object
matrix_tile_dt <- data.table::as.data.table(Matrix::summary(expr_results))
genes <- expr_results@Dimnames[[1]]
samples <- expr_results@Dimnames[[2]]
matrix_tile_dt[, gene := genes[i]]
matrix_tile_dt[, pixel := samples[j]]
```
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Genes expressed for each 2 µm pixel in the array dimensions."}
knitr::include_graphics("img/02_session2/4.png")
```
```{r, eval = FALSE}
# merge data.table matrix and spatial coordinates to create input for Giotto Polygons
expr_pos_data <- data.table::merge.data.table(matrix_tile_dt,
tissue_positions,
by.x = 'pixel',
by.y = 'barcode')
expr_pos_data <- expr_pos_data[,.(pixel, pxl_row_in_fullres, pxl_col_in_fullres, gene, x)]
colnames(expr_pos_data) = c('pixel', 'x', 'y', 'gene', 'count')
```
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Genes expressed with count for each 2 µm pixel in the spatial dimensions."}
knitr::include_graphics("img/02_session2/5.png")
```
## Hexbin 400 Giotto object
### create giotto points
The giottoPoints object represents the spatial expression information for each transcript: - gene id - count or UMI\
- spatial pixel location (x, y)
```{r, eval = FALSE}
giotto_points = createGiottoPoints(x = expr_pos_data[,.(x, y, gene, pixel, count)])
```
### create giotto polygons
#### Tiling and aggregation
The Visium HD data is organized in a grid format. We can aggregate the data into larger bins to reduce the resolution of the data. Giotto Suite can work with any type of polygon information and already provides ready-to-use options for binning data with squares, triangles, and hexagons. Here we will use a hexagon tesselation to aggregate the data into arbitrary bins.
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Hexagon properties"}
knitr::include_graphics("img/02_session2/6.png")
```
```{r, eval=FALSE}
# create giotto polygons, here we create hexagons
hexbin400 <- tessellate(extent = ext(giotto_points),
shape = 'hexagon',
shape_size = 400,
name = 'hex400')
plot(hexbin400)
```
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Giotto polygon in a hexagon shape for overlapping visium HD expression data."}
knitr::include_graphics("img/02_session2/7.png")
```
### combine Giotto points and polygons to create Giotto object
```{r, eval=FALSE}
instrs = createGiottoInstructions(
save_dir = save_dir,
save_plot = TRUE,
show_plot = FALSE,
return_plot = FALSE
)
# gpoints provides spatial gene expression information
# gpolygons provides spatial unit information (here = hexagon tiles)
visiumHD = createGiottoObjectSubcellular(gpoints = list('rna' = giotto_points),
gpolygons = list('hex400' = hexbin400),
instructions = instrs)
# create spatial centroids for each spatial unit (hexagon)
visiumHD = addSpatialCentroidLocations(gobject = visiumHD,
poly_info = 'hex400')
```
Visualize the Giotto object. Make sure to set `expand_counts = TRUE` to expand the counts column. Each spatial bin can have multiple transcripts/UMIs. This is different compared to in situ technologies like seqFISH, MERFISH, Nanostring CosMx or Xenium.
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Schematic showing effect of expand counts and jitter."}
knitr::include_graphics("img/02_session2/8.png")
```
Show the giotto points (transcripts) and polygons (hexagons) together using `spatInSituPlotPoints`:
```{r, eval=FALSE}
feature_data = fDataDT(visiumHD)
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = list('rna' = feature_data$feat_ID[10:20]),
show_legend = T,
spat_unit = 'hex400',
point_size = 0.25,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex400',
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.1,
expand_counts = TRUE,
count_info_column = 'count',
jitter = c(25,25))
```
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Overlap of gene expression with the hex400 polygons. Each dot represents a single gene. Jitter used to better vizualize individual transcripts"}
knitr::include_graphics("img/02_session2/0-spatInSituPlotPoints.png")
```
You can set `plot_method = scattermore` or `scattermost` to convert high-resolution images to low(er) resolution rasterized images. It's usually faster and will save on disk space.
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = list('rna' = feature_data$feat_ID[10:20]),
show_legend = T,
spat_unit = 'hex400',
point_size = 0.25,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex400',
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.1,
expand_counts = TRUE,
count_info_column = 'count',
jitter = c(25,25),
plot_method = 'scattermore')
```
```{r, echo=FALSE, out.width="80%", fig.align="center", fig.cap="Overlap of gene expression with the hex400 polygons. Genes/transcripts are rasterized. Jitter used to better vizualize individual transcripts"}
knitr::include_graphics("img/02_session2/1-spatInSituPlotPoints.png")
```
### Process Giotto object
#### calculate overlap between points and polygons
At the moment the giotto points (transcripts) and polygons (hexagons) are two separate layers of information. Here we will determine which transcripts overlap with which hexagons so that we can aggregate the gene expression information and convert this into a gene expression matrix (genes-by-hexagons) that can be used in default spatial pipelines.
```{r, eval=FALSE}
# calculate overlap between points and polygons
visiumHD = calculateOverlap(visiumHD,
spatial_info = 'hex400',
feat_info = 'rna')
showGiottoSpatialInfo(visiumHD)
```
#### convert overlap results to a gene-by-hexagon matrix
```{r, eval=FALSE}
# convert overlap results to bin by gene matrix
visiumHD = overlapToMatrix(visiumHD,
poly_info = 'hex400',
feat_info = 'rna',
name = 'raw')
# this action will automatically create an active spatial unit, ie. hexbin 400
activeSpatUnit(visiumHD)
```
#### default processing steps
This part is similar to that described in the Visium tutorials ([Part I](#Visium%20Part%20I) and [Part II](#Visium%20Part%20II)).
```{r, eval=FALSE}
# filter on gene expression matrix
visiumHD <- filterGiotto(visiumHD,
expression_threshold = 1,
feat_det_in_min_cells = 5,
min_det_feats_per_cell = 25)
# normalize and scale gene expression data
visiumHD <- normalizeGiotto(visiumHD,
scalefactor = 1000,
verbose = T)
# add cell and gene statistics
visiumHD <- addStatistics(visiumHD)
```
##### visualize number of features
At the centroid level.
```{r, eval=FALSE}
# each dot here represents a 200x200 aggregation of spatial barcodes (bin size 200)
spatPlot2D(gobject = visiumHD,
cell_color = "nr_feats",
color_as_factor = F,
point_size = 2.5)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Number of features detected in each of the centroids."}
knitr::include_graphics("img/02_session2/2-spatPlot2D.png")
```
Using the spatial polygon (hexagon) tiles
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = NULL,
show_legend = F,
spat_unit = 'hex400',
point_size = 0.1,
show_polygon = TRUE,
use_overlap = TRUE,
polygon_feat_type = 'hex400',
polygon_fill = 'nr_feats',
polygon_fill_as_factor = F,
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.1)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Number of features detected in each of the hex400 polygons."}
knitr::include_graphics("img/02_session2/3-spatInSituPlotPoints.png")
```
#### Dimension reduction + clustering
##### Highly variable features + PCA
```{r, eval=FALSE}
visiumHD <- calculateHVF(visiumHD,
zscore_threshold = 1)
visiumHD <- runPCA(visiumHD,
expression_values = 'normalized',
feats_to_use = 'hvf')
screePlot(visiumHD, ncp = 30)
```
```{r, echo=FALSE, out.width="100%", fig.align="center"}
knitr::include_graphics("img/02_session2/4-screePlot.png")
```
```{r, eval=FALSE}
plotPCA(visiumHD)
```
```{r, echo=FALSE, out.width="100%", fig.align="center"}
knitr::include_graphics("img/02_session2/5-PCA.png")
```
##### UMAP reduction for visualization
```{r, eval=FALSE}
visiumHD <- runUMAP(visiumHD,
dimensions_to_use = 1:14,
n_threads = 10)
plotUMAP(gobject = visiumHD,
point_size = 1)
```
```{r, echo=FALSE, out.width="100%", fig.align="center"}
knitr::include_graphics("img/02_session2/6-UMAP.png")
```
##### Create network based on expression similarity + graph partition cluster
```{r, eval=FALSE}
# sNN network (default)
visiumHD <- createNearestNetwork(visiumHD,
dimensions_to_use = 1:14,
k = 5)
## leiden clustering ####
visiumHD <- doLeidenClusterIgraph(visiumHD, resolution = 0.5, n_iterations = 1000, spat_unit = 'hex400')
```
```{r, eval=FALSE}
plotUMAP(gobject = visiumHD,
cell_color = 'leiden_clus',
point_size = 1.5,
show_NN_network = F,
edge_alpha = 0.05)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Leiden clustering for the hex400 bins."}
knitr::include_graphics("img/02_session2/7-UMAP.png")
```
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = NULL,
show_legend = F,
spat_unit = 'hex400',
point_size = 0.25,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex400',
polygon_fill_as_factor = TRUE,
polygon_fill = 'leiden_clus',
polygon_color = 'black',
polygon_line_size = 0.3)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat plot for hex400 bin colored by leiden clusters."}
knitr::include_graphics("img/02_session2/8-spatInSituPlotPoints.png")
```
## Hexbin 100
Observation: Hexbin 400 results in very coarse information about the tissue.
Goal is to create a higher resolution bin (hex100), then add this to the Giotto object to compare difference in resolution.
### Standard subcellular pipeline
1. Create new spatial unit layer, e.g. with tessellate function\
2. Add spatial units to Giottoo object\
3. Calculate centroids (optional)\
4. Compute overlap between transcript and polygon (hexagon) locations.\
5. Convert overlap data into a gene-by-polygon matrix
```{r, eval=FALSE}
hexbin100 <- tessellate(extent = ext(visiumHD),
shape = 'hexagon',
shape_size = 100,
name = 'hex100')
visiumHD = setPolygonInfo(gobject = visiumHD,
x = hexbin100,
name = 'hex100',
initialize = T)
visiumHD = addSpatialCentroidLocations(gobject = visiumHD,
poly_info = 'hex100')
```
Set active spatial unit. This can also be set manually in each function.
```{r, eval=FALSE}
activeSpatUnit(visiumHD) <- 'hex100'
```
Let's visualize the higher resolution hexagons.
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = list('rna' = feature_data$feat_ID[1:20]),
show_legend = T,
spat_unit = 'hex100',
point_size = 0.1,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex100',
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.2,
expand_counts = TRUE,
count_info_column = 'count',
jitter = c(25,25))
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Polygon overlay of hex100 bins over 2 µm pixel. Jitter applied to vizualize individual features."}
knitr::include_graphics("img/02_session2/9-spatInSituPlotPoints.png")
```
```{r, eval=FALSE}
visiumHD = calculateOverlap(visiumHD,
spatial_info = 'hex100',
feat_info = 'rna')
visiumHD = overlapToMatrix(visiumHD,
poly_info = 'hex100',
feat_info = 'rna',
name = 'raw')
visiumHD <- filterGiotto(visiumHD,
expression_threshold = 1,
feat_det_in_min_cells = 10,
min_det_feats_per_cell = 10)
visiumHD <- normalizeGiotto(visiumHD, scalefactor = 1000, verbose = T)
visiumHD <- addStatistics(visiumHD)
```
Your Giotto object will have metadata for each spatial unit.
```{r, eval=FALSE}
pDataDT(visiumHD, spat_unit = 'hex100')
pDataDT(visiumHD, spat_unit = 'hex400')
```
```{r, eval=FALSE}
## dimension reduction ####
# --------------------------- #
visiumHD <- calculateHVF(visiumHD, zscore_threshold = 1)
visiumHD <- runPCA(visiumHD, expression_values = 'normalized', feats_to_use = 'hvf')
plotPCA(visiumHD)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap=""}
knitr::include_graphics("img/02_session2/10-PCA.png")
```
```{r, eval=FALSE}
visiumHD <- runUMAP(visiumHD, dimensions_to_use = 1:14, n_threads = 10)
# plot UMAP, coloring cells/points based on nr_feats
plotUMAP(gobject = visiumHD,
point_size = 2)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="UMAP for the hex100 bin."}
knitr::include_graphics("img/02_session2/11-UMAP.png")
```
```{r, eval=FALSE}
# sNN network (default)
visiumHD <- createNearestNetwork(visiumHD,
dimensions_to_use = 1:14,
k = 5)
## leiden clustering ####
visiumHD <- doLeidenClusterIgraph(visiumHD, resolution = 0.2, n_iterations = 1000)
plotUMAP(gobject = visiumHD,
cell_color = 'leiden_clus',
point_size = 1.5,
show_NN_network = F,
edge_alpha = 0.05)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="UMAP for the hex100 bin colored by ledien clusters."}
knitr::include_graphics("img/02_session2/12-UMAP.png")
```
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = NULL,
show_legend = F,
spat_unit = 'hex100',
point_size = 0.5,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex100',
polygon_fill_as_factor = TRUE,
polygon_fill = 'leiden_clus',
polygon_color = 'black',
polygon_line_size = 0.3)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat plot for the hex100 bin colored by leiden clusters."}
knitr::include_graphics("img/02_session2/13-spatInSituPlotPoints.png")
```
This resolution definitely shows more promise to identify interesting spatial patterns.
### Spatial expression patterns
#### Identify single genes
Here we will use binSpect as a quick method to rank genes with high potential for spatial coherent expression patterns.
```{r, eval=FALSE}
featData = fDataDT(visiumHD)
hvf_genes = featData[hvf == 'yes']$feat_ID
visiumHD = createSpatialNetwork(visiumHD,
name = 'kNN_network',
spat_unit = 'hex100',
method = 'kNN',
k = 8)
ranktest = binSpect(visiumHD,
spat_unit = 'hex100',
subset_feats = hvf_genes,
bin_method = 'rank',
calc_hub = FALSE,
do_fisher_test = TRUE,
spatial_network_name = 'kNN_network')
```
Visualize top 2 ranked spatial genes per expression bin:
```{r, eval=FALSE}
set0 = ranktest[high_expr < 50][1:2]$feats
set1 = ranktest[high_expr > 50 & high_expr < 100][1:2]$feats
set2 = ranktest[high_expr > 100 & high_expr < 200][1:2]$feats
set3 = ranktest[high_expr > 200 & high_expr < 400][1:2]$feats
set4 = ranktest[high_expr > 400 & high_expr < 1000][1:2]$feats
set5 = ranktest[high_expr > 1000][1:2]$feats
spatFeatPlot2D(visiumHD,
expression_values = 'scaled',
feats = c(set0, set1, set2),
gradient_style = "sequential",
cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'),
cow_n_col = 2, point_size = 1)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat feature plot showing gene expression for the top 2 ranked spatial genes per expression bin (<50, >50 and >100) across the hex100 bin."}
knitr::include_graphics("img/02_session2/14-spatFeatPlot2D.png")
```
```{r, eval=FALSE}
spatFeatPlot2D(visiumHD,
expression_values = 'scaled',
feats = c(set3, set4, set5),
gradient_style = "sequential",
cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'),
cow_n_col = 2, point_size = 1)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat feature plot showing gene expression for the top 2 ranked spatial genes per expression bin (>200, >400 and >1000) across the hex100 bin."}
knitr::include_graphics("img/02_session2/15-spatFeatPlot2D.png")
```
#### Spatial co-expression modules
Investigating individual genes is a good start, but here we would like to identify recurrent spatial expression patterns that are shared by spatial co-expression modules that might represent spatially organized biological processes.
```{r, eval=FALSE}
ext_spatial_genes = ranktest[adj.p.value < 0.001]$feats
spat_cor_netw_DT = detectSpatialCorFeats(visiumHD,
method = 'network',
spatial_network_name = 'kNN_network',
subset_feats = ext_spatial_genes)
# cluster spatial genes
spat_cor_netw_DT = clusterSpatialCorFeats(spat_cor_netw_DT,
name = 'spat_netw_clus',
k = 16)
# visualize clusters
heatmSpatialCorFeats(visiumHD,
spatCorObject = spat_cor_netw_DT,
use_clus_name = 'spat_netw_clus',
heatmap_legend_param = list(title = NULL))
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Heatmap showing spatially correlated genes split into 16 clusters."}
knitr::include_graphics("img/02_session2/16-heatmSpatialCorFeats.png")
```
```{r, eval=FALSE}
# create metagene enrichment score for clusters
cluster_genes_DT = showSpatialCorFeats(spat_cor_netw_DT,
use_clus_name = 'spat_netw_clus',
show_top_feats = 1)
cluster_genes = cluster_genes_DT$clus; names(cluster_genes) = cluster_genes_DT$feat_ID
visiumHD = createMetafeats(visiumHD,
expression_values = 'normalized',
feat_clusters = cluster_genes,
name = 'cluster_metagene')
showGiottoSpatEnrichments(visiumHD)
```
```{r, eval=FALSE}
spatCellPlot(visiumHD,
spat_enr_names = 'cluster_metagene',
gradient_style = "sequential",
cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'),
cell_annotation_values = as.character(c(1:4)),
point_size = 1, cow_n_col = 2)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat plot vizualizing metagenes (1-4) based on spatially correlated genes vizualized on the hex100 bin"}
knitr::include_graphics("img/02_session2/17-spatCellPlot2D.png")
```
```{r, eval=FALSE}
spatCellPlot(visiumHD,
spat_enr_names = 'cluster_metagene',
gradient_style = "sequential",
cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'),
cell_annotation_values = as.character(c(5:8)),
point_size = 1, cow_n_col = 2)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat plot vizualizing metagenes (5-8) based on spatially correlated genes vizualized on the hex100 bin"}
knitr::include_graphics("img/02_session2/18-spatCellPlot2D.png")
```
```{r, eval=FALSE}
spatCellPlot(visiumHD,
spat_enr_names = 'cluster_metagene',
gradient_style = "sequential",
cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'),
cell_annotation_values = as.character(c(9:12)),
point_size = 1, cow_n_col = 2)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat plot vizualizing metagenes (9-12) based on spatially correlated genes vizualized on the hex100 bin"}
knitr::include_graphics("img/02_session2/19-spatCellPlot2D.png")
```
```{r, eval=FALSE}
spatCellPlot(visiumHD,
spat_enr_names = 'cluster_metagene',
gradient_style = "sequential",
cell_color_gradient = c('blue', 'white', 'yellow', 'orange', 'red', 'darkred'),
cell_annotation_values = as.character(c(13:16)),
point_size = 1, cow_n_col = 2)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Spat plot vizualizing metagenes (13-16) based on spatially correlated genes vizualized on the hex100 bin"}
knitr::include_graphics("img/02_session2/20-spatCellPlot2D.png")
```
A simple follow up analysis could be to perform gene set enrichment analysis on each spatial co-expression module.
#### Plot spatial gene groups
Hack! Vendors of spatial technologies typically like to show very interesting spatial gene expression patterns. Here we will follow a similar strategy by selecting a balanced set of genes for each spatial co-expression module and then to simply give them the same color in the `spatInSituPlotPoints` function.
```{r, eval=FALSE}
balanced_genes = getBalancedSpatCoexpressionFeats(spatCorObject = spat_cor_netw_DT,
maximum = 5)
selected_feats = names(balanced_genes)
# give genes from same cluster same color
distinct_colors = getDistinctColors(n = 20)
names(distinct_colors) = 1:20
my_colors = distinct_colors[balanced_genes]
names(my_colors) = names(balanced_genes)
spatInSituPlotPoints(visiumHD,
show_image = F,
feats = list('rna' = selected_feats),
feats_color_code = my_colors,
show_legend = F,
spat_unit = 'hex100',
point_size = 0.20,
show_polygon = FALSE,
use_overlap = FALSE,
polygon_feat_type = 'hex100',
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.01,
expand_counts = TRUE,
count_info_column = 'count',
jitter = c(25,25))
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Coloring individual features based on the spatially correlated gene clusters."}
knitr::include_graphics("img/02_session2/21-spatInSituPlotPoints.png")
```
## Hexbin 25
Goal is to create a higher resolution bin (hex25) and add to the Giotto object. We will aim to identify individual cell types and local neighborhood niches.
### Subcellular workflow
- filter and normalization workflow
```{r, eval=FALSE}
visiumHD_subset = subsetGiottoLocs(gobject = visiumHD,
x_min = 16000,
x_max = 20000,
y_min = 44250,
y_max = 45500)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Coloring individual features based on the spatially correlated gene clusters + subset rectangle."}
knitr::include_graphics("img/02_session2/9.png")
```
Plot visiumHD subset with hexbin100 polygons:
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD_subset,
show_image = F,
feats = NULL,
show_legend = F,
spat_unit = 'hex100',
point_size = 0.5,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex100',
polygon_fill_as_factor = TRUE,
polygon_fill = 'leiden_clus',
polygon_color = 'black',
polygon_line_size = 0.3)
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Hexbin100 colored by leiden clustering results"}
knitr::include_graphics("img/02_session2/22-spatInSituPlotPoints.png")
```
Plot visiumHD subset with selected gene features:
```{r, eval=FALSE}
spatInSituPlotPoints(visiumHD_subset,
show_image = F,
feats = list('rna' = selected_feats),
feats_color_code = my_colors,
show_legend = F,
spat_unit = 'hex100',
point_size = 0.40,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex100',
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.05,
jitter = c(25,25))
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Coloring individual features based on the spatially correlated gene clusters"}
knitr::include_graphics("img/02_session2/23-spatInSituPlotPoints.png")
```
Create smaller hexbin25 tessellations:
```{r, eval=FALSE}
hexbin25 <- tessellate(extent = ext(visiumHD_subset@feat_info$rna),
shape = 'hexagon',
shape_size = 25,
name = 'hex25')
visiumHD_subset = setPolygonInfo(gobject = visiumHD_subset,
x = hexbin25,
name = 'hex25',
initialize = T)
showGiottoSpatialInfo(visiumHD_subset)
visiumHD_subset = addSpatialCentroidLocations(gobject = visiumHD_subset,
poly_info = 'hex25')
activeSpatUnit(visiumHD_subset) <- 'hex25'
spatInSituPlotPoints(visiumHD_subset,
show_image = F,
feats = list('rna' = selected_feats),
feats_color_code = my_colors,
show_legend = F,
spat_unit = 'hex25',
point_size = 0.40,
show_polygon = TRUE,
use_overlap = FALSE,
polygon_feat_type = 'hex25',
polygon_bg_color = NA,
polygon_color = 'white',
polygon_line_size = 0.05,
jitter = c(25,25))
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="xxx"}
knitr::include_graphics("img/02_session2/24-spatInSituPlotPoints.png")
```
```{r, eval=FALSE}
visiumHD_subset = calculateOverlap(visiumHD_subset,
spatial_info = 'hex25',
feat_info = 'rna')
showGiottoSpatialInfo(visiumHD_subset)
# convert overlap results to bin by gene matrix
visiumHD_subset = overlapToMatrix(visiumHD_subset,
poly_info = 'hex25',
feat_info = 'rna',
name = 'raw')
visiumHD_subset <- filterGiotto(visiumHD_subset,
expression_threshold = 1,
feat_det_in_min_cells = 3,
min_det_feats_per_cell = 5)
activeSpatUnit(visiumHD_subset)
# normalize
visiumHD_subset <- normalizeGiotto(visiumHD_subset, scalefactor = 1000, verbose = T)
# add statistics
visiumHD_subset <- addStatistics(visiumHD_subset)
feature_data = fDataDT(visiumHD_subset)
visiumHD_subset <- calculateHVF(visiumHD_subset, zscore_threshold = 1)
```
### Projections
- PCA projection from random subset.\
- UMAP projection from random subset.
- cluster result projection from subsampled Giotto object + kNN voting
#### PCA with projection
```{r, eval=FALSE}
n_25_percent <- round(length(spatIDs(visiumHD_subset, 'hex25')) * 0.25)
# pca projection on subset
visiumHD_subset <- runPCAprojection(
gobject = visiumHD_subset,
spat_unit = "hex25",
feats_to_use = 'hvf',
name = 'pca.projection',
set_seed = TRUE,
seed_number = 12345,
random_subset = n_25_percent
)
showGiottoDimRed(visiumHD_subset)
plotPCA(visiumHD_subset, dim_reduction_name = 'pca.projection')
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="xxx"}
knitr::include_graphics("img/02_session2/25-PCA.png")
```
#### UMAP with projection
```{r, eval=FALSE}
# umap projection on subset
visiumHD_subset <- runUMAPprojection(
gobject = visiumHD_subset,
spat_unit = "hex25",
dim_reduction_to_use = 'pca',
dim_reduction_name = "pca.projection",
dimensions_to_use = 1:10,
name = "umap.projection",
random_subset = n_25_percent,
n_neighbors = 10,
min_dist = 0.005,
n_threads = 4
)
showGiottoDimRed(visiumHD_subset)
# plot UMAP, coloring cells/points based on nr_feats
plotUMAP(gobject = visiumHD_subset,
point_size = 1,
dim_reduction_name = 'umap.projection')
```
```{r, echo=FALSE, out.width="100%", fig.align="center", fig.cap="xxx"}
knitr::include_graphics("img/02_session2/26-UMAP.png")
```
#### clustering with projection
1. subsample Giotto object\
2. perform clustering (e.g. hierarchical clustering)\
3. project cluster results to full Giotto object using a kNN voting approach and a shared dimension reduction space (e.g. PCA)
```{r, eval=FALSE}
# subset to smaller giotto object
set.seed(1234)
subset_IDs = sample(x = spatIDs(visiumHD_subset, 'hex25'), size = n_25_percent)
temp_gobject = subsetGiotto(
gobject = visiumHD_subset,
spat_unit = 'hex25',
cell_ids = subset_IDs
)
# hierarchical clustering
temp_gobject = doHclust(gobject = temp_gobject,
spat_unit = 'hex25',
k = 8, name = 'sub_hclust',
dim_reduction_to_use = 'pca',
dim_reduction_name = 'pca.projection',
dimensions_to_use = 1:10)