Skip to content

Commit

Permalink
misc doc + prepare staining based tissue segmentation
Browse files Browse the repository at this point in the history
  • Loading branch information
quentinblampey committed Oct 16, 2024
1 parent 76c3f3d commit ea94902
Show file tree
Hide file tree
Showing 8 changed files with 121 additions and 78 deletions.
40 changes: 21 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@

Built on top of [SpatialData](https://github.com/scverse/spatialdata), Sopa enables processing and analyses of spatial omics data with single-cell resolution (spatial transcriptomics or multiplex imaging data) using a standard data structure and output. We currently support the following technologies: Xenium, Visium HD, MERSCOPE, CosMX, PhenoCycler, MACSima, Hyperion. Sopa was designed for generability and low memory consumption on large images (scales to `1TB+` images).

The pipeline outputs contain: (i) Xenium Explorer files for interactive visualization, (ii) an HTML report for quick quality controls, and (iii) a SpatialData `.zarr` directory for further analyses.
🎉 `sopa==2.0.0` is out! It introduces many new API features; check [our migration guide](https://github.com/gustaveroussy/sopa/discussions/138) to smoothly update your code base.

# Documentation

Expand All @@ -32,7 +32,7 @@ The following illustration describes the main steps of `sopa`:
# Installation

### PyPI installation
Sopa can be installed via `PyPI` on all operating systems. The preferred Python version is `python==3.10`, but we also support `3.9` to `3.11`. On a new environment, run the following command:
Sopa can be installed via `PyPI` on all operating systems, with the only requirement being Python (`>=3.9` and `<=3.11`). On a new environment, run the following command:
```
pip install sopa
```
Expand All @@ -42,21 +42,35 @@ To install extras (for example, if you want to use `snakemake`/`cellpose`/`bayso
pip install 'sopa[snakemake,cellpose,baysor,tangram]'
```

Important: even though `pip install 'sopa[baysor]'` will install some dependencies related to baysor, you still have to install the `baysor` command line (see the [official repository](https://github.com/kharchenkolab/Baysor)) if you want to use it.
**Important**: even though `pip install 'sopa[baysor]'` will install some dependencies related to baysor, you still have to install the `baysor` command line (see the [official repository](https://github.com/kharchenkolab/Baysor)) if you want to use it.

### Other installation modes

You can clone the repository and run one of these command lines at the root of `sopa`:
```
pip install -e . # dev mode installation
```sh
pip install -e . # dev mode installation
poetry install # poetry installation
```

# Features
Sopa comes in three different flavours, each corresponding to a different use case:
- `API`: use directly `sopa` as a Python package for complete flexibility and customization
- `Snakemake pipeline`: choose a config, and run our pipeline on your spatial data in a couple of minutes
- `CLI`: use our command-line-interface for prototyping quickly your own pipeline
- `API`: use directly `sopa` as a Python package for complete flexibility and customization

### API

Below is an example of a minimal API usage. For a complete API description, please refer to the [documentation](https://gustaveroussy.github.io/sopa).

```python
import sopa

sdata = sopa.io.xenium("path/to/data") # reading Xenium data

sopa.make_image_patches(sdata) # creating overlapping patches
sopa.segmentation.cellpose(sdata, "DAPI", diameter=30) # running cellpose segmentation
sopa.aggregate(sdata) # counting the transcripts inside the cells
```

### Snakemake pipeline

Expand All @@ -71,7 +85,7 @@ For more details on `snakemake` configuration and how to properly setup your env

### CLI

Below are examples of commands that can be run with the `sopa` CLI:
Below are examples of commands that can be run with the `sopa` CLI. For a complete description of the CLI, please refer to the [documentation](https://gustaveroussy.github.io/sopa/cli).

```bash
> sopa --help # show command names and arguments
Expand All @@ -83,18 +97,6 @@ Below are examples of commands that can be run with the `sopa` CLI:
> sopa explorer write merscope_directory.zarr # convert for interactive vizualisation
```

For a complete description of the CLI, please refer to the [documentation](https://gustaveroussy.github.io/sopa/cli).

### API

```python
import sopa

# use the 'sopa' python package
```

For a complete API description, please refer to the [documentation](https://gustaveroussy.github.io/sopa).

# Cite us
Our article is published in [Nature Communications](https://www.nature.com/articles/s41467-024-48981-z). You can cite our paper as below:

Expand Down
14 changes: 7 additions & 7 deletions docs/api/misc.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,19 +25,19 @@ sopa.settings.parallelization_backend = None # no backend (i.e., sequential)

## Xenium Explorer

::: sopa.io.write_xenium_explorer
::: sopa.io.explorer.write

::: sopa.io.align
::: sopa.io.explorer.align

::: sopa.io.add_xenium_explorer_selection
::: sopa.io.explorer.add_explorer_selection

::: sopa.io.int_cell_id
::: sopa.io.explorer.int_cell_id

::: sopa.io.str_cell_id
::: sopa.io.explorer.str_cell_id

::: sopa.io.write_image
::: sopa.io.explorer.write_image

::: sopa.io.save_column_csv
::: sopa.io.explorer.save_column_csv

## Report

Expand Down
7 changes: 3 additions & 4 deletions docs/api/readers.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,9 @@
# Readers

!!! notes
Due to many updates in the data format provided by the different companies, you might have issues loading your data. In this case, consider [opening an issue](https://github.com/gustaveroussy/sopa/issues) detailing the version of the machine you used and the error log, as well as an example of file names that you are trying to read.
The readers below are used to read your raw spatial data into a `SpatialData` object. Choose the right function below, according to your technology of interest.

!!! notes "Related to `spatialdata-io`"
A library called [`spatialdata-io`](https://spatialdata.scverse.org/projects/io/en/latest/) already contains a lot of readers. Here, we add minor functionnalities on top of the readers from `spatialdata-io`, and add a few others.
!!! warning
Due to many updates in the data format provided by the different companies, you might have issues loading your data. In this case, consider [opening an issue](https://github.com/gustaveroussy/sopa/issues) detailing the version of the machine you used and the error log, as well as an example of file names that you are trying to read.

::: sopa.io.xenium

Expand Down
2 changes: 1 addition & 1 deletion docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

Built on top of [SpatialData](https://github.com/scverse/spatialdata), Sopa enables processing and analyses of image-based spatial omics using a standard data structure and output. We currently support the following technologies: Xenium, MERSCOPE, CosMX, PhenoCycler, MACSIMA, Hyperion. Sopa was designed for generability and low memory consumption on large images (scales to `1TB+` images).

The pipeline outputs contain: (i) Xenium Explorer files for interactive visualization, (ii) an HTML report for quick quality controls, and (iii) a SpatialData `.zarr` directory for further analyses.
🎉 `sopa==2.0.0` is out! It introduces many new API features; check [our migration guide](https://github.com/gustaveroussy/sopa/discussions/138) to smoothly update your code base.

## Overview

Expand Down
2 changes: 1 addition & 1 deletion sopa/patches/cluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ def cluster_embeddings(
key_added: str = "cluster",
**method_kwargs: str,
) -> gpd.GeoDataFrame:
"""Cluster the patches embeddings using a clustering method
"""Create clusters of the patches embeddings (obtained from [sopa.patches.compute_embeddings][]).
Args:
sdata: A `SpatialData` object
Expand Down
4 changes: 2 additions & 2 deletions sopa/patches/infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ def compute_embeddings(
batch_size: int = 32,
device: str = None,
) -> DataArray:
"""Create an image made of patch based predictions of an image (useful for WSI images notably).
"""It creates patches, runs a computer vision model on each patch, and store the embeddings of each all patches as an image. This is mostly useful for WSI images.
!!! info
The image will be saved into the `SpatialData` object with the key `sopa_{model_name}` (see the argument below).
Expand All @@ -54,7 +54,7 @@ def compute_embeddings(
device: Device used for the computer vision model.
Returns:
The `DataArray` of shape `(C,Y,X)` containing the model predictions (they are also saved in the `SpatialData` object).
The `DataArray` of shape `(C,Y,X)` containing the model predictions (also added to the `SpatialData` object).
"""
try:
import torch
Expand Down
18 changes: 17 additions & 1 deletion sopa/patches/patches.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,14 @@
def make_image_patches(
sdata: SpatialData, patch_width: int = 2000, patch_overlap: int = 50, image_key: str | None = None
):
"""Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.
Args:
sdata: A `SpatialData` object.
patch_width: Width of the patches, in pixels.
patch_overlap: Number of pixels of overlap between patches.
image_key: Optional key of the image on which the patches will be made. If not provided, it is found automatically.
"""
image_key, _ = get_spatial_image(sdata, key=image_key, return_key=True)
patches = Patches2D(sdata, image_key, patch_width=patch_width, patch_overlap=patch_overlap)

Expand All @@ -45,7 +53,15 @@ def make_transcript_patches(
patch_overlap: int = 50,
points_key: str | None = None,
**kwargs: int,
) -> list[int]:
):
"""Create overlapping patches on a transcripts dataframe, and save it in a cache. This can be used for trancript-based segmentation methods such as Baysor.
Args:
sdata: A `SpatialData` object.
patch_width: Width of the patches, in microns.
patch_overlap: Number of microns of overlap between patches.
points_key: Optional key of the points on which the patches will be made. If not provided, it is found automatically.
"""
points_key, _ = get_spatial_element(
sdata.points, key=points_key or sdata.attrs.get(SopaAttrs.TRANSCRIPTS), return_key=True
)
Expand Down
112 changes: 69 additions & 43 deletions sopa/segmentation/tissue.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,7 @@

log = logging.getLogger(__name__)


def hsv_otsu(*args, **kwargs):
warnings.warn(
"This function is deprecated and will be removed in late 2024. Use `sopa.tissue_segmentation` instead.",
DeprecationWarning,
stacklevel=2,
)
tissue_segmentation(*args, **kwargs)
AVAILABLE_MODES = ["hsv_otsu", "staining"]


def tissue_segmentation(
Expand All @@ -36,6 +29,7 @@ def tissue_segmentation(
open_k: int = 5,
close_k: int = 5,
drop_threshold: int = 0.01,
mode: str | None = None,
key_added: str = SopaKeys.ROI,
):
"""Perform WSI tissue segmentation. The resulting regions-of-interests (ROI) are saved as shapes.
Expand All @@ -57,8 +51,11 @@ def tissue_segmentation(
open_k: The kernel size of the morphological openning operation
close_k: The kernel size of the morphological closing operation
drop_threshold: Segments that cover less area than `drop_threshold`*100% of the number of pixels of the image will be removed
mode: Two modes are available: `hsv_otsu` (for H&E data) and `staining`. By default, `hsv_otsu` is used only if there are exactly 3 channels.
key_added: Name of the spatial element that will be added, containing the segmented tissue polygons.
"""
assert mode is None or mode in AVAILABLE_MODES, f"`mode` argument should be one of {AVAILABLE_MODES}"

if key_added in sdata.shapes:
log.warning(f"sdata['{key_added}'] was already existing, but tissue segmentation is run on top")

Expand All @@ -72,50 +69,79 @@ def tissue_segmentation(
level_keys = list(image.keys())
image: DataArray = next(iter(image[level_keys[level]].values()))

geo_df = _get_polygons(image, blur_k, open_k, close_k, drop_threshold)
geo_df = TissueSegmentation(image, blur_k, open_k, close_k, drop_threshold).get_polygons(mode)

assert len(
geo_df
), "No polygon has been found after tissue segmentation. Check that there is some tissue in the image, or consider updating the segmentation parameters."
if not len(geo_df):
log.warning(
"No polygon has been found after tissue segmentation. "
"Check that there is some tissue in the image, or consider updating the function parameters."
)
return

geo_df = to_valid_polygons(geo_df)
geo_df = ShapesModel.parse(geo_df, transformations=get_transformation(image, get_all=True).copy())

add_spatial_element(sdata, key_added, geo_df)


def _get_polygons(image: DataArray, blur_k: int, open_k: int, close_k: int, drop_threshold: int) -> gpd.GeoDataFrame:
import cv2
class TissueSegmentation:
def __init__(self, image: DataArray, blur_k: int, open_k: int, close_k: int, drop_threshold: int):
self.image = image
self.blur_k = blur_k
self.open_k = open_k
self.close_k = close_k
self.drop_threshold = drop_threshold

thumbnail = np.array(image.transpose("y", "x", "c"))
def _get_default_mode(self) -> str:
return "hsv_otsu" if self.image.sizes["c"] == 3 else "staining"

assert thumbnail.shape[2] == 3, "The image should be in RGB color space"
def get_polygons(self, mode: str | None) -> gpd.GeoDataFrame:
return getattr(self, mode or self._get_default_mode())()

if thumbnail.shape[0] * thumbnail.shape[1] > 1e8:
log.warning(
"Tissue segmentation is computationally expensive for large images. Consider using a smaller image, or set the `level` parameter."
)
def hsv_otsu(self) -> gpd.GeoDataFrame:
import cv2

thumbnail = np.array(self.image.transpose("y", "x", "c"))

assert thumbnail.shape[2] == 3, "The image should be in RGB color space"

thumbnail_hsv = cv2.cvtColor(thumbnail, cv2.COLOR_RGB2HSV)
thumbnail_hsv_blurred = cv2.medianBlur(thumbnail_hsv[:, :, 1], blur_k)
_, mask = cv2.threshold(thumbnail_hsv_blurred, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)

mask_open = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((open_k, open_k), np.uint8))
mask_open_close = cv2.morphologyEx(mask_open, cv2.MORPH_CLOSE, np.ones((close_k, close_k), np.uint8))

num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask_open_close, 4, cv2.CV_32S)

contours = []
for i in range(1, num_labels):
if stats[i, 4] > drop_threshold * np.prod(mask_open_close.shape):
cc = cv2.findContours(
np.array(labels == i, dtype="uint8"),
cv2.RETR_TREE,
cv2.CHAIN_APPROX_NONE,
)[
0
][0]
c_closed = np.array(list(cc) + [cc[0]])
contours.extend([c_closed.squeeze()])

return gpd.GeoDataFrame(geometry=[Polygon(contour) for contour in contours])
if thumbnail.shape[0] * thumbnail.shape[1] > 1e8:
log.warning(
"Tissue segmentation is computationally expensive for large images. Consider using a smaller image, or set the `level` parameter."
)

thumbnail_hsv = cv2.cvtColor(thumbnail, cv2.COLOR_RGB2HSV)
thumbnail_hsv_blurred = cv2.medianBlur(thumbnail_hsv[:, :, 1], self.blur_k)
_, mask = cv2.threshold(thumbnail_hsv_blurred, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)

mask_open = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((self.open_k, self.open_k), np.uint8))
mask_open_close = cv2.morphologyEx(mask_open, cv2.MORPH_CLOSE, np.ones((self.close_k, self.close_k), np.uint8))

num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask_open_close, 4, cv2.CV_32S)

contours = []
for i in range(1, num_labels):
if stats[i, 4] > self.drop_threshold * np.prod(mask_open_close.shape):
cc = cv2.findContours(
np.array(labels == i, dtype="uint8"),
cv2.RETR_TREE,
cv2.CHAIN_APPROX_NONE,
)[
0
][0]
c_closed = np.array(list(cc) + [cc[0]])
contours.extend([c_closed.squeeze()])

return gpd.GeoDataFrame(geometry=[Polygon(contour) for contour in contours])

def staining(self) -> gpd.GeoDataFrame:
raise NotImplementedError


def hsv_otsu(*args, **kwargs):
warnings.warn(
"This function is deprecated and will be removed in late 2024. Use `sopa.tissue_segmentation` instead.",
DeprecationWarning,
stacklevel=2,
)
tissue_segmentation(*args, **kwargs)

0 comments on commit ea94902

Please sign in to comment.