Issue resolving conflicts with resolve() for Baysor segmentation #152

Closed
NJNataren opened this issue Nov 8, 2024 · 5 comments

Comments

@NJNataren

Hi, first of all, thanks for developing this amazing looking package, I'm really excited to start using it!

I am having an issue using the API version. I am working in a conda environment (python=3.10), which I run as a kernel in JupyterLab from another environment.

I am just trying to familiarise myself with the workflow using the toy dataset and following your code in the API tutorial (https://gustaveroussy.github.io/sopa/tutorials/api_usage/)

When I tried running Baysor on the patches, I initially used your code

for patch_index in valid_indices:
    command = f"""
    cd {baysor_temp_dir}/{patch_index}
    {baysor_executable_path} run --save-polygons GeoJSON -c config.toml transcripts.csv
    """
    subprocess.run(command, shell=True)

But I received an error saying that --save-polygons could not be found, so I switched to --polygon-format=FeatureCollection

for patch_index in valid_indices:
    command = f"""
    cd {baysor_temp_dir}/{patch_index}
    {baysor_executable_path} run --polygon-format=FeatureCollection -c config.toml transcripts.csv
    """
    subprocess.run(command, shell=True)
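As an aside, the loop above can be written without `shell=True`, so that a failing Baysor run raises immediately instead of passing silently. This is only a sketch: `baysor_command` and `run_patches` are hypothetical helper names, not part of sopa or Baysor, and the polygon flag is passed in rather than assumed.

```python
import subprocess
from pathlib import Path


def baysor_command(executable: str, polygon_args: list[str]) -> list[str]:
    """Build the argument list for one Baysor run (hypothetical helper)."""
    return [executable, "run", *polygon_args, "-c", "config.toml", "transcripts.csv"]


def run_patches(temp_dir, patch_indices, executable, polygon_args):
    """Run Baysor once per patch directory (hypothetical helper)."""
    for patch_index in patch_indices:
        patch_dir = Path(temp_dir) / str(patch_index)
        # cwd= replaces the `cd` of the shell version;
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(baysor_command(executable, polygon_args), cwd=patch_dir, check=True)
```

It would be called as, for example, `run_patches(baysor_temp_dir, valid_indices, baysor_executable_path, ["--polygon-format=FeatureCollection"])`.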

This completed successfully. When I then try to resolve conflicts using your code

from sopa.segmentation.transcripts import resolve

resolve(sdata, baysor_temp_dir, gene_column, min_area=10)

I get the error message below

`[INFO] (sopa.segmentation.transcripts) Cells whose area is less than 10 microns^2 will be removed

Reading transcript-segmentation outputs: 0%| | 0/1 [00:00<?, ?it/s]

FileNotFoundError Traceback (most recent call last)
Cell In[15], line 3
1 from sopa.segmentation.transcripts import resolve
----> 3 resolve(sdata, baysor_temp_dir, gene_column, min_area=10)

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:45, in resolve(sdata, temp_dir, gene_column, patches_dirs, min_area, shapes_key)
42 if min_area > 0:
43 log.info(f"Cells whose area is less than {min_area} microns^2 will be removed")
---> 45 patches_cells, adatas = _read_all_segmented_patches(temp_dir, min_area, patches_dirs)
46 geo_df, cells_indices, new_ids = _resolve_patches(patches_cells, adatas)
48 image_key, _ = get_spatial_image(sdata, return_key=True)

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:140, in _read_all_segmented_patches(temp_dir, min_area, patches_dirs)
137 if patches_dirs is None or not len(patches_dirs):
138 patches_dirs = [subdir for subdir in Path(temp_dir).iterdir() if subdir.is_dir()]
--> 140 outs = [
141 _read_one_segmented_patch(path, min_area)
142 for path in tqdm(patches_dirs, desc="Reading transcript-segmentation outputs")
143 ]
145 patches_cells, adatas = zip(*outs)
147 return patches_cells, adatas

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:141, in <listcomp>(.0)
137 if patches_dirs is None or not len(patches_dirs):
138 patches_dirs = [subdir for subdir in Path(temp_dir).iterdir() if subdir.is_dir()]
140 outs = [
--> 141 _read_one_segmented_patch(path, min_area)
142 for path in tqdm(patches_dirs, desc="Reading transcript-segmentation outputs")
143 ]
145 patches_cells, adatas = zip(*outs)
147 return patches_cells, adatas

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:112, in _read_one_segmented_patch(directory, min_area, min_vertices)
109 cells_num = pd.Series(adata.obs["CellID"].astype(int), index=adata.obs_names)
110 del adata.obs["CellID"]
--> 112 with open(directory / "segmentation_polygons.json") as f:
113 polygons_dict = json.load(f)
114 polygons_dict = {c["cell"]: c for c in polygons_dict["geometries"]}

FileNotFoundError: [Errno 2] No such file or directory: 'tuto.zarr/.sopa_cache/baysor/0/segmentation_polygons.json'`

I think this happens because the output file is named segmentation_polygons_2d.json, not segmentation_polygons.json as your package expects. However, if I manually rename it to segmentation_polygons.json, I get the following error message

`[INFO] (sopa.segmentation.transcripts) Cells whose area is less than 10 microns^2 will be removed

Reading transcript-segmentation outputs: 0%| | 0/1 [00:00<?, ?it/s]

KeyError Traceback (most recent call last)
Cell In[17], line 3
1 from sopa.segmentation.transcripts import resolve
----> 3 resolve(sdata, baysor_temp_dir, gene_column, min_area=10)

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:45, in resolve(sdata, temp_dir, gene_column, patches_dirs, min_area, shapes_key)
42 if min_area > 0:
43 log.info(f"Cells whose area is less than {min_area} microns^2 will be removed")
---> 45 patches_cells, adatas = _read_all_segmented_patches(temp_dir, min_area, patches_dirs)
46 geo_df, cells_indices, new_ids = _resolve_patches(patches_cells, adatas)
48 image_key, _ = get_spatial_image(sdata, return_key=True)

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:140, in _read_all_segmented_patches(temp_dir, min_area, patches_dirs)
137 if patches_dirs is None or not len(patches_dirs):
138 patches_dirs = [subdir for subdir in Path(temp_dir).iterdir() if subdir.is_dir()]
--> 140 outs = [
141 _read_one_segmented_patch(path, min_area)
142 for path in tqdm(patches_dirs, desc="Reading transcript-segmentation outputs")
143 ]
145 patches_cells, adatas = zip(*outs)
147 return patches_cells, adatas

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:141, in <listcomp>(.0)
137 if patches_dirs is None or not len(patches_dirs):
138 patches_dirs = [subdir for subdir in Path(temp_dir).iterdir() if subdir.is_dir()]
140 outs = [
--> 141 _read_one_segmented_patch(path, min_area)
142 for path in tqdm(patches_dirs, desc="Reading transcript-segmentation outputs")
143 ]
145 patches_cells, adatas = zip(*outs)
147 return patches_cells, adatas

File ~/miniforge3/envs/xenium/lib/python3.10/site-packages/sopa/segmentation/transcripts.py:114, in _read_one_segmented_patch(directory, min_area, min_vertices)
112 with open(directory / "segmentation_polygons.json") as f:
113 polygons_dict = json.load(f)
--> 114 polygons_dict = {c["cell"]: c for c in polygons_dict["geometries"]}
116 cells_num = cells_num[cells_num.map(lambda num: len(polygons_dict[num]["coordinates"][0]) >= min_vertices)]
118 gdf = gpd.GeoDataFrame(index=cells_num.index, geometry=[shape(polygons_dict[cell_num]) for cell_num in cells_num])

KeyError: 'geometries'`

I am likely missing something obvious, but I am not sure how to proceed. I want to eventually use this API to process some Xenium Spatial data, but I would like to know I can get this working with the toy test set. Any help would be appreciated!
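For anyone hitting the same KeyError: in the GeoJSON format, a FeatureCollection stores its items under a top-level "features" key, while a GeometryCollection uses "geometries", which is the key the traceback shows sopa reading. A quick way to check what a patch directory actually contains is sketched below. `describe_polygon_files` is a hypothetical helper, not part of sopa, and it assumes each polygon file holds a single JSON object at the top level.

```python
import json
from pathlib import Path


def describe_polygon_files(patch_dir):
    """Map each polygon file name to (GeoJSON type, sorted top-level keys)."""
    summary = {}
    # Matches both segmentation_polygons.json and segmentation_polygons_2d.json
    for path in sorted(Path(patch_dir).glob("segmentation_polygons*.json")):
        with open(path) as f:
            data = json.load(f)
        summary[path.name] = (data.get("type"), sorted(data.keys()))
    return summary
```

If the reported type is "FeatureCollection", the file has no "geometries" key, which would be consistent with the second error above.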

@quentinblampey
Collaborator

Hello @NJNataren, the error you have is very likely related to baysor 0.7.0 (can you confirm you have this version?) which is a recent version of baysor that introduced many breaking changes.

Recently, we updated the CLI and the Snakemake pipeline to support this new version, but not the API tutorial. Actually, it's also fixed in the API, but not released yet: I expect to release sopa==2.0.0 in about two weeks!

Since sopa==2.0.0 will introduce many new features and simplify the API, I recommend waiting for its release so that you can start directly with the new features!

@NJNataren
Author

Thanks @quentinblampey, that is indeed the Baysor version I am using!
Thanks for confirming, I wondered if the issue was with Baysor given that I had to use --polygon-format=GeometryCollection .
I have some samples I need to start looking at ASAP and this package is perfect for that, so I am using the CLI pipeline right now, but I will definitely be using the API as soon as it comes out!
Thank you for getting back to me! :)

@quentinblampey
Collaborator

Hello @NJNataren, if you need results really soon, maybe you can downgrade baysor to baysor<0.7.0? This way it should fix the error, even if you can't wait for the next Sopa release :)

NB: using --polygon-format=GeometryCollection will create files that are different from the "old version" files, and they are not currently readable with Sopa. I think it should work on the master branch though, but it might not be very stable
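To make the version dependence explicit, here is a minimal sketch that picks the polygon flag from a Baysor version string. It only encodes what this thread reports: the tutorial's `--save-polygons GeoJSON` for versions before 0.7.0, and `--polygon-format=FeatureCollection` for 0.7.0, where the old flag was rejected. `parse_version` and `polygon_args` are hypothetical helpers, and the version string is assumed to be supplied by the user in plain `X.Y.Z` form. Choosing a flag that Baysor accepts does not by itself make the output readable by a sopa release that predates the fix.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string (optionally prefixed with 'v') into a tuple of ints."""
    return tuple(int(part) for part in version.strip().lstrip("v").split(".")[:3])


def polygon_args(baysor_version: str) -> list[str]:
    """Pick the polygon-output flag reported to work for this Baysor version."""
    if parse_version(baysor_version) >= (0, 7, 0):
        # Flag that ran successfully with baysor 0.7.0 in this thread
        return ["--polygon-format=FeatureCollection"]
    # Flag used by the API tutorial, for baysor<0.7.0
    return ["--save-polygons", "GeoJSON"]
```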

@NJNataren
Author

Great suggestion, thanks!

@quentinblampey
Collaborator

Hello @NJNataren, this should be fixed in the new sopa==2.0.0 version, please let me know!

Don’t hesitate to check the new documentation, or the migration guide to smoothly get up to date!
