[help wanted] FileNotFoundError: [Errno 2] No such file or directory for --sopa resolve baysor #161

Closed
KunHHE opened this issue Dec 3, 2024 · 26 comments

KunHHE commented Dec 3, 2024

Hi, dear community,
I finished Baysor segmentation and tried to run sopa resolve baysor C:/Users/hekun/Downloads/S3R1.zarr --gene-column genes, but an error showed up: FileNotFoundError: [Errno 2] No such file or directory:
'C:\Users\hekun\Downloads\S3R1.zarr\.sopa_cache\baysor_boundaries\0\segmentation_polygons.json'

I know a similar issue was reported in #152. Yes, I did notice that the JSON output file is named segmentation_polygons_2d, not segmentation_polygons. I guess renaming the files in all the patch folders won't help? Any suggestions?

Thanks very much!

quentinblampey (Collaborator)

Hello @KunHHE,

This is an issue that should be fixed in the next release (sopa==2.0.0), which should be released very soon. Actually, the version is ready, but I need the new version of spatialdata to be released, which is expected this week.

I'll let you know when it's released!

KunHHE closed this as completed Dec 3, 2024
KunHHE (Author) commented Dec 9, 2024

Dear @quentinblampey, may I ask when the new sopa==2.0.0 will be available? Thank you!

KunHHE reopened this Dec 9, 2024
quentinblampey (Collaborator)

Hello @KunHHE, it's still not released; I'm waiting for the new version of SpatialData. It should arrive soon, hopefully this week.

KunHHE (Author) commented Dec 23, 2024

Hi @quentinblampey, is this a CLI-specific issue? If we switch to Snakemake or the API, is there no issue at all? Thank you so much! Happy holidays!

quentinblampey (Collaborator)

Hi @KunHHE, no, it's not specific to the CLI; you'll also get the error when using the API or the pipeline.
Sorry for the delay regarding the release of sopa 2, I hope the new version of SpatialData will be released soon...

quentinblampey (Collaborator)

Sorry for the delay!
I’m happy to announce that sopa==2.0.0 is now released :)
Don’t hesitate to check the new documentation, or the migration guide to smoothly get up to date!

KunHHE (Author) commented Jan 20, 2025

Wonderful @quentinblampey, I will test it right away!
When running Cellpose in CLI mode, I run:

  1. set SOPA_PARALLELIZATION_BACKEND=dask
  2. set SOPA_DASK_CLIENT_N_WORKERS=6
  3. sopa segmentation cellpose C:/Users/hekun/Downloads/Slide1316.zarr --diameter 60 --channels Cellbound2 --channels DAPI --flow-threshold 1.5 --cellprob-threshold -5.5 --pretrained-model C:/Users/hekun/.cellpose/models/CP_20250110_104912DRGv2 --min-area 1000 --clip-limit 0.2 --gaussian-sigma 1

It still says:
"[INFO] (sopa._settings) Using dask backend
[WARNING] (sopa._settings) Each worker has less than 4GB of RAM (2.86GB), which may not be enough. Consider setting sopa.settings.dask_client_kwargs['n_workers'] to use less workers (11 currently)."

It dies:

KilledWorker: Attempted to run task 'write_patch_cells-a14416f5-18b3-4cd0-a3e8-684b9ae65f5a' on 4 different workers, but
all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:53321.
Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see
https://distributed.dask.org/en/stable/killed.html.

KunHHE (Author) commented Jan 20, 2025

Update: I created a .py file and ran it from the CLI:
set SOPA_PARALLELIZATION_BACKEND=dask
python C:/Users/hekun/segmentation.py

The segmentation.py file is:

import sopa

sopa.settings.dask_client_kwargs['n_workers'] = 4
sopa.settings.dask_client_kwargs['memory_limit'] = '16GB'
sopa.segmentation.cellpose(
    "C:/Users/hekun/Downloads/Slide1316.zarr",
    diameter=60,
    channels=["Cellbound2", "DAPI"],
    flow_threshold=1.5,
    cellprob_threshold=-5.5,
    pretrained_model="C:/Users/hekun/.cellpose/models/CP_20250110_104912DRGv2",
    min_area=1000,
    clip_limit=0.2,
    gaussian_sigma=1,
)

Error:

(sopa) C:\Users\hekun>python C:/Users/hekun/segmentation.py
C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\dask\dataframe\__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option dataframe.query-planning to True or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\hekun\segmentation.py", line 8, in <module>
    sopa.segmentation.cellpose(
  File "C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\methods\_cellpose.py", line 71, in cellpose
    custom_staining_based(
  File "C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\methods\_custom.py", line 41, in custom_staining_based
    temp_dir = get_cache_dir(sdata) / cache_dir_name
  File "C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\sopa\utils\utils.py", line 293, in get_cache_dir
    if sdata.is_backed():  # inside the zarr directory
AttributeError: 'str' object has no attribute 'is_backed'

quentinblampey (Collaborator)

Hi @KunHHE,

For the first example, it simply means that you have too many workers for too little memory. You can use a different machine or, as you tried, a different number of workers, as described below. For the second, you are actually trying to use the API, whose usage is described in this tutorial. In particular, the API takes the SpatialData object directly as input, not a path. So the call sopa.segmentation.cellpose("C:/Users/hekun/Downloads/Slide1316.zarr", ...) is wrong and should be sopa.segmentation.cellpose(sdata, ...) (see the tutorial above for more details).

So, in the end, it should look like this:

import sopa

sopa.settings.parallelization_backend = "dask"
sopa.settings.dask_client_kwargs["n_workers"] = 4

sopa.segmentation.cellpose(sdata, ...) # add the other arguments

... # continue using the API
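For instance, a full sketch could look like the following (untested as written; it reuses the arguments from your CLI command above and assumes the zarr store can be read back with spatialdata.read_zarr):

import spatialdata
import sopa

sopa.settings.parallelization_backend = "dask"
sopa.settings.dask_client_kwargs["n_workers"] = 4

# the API works on a SpatialData object, not on a path
sdata = spatialdata.read_zarr("C:/Users/hekun/Downloads/Slide1316.zarr")

sopa.make_image_patches(sdata, patch_width=6000, patch_overlap=150)  # image patches are needed before cellpose

sopa.segmentation.cellpose(
    sdata,
    diameter=60,
    channels=["Cellbound2", "DAPI"],
    flow_threshold=1.5,
    cellprob_threshold=-5.5,
    pretrained_model="C:/Users/hekun/.cellpose/models/CP_20250110_104912DRGv2",
    min_area=1000,
    clip_limit=0.2,
    gaussian_sigma=1,
)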

KunHHE (Author) commented Jan 21, 2025

Hi @quentinblampey, so "sopa.settings.parallelization_backend = "dask"; sopa.settings.dask_client_kwargs["n_workers"] = 4" cannot be done in the CLI, only in the API? I think I want to use the CLI only....

Sorry for the basic question: when you say "API usage", does that mean we can run the sopa API in a Jupyter notebook?

Thanks!

KunHHE (Author) commented Jan 21, 2025

OK, assuming the API can be run in a Jupyter notebook, I tried it, and it looks like it worked:

(screenshots of the notebook run omitted)

quentinblampey (Collaborator)

Yes, using the API means you can use a Jupyter Notebook, among others.

so "sopa.settings.parallelization_backend = "dask"; sopa.settings.dask_client_kwargs["n_workers"] = 4" cannot be done in the CLI, so API only?

It can be done with the CLI, but the command is different. For the CLI, you need to set an environment variable, as you did above. The thing is, this only selects dask; you can't (yet) choose the number of workers via the CLI. I can add this, but, in the meantime, prefer the API, or use a machine with more RAM per worker.

KunHHE (Author) commented Jan 21, 2025

Thanks very much @quentinblampey, and sorry for so many questions. I have to run Baysor with Cellpose as a prior, so I want to double-check: now that I am using the API, do we still need the .toml config for Baysor?

Or do we just use the default settings with:
sopa.make_transcript_patches(sdata, patch_width=1000, prior_shapes_key="cellpose_boundaries")
sopa.segmentation.baysor(sdata, min_area=20)
sopa.aggregate(sdata)

For "resolve", both Cellpose and Baysor running in the sopa, it will always automatically run resolve NOW?

quentinblampey (Collaborator)

You can provide a Baysor config, as detailed in the tutorial, but you don't have to. If you don't provide one, it will be inferred. But if you already have a good config, it's easier to use it.

Yes, you don't need to run "resolve" yourself when using the API.
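For example (just a sketch, reusing the default values that sopa prints when no config is given; adapt the gene column name to your dataset):

import sopa

config = {
    "data": {
        "x": "x",
        "y": "y",
        "gene": "gene",  # adapt to the name of your transcripts' gene column
        "min_molecules_per_gene": 10,
        "min_molecules_per_cell": 20,
        "force_2d": True,
    },
    "segmentation": {"prior_segmentation_confidence": 0.8},
}

sopa.make_transcript_patches(sdata, patch_width=1000, prior_shapes_key="cellpose_boundaries")
sopa.segmentation.baysor(sdata, config=config, min_area=20)  # "resolve" runs automatically at the end
sopa.aggregate(sdata)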

KunHHE (Author) commented Jan 21, 2025

Cool! If I run sopa.segmentation.tissue(sdata) before sopa.make_image_patches(sdata, patch_width=6000, patch_overlap=150) and sopa.segmentation.cellpose, will the patches and the segmentation run only on the "region_of_interest" region?

quentinblampey (Collaborator)

Yes, as described in the tutorial, it will only run inside the segmented tissue.
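For instance (a sketch only, assuming sdata is already loaded and reusing the arguments from your message):

import sopa

sopa.segmentation.tissue(sdata)  # creates the region of interest
sopa.make_image_patches(sdata, patch_width=6000, patch_overlap=150)  # patches are kept only inside the tissue
sopa.segmentation.cellpose(sdata, diameter=60, channels=["Cellbound2", "DAPI"])  # runs only on those patches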

KunHHE (Author) commented Jan 21, 2025

Thanks so much @quentinblampey. I am now stuck on running Baysor: I could run it before using the CLI, and now I am using the API.

sopa.segmentation.baysor(sdata, min_area=20) gives the error: "FileNotFoundError: Please install baysor and ensure that either C:\Users\hekun\.julia\bin\baysor executes baysor, or baysor is an existing shell alias for baysor's executable."
I opened the "Environment Variables" window and added C:\Users\hekun\.julia\bin to the PATH. Then I opened PowerShell and baysor was recognized: "PS C:\Users\hekun> baysor" → "baysor v0.7.1".

I restarted the notebook and the environment, but sopa still cannot find the Baysor executable.
Could you please provide some guidance?

Thanks!

KunHHE (Author) commented Jan 21, 2025

Update: I figured it out using:
import os
os.environ["PATH"] += os.pathsep + r"C:\Users\hekun\.julia\bin"

But I had issues:

  1. Some patches with <4000 transcripts were excluded from the segmentation, so I ran sopa.make_transcript_patches(
    sdata,
    patch_width=1000,
    patch_overlap=20,
    prior_shapes_key="cellpose_boundaries",
    min_points_per_patch=0
    ) to force it to run on all patches. Is this correct?

(screenshot omitted)

  2. Then I ran sopa.segmentation.baysor(sdata, min_area=0) and got:

AssertionError: Could not find the segmentation polygons file in C:\Users\hekun\Downloads\Slide1307.zarr\.sopa_cache\transcript_patches\0

I can see the patches in the .zarr\.sopa_cache\transcript_patches folder, but each patch folder only contains config.toml and transcripts.csv; I did not see a segmentation polygons file...

[INFO] (sopa.segmentation.methods._baysor) The Baysor config was not provided, using the following by default:
{'data': {'x': 'x', 'y': 'y', 'gene': 'gene', 'min_molecules_per_gene': 10, 'min_molecules_per_cell': 20, 'force_2d': True}, 'segmentation': {'prior_segmentation_confidence': 0.8}}
[WARNING] (sopa._settings) Running without parallelization backend can be slow. Consider using a backend, e.g. via sopa.settings.parallelization_backend = 'dask', or export SOPA_PARALLELIZATION_BACKEND=dask.

0%| | 0/9 [00:00<?, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 65.55it/s]

Reading transcript-segmentation outputs: 0%| | 0/9 [00:00<?, ?it/s]

AssertionError Traceback (most recent call last)
Cell In[34], line 1
----> 1 sopa.segmentation.baysor(sdata, min_area=0)

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\methods\_baysor.py:79, in baysor(sdata, config, min_area, delete_cache, recover, force, scale, key_added, patch_index)
76 assert patches_dirs, "Baysor failed on all patches"
78 gene_column = _get_gene_column_argument(config)
---> 79 resolve(sdata, patches_dirs, gene_column, min_area=min_area, key_added=key_added)
81 sdata.attrs[SopaAttrs.BOUNDARIES] = key_added
83 if delete_cache:

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\_transcripts.py:43, in resolve(sdata, patches_dirs, gene_column, min_area, key_added)
40 if min_area > 0:
41 log.info(f"Cells whose area is less than {min_area} microns^2 will be removed")
---> 43 patches_cells, adatas = _read_all_segmented_patches(patches_dirs, min_area)
44 geo_df, cells_indices, new_ids = _resolve_patches(patches_cells, adatas)
46 points_key = sdata[SopaKeys.TRANSCRIPTS_PATCHES][SopaKeys.POINTS_KEY].iloc[0]

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\_transcripts.py:142, in _read_all_segmented_patches(patches_dirs, min_area)
138 def _read_all_segmented_patches(
139 patches_dirs: list[str],
140 min_area: float = 0,
141 ) -> tuple[list[list[Polygon]], list[AnnData]]:
--> 142 outs = [
143 _read_one_segmented_patch(path, min_area)
144 for path in tqdm(patches_dirs, desc="Reading transcript-segmentation outputs")
145 ]
147 patches_cells, adatas = zip(*outs)
149 return patches_cells, adatas

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\_transcripts.py:143, in <listcomp>(.0)
138 def _read_all_segmented_patches(
139 patches_dirs: list[str],
140 min_area: float = 0,
141 ) -> tuple[list[list[Polygon]], list[AnnData]]:
142 outs = [
--> 143 _read_one_segmented_patch(path, min_area)
144 for path in tqdm(patches_dirs, desc="Reading transcript-segmentation outputs")
145 ]
147 patches_cells, adatas = zip(*outs)
149 return patches_cells, adatas

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\_transcripts.py:93, in _read_one_segmented_patch(directory, min_area, min_vertices)
89 def _read_one_segmented_patch(
90 directory: str, min_area: float = 0, min_vertices: int = 4
91 ) -> tuple[list[Polygon], AnnData]:
92 directory: Path = Path(directory)
---> 93 id_as_string, polygon_file = _find_polygon_file(directory)
95 loom_file = directory / "segmentation_counts.loom"
96 if loom_file.exists():

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\_transcripts.py:134, in _find_polygon_file(directory)
132 return False, old_baysor_path
133 new_baysor_path = directory / "segmentation_polygons_2d.json"
--> 134 assert new_baysor_path.exists(), f"Could not find the segmentation polygons file in {directory}"
135 return True, new_baysor_path

AssertionError: Could not find the segmentation polygons file in C:\Users\hekun\Downloads\Slide1307.zarr\.sopa_cache\transcript_patches\0

quentinblampey (Collaborator) commented Jan 22, 2025

You can use min_points_per_patch=0, which will make Baysor run even on patches with a low number of transcripts.
But the way to force Baysor to run is sopa.segmentation.baysor(sdata, min_area=0, force=True) (see the force argument).
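Put together (a sketch only, reusing the arguments from your messages above):

sopa.make_transcript_patches(
    sdata,
    patch_width=1000,
    patch_overlap=20,
    prior_shapes_key="cellpose_boundaries",
    min_points_per_patch=0,  # do not skip patches with few transcripts
)
sopa.segmentation.baysor(sdata, min_area=0, force=True)  # force=True tolerates patches on which Baysor failed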

Please, next time, can you try to send a way for me to reproduce your issue, e.g. using the toy dataset?
Also, the documentation should already answer most of your questions!

KunHHE (Author) commented Jan 22, 2025

Hi @quentinblampey, thanks for the suggestions. I did use the toy dataset to test Baysor, and it reproduces the issue: there is no segmentation polygons file in .sopa_cache\transcript_patches. I am sharing the notebook for your reference via a OneDrive link, hope it works!

https://esbc22-my.sharepoint.com/:u:/g/personal/kun_he_omapix_com/EaRLYnQhzoVJlEATvb_xI5MBM5bdwE8QFMEAvmbYTXabxQ?e=AsP4xr

quentinblampey (Collaborator)

The notebook ran successfully for me.
Are you sure you installed Baysor correctly? Can you try to run it on the patch directory without sopa (i.e., run Baysor itself directly)?

KunHHE (Author) commented Jan 22, 2025

Yeah, I re-installed the Baysor dependencies using the CLI:
installed Julia and GCC;
juliaup add 1.10;
juliaup default 1.10;
julia -e "using Pkg; Pkg.add(PackageSpec(url=\"https://github.com/kharchenkolab/Baysor.git\")); Pkg.build()"

Then in the CLI I tested: baysor segfree -c C:/Users/hekun/sopa/workflow/config/merscope/merscope.toml C:/Users/hekun/Downloads/Slide1307/detected_transcripts.csv, and it looks like Baysor itself is working.
But the interesting and weird thing is that when I ran (sopa) C:\Users\hekun>baysor -v, I got julia version 1.10.7..... This is not related to sopa, but I wanted to give an update.

(screenshot omitted)

quentinblampey (Collaborator)

Thanks for trying this! Do you have the Baysor results somewhere?
Another question: when running Baysor with sopa, does it look like something is actually running, or does the error appear almost immediately?

KunHHE (Author) commented Jan 23, 2025

Hi @quentinblampey. My whole process is under Windows.

I ran baysor run -c C:/Users/hekun/sopa/workflow/config/merscope/merscope.toml C:/Users/hekun/Downloads/Slide1307.zarr/.sopa_cache/transcript_patches/0/transcripts.csv -o C:/Users/hekun/Downloads/Slide1307

I got outputs: https://esbc22-my.sharepoint.com/:u:/g/personal/kun_he_omapix_com/EUkDYiuT_LRPtcoAjI50XVUBU8M87fef_-sc7EnNimW4oA?e=ZgV4iy

I did the sopa[baysor] install, then installed Julia 1.10 and opened julia;
Then:
using Pkg
Pkg.add(PackageSpec(url="https://github.com/kharchenkolab/Baysor.git"))
Pkg.build()
Then I went to the sopa env using the miniconda3 prompt:
pip install julia
and in Python:
import julia
julia.install()

I did not see any errors during the installation process.

Then I went back to the sopa notebook with the toy data and ran Baysor; it runs the first stage but fails on the patches call:

sopa.segmentation.baysor(
    sdata,
    config=None,
    min_area=0,
    delete_cache=True,
    recover=False,
    force=True,
    scale=None,
    key_added='BAYSOR_BOUNDARIES',
    patch_index=None,
)

Detailed output:
[INFO] (sopa.segmentation.methods._baysor) The Baysor config was not provided, using the following by default:
{'data': {'x': 'x', 'y': 'y', 'gene': 'genes', 'min_molecules_per_gene': 10, 'min_molecules_per_cell': 20, 'force_2d': True}, 'segmentation': {'prior_segmentation_confidence': 0.8}}

100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 50.98it/s]

AssertionError Traceback (most recent call last)
Cell In[25], line 1
----> 1 sopa.segmentation.baysor(sdata,
2 config=None,
3 min_area=0,
4 delete_cache=True,
5 recover=False,
6 force=True,
7 scale=None,
8 key_added='BAYSOR_BOUNDARIES',
9 patch_index=None)

File ~\miniconda3\envs\sopa\lib\site-packages\sopa\segmentation\methods\_baysor.py:76, in baysor(sdata, config, min_area, delete_cache, recover, force, scale, key_added, patch_index)
74 if force:
75 patches_dirs = [patch_dir for patch_dir in patches_dirs if (patch_dir / "segmentation_counts.loom").exists()]
---> 76 assert patches_dirs, "Baysor failed on all patches"
78 gene_column = _get_gene_column_argument(config)
79 resolve(sdata, patches_dirs, gene_column, min_area=min_area, key_added=key_added)

AssertionError: Baysor failed on all patches

KunHHE (Author) commented Jan 23, 2025

Update: I ran Baysor manually, patch by patch, in the CLI successfully using:
cd C:/Users/hekun/Downloads/Slide1307.zarr/.sopa_cache/transcript_patches/8
then: C:/Users/hekun/.julia/bin/baysor run --polygon-format FeatureCollection -c C:/Users/hekun/sopa/workflow/config/merscope/merscope.toml transcripts.csv

In the API, after running:
sopa.make_transcript_patches(
    sdata,
    patch_width=500,
    patch_overlap=20,
    points_key=None,
    prior_shapes_key="cellpose_boundaries",
    unassigned_value=None,
    min_points_per_patch=0,
    write_cells_centroids=False,
    key_added=None,
)
it generates transcript_patches in the .sopa_cache folder.
Then we run:
sopa.segmentation.baysor(
    sdata,
    config=None,
    min_area=0,
    delete_cache=True,
    recover=False,
    force=True,
    key_added='baysor_boundaries',
    patch_index=None,
)
and it looks like the program is looking for the segmentation polygons file (segmentation_polygons_2d?), but isn't that file supposed to be one of the outputs Baysor writes for each patch?
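For reference, a small ad-hoc snippet (not part of sopa) to list each patch folder and check whether the Baysor outputs referenced in the tracebacks above were written:

from pathlib import Path

cache = Path(r"C:\Users\hekun\Downloads\Slide1307.zarr\.sopa_cache\transcript_patches")
for patch_dir in sorted(cache.iterdir()):
    if not patch_dir.is_dir():
        continue
    files = {f.name for f in patch_dir.iterdir()}
    # files that sopa expects Baysor to produce for each patch
    ok = {"segmentation_polygons_2d.json", "segmentation_counts.loom"} <= files
    print(patch_dir.name, sorted(files), "OK" if ok else "Baysor outputs missing")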

KunHHE (Author) commented Jan 25, 2025

Hi @quentinblampey, I tested on Ubuntu and it worked. It looks like there is some issue when running on Windows?
