Sopa read: Could not infer dataset_id #146

Open
GBeattie opened this issue Oct 31, 2024 · 9 comments

Comments

@GBeattie

Hi, trying to read a cosmx run using this CLI command:

sopa read --technology "cosmx" /Sam_reseg_30_10_2024_15_42_51_118

Here is the full error:


╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/cli/app.py:94 in read                                                                                 │
│                                                                                                                                                                                               │
│    91 │   │   io, technology                                                                                                                                                                  │
│    92 │   ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx,                                                                                                 │
│    93 │                                                                                                                                                                                       │
│ ❱  94 │   sdata = getattr(io, technology)(data_path, **kwargs)                                                                                                                                │
│    95 │   io.write_standardized(sdata, sdata_path, delete_table=True)                                                                                                                         │
│    96                                                                                                                                                                                         │
│    97                                                                                                                                                                                         │
│                                                                                                                                                                                               │
│ ╭──────────────────────────────────────────────────────────────────── locals ────────────────────────────────────────────────────────────────────╮                                            │
│ │ config_path = None                                                                                                                             │                                            │
│ │   data_path = '/Sam_reseg_30_10_2024_15_42_51_118'                                                                                             │                                            │
│ │          io = <module 'sopa.io' from '/lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/__init__.py'> │                                            │
│ │      kwargs = {}                                                                                                                               │                                            │
│ │  sdata_path = PosixPath('/Sam_reseg_30_10_2024_15_42_51_118.zarr')                                                                             │                                            │
│ │  technology = 'cosmx'                                                                                                                          │                                            │
│ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                            │
│                                                                                                                                                                                               │
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py:55 in cosmx                                                                        │
│                                                                                                                                                                                               │
│    52 │   path = Path(path)                                                                    ╭───────────────────────────────────── locals ─────────────────────────────────────╮           │
│    53 │   image_models_kwargs, imread_kwargs = _default_image_kwargs(image_models_kwargs, imre │          dataset_id = None                                                       │           │
│    54 │                                                                                        │                 fov = None                                                       │           │
│ ❱  55 │   dataset_id = _infer_dataset_id(path, dataset_id)                                     │ image_models_kwargs = {'chunks': (1, 1024, 1024), 'scale_factors': [2, 2, 2, 2]} │           │
│    56 │   fov_locs = _read_fov_locs(path, dataset_id)                                          │       imread_kwargs = {}                                                         │           │
│    57 │   fov_id, fov = _check_fov_id(fov)                                                     │                path = PosixPath('/Sam_reseg_30_10_2024_15_42_51_118')            │           │
│    58                                                                                          │       read_proteins = False                                                      │           │
│                                                                                                ╰──────────────────────────────────────────────────────────────────────────────────╯           │
│                                                                                                                                                                                               │
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py:128 in _infer_dataset_id                                                           │
│                                                                                                                                                                                               │
│   125 │   │   │   if found:                                                                    ╭──────────────────────────── locals ────────────────────────────╮                             │
│   126 │   │   │   │   return found.group(1)                                                    │ counts_files = []                                              │                             │
│   127 │                                                                                        │   dataset_id = None                                            │                             │
│ ❱ 128 │   raise ValueError("Could not infer `dataset_id` from the name of the transcript file. │         path = PosixPath('/Sam_reseg_30_10_2024_15_42_51_118') │                             │
│   129                                                                                          │       suffix = '.csv.gz'                                       │                             │
│   130                                                                                          ╰────────────────────────────────────────────────────────────────╯                             │
│   131 def _read_fov_image(                                                                                                                                                                    │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Could not infer `dataset_id` from the name of the transcript file. Please specify it manually.

Running python 3.9.10 on UNIX.

Thanks in advance for any assistance!

All the best,
Gordon

@quentinblampey
Collaborator

Hello @GBeattie, thanks for reporting.

Could you show me which file names are inside your Sam_reseg_30_10_2024_15_42_51_118 directory?

Normally, it should contain a file that ends with _fov_positions_file.csv or _fov_positions_file.csv.gz (among other files). See this tab of the FAQ to see what the CosMx input directory should look like.
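
As a quick way to run that check, here is a minimal sketch in plain Python. The helper name `find_fov_position_files` is hypothetical and not part of sopa; it only looks for the two suffixes mentioned above, and the FAQ remains the reference for the other required files.

```python
from pathlib import Path

# Suffixes named in this thread; the other required files are listed in the FAQ.
FOV_SUFFIXES = ("_fov_positions_file.csv", "_fov_positions_file.csv.gz")


def find_fov_position_files(data_dir):
    """Hypothetical helper: list files directly inside data_dir with a FOV positions suffix."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        # A wrong or non-existent directory yields no candidates at all.
        return []
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_file() and p.name.endswith(FOV_SUFFIXES)
    )
```

An empty result means either the file is missing from the export or the directory path itself does not point where expected.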

Hope this helps!

@GBeattie
Author

Thanks @quentinblampey for the speedy response!

Yes, I've followed those instructions, except that the "Morphology_ChannelID_Dictionary.txt" file is missing: it wasn't included in the AtoMx export. That will be my next issue to sort out!

Below is an ls of my data directory, followed by the error:

(sopa_env) [regmgbe@login12 Sopa]$ ls Sam_reseg_30_10_2024_15_42_51_118/
flatFiles  L1_SU500_fov_positions_file.csv.gz  L1_SU500_tx_file.csv.gz  Morphology2D  RawFiles
(sopa_env) [regmgbe@login12 Sopa]$ ls Sam_reseg_30_10_2024_15_42_51_118/Morphology2D/ | head
20231206_171601_S3_C902_P99_N99_F001.TIF
20231206_171601_S3_C902_P99_N99_F002.TIF
20231206_171601_S3_C902_P99_N99_F003.TIF
20231206_171601_S3_C902_P99_N99_F004.TIF
20231206_171601_S3_C902_P99_N99_F005.TIF
20231206_171601_S3_C902_P99_N99_F006.TIF
20231206_171601_S3_C902_P99_N99_F007.TIF
20231206_171601_S3_C902_P99_N99_F008.TIF
20231206_171601_S3_C902_P99_N99_F009.TIF
20231206_171601_S3_C902_P99_N99_F010.TIF
(sopa_env) [regmgbe@login12 Sopa]$ sopa read --technology "cosmx" /Sam_reseg_30_10_2024_15_42_51_118
╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/cli/app.py:94 in read                                                                                 │
│                                                                                                                                                                                               │
│    91 │   │   io, technology                                                                                                                                                                  │
│    92 │   ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx,                                                                                                 │
│    93 │                                                                                                                                                                                       │
│ ❱  94 │   sdata = getattr(io, technology)(data_path, **kwargs)                                                                                                                                │
│    95 │   io.write_standardized(sdata, sdata_path, delete_table=True)                                                                                                                         │
│    96                                                                                                                                                                                         │
│    97                                                                                                                                                                                         │
│                                                                                                                                                                                               │
│ ╭──────────────────────────────────────────────────────────────────── locals ────────────────────────────────────────────────────────────────────╮                                            │
│ │ config_path = None                                                                                                                             │                                            │
│ │   data_path = '/Sam_reseg_30_10_2024_15_42_51_118'                                                                                             │                                            │
│ │          io = <module 'sopa.io' from '/lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/__init__.py'> │                                            │
│ │      kwargs = {}                                                                                                                               │                                            │
│ │  sdata_path = PosixPath('/Sam_reseg_30_10_2024_15_42_51_118.zarr')                                                                             │                                            │
│ │  technology = 'cosmx'                                                                                                                          │                                            │
│ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                            │
│                                                                                                                                                                                               │
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py:55 in cosmx                                                                        │
│                                                                                                                                                                                               │
│    52 │   path = Path(path)                                                                    ╭───────────────────────────────────── locals ─────────────────────────────────────╮           │
│    53 │   image_models_kwargs, imread_kwargs = _default_image_kwargs(image_models_kwargs, imre │          dataset_id = None                                                       │           │
│    54 │                                                                                        │                 fov = None                                                       │           │
│ ❱  55 │   dataset_id = _infer_dataset_id(path, dataset_id)                                     │ image_models_kwargs = {'chunks': (1, 1024, 1024), 'scale_factors': [2, 2, 2, 2]} │           │
│    56 │   fov_locs = _read_fov_locs(path, dataset_id)                                          │       imread_kwargs = {}                                                         │           │
│    57 │   fov_id, fov = _check_fov_id(fov)                                                     │                path = PosixPath('/Sam_reseg_30_10_2024_15_42_51_118')            │           │
│    58                                                                                          │       read_proteins = False                                                      │           │
│                                                                                                ╰──────────────────────────────────────────────────────────────────────────────────╯           │
│                                                                                                                                                                                               │
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py:128 in _infer_dataset_id                                                           │
│                                                                                                                                                                                               │
│   125 │   │   │   if found:                                                                    ╭──────────────────────────── locals ────────────────────────────╮                             │
│   126 │   │   │   │   return found.group(1)                                                    │ counts_files = []                                              │                             │
│   127 │                                                                                        │   dataset_id = None                                            │                             │
│ ❱ 128 │   raise ValueError("Could not infer `dataset_id` from the name of the transcript file. │         path = PosixPath('/Sam_reseg_30_10_2024_15_42_51_118') │                             │
│   129                                                                                          │       suffix = '.csv.gz'                                       │                             │
│   130                                                                                          ╰────────────────────────────────────────────────────────────────╯                             │
│   131 def _read_fov_image(                                                                                                                                                                    │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Could not infer `dataset_id` from the name of the transcript file. Please specify it manually.

@quentinblampey
Collaborator

@GBeattie I think the issue comes from the fact that you used /Sam_reseg_30_10_2024_15_42_51_118 (an absolute path), which indicates that your data is at the filesystem root, and I expect that is not the case.

Can you try to use a relative path instead, as below:

sopa read --technology "cosmx" Sam_reseg_30_10_2024_15_42_51_118

You should still have an error though, because you're missing the Morphology_ChannelID_Dictionary.txt file (containing the image channel names).
Please let me know if you indeed get this error and, if so, whether you can figure out how to export this file from AtoMx!
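
To illustrate the difference between the two paths, here is a small sketch using only the standard library. The working directory `/home/user/Sopa` is a made-up placeholder, and `PurePosixPath` is used so that nothing on disk is touched.

```python
from pathlib import PurePosixPath

work_dir = PurePosixPath("/home/user/Sopa")  # hypothetical working directory
name = "Sam_reseg_30_10_2024_15_42_51_118"

relative = PurePosixPath(name)
absolute = PurePosixPath("/" + name)

# A relative path is interpreted from the working directory.
resolved_relative = work_dir / relative
# An absolute right-hand operand replaces the left-hand side entirely,
# so the data would have to live directly under the filesystem root.
resolved_absolute = work_dir / absolute

print(relative.is_absolute(), resolved_relative)
print(absolute.is_absolute(), resolved_absolute)
```

With the leading slash, the directory is looked up at the root, where no CosMx files exist, which would explain why no file name was available to infer the dataset_id from.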

@GBeattie
Author

GBeattie commented Nov 1, 2024

Thanks again, yes, you're right! I had switched to the absolute path because I was previously getting an error implying that sopa builds an incorrect path by repeating the first directory (i.e. "Sam_reseg_30_10_2024_15_42_51_118/" incorrectly becomes "Sam_reseg_30_10_2024_15_42_51_118/Sam_reseg_30_10_2024_15_42_51_118/"). That error is now happening again with your command:

sopa read --technology "cosmx" Sam_reseg_30_10_2024_15_42_51_118
╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────╮
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/cli/app.py:94 in read                                                                                 │
│                                                                                                                                                                                               │
│    91 │   │   io, technology                                                                                                                                                                  │
│    92 │   ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx,                                                                                                 │
│    93 │                                                                                                                                                                                       │
│ ❱  94 │   sdata = getattr(io, technology)(data_path, **kwargs)                                                                                                                                │
│    95 │   io.write_standardized(sdata, sdata_path, delete_table=True)                                                                                                                         │
│    96                                                                                                                                                                                         │
│    97                                                                                                                                                                                         │
│                                                                                                                                                                                               │
│ ╭──────────────────────────────────────────────────────────────────── locals ────────────────────────────────────────────────────────────────────╮                                            │
│ │ config_path = None                                                                                                                             │                                            │
│ │   data_path = 'Sam_reseg_30_10_2024_15_42_51_118'                                                                                              │                                            │
│ │          io = <module 'sopa.io' from '/lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/__init__.py'> │                                            │
│ │      kwargs = {}                                                                                                                               │                                            │
│ │  sdata_path = PosixPath('Sam_reseg_30_10_2024_15_42_51_118.zarr')                                                                              │                                            │
│ │  technology = 'cosmx'                                                                                                                          │                                            │
│ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                            │
│                                                                                                                                                                                               │
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py:56 in cosmx                                                                        │
│                                                                                                                                                                                               │
│    53 │   image_models_kwargs, imread_kwargs = _default_image_kwargs(image_models_kwargs, imre ╭───────────────────────────────────── locals ─────────────────────────────────────╮           │
│    54 │                                                                                        │          dataset_id = 'Sam_reseg_30_10_2024_15_42_51_118/L1_SU500'               │           │
│    55 │   dataset_id = _infer_dataset_id(path, dataset_id)                                     │                 fov = None                                                       │           │
│ ❱  56 │   fov_locs = _read_fov_locs(path, dataset_id)                                          │ image_models_kwargs = {'chunks': (1, 1024, 1024), 'scale_factors': [2, 2, 2, 2]} │           │
│    57 │   fov_id, fov = _check_fov_id(fov)                                                     │       imread_kwargs = {}                                                         │           │
│    58 │                                                                                        │                path = PosixPath('Sam_reseg_30_10_2024_15_42_51_118')             │           │
│    59 │   protein_dir_dict = {}                                                                │       read_proteins = False                                                      │           │
│                                                                                                ╰──────────────────────────────────────────────────────────────────────────────────╯           │
│                                                                                                                                                                                               │
│ /lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py:150 in _read_fov_locs                                                              │
│                                                                                                                                                                                               │
│   147 │   if not fov_file.exists():                                                                                                                                                           │
│   148 │   │   fov_file = path / f"{dataset_id}_fov_positions_file.csv.gz"                                                                                                                     │
│   149 │                                                                                                                                                                                       │
│ ❱ 150 │   assert fov_file.exists(), f"Missing field of view file: {fov_file}"                                                                                                                 │
│   151 │                                                                                                                                                                                       │
│   152 │   fov_locs = pd.read_csv(fov_file)                                                                                                                                                    │
│   153                                                                                                                                                                                         │
│                                                                                                                                                                                               │
│ ╭───────────────────────────────────────────────────────────── locals ─────────────────────────────────────────────────────────────╮                                                          │
│ │ dataset_id = 'Sam_reseg_30_10_2024_15_42_51_118/L1_SU500'                                                                        │                                                          │
│ │   fov_file = PosixPath('Sam_reseg_30_10_2024_15_42_51_118/Sam_reseg_30_10_2024_15_42_51_118/L1_SU500_fov_positions_file.csv.gz') │                                                          │
│ │       path = PosixPath('Sam_reseg_30_10_2024_15_42_51_118')                                                                      │                                                          │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                                          │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: Missing field of view file: Sam_reseg_30_10_2024_15_42_51_118/Sam_reseg_30_10_2024_15_42_51_118/L1_SU500_fov_positions_file.csv.gz

@quentinblampey
Collaborator

Hmm, it seems that the dataset_id was not inferred correctly...
It inferred dataset_id = 'Sam_reseg_30_10_2024_15_42_51_118/L1_SU500' instead of dataset_id = 'L1_SU500', which is an issue.

So, I'll fix this, but meanwhile can you try using the API and specifying the dataset_id? That is:

from sopa.io import cosmx

sdata = cosmx("Sam_reseg_30_10_2024_15_42_51_118", dataset_id="L1_SU500")

Sorry for these issues. The cosmx reader is still "new"/experimental, but it should become more stable in the future!
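
For context, here is a minimal sketch of how a directory prefix can leak into an inferred id, and one way to avoid it. This is not sopa's actual implementation: the helper `infer_dataset_id_from_name` and the `_tx_file` pattern are assumptions based only on the file listing and traceback in this thread.

```python
import re
from pathlib import PurePosixPath


def infer_dataset_id_from_name(file_path, suffix=".csv.gz"):
    """Hypothetical helper: extract the dataset id from the file name only."""
    name = PurePosixPath(file_path).name  # drop any parent directories
    found = re.match(rf"(.*)_tx_file{re.escape(suffix)}$", name)
    return found.group(1) if found else None


tx_file = "Sam_reseg_30_10_2024_15_42_51_118/L1_SU500_tx_file.csv.gz"

# Matching against the whole path string keeps the directory prefix in group 1.
on_full_path = re.match(r"(.*)_tx_file\.csv\.gz$", tx_file).group(1)
# Matching against the file name alone returns only the id.
on_name_only = infer_dataset_id_from_name(tx_file)

print(on_full_path)
print(on_name_only)
```

When the prefixed id is joined back onto the data directory, the directory name appears twice, which matches the path shown in the AssertionError above.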

@GBeattie
Author

GBeattie commented Nov 4, 2024

@quentinblampey, no problem at all. From my limited CosMx experience, it is the most difficult spatial platform because of changing file formats and poor documentation, which probably underpins my issues running Sopa.

On a first note about "Morphology_ChannelID_Dictionary.txt", it looks to me as if this file may not be included in the outputs (at least in my case). After a bit of digging I found this page: https://nanostring-biostats.github.io/CosMx-Analysis-Scratch-Space/posts/napari-cosmx-basics/using-napari-cosmx.html, which puts the file in the RunSummary directory. In my case, however, it's not there:

RunSummary]$ ls
2304H0070_Affine_Transform_20230606.csv  latest.fovs.csv
c901.fovs.csv                            Run_4ae390b0-1c01-41bc-94d7-f5a2048a29bd_20231206_171601_S3_2304H0070_ExptConfig.txt
c902.fovs.csv                            Run4ae390b0-1c01-41bc-94d7-f5a2048a29bd_20231206_171601_S3_2304H0070_SpatialBC_Metrics4D.csv
FovTracking                              Shading

As for the API, I do get an error, although in theory it is an easy fix (apart from the inevitable error due to the missing "Morphology_ChannelID_Dictionary.txt").

>>> sdata = cosmx("Sam_reseg_30_10_2024_15_42_51_118", dataset_id="L1_SU500")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py", line 56, in cosmx
    fov_locs = _read_fov_locs(path, dataset_id)
  File "/lustre/scratch/scratch/regmgbe/Projects/Sopa/sopa_env/lib/python3.9/site-packages/sopa/io/reader/cosmx.py", line 155, in _read_fov_locs
    assert np.isin(
AssertionError: The file Sam_reseg_30_10_2024_15_42_51_118/L1_SU500_fov_positions_file.csv.gz must contain the following columns: X_mm, Y_mm, FOV. Consider using a different export module.

And here are my column names for that file:

zcat L1_SU500_fov_positions_file.csv.gz | head -n 1
FOV,x_global_px,y_global_px,x_global_mm,y_global_mm

I can see in issues #63 and #65 that different AtoMx versions/output options produce different column names, so I can adjust as needed when I have some time this week. It is more worrying if AtoMx isn't generating "Morphology_ChannelID_Dictionary.txt", though. Note: my AtoMx version is 1.3.2.
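
One possible way to make that adjustment is sketched below with the standard library only. The mapping from x_global_mm/y_global_mm to X_mm/Y_mm is an assumption: the names look equivalent, but I have not confirmed that the values use the coordinate frame the reader expects. The function name `rename_fov_columns` is hypothetical, and it writes to a new file rather than overwriting the export.

```python
import csv
import gzip

# Assumed mapping from the columns in this export to the names the reader asks for.
COLUMN_MAP = {"x_global_mm": "X_mm", "y_global_mm": "Y_mm"}


def rename_fov_columns(src, dst, column_map=COLUMN_MAP):
    """Copy a gzipped CSV from src to dst, renaming header columns via column_map."""
    with gzip.open(src, "rt", newline="") as fin, gzip.open(dst, "wt", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)  # first row holds the column names
        new_header = [column_map.get(col, col) for col in header]
        writer.writerow(new_header)
        writer.writerows(reader)  # data rows are copied unchanged
    return new_header
```

Columns that are not in the mapping (FOV and the pixel coordinates) are left as they are.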

@quentinblampey
Collaborator

Indeed the documentation is quite poor, and it seems the exports are different for everyone...

Have you exported the data as shown in this comment? It seems he was also using version 1.3.

I hope that you'll be able to get the right output by changing the way the data is exported! Otherwise, I'll need to better understand how the outputs differ between AtoMx versions, which may be difficult.

@GBeattie
Author

GBeattie commented Nov 4, 2024

I just re-exported the FlatFiles only (previously I exported both Flat and Raw), but it made no difference to the column names of that file. I'll keep an eye out for any changes or other issues related to the missing "Morphology_ChannelID_Dictionary.txt" file, as I'll be unable to proceed without it in any case. Thanks, and good luck wrangling AtoMx versions, a very difficult task!

@quentinblampey
Collaborator

Okay, please let me know if you have any updates on your side!
