AnnData Conversion Notebook #1079

Merged
merged 25 commits into from
Dec 19, 2023

Conversation

srivarra
Contributor

@srivarra srivarra commented Oct 24, 2023

If you haven't already, please read through our contributing guidelines before opening your PR

What is the purpose of this PR?

Main Features

Adds a notebook for converting the cell table to per-FOV AnnData objects. By default, they get saved to data/<my-cohort>/anndata in the notebook. In addition, it briefly covers viewing components of the AnnData object, and creating data pipelines with a small application for constructing spatial neighbors with squidpy.

  • Added the conversion class, which creates an AnnData object with:
    • Renamed spatial coordinates (centroid-0 $\to$ centroid_y, centroid-1 $\to$ centroid_x), stored in obsm["spatial"].
    • All markers get stored in var_names / X.
    • All other columns get stored in obs.
    • Dropped settings.CELL_SIZE in favor of the generic "area" as a column name in obs, since we can have tables for many kinds of segmentations / observations (cell, fiber, ez_seg, etc.).
  • Added a function to load all AnnData Zarr stores in a directory to an AnnCollection.
  • Added a custom datapipe to iterate over the AnnCollection FOV by FOV.
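
A minimal sketch of the column layout described above, not the conversion class from this PR. The marker names, the helper names `split_fov_table` and `to_anndata`, and the (y, x) column order in `obsm["spatial"]` are assumptions for illustration. The split itself only needs pandas; `to_anndata` additionally requires the `anndata` package and is not called here.

```python
import pandas as pd

# Hypothetical marker names; a real cell table has its own channels.
MARKERS = ["CD4", "CD8"]
COORD_RENAME = {"centroid-0": "centroid_y", "centroid-1": "centroid_x"}


def split_fov_table(fov_table: pd.DataFrame, markers: list[str]) -> dict:
    """Split one FOV's rows into the pieces an AnnData object is built from."""
    table = fov_table.rename(columns=COORD_RENAME).reset_index(drop=True)
    table.index = table.index.astype(str)  # AnnData expects string obs names
    return {
        "X": table[markers].to_numpy(dtype=float),  # markers -> X
        "var": pd.DataFrame(index=pd.Index(markers)),  # markers -> var_names
        "obs": table.drop(columns=markers),  # all other columns -> obs
        # Assumed (y, x) order for the spatial coordinates.
        "spatial": table[["centroid_y", "centroid_x"]].to_numpy(dtype=float),
    }


def to_anndata(parts: dict):
    """Assemble the pieces; needs the third-party anndata package."""
    import anndata

    return anndata.AnnData(
        X=parts["X"],
        obs=parts["obs"],
        var=parts["var"],
        obsm={"spatial": parts["spatial"]},
    )


cell_table = pd.DataFrame(
    {
        "CD4": [0.1, 0.7, 0.3],
        "CD8": [0.9, 0.2, 0.4],
        "label": [1, 2, 1],
        "area": [50.0, 64.0, 41.0],
        "centroid-0": [10.0, 20.0, 5.0],
        "centroid-1": [12.0, 8.0, 30.0],
        "fov": ["fov0", "fov0", "fov1"],
    }
)

# One set of pieces per FOV, mirroring the per-FOV AnnData objects.
parts_by_fov = {
    fov: split_fov_table(rows, MARKERS) for fov, rows in cell_table.groupby("fov")
}
```

With `anndata` installed, `to_anndata(parts_by_fov["fov0"])` would give an object with two observations and two variables.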

Documentation

  • Added a brief AnnData section for data_types.md with figures.
  • Added an AnnData section for development.md.

Testing

  • Adjusted test_utils.make_cell_table to have the expected ordering for Cell Tables. Added n_cells and n_markers parameters to create cell tables of different sizes.
  • Added a utility function to generate AnnData tables with various values of n_obs, and floating point and categorical column types. (test_utils.generate_anndata_table and test_utils.generate_anncollection)
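
For illustration, a size-parameterized synthetic cell table could be generated roughly like this. This is a sketch and not the actual `test_utils.make_cell_table`; the function name `make_synthetic_cell_table`, the column names, and the "markers first, metadata after" ordering are all assumptions.

```python
import numpy as np
import pandas as pd


def make_synthetic_cell_table(n_cells: int, n_markers: int, seed: int = 0) -> pd.DataFrame:
    """Build a random cell table with n_cells rows and n_markers marker columns."""
    rng = np.random.default_rng(seed)
    marker_names = [f"marker_{i}" for i in range(n_markers)]

    # Assumed ordering: marker columns first, then metadata columns.
    table = pd.DataFrame(rng.random((n_cells, n_markers)), columns=marker_names)
    table["label"] = np.arange(1, n_cells + 1)
    table["area"] = rng.integers(20, 200, size=n_cells)
    table["centroid-0"] = rng.random(n_cells) * 512
    table["centroid-1"] = rng.random(n_cells) * 512
    # One categorical column, to exercise non-numeric obs columns in tests.
    table["cell_type"] = pd.Categorical(rng.choice(["A", "B"], size=n_cells))
    table["fov"] = "fov0"
    return table


small_table = make_synthetic_cell_table(n_cells=5, n_markers=3)
```

Seeding the generator keeps the tables reproducible across test runs.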

How did you implement your changes?

  • Used Dask to convert large cell tables.
  • Used TorchData to create the AnnData IterDataPipe. This allows us to implement more intricate data pipelines with shuffling, mapping, and filtering functionality, and to construct data loaders with DataLoader2.
  • Adjusted tests which utilized test_utils.make_cell_table.
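
To make the shuffle / filter / map idea concrete, here is a plain-Python stand-in for a FOV-by-FOV pipeline. It deliberately does not use TorchData, and it is not the datapipe added in this PR; `iter_fovs` and its parameters are hypothetical names, and the per-FOV items are simple lists instead of AnnData objects.

```python
import random
from typing import Any, Callable, Iterator, Optional, Tuple


def iter_fovs(
    fov_items: dict[str, Any],
    shuffle: bool = False,
    seed: int = 0,
    keep: Optional[Callable[[str, Any], bool]] = None,
    transform: Optional[Callable[[Any], Any]] = None,
) -> Iterator[Tuple[str, Any]]:
    """Yield (fov_name, item) pairs one FOV at a time."""
    names = sorted(fov_items)  # deterministic order unless shuffled
    if shuffle:
        random.Random(seed).shuffle(names)
    for name in names:
        item = fov_items[name]
        # Filtering step: skip FOVs the predicate rejects.
        if keep is not None and not keep(name, item):
            continue
        # Mapping step: apply the transform lazily, per FOV.
        yield name, (transform(item) if transform is not None else item)


# Stand-in data: each FOV maps to a list of cell labels.
fov_items = {"fov1": [3, 4], "fov0": [1, 2], "fov2": []}

result = list(
    iter_fovs(fov_items, keep=lambda name, cells: len(cells) > 0, transform=len)
)
```

Because the generator is lazy, only one FOV's item is transformed at a time, which is the property that matters when each item is a full per-FOV table.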

Remaining Issues

  • Need a plan for structuring how we will write modules and functions for interacting with AnnData objects. We can try some stuff, and see what works best.

@srivarra srivarra linked an issue Oct 24, 2023 that may be closed by this pull request
@srivarra srivarra changed the title AnnData FOV 0 Conversion AnnData Conversion Notebook Nov 7, 2023
@srivarra srivarra self-assigned this Nov 7, 2023
@srivarra srivarra added the enhancement New feature or request label Nov 7, 2023
@srivarra srivarra linked an issue Nov 10, 2023 that may be closed by this pull request
@srivarra srivarra marked this pull request as ready for review November 30, 2023 21:17
Contributor

@alex-l-kong alex-l-kong left a comment

Beautiful stuff overall. I think once the lab gets familiar with AnnData/dask/zarr they'll find it very easy to use. Mostly structural comments.

src/ark/utils/data_utils.py Show resolved Hide resolved
src/ark/utils/data_utils.py Outdated Show resolved Hide resolved
src/ark/utils/data_utils.py Outdated Show resolved Hide resolved
src/ark/utils/data_utils.py Outdated Show resolved Hide resolved
src/ark/utils/data_utils.py Show resolved Hide resolved
Contributor

@camisowers camisowers left a comment

Notebook looks much cleaner!

templates/anndata_conversion.ipynb Outdated Show resolved Hide resolved
Contributor

@alex-l-kong alex-l-kong left a comment

Looks good, sorry for the late review

@srivarra
Contributor Author

@ngreenwald

@@ -0,0 +1,207 @@
{
Member

@ngreenwald ngreenwald Dec 13, 2023

Does the embedded image show up correctly in the actual notebook? Not showing up here, could just be an issue with reviewnb



Contributor Author

yeah, I think it's cause the image is being displayed with html, instead of markdown's native method.

Here's what it looks like when I open the notebook in Jupyter Lab.
image

Member

@ngreenwald ngreenwald left a comment

Looks awesome, just a couple small comments

docs/_rtd/data_types.md Show resolved Hide resolved
docs/_rtd/data_types.md Outdated Show resolved Hide resolved
src/ark/utils/data_utils.py Outdated Show resolved Hide resolved
srivarra and others added 3 commits December 13, 2023 11:42
@srivarra srivarra requested a review from ngreenwald December 18, 2023 23:57
Member

@ngreenwald ngreenwald left a comment

Looks good

@srivarra srivarra merged commit f3391e5 into main Dec 19, 2023
16 checks passed
@srivarra srivarra deleted the anndata-conversion-fov0 branch December 19, 2023 12:54
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

AnnData Conversion Design Document Part 2 AnnData Conversion Design Doc
4 participants