Parse CellProfiler features #298

Open
Wants to merge 14 commits into base: main

Conversation

shntnu
Member

@shntnu shntnu commented Jun 19, 2023

Description

This PR introduces a function to parse a CellProfiler feature string into its semantic components -- we are missing a standardized method for doing this.

I've focused on features generated in the JUMP Cell Painting datasets.
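As a rough illustration of what "semantic components" can mean here, the sketch below splits an underscore-delimited feature name into its compartment, its feature group, and an unparsed remainder. The names `FeatureParts` and `split_feature_name` are hypothetical and are not the function added in this PR; the example feature names are assumed to follow the usual `Compartment_FeatureGroup_...` layout.

```python
from typing import NamedTuple


class FeatureParts(NamedTuple):
    """Hypothetical container for the leading parts of a feature name."""

    compartment: str    # e.g. "Cells", "Nuclei", "Cytoplasm"
    feature_group: str  # e.g. "AreaShape", "Intensity", "Texture"
    remainder: str      # everything after the group, left unparsed here


def split_feature_name(feature_name: str) -> FeatureParts:
    """Split a CellProfiler-style feature name at its first two underscores."""
    parts = feature_name.split("_", 2)
    if len(parts) < 3:
        raise ValueError(f"Unexpected feature name: {feature_name!r}")
    return FeatureParts(*parts)


print(split_feature_name("Nuclei_Intensity_MeanIntensity_DNA"))
```

The remainder is deliberately left as a single string, since how it breaks down (feature type, channel, parameters) depends on the feature group.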

What is the nature of your change?

  • Enhancement (adds functionality).

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@codecov-commenter

codecov-commenter commented Jun 19, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (a671c3b) 95.57% compared to head (90c9a89) 95.48%.

Files                                              Patch %   Lines
pycytominer/cyto_utils/parse_cp_features_cmd.py    0.00%     4 Missing ⚠️
pycytominer/cyto_utils/parse_cp_features.py        96.07%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #298      +/-   ##
==========================================
- Coverage   95.57%   95.48%   -0.09%     
==========================================
  Files          56       59       +3     
  Lines        3138     3212      +74     
==========================================
+ Hits         2999     3067      +68     
- Misses        139      145       +6     
Flag        Coverage Δ
unittests   95.48% <91.89%> (-0.09%) ⬇️


@shntnu shntnu changed the title from "[WIP] Parse CellProfiler features" to "Parse CellProfiler features" on Jul 10, 2023
@ErinWeisbart
Member

My only quibble is that I would like the Mito channel to be returned as Mito and not MITO. (It is our standard styling, unlike all other channels that are usually ALLCAPS).

One thing to consider - the features measured in JUMP (and our subsequent default CP pipeline) have both Mito and mito features which this script handles and parses as MITO (though if I get my way that will change to Mito :)
I don't know the intended use of this script, but there is actually a difference between Mito and mito features. Mito features are captured from the Mito channel images, just like any other channel's features. mito features are on heavily processed images that did initially derive from Mito images but are heavily manipulated to generate additional features that were targeted for a specific project/screen. We do not manipulate/process any other channels in a similar manner. So, that leads me to the question of do we want to distinguish between Mito and mito because mito features are derived differently than all other features, or do we not care?

@ErinWeisbart
Member

One further thought, I think your code breaks if there are _ in the channel name, but this is something we do consistently with Brightfield features e.g. Brightfield_high and Brightfield_low and occasionally with other channels to represent high/low focal planes (for Brightfield) or exposures (for fluorescence channels).
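To make the underscore concern concrete, here is a hypothetical naive parser that assumes the channel is the last underscore-separated token. It is an illustration only and does not reproduce the code in this PR; the feature names are assumed examples.

```python
def naive_channel(feature_name: str) -> str:
    """Hypothetical: treat the last underscore-separated token as the channel."""
    return feature_name.split("_")[-1]


# Works when the channel name contains no underscore.
print(naive_channel("Cells_Intensity_MeanIntensity_DNA"))  # DNA

# Misparses a channel name that itself contains an underscore.
print(naive_channel("Cells_Intensity_MeanIntensity_Brightfield_high"))  # high
```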

@shntnu
Member Author

shntnu commented Jul 10, 2023

My only quibble is that I would like the Mito channel to be returned as Mito and not MITO. (It is our standard styling, unlike all other channels that are usually ALLCAPS).

Now fixed in 26591d7 (but later superseded by 36b5d50)

One thing to consider - the features measured in JUMP (and our subsequent default CP pipeline) have both Mito and mito features which this script handles and parses as MITO (though if I get my way that will change to Mito :) I don't know the intended use of this script, but there is actually a difference between Mito and mito features. Mito features are captured from the Mito channel images, just like any other channel's features. mito features are on heavily processed images that did initially derive from Mito images but are heavily manipulated to generate additional features that were targeted for a specific project/screen. We do not manipulate/process any other channels in a similar manner. So, that leads me to the question of do we want to distinguish between Mito and mito because mito features are derived differently than all other features, or do we not care?

Darn – that's definitely annoying, and we would want to capture this ideally (and we'd want to call that channel mito_tubeness, not mito IIUC)

I wanted to keep this as general as possible, so maybe the right way to address this issue (and the issue of allowing underscores) is to specify the list of channels as a parameter.

See 36b5d50
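A minimal sketch of the channel-list idea, assuming the channel appears as a suffix of the feature name. The function `extract_channel`, the channel list, and the `mito_tubeness` feature name are hypothetical and are not taken from 36b5d50. Matching is case-sensitive, so `Mito` and `mito_tubeness` stay distinct, and channel names containing underscores are matched as a whole.

```python
from typing import Optional, Sequence, Tuple


def extract_channel(
    feature_name: str, channels: Sequence[str]
) -> Tuple[str, Optional[str]]:
    """Return (name_without_channel, channel) using a caller-supplied list.

    Longer channel names are tried first, so a short name that happens to
    be the tail of a longer one cannot shadow it.
    """
    for channel in sorted(channels, key=len, reverse=True):
        suffix = "_" + channel
        if feature_name.endswith(suffix):
            return feature_name[: -len(suffix)], channel
    return feature_name, None


# Hypothetical channel list supplied by the caller.
CHANNELS = ["DNA", "Mito", "mito_tubeness", "Brightfield_high", "Brightfield_low"]

print(extract_channel("Cells_Intensity_MeanIntensity_Brightfield_high", CHANNELS))
print(extract_channel("Cells_Intensity_MeanIntensity_mito_tubeness", CHANNELS))
print(extract_channel("Cells_AreaShape_Area", CHANNELS))
```

Suffix matching is a simplification: for feature groups where parameters follow the channel (for example `Texture_Contrast_RNA_3_00_256`), the channel would have to be located inside the name rather than at its end.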

@ErinWeisbart
Member

My concerns are all addressed now. Thanks Shantanu!

@shntnu shntnu marked this pull request as ready for review July 11, 2023 16:03
@d33bs
Member

d33bs commented Aug 18, 2023

Hello, this is a courtesy notice that Pycytominer's master branch has been renamed to main as part of changes merged in #303. While changes here should be seamless, GitHub is unable to change your local git environment and offers the following guidance when it comes to this. Thank you for your understanding and please don't hesitate to reach out with any questions or concerns.

image

@shntnu
Member Author

shntnu commented Dec 11, 2023

@d33bs @gwaybio – this was already reviewed by @ErinWeisbart and is ready to be merged. Would you like to review next or should we proceed with merging?

@shntnu shntnu requested review from d33bs and gwaybio December 11, 2023 14:23
@shntnu shntnu self-assigned this Dec 11, 2023
@gwaybio
Member

gwaybio commented Dec 11, 2023

Yes, we'll need to review. @kenibrewer maybe you can take a look?

I'll make two guiding notes here:

  • This touches on two items from our roadmap (@shntnu, I still need to share the document with you; we will make it public too, we just haven't had the chance): 1) Institution-specific code. We will work to refactor this out of pycytominer in subsequent versions, aiming for greater generalizability. 2) CLI. We will implement Fire in subsequent versions.
  • cyto_utils has less strict but still rigorous merge guidelines, and this PR, at first glance, might take some time to merge.

Thanks!

@shntnu
Member Author

shntnu commented Dec 11, 2023

@gwaybio – I'm in no hurry to merge this, so certainly let's follow the roadmap all through (exciting!)

Member

@kenibrewer kenibrewer left a comment

@shntnu Thanks for this contribution. As @gwaybio mentioned, part of our technical roadmap for pycytominer is moving away from hard-coded uses of CellProfiler feature names towards more generalizable approaches that work with different software. One way we'd like to accomplish this is through greater reliance on abstractions and object-oriented programming, with cell_profiler features being simply one of many supported solutions.

Ideally, I think we would implement something similar to what is described in #360 first and then make this PR a method of that class. Is that an approach that you think you would be interested in implementing? Alternatively, this script could also be refactored to accept a regex and a set of feature groups to extract.
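The class described in #360 is not shown in this thread, so the following is only a hypothetical shape for such an abstraction: an abstract parser interface with CellProfiler as one implementation. All class and function names here are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class FeatureNameParser(ABC):
    """Hypothetical interface: one subclass per feature-naming convention."""

    @abstractmethod
    def parse(self, feature_name: str) -> Dict[str, str]:
        """Return the semantic components of a single feature name."""


class CellProfilerFeatureParser(FeatureNameParser):
    """Hypothetical parser for underscore-delimited CellProfiler names."""

    def parse(self, feature_name: str) -> Dict[str, str]:
        # Raises ValueError if the name has fewer than three parts.
        compartment, feature_group, rest = feature_name.split("_", 2)
        return {
            "compartment": compartment,
            "feature_group": feature_group,
            "rest": rest,
        }


def parse_all(parser: FeatureNameParser, names: Iterable[str]) -> List[Dict[str, str]]:
    """Downstream code depends only on the interface, not on CellProfiler."""
    return [parser.parse(name) for name in names]


print(parse_all(CellProfilerFeatureParser(), ["Cells_AreaShape_Area"]))
```

Under this design, support for another tool would mean adding another subclass rather than changing the callers.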

@shntnu
Member Author

shntnu commented Dec 12, 2023

@shntnu Thanks for this contribution. As @gwaybio mentioned, part of our technical roadmap for pycytominer is moving away from hard-coded uses of CellProfiler feature names towards more generalizable approaches that work with different software. A way we'd like to accomplish this is by a greater reliance on abstractions and object-oriented programming. Having cell_profiler features be simply one

I like this plan!

Ideally, I think we would implement something similar to what is described in #360 first and then make this PR a method of that class. Is that an approach that you think you would be interested in implementing?

I am happy for us to take this approach but I would not have the capacity to think through what's needed here.

IIUC the plan is to introduce a class like SummarizedExperiment or SingleCellExperiment but probably not as general as AnnData? That's quite a foundational effort if so.

I'm happy to have this PR sit around for a bit until this upstream issue is sorted out.

Alternately, this script could also be refactored to accept a regex and set of feature groups to extract.

Hm – I don't think this would be practical; it would need a really complex regex to replace all those conditional statements.
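For a sense of where the complexity comes from, here is a minimal regex sketch using named groups. The pattern and the function `parse_with_regex` are hypothetical. A single pattern captures the compartment and feature group easily, but the remainder is structured differently per feature group, which is the part that would need either many alternations or per-group conditionals.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: compartment, feature group, then everything else.
FEATURE_PATTERN = re.compile(
    r"^(?P<compartment>[A-Za-z]+)_(?P<feature_group>[A-Za-z]+)_(?P<rest>.+)$"
)


def parse_with_regex(
    feature_name: str, pattern: "re.Pattern[str]" = FEATURE_PATTERN
) -> Optional[Dict[str, str]]:
    """Return the named groups of the match, or None if the name does not fit."""
    match = pattern.match(feature_name)
    return match.groupdict() if match else None


# The remainder differs by group: one channel plus parameters here ...
print(parse_with_regex("Cytoplasm_Texture_Contrast_RNA_3_00_256"))
# ... and two channels here.
print(parse_with_regex("Cells_Correlation_Correlation_DNA_ER"))
```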

6 participants