-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse CellProfiler features #298
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #298 +/- ##
==========================================
- Coverage 95.57% 95.48% -0.09%
==========================================
Files 56 59 +3
Lines 3138 3212 +74
==========================================
+ Hits 2999 3067 +68
- Misses 139 145 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
My only quibble is that I would like the One thing to consider - the features measured in JUMP (and our subsequent default CP pipeline) have both |
One further thought, I think your code breaks if there are |
Now fixed in 26591d7 (but later superseded by 36b5d50)
Darn – that's definitely annoying, and we would want to capture this ideally (and we'd want to call that channel I wanted to keep this as general as possible, so maybe the right way to address this issue (and the issue of allowing underscores) is to specify the list of channels as a parameter. See 36b5d50 |
My concerns are all addressed now. Thanks Shantanu! |
Hello, this is a courtesy notice that Pcytominer's |
@d33bs @gwaybio – this was already reviewed by @ErinWeisbart and is ready to be merged. Would you like to review next or should we proceed with merging? |
Yes, we'll need to review. @kenibrewer maybe you can take a look? I'll make two guiding notes here:
Thanks! |
@gwaybio – I'm in no hurry to merge this, so certainly let's follow the roadmap all through (exciting!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shntnu Thanks for this contribution. As @gwaybio mentioned, part of our technical roadmap for pycytominer is moving away from hard-coded uses of CellProfiler feature names towards more generalizable approaches that work different software. A way we'd like to accomplish this is by a greater reliance on abstractions and object-oriented programming. Having cell_profiler features be simply one of many solutions supported.
Ideally, I think we would implement something similar to what is described in #360 first and then make this PR a method of that class. Is that an approach that you think would be interested in implementing? Alternately, this script could also be refactored to accept a regex and set of feature groups to extract.
I like this plan!
I am happy for us to take this approach but I would not have the capacity to think through what's needed here. IIUC the plan is to introduce a class like SummarizedExperiment or SingleCellExperiment but probably not as general as AnnData? That's quite a foundational effort if so. I'm happy to have this PR sit around for a bit until this upstream issue is sorted out.
Hm – I don't think this would be practical; it would need a really complex regex to replace all those conditional statements. |
Description
This PR introduces a function to parse a CellProfiler feature string into its semantic components -- we are missing a standardized method for doing this.
I've focused on features generated in the JUMP Cell Painting datasets.
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that a pull request is ready for review.