Include noise removal for feature selection #21
Comments
@shntnu Since there is a new version of jump-profiling-recipe, is it still worth running the Scenario_1 pipeline?
The analysis was conducted using the code provided in the repository. I tested the recipe by applying noise removal to Scenario 1 and Scenario 4.

Scenario 1: Includes "source 4" on "target2".

Observations

Scenario 1: Adding the noise removal function did not show significant differences in the results.

Metrics

For both scenarios, the mAP negcon and mAP nonrep metrics were collected.

Results: Number of Compounds with Significant Activity

Scenario 1: Significant activity using a noise removal stdev of 0.5

- True mAP negcon: 289
- True mAP negcon with noise removal: 293
- True mAP nonrep: 293
- True mAP nonrep with noise removal: 293

Scenario 4: Significant activity using a noise removal stdev of 0.8

- True mAP negcon: 186
- True mAP negcon with noise removal: 161
- True mAP nonrep: 270
- True mAP nonrep with noise removal: 190

Attached are the plots comparing the mAP negcon and mAP nonrep metrics with and without noise removal.
Thanks Paula for putting these results together! Could you please include the number of features dropped by the "noise removal" step in both scenarios? Also, how many features do the two scenarios have in common in the noise removal pipeline after the "featselect" step? Maybe after removing the noisy features, feature selection yields the same feature set in both, which would be a nice result.
@PaulaLlanos - please do ^^^ @johnarevalo - could you help drive this towards a conclusion about whether we should or should not use noise removal? If inconclusive, please help note what additional experiments will be needed in the future.
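The two numbers requested above (features dropped by a step, and features shared between scenarios after "featselect") come down to set comparisons on column names. A minimal sketch, assuming the feature names before and after each step are available as plain lists; the helper names and the example feature names are hypothetical and do not come from the recipe:

```python
def dropped_features(before, after):
    """Return features present before a step but absent after it, in original order."""
    kept = set(after)
    return [name for name in before if name not in kept]


def shared_features(features_a, features_b):
    """Return the sorted features that two selected feature sets have in common."""
    return sorted(set(features_a) & set(features_b))


def jaccard(features_a, features_b):
    """Overlap of two feature sets as |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical feature names, for illustration only.
all_features = ["Cells_Area", "Cells_Intensity_DNA", "Nuclei_Texture_RNA", "Cyto_Granularity_ER"]
scenario_1_selected = ["Cells_Area", "Cells_Intensity_DNA", "Nuclei_Texture_RNA"]
scenario_4_selected = ["Cells_Intensity_DNA", "Nuclei_Texture_RNA", "Cyto_Granularity_ER"]

print(len(dropped_features(all_features, scenario_1_selected)), "features dropped in Scenario 1")
print(shared_features(scenario_1_selected, scenario_4_selected))
print(jaccard(scenario_1_selected, scenario_4_selected))
```

Reporting the Jaccard index alongside the raw count makes the overlap comparable even when the two scenarios keep different numbers of features.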
Overview from mini grant:
Test noise_removal for feature selection because we have no equivalent step in the profiling recipe. Without a feature selection step that filters based on feature relevance, we may end up with lots of noisy features.

Scenario 1 in https://github.com/carpenter-singh-lab/2023_Arevalo_BatchCorrection would be ideal to test this out on. Use mAP (and possibly other metrics) to report whether the baseline performance improves if we add this step to the preprocessing workflow. See carpenter-singh-lab/2023_Arevalo_NatComm_BatchCorrection#4 for more details about preprocessing.

Also note this paper that tests feature selection methods: Siegismund, D., Fassler, M., Heyse, S. & Steigele, S. Benchmarking feature selection methods for compressing image information in high-content screening. SLAS Technol 27, 85–93 (2022). It was summarized in a review as "AutoML (automated machine learning), enable the most informative features from Cell Painting datasets to be identified faster and more accurately".
Steps discussed by email:
Add noise_removal to the feature selection steps in the jump profiling recipe to see if it improves results (compared to without)
You will need to figure out how to include it in this step: https://github.com/broadinstitute/jump-profiling-recipe/blob/main/preprocessing/feature_selection.py
This is the function: https://github.com/cytomining/pycytominer/blob/08f3a043fd22e8f86de5488fdfc2d21814a491fb/pycytominer/operations/noise_removal.py#L8
Test:
https://github.com/cytomining/pycytominer/blob/08f3a043fd22e8f86de5488fdfc2d21814a491fb/tests/test_feature_select.py#L76
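To make the idea behind the step concrete, here is a minimal standard-library sketch, assuming noise removal works as its parameters suggest: compute each feature's standard deviation within every perturbation group, average those values across groups, and flag features whose average exceeds a cutoff. This is not the pycytominer implementation; `noisy_features`, the row layout, and the column names are hypothetical, and the choice of population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def noisy_features(rows, group_key, feature_names, stdev_cutoff=0.8):
    """Flag features whose mean within-group standard deviation exceeds the cutoff.

    rows: list of dicts, one per profile (hypothetical layout).
    group_key: metadata column identifying the perturbation group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    to_remove = []
    for feature in feature_names:
        # Population stdev of this feature inside each perturbation group (assumed ddof=0).
        per_group = [pstdev([r[feature] for r in members]) for members in groups.values()]
        if mean(per_group) > stdev_cutoff:
            to_remove.append(feature)
    return to_remove


# Hypothetical replicate profiles: two compounds, two replicates each.
profiles = [
    {"Metadata_compound": "cpd_A", "Feature_stable": 1.0, "Feature_noisy": -2.0},
    {"Metadata_compound": "cpd_A", "Feature_stable": 1.1, "Feature_noisy": 2.0},
    {"Metadata_compound": "cpd_B", "Feature_stable": 3.0, "Feature_noisy": 0.5},
    {"Metadata_compound": "cpd_B", "Feature_stable": 3.1, "Feature_noisy": 4.5},
]

print(noisy_features(profiles, "Metadata_compound", ["Feature_stable", "Feature_noisy"]))
```

The cutoff is the parameter varied in the results reported in this thread (0.5 for Scenario 1, 0.8 for Scenario 4); a lower cutoff flags more features.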