Image-based profiling is a series of data processing steps that turn image-based readouts into more manageable data matrices for downstream analyses (Caicedo et al. 2017). Typically, the image-based readouts are derived from CellProfiler (McQuin et al. 2018) and represent single-cell morphology measurements. In this folder, we process the CellProfiler-derived morphology features using pycytominer, a tool that enables reproducible image-based profiling. Specifically, we include:
- Data processing scripts to perform the full unified, image-based profiling pipeline
- Processed data for each Cell Painting plate (for several "data levels")
- Instructions on how to reproduce the profiling pipeline
Note that we do not include the intermediate step of generating per-plate .sqlite files with cytominer-database.
This repository and workflow begin after cytominer-database has been applied.
Data Level | Description | File Format | Included in this Repo |
---|---|---|---|
Level 1 | Cell Images | .tif | No^ |
Level 2 | Single Cell Profiles | .sqlite | No^ |
Level 3 | Aggregated Profiles with Metadata | .csv.gz | Yes |
Level 4a | Normalized Profiles with Metadata | .csv.gz | Yes |
Level 4b | Normalized and Feature Selected Profiles with Metadata | .csv.gz | Yes |
Level 5 | Consensus Perturbation Profiles | .csv.gz | Yes |
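
To make the data levels more concrete, the following is a minimal, standard-library-only sketch of how single cell measurements might be collapsed into per-well profiles (Level 2 to Level 3) and how replicate wells might then be collapsed into one profile per perturbation (Level 5). It is not the pycytominer implementation: the median as the summary statistic, the helper `median_profiles`, and the column names `Metadata_Well`, `Metadata_pert`, and `Cells_AreaShape_Area` are all assumptions made for illustration, and the sketch skips the normalization and feature selection steps (Levels 4a and 4b) that sit between aggregation and consensus in the real pipeline.

```python
from collections import defaultdict
from statistics import median


def median_profiles(rows, group_keys, feature_keys):
    """Collapse rows that share the same group_keys values into one median profile.

    Hypothetical helper for illustration only; not part of pycytominer.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in group_keys)].append(row)

    profiles = []
    for group, members in sorted(groups.items()):
        profile = dict(zip(group_keys, group))
        for feature in feature_keys:
            profile[feature] = median(member[feature] for member in members)
        profiles.append(profile)
    return profiles


# Made-up single cell measurements (Level 2): one row per cell
single_cells = [
    {"Metadata_Well": "A01", "Metadata_pert": "DMSO", "Cells_AreaShape_Area": 100.0},
    {"Metadata_Well": "A01", "Metadata_pert": "DMSO", "Cells_AreaShape_Area": 120.0},
    {"Metadata_Well": "A01", "Metadata_pert": "DMSO", "Cells_AreaShape_Area": 110.0},
    {"Metadata_Well": "A02", "Metadata_pert": "compound_x", "Cells_AreaShape_Area": 200.0},
    {"Metadata_Well": "A02", "Metadata_pert": "compound_x", "Cells_AreaShape_Area": 220.0},
    {"Metadata_Well": "A03", "Metadata_pert": "compound_x", "Cells_AreaShape_Area": 180.0},
]
features = ["Cells_AreaShape_Area"]

# Level 3 analogue: one aggregated profile per well
well_profiles = median_profiles(single_cells, ["Metadata_Well", "Metadata_pert"], features)

# Level 5 analogue: one consensus profile per perturbation (normalization skipped here)
consensus_profiles = median_profiles(well_profiles, ["Metadata_pert"], features)

for profile in consensus_profiles:
    print(profile)
```

The same grouping helper serves both steps because, under these assumptions, aggregation and consensus differ only in what is being grouped: cells within a well, or wells within a perturbation.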
Importantly, we include files for two different types of normalization: whole-plate normalization and DMSO-specific normalization.
See profile.py for more details.
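
The difference between the two normalization types can be sketched as follows. This is an illustration under stated assumptions rather than the code in profile.py: it uses plain z-scoring, which may not be the method the pipeline applies, and the helper `standardize`, the column names, and the well values are all hypothetical. The only thing that changes between the two schemes is which wells supply the reference statistics.

```python
from statistics import mean, pstdev


def standardize(profiles, feature, reference):
    """Z-score `feature` in every profile using the mean and standard
    deviation computed from the `reference` profiles only.

    Hypothetical helper for illustration only; not part of pycytominer.
    """
    reference_values = [profile[feature] for profile in reference]
    center = mean(reference_values)
    scale = pstdev(reference_values)
    return [
        {**profile, feature: (profile[feature] - center) / scale}
        for profile in profiles
    ]


# Made-up well-level profiles from a single plate
plate = [
    {"Metadata_Well": "A01", "Metadata_pert": "DMSO", "Cells_AreaShape_Area": 3.0},
    {"Metadata_Well": "A02", "Metadata_pert": "DMSO", "Cells_AreaShape_Area": 5.0},
    {"Metadata_Well": "A03", "Metadata_pert": "compound_x", "Cells_AreaShape_Area": 5.0},
    {"Metadata_Well": "A04", "Metadata_pert": "compound_x", "Cells_AreaShape_Area": 11.0},
]
feature = "Cells_AreaShape_Area"

# Whole-plate normalization: every well on the plate contributes to the reference
whole_plate = standardize(plate, feature, reference=plate)

# DMSO-specific normalization: only the DMSO control wells form the reference
dmso_wells = [profile for profile in plate if profile["Metadata_pert"] == "DMSO"]
dmso_specific = standardize(plate, feature, reference=dmso_wells)

for by_plate, by_dmso in zip(whole_plate, dmso_specific):
    print(by_plate["Metadata_Well"], by_plate[feature], by_dmso[feature])
```

In this toy example the treated wells sit further from zero under DMSO-specific normalization, because the controls alone vary less than the plate as a whole.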
^ Note that these files are being prepared
TBD
The pipeline can be reproduced by simply executing the following:
```bash
# Make sure conda environment is activated
conda activate lincs
# Reproduce the entire image-based profiling pipeline for CellProfiler derived features
python profiling_pipeline.py
```
Several details are critical for understanding data generation and processing.
See profile.py for more details about the specific processing steps and decisions.