This repository contains the Python pipeline for analyzing CODEX output images: scripts for image registration, nucleus segmentation, and extracting each cell's average brightness into a .csv. It contains the following files, intended to be run on data generated by CODEX:
- `requirements_registrator.txt`: list of requirements to run shift registration. Activate your conda environment and run `pip install -r requirements_registrator.txt`.
- `registrator.py`: performs a focus stack using local (provide a path) or remote (using GCSFS) images.
- `requirements_segment.txt`: requirements for operation; install using `pip install -r requirements_segment.txt`.
- `pbmc_cellpose_model.pth`: pretrained Cellpose model for PBMCs.
- `segmenter.py`: segments nuclei with Cellpose using a pretrained model.
- `requirements_analyze.txt`: requirements for operation; install using `pip install -r requirements_analyze.txt`.
- `analyzer.py`: measures the size and average brightness of each cell in each channel and saves the results as a CSV. Shift registration and segmentation must be run on the data before this step.
- `cell_crop.py`: once the cells are identified, loads the CSV, finds a certain number of each cell type, and saves them as images.
First, install Miniconda for Python 3 following this guide: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html. Miniconda will help manage our packages and Python environment.

Next, install CUDA 11.3 and cuDNN 8.5.0 from NVIDIA. This lets us use the graphics card for accelerated ML and image processing. We need version 11.3 for compatibility with Cellpose and M2Unet.

CUDA:

- `sudo apt-get update`
- `sudo apt-get upgrade`
- `sudo apt-get install cuda=11.3.1-1`
- `sudo apt-get install nvidia-gds=11.4.1-1`
- `export PATH="/usr/local/cuda-11.3/bin:$PATH"`
- `export LD_LIBRARY_PATH="/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH"`
- `sudo reboot`

Verify that the PATH exported properly; if it didn't, modify `~/.bashrc` to add CUDA to `PATH` and `LD_LIBRARY_PATH`.

cuDNN: Follow the directions for the Ubuntu package-manager installation here: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#package-manager-ubuntu-install. Make sure you install a version of cuDNN compatible with CUDA 11.3 (e.g. `libcudnn8=8.2.1.32-1+cuda11.3` and `libcudnn8-dev=8.2.1.32-1+cuda11.3`).
Create a new conda environment and install the requirements: `conda create --name pipeline`, then `conda activate pipeline`, then follow this guide to install the correct version of PyTorch: https://pytorch.org/get-started/locally/. Next, run the following commands: `pip install -r requirements_fstack.txt`, `pip install -r requirements_segment.txt`, `pip install -r requirements_analyze.txt`, and `pip install -r requirements_deepzoom.txt`.
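To confirm the GPU is visible to PyTorch before moving on, a quick sanity check like the following can save debugging time later (a minimal sketch; run it inside the activated `pipeline` environment):

```python
# Verify that PyTorch was installed with working CUDA support.
import torch

print(torch.__version__)             # should report a CUDA build, e.g. ...+cu113
print(torch.cuda.is_available())     # True if CUDA and the driver are set up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU
```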
Shift-register and crop every imageset from a CODEX experiment. We assume the following file structure:

- `src`: source directory (can be any valid string)
  - `exp_id_1`: experiment ID (can be any string)
    - `index.csv`: CSV with the valid cycle names (must be named "index.csv")
    - `cycle1`: folder with the name of the first cycle (can be any string as long as it matches index.csv)
      - `0`: folder named "0" (must be named "0")
        - `0_0_0_Fluorescence_405_nm_Ex.bmp`: BMP image with this name format. The first digit is the i coordinate, the second digit is the j coordinate, the third is the z height, and the rest of the filename is the channel wavelength
        - `0_0_0_Fluorescence_488_nm_Ex.bmp`: BMP with the same coordinates as above but a different channel
        - more BMPs...
        - `6_7_5_Fluorescence_488_nm_Ex.bmp`: for our example, suppose i ranges from 0 to 6, j ranges from 0 to 7, and the z index ranges from 0 to 5
  - `exp_id_2`: another experiment (there can be any number), with a structure identical to `exp_id_1`
For each experiment ID, for each channel `Fluorescence_NNN_nm_Ex`, and for each i index and j index in the range, `registrator.py` generates an image named `i_j_f_Fluorescence_NNN_nm_Ex.png` (with the appropriate values of `i`, `j`, and `NNN` for each image stacked) and saves it to either the `src` directory or a different directory of your choosing.
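To make the naming concrete, here is a hypothetical helper (not code from `registrator.py` itself) that reproduces the convention:

```python
# Hypothetical sketch of the output naming convention described above;
# the "f" in place of the z index marks a focus-stacked image.
def stacked_name(i: int, j: int, wavelength_nm: int) -> str:
    return f"{i}_{j}_f_Fluorescence_{wavelength_nm}_nm_Ex.png"

print(stacked_name(0, 0, 405))  # -> 0_0_f_Fluorescence_405_nm_Ex.png
```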
There are many parameters to set which images get focus stacked and where to save them. Here's the complete list of the parameters and what they mean (an example configuration follows the list):

- `prefix`: string. If you have an index.csv, leave this string empty. If you don't have an index.csv with the names of the cycles to analyze, you can select which cycles to run by prefix. For example, if you have three cycles named `cycle_good1`, `cycle_also_good`, and `bad_cycle` and you only want to run focus stacking on the two good datasets, you can set `prefix="cy"`. Set `prefix = '*'` to get all folders.
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json. If you are running locally, this doesn't matter.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `src`: string. Path to the folder that contains the experiment folder. Can be a GCS locator (e.g. `gs://octopi-data`) or a local path (e.g. `/home/user/Documents/src/`). Note: must have a trailing slash (`/`).
- `dst`: string. Path to the folder in which to save data. If left blank, the images will be stored in the same directory as `src`. `registrator.py` will recreate the source folder structure in this folder. Also must have a trailing slash.
- `exp`: list of strings. List of experiment IDs. In the example above, this would be `["exp_id_1", "exp_id_2"]`.
- `cha`: list of strings. List of channel names. In the example above, this would be `["Fluorescence_405_nm_Ex", "Fluorescence_488_nm_Ex"]`, but it should be left as-is.
- `typ`: string. Filetype of the images to read. Leave as-is.
- `colors`: dictionary of lists. Leave as-is.
- `remove_background`: boolean. Set `True` to run a white-tophat filter. This will increase runtime considerably.
- `invert_contrast`: boolean. Set `True` to invert the image.
- `shift_registration`: boolean. Set `True` to ensure images stay aligned across cycles.
- `subtract_background`: boolean. Set `True` to subtract the minimum brightness from the entire image.
- `use_color`: boolean. Set `True` when processing RGB images (untested).
- `crop_start` and `crop_end`: integers. Crop the image; x and y coordinates less than `crop_start` or greater than `crop_end` will be cropped out.
- `verbose`: boolean. Set `True` to print extra details during operation (useful for determining whether GPU stacking is faster than CPU).
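As a concrete starting point, here is a hypothetical configuration using the parameters above; every value shown (paths, project name, crop bounds) is an assumption to adapt to your own data:

```python
# Hypothetical registrator.py settings; all values below are examples.
prefix = ""                            # empty: cycle names come from index.csv
key = "/home/user/gcs_token.json"      # only needed when reading from GCS
gcs_project = "my-gcs-project"         # only needed when reading from GCS
src = "/home/user/Documents/src/"      # trailing slash required
dst = "/home/user/Documents/stacked/"  # trailing slash required
exp = ["exp_id_1", "exp_id_2"]
cha = ["Fluorescence_405_nm_Ex", "Fluorescence_488_nm_Ex"]  # leave as-is
typ = "bmp"                            # leave as-is
remove_background = False              # white-tophat filter is slow
invert_contrast = False
shift_registration = True              # keep cycles aligned
subtract_background = False
use_color = False
crop_start = 0                         # keep x, y in [crop_start, crop_end]
crop_end = 2048
verbose = True
```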
Cells don't move between cycles, so `segmenter.py` only needs to segment one cycle for each (i,j) view. We generally choose to segment the nuclear channel because it is the brightest. The script first loads all the channel paths, sorts them to find the 0th channel, then filters the image paths so only the 0th-channel images are loaded. Alternatively, we can choose to segment all the images. Images are then segmented one at a time: in principle, Cellpose can work faster by segmenting multi-image batches, but in my experience not all GPUs can handle segmenting multiple images at once. Cellpose then saves a series of .npy files with the masks to the destination directory.
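The core segmentation step looks roughly like the following (a minimal sketch using Cellpose's Python API and the bundled `pbmc_cellpose_model.pth`; the file names are illustrative, and the real `segmenter.py` also handles path discovery and remote storage):

```python
# Sketch of single-image nucleus segmentation with a pretrained Cellpose model.
import numpy as np
from cellpose import models
from skimage import io

model = models.CellposeModel(gpu=True, pretrained_model="pbmc_cellpose_model.pth")
img = io.imread("0_0_f_Fluorescence_405_nm_Ex.png")       # nuclear channel image
masks, flows, styles = model.eval(img, channels=[0, 0])   # [0, 0] = monochrome
np.save("0_0_masks.npy", masks)                           # labeled mask array
```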
If you are having trouble installing Cellpose, try uninstalling all other Python packages that use QtPy (e.g. cv2), install Cellpose, then reinstall everything you uninstalled. If you are using CUDA, ensure your CUDA version is compatible with your version of torch. Cellpose[gui] seems to only work in the base conda environment.
Parameters for `segmenter.py` (an example configuration follows the list):

- `root_dir`: string. Local or remote path to where the images are stored.
- `exp_id`: string. Experiment ID to get the images from. See the registrator theory of operation for more detail.
- `channel`: string. Image channel to select images from. See the registrator theory of operation for more detail.
- `zstack`: string. If you ran `registrator.py` and want to use the shift-registered images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `cpmodel`: string. Path to the Cellpose model. Can be remote or local.
- `use_gpu`: boolean. Set to `True` to try segmenting using the GPU.
- `segment_all`: boolean. Set to `True` to segment all cycles, not just the first.
- `channels`: list of ints. Which channel to run segmentation on. Set to `[0,0]` for the monochrome channel.
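A hypothetical set of values (all paths and names below are assumptions):

```python
# Hypothetical segmenter.py settings; adjust all values to your setup.
root_dir = "/home/user/Documents/src/"
exp_id = "exp_id_1"
channel = "Fluorescence_405_nm_Ex"     # nuclear channel
zstack = "f"                           # use the shift-registered images
key = ""                               # not needed for local runs
gcs_project = ""                       # not needed for local runs
cpmodel = "pbmc_cellpose_model.pth"
use_gpu = True
segment_all = False                    # segment only the first cycle
channels = [0, 0]                      # monochrome
```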
We assume cell positions don't change between cycles, so we can mask the cells by expanding the nucleus masks from `segmenter.py`. We have a mask for each (i,j) view; for each view, we load all the images across all cycles at that view along with the nucleus mask. The mask is expanded to cover the entire cell. Then for each cell in the view, for each channel, and for each cycle, we calculate the average brightness of the cell and store it in a CSV.

The CSV columns are: the cell index within a given view, the i index, the j index, the x position in the image, the y position in the image, the number of pixels in the expanded mask, and the number of pixels in the nuclear mask. Then there is a column for each element of the Cartesian product of channel index and cycle index. The header is re-printed in the CSV, and the cell index resets, for each view.
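The expansion step can be pictured with scikit-image's `expand_labels` (a sketch under the assumption that the saved masks are Cellpose-style labeled arrays; the file names and the choice of `expand_labels` are illustrative, not necessarily what `analyzer.py` does internally):

```python
# Sketch of growing nucleus masks into whole-cell masks and measuring
# one cell's average brightness in one channel image.
import numpy as np
from skimage.segmentation import expand_labels
from skimage import io

nuclear_mask = np.load("0_0_masks.npy")   # labeled nuclei, one int per cell
expansion = 11                            # must be odd (see parameters below)
cell_mask = expand_labels(nuclear_mask, distance=expansion // 2)

img = io.imread("0_0_f_Fluorescence_488_nm_Ex.png")
avg_brightness = img[cell_mask == 1].mean()   # mean over cell 1's pixels
```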
Parameters for `analyzer.py` (an example configuration follows the list):

- `cy_name`: string. If `segment_all = True` during segmentation, set this to the name of the cycle you want to use as the mask. If `segment_all = False`, set `cy_name = "first"`.
- `start_idx`, `end_idx`: integers. Select a range of cycles to analyze. Set both to -1 to select all.
- `n_ch`: integer. Number of channels to analyze.
- `expansion`: integer, must be odd. The number of pixels to expand around the nucleus mask to create the cell masks.
- `root_dir`: string. Local or remote path to where the images are stored.
- `exp_id`: string. Experiment ID to get the images from. See the registrator theory of operation for more detail.
- `channel`: string. Image channel to select images from. See the registrator theory of operation for more detail.
- `zstack`: string. If you ran `registrator.py` and want to use the shift-registered images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `mask_union`: boolean. Set to `True` to save a .npy pickle for each view with the union of all nuclear and cell masks.
- `out`: string. Path to store the CSV. Local or remote.
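A hypothetical set of values (all paths below are assumptions):

```python
# Hypothetical analyzer.py settings; adjust all values to your setup.
cy_name = "first"              # matches segment_all = False during segmentation
start_idx, end_idx = -1, -1    # analyze all cycles
n_ch = 4                       # number of channels in the experiment
expansion = 11                 # odd number of pixels to grow each nucleus mask
root_dir = "/home/user/Documents/src/"
exp_id = "exp_id_1"
channel = "Fluorescence_405_nm_Ex"
zstack = "f"                   # use the shift-registered images
key = ""
gcs_project = ""
mask_union = True              # also save per-view mask unions as .npy
out = "/home/user/Documents/results/exp_id_1.csv"
```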
`cell_crop.py` first reads the cell-type CSV file. For each row, we check whether the cell type is one we have already seen enough times; if so, we skip it. If we haven't seen this cell type enough times, we increment its count and load the image for each channel and each cycle. We then crop the images so the cell is centered, concatenate the different channels/cycles together to form a single image, save the image to disk, and move on to the next row of the CSV.
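The skip-or-keep logic can be sketched as follows (column names and paths here are hypothetical; `cell_crop.py`'s actual CSV schema may differ):

```python
# Sketch of the per-cell-type counting described above.
import csv
from collections import Counter

n_of_each_type = 10                        # target number per cell type
seen = Counter()
with open("celltypes.csv") as f:           # hypothetical cell-type CSV
    for row in csv.DictReader(f):
        ctype = row["cell_type"]           # hypothetical column name
        if seen[ctype] >= n_of_each_type:
            continue                       # already have enough of this type
        seen[ctype] += 1
        # ...load each channel/cycle image, crop around the cell,
        # concatenate, and save the combined view here.
```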
Parameters for `cell_crop.py` (an example configuration follows the list):

- `root_dir`: string. Local or remote path to where the images are stored.
- `exp_id`: string. Experiment ID to get the images from. See the registrator theory of operation for more detail.
- `dest_dir`: string. Local or remote path to where the cropped images will be saved.
- `channels`: list of strings. Channels to read from when making the combined image.
- `celltype_file`: string. Local or remote path to the cell-type .csv file.
- `zstack`: string. If you ran `registrator.py` and want to use the shift-registered images, set this value to `"f"`. Otherwise set it to the z stack you want to use (for example, if z=5 is in focus for all cells across all images, you can set `zstack="5"` and those images will be used).
- `key`: string. If you are connecting to Google Cloud Storage, set this to the local path of the authentication token .json.
- `gcs_project`: string. Set this to the Google Cloud Storage project name if you are connecting to GCS. Otherwise, it doesn't matter.
- `cell_radius`: integer. Half the width/height of the final views.
- `n_of_each_type`: integer. Number of each cell type to make views for.
- `ftype`: string. File type to save the outputs as.
- `subtract_min`: boolean. Set to `True` to subtract the minimum value from each view.
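A hypothetical set of values (all paths below are assumptions):

```python
# Hypothetical cell_crop.py settings; adjust all values to your setup.
root_dir = "/home/user/Documents/src/"
exp_id = "exp_id_1"
dest_dir = "/home/user/Documents/crops/"
channels = ["Fluorescence_405_nm_Ex", "Fluorescence_488_nm_Ex"]
celltype_file = "/home/user/Documents/results/celltypes.csv"
zstack = "f"                   # use the shift-registered images
key = ""
gcs_project = ""
cell_radius = 32               # final views are 64x64 pixels
n_of_each_type = 10
ftype = "png"
subtract_min = True            # subtract each view's minimum value
```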
Currently, this project only accepts outside contributions in the form of bug reports, feature requests, documentation improvements, and bug fixes. Additional features carry a long-term maintenance burden which this project is not yet mature enough to support, so pull requests adding new features will not be accepted at this time. Please start a new discussion (in the "Discussions" tab of this repository on GitHub) if you have a new feature you'd like to propose for this project.
We have chosen the following licenses in order to give away our work for free, so that you can freely use it for whatever purposes you have, with minimal restrictions, while still protecting our disclaimer that this work is provided without any warranties at all. If you're using this project, or if you have questions about the licenses, we'd love to hear from you; please start a new discussion thread in the "Discussions" tab of this repository on GitHub or email us at lietk12@gmail.com.
Except where otherwise indicated in this repository, software files provided here are covered by the following information:
Copyright Prakash Lab and codex-analysis-pipeline project contributors
SPDX-License-Identifier: Apache-2.0 OR BlueOak-1.0.0
Software files in this project are released under the Apache License v2.0 and the Blue Oak Model License 1.0.0; you can use the source code provided here under either the Apache License or the Blue Oak Model License, and you get to decide which license you will agree to. We are making the software available under the Apache License because it's OSI-approved and goes well together with the Solderpad Hardware License, an open hardware license used in various projects released by the Prakash Lab; but we like the Blue Oak Model License more because it's easier to read and understand. Please read and understand the licenses for the specific language governing permissions and limitations.
Except where otherwise indicated in this repository, any other files (such as images, media, data, and textual documentation) provided here not already covered by software or hardware licenses (described above) are instead covered by the following information:
Copyright Prakash Lab and codex-analysis-pipeline project contributors
SPDX-License-Identifier: CC-BY-4.0
Files in this project are released under the Creative Commons Attribution 4.0 International License. Please read and understand the license for the specific language governing permissions and limitations.