TYPEx workflow

Using multiplexed imaging, TYPEx detects protein expression on single cells, annotates cell types automatically based on user-provided definitions and quantifies cell densities per tissue area. It can be customised with input parameters and configuration files, allowing it to perform an end-to-end cell phenotyping analysis without the need for manual adjustments.

Usage

First, clone either the TRACERx-PHLEX repository or the standalone TYPEx repository:

git clone git@github.com:FrancisCrickInstitute/TRACERx-PHLEX.git

git clone git@github.com:FrancisCrickInstitute/TYPEx.git

Running on input generated with deep-imcyto

nextflow run TRACERx-PHLEX/TYPEx/main.nf \
     -c $PWD/TRACERx-PHLEX/TYPEx/conf/testdata.config \
     --input_dir $PWD/results/deep-imcyto/$release/ \
     --sample_file $PWD/TRACERx-PHLEX/TYPEx/data/sample_file.tracerx.txt \
     --release $release \
     --outDir "$PWD/results/TYPEx/$release/" \
     --params_config "$PWD/TRACERx-PHLEX/TYPEx/data/typing_params.json" \
     --annotation_config "$PWD/TRACERx-PHLEX/TYPEx/data/cell_type_annotation.json" \
     --tissue_seg_model "$PWD/TRACERx-PHLEX/TYPEx/models/tumour_stroma_classifier.ilp" \
     --color_config $PWD/TRACERx-PHLEX/TYPEx/data/celltype_colors.json \
     --deep_imcyto true --cellprofiler true \
     -profile singularity \
     -resume

Running TYPEx with user-provided cell objects tables (independently of deep-imcyto)

release=TYPEx_test
nextflow run TYPEx/main.nf \
     -c $PWD/TYPEx/test.config \
     -c TYPEx/testdata.config \
     --input_dir $PWD/results/ \
     --release $release \
     --input_table $PWD/TYPEx/data/cell_objects.tracerx.txt \
     --sample_file $PWD/TYPEx/data/sample_file.tracerx.txt \
     --outDir "$PWD/results/TYPEx/$release/" \
     --params_config "$PWD/TYPEx/data/typing_params.json" \
     --annotation_config "$PWD/TYPEx/data/cell_type_annotation.json" \
     --color_config $PWD/TYPEx/data/celltype_colors.json \
     -profile singularity \
     -resume

Running locally without a high-performance computing server

release=TYPEx_test
nextflow run TYPEx/main.nf \
     -c $PWD/TYPEx/conf/testdata.config \
     -c TYPEx/testdata.config \
     --input_dir $PWD/results/ \
     --release $release \
     --input_table $PWD/TYPEx/data/cell_objects.tracerx.txt \
     --sample_file $PWD/TYPEx/data/sample_file.tracerx.txt \
     --outDir "$PWD/results/TYPEx/$release/" \
     --params_config "$PWD/TYPEx/conf/typing_params.json" \
     --annotation_config "$PWD/TYPEx/data/cell_type_annotation.json" \
     --color_config $PWD/TYPEx/data/celltype_colors.json \
     -profile docker \
     -resume

Input Files

Required Inputs

  • cell_type_annotation.json - a file with cell definitions specific to the user’s antibody panel (see :ref:`Cell type definitions`).
    Specified with the --annotation_config parameter.
  • sample_file.tracerx.txt - a tab-delimited file with information for all images (see :ref:`Sample annotation table`).
    Specified with the --sample_file parameter.
  • An input directory for deep-imcyto input, or an input table for runs independent of deep-imcyto.
    The directory is specified with the --input_dir parameter and the table with the --input_table parameter. The input table is a tab-delimited file with marker intensities and cell coordinates per cell object (see :ref:`Input table`).

Optional Inputs

  • typing_params.json - a config file with information on the cell typing workflow (see :ref:`Typing parameters config`).
    Specified with the --params_config parameter.
  • tissue_segmentation.json - a file with information on tissue categories/annotation that can be overlaid on each cell object along with the cell type information. In the case of tissue compartments, e.g. Tumour and Stroma, a summary table will also be generated with quantifications per compartment.
    Specified with the --overlayConfigFile parameter.
  • celltype_colors.json - color settings for the user-defined cell types.
    Specified with the --color_config parameter.

Input Parameters

  • release provide a unique identifier for the run [default: PHLEX_test]
  • panel provide a unique identifier for the panel [default: p1]
  • study provide a unique identifier for the study [default: tracerx]

Several input parameters can be used to define the typing workflow:

  • deep_imcyto run TYPEx on input generated with deep-imcyto [default: true]
  • cellprofiler run TYPEx on deep-imcyto in MCCS mode when true and simple segmentation mode when false [default: true]
  • tiered run the TYPEx multi-tiered approach [default: true]
  • stratify_by_confidence include the stratification by low and high confidence when true [default: true]
  • sampled run TYPEx on subsampled data with three iterations when true [default: false]
  • clustered perform clustering without any stratification [default: false]

The following parameters refer to the typing approach:

  • subtype_method the clustering approach to be used in the last stratification step [default: FastPG]
  • major_markers the label of the major cell type definitions in cell_type_annotation.json [default: major_markers]
  • subtype_markers the label of the cell subtype definitions in cell_type_annotation.json [default: subtype_markers]
  • mostFreqCellType the most frequent cell type in the cohort, if known, as defined in cell_type_annotation.json [default: None]

Note

The most frequent cell type is used to build the reference model, which is built by excluding this cell type. When it is not provided, the complete model will be built first, followed by the reference model. When it is provided, both models are built in parallel. Parallel execution can shorten the run time, as these are the most time-consuming processes.

User-provided cell type definitions

The cell-type definitions file cell_type_annotation.json lists the cell lineages and, for each lineage, the marker proteins that together identify it. When designing this file, it is important to ensure that every cell in the cohort is covered by these definitions. Some markers, such as CD45 and Vimentin, are expressed by multiple cell lineages. These shared proteins are used to infer a hierarchy of cell lineages, which is later considered for cell stratification and annotation. In the TRACERx analyses, for example, we defined 13 major cell types targeted by our two antibody panels.
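To illustrate how shared markers relate lineages to one another, the minimal sketch below lists the markers that appear in more than one definition. The dictionary layout, the lineage names and the marker assignments are hypothetical and chosen only for illustration; they do not reproduce the schema of cell_type_annotation.json or the TRACERx definitions, and the function is not part of TYPEx.

```python
from collections import defaultdict

# Hypothetical definitions: lineage -> markers that together identify it.
# Neither the layout nor the assignments are taken from TYPEx.
definitions = {
    "T cells": ["CD45", "CD3"],
    "B cells": ["CD45", "CD20"],
    "Fibroblasts": ["Vimentin", "aSMA"],
    "Endothelial cells": ["Vimentin", "CD31"],
}


def shared_markers(defs):
    """Return {marker: sorted lineages} for markers used by more than one lineage."""
    lineages_by_marker = defaultdict(set)
    for lineage, markers in defs.items():
        for marker in markers:
            lineages_by_marker[marker].add(lineage)
    return {
        marker: sorted(lineages)
        for marker, lineages in lineages_by_marker.items()
        if len(lineages) > 1
    }


for marker, lineages in sorted(shared_markers(definitions).items()):
    print(f"{marker}: {', '.join(lineages)}")
```

Markers reported here group several lineages under a common parent, while markers used by a single lineage separate the lineages within that group.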

Input table

The input matrix has values that summarise the intensity of a protein per cell object, such as mean intensity, independently of the imaging modality or antibody tagging technique.

ObjectNumber imagename X Y Area <Marker 1> ... <Marker N>
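Before starting a run, it can help to confirm that a table follows this layout. The sketch below is a standalone check and not part of TYPEx; it assumes the table is tab-delimited with a header row, and treats every column other than the five named above as a marker. The example rows and the marker names CD3 and panCK are made up.

```python
import csv
import io

# Columns named in the layout above; any other column is treated as a marker.
REQUIRED = ["ObjectNumber", "imagename", "X", "Y", "Area"]


def check_input_table(handle):
    """Return the marker column names, or raise ValueError if the header is unusable."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("empty table")
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    markers = [name for name in header if name not in REQUIRED]
    if not markers:
        raise ValueError("no marker columns found")
    return markers


# Made-up example; CD3 and panCK stand in for <Marker 1> ... <Marker N>.
example = io.StringIO(
    "ObjectNumber\timagename\tX\tY\tArea\tCD3\tpanCK\n"
    "1\timage_01\t10.5\t20.0\t35\t0.12\t3.40\n"
)
print(check_input_table(example))
```

For a real table, pass an open file handle instead of the in-memory example, e.g. open(path, newline="").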

Typing parameters config

typing_params.json contains the settings for clustering approaches to be used, normalisation approaches, and filtering criteria.

Key parameters that are often of interest are:

  • magnitude
    As CellAssign was developed for single-cell sequencing read count data, the input protein intensity matrix should be rescaled to a range of 0 - 10^6 using the input parameter magnitude.
  • batch_effects
    CellAssign also accounts for batch effects. These are used for batch correction when they are provided in the sample annotation table and specified as input parameters to TYPEx.
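As a rough illustration of the rescaling that the magnitude parameter refers to, the sketch below maps a list of intensities linearly onto the range 0 to 10^magnitude. It assumes, for illustration only, that magnitude is the base-10 exponent of the upper bound and that the scaling is a simple division by the maximum; the transformation TYPEx applies internally may differ.

```python
def rescale(intensities, magnitude=6):
    """Linearly map non-negative intensities onto the range 0 to 10**magnitude."""
    if not intensities:
        return []
    upper = 10 ** magnitude
    peak = max(intensities)
    if peak <= 0:
        # Nothing to scale against; return zeros of the same length.
        return [0.0 for _ in intensities]
    return [value / peak * upper for value in intensities]


# Made-up mean intensities for one marker.
print(rescale([0.0, 0.5, 2.0]))
```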

Sample annotation table

Provide the sample annotation table in the following format:

imagename <experimental condition> <Batch effect 1> ... <Batch effect N> use_image
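A simple consistency check is to confirm that every image in the input table has a row in the sample annotation table. The sketch below assumes that the imagename column is the key shared by both tables and that both are tab-delimited with a header row. It is not part of TYPEx; the condition and batch column names and all values, including those in use_image, are placeholders.

```python
import csv
import io


def unannotated_images(input_table, sample_table):
    """Return imagename values found in the input table but not in the sample table."""
    annotated = {row["imagename"] for row in csv.DictReader(sample_table, delimiter="\t")}
    observed = {row["imagename"] for row in csv.DictReader(input_table, delimiter="\t")}
    return sorted(observed - annotated)


# Made-up tables; only imagename and use_image are column names from this README.
sample_table = io.StringIO(
    "imagename\tcondition\tbatch\tuse_image\n"
    "image_01\ttumour\trun1\t1\n"
)
input_table = io.StringIO(
    "ObjectNumber\timagename\tX\tY\tArea\tCD3\n"
    "1\timage_01\t10.5\t20.0\t35\t0.12\n"
    "1\timage_02\t4.0\t8.5\t41\t0.30\n"
)
print(unannotated_images(input_table, sample_table))
```

An empty list means every image seen in the input table is annotated.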

Outputs

TYPEx outputs summary tables that can be readily interrogated for biological questions. These include densities of identified cell phenotypes (cell_density_*.txt), a catalogue of the expressed proteins and combinations thereof (phenotypes.*.txt), and cell statistics quantified across the whole tissue area (summary_*.cell_stats.txt) or within each tissue compartment (categs_summary_*.cell_stats.txt).

summary
├── cell_density_*.txt
├── cell_objects_*.txt
├── phenotypes.*.txt
├── summary_*.cell_stats.txt
├── categs_summary_*.cell_stats.txt
├── maps
├── intensity_plots
└── overlays
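The file name patterns above can be used to collect the summary tables for downstream analysis. The sketch below groups the files in a directory by pattern; it is a convenience helper and not part of TYPEx, and where the summary directory sits relative to --outDir is left to the caller.

```python
from pathlib import Path

# File name patterns taken from the output listing above.
PATTERNS = [
    "cell_density_*.txt",
    "cell_objects_*.txt",
    "phenotypes.*.txt",
    "summary_*.cell_stats.txt",
    "categs_summary_*.cell_stats.txt",
]


def find_summary_tables(summary_dir):
    """Return {pattern: sorted matching file names} for the given directory."""
    summary_dir = Path(summary_dir)
    return {
        pattern: sorted(path.name for path in summary_dir.glob(pattern))
        for pattern in PATTERNS
    }


# Example call on the current directory; point it at the summary directory of a run.
for pattern, names in find_summary_tables(Path.cwd()).items():
    print(pattern, names)
```

Each pattern is matched against the whole file name, so summary_*.cell_stats.txt does not pick up the categs_summary_* files.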

Troubleshooting

Several visualisation plots are output for each step in the workflow and can be used to make sure each step has gone as expected.