IMPC annotation pipeline

Hamed Haseli edited this page Jul 23, 2020 · 14 revisions

The annotation pipeline of the International Mouse Phenotyping Consortium (IMPC) associates phenotypic observations with genetic modifications. This page explains the steps taken to select the most appropriate Mammalian Phenotype (MP) term for a genetic modification in mice when a significant difference from the baseline (typically at the 0.0001 level) is observed.

Annotation pipeline and the analysis framework

The IMPC annotation pipeline (IMPC-AP) assigns MP terms to statistically significant genetic effects. At the IMPC, the genetic effect is estimated by three statistical analysis frameworks that are implemented in the IMPC statistical pipeline through the OpenStats software. Below, the annotation pipeline is broken down by the type of input data and the analysis framework.

Annotation table

The IMPC annotation pipeline requires a reference table that summarises the MP terms available for each IMPC parameter. This table can be retrieved from IMPReSS; however, to remove the dependency on the live servers, the IMPC-AP uses an offline version of the file, called the Annotation Indexer in this document. An instance of this file is available from link [note that this file is in the Rdata format, which requires R to open].

Continuous data – Linear mixed model

Continuous data in the IMPC, such as tail length or tibia length, are analysed by a linear mixed model implemented in the software package OpenStats. Continuous measurements are more informative than the other data types in that the direction of change (increasing/decreasing/steady) can be determined from the effect size. The steps for assigning MP terms to continuous measurements are summarised below.

From the statistical results:
1*. Overall effect (both sexes)
   - if pvalue ≥ threshold → no MP term
   - if pvalue < threshold (II)
      1. if effect size > 0 → Increase term
      2. if effect size < 0 → Decrease term
      3. if effect size = 0 → Steady term
- similar steps in 1* apply for Male effect (III)
- similar steps in 1* apply for Female effect (IV)

From the Annotation Indexer:
Filter for
   1. Pipeline_stable_id
   2. Procedure_group
   3. Parameter_stable_id
Get available MP terms (I)

Find matches between I and II, III, IV

Notes
- If an increase or decrease effect is detected, ignore the ABNORMAL MP term.
- The threshold generally accepted by the IMPC is 0.0001.
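The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IMPC-AP code: the row layout of the Annotation Indexer, the function names, the outcome labels and every identifier and MP term value below are hypothetical placeholders. The sketch also assumes that the ABNORMAL term is only a fallback candidate for a steady effect, since the notes say only that it is ignored when a direction is detected.

```python
THRESHOLD = 1e-4  # significance level quoted in the notes above

# Hypothetical stand-in for the Annotation Indexer: one row per available term.
# Every identifier and MP term value here is a placeholder, not real IMPC data.
INDEXER = [
    {"pipeline_stable_id": "PIPELINE_A", "procedure_group": "PROCEDURE_A",
     "parameter_stable_id": "PARAMETER_A", "outcome": "INCREASED",
     "mp_term": "MP:EXAMPLE_INCREASED"},
    {"pipeline_stable_id": "PIPELINE_A", "procedure_group": "PROCEDURE_A",
     "parameter_stable_id": "PARAMETER_A", "outcome": "DECREASED",
     "mp_term": "MP:EXAMPLE_DECREASED"},
    {"pipeline_stable_id": "PIPELINE_A", "procedure_group": "PROCEDURE_A",
     "parameter_stable_id": "PARAMETER_A", "outcome": "ABNORMAL",
     "mp_term": "MP:EXAMPLE_ABNORMAL"},
    {"pipeline_stable_id": "PIPELINE_A", "procedure_group": "PROCEDURE_A",
     "parameter_stable_id": "PARAMETER_B", "outcome": "INCREASED",
     "mp_term": "MP:EXAMPLE_OTHER"},
]


def available_terms(indexer, pipeline, procedure, parameter):
    """Step I: filter the indexer on the three keys, collect terms by outcome."""
    key = (pipeline, procedure, parameter)
    return {
        row["outcome"]: row["mp_term"]
        for row in indexer
        if (row["pipeline_stable_id"], row["procedure_group"],
            row["parameter_stable_id"]) == key
    }


def effect_label(pvalue, effect_size, threshold=THRESHOLD):
    """Steps II-IV: turn a p-value and an effect size into a search label."""
    if pvalue >= threshold:
        return None  # not significant: no MP term
    if effect_size > 0:
        return "INCREASED"
    if effect_size < 0:
        return "DECREASED"
    return "STEADY"


def assign_terms(results, terms, threshold=THRESHOLD):
    """Match the label of each effect against the available terms."""
    assigned = {}
    for level, stats in results.items():
        label = effect_label(stats["pvalue"], stats["effect_size"], threshold)
        if label is None:
            assigned[level] = None
            continue
        # ABNORMAL is ignored once a direction is known. Assumption: it is
        # only tried as a fallback when the effect is steady.
        candidates = ["STEADY", "ABNORMAL"] if label == "STEADY" else [label]
        assigned[level] = next((terms[c] for c in candidates if c in terms), None)
    return assigned


terms = available_terms(INDEXER, "PIPELINE_A", "PROCEDURE_A", "PARAMETER_A")
results = {
    "overall": {"pvalue": 1e-6, "effect_size": 0.8},
    "male": {"pvalue": 0.2, "effect_size": 0.1},
    "female": {"pvalue": 1e-5, "effect_size": -0.3},
}
assigned = assign_terms(results, terms)
print(assigned)
```

In this made-up example the overall effect is matched to the placeholder increase term, the male effect gets no term because its p-value is above the threshold, and the female effect is matched to the placeholder decrease term.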

Continuous data – Reference Range plus

Due to the complexity of the data, not all continuous data can be analysed by a linear mixed model. Many cases in the IMPC are instead analysed by the Reference Range plus (RR+) method implemented in the OpenStats software package. RR+ is a heuristic method that discretizes the baselines into low/normal/high categories. Each mutant is then assigned a class based on these reference categories. Finally, Fisher's Exact test is applied to detect any significant deviation from the normal category. The MP term assignment algorithm for results from the RR+ framework is explained below.
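To make the RR+ idea concrete, here is a minimal Python sketch of the three stages described above: discretize, classify, test. It is not the OpenStats implementation. The choice of cutoffs (the central 95% of the baseline values, with linear interpolation between order statistics), the two 2x2 tables (low versus not low, high versus not high) and all function names are assumptions made for illustration. Fisher's Exact test is written out with the hypergeometric distribution so the block has no dependencies.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def classify(value, low_cut, high_cut):
    """Assign a measurement to the low/normal/high reference category."""
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "normal"


def rr_plus(controls, mutants, coverage=0.95):
    """Return the low and high p-values for mutants against the baseline."""
    baseline = sorted(controls)
    tail = (1 - coverage) / 2  # assumed: equal tails outside the reference range
    low_cut, high_cut = quantile(baseline, tail), quantile(baseline, 1 - tail)
    control_classes = [classify(v, low_cut, high_cut) for v in controls]
    mutant_classes = [classify(v, low_cut, high_cut) for v in mutants]
    pvalues = {}
    for label in ("low", "high"):
        a = mutant_classes.count(label)
        c = control_classes.count(label)
        pvalues["pvalue." + label] = fisher_exact_two_sided(
            a, len(mutants) - a, c, len(controls) - c)
    return pvalues


controls = list(range(1, 101))  # made-up baseline measurements
mutants = [200] * 10            # made-up mutants, all far above the baseline
pvalues = rr_plus(controls, mutants)
print(pvalues)
```

With these made-up numbers every mutant falls in the high category, so `pvalue.high` is far below 0.0001 while `pvalue.low` is not significant.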

From the statistical results:
1*. Overall effect (do not consider sex)
   - if pvalue.low ≥ threshold and pvalue.high ≥ threshold → assign no MP term
   - for each pvalue (.low/.high) < threshold → add the temporary labels Abnormal, Increase and Decrease to the search criteria
   - remove any Low.Increase and High.Decrease from the labels (II)
- similar steps in 1* apply for Male effect (III)
- similar steps in 1* apply for Female effect (IV)

From the Annotation Indexer:
Filter for
   1. Pipeline_stable_id
   2. Procedure_group
   3. Parameter_stable_id
Get available MP terms (I)

Find matches between I and II, III, IV, regardless of the Low. and High. prefixes

Notes
- If conflicting Low. and High. MP terms are detected, select the Abnormal term.
- The threshold generally accepted by the IMPC is 0.0001.
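The label handling for RR+ results can be sketched as follows. This is a hedged illustration, not the pipeline's own code: the function names, the dictionary of available terms and the MP term values are hypothetical, and the available terms are assumed to have already been filtered from the Annotation Indexer. The sketch treats "both tails significant" as the conflicting case, and assumes that a directional term is preferred over the Abnormal term when only one tail is significant; the text does not state that preference.

```python
THRESHOLD = 1e-4  # significance level quoted in the notes above


def rr_labels(pvalue_low, pvalue_high, threshold=THRESHOLD):
    """Build the temporary search labels (II) for one effect."""
    labels = []
    for side, pvalue in (("Low", pvalue_low), ("High", pvalue_high)):
        if pvalue < threshold:
            labels += [side + "." + outcome
                       for outcome in ("Abnormal", "Increase", "Decrease")]
    # A low tail cannot be an increase and a high tail cannot be a decrease.
    return [lab for lab in labels if lab not in ("Low.Increase", "High.Decrease")]


def select_term(labels, terms):
    """Match labels against the available terms, ignoring the Low./High. prefix."""
    if not labels:
        return None  # neither tail is significant: no MP term
    sides = {lab.split(".")[0] for lab in labels}
    outcomes = {lab.split(".")[1] for lab in labels}
    if sides == {"Low", "High"}:
        # Conflicting directions: fall back to the Abnormal term.
        return terms.get("Abnormal")
    # Assumption: prefer a directional term over Abnormal when one exists.
    for outcome in ("Increase", "Decrease", "Abnormal"):
        if outcome in outcomes and outcome in terms:
            return terms[outcome]
    return None


# Hypothetical available terms (I); the values are placeholders.
terms = {
    "Abnormal": "MP:EXAMPLE_ABNORMAL",
    "Increase": "MP:EXAMPLE_INCREASED",
    "Decrease": "MP:EXAMPLE_DECREASED",
}

high_only = select_term(rr_labels(0.5, 1e-6), terms)
both_tails = select_term(rr_labels(1e-6, 1e-6), terms)
neither = select_term(rr_labels(0.5, 0.5), terms)
print(high_only, both_tails, neither)
```

The same two functions would be called once per effect (overall, male, female), each with its own pair of low and high p-values.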

Categorical data

Categorical data in the IMPC encompass a range of qualitative measurements, such as abnormalities of the eye, ear or tail, and are analysed using Fisher's Exact test implemented in the R package OpenStats. The output for this type of data is a single Abnormal phenotype MP term if the statistical test is significant. The algorithm is as follows:

From the statistical results:
1*. Overall effect (do not consider sex)
   - if pvalue ≥ threshold → assign no MP term
   - if pvalue < threshold → search for the MP term (II)
- similar steps in 1* apply for Male effect (III)
- similar steps in 1* apply for Female effect (IV)

From the Annotation Indexer:
Filter for
   1. Pipeline_stable_id
   2. Procedure_group
   3. Parameter_stable_id
Get available MP terms (I)

Find matches between I and II, III, IV

Note
- The threshold generally accepted by the IMPC is 0.0001.
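Because the categorical case has a single possible outcome, the assignment reduces to a threshold check per effect. The Python sketch below illustrates this under assumptions: the function name, the input layout and the placeholder MP term are hypothetical, and the Abnormal term is assumed to have already been looked up in the Annotation Indexer for the parameter in question.

```python
THRESHOLD = 1e-4  # significance level quoted in the note above


def annotate_categorical(pvalues, abnormal_term, threshold=THRESHOLD):
    """Assign the Abnormal term to every effect whose p-value is significant."""
    return {
        level: abnormal_term if pvalue < threshold else None
        for level, pvalue in pvalues.items()
    }


# Made-up p-values from Fisher's Exact test for the three effects.
pvalues = {"overall": 3e-6, "male": 0.04, "female": 8e-5}
annotation = annotate_categorical(pvalues, "MP:EXAMPLE_ABNORMAL")
print(annotation)
```

Here the overall and female effects receive the placeholder Abnormal term, and the male effect receives none.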

Schematic view of the IMPC-AP
