
permutation based statistics

Neil Horner edited this page Dec 14, 2018 · 26 revisions

Before running any LAMA scripts, make sure you have activated the LAMA virtual environment.

Introduction

todo
  • Why we choose n of line
  • describe the stats output files

From Wikipedia:

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
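As a generic illustration of this definition (not LAMA code), the sketch below enumerates every relabelling of two small groups and uses the absolute difference in means as the test statistic. The function name and the toy data are made up for the example.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    n_extreme = 0
    n_total = 0
    # Each combination is one way of assigning the "group A" label
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating point round-off
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    # Fraction of relabellings at least as extreme as the observed labelling
    return n_extreme / n_total


# Toy data: 20 possible relabellings, only 2 are as extreme as the observed one
p = exact_permutation_pvalue([10.1, 9.8, 10.4], [12.0, 11.7, 12.3])
print(p)
```

Enumerating every relabelling is only feasible for small samples. For larger data a random subset of relabellings is normally drawn instead.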

Motivation

In LAMA, permutation testing has been implemented for the organ volume analysis only. The voxel-based data contains many millions of data points, so running permutation tests on it would be too computationally expensive. Organ volume data, on the other hand, has only n data points (where n is the total number of labels in the atlas).

Pipeline overview

The following procedure is performed per organ label

Generate null-distribution
  • For each mutant line, obtain the mutant sample number (n)
  • Relabel n baselines as synthetic mutants
  • Do multiple linear regression - regress organ volume on genotype + whole embryo volume
  • Obtain p-value for genotype effect and add to null distribution
Generate alternative distribution
  • For each mutant line, do regression as above
  • Obtain p-value for genotype effect and add to alternative distribution
Calculate false discovery rate (FDR)

Search for a p-value cutoff where (proportion of null p-values under the threshold) / (proportion of alternative p-values under the threshold) is < 0.05.

This gives us our FDR for that organ, i.e. the proportion of the hits called across all our mutant lines that we expect to be false positives.
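The per-organ procedure above can be sketched as follows. This is a minimal illustration, not the LAMA implementation. The function names are hypothetical, and `genotype_pvalue` is a hypothetical callable standing in for the regression of organ volume on genotype + whole embryo volume. Two further assumptions: one relabelling is done per mutant line per permutation round, and the largest cutoff that meets the target is the one returned.

```python
import random


def null_pvalues(baselines, mutant_ns, genotype_pvalue, n_perm=1000, seed=0):
    """Null distribution of genotype p-values for one organ label.

    baselines: list of baseline specimen records
    mutant_ns: mutant sample number (n) of each mutant line
    genotype_pvalue: hypothetical callable(wild_types, mutants) -> p-value
    """
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_perm):
        for n in mutant_ns:
            # Shuffle a copy, then relabel the first n baselines as mutants
            shuffled = rng.sample(baselines, len(baselines))
            synthetic_mutants, wild_types = shuffled[:n], shuffled[n:]
            pvals.append(genotype_pvalue(wild_types, synthetic_mutants))
    return pvals


def alt_pvalues(baselines, mutant_lines, genotype_pvalue):
    """One genotype p-value per real mutant line, tested against all baselines."""
    return [genotype_pvalue(baselines, mutants) for mutants in mutant_lines]


def find_threshold(null_p, alt_p, target=0.05):
    """Return (cutoff, estimated FDR) for the largest cutoff below target.

    Returns None if no cutoff qualifies. Candidate cutoffs are the observed
    alternative p-values, and "under the threshold" is read here as <=.
    """
    if not null_p or not alt_p:
        return None
    best = None
    for cutoff in sorted(set(alt_p)):
        prop_null = sum(p <= cutoff for p in null_p) / len(null_p)
        # Never zero, because the cutoff itself comes from alt_p
        prop_alt = sum(p <= cutoff for p in alt_p) / len(alt_p)
        fdr = prop_null / prop_alt
        if fdr < target:
            best = (cutoff, fdr)
    return best
```

In use, `null_pvalues` and `alt_pvalues` would be called once per organ label with the same p-value function, and their outputs passed to `find_threshold` to get that organ's cutoff.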

Running the script

After running registration using job_runner, you should have two folders in your output directory:

  1. baseline

  2. mutant

Each will have an output folder containing individual lines (in the mutant instance) or a single folder named 'baseline' in the case of baselines.

The individual specimen folders will each contain the following csv files:
  • organ_volumes.csv - The organ volume in voxels for each label present in the label map
  • staging_info_volume.csv - The whole embryo volume based on the mask supplied during registration (stats_mask)

Open a terminal and do the following

$ lama_permutation_stats.py -w path/to/baseline -m path/to/mutant -o path/to/outdir -n 1000

TODO: Describe the output
