Permutation-based statistics

Before running any LAMA scripts, make sure you have activated the LAMA virtual environment.

Introduction

todo

Why we choose n of line
describe the stats output files

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
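As a generic illustration of the idea (not LAMA's implementation), the sketch below runs a two-sided permutation test on the difference in group means. It samples random rearrangements of the labels rather than enumerating all of them, which is a common approximation when full enumeration is too large. The function name `permutation_p_value` and its parameters are made up for this example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Hypothetical helper for illustration only; it is not part of LAMA.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # rearrange the group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    wild_type = [1, 2, 3, 4, 5, 6]
    mutant = [101, 102, 103, 104, 105, 106]
    print(permutation_p_value(wild_type, mutant, n_perm=2000))
```

With identical groups every rearrangement is at least as extreme as the observed difference of zero, so the function returns 1.0.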

Motivation

In LAMA, permutation testing has been implemented for the organ volume analysis only. The voxel-based data contain many millions of data points, which would make permutation testing too computationally expensive. Organ volume data, on the other hand, have only n data points (where n is the total number of labels in the atlas).

Pipeline overview

The following procedure is performed per organ label

Generate null-distribution

For each mutant line, obtain the mutant sample number (n)
Relabel n baselines as synthetic mutants
Do multiple linear regression - regress organ volume on genotype + whole embryo volume
Obtain p-value for genotype effect and add to null distribution
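The relabelling steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code LAMA runs: `null_p_values`, `fit_p_value` and `n_iterations` are hypothetical names, and `fit_p_value` stands in for the regression of organ volume on genotype + whole embryo volume, returning the p-value for the genotype effect. How iterations are shared across lines is also an assumption here.

```python
import random


def null_p_values(baseline_ids, mutant_line_sizes, fit_p_value,
                  n_iterations=1000, seed=0):
    """Build a null distribution of genotype p-values for one organ label.

    baseline_ids      -- list of baseline specimen identifiers
    mutant_line_sizes -- mutant sample number (n) of each mutant line
    fit_p_value       -- hypothetical callable(synthetic_mutants, remaining_baselines)
                         that returns the genotype p-value from the regression
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_iterations):
        for n in mutant_line_sizes:
            # Relabel n randomly chosen baselines as synthetic mutants.
            synthetic = set(rng.sample(baseline_ids, n))
            remaining = [b for b in baseline_ids if b not in synthetic]
            # Regress and record the p-value for the (fake) genotype effect.
            null.append(fit_p_value(sorted(synthetic), remaining))
    return null
```

Because the synthetic mutants are really baselines, any genotype effect found here arises by chance, which is what makes the collected p-values a null distribution.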

Generate alternative distribution

For each mutant line, do regression as above
Obtain p-value for genotype effect and add to alternative distribution

Calculate false discovery rate (FDR)

Search for a p-value cutoff where: (proportion of null p-values under the threshold) / (proportion of alternative p-values under the threshold) is < 0.05

This ratio is our estimated FDR for that organ, i.e. the proportion of results below the cutoff, across all our mutant lines, that we expect to be false positives.
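The cutoff search can be sketched as below. This is an illustration under assumptions rather than LAMA's actual code: the function name `fdr_threshold` is made up, the candidate cutoffs are taken to be the observed alternative p-values, "under the threshold" is read as less than or equal to it, and the largest qualifying cutoff is returned.

```python
def fdr_threshold(null_p, alt_p, target=0.05):
    """Return (cutoff, estimated_fdr) for the largest cutoff whose estimated
    FDR is below `target`, or None if no cutoff qualifies.

    Hypothetical helper for illustration only.
    """
    best = None
    for cutoff in sorted(set(alt_p)):
        prop_null = sum(p <= cutoff for p in null_p) / len(null_p)
        # prop_alt is never zero because the cutoff itself comes from alt_p.
        prop_alt = sum(p <= cutoff for p in alt_p) / len(alt_p)
        fdr = prop_null / prop_alt
        if fdr < target:
            best = (cutoff, fdr)
    return best


if __name__ == "__main__":
    null_p = [i / 100 for i in range(1, 101)]  # roughly uniform null p-values
    alt_p = [0.001, 0.002, 0.003, 0.5, 0.9]    # three lines with strong effects
    print(fdr_threshold(null_p, alt_p))
```

Returning None when nothing qualifies makes it explicit that no line can be called significant for that organ at the chosen FDR.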

Running the script

After running registration using job_runner, you should have two folders in your output directory:

baseline
mutant

Each will have an output folder containing one folder per line (for mutants) or a single folder named 'baseline' (for baselines).

The individual specimen folders will each contain the following csv files that hold information on organ volumes:

organ_volumes.csv - The organ volume in voxels for each label present in the label map
staging_info_volume.csv - The whole embryo volume based on the mask supplied during registration (stats_mask)

Open a terminal and do the following

$ lama_permutation_stats.py -w path/to/baseline -m path/to/mutant -o path/to/outdir -n 1000

TODO: Describe the output
