Permutation-based statistics
$ cd lama_phenotype_detection
$ python3.6 scripts/permutation_stats.py -w wt_dir -m mut_dir -o outdir -n 1000
Why we choose n of line
From Wikipedia:
A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
In LAMA, permutation testing has been implemented for the organ volume analysis only. This is because the voxel-based data contains many millions of data points, which would make permutation testing too computationally expensive. Organ volume data, on the other hand, has only n data points (where n is the total number of labels in the atlas).
The following procedure is performed per organ label:
- For each mutant line, obtain the mutant sample number (n)
- Relabel n baselines as synthetic mutants
- Do multiple linear regression - regress organ volume on genotype + whole embryo volume
- Obtain p-value for genotype effect and add to null distribution
- For each mutant line, do regression as above
- Obtain p-value for genotype effect and add to alternative distribution
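The relabelling step for the null distribution can be sketched as follows. This is a minimal illustration, not LAMA's actual code: the function name `build_null_distribution`, its parameters, and the idea of repeating the relabelling `n_permutations` times per line are all assumptions. The regression is passed in as a hypothetical callable `genotype_pvalue`, which should fit organ volume on genotype + whole embryo volume and return the p-value for the genotype term.

```python
import random
from typing import Callable, Dict, List, Sequence


def build_null_distribution(
    baseline_ids: Sequence[str],
    mutant_line_sizes: Dict[str, int],
    genotype_pvalue: Callable[[List[str], List[str]], float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Collect genotype p-values from baselines relabelled as synthetic mutants.

    baseline_ids      -- identifiers of all baseline specimens
    mutant_line_sizes -- mutant line name -> number of mutant specimens (n)
    genotype_pvalue   -- hypothetical callable: (baselines, synthetic_mutants)
                         -> p-value for the genotype effect
    """
    rng = random.Random(seed)
    baselines = list(baseline_ids)
    null_pvalues: List[float] = []

    for _ in range(n_permutations):
        for n in mutant_line_sizes.values():
            # Relabel n randomly chosen baselines as synthetic mutants
            synthetic_mutants = rng.sample(baselines, n)
            chosen = set(synthetic_mutants)
            # The remaining baselines keep their wild-type label
            remaining = [b for b in baselines if b not in chosen]
            # No real genotype effect exists here, so this p-value
            # is a draw from the null distribution
            null_pvalues.append(genotype_pvalue(remaining, synthetic_mutants))

    return null_pvalues
```

The alternative distribution would be built the same way, but with the real mutant specimens of each line in place of the synthetic ones, giving one p-value per line.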
Search for a p-value cutoff where the ratio (proportion of null p-values under the threshold) / (proportion of alternative p-values under the threshold) is < 0.05.
This gives us our FDR for that organ, i.e. the proportion of hits across all our mutant lines that we expect to be false positives.
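The cutoff search can be sketched like this. It is an illustration under stated assumptions rather than the implementation used by LAMA: the function name `find_pvalue_cutoff` is hypothetical, the candidate cutoffs are taken to be the observed alternative p-values, "under the threshold" is read as less than or equal to, and the largest cutoff satisfying the criterion is returned.

```python
from typing import Optional, Sequence


def find_pvalue_cutoff(
    null_pvalues: Sequence[float],
    alt_pvalues: Sequence[float],
    target_fdr: float = 0.05,
) -> Optional[float]:
    """Return the largest candidate cutoff whose estimated FDR is below target_fdr.

    Estimated FDR at a cutoff =
        (proportion of null p-values <= cutoff)
        / (proportion of alternative p-values <= cutoff)
    Returns None if no candidate cutoff satisfies the criterion.
    """
    if not null_pvalues or not alt_pvalues:
        raise ValueError("both distributions must be non-empty")

    best = None
    # Assumption: only observed alternative p-values are tried as cutoffs
    for cutoff in sorted(set(alt_pvalues)):
        null_prop = sum(p <= cutoff for p in null_pvalues) / len(null_pvalues)
        # alt_prop is never zero because cutoff is itself an alternative p-value
        alt_prop = sum(p <= cutoff for p in alt_pvalues) / len(alt_pvalues)
        if null_prop / alt_prop < target_fdr:
            best = cutoff

    return best
```

Mutant lines whose genotype p-value for this organ is at or below the returned cutoff would then be called significant. A result of `None` means no cutoff reaches the target FDR for that organ.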
After running registration using job_runner you should have 2 folders in your output directory:
- baseline
- mutant
Each will have an output folder containing individual lines (in the mutant instance) or a single folder named 'baseline' in the case of baselines.
The individual specimen folders will each contain the following csv files that hold information on organ volumes:
- organ_volumes.csv - The organ volume in voxels for each label present in the label map
- staging_info_volume.csv - The whole embryo volume based on the mask supplied during registration (stats_mask)