statistical pipeline
To do the statistical analysis, the wild type specimens and mutant lines must first have been run through the registration pipeline.
Once this has been done, you should have three folders with the following names:
- baseline - containing the baseline registration results
- mutant - containing the mutant line registration results
- target - containing the population average and associated files (label maps, label name files etc)
In each specimen folder of the mutant and baseline folders there will be two CSV files that are needed for the analysis:
- a CSV file that contains the whole embryo volume information, named staging_info_volume.csv
- a CSV file with the calculated organ volumes, named organ_volumes.csv
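Before running the statistics it can help to confirm that every specimen folder really contains both CSV files. The following is only a minimal sketch: the helper `find_missing_csvs` is hypothetical and not part of LAMA, and it assumes you pass in the specimen folders yourself, since the exact nesting of specimen folders inside the baseline and mutant folders is not described here.

```python
from pathlib import Path

# File names as given in the text above.
REQUIRED_CSVS = ("staging_info_volume.csv", "organ_volumes.csv")


def find_missing_csvs(specimen_dirs):
    """Return {specimen folder name: [missing CSV names]}.

    specimen_dirs is any iterable of specimen folder paths. Folders that
    contain both CSV files are left out of the result.
    """
    missing = {}
    for specimen in map(Path, specimen_dirs):
        absent = [name for name in REQUIRED_CSVS
                  if not (specimen / name).is_file()]
        if absent:
            missing[specimen.name] = absent
    return missing


# Example call, assuming specimen folders sit directly inside 'baseline':
#   problems = find_missing_csvs(p for p in Path("baseline").iterdir() if p.is_dir())
#   print(problems)
```

An empty dictionary means every folder passed in has both files.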
The stats config file is in TOML format (https://toml.io/). An example:
stats_types = [
'organ_volumes',
'intensity',
'jacobians'
]
use_log_jacobians = true
reg_folder = 'similarity'
jac_folder = 'similarity'
mask = 'mask_tight.nrrd'
label_info = 'label_info.csv'
label_map = 'labels.nrrd'
blur_fwhm = 100
voxel_size = 14.0
invert_stats = true
baseline_ids = 'baselines_ids.csv'
mutant_ids = 'mutant_ids.toml'
memmap = true
stats_types: list the analysis types to do (currently three available)
use_log_jacobians: whether to use the log-transformed jacobians (default true). If false, the non-transformed jacobians are used, which can be useful if there are negative jacobians.
reg_folder: the name of the registration sub-folders to use. This will be the name of the final registration stage (todo: get this automatically).
jac_folder: the name of the jacobian determinant folder to use (todo: make this optional, as we generally produce jacobians from only one registration stage).
mask: the name of the mask to use for statistical analysis. This can be a different mask from the one used for registration. We have noticed that too tight a mask for registration can cause problems, whereas we might want a tighter mask for removing data points outside of the embryo.
label_info: name of the label info file. See Input data.
label_map: name of the label map. See [Input data](/input_data#label-map).
blur_fwhm: the size of the Gaussian blur kernel (full width half maximum in micrometers)
voxel_size: the voxel size of the input images in micrometers
baseline_ids: (optional) name of a csv file (relative to config file path) containing baseline specimen ids (one per row) to be included in the analysis. If omitted, all baselines will be used.
mutant_ids: (optional) name of a toml file (relative to config file path) containing mutant specimen ids to be included in the analysis. If omitted, all specimens in the line will be used. See example below.
use_staging: true/false (default true). Whether to use staging (embryo volume) in the linear model to account for developmental stage.
normalise_organ_vol_to_mask: true/false (default false). Whether to normalise the organ volumes in the organ_volumes analysis to the whole embryo volume.
memmap: true/false (default false). If true, read in the data and write it to a memory-mapped array before processing. Use this if limited RAM is available or you get memory errors. This will slow processing, but not by much if using an SSD.
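Because blur_fwhm and voxel_size are both given in micrometers, it can be useful to see what kernel size they imply in voxels. The sketch below uses the standard relation between the full width at half maximum and the standard deviation of a Gaussian, sigma = FWHM / (2 * sqrt(2 * ln 2)). The function name is hypothetical, and how LAMA itself applies the blur internally is not stated here, so treat this as an illustration of the units only.

```python
import math


def fwhm_to_sigma_voxels(fwhm_um, voxel_size_um):
    """Convert a Gaussian FWHM in micrometers to a standard deviation in voxels."""
    fwhm_voxels = fwhm_um / voxel_size_um
    return fwhm_voxels / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Values from the example config above: blur_fwhm = 100, voxel_size = 14.0
sigma = fwhm_to_sigma_voxels(100, 14.0)
print(f"FWHM = {100 / 14.0:.2f} voxels, sigma = {sigma:.2f} voxels")
# prints: FWHM = 7.14 voxels, sigma = 3.03 voxels
```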
example mutant_ids file
mutant_1 = [
'id_1',
'id_2']
mutant_2 = [
'id_3',
'id_4']
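A file like this can be checked before a run by parsing it with a TOML reader. The following is a rough sketch, assuming Python 3.11 or later for the standard-library `tomllib` module (with the third-party `tomli` backport as a fallback). The function `load_mutant_ids` is hypothetical and is not LAMA's own loader.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

MUTANT_IDS_TOML = """
mutant_1 = [
'id_1',
'id_2']
mutant_2 = [
'id_3',
'id_4']
"""


def load_mutant_ids(toml_text):
    """Parse a mutant_ids file into {line name: [specimen ids]}."""
    data = tomllib.loads(toml_text)
    for line, ids in data.items():
        # Each line must map to a list of specimen id strings.
        if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
            raise ValueError(f"{line}: expected a list of specimen id strings")
    return data


print(load_mutant_ids(MUTANT_IDS_TOML))
# prints: {'mutant_1': ['id_1', 'id_2'], 'mutant_2': ['id_3', 'id_4']}
```

A missing quote or bracket makes the parser raise `tomllib.TOMLDecodeError`, which catches typos in the id lists early.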
$ lama_stats.py -c <path to stats config> -w <Path to wild type directory> -m <Path to mutant directory> -t <path to target directory> -o <path to output directory>
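If the run is launched from a script rather than typed by hand, the same command can be assembled in Python. This is a small sketch using only the flags shown above. The function name and the example paths (`stats.toml`, `stats_output`) are placeholders, not names that LAMA requires.

```python
import shlex


def build_lama_stats_command(config, wild_type_dir, mutant_dir, target_dir, output_dir):
    """Return the lama_stats.py call as an argument list for subprocess."""
    return [
        "lama_stats.py",
        "-c", str(config),
        "-w", str(wild_type_dir),
        "-m", str(mutant_dir),
        "-t", str(target_dir),
        "-o", str(output_dir),
    ]


cmd = build_lama_stats_command("stats.toml", "baseline", "mutant", "target", "stats_output")
print(shlex.join(cmd))
# prints: lama_stats.py -c stats.toml -w baseline -m mutant -t target -o stats_output

# To actually run it (requires lama_stats.py to be on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```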
In the output folder (specified with the -o argument) there will be a folder for each line processed. In that folder will be subfolders containing the output of each analysis type.
In the following example mutant1 has results for the jacobian analysis. The nrrd file named line_jacobians.nrrd contains the line-level results, and the files with the specimen names in them are the specimen-level results.
These images can be analysed using the image viewer VPV that we have developed.
There are instructions on how to look at statistical heatmaps in VPV here.