Hapmap creation workflow

Vladimir Gritsenko edited this page Apr 26, 2016 · 2 revisions
  1. hapmap.install_3.php - creates the hapmap folders and the associated configuration and log files. The important parameter is referencePloidy: it is 2 for "one diploid" and 1 for "two haploid".
  2. hapmap.install_4.sh:
  • For two haploids:
    1. Copy SNP_CNV_v1.zip of both haploids into the hapmap folder, and unzip them as SNPdata_parent (they are indexed as 1 and 2).
    2. Run hapmap.preprocess_haploid_parents.py in hapmap mode. This script goes over both of the above datasets and adds to the hapmap only those coordinates that are homozygous in both datasets (defined as allelic ratio >= 0.5) and differ between the two datasets. Output is saved as SNPdata_parent.txt. Note: 0 in the phasing info column means that no correction is needed (per script).
    3. The original files from step 1 are deleted.
  • For one diploid:
    1. Copy the parent diploid's putative_SNPs_v4.zip file, and unzip it as SNPdata_parent.
    2. Run hapmap.preprocess_parent.py on the above file in hapmap mode (the script modifies the file in place).
    3. Run hapmap.expand_definitions.py.
    4. Copy the child's SNP_CNV_v1.zip and unzip it as SNPdata_child (TODO: why is this indexed?).
    5. Run hapmap.process_child.py on SNPdata_child. This changes SNPdata_parent.txt. The child dataset is removed.
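
The selection rule described for the two-haploid case (keep a coordinate only if it is homozygous in both parents and the parents differ there) can be sketched roughly as below. This is not the code of hapmap.preprocess_haploid_parents.py: the function name select_hapmap_loci, the in-memory layout (a dict from (chromosome, position) to (allele, allelic ratio)) and the output tuple are assumptions made for illustration, since the format of the unzipped SNP_CNV_v1 data is not described on this page. Only the 0.5 threshold and the phasing value 0 come from the text above.

```python
from typing import Dict, List, Tuple

Locus = Tuple[str, int]    # (chromosome, position) -- assumed key layout
Call = Tuple[str, float]   # (major allele, allelic ratio) -- assumed value layout
Entry = Tuple[str, int, str, str, int]


def select_hapmap_loci(
    parent1: Dict[Locus, Call],
    parent2: Dict[Locus, Call],
    min_ratio: float = 0.5,  # homozygosity threshold quoted in the workflow
) -> List[Entry]:
    """Return hapmap entries for loci homozygous in both parents and differing between them."""
    entries: List[Entry] = []
    # Only coordinates present in both datasets can be compared.
    for locus in sorted(parent1.keys() & parent2.keys()):
        allele1, ratio1 = parent1[locus]
        allele2, ratio2 = parent2[locus]
        both_homozygous = ratio1 >= min_ratio and ratio2 >= min_ratio
        if both_homozygous and allele1 != allele2:
            # Last field is the phasing info; 0 means no correction is needed.
            entries.append((locus[0], locus[1], allele1, allele2, 0))
    return entries


# Hypothetical toy data, not taken from a real dataset.
parent1 = {("chr1", 100): ("A", 1.0), ("chr1", 200): ("C", 0.97), ("chr1", 300): ("G", 0.4)}
parent2 = {("chr1", 100): ("G", 0.98), ("chr1", 200): ("C", 1.0), ("chr1", 300): ("T", 0.99)}
hapmap = select_hapmap_loci(parent1, parent2)
```

In this toy input only position 100 is kept: position 200 carries the same allele in both parents, and position 300 falls below the ratio threshold in the first parent.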