Prerequisite Files for Lep-Map3
From start to finish, a lot happens between the Lep-Map3 modules. Before you can begin, you will need two things on hand: a filtered VCF file and a pedigree file.
The variant call format (VCF) file must be filtered to some degree. At minimum (and as a rule of thumb), remove sites whose proportion of missing genotypes exceeds a threshold of your choosing, and keep only biallelic SNPs, removing indels and multi-nucleotide polymorphisms.
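As a rough illustration of that minimum filter, here is a small awk sketch. The function name `filter_vcf` and its arguments are made up for this example; it assumes an uncompressed VCF in which GT is the first FORMAT subfield, and it treats any genotype containing `.` as missing. Dedicated tools are the usual choice for real filtering; this only shows the logic.

```shell
# Hypothetical helper (not part of Lep-Map3): keep biallelic SNPs whose
# fraction of missing genotypes is at most the given threshold.
filter_vcf() {
    # $1 = uncompressed VCF file, $2 = maximum allowed fraction of missing genotypes
    awk -F'\t' -v max_missing="$2" '
        # Header lines pass through untouched
        /^#/ { print; next }
        # REF and ALT must each be a single base; this drops indels,
        # multi-nucleotide polymorphisms, and multiallelic sites such as G,T
        $4 !~ /^[ACGTacgt]$/ || $5 !~ /^[ACGTacgt]$/ { next }
        {
            missing = 0
            # Sample columns start at field 10; GT is assumed to be the first subfield
            for (i = 10; i <= NF; i++) {
                split($i, sub_field, ":")
                if (sub_field[1] ~ /\./) missing++
            }
            n = NF - 9
            if (n > 0 && missing / n <= max_missing) print
        }
    ' "$1"
}
```

For example, `filter_vcf raw.vcf 0.2 > filtered.vcf` would keep biallelic SNPs with no more than 20% missing genotypes (both file names are placeholders).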
The pedigree file is something you'll need to create yourself. Here are the guidelines from the Lep-Map3 wiki:
The first 6 lines present the pedigree. First line is the family name, second individual name, third and fourth are the father and mother. Line 5 contains the sex of each individual (1 male, 2 female, 0 unknown) and the last line is the phenotype (can be 0 for all individuals, this is not currently used). The likelihoods can be provided from line 7 forward (columns must match) or in a separate file given as parameter posteriorFile or vcfFile. Finally, columns 1-2 give marker names (scaffold and pos) for genotypes, and can be any value for the pedigree part. Thus, make sure that each line has n+2 tab-separated columns if there are n individuals, and that column i + 2 gives the genotype and pedigree information on individual i.
An example pedigree (already transposed; columns should be tab-separated) is below:
CHR	POS	F	F	F	F	F	F
CHR	POS	female	male	progeny_1	progeny_2	progeny_3	progeny_4
CHR	POS	0	0	male	male	male	male
CHR	POS	0	0	female	female	female	female
CHR	POS	2	1	0	0	0	0
CHR	POS	0	0	0	0	0	0
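Since a stray space or a missing tab is an easy mistake to make here, a quick sanity check along these lines may help. The function name `check_pedigree_columns` is hypothetical; the sketch only verifies the n+2 column rule quoted above (every line has the same number of tab-separated columns) and that at least 6 lines are present. It does not validate the pedigree contents.

```shell
# Hypothetical helper: confirm every line has the same number of
# tab-separated columns and that the 6 pedigree lines are present.
check_pedigree_columns() {
    # $1 = pedigree file in the transposed layout shown above
    awk -F'\t' '
        # The first line defines the expected column count (n individuals + 2)
        NR == 1 { expected = NF }
        NF != expected {
            printf "line %d has %d columns, expected %d\n", NR, NF, expected
            bad = 1
        }
        END {
            if (NR < 6) { printf "only %d lines, expected at least 6\n", NR; bad = 1 }
            if (!bad) printf "OK: %d individuals\n", expected - 2
            exit bad + 0   # non-zero exit status when a problem was found
        }
    ' "$1"
}
```

Running `check_pedigree_columns ped_t.txt` on the example above, saved with real tabs, should report 6 individuals; a line with a different column count is reported by line number.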
If your pedigree file (pedigree.txt) is in the untransposed layout (one row per individual, 6 columns), it can be converted to the proper LM3 format with the following command:
./transpose_tab pedigree.txt | awk '{print "CHR\tPOS\t"$0}' > ped_t.txt
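If the transpose_tab helper is not at hand, the same transposition can be approximated in plain awk. This is a sketch, not the Lep-Map3 script: the function name `transpose_pedigree` is made up, and it assumes the input is tab-separated with one row per individual and small enough to hold in memory.

```shell
# Hypothetical stand-in for the transpose step: turn rows into columns
# and prepend the CHR and POS placeholder columns.
transpose_pedigree() {
    # $1 = pedigree with one row per individual and 6 tab-separated columns
    awk -F'\t' '
        {
            # Store every cell, indexed by (column, row)
            for (i = 1; i <= NF; i++) cell[i, NR] = $i
            if (NF > ncol) ncol = NF
        }
        END {
            # Each input column becomes one output line
            for (i = 1; i <= ncol; i++) {
                line = "CHR\tPOS"
                for (j = 1; j <= NR; j++) line = line "\t" cell[i, j]
                print line
            }
        }
    ' "$1"
}
```

Usage would be `transpose_pedigree pedigree.txt > ped_t.txt`, giving 6 lines with one column per individual after the two placeholder columns.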
If you have a simple single-family cross, you can use /scripts/popmap2pedigree to convert your Stacks- or dDocent-generated popmap file into a LepMap3-compliant pedigree file. Make sure your popmap is a tab-separated two-column text file with the first column of sample names and the second column of parent or progeny designations:
sample_1<tab>parent
sample_2 parent
f1_001 progeny
f1_002 progeny
f1_003 progeny
etc...
Actually, so long as the parents are designated parent, it doesn't matter what the progeny are designated as. Go wild 😃 (...but not too wild)
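Before handing the popmap to the conversion script, it can be worth confirming that it has the shape described above. The sketch below is not part of popmap2pedigree; `check_popmap` is a hypothetical name, and the requirement of exactly two parent rows is an assumption based on the "simple single-family cross" case.

```shell
# Hypothetical helper: verify a popmap has two tab-separated columns
# and count the rows labelled "parent" versus everything else.
check_popmap() {
    # $1 = popmap file (sample name <tab> designation)
    awk -F'\t' '
        NF != 2 {
            printf "line %d: expected 2 tab-separated columns, found %d\n", NR, NF
            bad = 1
            next
        }
        # Only the exact label "parent" marks a parent; any other label counts as progeny
        $2 == "parent" { parents++ }
        $2 != "parent" { progeny++ }
        END {
            printf "parents=%d progeny=%d\n", parents, progeny
            # Assumption: a single-family cross has exactly two parents
            if (parents != 2) bad = 1
            exit bad + 0
        }
    ' "$1"
}
```

A popmap with two parent rows and three progeny rows, whatever the progeny are called, prints `parents=2 progeny=3` and exits with status 0; space-separated lines or a wrong parent count give a non-zero exit status.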