Skip to content

LepMap3 Details

Pavel V. Dimens edited this page May 20, 2021 · 7 revisions

The general workflow of LepWrap is:

Step Module Description Directory
1 ParentCall2 creates LepMap3 data file 1_ParentCall
2 Filtering2 filter LM3 data 2_Filtering
3 SeparateChromosomes2 clusters markers into possible LGs 3_SeparateChromosomes
4 JoinSingles2ALL clusters remaining markers into defined linkage groups project directory
5 OrderMarkers2 orders markers into linkage groups 4_OrderMarkers
6 Trim ends removes potential error and artifacts from edge clusters 5_Trim
7 OrderMarkers2 reorders trimmed markers 6_OrderMarkers
8 Calculate Distances/Intervals final maps and intervals 7_Distances* 7_Intervals

However, with Snakemake involved, there are a lot more rules that each do specific things. So, let's dive in to each rule and what it does:

1. parent_call

Merges your pedigree file with your filtered VCF file to create a data.call.gz file.

2. filtering

Filters the data from ParentCall2 with some kind of data tolerance value. This value is set in the config.yaml, and if you set it to 0, it will skip this step, instead creating data_f.call.gz as a relative soft link to data.call.gz.

3. separate_chromosomes

Create maps over a range of LOD score values that you have specified in the config.yaml file. By default, it's set to distortionLod=1. Maps will be output into the 3_SeparateChromosomes folder.

4. map_summary

A summary file all.maps.summary will be created combining the summaries of all the maps into a single file text file for easy assessment. The first column is the linkage group number, and the remaining columns correspond to the summary of markers per linkage group for each LOD limit that SeparateChromosomes2 was performed with.

4.5 refine a map

Depending on how your data looks at this point, you may consider cutting off the pipeline and refining a particular map before continuing. For example, if you expect 23 chromosomes/LG's but your map only has 22 and it seems like LG-1 and LG-2 have a lot more markers than the other groups, then you might want to try splitting out those over-clustered LG's. Use /scripts/refine_map.sh to iteratively test LOD scores to try splitting target LG's. You will have to play with the parameters a bit, such as setting a size limit to avoid accidentally creating too many LG's. If you perform this manual intervention, you will need to manually run JoinSingles2All (optional) or just copy your best map from that into the main directory as map.master.

5. join_singles

You are prompted in the console for the map you would like to use for the remainder of the pipeline. The parameters for this rule can be modified in the config.yaml (lodLimit=10,lodDifference=2). If not disabled in the config.yaml, this map will be ran through joinSingles2ALL with those parameters and output as map.master in the project directory. A log of your choice will be in 3_SeparateChromosomes/chosen.map.

6. order_markers

Order your markers into positions among your linkage groups based on the map you chose. You will need to specify the the number of linkage groups and number of iterations in the config.yaml. I would recommend 100 iterations or more, as my own work sometimes revealed that the 100th iteration had the best likelihood. Ordering 25,000 markers across 24 linkage groups 100x on 24 processors should take between 1 and 2 days. That's an oddly specific example, but that's the data it was tested on.

7. recombination_summary

Runs an R script to provide some summary information on the recombination information into recombinations.summary

8. trim_edge_clusters

Uses LepWrapTrim.r to trim the ordered maps. The script will install dplyr locally if it's not found on the system, and will examine the first and last X% of markers along a linkage group and remove marker clusters that are greater than NcM away from the next markers (both X and N need to be configured in the config.yaml, but 15 and 10 are reasonable values). It's a mechanism to identify and remove clusters of ambiguously ordered markers. A filtered map for each linkage group will be created, along with a file containing the names of the markers removed, and a pdf with a visualization of the mapping and which markers were removed. Here is an example of one of those plots:

trimming plot example

9. trim_summary

Collates all the removed markers into a summary files trim.summary (counts per lg) and trim.details (which markers per lg).

10. reorder_markers

Uses the trimmed linkage groups to reorder the markers. Uses the same configurations (#linkage groups, #iterations, distance method) as initial ordering.

11. recombination_summary2

Summarizes reordering the same way as in the initial ordering

12. calculate_distances

The final product of the pipeline is in 3 folders: distances, distances_sexaverage, and intervals. The distances folder contains a copy of the best likelihoods from reordering. The distances_sexaverage folder contains the sex-averaged distances using OrderMarkers2 evaluateOrder=bestLG sexAveraged=1 where bestLG is the best candidates per linkage group identified in the previous step. The intervals folder contains the intervals for each marker in a linkage group using OrderMarkers2 calculateIntervals=.

Getting started

LepMap3 Workflow

LepAnchor Workflow

Other

Clone this wiki locally