LepMap3 Details

The general workflow of LepWrap is:

Step	Module	Description	Directory
1	ParentCall2	creates LepMap3 data file	`1_ParentCall`
2	Filtering2	filter LM3 data	`2_Filtering`
3	SeparateChromosomes2	clusters markers into possible LGs	`3_SeparateChromosomes`
4	JoinSingles2ALL	clusters remaining markers into defined linkage groups	project directory
5	OrderMarkers2	orders markers into linkage groups	`4_OrderMarkers`
6	Trim ends	removes potential error and artifacts from edge clusters	`5_Trim`
7	OrderMarkers2	reorders trimmed markers	`6_OrderMarkers`
8	Calculate Distances/Intervals	final maps and intervals	`7_Distances*` `7_Intervals`

However, with Snakemake involved, there are a lot more rules that each do specific things. So, let's dive in to each rule and what it does:

1. parent_call

Merges your pedigree file with your filtered VCF file to create a data.call.gz file.

2. filtering

Filters the data from ParentCall2 with some kind of data tolerance value. This value is set in the config.yaml, and if you set it to 0, it will skip this step, instead creating data_f.call.gz as a relative soft link to data.call.gz.

3. separate_chromosomes

Create maps over a range of LOD score values that you have specified in the config.yaml file. By default, it's set to distortionLod=1. Maps will be output into the 3_SeparateChromosomes folder.

4. map_summary

A summary file all.maps.summary will be created combining the summaries of all the maps into a single file text file for easy assessment. The first column is the linkage group number, and the remaining columns correspond to the summary of markers per linkage group for each LOD limit that SeparateChromosomes2 was performed with.

4.5 refine a map

Depending on how your data looks at this point, you may consider cutting off the pipeline and refining a particular map before continuing. For example, if you expect 23 chromosomes/LG's but your map only has 22 and it seems like LG-1 and LG-2 have a lot more markers than the other groups, then you might want to try splitting out those over-clustered LG's. Use /scripts/refine_map.sh to iteratively test LOD scores to try splitting target LG's. You will have to play with the parameters a bit, such as setting a size limit to avoid accidentally creating too many LG's. If you perform this manual intervention, you will need to manually run JoinSingles2All (optional) or just copy your best map from that into the main directory as map.master.

5. join_singles

You are prompted in the console for the map you would like to use for the remainder of the pipeline. The parameters for this rule can be modified in the config.yaml (lodLimit=10,lodDifference=2). This map will be ran through joinSingles2ALL with those parameters and output as map.master in the project directory. A log of your choice will be in 3_SeparateChromosomes/chosen.map.

6. order_markers

Order your markers into positions among your linkage groups based on the map you chose. You will need to specify the the number of linkage groups and number of iterations in the config.yaml. I would recommend 100 iterations or more, as my own work sometimes revealed that the 100th iteration had the best likelihood. Ordering 25,000 markers across 24 linkage groups 100x on 24 processors should take between 1 and 2 days. That's an oddly specific example, but that's the data it was tested on.

7. order_summary

Pulls out all the likelihoods from every iteration of every linkage group and sorts them by group into likelihoods.summary. Also runs an R script to provide some summary information on the recombination information into recombinations.summary

8. best_likelihoods

Using the likelihoods isolated from order_summary, creates a file best.likelihoods that lists the best iteration for each linkage group.

9. trim_edge_clusters

Uses LepMak3rQA.r to trim the best ordered maps. The script will install dplyr locally if it's not found on the system, and will examine the first and last X% of markers along a linkage group and remove marker clusters that are greater than NcM away from the next markers (both X and N need to be configured in the config.yaml, but 15 and 10 are reasonable values). It's a mechanism to identify and remove clusters of ambiguously ordered markers. A filtered map for each linkage group will be created, along with a file containing the names of the markers removed, and a pdf with a visualization of the mapping and which markers were removed. Here is an example of one of those plots:

trimming plot example

These are fairly simple plots, but they feature the name of the linkage group and iteration on the Y-axis (ordered.linkage.iteration), and the markers that were removed from the male and female maps appearing as red crossed-out circles.

10. trim_summary

Collates all the removed markers into a summary file trim.summary

11. reorder_markers

Uses the trimmed linkage groups to reorder the markers. Uses the same configurations (#linkage groups, #iterations, distance method) as initial ordering.

12. reorder_summary

Summarizes reordering the same way as in the initial ordering

13. best_likelihoods2

Identifies the best candidates from reordering, just like it did with the initial ordering.

14. calculate_distances

The final product of the pipeline is in 3 folders: distances, distances_sexaverage, and intervals. The distances folder contains a copy of the best likelihoods from reordering. The distances_sexaverage folder contains the sex-averaged distances using OrderMarkers2 evaluateOrder=bestLG sexAveraged=1 where bestLG is the best candidates per linkage group identified in the previous step. The intervals folder contains the intervals for each marker in a linkage group using OrderMarkers2 calculateIntervals=.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly