Updating the definition of ambiguous mappers for paired-end #62
Previously the definition of "ambiguous" for a paired-end alignment was encapsulated in the following code:
If the new score `scr` was greater than the previous score `aln_score`, then the new mappings for end1 and end2 (`s1` and `s2`, respectively) would replace the previous best mapping for the pair. However, in many situations the positions of `s1` and `s2` could differ by a very small amount, for example a single nucleotide. This would mean that the mappings would be considered different, even though their alignment, including the CIGAR strings, would be the same.

This pull request addresses the issue by using the following code instead:
Now, when the previous score `aln_score` is identical to the score `scr` for the newly proposed alignment, the Hamming distances are checked and used as a tie-breaker. This has the consequence of calling fewer paired-end mappings ambiguous. Previously, reads whose paired-end mapping was ambiguous would have been mapped as single-end. They could still be detected as concordant based on their locations, but they would not have the `mtid` and `mpos` fields within the BAM format assigned. Being able to use those fields is very convenient for downstream processing.
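As a rough illustration of the tie-breaking idea, here is a minimal sketch. It is not the code from this pull request: the `pair_best` struct, the `update_best` function, the use of a single combined Hamming distance for both ends, and the rule that a tie on both score and Hamming distance at a different location marks the pair ambiguous are all assumptions made for the example. Only the names `aln_score` and `scr` come from the description above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the best paired-end mapping seen so far.
// Field names other than aln_score are invented for this sketch.
struct pair_best {
  int32_t aln_score = 0;     // best alignment score so far
  uint32_t hamming = 0;      // combined Hamming distance of both ends (assumed)
  uint32_t pos1 = 0;         // position of end1
  uint32_t pos2 = 0;         // position of end2
  bool has_mapping = false;  // false until a first candidate is accepted
  bool ambig = false;        // true if an equally good mapping exists elsewhere
};

// Offer a candidate pair mapping with score scr. Returns true if the
// candidate replaced the current best mapping.
inline bool
update_best(pair_best &best, const int32_t scr, const uint32_t hamming,
            const uint32_t pos1, const uint32_t pos2) {
  const bool better_score = !best.has_mapping || scr > best.aln_score;
  const bool tie = best.has_mapping && scr == best.aln_score;

  // A strictly better score wins outright; on equal scores the smaller
  // Hamming distance acts as the tie-breaker.
  if (better_score || (tie && hamming < best.hamming)) {
    best.aln_score = scr;
    best.hamming = hamming;
    best.pos1 = pos1;
    best.pos2 = pos2;
    best.has_mapping = true;
    best.ambig = false;
    return true;
  }

  // Assumed rule: equal score and equal Hamming distance at a different
  // location means the pair cannot be placed uniquely.
  if (tie && hamming == best.hamming &&
      (pos1 != best.pos1 || pos2 != best.pos2))
    best.ambig = true;

  return false;
}
```

In this sketch, a candidate that ties on score but has a smaller Hamming distance replaces the current best and clears the ambiguous flag, which is one way fewer pairs could end up being called ambiguous.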
This change seems to have little or no effect on the speed of abismal.