Updating the definition of ambiguous mappers for paired-end #62

andrewdavidsmith · 2024-10-27T23:16:57Z

Previously the definition of "ambiguous" for a paired-end alignment was encapsulated in the following code:

bool update(const score_t scr, const se_element &s1, const se_element &s2) {
  if (scr > aln_score) {
    r1 = s1;
    r2 = s2;
    aln_score = scr;
    return true;
  }
  else if (scr == aln_score) {
    r1.set_ambig();
    return false;
  }
  return false;
}

If the new score scr was greater than the previous score aln_score then the new mappings for end1 and end2 (s1 and s2, respectively), would replace the previous best mapping for the pair. However, in many situations the positions of s1 and s2 could differ by a very small amount, for example a single nucleotide. This would mean that the mappings would be considered different, even though their alignment, including the cigar strings, would be the same.

This pull request addresses the issue by using the following code instead:

bool update(const score_t scr, const se_element &s1, const se_element &s2) {
  const auto rd = r1.diffs + r2.diffs;
  const auto sd = s1.diffs + s2.diffs;
  if (scr > aln_score || (scr == aln_score && sd < rd)) {
    r1 = s1;
    r2 = s2;
    aln_score = scr;
    return true;
  }
  else if (scr == aln_score && sd == rd) {
    r1.set_ambig();
    return false;
  }
  return false;
}

Now, when the previous core aln_score is identical to the score scr for the newly proposed alignment, if the Hamming distances are checked and used as a tie-breaker. This has the consequence of calling fewer paired-end mappings ambiguous. Those reads would have been mapped as single-end in the event of ambiguity as paired-end, and they could be detected as concordant based on their locations, but they would not have the mtid and mpos fields within the BAM format
assigned. Being able to use those fields is very convenient for downstream processing.

This change seems to have little or no effect of speed of abismal.

…reads to use Hamming distance as tiebreaker when alignment scores are identical. This has the effect of not marking near-identical concordant mapppings as ambiguous by simply choosing one among the two. Since the ends would be mapped as single-end the major consequence is to make it easier to identify concordant mappers downstream when merging them into fragments

…ed within abismal

…forgotten

andrewdavidsmith added 4 commits October 27, 2024 16:15

data/md5sum.txt: udpating hashes for change to how ambiguity is defin…

43ad31a

…ed within abismal

Merge commit

61667b4

data/md5sum.txt: udpating hashes because the previous PR hashes were …

25fd5fc

…forgotten

andrewdavidsmith merged commit a83a954 into master Oct 27, 2024
3 checks passed

andrewdavidsmith deleted the update-defn-for-pe-ambig branch October 28, 2024 00:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updating the definition of ambiguous mappers for paired-end #62

Updating the definition of ambiguous mappers for paired-end #62

andrewdavidsmith commented Oct 27, 2024 •

edited

Loading

Updating the definition of ambiguous mappers for paired-end #62

Updating the definition of ambiguous mappers for paired-end #62

Conversation

andrewdavidsmith commented Oct 27, 2024 • edited Loading

andrewdavidsmith commented Oct 27, 2024 •

edited

Loading