Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating the definition of ambiguous mappers for paired-end #62

Merged
merged 4 commits into from
Oct 27, 2024

Conversation

andrewdavidsmith
Copy link
Contributor

@andrewdavidsmith andrewdavidsmith commented Oct 27, 2024

Previously the definition of "ambiguous" for a paired-end alignment was encapsulated in the following code:

bool update(const score_t scr, const se_element &s1, const se_element &s2) {
  if (scr > aln_score) {
    r1 = s1;
    r2 = s2;
    aln_score = scr;
    return true;
  }
  else if (scr == aln_score) {
    r1.set_ambig();
    return false;
  }
  return false;
}

If the new score scr was greater than the previous score aln_score then the new mappings for end1 and end2 (s1 and s2, respectively), would replace the previous best mapping for the pair. However, in many situations the positions of s1 and s2 could differ by a very small amount, for example a single nucleotide. This would mean that the mappings would be considered different, even though their alignment, including the cigar strings, would be the same.

This pull request addresses the issue by using the following code instead:

bool update(const score_t scr, const se_element &s1, const se_element &s2) {
  const auto rd = r1.diffs + r2.diffs;
  const auto sd = s1.diffs + s2.diffs;
  if (scr > aln_score || (scr == aln_score && sd < rd)) {
    r1 = s1;
    r2 = s2;
    aln_score = scr;
    return true;
  }
  else if (scr == aln_score && sd == rd) {
    r1.set_ambig();
    return false;
  }
  return false;
}

Now, when the previous core aln_score is identical to the score scr for the newly proposed alignment, if the Hamming distances are checked and used as a tie-breaker. This has the consequence of calling fewer paired-end mappings ambiguous. Those reads would have been mapped as single-end in the event of ambiguity as paired-end, and they could be detected as concordant based on their locations, but they would not have the mtid and mpos fields within the BAM format
assigned. Being able to use those fields is very convenient for downstream processing.

This change seems to have little or no effect of speed of abismal.

…reads to use Hamming distance as tiebreaker when alignment scores are identical. This has the effect of not marking near-identical concordant mapppings as ambiguous by simply choosing one among the two. Since the ends would be mapped as single-end the major consequence is to make it easier to identify concordant mappers downstream when merging them into fragments
@andrewdavidsmith andrewdavidsmith merged commit a83a954 into master Oct 27, 2024
3 checks passed
@andrewdavidsmith andrewdavidsmith deleted the update-defn-for-pe-ambig branch October 28, 2024 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant