-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AuxAlignment for parsing secondary/supplementary alignments in SAM tags #209
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces significant changes to the SAM module by replacing the New methods for creating auxiliary alignments from primary alignments have been added, along with a constant to indicate missing query bases. Comprehensive unit tests for the Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/fgpyo/sam/test_supplementary_alignments.py (1)
Line range hint
8-24
: Evaluate the necessity of tests for deprecatedSupplementaryAlignment
.Since
SupplementaryAlignment
is deprecated, consider focusing on testing the newAuxAlignment
class. Maintaining tests for a deprecated class might not be essential unless required for backward compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
fgpyo/sam/__init__.py
(5 hunks)tests/fgpyo/sam/test_aux_alignment.py
(1 hunks)tests/fgpyo/sam/test_supplementary_alignments.py
(4 hunks)
🔇 Additional comments (2)
fgpyo/sam/__init__.py (1)
1255-1528
: Implementation ofAuxAlignment
is solid.The
AuxAlignment
class is well-designed, effectively replacingSupplementaryAlignment
with enhanced functionality for parsing auxiliary alignments from SAM tags. The parsing methods and validation checks are thorough.tests/fgpyo/sam/test_aux_alignment.py (1)
1-612
: Comprehensive test coverage forAuxAlignment
.The test suite thoroughly validates input parsing, exception handling, and key functionalities of
AuxAlignment
, ensuring robustness against various edge cases.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #209 +/- ##
==========================================
+ Coverage 91.06% 91.54% +0.47%
==========================================
Files 18 18
Lines 2283 2400 +117
Branches 337 354 +17
==========================================
+ Hits 2079 2197 +118
Misses 133 133
+ Partials 71 70 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this direction!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after @tfenne's comments look good and you're happy with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/fgpyo/sam/test_template_iterator.py (1)
225-241
: Test looks good but could use more edge cases.Test follows AAA pattern and verifies core functionality.
Consider adding test cases for:
- Empty/malformed tag values
- Multiple auxiliary alignments
- Hard-clipped alignments
fgpyo/sam/__init__.py (2)
142-164
: Add more comprehensive examples.Current examples show basic usage. Consider adding:
- Multiple auxiliary alignment parsing
- Error handling examples
- Template integration examples
1482-1486
: Improve hard-clipping sequence handling.Current implementation sets sequence to "*" for any hard-clipping. Consider implementing the TODO to properly pad/clip sequences.
Would you like me to propose an implementation for proper sequence padding/clipping?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
fgpyo/sam/__init__.py
(6 hunks)tests/fgpyo/sam/test_aux_alignment.py
(1 hunks)tests/fgpyo/sam/test_template_iterator.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/fgpyo/sam/test_aux_alignment.py
🧰 Additional context used
🪛 GitHub Actions: Python package
fgpyo/sam/__init__.py
[error] 1381-1381: f-string: expecting '}' - Syntax error in f-string formatting
🔇 Additional comments (3)
fgpyo/sam/__init__.py (3)
Line range hint
172-214
: Constants and imports look good.
1261-1537
: Well-structured class implementation.The AuxAlignment class is well-documented and implements comprehensive validation. Good separation of concerns between parsing and SAM record generation.
🧰 Tools
🪛 GitHub Actions: Python package
[error] 1381-1381: f-string: expecting '}' - Syntax error in f-string formatting
1348-1350
: Resolve alignment score validation TODO.The commented validation for alignment scores should be resolved.
Let's check BWA's behavior with negative alignment scores:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
fgpyo/sam/__init__.py (5)
142-164
: Add more comprehensive examples.Add examples showing:
- Error cases and exception handling
- Multiple auxiliary alignments in a single tag
- Usage with Template class
1225-1230
: Add usage examples in docstring.Add examples showing:
- How the method affects the template structure
- Common use cases
1339-1353
: Add more validations in post_init.Add validation for:
- reference_name format (non-empty, valid characters)
- cigar string format
1482-1487
: Implement TODO for handling hard clips.The current implementation sets bases/quals to "*" when hard clips exist. Consider implementing the suggested padding/clipping logic.
Would you like me to help implement the sequence padding/clipping logic?
1519-1523
: Consider making RG tag mandatory.The Read Group (RG) tag is optional but highly recommended. Consider requiring it for consistency with SAM spec.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fgpyo/sam/__init__.py
(6 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 1376-1376: fgpyo/sam/init.py#L1376
Added line #L1376 was not covered by tests
[warning] 1387-1387: fgpyo/sam/init.py#L1387
Added line #L1387 was not covered by tests
[warning] 1437-1437: fgpyo/sam/init.py#L1437
Added line #L1437 was not covered by tests
[warning] 1469-1469: fgpyo/sam/init.py#L1469
Added line #L1469 was not covered by tests
🔇 Additional comments (2)
fgpyo/sam/__init__.py (2)
213-214
: LGTM!The constant follows the module's naming convention and is well-documented.
1348-1350
: Resolve TODO comment about alignment scores.The comment about negative alignment scores needs resolution. Either:
- Add validation
- Document why negative scores are allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
fgpyo/sam/__init__.py (1)
142-164
: Add error case examples to documentation.Add examples showing:
- Invalid tag values
- Multi-value parsing
- Error handling
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
fgpyo/sam/__init__.py
(6 hunks)tests/fgpyo/sam/test_aux_alignment.py
(1 hunks)tests/fgpyo/sam/test_template_iterator.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/fgpyo/sam/test_template_iterator.py
- tests/fgpyo/sam/test_aux_alignment.py
🔇 Additional comments (4)
fgpyo/sam/__init__.py (4)
213-215
: LGTM!Well-documented constant following established patterns.
1225-1230
: LGTM!Clean implementation with proper null handling and efficient iterator usage.
1348-1350
: Resolve TODO comment about alignment score validation.The comment about alignment score validation needs resolution. If BWA allows negative scores, document this. If not, implement the validation.
1433-1438
: Add test coverage for error paths.Missing test coverage for:
- Empty tag values
- Invalid tag values
- Multi-value string parsing errors
✅ Verification successful
Error paths are well tested.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check test coverage for error paths fd "test_.*\.py" tests/fgpyo/sam/ -x rg -l "def test.*error" {}Length of output: 135
Script:
#!/bin/bash # Check specific error test cases echo "=== Test cases for aux alignment errors ===" rg "def test.*error" -A 5 tests/fgpyo/sam/test_aux_alignment.py echo -e "\n=== Implementation of from_tag_value ===" ast-grep --pattern 'def from_tag_value($$$) {$$$}' echo -e "\n=== Test cases with empty or invalid values ===" rg "empty|invalid|malformed" tests/fgpyo/sam/test_aux_alignment.pyLength of output: 1685
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (7)
tests/fgpyo/sam/test_aux_alignment.py (4)
13-58
: Add test cases for alignment_score validation and None handling.Add test cases for:
- Negative alignment_score validation
- None values in optional fields (mapping_quality, edit_distance, alignment_score)
60-164
: Add test cases for cigar string validation.Add test cases for:
- Empty cigar string ("")
- Invalid cigar string ("50X")
166-215
: Add test cases for invalid field formats.Add test cases for:
- Invalid mapping quality format ("abc")
- Invalid edit distance format ("xyz")
589-603
: Track hard-clipping sequence handling enhancement.The TODO about padding/clipping sequence with hard clips should be tracked in an issue.
Want me to create a GitHub issue to track this enhancement?
fgpyo/sam/__init__.py (3)
1261-1350
: Add validation for alignment_score.Add validation to ensure alignment_score is not negative when provided.
def __post_init__(self) -> None: """Perform post-initialization validation on this dataclass.""" errors: list[str] = [] if self.reference_start < 0: errors.append(f"Reference start cannot be less than 0! Found: {self.reference_start}") if self.mapping_quality is not None and self.mapping_quality < 0: errors.append(f"Mapping quality cannot be less than 0! Found: {self.mapping_quality}") if self.edit_distance is not None and self.edit_distance < 0: errors.append(f"Edit distance cannot be less than 0! Found: {self.edit_distance}") + if self.alignment_score is not None and self.alignment_score < 0: + errors.append(f"Alignment score cannot be less than 0! Found: {self.alignment_score}") if len(errors) > 0: raise ValueError("\n".join(errors))
1356-1422
: Refactor tag parsing into separate methods.Split parsing logic into tag-specific methods for better maintainability:
- _parse_sa_tag
- _parse_xa_tag
- _parse_xb_tag
1423-1438
: Add type hint for values list.Add explicit type hint for clarity:
- values: list[str] = cast(str, rec.get_tag(tag)).rstrip(";").split(";") + values: list[str] = cast(str, rec.get_tag(tag)).rstrip(";").split(";") + aux_values: list[str] = [v for v in values if v]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py
(6 hunks)tests/fgpyo/sam/test_aux_alignment.py
(1 hunks)
🔇 Additional comments (1)
fgpyo/sam/__init__.py (1)
1225-1230
: LGTM!Clean implementation of template reconstruction with auxiliary alignments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (8)
tests/fgpyo/sam/test_aux_alignment.py (4)
12-57
: Add test cases for optional parameters.Add test cases for:
- None values in optional parameters
- alignment_score validation
@pytest.mark.parametrize( ["kwargs", "error"], [ ( { "reference_name": "chr9", "reference_start": 123232, "is_forward": False, "cigar": Cigar.from_cigarstring("49M"), "edit_distance": None, "alignment_score": -1, "mapping_quality": None, }, "Alignment score cannot be less than 0! Found: -1", ), ], )
59-164
: Add edge cases for CIGAR string parsing.Add test cases for:
- Empty CIGAR string
- Invalid CIGAR string
[ # Test invalid CIGAR string in SA tag "SA", "chr9,104599381,-,49K,60,4", pytest.raises(CigarParsingException), ], [ # Test empty CIGAR string in SA tag "SA", "chr9,104599381,-,,60,4", pytest.raises(ValueError), ],
165-171
: Improve test specificity for field count validation.Add separate test cases for each tag type with their specific field count requirements.
@pytest.mark.parametrize( ["tag", "value", "expected_fields"], [ ("SA", "chr9,104599381", 6), ("XA", "chr9,104599381", 4), ("XB", "chr9,104599381", 6), ], ) def test_from_tag_value_invalid_field_count(tag: str, value: str, expected_fields: int) -> None: with pytest.raises(ValueError, match=f"{tag} tag value has the incorrect number of fields.*"): AuxAlignment.from_tag_value(tag, value)
588-603
: Track enhancement for hard-clipping sequence handling.Current implementation sets sequence to None when hard clips exist. Consider implementing the suggested padding/clipping logic.
Would you like me to open a GitHub issue to track this enhancement?
fgpyo/sam/__init__.py (4)
1266-1330
: Add type hints to docstring examples.Add type hints to make examples more informative:
```python ->>> supp = AuxAlignment.from_tag_value("SA", "chr1,123,+,50S100M,60,0") +>>> supp: AuxAlignment = AuxAlignment.from_tag_value("SA", "chr1,123,+,50S100M,60,0") >>> supp.reference_name 'chr1'
1361-1427
: Simplify tag parsing logic.Extract tag-specific parsing into separate methods to reduce complexity:
@classmethod def _parse_sa_tag(cls, fields: list[str]) -> Self: """Parse SA tag fields.""" reference_name, reference_start, strand, cigar, mapq, edit_distance = fields if strand not in ("+", "-"): raise ValueError(f"The strand field is not either '+' or '-': {strand}") return cls(...) @classmethod def _parse_xa_xb_tag(cls, fields: list[str]) -> Self: """Parse XA/XB tag fields.""" if len(fields) == 4: reference_name, stranded_start, cigar, edit_distance = fields mapq = alignment_score = None else: reference_name, stranded_start, cigar, edit_distance, alignment_score, mapq = fields return cls(...)
1444-1527
: Split record creation into smaller methods.Extract record creation logic into separate methods:
def _create_base_record(self, rec: AlignedSegment, hit: AuxAlignment) -> AlignedSegment: """Create base record with query information.""" aux = AlignedSegment(header=rec.header) aux.query_name = rec.query_name self._set_query_sequence(aux, rec, hit) return aux def _set_query_sequence(self, aux: AlignedSegment, rec: AlignedSegment, hit: AuxAlignment) -> None: """Set query sequence and qualities.""" if "H" in rec.cigarstring: aux.query_sequence = None aux.query_qualities = None elif rec.is_forward and not hit.is_forward: aux.query_sequence = reverse_complement(rec.query_sequence) aux.query_qualities = rec.query_qualities[::-1] else: aux.query_sequence = rec.query_sequence aux.query_qualities = rec.query_qualities
1230-1234
: Add validation for auxiliary alignments.Verify that auxiliary alignments belong to the template:
def with_aux_alignments(self) -> "Template": """Rebuild this template by adding auxiliary alignments from the primary alignment tags.""" r1_aux = iter([]) if self.r1 is None else AuxAlignment.many_pysam_from_rec(self.r1) r2_aux = iter([]) if self.r2 is None else AuxAlignment.many_pysam_from_rec(self.r2) + # Verify auxiliary alignments belong to this template + for aux in chain(r1_aux, r2_aux): + if aux.query_name != self.name: + raise ValueError(f"Auxiliary alignment {aux.query_name} does not match template {self.name}") return self.build(recs=chain(self.all_recs(), r1_aux, r2_aux))
@@ -1203,6 +1227,12 @@ def set_tag( | |||
for rec in self.all_recs(): | |||
rec.set_tag(tag, value) | |||
|
|||
def with_aux_alignments(self) -> "Template": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def with_aux_alignments(self) -> "Template": | |
def with_aux_alignments(self, include_secondary: bool = True, include_supplementary: bool = False) -> "Template": |
Are these sensible defaults?
By default, bwa aln
will produce a single record with all secondary alignments in the XA tag, and bwa mem
will produce multiple records, each with all other supplementary alignments in the SA tag
Alternatively (or in addition), we could confirm that rehydrating does not create any duplicate records on the template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This refactor may actually help me with another design decision I didn't love but couldn't solve easily at the time:
Prior drafts are here and here.