Skip to content

feat: implement are_equivalent functionality#1

Merged
folded merged 48 commits intomainfrom
feature/variant-equivalence
Feb 16, 2026
Merged

feat: implement are_equivalent functionality#1
folded merged 48 commits intomainfrom
feature/variant-equivalence

Conversation

@folded
Copy link
Owner

@folded folded commented Feb 5, 2026

This PR adds the 'are_equivalent' function to compare HGVS variants, handling normalization, coordinate mapping, and gene symbols.

@folded
Copy link
Owner Author

folded commented Feb 5, 2026

/gemini review

gemini-code-assist[bot]

This comment was marked as outdated.

- Introduced granular `IdentifierType` enum for better type safety.
- Updated `get_symbol_accessions` and `get_seq` to use `IdentifierType`.
- Implemented smart symbol expansion in `equivalence.rs` based on coordinate systems.
- Ensured interchangeability between `IdentifierType` enums and strings in the Python bridge.
- Fixed `rustc` deprecation warnings and improved Python stub generation.
@folded
Copy link
Owner Author

folded commented Feb 6, 2026

/gemini review - There are significant changes.

gemini-code-assist[bot]

This comment was marked as outdated.

- Added `SequenceProxy` to `weaver/cli/provider.py` for recording and replaying genomic sequences.
- Implemented assembly-aware manifest in sequence cache to handle multiple FASTA references.
- Added `tests/test_hgvs_eval_integration.py` with 17 portably-runnable integration tests.
- Created consolidated `tests/data/hgvs_eval_reference.gff` and `tests/data/hgvs_eval_sequences.json`.
- Added `scripts/fetch_gff.py` utility for optimized GFF retrieval.
- Updated Rust core and documentation for consistency.
@folded
Copy link
Owner Author

folded commented Feb 10, 2026

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant new feature: the are_equivalent function for comparing HGVS variants. This is a substantial change that touches many parts of the codebase, from the core Rust logic in hgvs-weaver to the Python bindings in src/lib.rs and the weaver package itself. The implementation correctly handles gene symbol expansion, cross-coordinate mapping, and variant normalization. New documentation and a comprehensive set of tests, including integration tests against hgvs-eval data, have been added to support this feature.

My review focuses on improving the robustness and maintainability of the new equivalence logic. Specifically, I've pointed out an incomplete implementation in a normalization function, some dead code, a potential performance regression, and opportunities to simplify and clarify complex coordinate-handling code.

- Fix identity variant (`c.=`) handling to prevent false frameshifts.
- Implement surgical stop-codon suffix trimming to prevent massive deletion reports.
- Resolve negative-strand delins translation mismatch by disabling inappropriate cDNA normalization in `c_to_p`.
- Achieve 91.3% accuracy parity with `ref-hgvs`.
- Implement explicit nonsense (`p.Ter`) and extension (`ext*`) normalization
- Fix `delins` formatting regression for single-residue substitutions
- Add `AaEdit::Ext` formatting in `fmt.rs`
- Implement `dna_repeat`, `pro_ext`, `pro_repeat`, `dna_con`, and `dna_copy` in parser
- Update `sanity_parity_test.rs` expectations for improved accuracy
- Add `normalization_test.rs` for nonsense and extension verification
- Include category classification and extraction scripts
Modifies AltSeqBuilder to correctly handle Repeat variants by replacing the full repeat run instead of single base, fixing frameshift errors. Adds regression tests for repeat expansion and contraction. Also fixes logic in contingency.py (handling of Stop/Ter) and updates README with latest validation stats (98.6% match).
…vsError::TranscriptMismatch\n- Expose TranscriptMismatchError to Python\n- Update validation scripts to use structured errors\n- Fix unreachable pattern warnings in sequence.rs\n- Move SPDI logic to Rust and expose via PyO3
- Implement `Sequence` trait for iterator-based sequence access in Rust.
- Refactor `AltSeqBuilder` to use lazy slicing and reduce eager string allocations.
- Fix regression where synonymous variants (`c.360=`) were incorrectly translated as frameshifts.
- Prioritize stop codons over frameshifts in protein translation if stop occurs at variant site.
- Fix SPDI identity handling (`REF:REF` representation).
- Integrate analysis and contingency reporting into `weaver-analyze`.
- Add regression test suite `tests/test_regressions.rs`.
…d SPDI shifting. Update benchmarks to 100k variants.
…ts\n\n- Added `to_spdi` logic for `NaEdit::Repeat` in `structs.rs`\n- Implemented `c_to_g` in `DataProvider` trait for intronic offset resolution\n- Updated Python bridge and `RefSeqDataProvider` with `c_to_g` support\n- Fixed all test mocks and resolved `Unsupported operation` errors for these categories\n- Updated README.md with 100k ClinVar validation results (93.917% SPDI match)
…ngth check and insertion detection in `mapper.rs`
@folded
Copy link
Owner Author

folded commented Feb 16, 2026

/gemini review

This branch has added a large number of new features, and needs a thorough re-review.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an impressive pull request that introduces significant new functionality and major improvements to the codebase. The implementation of the are_equivalent function is well-structured and handles complex cases like gene symbol expansion and cross-coordinate mapping. The refactoring of sequence handling using the Sequence trait is a standout improvement, promoting efficiency and code clarity. Moving the SPDI conversion logic into the Rust core is also a great decision.

I have one suggestion for improvement in an analysis script to ensure the reported validation metrics are as accurate as possible. Overall, this is excellent work.

folded and others added 4 commits February 17, 2026 08:54
@folded folded merged commit b9b5c8b into main Feb 16, 2026
4 checks passed
@folded folded deleted the feature/variant-equivalence branch February 16, 2026 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant