feat: implement are_equivalent functionality by folded · Pull Request #1 · folded/hgvs-weaver

folded · 2026-02-05T06:47:09Z

This PR adds the 'are_equivalent' function to compare HGVS variants, handling normalization, coordinate mapping, and gene symbols.

folded · 2026-02-05T10:12:16Z

/gemini review

- Introduced granular `IdentifierType` enum for better type safety.
- Updated `get_symbol_accessions` and `get_seq` to use `IdentifierType`.
- Implemented smart symbol expansion in `equivalence.rs` based on coordinate systems.
- Ensured interchangeability between `IdentifierType` enums and strings in the Python bridge.
- Fixed `rustc` deprecation warnings and improved Python stub generation.
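To illustrate what "interchangeability between `IdentifierType` enums and strings" can look like on the Python side, here is a minimal pure-Python sketch. It is not the PR's bridge code (which lives in the Rust/PyO3 layer); the member names and the `coerce_identifier_type` helper are hypothetical, since the PR text does not list the actual enum variants.

```python
from enum import Enum


class IdentifierType(str, Enum):
    """Illustrative enum; member names are assumptions, not the real variants."""

    GENOMIC_ACCESSION = "genomic_accession"
    TRANSCRIPT_ACCESSION = "transcript_accession"
    PROTEIN_ACCESSION = "protein_accession"
    GENE_SYMBOL = "gene_symbol"


def coerce_identifier_type(value):
    """Accept either an IdentifierType member or its string value (hypothetical helper)."""
    if isinstance(value, IdentifierType):
        return value
    # Enum lookup by value raises ValueError for unknown strings.
    return IdentifierType(value)
```

Because the enum mixes in `str`, a member also compares equal to its plain string value, so callers that still pass strings keep working.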

folded · 2026-02-06T00:11:29Z

/gemini review - There are significant changes.

- Added `SequenceProxy` to `weaver/cli/provider.py` for recording and replaying genomic sequences.
- Implemented assembly-aware manifest in sequence cache to handle multiple FASTA references.
- Added `tests/test_hgvs_eval_integration.py` with 17 portably-runnable integration tests.
- Created consolidated `tests/data/hgvs_eval_reference.gff` and `tests/data/hgvs_eval_sequences.json`.
- Added `scripts/fetch_gff.py` utility for optimized GFF retrieval.
- Updated Rust core and documentation for consistency.
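The record-and-replay idea behind a sequence proxy can be sketched as follows. This is an assumption-laden illustration, not the `SequenceProxy` from `weaver/cli/provider.py`: the class name `RecordingSequenceProxy`, its method signatures, and the `assembly|accession|start|end` key layout are all hypothetical. The point is only that keying the cache by assembly lets tests run without the original FASTA files.

```python
import json


class RecordingSequenceProxy:
    """Hypothetical record/replay wrapper around a sequence backend."""

    def __init__(self, backend=None, cache=None):
        # backend: callable(assembly, accession, start, end) -> str, or None in replay mode.
        self.backend = backend
        self.cache = dict(cache or {})

    @staticmethod
    def _key(assembly, accession, start, end):
        # Including the assembly keeps sequences from different references apart.
        return f"{assembly}|{accession}|{start}|{end}"

    def get_seq(self, assembly, accession, start, end):
        key = self._key(assembly, accession, start, end)
        if key not in self.cache:
            if self.backend is None:
                raise KeyError(f"no recorded sequence for {key}")
            self.cache[key] = self.backend(assembly, accession, start, end)
        return self.cache[key]

    def dumps(self):
        """Serialize everything recorded so far as JSON."""
        return json.dumps(self.cache, sort_keys=True)

    @classmethod
    def loads(cls, text):
        """Build a replay-only proxy from previously recorded JSON."""
        return cls(backend=None, cache=json.loads(text))
```

In a test, one would record against a real backend once, store the JSON next to the tests, and load it with `loads` afterwards.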

folded · 2026-02-10T07:11:20Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant new feature: the are_equivalent function for comparing HGVS variants. This is a substantial change that touches many parts of the codebase, from the core Rust logic in hgvs-weaver to the Python bindings in src/lib.rs and the weaver package itself. The implementation correctly handles gene symbol expansion, cross-coordinate mapping, and variant normalization. New documentation and a comprehensive set of tests, including integration tests against hgvs-eval data, have been added to support this feature.

My review focuses on improving the robustness and maintainability of the new equivalence logic. Specifically, I've pointed out an incomplete implementation in a normalization function, some dead code, a potential performance regression, and opportunities to simplify and clarify complex coordinate-handling code.

hgvs-weaver/src/equivalence.rs

docs/source/equivalence_logic.md

hgvs-weaver/src/equivalence.rs

hgvs-weaver/src/mapper.rs

- Fix identity variant (`c.=`) handling to prevent false frameshifts.
- Implement surgical stop-codon suffix trimming to prevent massive deletion reports.
- Resolve negative-strand delins translation mismatch by disabling inappropriate cDNA normalization in `c_to_p`.
- Achieve 91.3% accuracy parity with `ref-hgvs`.

- Implement explicit nonsense (`p.Ter`) and extension (`ext*`) normalization
- Fix `delins` formatting regression for single-residue substitutions
- Add `AaEdit::Ext` formatting in `fmt.rs`
- Implement `dna_repeat`, `pro_ext`, `pro_repeat`, `dna_con`, and `dna_copy` in parser
- Update `sanity_parity_test.rs` expectations for improved accuracy
- Add `normalization_test.rs` for nonsense and extension verification
- Include category classification and extraction scripts

Modifies AltSeqBuilder to correctly handle Repeat variants by replacing the full repeat run instead of single base, fixing frameshift errors. Adds regression tests for repeat expansion and contraction. Also fixes logic in contingency.py (handling of Stop/Ter) and updates README with latest validation stats (98.6% match).
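The repeat fix described above (replace the full repeat run, not a single base) can be sketched in a few lines of Python. This is an illustration of the idea only, not the Rust `AltSeqBuilder` code; the function name `apply_repeat` and its 0-based `start` convention are assumptions.

```python
def apply_repeat(ref_seq, start, unit, new_count):
    """Replace the whole run of `unit` that begins at 0-based `start`.

    Hypothetical helper: scans forward over consecutive copies of `unit`
    and swaps the entire run for `new_count` copies.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    end = start
    while ref_seq.startswith(unit, end):
        end += len(unit)
    if end == start:
        raise ValueError("reference does not contain the repeat unit at start")
    return ref_seq[:start] + unit * new_count + ref_seq[end:]
```

With a trinucleotide unit, expanding or contracting the full run changes the sequence length by a multiple of three, so the downstream reading frame is preserved. Replacing only one base with the expanded repeat would instead shift the frame.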

…vsError::TranscriptMismatch
- Expose TranscriptMismatchError to Python
- Update validation scripts to use structured errors
- Fix unreachable pattern warnings in sequence.rs
- Move SPDI logic to Rust and expose via PyO3

- Implement `Sequence` trait for iterator-based sequence access in Rust.
- Refactor `AltSeqBuilder` to use lazy slicing and reduce eager string allocations.
- Fix regression where synonymous variants (`c.360=`) were incorrectly translated as frameshifts.
- Prioritize stop codons over frameshifts in protein translation if stop occurs at variant site.
- Fix SPDI identity handling (`REF:REF` representation).
- Integrate analysis and contingency reporting into `weaver-analyze`.
- Add regression test suite `tests/test_regressions.rs`.
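As a small illustration of the `REF:REF` identity representation mentioned above, the sketch below formats an SPDI string (`sequence:position:deletion:insertion`, with a 0-based position). It assumes that "identity" means the deletion and insertion fields both carry the reference sequence; the function name and the `alt=None` convention are hypothetical and not taken from the Rust `to_spdi` implementation.

```python
def format_spdi(accession, position0, ref, alt=None):
    """Format an SPDI string; `alt=None` marks an identity (no-change) variant."""
    if alt is None:
        # Identity: the inserted sequence equals the deleted reference sequence.
        alt = ref
    return f"{accession}:{position0}:{ref}:{alt}"
```

A substitution such as A to G at 0-based position 99 would be written `format_spdi("NC_000001.11", 99, "A", "G")`, while omitting `alt` yields the same reference base in both fields.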

…ts

- Added `to_spdi` logic for `NaEdit::Repeat` in `structs.rs`
- Implemented `c_to_g` in `DataProvider` trait for intronic offset resolution
- Updated Python bridge and `RefSeqDataProvider` with `c_to_g` support
- Fixed all test mocks and resolved `Unsupported operation` errors for these categories
- Updated README.md with 100k ClinVar validation results (93.917% SPDI match)
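Intronic offset resolution, as named in the list above, amounts to finding the genomic position of the anchoring exonic base and then stepping into the intron in transcript direction. The sketch below shows that arithmetic under simplifying assumptions: it works on a 0-based transcript index rather than `c.` numbering (so no CDS-start adjustment), uses half-open exon intervals, and the function name `tx_offset_to_genomic` is hypothetical rather than the signature of the PR's `c_to_g`.

```python
def tx_offset_to_genomic(exons, strand, tx_index, offset=0):
    """Map a 0-based transcript index plus intronic offset to a 0-based genomic position.

    exons: list of (g_start, g_end) half-open genomic intervals in transcript order.
    strand: +1 or -1.
    offset: signed intronic offset, positive meaning downstream in the transcript.
    """
    remaining = tx_index
    for g_start, g_end in exons:
        length = g_end - g_start
        if remaining < length:
            if strand == 1:
                anchor = g_start + remaining
            else:
                # On the minus strand the transcript runs from high to low coordinates.
                anchor = g_end - 1 - remaining
            return anchor + strand * offset
        remaining -= length
    raise ValueError("transcript index lies beyond the last exon")
```

On the minus strand a positive offset moves to a lower genomic coordinate, which is the detail that most often goes wrong in hand-written mappers.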

folded · 2026-02-16T21:51:12Z

/gemini review

This branch has added a large number of new features, and needs a thorough re-review.

gemini-code-assist

Code Review

This is an impressive pull request that introduces significant new functionality and major improvements to the codebase. The implementation of the are_equivalent function is well-structured and handles complex cases like gene symbol expansion and cross-coordinate mapping. The refactoring of sequence handling using the Sequence trait is a standout improvement, promoting efficiency and code clarity. Moving the SPDI conversion logic into the Rust core is also a great decision.

I have one suggestion for improvement in an analysis script to ensure the reported validation metrics are as accurate as possible. Overall, this is excellent work.

weaver/cli/analyze.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

folded added 4 commits February 5, 2026 17:46

feat: implement are_equivalent functionality for variant comparison

58be7ce

fix: rename methods, fix stubs, and correct pre-commits

aef0d99

feat: use dynamic identifier identification for symbol resolution

f549718

feat: rename identify_identifier to get_identifier_type and add granular types

c477c07

folded added 5 commits February 6, 2026 11:20

Add documentation for variant equivalence algorithm

f49a164

Fix markdown lints

4aa9713

Fix mermaid syntax in equivalence logic docs

e588ab9

trailing newline cleanups

527ecff

gemini-code-assist bot reviewed Feb 10, 2026

hgvs-weaver/src/equivalence.rs (resolved)

docs/source/equivalence_logic.md (outdated, resolved)

hgvs-weaver/src/equivalence.rs (outdated, resolved)

hgvs-weaver/src/mapper.rs (outdated, resolved)

folded added 14 commits February 10, 2026 21:34

Fix EvalCase indexing and resolve fetch_gff.py pre-commit errors

8c65d7a

Refactor normalize_ins_to_dup to support Coding and NonCoding variants and improve robustness

be0693b

Update equivalence_logic.md documentation with relative links and new variant support information

977fcc7

Remove unused and redundant eq_dup_ins method from equivalence.rs

6c464de

Clean up Rust compile warnings: unused variables, naming conventions, and unused imports

3292270

Remove biocommons_hgvs_eval_README.md to resolve linting errors and repository clutter

bd79d43

Optimize shift_3_prime: exponential chunk growth and redundant loop removal

f144e3d

Refactor shift_3_prime inner loops: remove continue and improve readability

f10d769

Add validation analysis script

08f9ef2

Fix redundant identity intervals and 3' UTR variant mapping

efaecf0

checkpoint

2e64127

checkpoint

8ed0107

folded added 20 commits February 12, 2026 20:24

Refactor: Implement SPDI generation in Rust and integrate into Python validation

b9f618f

Fix validate.py not catching TranscriptMismatchError in rs_p calculation

8e92cf2

update readme

0f26279

benchmarks: highlight best performance values in validation table

a924c9e

Fix HGVS regressions: identity variants, stop codons, and minus-strand SPDI shifting. Update benchmarks to 100k variants.

028489c

Cleanup: remove analysis and coverage files from repo and update .gitignore.

fbf7afc

Implement configurable to_spdi and decouple validation logic

070cb95

ruff cleanups

4f3b711

chore: resolve pre-commit errors across python scripts and stubs

a551cff

fix(protein): resolve coordinate shift by correcting normalization length check and insertion detection in `mapper.rs`

01372cc

update readme stats

d857435

Add Rust regression tests with realistic exon structures and cDNA sequences

204362e

Finalize historical benchmark script and project-wide formatting

e225ee6

Include latest commit in benchmark list

5f0bf83

Cleanup obsolete scripts and fix pypy pre-commit lint/type errors

5a647fe

Bump version to 0.2.0

5ebdda1

gemini-code-assist bot reviewed Feb 16, 2026

weaver/cli/analyze.py (outdated, resolved)

folded and others added 4 commits February 17, 2026 08:54

Update weaver/cli/analyze.py

4d18567

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

update README stats and uv.lock

a5c5343

make sure that the same logic for p. comparison is used everywhere

f6efc8b

cleanups

8c06aef

folded merged commit b9b5c8b into main Feb 16, 2026
4 checks passed

folded deleted the feature/variant-equivalence branch February 16, 2026 22:58

feat: implement are_equivalent functionality #1
folded merged 48 commits into main from feature/variant-equivalence
