Add dirty write counter to track unsaved mutations by titusz · Pull Request #20 · iscc/iscc-usearch

titusz · 2026-02-21T18:42:27Z

Summary

Adds a dirty property to all index types that tracks the number of unsaved key mutations (adds/removes) since the last save, load, view, or reset operation. This enables caller-driven flush policies and helps applications manage persistence efficiently.
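
As a rough illustration of a caller-driven flush policy, the sketch below saves once `dirty` reaches a threshold. `ToyIndex`, `flush_if_needed`, and the threshold value are hypothetical stand-ins rather than part of iscc-usearch; only the `dirty` property and its reset on save come from the description above.

```python
class ToyIndex:
    """Hypothetical stand-in for an index that exposes a `dirty` counter."""

    def __init__(self):
        self._keys = set()
        self._dirty = 0
        self.save_calls = 0

    @property
    def dirty(self):
        return self._dirty

    def add(self, key):
        self._keys.add(key)
        self._dirty += 1

    def save(self):
        # A real index would write to disk here; the toy only counts calls.
        self.save_calls += 1
        self._dirty = 0


def flush_if_needed(index, threshold):
    """Save when at least `threshold` unsaved mutations have accumulated."""
    if index.dirty >= threshold:
        index.save()
        return True
    return False


index = ToyIndex()
for key in range(25):
    index.add(key)
    flush_if_needed(index, threshold=10)

print(index.save_calls, index.dirty)  # prints "2 5"
```

The policy lives entirely in the caller: the index only reports how many mutations are unsaved, and the application decides when that number is large enough to justify a save.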

Key Changes

- NphdIndex: Added a _dirty counter that increments on add() and remove() operations, and resets on save(), load(), view(), and reset() calls. The copy() method starts with a clean counter.
- ShardedIndex: Added a _dirty counter that increments on add() and remove() operations (both single and batch), and resets on save() and reset(). Read-only indexes always return dirty=0. Shard rotation does not reset the counter.
- ShardedIndex128 & ShardedNphdIndex128: Inherit dirty tracking from their parent ShardedIndex class.
- ShardedNphdIndex: Inherits dirty tracking from ShardedIndex with support for variable-dimension NPHD vectors.
- Comprehensive test suite: Added 405 lines of tests covering all index types, validating:
  - Counter initialization and increment behavior
  - Reset on persistence operations (save/load/view/reset)
  - Batch operations
  - Transitive counting through upsert/add_once
  - Read-only index behavior
  - Shard rotation behavior
  - Truthy/falsy checks

Implementation Details

- The dirty counter is a simple integer that increments by the number of keys affected by each mutation operation.
- For batch operations, the counter increments by the batch size.
- The counter is reset to 0 after any persistence operation (save/load/view/reset, as applicable to the index type).
- Read-only indexes always report dirty=0 to prevent confusion.
- The property supports standard Python truthy checks (if idx.dirty:).
- Shard rotation during add() does not reset the counter, allowing accurate tracking across shard boundaries.
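
The rules above can be sketched with a small in-memory model. `DirtyCounterSketch` is a hypothetical class written for illustration; it does not reproduce the actual NphdIndex or ShardedIndex code, and persistence is omitted so that only the counter behavior is modelled.

```python
class DirtyCounterSketch:
    """Hypothetical in-memory model of the counter rules listed above."""

    def __init__(self, read_only=False):
        self._keys = set()
        self._dirty = 0
        self._read_only = read_only

    @property
    def dirty(self):
        # Read-only instances always report 0.
        return 0 if self._read_only else self._dirty

    def add(self, keys):
        # Accept a single key or a batch; count one per key in the call.
        batch = list(keys) if isinstance(keys, (list, tuple)) else [keys]
        self._keys.update(batch)
        self._dirty += len(batch)

    def save(self):
        # Writing to disk is omitted; only the counter reset is modelled.
        self._dirty = 0

    def reset(self):
        self._keys.clear()
        self._dirty = 0


idx = DirtyCounterSketch()
idx.add(1)              # single add: +1
idx.add([2, 3, 4])      # batch add: +3
count_before_save = idx.dirty

if idx.dirty:           # truthy check: non-zero means unsaved mutations
    idx.save()
count_after_save = idx.dirty
```

Because the property is a plain integer, `if idx.dirty:` and `idx.dirty >= n` both work without any extra API.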

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9

Add a `dirty` integer property to writable indexes that tracks unsaved key mutations, enabling custom flush/persistence strategies like "save every N writes".

- NphdIndex: dirty increments in add()/remove(), resets on save/load/view/reset
- ShardedIndex: dirty increments in add()/remove(), resets on save/reset
- Read-only indexes always return 0
- Shard rotation does not reset the counter (bloom filters/tombstones remain)
- Upsert/add_once count transitively through their delegation to add/remove

Closes #16

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9

titusz

Thanks — this looks solid.

Summary

- Adds a dirty counter to NphdIndex and ShardedIndex tracking key mutations since last persistence/reset, enabling caller-driven flush policies (e.g. “save every N writes”).
- Implementation is consistent across variants: increments on add()/remove(), resets on save()/load()/view()/reset() as appropriate, and read-only indexes report dirty == 0.
- ShardedIndex.save() no longer early-returns on empty active shard, so dirty reliably resets even when nothing is written.
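
To make the last point concrete, here is a minimal sketch of why an early return in save() could leave the counter stale. `ToySharded` is a hypothetical model, not the ShardedIndex implementation; it assumes that shard rotation can leave the active shard empty while the counter is still non-zero, and it uses a plain attribute for `dirty` where the real class exposes a property.

```python
class ToySharded:
    """Hypothetical model: rotation can leave the active shard empty."""

    def __init__(self, shard_size):
        self.shard_size = shard_size
        self.finished = []   # rotated (full) shards
        self.active = []     # keys in the current writable shard
        self.dirty = 0       # unsaved mutations since the last save
        self.writes = 0      # how often save() wrote the active shard

    def add(self, key):
        self.active.append(key)
        self.dirty += 1
        if len(self.active) >= self.shard_size:
            # Rotation starts a new shard but leaves the counter untouched.
            self.finished.append(self.active)
            self.active = []

    def save(self):
        if self.active:
            self.writes += 1  # stand-in for writing the shard to disk
        # No early return: the counter is cleared even if nothing was written.
        self.dirty = 0


idx = ToySharded(shard_size=2)
idx.add("a")
idx.add("b")                 # triggers rotation; the active shard is now empty
dirty_after_rotation = idx.dirty
idx.save()
dirty_after_save = idx.dirty
```

With an `if not self.active: return` guard at the top of save(), the counter in this model would stay at 2 after the call, and a "save every N writes" policy would keep firing.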

Verification

Ran locally on Windows: pytest (960 passed, 1 skipped; 100% coverage), plus ruff check, ruff format --check, ty check, and bandit.

Non-blocking notes

- Current semantics count attempted removals (the counter increments before the bloom/contains checks), so removing a missing key can still bump dirty. That’s safe (it may flush early) but slightly different from “successful mutations”; worth confirming this is the intended contract.
- NphdIndex.add() treats 1D numpy arrays as a single vector, but not bytes/bytearray. If you want raw bytes to be treated as a single vector too, consider wrapping those similarly (or clarify in the docstring).

Address review feedback: remove() now checks key existence before incrementing the dirty counter, so removing non-existent keys no longer inflates the count. Added tests for single, batch, and partial-batch removal of missing keys.

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9
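
The existence check described in this commit can be illustrated with a small sketch. `ToyRemovable` is a hypothetical class; the real remove() signature and return value may differ. It only shows the idea of counting keys that were actually present.

```python
class ToyRemovable:
    """Hypothetical model: only successful removals bump the counter."""

    def __init__(self, keys=()):
        self._keys = set(keys)
        self._dirty = 0

    @property
    def dirty(self):
        return self._dirty

    def remove(self, keys):
        # Accept a single key or a batch of keys.
        batch = list(keys) if isinstance(keys, (list, tuple)) else [keys]
        removed = 0
        for key in batch:
            if key in self._keys:      # check existence before counting
                self._keys.discard(key)
                removed += 1
        self._dirty += removed
        return removed


idx = ToyRemovable([1, 2, 3])
idx.remove(99)                 # missing key: counter stays at 0
dirty_after_missing = idx.dirty
idx.remove([1, 2, 42])         # partial batch: only 1 and 2 exist
dirty_after_partial = idx.dirty
```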

claude added 2 commits February 21, 2026 18:09

fix: remove unused imports in test_dirty.py

7ba3bae

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9


titusz merged commit 32c7266 into main Feb 22, 2026
16 checks passed

titusz merged 3 commits into main from claude/fix-issue-16-CXuGa
