Add dirty write counter to track unsaved mutations #20
Merged
Add a `dirty` integer property to writable indexes that tracks unsaved key mutations, enabling custom flush/persistence strategies like "save every N writes".

- NphdIndex: `dirty` increments in `add()`/`remove()`, resets on save/load/view/reset
- ShardedIndex: `dirty` increments in `add()`/`remove()`, resets on save/reset
- Read-only indexes always return 0
- Shard rotation does not reset the counter (bloom filters/tombstones remain)
- Upsert/add_once count transitively through their delegation to add/remove

Closes #16

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9
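A minimal sketch of the "save every N writes" policy mentioned above. `ToyIndex` and `add_with_flush` are hypothetical names used only for illustration; `ToyIndex` is an in-memory stand-in that mimics just the `dirty` contract described in this PR (increment on `add()`, reset on `save()`), not the real index classes.

```python
class ToyIndex:
    """Hypothetical stand-in that mimics only the `dirty` contract."""

    def __init__(self):
        self._data = {}
        self._dirty = 0
        self.saves = 0  # counts save() calls, for demonstration only

    @property
    def dirty(self):
        # Number of mutations since the last save
        return self._dirty

    def add(self, key, vector):
        self._data[key] = vector
        self._dirty += 1

    def save(self):
        # A real index would persist to disk here
        self.saves += 1
        self._dirty = 0


def add_with_flush(index, items, every=100):
    """Add items, saving whenever `every` unsaved mutations accumulate."""
    for key, vector in items:
        index.add(key, vector)
        if index.dirty >= every:
            index.save()


idx = ToyIndex()
add_with_flush(idx, ((i, b"\x00" * 8) for i in range(250)), every=100)
print(idx.saves, idx.dirty)  # 2 saves, 50 mutations still unsaved
```

The flush decision lives entirely in the caller, so the threshold can be swapped for a time-based or size-based rule without touching the index.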
titusz (Member, Author) commented Feb 21, 2026
titusz left a comment
Thanks — this looks solid.
Summary
- Adds a `dirty` counter to `NphdIndex` and `ShardedIndex` tracking key mutations since last persistence/reset, enabling caller-driven flush policies (e.g. "save every N writes").
- Implementation is consistent across variants: increments on `add()`/`remove()`, resets on `save()`/`load()`/`view()`/`reset()` as appropriate, and read-only indexes report `dirty == 0`.
- `ShardedIndex.save()` no longer early-returns on empty active shard, so `dirty` reliably resets even when nothing is written.

Verification
- Ran locally on Windows: `pytest` (960 passed, 1 skipped; 100% coverage), plus `ruff check`, `ruff format --check`, `ty check`, and `bandit`.

Non-blocking notes
- Current semantics count attempted removals (increments before bloom/contains checks), so removing a missing key can still bump `dirty`. That's safe (may flush early) but slightly different from "successful mutations"; worth confirming this is the intended contract.
- `NphdIndex.add()` treats 1D numpy arrays as a single vector, but not `bytes`/`bytearray`. If you want raw bytes to be treated as a single vector too, consider wrapping those similarly (or clarify in the docstring).
Address review feedback: remove() now checks key existence before incrementing the dirty counter, so removing non-existent keys no longer inflates the count. Added tests for single, batch, and partial-batch removal of missing keys. https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9
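The revised removal contract can be illustrated with a small sketch. `ToyKeyIndex` is a hypothetical in-memory stand-in, not the PR's implementation; it only shows the idea of checking key existence before incrementing, for single, batch, and partial-batch removals.

```python
class ToyKeyIndex:
    """Hypothetical stand-in: counts only removals of keys that exist."""

    def __init__(self):
        self._keys = set()
        self._dirty = 0

    @property
    def dirty(self):
        return self._dirty

    def add(self, key):
        self._keys.add(key)
        self._dirty += 1

    def remove(self, keys):
        # Accept a single key or an iterable of keys
        if isinstance(keys, int):
            keys = [keys]
        # Only keys actually present count as mutations
        present = set(keys) & self._keys
        self._keys -= present
        self._dirty += len(present)
        return len(present)


idx = ToyKeyIndex()
for key in (1, 2, 3):
    idx.add(key)               # dirty == 3

idx.remove(99)                 # missing key: dirty stays 3
idx.remove([1, 99])            # partial batch: only key 1 counts, dirty == 4
idx.remove([97, 98])           # all missing: dirty stays 4
print(idx.dirty)
```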
Summary
Adds a `dirty` property to all index types that tracks the number of unsaved key mutations (adds/removes) since the last save, load, view, or reset operation. This enables caller-driven flush policies and helps applications manage persistence efficiently.

Key Changes
- NphdIndex: Added `_dirty` counter that increments on `add()` and `remove()` operations, and resets on `save()`, `load()`, `view()`, and `reset()` calls. The `copy()` method starts with a clean counter.
- ShardedIndex: Added `_dirty` counter that increments on `add()` and `remove()` operations (both single and batch), and resets on `save()` and `reset()`. Read-only indexes always return `dirty=0`. Shard rotation does not reset the counter.
- ShardedIndex128 & ShardedNphdIndex128: Inherit dirty tracking from their parent `ShardedIndex` class.
- ShardedNphdIndex: Inherits dirty tracking from `ShardedIndex` with support for variable-dimension NPHD vectors.
- Comprehensive test suite: Added 405 lines of tests covering all index types, validating:

Implementation Details
- Read-only indexes always return `dirty=0` to prevent confusion
- [...] (`if idx.dirty:`)
- Shard rotation in `add()` does not reset the counter, allowing accurate tracking across shard boundaries

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9
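The sharded behaviour summarized here (read-only indexes report zero, the counter works in a plain truthiness check, and rotation keeps the count) could be sketched as follows. `ToyShardedIndex`, its `shard_size` parameter, and the list-based shards are assumptions made for illustration and do not reflect the real `ShardedIndex` internals.

```python
class ToyShardedIndex:
    """Hypothetical stand-in illustrating dirty tracking across shard rotation."""

    def __init__(self, shard_size=3, read_only=False):
        self.shard_size = shard_size
        self.read_only = read_only
        self.shards = [[]]  # last list is the active shard
        self._dirty = 0

    @property
    def dirty(self):
        # Read-only indexes always report 0
        return 0 if self.read_only else self._dirty

    def add(self, key):
        if self.read_only:
            raise RuntimeError("index is read-only")
        if len(self.shards[-1]) >= self.shard_size:
            # Rotate to a fresh shard; the counter is deliberately untouched
            self.shards.append([])
        self.shards[-1].append(key)
        self._dirty += 1

    def save(self):
        # A real index would persist the active shard here
        self._dirty = 0


idx = ToyShardedIndex(shard_size=3)
for key in range(7):
    idx.add(key)

shard_count = len(idx.shards)    # rotation happened twice
dirty_before_save = idx.dirty    # still counts every add

if idx.dirty:                    # integer counter doubles as a truthiness check
    idx.save()
```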