
Add dirty write counter to track unsaved mutations #20

Merged

titusz merged 3 commits into main from claude/fix-issue-16-CXuGa on Feb 22, 2026


@titusz (Member) commented Feb 21, 2026

Summary

Adds a dirty property to all index types that tracks the number of unsaved key mutations (adds/removes) since the last persistence or reset operation (save, load, view, or reset, depending on the index type). This enables caller-driven flush policies, letting applications decide when to persist (for example, "save every N writes") instead of saving after every mutation.

Key Changes

  • NphdIndex: Added _dirty counter that increments on add() and remove() operations, and resets on save(), load(), view(), and reset() calls. The copy() method starts with a clean counter.

  • ShardedIndex: Added _dirty counter that increments on add() and remove() operations (both single and batch), and resets on save() and reset(). Read-only indexes always return dirty=0. Shard rotation does not reset the counter.

  • ShardedIndex128 & ShardedNphdIndex128: Inherit dirty tracking from their parent ShardedIndex class.

  • ShardedNphdIndex: Inherits dirty tracking from ShardedIndex with support for variable-dimension NPHD vectors.

  • Comprehensive test suite: Added 405 lines of tests covering all index types, validating:

    • Counter initialization and increment behavior
    • Reset on persistence operations (save/load/view/reset)
    • Batch operations
    • Transitive counting through upsert/add_once
    • Read-only index behavior
    • Shard rotation behavior
    • Truthy/falsy checks
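The shard-rotation behavior described for ShardedIndex can be sketched as follows. ShardedSketch is a hypothetical class that models shards as plain dicts; it is not the ShardedIndex code. It only illustrates the stated contract: when the active shard fills up and a new one is started, the dirty counter keeps running, because (per the commit message) state such as bloom filters and tombstones is still unsaved.

```python
class ShardedSketch:
    """Hypothetical sharded index: dirty survives shard rotation."""

    def __init__(self, shard_capacity: int) -> None:
        self.shard_capacity = shard_capacity
        self.frozen: list[dict[int, bytes]] = []  # full, rotated-out shards
        self.active: dict[int, bytes] = {}       # shard receiving writes
        self._dirty = 0

    @property
    def dirty(self) -> int:
        return self._dirty

    def _rotate(self) -> None:
        # Rotation changes where writes go, not whether everything is saved,
        # so the counter is deliberately left untouched here.
        self.frozen.append(self.active)
        self.active = {}

    def add(self, key: int, vector: bytes) -> None:
        if len(self.active) >= self.shard_capacity:
            self._rotate()
        self.active[key] = vector
        self._dirty += 1

    def save(self, path: str) -> None:
        # A real index would persist shards and metadata to `path`;
        # the sketch only models the counter reset.
        self._dirty = 0


s = ShardedSketch(shard_capacity=2)
for k in range(5):
    s.add(k, b"\x00")
print(len(s.frozen), s.dirty)  # 2 5
```

Five adds with a capacity of two trigger two rotations, yet all five mutations are still reported until save() runs.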

Implementation Details

  • The dirty counter is a simple integer that increments by the number of keys affected by each mutation operation
  • For batch operations, the counter increments by the batch size
  • The counter is reset to 0 by save() and reset() on all writable indexes, and additionally by load() and view() on NphdIndex
  • Read-only indexes always report dirty=0, since they cannot accumulate unsaved mutations
  • The property supports standard Python truthy checks (if idx.dirty:)
  • Shard rotation during add() does not reset the counter, allowing accurate tracking across shard boundaries
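The counter semantics listed above (increment by the number of keys, reset on persistence, 0 for read-only indexes, plain-int truthiness) fit in a small class. SketchIndex is a hypothetical stand-in, not the NphdIndex implementation; it models only add(), save(), and reset(), and assumes a key is either a single int or a list of ints.

```python
class SketchIndex:
    """Hypothetical stand-in for a writable index with a dirty counter."""

    def __init__(self, read_only: bool = False) -> None:
        self._store: dict[int, bytes] = {}
        self._read_only = read_only
        self._dirty = 0  # unsaved key mutations since last save/reset

    @property
    def dirty(self) -> int:
        # Read-only indexes cannot hold unsaved mutations: always 0.
        return 0 if self._read_only else self._dirty

    def add(self, keys, vectors) -> None:
        if self._read_only:
            raise RuntimeError("index is read-only")
        if isinstance(keys, int):
            # Single key: wrap it so it counts as a batch of one.
            keys, vectors = [keys], [vectors]
        if len(keys) != len(vectors):
            raise ValueError("keys and vectors must have the same length")
        for key, vector in zip(keys, vectors):
            self._store[key] = vector
        # A batch increments the counter by the batch size.
        self._dirty += len(keys)

    def save(self, path: str) -> None:
        # A real index would write to `path` here; once state is
        # persisted, nothing is unsaved.
        self._dirty = 0

    def reset(self) -> None:
        self._store.clear()
        self._dirty = 0


idx = SketchIndex()
idx.add(1, b"\x00")
idx.add([2, 3, 4], [b"\x01", b"\x02", b"\x03"])
print(idx.dirty)  # 4
if idx.dirty:  # plain int, so the truthy check works directly
    idx.save("index.bin")
print(idx.dirty)  # 0
```

Returning a plain int (rather than a bool) is what lets callers both test `if idx.dirty:` and compare against a threshold.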

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9

Add a `dirty` integer property to writable indexes that tracks unsaved
key mutations, enabling custom flush/persistence strategies like
"save every N writes".

- NphdIndex: dirty increments in add()/remove(), resets on save/load/view/reset
- ShardedIndex: dirty increments in add()/remove(), resets on save/reset
- Read-only indexes always return 0
- Shard rotation does not reset the counter (bloom filters/tombstones remain)
- Upsert/add_once count transitively through their delegation to add/remove

Closes #16

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9
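A caller-side "save every N writes" policy, the use case this commit names, might look roughly like this. StubIndex and add_and_maybe_flush are hypothetical names; the sketch assumes only an object exposing a dirty integer plus add(keys, vectors) and save(path) methods, and the real index signatures may differ.

```python
class StubIndex:
    """Hypothetical index stub: only what the flush policy needs."""

    def __init__(self) -> None:
        self.dirty = 0  # unsaved key mutations
        self.saves = 0  # how many times save() ran

    def add(self, keys, vectors) -> None:
        self.dirty += len(keys)

    def save(self, path: str) -> None:
        self.saves += 1
        self.dirty = 0


def add_and_maybe_flush(index, keys, vectors, path: str, every: int = 1000) -> bool:
    """Add a batch, then save once at least `every` mutations are unsaved.

    Returns True if this call triggered a save.
    """
    index.add(keys, vectors)
    if index.dirty >= every:
        index.save(path)
        return True
    return False


idx = StubIndex()
for start in range(0, 2500, 100):
    keys = list(range(start, start + 100))
    add_and_maybe_flush(idx, keys, [b"\x00"] * len(keys), "index.bin", every=1000)

# Flush whatever is left before shutting down.
if idx.dirty:
    idx.save("index.bin")
print(idx.saves)  # 3
```

The threshold check uses `>=` rather than `==` because a batch can push the counter past N in one step.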
@titusz (Member, Author) left a comment


Thanks — this looks solid.

Summary

  • Adds a dirty counter to NphdIndex and ShardedIndex tracking key mutations since last persistence/reset, enabling caller-driven flush policies (e.g. “save every N writes”).
  • Implementation is consistent across variants: increments on add()/remove(), resets on save()/load()/view()/reset() as appropriate, and read-only indexes report dirty == 0.
  • ShardedIndex.save() no longer early-returns on empty active shard, so dirty reliably resets even when nothing is written.
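Why the early return mattered can be sketched with a small state object: the active shard can be empty while dirty is still positive, for example right after a rotation. ShardState, save_old, and save_new are hypothetical and model only the counter, not the actual shard, bloom filter, or tombstone writes.

```python
from dataclasses import dataclass, field


@dataclass
class ShardState:
    # Hypothetical stand-in for a sharded index's in-memory state.
    active: dict[int, bytes] = field(default_factory=dict)
    dirty: int = 0
    shard_writes: int = 0


def save_old(state: ShardState) -> None:
    # Early return: an empty active shard also skips the counter reset.
    if not state.active:
        return
    state.shard_writes += 1
    state.dirty = 0


def save_new(state: ShardState) -> None:
    # Write the shard only if it has data, but always clear the counter.
    if state.active:
        state.shard_writes += 1
    state.dirty = 0


old, new = ShardState(dirty=3), ShardState(dirty=3)
save_old(old)
save_new(new)
print(old.dirty, new.dirty)  # 3 0
```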

Verification

  • Ran locally on Windows: pytest (960 passed, 1 skipped; 100% coverage), plus ruff check, ruff format --check, ty check, and bandit.

Non-blocking notes

  • Current semantics count attempted removals (increments before bloom/contains checks), so removing a missing key can still bump dirty. That’s safe (may flush early) but slightly different from “successful mutations” — worth confirming this is the intended contract.
  • NphdIndex.add() treats 1D numpy arrays as a single vector, but not bytes/bytearray. If you want raw bytes to be treated as a single vector too, consider wrapping those similarly (or clarify in the docstring).
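One way to act on the second note is a normalization step before add() stores vectors and counts keys. as_vector_batch is a hypothetical helper, not part of NphdIndex; it assumes input arrives as raw bytes, as a 1-D numpy-style array (anything with `ndim` and `reshape`), or as an already-batched sequence.

```python
def as_vector_batch(vectors):
    """Hypothetical helper: normalize add() input to a batch of vectors.

    A single bytes/bytearray value is treated as one vector, mirroring
    how a 1-D numpy array is treated as one vector.
    """
    if isinstance(vectors, (bytes, bytearray)):
        return [bytes(vectors)]
    if getattr(vectors, "ndim", None) == 1:
        # 1-D numpy-style array: add a leading batch axis.
        return vectors.reshape(1, -1)
    return vectors  # assume it is already a batch


print(len(as_vector_batch(b"\x00" * 8)))  # 1
```

With this normalization, a single raw vector counts as one key for a dirty counter that increments by batch size, rather than one per byte.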

Address review feedback: remove() now checks key existence before
incrementing the dirty counter, so removing non-existent keys no
longer inflates the count. Added tests for single, batch, and
partial-batch removal of missing keys.

https://claude.ai/code/session_01CDPHh8ooDM3h5KvsSitst9
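The rule from this follow-up commit (check existence, then count) can be sketched with a plain dict. RemovalSketch is hypothetical; in the real indexes the existence check may go through bloom filters or contains() as the review mentions, whereas here it is a dict lookup. Collapsing duplicate keys within one batch, so a key is counted at most once, is an assumption of this sketch.

```python
class RemovalSketch:
    """Hypothetical index where remove() counts only keys that exist."""

    def __init__(self, keys=()) -> None:
        self._store = {k: b"" for k in keys}
        self._dirty = 0

    @property
    def dirty(self) -> int:
        return self._dirty

    def remove(self, keys) -> int:
        if isinstance(keys, int):
            keys = [keys]
        # dict.fromkeys drops duplicates while keeping order; then keep
        # only keys that are present, so missing keys add nothing.
        present = [k for k in dict.fromkeys(keys) if k in self._store]
        for k in present:
            del self._store[k]
        self._dirty += len(present)
        return len(present)


idx = RemovalSketch(keys=[1, 2, 3])
idx.remove(99)          # single missing key
idx.remove([98, 99])    # batch of missing keys
idx.remove([1, 2, 42])  # partial batch: two of three exist
print(idx.dirty)  # 2
```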
@titusz merged commit 32c7266 into main on Feb 22, 2026
16 checks passed


2 participants