Add graceful error handling for corrupted shard files #19

Open
titusz wants to merge 3 commits into main from claude/fix-issue-12-Diifd
titusz commented Feb 21, 2026

Summary

This PR adds comprehensive error handling for corrupted or truncated shard files in ShardedIndex and ShardedNphdIndex. Instead of crashing when encountering corrupted shards, the index now gracefully skips them with warnings and remains operational with valid shards.

Key Changes

  • New CorruptedShardError exception: A dedicated exception class for corrupted shard errors, exported from the main module for user code to catch if needed.

  • Graceful shard restoration: Modified _restore_shard() methods in ShardedIndex, ShardedIndexedKeys, ShardedNphdIndex, and ShardedNphdIndex128 to:

    • Wrap metadata reading and shard loading in try-except blocks
    • Return None instead of raising exceptions when shards are corrupted
    • Log detailed warnings about what went wrong (metadata unreadable, load/view failed, etc.)
  • Robust config resolution: Updated _resolve_config() and _resolve_max_dim() to iterate through all available shards when reading metadata, skipping corrupted ones until a valid shard is found.

  • Improved _load_existing() logic:

    • View shards: Corrupted shards are skipped with warnings; valid shards remain accessible
    • Active shard: If corrupted, a fresh empty shard is created instead of failing
    • All shards corrupted: Index opens successfully with size=0 and a fresh active shard
    • Tracks corrupted paths and logs a summary of skipped shards
  • Comprehensive test coverage: Added 275 lines of tests covering:

    • Single and multiple corrupted shards
    • Truncated and empty shard files
    • Read-only mode with corrupted shards
    • Config auto-detection fallback behavior
    • Full usability after recovery (add, search operations)
    • Both ShardedIndex and ShardedNphdIndex variants
  • Updated existing tests: Modified tests that expected exceptions on key kind mismatches to reflect the new graceful recovery behavior (index opens with size=0).
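The skip-and-warn pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `restore_shard`, `load_existing`, the `loader` callback, and the logger name are hypothetical stand-ins, and the real `_restore_shard()` and `_load_existing()` methods may be structured differently.

```python
import logging
from pathlib import Path
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("sharded_sketch")  # hypothetical logger name


class CorruptedShardError(Exception):
    """Stand-in for the exception this PR exports; the real class may differ."""


def restore_shard(path: Path, loader: Callable[[Path], object]) -> Optional[object]:
    """Return the loaded shard, or None if the file cannot be read."""
    try:
        return loader(path)
    except Exception as exc:  # native loaders can raise many exception types
        logger.warning("Skipping corrupted shard %s: %s", path, exc)
        return None


def load_existing(
    paths: List[Path], loader: Callable[[Path], object]
) -> Tuple[List[object], List[Path]]:
    """Load every readable shard and collect the paths that were skipped."""
    shards: List[object] = []
    corrupted: List[Path] = []
    for path in paths:
        shard = restore_shard(path, loader)
        if shard is None:
            corrupted.append(path)
        else:
            shards.append(shard)
    if corrupted:
        # One summary line after the per-shard warnings.
        logger.warning("Skipped %d corrupted shard(s)", len(corrupted))
    return shards, corrupted
```

Returning `None` rather than raising keeps the decision about what to do with a bad shard in the caller, which is what lets the index open with the remaining valid shards.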

Implementation Details

  • Corrupted shards are logged at WARNING level with specific failure reasons
  • The index remains fully operational after recovery: new data can be added and searches work normally
  • Read-only mode gracefully skips corrupted shards without attempting recovery
  • When all shards are corrupted but ndim/max_dim is provided, the index still opens successfully
  • When all shards are corrupted and no dimension parameter is provided, an error with a clear message is raised
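The dimension fallback in the last two points could look roughly like the sketch below. It is written under assumptions: `resolve_ndim` and the `read_ndim` callback are hypothetical names, `ValueError` is a placeholder for whatever error type the PR actually raises, and letting an explicit `ndim` take priority over shard metadata is a guess about precedence, not something the description confirms.

```python
from typing import Callable, Iterable, Optional


def resolve_ndim(
    shard_paths: Iterable[str],
    read_ndim: Callable[[str], int],
    ndim: Optional[int] = None,
) -> int:
    """Pick the vector dimension from the caller or from the first readable shard."""
    if ndim is not None:
        # Assumption: an explicit value wins, so no shard needs to be readable.
        return ndim
    for path in shard_paths:
        try:
            return read_ndim(path)
        except Exception:
            # Unreadable metadata: try the next shard instead of failing.
            continue
    raise ValueError(
        "Cannot determine ndim: all shards are corrupted and no ndim was provided"
    )
```

With this shape, a caller that knows the dimension can always open the index, while auto-detection fails only after every shard has been tried.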

https://claude.ai/code/session_01EygYi2hu6fQPvaR5zKCdWs

Add CorruptedShardError exception and wrap all _restore_shard methods
with try/except to catch C++ exceptions from corrupted/truncated shard
files. Corrupted shards are now skipped with log warnings instead of
crashing the process, allowing the index to remain operational with
whatever valid shards remain. When all shards are corrupted, the
constructor succeeds with size=0 so consumers can detect and rebuild.

Closes #12

https://claude.ai/code/session_01EygYi2hu6fQPvaR5zKCdWs

Add comprehensive tests covering all new corruption handling code paths
and previously uncovered branches:
- Corrupted shard error metadata/view/load exception paths
- Search filtering with active shard overlap (_needs_compact)
- Vectors/keys array slow path with dtype conversion
- Iterator slow paths with tombstones and compaction
- Getitem error paths and empty array edge cases

https://claude.ai/code/session_01EygYi2hu6fQPvaR5zKCdWs

- Remove unused import (pathlib.Path) in test_corrupted_shards.py
- Remove unused variables (original_load, original_view)
- Fix ruff formatting in sharded.py (blank line, string concat)

https://claude.ai/code/session_01EygYi2hu6fQPvaR5zKCdWs