Problem
If a `.usearch` shard file is corrupted or truncated, the usearch C++ layer may segfault rather than raise a clean Python exception. The `atomic_write()` pattern reduces this risk significantly but does not eliminate it: storage-level corruption, filesystem errors, and interrupted OS-level writes can still leave a damaged file on disk.
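The issue does not show `atomic_write()` itself. Assuming it follows the usual write-to-temporary-file, fsync, rename pattern, a minimal sketch looks like the following. It takes raw bytes for simplicity, whereas the project's helper may instead hand a temporary path to `Index.save()`; the name and signature here are assumptions. The sketch also makes the limits above concrete: the rename ensures readers never see a half-written file from this process, but it cannot protect bytes that the storage layer damages later.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Hypothetical stand-in for the project's atomic_write(); signature assumed.

    Readers see either the old file or the complete new one, never a partial write.
    """
    path = Path(path)
    # The temp file must live in the same directory (same filesystem) for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to the device before publishing them
        os.replace(tmp, path)  # atomically swap the new file in over the old one
    except BaseException:
        # Never leave a stray temp file behind on failure.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    # Making the rename itself durable across power loss would additionally need
    # an fsync of the parent directory on POSIX; omitted here.
```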
Current behavior:

- `_restore_shard()` calls `Index.metadata(str(path))`, which returns `None` if the metadata can't be read. This case is already handled.
- But if the metadata reads fine and the HNSW graph structure is corrupted, `Index.load()`/`Index.view()` may hit an out-of-bounds memory access in C++, causing a segfault.
- `_load_existing()` raises `RuntimeError` if `_restore_shard()` returns `None`, but if C++ crashes before control reaches Python error handling, the process dies silently.
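The two failure modes in the list look very different from outside the process: an error reported through Python ends with a traceback and a normal non-zero exit, while a segfault kills the interpreter via a signal. The sketch below runs a snippet in a child interpreter and classifies the outcome. `probe()` is a hypothetical standard-library helper, not part of usearch or iscc-search, and the negative-return-code convention it relies on is POSIX-specific (on Windows a native crash shows up as a large positive exit code).

```python
import signal
import subprocess
import sys


def probe(snippet: str, timeout: float = 30.0) -> str:
    """Run `snippet` in a fresh interpreter and report how it ended (POSIX semantics)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # A negative return code means the child was killed by that signal number.
        return f"crashed: {signal.Signals(-proc.returncode).name}"
    # Positive code: the child exited normally, e.g. after an uncaught Python exception.
    lines = proc.stderr.strip().splitlines()
    return f"exception: {lines[-1] if lines else ''}"
```

For example, `probe("raise RuntimeError('bad shard')")` returns `"exception: RuntimeError: bad shard"`, while `probe("import ctypes; ctypes.string_at(0)")`, which dereferences a null pointer, typically reports `crashed: SIGSEGV` on Linux. Loading a suspect shard in a child process like this is one way to keep a crash from taking down the parent, at the cost of an extra load.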
Proposal
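Wrap shard restore/load operations in `try`/`except` at the Python level to catch C++ errors that surface as Python exceptions. This cannot intercept a genuine segfault, which terminates the interpreter before any `except` clause runs, but it turns every error the bindings do report into a recoverable failure: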
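```python
import logging

from usearch.index import Index

logger = logging.getLogger(__name__)


def _restore_shard(self, path):
    """Load a shard from disk; return None if its metadata is unreadable or loading fails."""
    try:
        meta = Index.metadata(str(path))
        if meta is None:
            return None  # unreadable metadata: already handled today
        idx = self._create_shard()
        idx.load(str(path))
        return idx
    except Exception as e:
        # C++ errors translated into Python exceptions by the bindings land here.
        logger.warning("Failed to restore shard %s: %s", path, e)
        return None
```

Consider also: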
- A dedicated `CorruptedShardError` exception type so consumers can catch it specifically
- Logging the corrupted shard path for diagnosis
- An optional `on_corrupt="warn"`/`on_corrupt="raise"` parameter to control behavior
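The three suggestions combine naturally into one small API. The sketch below is hypothetical: `CorruptedShardError`, `restore_shard()` and the injected `load` callable (a stand-in for creating a shard and calling `Index.load`) are illustrative names, not existing usearch or iscc-search APIs.

```python
import logging
from pathlib import Path
from typing import Callable, Literal, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class CorruptedShardError(RuntimeError):
    """Hypothetical: a shard file exists but could not be loaded."""

    def __init__(self, path: Path, cause: BaseException):
        super().__init__(f"corrupted shard {path}: {cause}")
        self.path = path  # kept so callers can log it or rebuild exactly this shard


def restore_shard(
    path: Path,
    load: Callable[[Path], T],
    on_corrupt: Literal["warn", "raise"] = "warn",
) -> Optional[T]:
    """Load a shard via `load`; on failure either warn and return None, or raise."""
    if on_corrupt not in ("warn", "raise"):
        raise ValueError(f"on_corrupt must be 'warn' or 'raise', got {on_corrupt!r}")
    try:
        return load(path)
    except Exception as exc:
        err = CorruptedShardError(path, exc)
        if on_corrupt == "raise":
            raise err from exc  # chain the original error so its message is preserved
        logger.warning("%s", err)
        return None
```

Subclassing `RuntimeError` is a deliberate choice in this sketch: code that already catches the `RuntimeError` raised by `_load_existing()` keeps working, while new callers can catch the narrower type.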
Context
iscc-search uses LMDB as the source of truth and usearch indexes as derived acceleration structures. If a shard is corrupted, iscc-search can rebuild the derived index from LMDB — but only if it gets a clean Python exception instead of a segfault. Graceful error handling here enables automatic recovery rather than process death.
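The recovery path described above reduces to a small fallback. In this sketch `load` stands in for restoring a shard (for example via `Index.load`) and `rebuild` for an iscc-search routine that re-indexes the shard's entries from LMDB; both are hypothetical callables passed in so the example runs without either library.

```python
import logging
from pathlib import Path
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def load_or_rebuild(
    path: Path,
    load: Callable[[Path], T],
    rebuild: Callable[[Path], T],
) -> T:
    """Load a derived shard; if loading raises, rebuild it from the source of truth."""
    try:
        return load(path)
    except Exception:
        # Only reachable when the failure surfaces as a Python exception;
        # a segfault inside `load` ends the process before this line runs.
        logger.warning("Shard %s failed to load; rebuilding from source", path, exc_info=True)
        return rebuild(path)
```

If `rebuild` writes its result through `atomic_write()`, an interrupted rebuild leaves the old file in place rather than a half-written one.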