Graceful error handling for corrupted shard files #12

@titusz

Problem

If a .usearch shard file is corrupted or truncated, the usearch C++ layer may segfault rather than raising a clean Python exception. The atomic_write() pattern reduces this risk significantly, but doesn't eliminate it entirely (storage-level corruption, filesystem errors, interrupted OS-level writes).

Current behavior:

  • _restore_shard() calls Index.metadata(str(path)) which returns None if metadata can't be read — this case is handled
  • But if metadata reads fine and the HNSW graph structure is corrupted, Index.load() / Index.view() may hit out-of-bounds memory access in C++, causing a segfault
  • _load_existing() raises RuntimeError if _restore_shard() returns None, but if the C++ layer crashes first, the process is killed by the signal with no Python traceback, and no except block or cleanup code gets a chance to run
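
To illustrate the last point, here is a minimal sketch that does not use usearch at all. It uses os.abort() as a stand-in for a native crash (an assumption made so the example runs without a corrupted shard file) and shows that a try/except inside the crashing process never fires. The crash is only observable from a parent process, through the exit status. The helper name run_isolated is hypothetical.

```python
import subprocess
import sys

# Child program that terminates abnormally inside a try block.
# os.abort() stands in for a crash in native code; like a segfault,
# it ends the process without raising a Python exception.
CRASHING_CHILD = """
import os
try:
    os.abort()
except BaseException:
    print("caught")
"""


def run_isolated(source: str) -> subprocess.CompletedProcess:
    """Run Python source in a separate interpreter and capture its result."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )


result = run_isolated(CRASHING_CHILD)
# A non-zero exit status is the only signal the parent receives.
crashed = result.returncode != 0
# The except block in the child never ran, so nothing was printed.
handler_ran = "caught" in result.stdout
```

The same pattern could in principle be used to probe a suspicious shard in a child process before loading it in the main process, at the cost of loading the file twice.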

Proposal

Wrap shard restore/load operations in try/except at the Python level to catch C++ exceptions that make it through to Python. This covers the failures usearch reports as exceptions; it cannot intercept a genuine segfault:

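def _restore_shard(self, path):
    """Restore a shard from disk; return None if it cannot be read."""
    try:
        # metadata() returns None when the file metadata can't be read
        meta = Index.metadata(str(path))
        if meta is None:
            return None
        idx = self._create_shard()
        idx.load(str(path))
        return idx
    except Exception as e:
        # Only errors that surface as Python exceptions land here;
        # a segfault in the C++ layer still terminates the process.
        logger.warning("Failed to restore shard %s: %s", path, e)
        return None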

Consider also:

  • A dedicated CorruptedShardError exception type so consumers can catch it specifically
  • Logging the corrupted shard path for diagnosis
  • Optional on_corrupt="warn" / on_corrupt="raise" parameter to control behavior
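
A rough sketch of how these three ideas could fit together follows. Everything in it is hypothetical: the CorruptedShardError fields, the standalone restore_shard function, and the loader callable (which stands in for the real usearch calls so the example runs on its own). Subclassing RuntimeError is one possible choice that would keep existing callers that catch RuntimeError working.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class CorruptedShardError(RuntimeError):
    """Raised when a shard file cannot be restored (hypothetical type)."""

    def __init__(self, path: Path, reason: str):
        super().__init__(f"Corrupted shard {path}: {reason}")
        self.path = path
        self.reason = reason


def restore_shard(
    path: Path,
    loader: Callable[[Path], T],
    on_corrupt: str = "warn",
) -> Optional[T]:
    """Load a shard via `loader`, handling failures per `on_corrupt`.

    `loader` is a placeholder for the real restore logic.
    """
    if on_corrupt not in ("warn", "raise"):
        raise ValueError(f"on_corrupt must be 'warn' or 'raise', got {on_corrupt!r}")
    try:
        return loader(path)
    except Exception as exc:
        if on_corrupt == "raise":
            # Chain the original error so the root cause stays visible.
            raise CorruptedShardError(path, str(exc)) from exc
        # Always include the shard path to make diagnosis possible.
        logger.warning("Failed to restore shard %s: %s", path, exc)
        return None
```

With on_corrupt="warn" the caller receives None and decides what to do; with on_corrupt="raise" the caller can catch CorruptedShardError and read its path attribute.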

Context

iscc-search uses LMDB as the source of truth and usearch indexes as derived acceleration structures. If a shard is corrupted, iscc-search can rebuild the derived index from LMDB — but only if it gets a clean Python exception instead of a segfault. Graceful error handling here enables automatic recovery rather than process death.
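
The recovery path described above might look roughly like this on the consumer side. This is a sketch under assumptions, not iscc-search code: load_or_rebuild, the restore and rebuild callables, and the ".corrupt" suffix are all hypothetical names chosen for illustration. restore is assumed to return None for an unreadable shard, and rebuild is assumed to regenerate the index from the source of truth.

```python
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_or_rebuild(
    path: Path,
    restore: Callable[[Path], Optional[T]],
    rebuild: Callable[[Path], T],
) -> T:
    """Return the restored shard, or rebuild it if restore reports failure."""
    index = restore(path)
    if index is not None:
        return index
    # Move the bad file aside instead of deleting it, so it can be inspected.
    if path.exists():
        path.replace(path.with_name(path.name + ".corrupt"))
    # Regenerate the derived index from the source of truth.
    return rebuild(path)
```

This only works when the failure arrives as a return value or exception; a segfault during restore would end the process before the rebuild branch is reached.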
