This is Photon DB, a high-performance vector database built in Rust with Python bindings. It uses HNSW (Hierarchical Navigable Small World) graphs to make nearest-neighbor search really fast.
- Fast: implemented in Rust, so it's optimized for speed
- Easy Python API: a simple plug-and-play interface
- Persistence: saves/loads indexes to disk instantly using zero-copy serialization (rkyv)
- Customizable: fine-tune parameters like `m` and `ef_construction` to match your workload
You just need the Rust toolchain and Python 3.7+ installed.
# 1. make a venv
python3 -m venv venv
source venv/bin/activate
# 2. install dependencies
pip install maturin numpy sentence-transformers tqdm torch --index-url https://download.pytorch.org/whl/cpu
# 3. build and install
maturin develop --release

The main class is `PyHNSW`; here's how to use it.
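Putting it together, a minimal end-to-end sketch. The exact call signatures below are assumptions pieced together from the parameter descriptions in this README, so treat this as illustrative rather than authoritative:

```python
import photon_db  # installed into the venv by `maturin develop --release`

# Illustrative parameters; tune m / ef_construction for your workload.
index = photon_db.PyHNSW(max_elements=10_000, dim=384, m=16, ef_construction=200)

# Insert a few vectors (normally these come from an embedding model).
vectors = [[0.1] * 384, [0.2] * 384, [0.9] * 384]
for vec in vectors:
    index.insert(vec, m=16, m_max=32, ef_construction=200, m_l=1.0)

# Ask for the 2 nearest neighbors.
results = index.search(query=[0.1] * 384, k=2, ef_search=50)
for distance, doc_id in results:
    print(doc_id, distance)

# Persist and reload.
index.save("my_index.pho")
index = photon_db.PyHNSW.load("my_index.pho")
```

The parameter docs below explain what each of these knobs does.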
Initializes the index.

- `max_elements`: estimate of how many vectors you'll have (the index can grow, so a rough number is fine)
- `dim`: dimensionality of your vectors (e.g. 384 for MiniLM, 1536 for OpenAI)
- `m`: max outgoing connections per node. Tip: 16-64 is usually good; higher = better recall but a bigger index
- `ef_construction`: candidate-list size during build. Tip: keep it between 100 and 500; higher means better graph quality but longer builds
Inserts a single vector.

- `vec`: the vector embedding (a list of floats)
- `m`: max connections for this insert (usually the same as at init)
- `m_max`: max allowed connections per layer (usually `m * 2`)
- `ef_construction`: candidate-list depth for this insert (same as at init)
- `m_l`: level-generation factor (default `1.0`)
- Returns: the internal doc ID (an int)
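The `m_l` factor sets how node levels are drawn when building the hierarchy. In standard HNSW, a node's top level is sampled as `floor(-ln(U) * m_l)` for uniform `U` in (0, 1), so larger `m_l` produces taller hierarchies. A pure-Python sketch (independent of Photon DB, just illustrating the distribution):

```python
import math
import random

def sample_level(m_l: float, rng: random.Random) -> int:
    # HNSW level assignment: floor(-ln(U) * m_l), with U ~ Uniform(0, 1).
    # Larger m_l -> more nodes promoted to upper layers.
    return int(-math.log(rng.random()) * m_l)

rng = random.Random(42)
levels = [sample_level(1.0, rng) for _ in range(100_000)]
print(max(levels), sum(1 for level in levels if level == 0) / len(levels))
```

With `m_l = 1.0`, roughly 63% of nodes stay on level 0 and each higher level is about `e` times rarer.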
Runs the approximate nearest-neighbor search.

- `query`: your query vector
- `k`: how many neighbors you want back
- `ef_search`: search depth. Tip: set this to `k` or up to `k * 10`; higher values mean better accuracy but higher latency
- Returns: a list of results sorted by distance: `[(distance, doc_id), ...]`
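To see what `ef_search` actually controls, here is a pure-Python sketch of the bounded best-first search HNSW runs on a single graph layer. This illustrates the algorithm only; it is not Photon DB's Rust implementation:

```python
import heapq
import math

def search_layer(vectors, neighbors, entry, query, ef):
    # Best-first search over one graph layer, keeping at most
    # `ef` result candidates: this is the knob ef_search exposes.
    visited = {entry}
    d0 = math.dist(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: closest unexplored first
    best = [(-d0, entry)]        # max-heap (negated) of the ef best so far
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -best[0][0]:
            break  # nearest candidate is worse than our worst kept result
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(vectors[nb], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst
    return sorted((-d, n) for d, n in best)  # [(distance, node_id), ...]

# Tiny example graph: 5 points on a line, chained neighbors.
vectors = {i: [float(i)] for i in range(5)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(search_layer(vectors, neighbors, entry=0, query=[3.2], ef=3))
```

A larger `ef` widens the frontier, so the search escapes more local minima (better recall) at the cost of more distance computations (higher latency).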
Performs an exact search by checking every single vector; mostly useful for testing recall/accuracy.
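A pure-Python sketch of what an exact search does, plus how you would use it to measure recall. This is illustrative only; the ANN result IDs below are stand-ins for what `index.search(...)` would return:

```python
import math

def brute_force_search(vectors, query, k):
    # Exact k-NN: score every vector, sort by distance, keep the top k.
    scored = sorted((math.dist(v, query), doc_id) for doc_id, v in vectors.items())
    return scored[:k]  # [(distance, doc_id), ...]

vectors = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [0.1, 0.0], 3: [5.0, 5.0]}
print(brute_force_search(vectors, [0.0, 0.1], k=2))

# Recall@k: fraction of the exact top-k that the ANN search also returned.
exact_ids = {doc_id for _, doc_id in brute_force_search(vectors, [0.0, 0.1], k=2)}
ann_ids = {0, 1}  # stand-in for ids from index.search(...)
recall = len(exact_ids & ann_ids) / len(exact_ids)
```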
Saves the whole graph to disk.
Static method to load a saved index.

db = photon_db.PyHNSW.load("my_index.pho")

Latest Benchmark Output (SIFT10k)
╔══════════════════════════════════════════════════════════════╗
║ Dataset: SIFT10k (128d) | Mode: Strict ║
╚══════════════════════════════════════════════════════════════╝
[1/4] Loading SIFT10k Dataset...
Loaded 10000 base vectors
Loaded 100 query vectors
[2/4] Building HNSW Index...
[00:00:03] [████████████████████████████████████████] 10000/10000 (0s) Index Time: 3.85s (2598 items/sec)
[3/4] Running Queries (k=1)...
[4/4] Verifying against Ground Truth...
════════════════ FINAL RESULTS ════════════════
Engine: Photon-DB
Dataset: 10000 vectors
Avg Latency: 53 µs
Throughput: 18 items/s
Recall@1: 97%
═══════════════════════════════════════════════