
Pure Python vector database • int8 quantized • ~1100 QPS @ 50k vectors • single file • no compile • MIT

SherifSystems/PythonVectorDB

PythonVectorDB


Pure Python vector database with Int8 quantization and lazy deletion.


🚀 Features

  • 🧠 Int8 Quantization: 4x memory savings with minimal accuracy loss
  • ⚡ Fast Search: Numba-optimized cosine similarity with parallel processing
  • 🗑️ Lazy Deletion: Efficient deletion with threshold-based compaction
  • 🔒 Thread-Safe: All operations protected by locks
  • 💾 Binary Save/Load: Fast persistence using NumPy's compressed format
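
The int8 idea behind the first feature can be sketched in a few lines: store each vector as int8 codes plus a per-vector float scale. This is a standalone illustration, not the quantizer in pythonvectordb.py, whose exact scheme may differ:

```python
import numpy as np

def quantize_int8(v):
    """Map a float32 vector to int8 codes plus a per-vector scale factor."""
    scale = max(float(np.abs(v).max()) / 127.0, 1e-12)  # guard against all-zero input
    q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximate float32 vector from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
v = rng.standard_normal(128).astype(np.float32)
q, scale = quantize_int8(v)
v_hat = dequantize(q, scale)

print(v.nbytes // q.nbytes)  # 4 -- the 4x memory saving on raw vector data

# Rounding error is at most scale/2 per component, so cosine similarity
# between the original and the reconstruction stays very close to 1.
cos = float(v @ v_hat) / float(np.linalg.norm(v) * np.linalg.norm(v_hat))
```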

📦 Installation

```shell
pip install numpy numba
```

Then copy pythonvectordb.py to your project.


🎯 Quick Start

```python
import numpy as np
from pythonvectordb import PythonVectorDB

# Create database
db = PythonVectorDB(dimension=128)

# Add vectors
vectors = np.random.randn(1000, 128).astype(np.float32)
db.add_vectors(vectors)

# Search
query = np.random.randn(128).astype(np.float32)
results = db.search(query, k=10)

for vector_id, score, metadata in results:
    print(f"{vector_id}: {score:.4f}")
```

📚 API Reference

Initialize

```python
db = PythonVectorDB(dimension=128, initial_capacity=10000)
```

Add Vectors

```python
db.add_vectors(
    vectors,              # np.ndarray of shape (n, dimension)
    vector_ids=None,      # Optional list of IDs
    metadata=None         # Optional list of dicts
)
```

Search

```python
results = db.search(
    query,                # np.ndarray of shape (dimension,)
    k=10,                 # Number of results
    filter_fn=None        # Optional filter function
)
# Returns: List[(vector_id, score, metadata)]
```

**Performance Note:** Heavy metadata filtering on >300k vectors adds Python-side overhead.
For high-volume filtering, pre-partition data or use external ID filtering.
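
One way to keep filtering out of the per-candidate Python path is to turn an allowed-ID set into a NumPy mask before scoring. This is a standalone sketch of the idea, not the library's internals, and `cosine_topk` is a hypothetical helper:

```python
import numpy as np

def cosine_topk(db_vecs, query, k=5, allowed_ids=None):
    """Brute-force cosine top-k with optional ID pre-filtering.
    A boolean mask keeps the filter vectorized instead of invoking a
    Python predicate once per candidate."""
    denom = np.linalg.norm(db_vecs, axis=1) * np.linalg.norm(query)
    scores = (db_vecs @ query) / np.maximum(denom, 1e-12)
    if allowed_ids is not None:
        mask = np.zeros(len(db_vecs), dtype=bool)
        mask[list(allowed_ids)] = True
        scores = np.where(mask, scores, -np.inf)  # excluded rows can never win
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order if np.isfinite(scores[i])]

rng = np.random.default_rng(1)
vecs = rng.standard_normal((1000, 128)).astype(np.float32)
# Only 4 IDs are allowed, so at most 4 results come back even with k=5.
hits = cosine_topk(vecs, vecs[3], k=5, allowed_ids={3, 10, 42, 99})
```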

Save/Load

```python
db.save("database.npz")
db = PythonVectorDB.load("database.npz")
```
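
The `.npz` extension maps naturally onto `np.savez_compressed`. The sketch below shows that round trip for an int8 matrix plus per-vector scales; the array names (`vectors`, `scales`) are illustrative, not necessarily the schema `save` actually writes:

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
codes = rng.integers(-127, 128, size=(100, 128)).astype(np.int8)
scales = rng.random(100).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "database.npz")
    # One compressed file holds every named array.
    np.savez_compressed(path, vectors=codes, scales=scales)
    with np.load(path) as loaded:
        roundtrip_ok = (np.array_equal(loaded["vectors"], codes)
                        and np.array_equal(loaded["scales"], scales))

print(roundtrip_ok)  # True
```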

Delete Vector

```python
db.delete_vector(vector_id)  # Lazy deletion
```
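
The pattern behind lazy deletion can be sketched independently: mark rows dead in O(1) and only pay for compaction once the dead fraction crosses a threshold. `LazyStore` and the 30% threshold here are illustrative assumptions, not the library's actual values:

```python
import numpy as np

class LazyStore:
    """Tombstone rows on delete; compact once dead_fraction >= threshold."""

    def __init__(self, vectors, threshold=0.3):
        self.vectors = vectors
        self.alive = np.ones(len(vectors), dtype=bool)
        self.threshold = threshold

    def delete(self, row):
        self.alive[row] = False                       # O(1): just flip a flag
        if 1.0 - self.alive.mean() >= self.threshold:
            self.vectors = self.vectors[self.alive]   # amortized O(n) copy
            self.alive = np.ones(len(self.vectors), dtype=bool)

store = LazyStore(np.zeros((10, 128), dtype=np.int8))
store.delete(0); store.delete(1)   # 20% dead: no compaction yet
store.delete(2)                    # 30% dead: triggers compaction
print(len(store.vectors))  # 7
```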

Get Stats

```python
stats = db.get_stats()
print(stats)  # Memory usage, QPS, latencies
```

⚡ Performance

Benchmarked with 128-dimensional vectors at several database sizes:

| Database Size | Search QPS | Memory/Vector |
| --- | ---: | ---: |
| 1,000 vectors | 16,619 | 640 bytes |
| 10,000 vectors | 3,676 | 466 bytes |
| 50,000 vectors | 1,159 | 608 bytes |
| 100,000 vectors | 448 | 466 bytes |

Peak Performance:

  • Insert: 1.27M vectors/sec (batch size 1,000)
  • Memory Efficiency: 466 bytes/vector (4x savings vs float32)
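
The 4x figure is just the raw-storage arithmetic; the measured ~466 bytes/vector additionally carries ID and bookkeeping overhead:

```python
dim = 128
float32_bytes = dim * 4  # 512 bytes/vector uncompressed
int8_bytes = dim * 1     # 128 bytes/vector quantized (plus a small scale factor)
print(float32_bytes // int8_bytes)  # 4
```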

🧪 Testing

Install the dependencies, then run the benchmark suite:

```shell
pip install -r requirements.txt  # includes psutil for the benchmarks
python benchmark_suite.py        # performance benchmarks
```

The suite runs out of the box once the dependencies are installed – no further setup required.


📄 License

MIT License – see pythonvectordb.py for details.


🤝 Contributing

Issues and PRs welcome! This is a single-file project – keep it simple.


PythonVectorDB – the vector database that actually works in pure Python.

