This project ingests train.csv, generates 1024-dimensional embeddings with intfloat/e5-large-v2, stores the vectors in multiple backends, and benchmarks Recall@K, query latency, and QPS throughput.
Special thanks to Tahir Saeed for his collaboration on this project.
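For context on the embedding step: the intfloat/e5-* model family is trained with "query: " and "passage: " text prefixes, so inputs are typically prefixed before encoding (search queries get "query: ", corpus rows "passage: "). A minimal sketch of that prefixing; exactly how this repo applies prefixes is an assumption:

```python
def e5_prefix(texts, mode="passage"):
    """Prepend the prefix the e5 model family expects.

    "query: " for search queries, "passage: " for corpus rows.
    (Illustrative helper; not necessarily how this repo does it.)
    """
    if mode not in ("query", "passage"):
        raise ValueError("mode must be 'query' or 'passage'")
    return [f"{mode}: {t}" for t in texts]
```

The prefixed strings are then passed to the model (e.g. via sentence-transformers) to produce the 1024-d vectors.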
Supported backends:
- pgvector (PostgreSQL)
- chroma
- qdrant
- weaviate
- milvus
- faiss
- Python 3.9+
- Docker + Docker Compose (for non-FAISS backends)
Install dependencies:
pip3 install -r requirements.txt

Use run_benchmarks.py to orchestrate docker / insert / recall benchmark / QPS steps across one or more backends.
python3 run_benchmarks.py -d all -D -i -r -q -o outputs/benchmark.csv

- -d, --db-name (required): target backends. Use comma-separated names (pgvector,qdrant) or all.
- -D, --docker: start backend containers using each backend's compose file. faiss is N/A (local backend).
- -i, --insert: run backend insert scripts. If enabled, shared embeddings are precomputed once before the first backend insert unless disabled.
- -r, --recall: run recall/latency benchmark scripts (benchmark-*.py).
- -q, --qps: run throughput benchmark scripts (benchmark-qps-*.py).
- -I, --insert-args "...": pass-through args only for insert scripts.
- -R, --recall-args "...": pass-through args only for recall scripts.
- -Q, --qps-args "...": pass-through args only for QPS scripts.
- -o, --benchmark-csv-path: output path for the consolidated benchmark CSV (default: benchmark.csv).
- Backward-compatible aliases are still accepted: -dbname, --dbname, and --recal.
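The -d/--db-name handling described above can be sketched as follows; the backend list comes from the supported backends, but validation details are assumptions:

```python
ALL_BACKENDS = ["pgvector", "chroma", "qdrant", "weaviate", "milvus", "faiss"]


def expand_db_names(value):
    """Expand a -d/--db-name value ("all" or comma-separated names)
    into a validated list of backends."""
    if value == "all":
        return list(ALL_BACKENDS)
    names = [n.strip() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in ALL_BACKENDS]
    if unknown:
        raise SystemExit(f"unknown backend(s): {', '.join(unknown)}")
    return names
```

For example, `expand_db_names("pgvector,qdrant")` yields `["pgvector", "qdrant"]`.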
Parameter defaults and behavior:

- At least one action flag is required: --docker, --insert, --recall, or --qps (or short forms).
- --db-name all (or -d all) expands to: pgvector, chroma, qdrant, weaviate, milvus, faiss.
- --distance cosine is auto-added to insert/recall/qps arg groups if missing.
- --num-queries 480 is auto-added to recall args if missing.
- The --insert precompute step is skipped when --insert-args includes --no-embedding-cache.
- Invalid quoted arg strings (for example unmatched quotes) in --insert-args, --recall-args, or --qps-args will stop execution with a parser error.
- Docker handling:
  - Fatal and immediate stop: Docker daemon not running, or Docker CLI not found.
  - Per-DB docker/container issues are reported in the summary; that DB's benchmark steps are marked as skipped while other DBs continue.
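The quoted-arg behavior can be reproduced with the standard library's shlex, which raises on unmatched quotes; whether the orchestrator actually uses shlex is an assumption:

```python
import shlex


def split_passthrough(arg_string):
    """Split a quoted pass-through string (e.g. the value of --recall-args)
    into argv-style tokens.

    shlex.split raises ValueError on unmatched quotes, which maps to the
    orchestrator's fail-before-running-anything parser error.
    """
    try:
        return shlex.split(arg_string)
    except ValueError as exc:
        raise SystemExit(f"invalid pass-through args {arg_string!r}: {exc}")
```

For example, `split_passthrough('--k-values 1,5,10 --num-queries 300')` returns four tokens, while an unmatched quote aborts with an error.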
Template:
python3 run_benchmarks.py -d <db|db1,db2|all> <action flags>

Every possible action combination:
# 1 action
python3 run_benchmarks.py -d all -D
python3 run_benchmarks.py -d all -i
python3 run_benchmarks.py -d all -r
python3 run_benchmarks.py -d all -q
# 2 actions
python3 run_benchmarks.py -d all -D -i
python3 run_benchmarks.py -d all -D -r
python3 run_benchmarks.py -d all -D -q
python3 run_benchmarks.py -d all -i -r
python3 run_benchmarks.py -d all -i -q
python3 run_benchmarks.py -d all -r -q
# 3 actions
python3 run_benchmarks.py -d all -D -i -r
python3 run_benchmarks.py -d all -D -i -q
python3 run_benchmarks.py -d all -D -r -q
python3 run_benchmarks.py -d all -i -r -q
# 4 actions
python3 run_benchmarks.py -d all -D -i -r -q

Recommended practical examples:
# Insert only for pgvector and qdrant
python3 run_benchmarks.py -d pgvector,qdrant -i
# Force rebuild of shared embedding cache before insert
python3 run_benchmarks.py -d all -i -I "--force-rebuild-embeddings"
# Use a custom embedding cache directory
python3 run_benchmarks.py -d all -i -I "--cache-dir .cache/embeddings"
# Disable shared cache and fall back to per-script embedding
python3 run_benchmarks.py -d all -i -I "--no-embedding-cache"
# Recall benchmark for faiss only
python3 run_benchmarks.py -d faiss -r
# Forward extra args to recall benchmark scripts
python3 run_benchmarks.py -d all -r -R "--k-values 1,5,10 --num-queries 300"
# Forward extra args to QPS scripts
python3 run_benchmarks.py -d qdrant -q -Q "--k 10 --seconds 20 --concurrency 8"

QPS scripts can be run via run_benchmarks.py -q (or --qps) or standalone per backend.
Common example pattern:
python3 <backend>/benchmark-qps-<backend>.py --distance cosine --k 10 --seconds 20 --concurrency 8

Examples:
python3 pgvector/benchmark-qps-pgvector.py --distance cosine --k 10 --seconds 20 --concurrency 8
python3 chroma/benchmark-qps-chroma.py --distance cosine --k 10 --seconds 20 --concurrency 8
python3 qdrant/benchmark-qps-qdrant.py --distance cosine --k 10 --seconds 20 --concurrency 8
python3 weaviate/benchmark-qps-weaviate.py --distance cosine --k 10 --seconds 20 --concurrency 8
python3 milvus/benchmark-qps-milvus.py --distance cosine --k 10 --seconds 20 --concurrency 8
python3 faiss/benchmark-qps-faiss.py --distance cosine --k 10 --seconds 20 --concurrency 8

Each benchmark-*.py script prints each run's results in a standardized format:
===============
Benchmark Results
===============
Run: default
Distance: cosine
Measured queries: 480
-------------------------------------
Recall@1: 1.0000
Recall@5: 0.9880
Recall@10: 0.9860
-------------------------------------
Latency avg: 0.25 ms
Latency p50: 0.16 ms
Latency p95: 0.37 ms
-------------------------------------
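In this single-ground-truth setting, Recall@K reduces to a membership test over the top-K results; a minimal sketch (how the repo constructs ground truth, e.g. exact nearest-neighbor ids, is an assumption):

```python
def recall_at_k(retrieved_ids, true_id, k):
    """Recall@K for one query with a single ground-truth neighbor:
    1.0 if the true id appears in the top-k results, else 0.0."""
    return 1.0 if true_id in retrieved_ids[:k] else 0.0


def mean_recall(all_retrieved, all_true, k):
    """Average Recall@K over all measured queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_true)]
    return sum(scores) / len(scores)
```

With 480 measured queries, the printed Recall@1/5/10 values are just `mean_recall` at k = 1, 5, and 10.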
QPS scripts (benchmark-qps-*.py) print in this standardized format:
===============
Benchmark Results
===============
Run: default
Distance: cosine
k: 10
-------------------------------------
Concurrency: 8
Duration (s): 20.00 (warmup 2.00s)
Measured queries: 394099
QPS: 21894.39
-------------------------------------
Latency avg: 0.36 ms
Latency p50: 0.30 ms
Latency p95: 0.71 ms
Latency p99: 1.36 ms
-------------------------------------
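The QPS figure above is measured queries divided by measured duration, with percentiles taken over per-query latencies; a minimal sketch (the nearest-rank percentile method used here is an assumption about the scripts' exact math):

```python
import statistics


def qps_summary(latencies_s, duration_s):
    """Summarize a QPS run: throughput plus latency stats in ms.

    latencies_s: per-query latencies in seconds, recorded after warmup.
    duration_s:  measured wall-clock duration in seconds.
    """
    lat_ms = sorted(l * 1000.0 for l in latencies_s)

    def pct(p):
        # Nearest-rank percentile over the sorted latencies.
        idx = min(len(lat_ms) - 1, max(0, round(p / 100.0 * len(lat_ms)) - 1))
        return lat_ms[idx]

    return {
        "qps": len(lat_ms) / duration_s,
        "avg_ms": statistics.fmean(lat_ms),
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
    }
```

For the sample above: 394099 measured queries over 20.00 s gives roughly 21894.39 / s.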
After all benchmark scripts finish, run_benchmarks.py prints separate final tables:
- Final Recall Benchmark Table
- Final QPS Benchmark Table
Recall table columns:
DB, Run, Distance, Queries, Recall@1, Recall@5, Recall@10, Lat avg (ms), Lat p50 (ms), Lat p95 (ms)
QPS table columns:
DB, Run, Distance, Queries, QPS, Lat avg (ms), Lat p50 (ms), Lat p95 (ms), Lat p99 (ms)
run_benchmarks.py also writes a consolidated benchmark CSV via -o/--benchmark-csv-path (default: benchmark.csv).
run_benchmarks.py also prints a consolidated insert table:
DB, Rows, Embedding Source, Embedding time (s), Write time (s), Build time (s)
Insert scripts:
- pgvector/insert-data-pgvector.py
- chroma/insert-data-chroma.py
- qdrant/insert-data-qdrant.py
- weaviate/insert-data-weaviate.py
- milvus/insert-data-milvus.py
- faiss/insert-data-faiss.py

Recall/latency benchmark scripts:
- pgvector/benchmark-pgvector.py
- chroma/benchmark-chroma.py
- qdrant/benchmark-qdrant.py
- weaviate/benchmark-weaviate.py
- milvus/benchmark-milvus.py
- faiss/benchmark-faiss.py

QPS benchmark scripts:
- pgvector/benchmark-qps-pgvector.py
- chroma/benchmark-qps-chroma.py
- qdrant/benchmark-qps-qdrant.py
- weaviate/benchmark-qps-weaviate.py
- milvus/benchmark-qps-milvus.py
- faiss/benchmark-qps-faiss.py

Shared embedding utilities:
- shared/embedding_cache.py
- shared/precompute_embeddings.py

Docker compose files:
- docker-compose-pgvector.yml
- docker-compose-chroma.yml
- docker-compose-qdrant.yml
- docker-compose-weviate.yml
- docker-compose-milvus.yml
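A minimal sketch of the NPZ + JSON cache layout the shared embedding utilities imply; the file names used here (embeddings.npz, meta.json) are illustrative assumptions, not the repo's actual names:

```python
import json
from pathlib import Path

import numpy as np


def save_cache(cache_dir, embeddings, meta):
    """Persist an embedding matrix as compressed NPZ plus a JSON sidecar
    holding metadata (model name, digest, etc.)."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    np.savez_compressed(cache_dir / "embeddings.npz", embeddings=embeddings)
    (cache_dir / "meta.json").write_text(json.dumps(meta))


def load_cache(cache_dir):
    """Return (embeddings, meta), or (None, None) if the cache is absent."""
    cache_dir = Path(cache_dir)
    npz_path = cache_dir / "embeddings.npz"
    meta_path = cache_dir / "meta.json"
    if not (npz_path.exists() and meta_path.exists()):
        return None, None
    with np.load(npz_path) as data:
        embeddings = data["embeddings"]
    return embeddings, json.loads(meta_path.read_text())
```

Each insert script can then call `load_cache` first and only embed from scratch on a miss.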
- run_benchmarks.py defaults to --distance cosine for insert and recall benchmarks if you do not pass --distance.
- run_benchmarks.py defaults recall benchmark scripts to --num-queries 480 if you do not pass --num-queries.
- Benchmarks default --warmup-queries to 0, so measured queries match requested queries by default.
- The shared embedding cache uses NPZ + JSON.
- The cache invalidates and rebuilds based on the CSV content digest, model name, and embedding text prefix version.
- The same cached vectors are reused by all insert scripts in a run.
- Use the same --distance family during insert and recall/QPS benchmarking for fair comparisons.
- The Milvus benchmark fails fast if the requested --distance does not match the collection index metric.
- The Weaviate benchmark retries its initial connection to handle container startup/readiness.
- FAISS runs locally and does not require Docker.
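The digest-based cache invalidation can be sketched as below; the exact key composition in shared/embedding_cache.py is an assumption:

```python
import hashlib
import json


def cache_key(csv_path, model_name, prefix_version):
    """Digest deciding whether cached embeddings can be reused.

    Any change to the CSV bytes, the model name, or the text-prefix
    scheme yields a different key, forcing a rebuild.
    """
    h = hashlib.sha256()
    with open(csv_path, "rb") as f:
        # Stream the CSV in 1 MiB chunks so large files don't load into RAM.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    h.update(json.dumps({"model": model_name, "prefix_v": prefix_version}).encode())
    return h.hexdigest()
```

On each run, the computed key is compared against the one stored in the cache's JSON metadata; a mismatch triggers re-embedding.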
Common:
- CSV_PATH (default: train.csv)
- EMBEDDING_MODEL (default: intfloat/e5-large-v2)
pgvector benchmark:
- DATABASE_URL
- TABLE_NAME, ID_COLUMN, EMBEDDING_COLUMN, GENRE_COLUMN
chroma:
- CHROMA_MODE, CHROMA_HOST, CHROMA_PORT, CHROMA_PERSIST_DIR
- COLLECTION_NAME
qdrant:
- QDRANT_URL
- COLLECTION_NAME
weaviate:
- WEAVIATE_HTTP_HOST, WEAVIATE_HTTP_PORT
- WEAVIATE_GRPC_HOST, WEAVIATE_GRPC_PORT
- WEAVIATE_SECURE, WEAVIATE_COLLECTION
milvus:
- MILVUS_HOST, MILVUS_PORT
- MILVUS_COLLECTION, MILVUS_INDEX_NAME
- MILVUS_DISTANCE / MILVUS_METRIC
faiss:
- FAISS_DISTANCE, FAISS_INDEX_TYPE
- FAISS_IVF_NLIST, FAISS_HNSW_M
- FAISS_OUTPUT_DIR
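As one illustration of how a script might read these variables, here is a hedged sketch for the FAISS group; every default value shown is a placeholder assumption, not the repo's actual default:

```python
import os


def faiss_config():
    """Read FAISS settings from the environment with typed fallbacks.

    The fallback values below are illustrative placeholders; check the
    faiss scripts themselves for the authoritative defaults.
    """
    return {
        "distance": os.environ.get("FAISS_DISTANCE", "cosine"),
        "index_type": os.environ.get("FAISS_INDEX_TYPE", "flat"),
        "ivf_nlist": int(os.environ.get("FAISS_IVF_NLIST", "1024")),
        "hnsw_m": int(os.environ.get("FAISS_HNSW_M", "32")),
        "output_dir": os.environ.get("FAISS_OUTPUT_DIR", "faiss/index"),
    }
```

Keeping all env reads in one typed function makes the knobs discoverable and avoids scattering `os.environ` lookups through the script.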






