This repository contains the source code and experimental framework for my Master's Thesis at the University of West Attica. It implements a benchmark of vector database management systems (VDBMS) and embedding models, evaluating Recall, Precision, nDCG, and query latency.
This project was developed using Python 3.13.9. To ensure reproducibility, install the pinned versions:
pip install -r requirements.txt

Create a .env file in the root directory to configure the API keys (OpenAI, Pinecone) and the Milvus connection:
# OpenAI embeddings (used by --model openai)
OPENAI_API_KEY=your_openai_key_here
# Pinecone access (used by 30_build_indexes.py and 40_db_benchmark.py)
PINECONE_API_KEY=your_pinecone_key_here
PINECONE_INDEX_FIQA=your_fiqa_index_name_here
PINECONE_INDEX_MOVIELENS=your_movielens_index_name_here
# Milvus configuration
MILVUS_HOST=your_milvus_host_here
MILVUS_PORT=your_milvus_port_here

Run Milvus locally as a standalone container; installation instructions are available in the official Milvus documentation.
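Once the container is up, a quick connectivity check can be done with the pymilvus client. This is a minimal sketch on my part, not code from the repo's scripts; the host/port fall back to the standalone defaults:

# Minimal connectivity check for a standalone Milvus instance (assumes pymilvus is installed).
import os
from pymilvus import connections, utility

connections.connect(
    alias="default",
    host=os.getenv("MILVUS_HOST", "localhost"),
    port=os.getenv("MILVUS_PORT", "19530"),
)
print(utility.get_server_version())  # prints the server version if Milvus is reachable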
Ensure you have created the necessary indexes in your Pinecone console (Serverless or Pod-based) and added the credentials to the .env file.
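Indexes can also be created programmatically instead of through the console. A minimal sketch using the current pinecone Python client follows; the index name, cloud, and region are illustrative placeholders, and the dimension matches the 384-D MiniLM embeddings used in Phase 1:

# Hypothetical index creation; name, cloud, and region are placeholders for your own values.
import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
pc.create_index(
    name="fiqa-mini",      # placeholder; must match PINECONE_INDEX_FIQA in .env
    dimension=384,         # all-MiniLM-L6-v2 output dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)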
Download and prepare the BEIR FiQA and MovieLens 20M datasets.
python scripts/00_get_data.py
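For reference, the FiQA part of this step follows the standard BEIR download pattern. The sketch below is illustrative, not the script's exact code, and covers FiQA only (MovieLens is fetched separately):

# Standard BEIR loading pattern for FiQA; the output directory is illustrative.
from beir import util
from beir.datasets.data_loader import GenericDataLoader

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/fiqa.zip"
data_path = util.download_and_unzip(url, "datasets/")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")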
Goal: Study infrastructure behavior (FAISS, Chroma, Milvus, Pinecone) using all-MiniLM-L6-v2.

1. Generate Embeddings
Create 384-dimensional embeddings for both datasets using SentenceTransformers.
python scripts/10_make_embeddings.py --model mini --dataset all
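Under the hood this amounts to the standard SentenceTransformers encoding call. A minimal sketch; the batch size and normalization flag are assumptions, not necessarily the script's settings:

# Encode texts into 384-D vectors; normalizing makes dot product equal cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["what is a margin call?", "how are dividends taxed?"]  # illustrative inputs
embeddings = model.encode(texts, batch_size=64, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)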
2. Generate Ground Truth
Calculate exact k-NN (brute force) results to serve as the baseline for Recall calculations.
python scripts/20_generate_db_ground_truth.py
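Conceptually, the ground truth is the exact top-k neighbor set from a full similarity scan. A NumPy sketch with random placeholder data, assuming L2-normalized embeddings so that dot product equals cosine similarity:

# Exact (brute-force) top-k neighbors via a dense similarity matrix; arrays are illustrative.
import numpy as np

rng = np.random.default_rng(0)
corpus_embs = rng.standard_normal((1000, 384)).astype(np.float32)
query_embs = rng.standard_normal((10, 384)).astype(np.float32)
corpus_embs /= np.linalg.norm(corpus_embs, axis=1, keepdims=True)
query_embs /= np.linalg.norm(query_embs, axis=1, keepdims=True)

k = 10
sims = query_embs @ corpus_embs.T                 # (n_queries, n_corpus) cosine similarities
ground_truth = np.argsort(-sims, axis=1)[:, :k]   # exact top-k ids per query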
3. Build Indexes
Populate all vector stores with the generated data (a FAISS sketch follows the commands).

# FAISS
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend faiss --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend faiss --model mini
# Chroma
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend chroma --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend chroma --model mini
# Milvus
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend milvus --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend milvus --model mini
# Pinecone
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend pinecone --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend pinecone --model mini
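As an illustration of what the FAISS backend boils down to (the script may choose a different index type), a flat inner-product index over normalized vectors gives exact cosine search:

# Exact inner-product index; with L2-normalized vectors this is cosine similarity.
import faiss
import numpy as np

d = 384
xb = np.random.default_rng(0).standard_normal((1000, d)).astype(np.float32)
faiss.normalize_L2(xb)                       # in-place L2 normalization
index = faiss.IndexFlatIP(d)
index.add(xb)
scores, ids = index.search(xb[:5], 10)       # top-10 neighbors for the first 5 vectors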
4. Run Benchmarks
Execute queries and measure latency and recall. Use --export to save qualitative results as JSON (a latency-measurement sketch follows the commands).
# FAISS
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend faiss --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend faiss --export
# Chroma
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend chroma --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend chroma --export
# Milvus
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend milvus --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend milvus --export
# Pinecone
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend pinecone --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend pinecone --export
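The reported latencies are per-query wall-clock times. Conceptually the measurement loop looks like the sketch below (continuing from the FAISS sketch above; the percentile choice is my assumption, not necessarily what 40_db_benchmark.py reports):

# Per-query latency with summary percentiles; `index` and `xb` come from the FAISS sketch above.
import time
import numpy as np

latencies_ms = []
for q in xb[:100]:                            # illustrative query sample
    t0 = time.perf_counter()
    index.search(q.reshape(1, -1), 10)
    latencies_ms.append((time.perf_counter() - t0) * 1000.0)
print(f"p50={np.percentile(latencies_ms, 50):.2f} ms, p95={np.percentile(latencies_ms, 95):.2f} ms")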
5. (Optional) Visualize Results
Generate plots comparing the backends (Recall vs. Latency).
python scripts/41_plot_db_results.py

6. (Optional) Extract Summary CSV
python scripts/42_extract_summary.py

Goal: Study the models' behavior and their semantic representations (all-MiniLM-L6-v2, all-mpnet-base-v2, and text-embedding-3-small) using FAISS as the backend.
1. Create Embeddings for Advanced Models
python scripts/10_make_embeddings.py --model mpnet --dataset ml20m
python scripts/10_make_embeddings.py --model openai --dataset ml20m
# (MiniLM was already created in Phase 1)
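For the OpenAI model, the script calls the embeddings endpoint. A minimal sketch with the current openai Python client; batching and retry handling in the actual script may differ:

# Embed a batch of texts with text-embedding-3-small (1536-D by default); inputs are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(model="text-embedding-3-small", input=["toy story", "the matrix"])
vectors = [item.embedding for item in resp.data]
print(len(vectors), len(vectors[0]))  # 2 1536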
2. Build Indexes (FAISS)
python scripts/30_build_indexes.py --dataset ml20m_movie --backend faiss --model mpnet
python scripts/30_build_indexes.py --dataset ml20m_movie --backend faiss --model openai
3. Full Retrieval Benchmark
Evaluate how well each model retrieves relevant items from the whole corpus (see the metrics sketch after the commands).
python scripts/50_model_benchmark.py --model mini --export
python scripts/50_model_benchmark.py --model mpnet --export
python scripts/50_model_benchmark.py --model openai --export
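For intuition, binary-relevance versions of the reported metrics fit in a few lines. This is a sketch, not the repo's exact implementation; the scripts may use graded qrels:

# Binary-relevance Recall@k and nDCG@k; `retrieved` is a ranked id list, `relevant` a set of gold ids.
import math

def recall_at_k(retrieved, relevant, k):
    # fraction of relevant items that appear in the top-k retrieved list
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    # DCG of the ranking divided by the ideal DCG for this query
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

print(recall_at_k([3, 1, 7], {1, 2}, k=3), ndcg_at_k([3, 1, 7], {1, 2}, k=3))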
4. Re-ranking Benchmark
Study each model's ability to rank a fixed candidate list.
python scripts/60_rerank_model_benchmark.py --model mini --export
python scripts/60_rerank_model_benchmark.py --model mpnet --export
python scripts/60_rerank_model_benchmark.py --model openai --export

This work represents an exploratory study conducted within the scope of an MSc thesis. While every effort has been made to ensure accuracy, the findings should be viewed as observations specific to the testing environment rather than definitive conclusions. Any errors or oversights are my own, and I welcome constructive feedback.