This repository contains the source code and experimental framework for my Master's Thesis at the University of West Attica. It implements a benchmark of vector database management systems (VDBMS) and embedding models, evaluating Recall, Precision, nDCG, and query latency.
This project was developed using Python 3.13.9. To ensure reproducibility, install the pinned versions:
pip install -r requirements.txt

Create a .env file in the root directory to configure the API keys (OpenAI, Pinecone) and the Milvus connection:
# OpenAI embeddings (used by --model openai)
OPENAI_API_KEY=your_openai_key_here
# Pinecone access (used by 30_build_indexes.py and 40_db_benchmark.py)
PINECONE_API_KEY=your_pinecone_key_here
PINECONE_INDEX_FIQA=your_fiqa_index_name_here
PINECONE_INDEX_MOVIELENS=your_movielens_index_name_here
# Milvus configuration
MILVUS_HOST=your_milvus_host_here
MILVUS_PORT=your_milvus_port_here

Run Milvus locally as a standalone container; installation instructions are available in the official Milvus documentation.
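Once the container is up, a quick connectivity check can be done with the pymilvus client. This is a minimal sketch on my part, not code from the repo's scripts; the host/port fall back to the standalone defaults:

# Minimal connectivity check for a standalone Milvus instance (assumes pymilvus is installed).
import os
from pymilvus import connections, utility

connections.connect(
    alias="default",
    host=os.getenv("MILVUS_HOST", "localhost"),
    port=os.getenv("MILVUS_PORT", "19530"),
)
print(utility.get_server_version())  # prints the server version if Milvus is reachable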
Ensure you have created the necessary indexes in your Pinecone console (Serverless or Pod-based) and added the credentials to the .env file.
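Indexes can also be created programmatically instead of through the console. A minimal sketch using the current pinecone Python client follows; the index name, cloud, and region are illustrative placeholders, and the dimension matches the 384-D MiniLM embeddings used in Phase 1:

# Hypothetical index creation; name, cloud, and region are placeholders for your own values.
import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
pc.create_index(
    name="fiqa-mini",      # placeholder; must match PINECONE_INDEX_FIQA in .env
    dimension=384,         # all-MiniLM-L6-v2 output dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)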
Download and prepare the BEIR FiQA and MovieLens 20M datasets.
python scripts/00_get_data.py
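For reference, the FiQA part of this step follows the standard BEIR download pattern. The sketch below is illustrative, not the script's exact code, and covers FiQA only (MovieLens is fetched separately):

# Standard BEIR loading pattern for FiQA; the output directory is illustrative.
from beir import util
from beir.datasets.data_loader import GenericDataLoader

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/fiqa.zip"
data_path = util.download_and_unzip(url, "datasets/")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")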
Goal: Study infrastructure behavior (FAISS, Chroma, Milvus, Pinecone) using all-MiniLM-L6-v2.

1. Generate Embeddings
Create 384-dimensional embeddings for both datasets using SentenceTransformers.
python scripts/10_make_embeddings.py --model mini --dataset all
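Under the hood this amounts to the standard SentenceTransformers encoding call. A minimal sketch; the batch size and normalization flag are assumptions, not necessarily the script's settings:

# Encode texts into 384-D vectors; normalizing makes dot product equal cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["what is a margin call?", "how are dividends taxed?"]  # illustrative inputs
embeddings = model.encode(texts, batch_size=64, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)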
2. Generate Ground Truth
Calculate exact k-NN (brute force) results to serve as the baseline for Recall calculations.
python scripts/20_generate_db_ground_truth.py
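Conceptually, the ground truth is the exact top-k neighbor set from a full similarity scan. A NumPy sketch with random placeholder data, assuming L2-normalized embeddings so that dot product equals cosine similarity:

# Exact (brute-force) top-k neighbors via a dense similarity matrix; arrays are illustrative.
import numpy as np

rng = np.random.default_rng(0)
corpus_embs = rng.standard_normal((1000, 384)).astype(np.float32)
query_embs = rng.standard_normal((10, 384)).astype(np.float32)
corpus_embs /= np.linalg.norm(corpus_embs, axis=1, keepdims=True)
query_embs /= np.linalg.norm(query_embs, axis=1, keepdims=True)

k = 10
sims = query_embs @ corpus_embs.T                 # (n_queries, n_corpus) cosine similarities
ground_truth = np.argsort(-sims, axis=1)[:, :k]   # exact top-k ids per query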
3. Build Indexes
Populate all vector stores with the generated data (a FAISS sketch follows the commands).

# FAISS
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend faiss --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend faiss --model mini
# Chroma
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend chroma --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend chroma --model mini
# Milvus
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend milvus --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend milvus --model mini
# Pinecone
python scripts/30_build_indexes.py --dataset fiqa_corpus --backend pinecone --model mini
python scripts/30_build_indexes.py --dataset ml20m_movie --backend pinecone --model mini
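As an illustration of what the FAISS backend boils down to (the script may choose a different index type), a flat inner-product index over normalized vectors gives exact cosine search:

# Exact inner-product index; with L2-normalized vectors this is cosine similarity.
import faiss
import numpy as np

d = 384
xb = np.random.default_rng(0).standard_normal((1000, d)).astype(np.float32)
faiss.normalize_L2(xb)                       # in-place L2 normalization
index = faiss.IndexFlatIP(d)
index.add(xb)
scores, ids = index.search(xb[:5], 10)       # top-10 neighbors for the first 5 vectors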
4. Run Benchmarks
Execute queries and measure latency and recall. Use --export to save qualitative results as JSON (a latency-measurement sketch follows the commands).
# FAISS
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend faiss --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend faiss --export
# Chroma
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend chroma --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend chroma --export
# Milvus
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend milvus --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend milvus --export
# Pinecone
python scripts/40_db_benchmark.py --dataset ml20m_movie --backend pinecone --export
python scripts/40_db_benchmark.py --dataset fiqa_corpus --backend pinecone --export
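The reported latencies are per-query wall-clock times. Conceptually the measurement loop looks like the sketch below (continuing from the FAISS sketch above; the percentile choice is my assumption, not necessarily what 40_db_benchmark.py reports):

# Per-query latency with summary percentiles; `index` and `xb` come from the FAISS sketch above.
import time
import numpy as np

latencies_ms = []
for q in xb[:100]:                            # illustrative query sample
    t0 = time.perf_counter()
    index.search(q.reshape(1, -1), 10)
    latencies_ms.append((time.perf_counter() - t0) * 1000.0)
print(f"p50={np.percentile(latencies_ms, 50):.2f} ms, p95={np.percentile(latencies_ms, 95):.2f} ms")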
5. (Optional) Visualize Results
Generate plots comparing the backends (Recall vs. Latency).
python scripts/41_plot_db_results.py

6. (Optional) Extract Summary CSV
python scripts/42_extract_summary.py

Goal: Study the models' behavior and their semantic representations (all-MiniLM-L6-v2, all-mpnet-base-v2, and text-embedding-3-small) using FAISS as the backend.
1. Create Embeddings for Advanced Models
python scripts/10_make_embeddings.py --model mpnet --dataset ml20m
python scripts/10_make_embeddings.py --model openai --dataset ml20m
# (MiniLM was already created in Phase 1)
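For the OpenAI model, the script calls the embeddings endpoint. A minimal sketch with the current openai Python client; batching and retry handling in the actual script may differ:

# Embed a batch of texts with text-embedding-3-small (1536-D by default); inputs are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(model="text-embedding-3-small", input=["toy story", "the matrix"])
vectors = [item.embedding for item in resp.data]
print(len(vectors), len(vectors[0]))  # 2 1536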
2. Build Indexes (FAISS)
python scripts/30_build_indexes.py --dataset ml20m_movie --backend faiss --model mpnet
python scripts/30_build_indexes.py --dataset ml20m_movie --backend faiss --model openai
3. Full Retrieval Benchmark
Evaluate how well each model retrieves relevant items from the whole corpus (see the metrics sketch after the commands).
python scripts/50_model_benchmark.py --model mini --export
python scripts/50_model_benchmark.py --model mpnet --export
python scripts/50_model_benchmark.py --model openai --export
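For intuition, binary-relevance versions of the reported metrics fit in a few lines. This is a sketch, not the repo's exact implementation; the scripts may use graded qrels:

# Binary-relevance Recall@k and nDCG@k; `retrieved` is a ranked id list, `relevant` a set of gold ids.
import math

def recall_at_k(retrieved, relevant, k):
    # fraction of relevant items that appear in the top-k retrieved list
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    # DCG of the ranking divided by the ideal DCG for this query
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

print(recall_at_k([3, 1, 7], {1, 2}, k=3), ndcg_at_k([3, 1, 7], {1, 2}, k=3))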
4. Re-ranking Benchmark
Study each model's ability to rank a fixed candidate list.
python scripts/60_rerank_model_benchmark.py --model mini --export
python scripts/60_rerank_model_benchmark.py --model mpnet --export
python scripts/60_rerank_model_benchmark.py --model openai --export

This work represents an exploratory study conducted within the scope of an MSc thesis. While every effort has been made to ensure accuracy, the findings should be viewed as observations specific to the testing environment rather than definitive conclusions. Any errors or oversights are my own, and I welcome constructive feedback.