KVKK RAG Pipeline - Learning & Experimentation Platform

A comprehensive, fully local RAG (Retrieval-Augmented Generation) pipeline for experimenting with different architectural parameters and understanding how each component affects performance. Built with Turkish legal text (KVKK) as the knowledge base.

Purpose

This project is designed as a learning platform to:

  • Understand RAG architecture deeply through hands-on experimentation
  • Compare different embedding models (multilingual vs Turkish-specific)
  • Experiment with quantization levels (FP32, FP16, INT8) and measure speed vs quality tradeoffs
  • Test various chunking strategies and their impact on retrieval
  • Compare retrieval techniques (basic, multi-query, compression, reranking)
  • Evaluate local LLM performance with different quantization levels
  • Compare RAG vs non-RAG approaches

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    KVKK RAG Pipeline                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Document Loading (PDF → Pages)                         │
│     └─ PyPDFLoader with metadata preservation              │
│                                                             │
│  2. Chunking (Pages → Chunks)                              │
│     ├─ Character Splitting (fixed size)                    │
│     ├─ Recursive Splitting (semantic-aware)                │
│     └─ Semantic Chunking (embedding-based)                 │
│                                                             │
│  3. Embeddings (Chunks → Vectors)                          │
│     ├─ Multilingual: E5-base, Paraphrase-multilingual      │
│     ├─ Turkish: Turkish-BERT, Turkish-NLI                  │
│     └─ Quantization: FP32 / FP16 / INT8                    │
│                                                             │
│  4. Vector Store (Indexing)                                │
│     ├─ FAISS (fast, in-memory)                             │
│     └─ Chroma (persistent, hybrid search)                  │
│                                                             │
│  5. Retrieval (Query → Relevant Chunks)                    │
│     ├─ Basic similarity search                             │
│     ├─ Multi-query (multiple query variations)             │
│     ├─ Contextual compression                              │
│     ├─ Reranking                                           │
│     └─ Hybrid search (keyword + semantic)                  │
│                                                             │
│  6. Generation (Chunks + Query → Answer)                   │
│     └─ Local LLM via Ollama:                               │
│        ├─ Llama 3.1 8B                                     │
│        ├─ Mistral 7B                                       │
│        └─ Qwen 2.5 7B                                      │
│                                                             │
│  7. Evaluation                                             │
│     ├─ Retrieval metrics (precision, latency)              │
│     ├─ Generation quality (keyword matching)               │
│     └─ Baseline comparison (RAG vs full context)           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Key Features

Fully Local & Free

  • No API costs - everything runs on your machine
  • Complete privacy - no data leaves your computer
  • Works offline after initial model downloads

Modular & Configurable

  • Easy parameter switching via YAML configs
  • Mix and match components
  • Systematic experimentation

Comprehensive Metrics

  • Retrieval speed and quality
  • Generation performance
  • Memory usage tracking
  • Quantization impact measurement

Educational Focus

  • Clear code with extensive documentation
  • Architectural decision explanations
  • Comparative analysis tools

Installation

Prerequisites

  • Python 3.10+
  • Ollama installed and running

Setup

  1. Clone or navigate to the project directory:
cd kvkk-rag-pipeline
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Install Ollama and pull a model:
# Install from https://ollama.ai
# Then pull a model (if you don't already have llama3):
ollama pull llama3:8b
# Or llama3.1 if you prefer: ollama pull llama3.1:8b
  5. Add your KVKK PDF files to the data/ directory

Quick Start

1. Run with Default Configuration

python main.py

This will:

  • Load PDFs from data/ directory
  • Chunk documents using recursive strategy (512 chars, 50 overlap)
  • Create embeddings with multilingual-e5-base (FP32)
  • Build FAISS vector store
  • Initialize Llama 3.1 8B via Ollama
  • Run evaluation on test questions

2. Interactive Mode

python main.py --interactive

Ask questions about KVKK interactively.

3. Compare Embeddings

python main.py --compare-embeddings

Benchmarks different quantization levels for embedding models.

4. Check Ollama Status

python main.py --check-ollama

Configuration

All experiments are configured via YAML files in experiments/. Here's what you can configure:

Document Processing

document:
  data_dir: data
  file_pattern: "*.pdf"

Chunking Strategy

chunking:
  strategy: recursive  # Options: character, recursive, semantic
  chunk_size: 512     # 256, 512, 1024, etc.
  chunk_overlap: 50   # 0, 50, 100, 200, etc.

Why this matters: Chunk size affects retrieval precision. Smaller chunks = more precise but less context. Larger chunks = more context but less precise.
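To make the tradeoff concrete, here is a minimal fixed-size splitter. This is an illustration of the size/overlap mechanics only, not the pipeline's chunker.py:

```python
def split_fixed(text: str, chunk_size: int, overlap: int) -> list:
    """Naive fixed-size splitter: each chunk starts (chunk_size - overlap)
    characters after the previous one, so consecutive chunks share `overlap`."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 1000  # stand-in for one page of legal text

print(len(split_fixed(doc, 256, 25)))    # -> 5: smaller chunks, more of them
print(len(split_fixed(doc, 1024, 100)))  # -> 2: larger chunks, fewer of them
```

With the same document, halving the chunk size roughly doubles the number of chunks the retriever must rank, which is exactly why precision and context pull in opposite directions.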

Embedding Model

embedding:
  model_name: intfloat/multilingual-e5-base
  # Options:
  #   - intfloat/multilingual-e5-base (multilingual)
  #   - sentence-transformers/paraphrase-multilingual-mpnet-base-v2
  #   - dbmdz/bert-base-turkish-cased (Turkish-specific)
  #   - emrecan/bert-base-turkish-cased-mean-nli-stsb-tr

  quantization: fp32  # Options: fp32, fp16, int8
  batch_size: 32

Why this matters:

  • Multilingual models work well across languages but may miss Turkish nuances
  • Turkish-specific models may perform better on Turkish legal text
  • Quantization trades quality for speed: FP16 is ~2x faster, INT8 is ~4x faster
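The memory side of that tradeoff is easy to see with a toy symmetric INT8 scheme. This is a sketch of the general idea, not the code in embedding_manager.py:

```python
import numpy as np

def quantize_int8(vec):
    """Symmetric INT8 quantization: keep one float scale per vector
    plus the rounded int8 values."""
    scale = float(np.abs(vec).max()) / 127.0
    return np.round(vec / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
v = rng.normal(size=768).astype(np.float32)  # 768-dim, like multilingual-e5-base
q, s = quantize_int8(v)

print(v.nbytes, q.nbytes)  # 3072 vs 768 bytes: 4x smaller
print(float(np.abs(dequantize(q, s) - v).max()))  # small reconstruction error
```

The quality cost is the rounding error visible in the last line; whether that error hurts retrieval is precisely what --compare-embeddings measures.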

Vector Store

vector_store:
  store_type: faiss  # Options: faiss, chroma
  persist_dir: vector_stores

Why this matters:

  • FAISS: Faster, in-memory, best for experimentation
  • Chroma: Persistent, supports hybrid search, better for production
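Under the hood, a flat (non-approximate) index is just exhaustive similarity search. This numpy sketch shows the computation a flat inner-product index performs; FAISS does the same thing much faster on large stores:

```python
import numpy as np

def flat_search(query, vectors, k=5):
    """Exact inner-product search over every stored vector --
    the same result a flat (brute-force) index returns."""
    scores = vectors @ query
    top = np.argsort(-scores)[:k]
    return top, scores[top]

rng = np.random.default_rng(1)
store = rng.normal(size=(100, 16)).astype(np.float32)
store /= np.linalg.norm(store, axis=1, keepdims=True)  # normalize: inner product = cosine

ids, scores = flat_search(store[3], store, k=3)
print(ids[0])  # -> 3: a vector is most similar to itself
```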

Retrieval Strategy

retrieval:
  strategy: basic  # Options: basic, multi_query, compression, rerank, hybrid
  top_k: 5        # Number of chunks to retrieve

Why this matters:

  • Basic: Fast, simple similarity search
  • Multi-query: Generates query variations for better recall
  • Compression: Retrieves more chunks, then compresses them down to the relevant passages for better precision
  • Rerank: Retrieves many candidates, reranks them, and keeps the best
  • Hybrid: Combines keyword + semantic search (Chroma only)
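The multi-query idea can be sketched in a few lines: run the search once per query variation and keep each document's best score. The helper below is illustrative, not the retrieval_manager.py API:

```python
def multi_query_retrieve(variations, search_fn, k=5):
    """Merge ranked (doc_id, score) results across query variations,
    keeping the best score seen for each document."""
    best = {}
    for query in variations:
        for doc_id, score in search_fn(query):
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    return sorted(best.items(), key=lambda kv: -kv[1])[:k]

# Toy search function standing in for a real vector-store query
fake_results = {
    "KVKK nedir?": [("d1", 0.9), ("d2", 0.4)],
    "Kişisel verilerin korunması kanunu nedir?": [("d3", 0.7), ("d1", 0.5)],
}
merged = multi_query_retrieve(list(fake_results), fake_results.get, k=2)
print(merged)  # -> [('d1', 0.9), ('d3', 0.7)]
```

Note how "d3" surfaces only because the rephrased query found it: that is the recall gain multi-query buys, at the cost of one extra search (and, in the real pipeline, one LLM call) per variation.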

LLM

llm:
  model: llama3:8b  # Options: llama3:8b, llama3.1:8b, mistral:7b, qwen2.5:7b
  temperature: 0.0  # 0 = deterministic, higher = more creative
  max_tokens: 1024

Why this matters:

  • Llama 3.1: Balanced, good general performance
  • Mistral: Fast, efficient
  • Qwen 2.5: Excellent for non-English languages
  • Temperature: 0 for factual answers, 0.7-1.0 for creative generation
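Ollama serves these models over a local HTTP API, so the config above maps directly onto a request body. A minimal non-streaming call looks like this; the payload fields follow Ollama's POST /api/generate endpoint, while the helper names are ours:

```python
import json
import urllib.request

def build_payload(prompt, model="llama3:8b", temperature=0.0, max_tokens=1024):
    """Request body for Ollama's POST /api/generate (non-streaming)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }

def ask_ollama(prompt, host="http://localhost:11434", **kw):
    data = json.dumps(build_payload(prompt, **kw)).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server:
# print(ask_ollama("KVKK nedir? Kısaca açıkla."))
```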

Baseline Comparison

baseline:
  enabled: true
  full_context: true  # Use full document instead of RAG

Why this matters: Compare RAG vs passing the full document to see if chunking/retrieval actually helps.

Experimentation Guide

1. Chunking Experiments

Create configs with different chunk sizes:

experiments/chunk_256.yaml

experiment_name: chunk_size_256
chunking:
  chunk_size: 256
  chunk_overlap: 25

experiments/chunk_1024.yaml

experiment_name: chunk_size_1024
chunking:
  chunk_size: 1024
  chunk_overlap: 100

Run both:

python main.py --config experiments/chunk_256.yaml
python main.py --config experiments/chunk_1024.yaml

Compare results in experiments/results/

Questions to explore:

  • How does chunk size affect retrieval precision?
  • Does overlap improve context preservation?
  • What's the optimal size for legal text?

2. Embedding Model Comparison

Test multilingual vs Turkish-specific:

# Config 1: Multilingual
embedding:
  model_name: intfloat/multilingual-e5-base
  quantization: fp32

# Config 2: Turkish
embedding:
  model_name: dbmdz/bert-base-turkish-cased
  quantization: fp32

Questions to explore:

  • Does Turkish-specific model improve retrieval for KVKK?
  • How much better (if at all)?
  • Is the improvement worth the reduced flexibility?

3. Quantization Experiments

Test speed vs quality tradeoffs:

python main.py --compare-embeddings

Or create configs:

# FP32 (baseline)
embedding:
  quantization: fp32

# FP16 (2x faster)
embedding:
  quantization: fp16

# INT8 (4x faster)
embedding:
  quantization: int8

Questions to explore:

  • How much speed improvement?
  • How much quality degradation?
  • What's the sweet spot for your use case?

4. Retrieval Strategy Comparison

# Try each strategy
retrieval:
  strategy: basic
  # Then: multi_query, compression, rerank

Questions to explore:

  • Which strategy gives best precision?
  • Which is fastest?
  • Do advanced strategies justify the added complexity?

Project Structure

kvkk-rag-pipeline/
├── data/                          # Put your KVKK PDFs here
├── embeddings/                    # Downloaded embedding models
├── evaluation/
│   └── questions.yaml            # Test questions
├── experiments/
│   ├── default_config.yaml       # Default configuration
│   └── results/                  # Evaluation results (JSON)
├── notebooks/                     # Jupyter notebooks for exploration
├── src/
│   ├── config.py                 # Configuration management
│   ├── pipeline.py               # Main pipeline orchestrator
│   ├── document_processing/
│   │   ├── loader.py             # PDF loading
│   │   └── chunker.py            # Text chunking strategies
│   ├── embeddings/
│   │   └── embedding_manager.py  # Embeddings with quantization
│   ├── vector_stores/
│   │   └── vector_store_manager.py  # FAISS and Chroma
│   ├── retrieval/
│   │   └── retrieval_manager.py  # Retrieval strategies
│   ├── llm/
│   │   └── llm_manager.py        # Ollama integration
│   └── evaluation/
│       └── evaluator.py          # Evaluation framework
├── vector_stores/                 # Persisted vector stores
├── main.py                        # Main entry point
├── requirements.txt
└── README.md

Understanding the Results

After running experiments, check experiments/results/*.json:

{
  "config_name": "chunk_size_512",
  "retrieval_metrics": {
    "avg_retrieval_time_ms": 15.3,
    "avg_keyword_match_rate": 0.85,
    "avg_docs_retrieved": 5.0
  },
  "generation_metrics": {
    "avg_generation_time_s": 3.2,
    "avg_keyword_match_rate": 0.78,
    "avg_answer_length": 234
  }
}

Key metrics to compare:

  • Retrieval time: Lower is better (faster retrieval)
  • Keyword match rate: Higher is better (more relevant retrieval/generation)
  • Generation time: Lower is better, but quality matters more
  • Answer length: Longer is not necessarily better; check quality manually
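A small stdlib helper can tabulate those JSON files side by side; the field names match the example result above:

```python
import json
from pathlib import Path

def summarize_results(results_dir):
    """Collect the key metrics from every results JSON into one list of rows."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        r = json.loads(path.read_text())
        rows.append({
            "config": r["config_name"],
            "retrieval_ms": r["retrieval_metrics"]["avg_retrieval_time_ms"],
            "gen_match": r["generation_metrics"]["avg_keyword_match_rate"],
        })
    return rows

for row in summarize_results("experiments/results"):
    print(f'{row["config"]:<20} {row["retrieval_ms"]:>8.1f}ms  match={row["gen_match"]:.2f}')
```

Sorting the printed table by one metric at a time makes the change-one-parameter experiments below much easier to read.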

Tips for Learning

Start Simple

  1. Run default config first
  2. Understand the baseline performance
  3. Change ONE parameter at a time
  4. Compare results

Focus on One Aspect

  • Week 1: Chunking experiments
  • Week 2: Embedding comparisons
  • Week 3: Retrieval strategies
  • Week 4: LLM quantization

Keep Notes

Document your findings:

  • What worked well?
  • What surprised you?
  • What tradeoffs did you discover?

Read the Code

The code is heavily documented. Read through:

  1. src/config.py - See all available options
  2. src/pipeline.py - Understand the flow
  3. Individual modules - Deep dive into each component

Advanced Usage

Jupyter Notebooks

Create a notebook for interactive exploration:

from pathlib import Path
from src.config import ExperimentConfig
from src.pipeline import RAGPipeline

# Load config
config = ExperimentConfig.from_yaml(Path("experiments/default_config.yaml"))

# Create pipeline
pipeline = RAGPipeline(config)
pipeline.run_full_pipeline()

# Query
answer = pipeline.query("KVKK nedir?")
print(answer)

# Inspect retrievals
pipeline.retrieval_manager.print_retrieval_results("Veri sorumlusu kimdir?")

Parameter Sweeps

Create a script to test multiple configurations:

from pathlib import Path
from src.config import ExperimentConfig, ChunkingConfig
from src.pipeline import RAGPipeline

chunk_sizes = [256, 512, 1024]
results = []

for size in chunk_sizes:
    config = ExperimentConfig.from_yaml(Path("experiments/default_config.yaml"))
    config.chunking.chunk_size = size
    config.experiment_name = f"chunk_{size}"

    pipeline = RAGPipeline(config)
    pipeline.run_full_pipeline()

    result = pipeline.evaluate()
    results.append(result)

# Compare all results
from src.evaluation import RAGEvaluator
evaluator = RAGEvaluator(Path("experiments/results"))
evaluator.compare_configurations(results)

Troubleshooting

"Ollama is not available"

  • Make sure Ollama is installed: https://ollama.ai
  • Start Ollama service
  • Pull a model: ollama pull llama3:8b

"No PDF files found"

  • Add PDF files to data/ directory
  • Check file pattern in config matches your files

Out of Memory

  • Reduce batch size in embedding config
  • Use smaller embedding model
  • Reduce chunk size
  • Use INT8 quantization

Slow Performance

  • Use FAISS instead of Chroma
  • Enable quantization (FP16 or INT8)
  • Reduce top_k in retrieval
  • Use smaller embedding model

Learning Resources

Topics worth reading up on alongside this project:

  • RAG fundamentals
  • Embeddings
  • Quantization

Contributing

This is a learning project. Feel free to:

  • Add new chunking strategies
  • Integrate new embedding models
  • Implement additional retrieval techniques
  • Add evaluation metrics
  • Improve documentation

License

MIT License - Use freely for learning and experimentation.

Acknowledgments

  • LangChain for the RAG framework
  • Ollama for local LLM serving
  • Sentence Transformers for embeddings
  • The Turkish NLP community for Turkish models

Happy Learning! Remember: the goal is to understand, not just to run. Take time to experiment, observe, and learn from each configuration change.

About

RAG experiments for Turkish KVKK (data protection) documents
