
🎬 RAG Movie Recommender

A production-ready Retrieval-Augmented Generation system for intelligent movie recommendations using local LLM inference and semantic search.

A movie recommendation engine built on a Retrieval-Augmented Generation architecture: LangChain handles orchestration, LLaMA 3 runs inference locally, and FAISS vector indexes power semantic search. The project is containerized with Docker, follows MLOps best practices, and uses an API-first design for scalable deployment.

🚀 Key Features

🧠 AI-Powered Intelligence

  • 🦜 LangChain Integration – Advanced prompt engineering and chain orchestration
  • 🤖 Local LLM Inference – LLaMA 3 via Ollama for privacy-preserving AI
  • 🔍 Semantic Search – FAISS vector databases with Transformer embeddings
  • 📊 RAG Architecture – Retrieval-Augmented Generation for grounded responses

πŸ—οΈ Production Architecture

  • 🐳 Docker Support – Multi-stage containerization with GPU acceleration
  • 🔧 MLOps Pipeline – Automated testing, model evaluation, and CI/CD ready
  • 📈 Scalable Infrastructure – Microservices design with FastAPI compatibility
  • 🌐 Multi-modal Interfaces – CLI, Jupyter, and Streamlit web UI

πŸ›‘οΈ Enterprise Features

  • 🔒 Privacy-First – No external API dependencies, local data processing
  • 📊 Observability – Comprehensive logging and monitoring hooks
  • 🧪 Testing Framework – Pytest-based validation and health checks
  • 📚 Documentation – Complete API documentation and deployment guides

πŸ› οΈ Technology Stack

Core ML & AI

  • Deep Learning: PyTorch, Transformers, sentence-transformers
  • Vector Databases: FAISS for similarity search and retrieval
  • NLP Processing: HuggingFace ecosystem, tokenization pipelines
  • LLM Integration: LangChain, Ollama, prompt engineering

Data Pipeline & MLOps

  • Data Processing: pandas, NumPy, feature engineering pipelines
  • Model Deployment: Docker, containerized inference systems
  • Testing: pytest, model evaluation, A/B testing frameworks
  • Monitoring: Health checks, performance metrics, logging

Development & Deployment

  • Languages: Python 3.8+, optimized for ML workloads
  • Containerization: Docker, docker-compose, multi-stage builds
  • Version Control: Git-based ML workflows, reproducible experiments
  • UI Frameworks: Streamlit, Jupyter Lab, interactive dashboards

📦 Quick Start

🐳 Docker Deployment (Recommended)

# Clone repository
git clone <repository-url>
cd rag-movie-rec-redux

# Start services with Docker Compose
docker-compose -f docker/docker-compose.yml up -d

# Initialize LLaMA 3 model
docker exec ollama-service ollama pull llama3

# Build vector store from IMDb data
docker exec rag-movie-rec python -m src.rag_movie_rec.cli build

# Launch web interface
docker-compose -f docker/docker-compose.yml --profile ui up -d
open http://localhost:8501

💻 Local Development Setup

# Install dependencies
pip install -r requirements.txt

# Install Ollama and LLaMA 3
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3
ollama serve

# Process IMDb dataset
python -m src.rag_movie_rec.cli process

# Build FAISS vector store
python -m src.rag_movie_rec.cli build

# Interactive query mode
python -m src.rag_movie_rec.cli query --interactive

🧪 System Validation

# Run comprehensive health check
python tests/test_setup.py

# Full test suite
pytest tests/ -v --cov=src/rag_movie_rec

# Docker health validation
docker exec rag-movie-rec python -m src.rag_movie_rec.cli health

πŸ—οΈ Architecture Overview

RAG Pipeline Architecture

graph TD
    A[User Query] --> B[Query Processing]
    B --> C[Vector Similarity Search]
    C --> D[FAISS Index]
    D --> E[Retrieved Documents]
    E --> F[Context Assembly]
    F --> G[LLaMA 3 LLM]
    G --> H[Generated Response]
    
    I[IMDb Dataset] --> J[Data Preprocessing]
    J --> K[Text Embedding]
    K --> L[sentence-transformers]
    L --> D

System Components

  1. 📊 Data Preprocessing Pipeline

    • IMDb dataset normalization and feature engineering
    • Text chunking and metadata extraction
    • Genre classification and rating normalization
  2. πŸ” Vector Store Management

    • FAISS index creation and optimization
    • Embedding generation with sentence-transformers
    • Semantic search and similarity ranking
  3. 🤖 RAG Engine

    • LangChain orchestration and prompt templates
    • Context retrieval and document ranking
    • LLaMA 3 inference and response generation
  4. 🌐 Interface Layer

    • CLI for scripting and automation
    • Streamlit web UI for interactive queries
    • RESTful API endpoints (extensible)
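
End to end, these components reduce to: embed, index, retrieve, then generate with the retrieved context. The sketch below shows that flow in miniature. It is illustrative only and bypasses the project's own classes: it assumes Ollama is serving llama3 on localhost:11434 and talks to sentence-transformers, FAISS, and Ollama's REST API directly, with a toy three-document corpus standing in for the processed IMDb data.

# Minimal RAG flow, independent of the project's classes (illustrative)
import faiss
import requests
from sentence_transformers import SentenceTransformer

# Toy corpus standing in for the preprocessed IMDb documents
docs = [
    "The Matrix (1999): a hacker learns reality is a simulation.",
    "Inception (2010): thieves infiltrate dreams to plant an idea.",
    "Blade Runner 2049 (2017): a replicant uncovers a buried secret.",
]

# (1) Embed and index: inner product over normalized vectors = cosine similarity
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vecs = encoder.encode(docs, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

# (2) Retrieve the top-k documents for the query
query = "Recommend sci-fi movies like The Matrix"
q = encoder.encode([query], normalize_embeddings=True).astype("float32")
_, ids = index.search(q, 2)
context = "\n".join(docs[i] for i in ids[0])

# (3) Generate a grounded answer through Ollama's local REST API
prompt = f"Use only this context to answer:\n{context}\n\nQuestion: {query}"
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": prompt, "stream": False},
)
print(r.json()["response"])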

🎯 Usage Examples

Command Line Interface

# Process raw IMDb data
rag-movie-rec process --input-file IMDb_Dataset_Composite_Cleaned.csv

# Build optimized vector store
rag-movie-rec build --embedding-model sentence-transformers/all-MiniLM-L6-v2

# Query recommendations
rag-movie-rec query --query "Recommend sci-fi movies like The Matrix"

# Interactive mode with source attribution
rag-movie-rec query --interactive --show-sources

# Similarity search
rag-movie-rec search "Christopher Nolan thriller movies" --k 5

Python API Integration

from src.rag_movie_rec.rag_engine import MovieRAGEngine

# Initialize RAG system
engine = MovieRAGEngine(
    vector_store_path="faiss_imdb_store",
    llm_model="llama3",
    temperature=0.7
)

# Setup retrieval chain
engine.setup_rag_chain()

# Get movie recommendations
result = engine.query("What are some mind-bending movies like Inception?")
print(result['answer'])

# Batch processing
queries = [
    "Best horror movies from the 2010s",
    "Romantic comedies with high ratings", 
    "Action movies starring Tom Cruise"
]
results = engine.batch_query(queries)

Advanced Query Patterns

# Semantic similarity search
similar_movies = engine.get_similar_movies("The Dark Knight", k=10)

# Custom retrieval parameters
engine.retrieval_k = 5
engine.llm.temperature = 0.3

# Health monitoring
health_status = engine.health_check()
print(f"System Status: {health_status}")

🔧 Configuration & Customization

Environment Variables

# Ollama Configuration
OLLAMA_HOST=localhost:11434
OLLAMA_MODEL=llama3

# Vector Store Settings  
VECTOR_STORE_PATH=faiss_imdb_store
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# LLM Parameters
LLM_TEMPERATURE=0.5
RETRIEVAL_K=3

# Optional: OMDB API for enhanced metadata
OMDB_API_KEY=your_key_here
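
For reference, here is one way these variables might be consumed in Python; the project's actual configuration loader may differ:

# Hypothetical settings loader (defaults mirror the values above)
import os

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
VECTOR_STORE_PATH = os.getenv("VECTOR_STORE_PATH", "faiss_imdb_store")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.5"))
RETRIEVAL_K = int(os.getenv("RETRIEVAL_K", "3"))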

Custom Model Configuration

# Advanced embedding models
EMBEDDING_OPTIONS = [
    "sentence-transformers/all-MiniLM-L6-v2",      # Fast, good quality
    "sentence-transformers/all-mpnet-base-v2",     # Best quality
    "sentence-transformers/multi-qa-MiniLM-L6-cos" # Q&A optimized
]

# Alternative LLM models via Ollama
LLM_OPTIONS = [
    "llama3",           # Meta LLaMA 3 8B
    "llama3:70b",       # LLaMA 3 70B (requires more RAM)
    "mistral",          # Mistral 7B
    "codellama"         # Code-optimized variant
]
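
Rebuilding the vector store with one of the alternative embedding models can be sketched with LangChain's community integrations. This example is an assumption (it calls langchain_community directly rather than the project's build command) and uses a one-line toy corpus:

# Illustrative rebuild with a different embedding model (not the CLI path)
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

texts = ["Heat (1995): a detective pursues a master thief across Los Angeles."]
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
store = FAISS.from_texts(texts, embeddings)
store.save_local("faiss_imdb_store")  # same path the engine loads from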

📊 Performance & Benchmarks

System Requirements

Component   Minimum    Recommended   Optimal
RAM         8GB        16GB          32GB+
Storage     10GB       50GB          100GB+
CPU         4 cores    8 cores       16+ cores
GPU         None       8GB VRAM      24GB+ VRAM

Performance Metrics

  • Query Latency: ~2-5 seconds (CPU), ~0.5-1 second (GPU)
  • Vector Search: Sub-100ms for 50K+ movies
  • Throughput: 10-50 queries/minute depending on hardware
  • Memory Usage: ~4-8GB for full IMDb dataset
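
The vector-search figure above is easy to sanity-check on your own hardware with a synthetic index; actual latency depends on index size, CPU, and the FAISS index type:

# Micro-benchmark: brute-force FAISS search over 50K synthetic vectors
import time

import faiss
import numpy as np

dim, n = 384, 50_000  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
rng = np.random.default_rng(0)
index = faiss.IndexFlatIP(dim)
index.add(rng.standard_normal((n, dim), dtype="float32"))

query = rng.standard_normal((1, dim), dtype="float32")
t0 = time.perf_counter()
index.search(query, 5)
print(f"search latency: {(time.perf_counter() - t0) * 1000:.1f} ms")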

🧪 Testing & Quality Assurance

Automated Testing Suite

# Unit tests
pytest tests/test_data_processor.py -v

# Integration tests  
pytest tests/test_vector_store.py -v

# End-to-end pipeline tests
pytest tests/test_rag_engine.py -v

# Performance benchmarks
pytest tests/test_performance.py --benchmark-only

# System health validation
python tests/test_setup.py

Code Quality & Standards

# Code formatting
black src/ tests/

# Import sorting
isort src/ tests/

# Linting
flake8 src/ tests/

# Type checking
mypy src/rag_movie_rec/

🚀 Deployment Strategies

Docker Production Deployment

# docker-compose.prod.yml
services:
  rag-movie-rec:
    image: rag-movie-rec:latest
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 8G
          cpus: '4'
    environment:
      - OLLAMA_HOST=ollama-cluster:11434
      - VECTOR_STORE_PATH=/data/vector_store

Kubernetes Deployment

# k8s-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-movie-rec
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-movie-rec
  template:
    metadata:
      labels:
        app: rag-movie-rec
    spec:
      containers:
      - name: rag-movie-rec
        image: rag-movie-rec:latest
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi" 
            cpu: "4"

Cloud-Native Architecture

  • AWS: EKS + SageMaker + S3 for model artifacts
  • GCP: GKE + Vertex AI + Cloud Storage
  • Azure: AKS + Azure ML + Blob Storage

📈 Monitoring & Observability

Health Checks & Metrics

# Built-in health monitoring
from src.rag_movie_rec.rag_engine import MovieRAGEngine

engine = MovieRAGEngine()
health = engine.health_check()

# Component status
print(f"Vector Store: {'βœ…' if health['vector_store'] else '❌'}")
print(f"LLM Service: {'βœ…' if health['llm'] else '❌'}")
print(f"RAG Chain: {'βœ…' if health['rag_chain'] else '❌'}")

Integration with Monitoring Stack

# Prometheus metrics endpoint
metrics:
  - query_latency_seconds
  - vector_search_time_ms  
  - llm_inference_time_ms
  - active_connections
  - error_rate

# Grafana dashboards
dashboards:
  - rag_performance_overview
  - llm_usage_analytics
  - vector_store_metrics
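
These names are a target schema rather than something the repository exports out of the box. A minimal sketch of publishing a couple of them with prometheus_client (an assumed dependency) could look like:

# Hypothetical metrics endpoint; error_rate is derived from a counter in PromQL
from prometheus_client import Counter, Histogram, start_http_server

QUERY_LATENCY = Histogram("query_latency_seconds", "End-to-end RAG query latency")
ERRORS = Counter("errors_total", "Failed RAG queries")

start_http_server(9090)  # exposes a scrape target at :9090/metrics

@QUERY_LATENCY.time()
def answer(query: str) -> str:
    try:
        return f"(answer for: {query})"  # placeholder for engine.query(query)
    except Exception:
        ERRORS.inc()
        raise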

🤝 Contributing & Development

Development Workflow

# Clone and set up development environment
git clone <repository-url>
cd rag-movie-rec-redux

# Install development dependencies
pip install -e ".[dev]"

# Run pre-commit hooks
pre-commit install

# Start development with Docker
docker-compose -f docker/docker-compose.yml --profile dev up -d

# Access Jupyter development environment
open http://localhost:8888

Code Contribution Guidelines

  1. 🔀 Feature Branches: Create feature branches from main
  2. 🧪 Test Coverage: Maintain >90% test coverage
  3. 📝 Documentation: Update docs for new features
  4. 🔍 Code Review: All PRs require review
  5. ✅ CI/CD: Automated testing and deployment


📄 License & Citation

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

@software{rag_movie_recommender,
  title={RAG Movie Recommender: Local AI-Powered Movie Recommendations},
  author={RAG Movie Rec Team},
  year={2024},
  url={https://github.com/wbott/rag-movie-rec-redux},
  note={Retrieval-Augmented Generation system for movie recommendations}
}

πŸ™ Acknowledgments

  • Meta AI for LLaMA 3 open-source model
  • LangChain Team for the orchestration framework
  • Facebook Research for FAISS vector search
  • HuggingFace for Transformers and model hub
  • Ollama Team for local LLM deployment tools

🎬 Built with ❀️ for the AI and Machine Learning community
