A production-ready Retrieval-Augmented Generation system for intelligent movie recommendations using local LLM inference and semantic search.
Enterprise-grade movie recommendation engine leveraging a Retrieval-Augmented Generation architecture with LangChain orchestration, local LLaMA 3 inference via Ollama, and a FAISS vector store for semantic search. Built with MLOps best practices, Docker containerization, and an API-first architecture for scalable deployment.
- **LangChain Integration**: Advanced prompt engineering and chain orchestration
- **Local LLM Inference**: LLaMA 3 via Ollama for privacy-preserving AI
- **Semantic Search**: FAISS vector store with Transformer embeddings
- **RAG Architecture**: Retrieval-Augmented Generation for grounded responses
- **Docker Support**: Multi-stage containerization with GPU acceleration
- **MLOps Pipeline**: Automated testing, model evaluation, and CI/CD readiness
- **Scalable Infrastructure**: Microservices design with FastAPI compatibility
- **Multi-modal Interfaces**: CLI, Jupyter, and Streamlit web UI
- **Privacy-First**: No external API dependencies; all data processed locally
- **Observability**: Comprehensive logging and monitoring hooks
- **Testing Framework**: Pytest-based validation and health checks
- **Documentation**: Complete API documentation and deployment guides
- Deep Learning: PyTorch, Transformers, sentence-transformers
- Vector Databases: FAISS for similarity search and retrieval
- NLP Processing: HuggingFace ecosystem, tokenization pipelines
- LLM Integration: LangChain, Ollama, prompt engineering
- Data Processing: pandas, NumPy, feature engineering pipelines
- Model Deployment: Docker, containerized inference systems
- Testing: pytest, model evaluation, A/B testing frameworks
- Monitoring: Health checks, performance metrics, logging
- Languages: Python 3.8+, optimized for ML workloads
- Containerization: Docker, docker-compose, multi-stage builds
- Version Control: Git-based ML workflows, reproducible experiments
- UI Frameworks: Streamlit, Jupyter Lab, interactive dashboards
# Clone repository
git clone <repository-url>
cd rag-movie-rec-redux
# Start services with Docker Compose
docker-compose -f docker/docker-compose.yml up -d
# Initialize LLaMA 3 model
docker exec ollama-service ollama pull llama3
# Build vector store from IMDb data
docker exec rag-movie-rec python -m src.rag_movie_rec.cli build
# Launch web interface
docker-compose -f docker/docker-compose.yml --profile ui up -d
open http://localhost:8501
# Install dependencies
pip install -r requirements.txt
# Install Ollama and LLaMA 3
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3
ollama serve
# Process IMDb dataset
python -m src.rag_movie_rec.cli process
# Build FAISS vector store
python -m src.rag_movie_rec.cli build
# Interactive query mode
python -m src.rag_movie_rec.cli query --interactive
# Run comprehensive health check
python tests/test_setup.py
# Full test suite
pytest tests/ -v --cov=src/rag_movie_rec
# Docker health validation
docker exec rag-movie-rec python -m src.rag_movie_rec.cli health
graph TD
    A[User Query] --> B[Query Processing]
    B --> C[Vector Similarity Search]
    C --> D[FAISS Index]
    D --> E[Retrieved Documents]
    E --> F[Context Assembly]
    F --> G[LLaMA 3 LLM]
    G --> H[Generated Response]
    I[IMDb Dataset] --> J[Data Preprocessing]
    J --> K[Text Embedding]
    K --> L[sentence-transformers]
    L --> D
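The same flow in code, as a minimal sketch: load the FAISS store, retrieve the top-k documents, assemble the context, and prompt the model. Import paths follow current `langchain_community` conventions and the `allow_dangerous_deserialization` flag depends on your LangChain version, so treat this as illustrative rather than the project's exact internals.

```python
# Minimal retrieve-then-generate sketch of the diagram above.
# Assumes a FAISS index already built at "faiss_imdb_store" and a
# local Ollama server running llama3.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
store = FAISS.load_local(
    "faiss_imdb_store", embeddings, allow_dangerous_deserialization=True
)
llm = Ollama(model="llama3")

query = "Recommend sci-fi movies like The Matrix"
docs = store.similarity_search(query, k=3)           # vector similarity search
context = "\n\n".join(d.page_content for d in docs)  # context assembly
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(llm.invoke(prompt))                            # LLaMA 3 generation
```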
- **Data Preprocessing Pipeline**
  - IMDb dataset normalization and feature engineering
  - Text chunking and metadata extraction
  - Genre classification and rating normalization
- **Vector Store Management**
  - FAISS index creation and optimization
  - Embedding generation with sentence-transformers
  - Semantic search and similarity ranking
- **RAG Engine**
  - LangChain orchestration and prompt templates
  - Context retrieval and document ranking
  - LLaMA 3 inference and response generation
- **Interface Layer**
  - CLI for scripting and automation
  - Streamlit web UI for interactive queries
  - RESTful API endpoints (extensible; see the sketch after this list)
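The REST layer is described as extensible rather than shipped; one way it could look is a thin FastAPI wrapper around the engine. The `/recommend` and `/health` routes and the request schema below are assumptions for illustration, while `MovieRAGEngine`, `setup_rag_chain()`, `query()`, and `health_check()` follow the Python API documented later in this README.

```python
# Hypothetical FastAPI wrapper; route names and request schema are
# illustrative, not part of the current codebase.
from fastapi import FastAPI
from pydantic import BaseModel

from src.rag_movie_rec.rag_engine import MovieRAGEngine

app = FastAPI(title="rag-movie-rec")
engine = MovieRAGEngine()
engine.setup_rag_chain()

class RecommendRequest(BaseModel):
    query: str

@app.post("/recommend")
def recommend(req: RecommendRequest) -> dict:
    result = engine.query(req.query)   # returns a dict with an 'answer' key
    return {"answer": result["answer"]}

@app.get("/health")
def health() -> dict:
    return engine.health_check()
```

Saved as `app.py`, this would run with `uvicorn app:app --port 8000`.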
# Process raw IMDb data
rag-movie-rec process --input-file IMDb_Dataset_Composite_Cleaned.csv
# Build optimized vector store
rag-movie-rec build --embedding-model sentence-transformers/all-MiniLM-L6-v2
# Query recommendations
rag-movie-rec query --query "Recommend sci-fi movies like The Matrix"
# Interactive mode with source attribution
rag-movie-rec query --interactive --show-sources
# Similarity search
rag-movie-rec search "Christopher Nolan thriller movies" --k 5
from src.rag_movie_rec.rag_engine import MovieRAGEngine
# Initialize RAG system
engine = MovieRAGEngine(
    vector_store_path="faiss_imdb_store",
    llm_model="llama3",
    temperature=0.7
)
# Setup retrieval chain
engine.setup_rag_chain()
# Get movie recommendations
result = engine.query("What are some mind-bending movies like Inception?")
print(result['answer'])
# Batch processing
queries = [
    "Best horror movies from the 2010s",
    "Romantic comedies with high ratings",
    "Action movies starring Tom Cruise"
]
results = engine.batch_query(queries)
# Semantic similarity search
similar_movies = engine.get_similar_movies("The Dark Knight", k=10)
# Custom retrieval parameters
engine.retrieval_k = 5
engine.llm.temperature = 0.3
# Health monitoring
health_status = engine.health_check()
print(f"System Status: {health_status}")
# Ollama Configuration
OLLAMA_HOST=localhost:11434
OLLAMA_MODEL=llama3
# Vector Store Settings
VECTOR_STORE_PATH=faiss_imdb_store
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
# LLM Parameters
LLM_TEMPERATURE=0.5
RETRIEVAL_K=3
# Optional: OMDB API for enhanced metadata
OMDB_API_KEY=your_key_here
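A minimal sketch of consuming these variables at startup; the use of python-dotenv is an assumption, and plain `os.environ` works if the variables are exported in the shell:

```python
# Read configuration from the environment, falling back to the
# defaults listed above. python-dotenv is assumed to be installed.
import os

from dotenv import load_dotenv

load_dotenv()  # pick up a local .env file if one exists

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
VECTOR_STORE_PATH = os.getenv("VECTOR_STORE_PATH", "faiss_imdb_store")
EMBEDDING_MODEL = os.getenv(
    "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
)
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.5"))
RETRIEVAL_K = int(os.getenv("RETRIEVAL_K", "3"))
```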
# Advanced embedding models
EMBEDDING_OPTIONS = [
    "sentence-transformers/all-MiniLM-L6-v2",          # Fast, good quality
    "sentence-transformers/all-mpnet-base-v2",         # Best quality
    "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"  # Q&A optimized
]
# Alternative LLM models via Ollama
LLM_OPTIONS = [
    "llama3",      # Meta LLaMA 3 8B
    "llama3:70b",  # LLaMA 3 70B (requires more RAM)
    "mistral",     # Mistral 7B
    "codellama"    # Code-optimized variant
]
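Switching to any of these requires pulling the weights through Ollama first (e.g. `ollama pull mistral`), then pointing the engine at the new model. A sketch, assuming the constructor arguments shown in the Python API section above:

```python
# Swap the backing LLM; llm_model matches the constructor argument
# from the Python API examples, so this assumes the same interface.
from src.rag_movie_rec.rag_engine import MovieRAGEngine

engine = MovieRAGEngine(llm_model="mistral", temperature=0.5)
engine.setup_rag_chain()
print(engine.query("Recommend sci-fi movies like The Matrix")["answer"])
```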
| Component | Minimum | Recommended | Optimal |
|---|---|---|---|
| RAM | 8GB | 16GB | 32GB+ |
| Storage | 10GB | 50GB | 100GB+ |
| CPU | 4 cores | 8 cores | 16+ cores |
| GPU | None | 8GB VRAM | 24GB+ VRAM |
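Before choosing a model size, it can help to check what the host actually has. A quick sketch using PyTorch (already a project dependency) plus `psutil`, which is an assumed extra:

```python
# Inspect local resources to pick between llama3 and llama3:70b.
# psutil is an assumed extra dependency for RAM/CPU introspection.
import psutil
import torch

print(f"CPU cores: {psutil.cpu_count(logical=False)}")
print(f"RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU VRAM: {props.total_memory / 1e9:.1f} GB")
```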
- Query Latency: ~2-5 seconds (CPU), ~0.5-1 second (GPU)
- Vector Search: Sub-100ms for 50K+ movies
- Throughput: 10-50 queries/minute depending on hardware
- Memory Usage: ~4-8GB for full IMDb dataset
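These numbers vary widely with hardware and model choice, so the only reliable benchmark is a local one; a minimal timing sketch against the documented engine API:

```python
# Rough end-to-end latency check on your own hardware.
import time

from src.rag_movie_rec.rag_engine import MovieRAGEngine

engine = MovieRAGEngine()
engine.setup_rag_chain()

start = time.perf_counter()
engine.query("Recommend sci-fi movies like The Matrix")
print(f"End-to-end query latency: {time.perf_counter() - start:.2f}s")
```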
# Unit tests
pytest tests/test_data_processor.py -v
# Integration tests
pytest tests/test_vector_store.py -v
# End-to-end pipeline tests
pytest tests/test_rag_engine.py -v
# Performance benchmarks
pytest tests/test_performance.py --benchmark
# System health validation
python tests/test_setup.py
# Code formatting
black src/ tests/
# Import sorting
isort src/ tests/
# Linting
flake8 src/ tests/
# Type checking
mypy src/rag_movie_rec/
# docker-compose.prod.yml
services:
  rag-movie-rec:
    image: rag-movie-rec:latest
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 8G
          cpus: '4'
    environment:
      - OLLAMA_HOST=ollama-cluster:11434
      - VECTOR_STORE_PATH=/data/vector_store
# k8s-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-movie-rec
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-movie-rec
  template:
    metadata:
      labels:
        app: rag-movie-rec
    spec:
      containers:
        - name: rag-movie-rec
          image: rag-movie-rec:latest
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
- AWS: EKS + SageMaker + S3 for model artifacts
- GCP: GKE + Vertex AI + Cloud Storage
- Azure: AKS + Azure ML + Blob Storage
# Built-in health monitoring
from src.rag_movie_rec.rag_engine import MovieRAGEngine
engine = MovieRAGEngine()
health = engine.health_check()
# Component status
print(f"Vector Store: {'β
' if health['vector_store'] else 'β'}")
print(f"LLM Service: {'β
' if health['llm'] else 'β'}")
print(f"RAG Chain: {'β
' if health['rag_chain'] else 'β'}")
# Prometheus metrics endpoint
metrics:
  - query_latency_seconds
  - vector_search_time_ms
  - llm_inference_time_ms
  - active_connections
  - error_rate
# Grafana dashboards
dashboards:
  - rag_performance_overview
  - llm_usage_analytics
  - vector_store_metrics
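Exporting these metrics is wiring work rather than a shipped feature; a sketch of how `query_latency_seconds` could be exposed with `prometheus_client` (the port and helper name are illustrative):

```python
# Export query_latency_seconds for Prometheus to scrape; the metric
# name mirrors the list above, the rest is illustrative glue.
from prometheus_client import Histogram, start_http_server

QUERY_LATENCY = Histogram(
    "query_latency_seconds", "End-to-end RAG query latency"
)

start_http_server(9100)  # serves /metrics on port 9100

@QUERY_LATENCY.time()    # records the duration of each call
def timed_query(engine, text):
    return engine.query(text)
```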
# Clone and set up development environment
git clone <repository-url>
cd rag-movie-rec-redux
# Install development dependencies
pip install -e ".[dev]"
# Run pre-commit hooks
pre-commit install
# Start development with Docker
docker-compose -f docker/docker-compose.yml --profile dev up -d
# Access Jupyter development environment
open http://localhost:8888
- **Feature Branches**: Create feature branches from `main`
- **Test Coverage**: Maintain >90% test coverage
- **Documentation**: Update docs for new features
- **Code Review**: All PRs require review
- **CI/CD**: Automated testing and deployment
- **Architecture Guide**: System design and components
- **Docker Guide**: Containerization and deployment
- **API Reference**: Python API documentation
- **Deployment Guide**: Production deployment strategies
This project is licensed under the MIT License - see the LICENSE file for details.
@software{rag_movie_recommender,
  title={RAG Movie Recommender: Local AI-Powered Movie Recommendations},
  author={RAG Movie Rec Team},
  year={2024},
  url={https://github.com/username/rag-movie-rec-redux},
  note={Retrieval-Augmented Generation system for movie recommendations}
}
- Meta AI for LLaMA 3 open-source model
- LangChain Team for the orchestration framework
- Facebook Research for FAISS vector search
- HuggingFace for Transformers and model hub
- Ollama Team for local LLM deployment tools