A powerful, self-hosted Retrieval-Augmented Generation (RAG) application that lets you chat with your personal documents using local AI. Built with GPU support for both AMD and NVIDIA systems.
- Document Intelligence: Upload and chat with your TXT files and Markdown documents
- Smart Conversations: Context-aware chat using Retrieval-Augmented Generation (RAG)
- Local & Private: Everything runs on your machine; no data leaves your system
- Multi-GPU Support: Optimized for both AMD (ROCm) and NVIDIA (CUDA) GPUs
- Document Search: Direct semantic search through your uploaded documents
- Easy Setup: Docker-based deployment with auto-detection for your hardware
- Real-time Streaming: Watch responses generate token-by-token
- Vector Search: Semantic search powered by Elasticsearch's dense vector capabilities
```mermaid
graph TB
    A[User Interface] --> B[Gradio Web App]
    B --> C[Elasticsearch Vector DB]
    B --> D[Ollama LLM]
    C --> E[Document Storage]
    D --> F[AMD/NVIDIA GPU]

    subgraph "RAG Pipeline"
        G[Document Upload] --> H[Text Chunking]
        H --> I[Vector Embeddings]
        I --> J[Vector Storage]
        K[User Query] --> L[Semantic Search]
        L --> M[Context Augmentation]
        M --> N[LLM Generation]
    end

    subgraph "AI Backend"
        C
        D
    end
```
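In application code, the query side of this pipeline (semantic search, context augmentation, LLM generation) reduces to a handful of calls against the two backends. Below is a minimal sketch assuming the default models and an Elasticsearch index of chunk embeddings called `documents`; the index name and prompt wording are illustrative, not this repository's actual API (ingestion and retrieval internals are sketched further down):

```python
from elasticsearch import Elasticsearch
import ollama

es = Elasticsearch("http://localhost:9200", basic_auth=("elastic", "changeme"))
llm = ollama.Client(host="http://localhost:11434")

def ask(question: str, k: int = 3) -> str:
    # Semantic search: embed the query and find the k nearest stored chunks.
    vector = llm.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = es.search(index="documents",  # illustrative index name
                     knn={"field": "embedding", "query_vector": vector,
                          "k": k, "num_candidates": 50})["hits"]["hits"]
    # Context augmentation: prepend the retrieved passages to the prompt.
    context = "\n\n".join(h["_source"]["content"] for h in hits)
    # LLM generation: answer with the local chat model.
    reply = llm.chat(model="llama2:7b",
                     messages=[{"role": "user",
                                "content": f"Context:\n{context}\n\nQuestion: {question}"}])
    return reply["message"]["content"]
```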
- Docker and Docker Compose
- GPU (optional but recommended):
  - AMD GPU with ROCm support (RX 6000+ series recommended)
  - NVIDIA GPU with CUDA support (GTX 10-series or newer recommended)
- CPU-only mode is also supported
```bash
git clone https://github.com/KingAkeem/personal-rag.git
cd personal-rag

# The script automatically detects your GPU and configures accordingly
./scripts/start.sh

# For AMD GPUs
./scripts/start.sh amd

# For NVIDIA GPUs
./scripts/start.sh nvidia
```

- Main App: http://localhost:7860
- Elasticsearch: http://localhost:9200
- Kibana (Monitoring): http://localhost:5601
- Ollama API: http://localhost:11434
```
personal-rag-assistant/
├── src/main.py                      # Main Gradio web interface
├── src/embeddings                   # Text embedding utilities (nomic-embed-text)
├── src/storage                      # Vector database operations (Elasticsearch, etc.)
├── src/llm                          # LLM chat and RAG functionality (llama2:7b)
├── docker-compose.amd.yml           # AMD GPU configuration
├── docker-compose.nvidia.yml        # NVIDIA GPU configuration
├── scripts/start.sh                 # Auto-detecting startup script
├── scripts/stop.sh                  # Stop all services
├── scripts/setup-elasticsearch.sh   # Elasticsearch initialization
└── scripts/install-rocm.sh          # Set up ROCm for AMD GPUs locally
```
- Gradio-based web interface with three tabs: Chat, Upload Documents, Document Search
- Real-time streaming responses
- Configurable context chunks (1-5)
- File upload support for .txt, .pdf, .md files
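As a rough illustration of that three-tab layout, here is a minimal Gradio sketch; the handler functions and component wiring are placeholders, not the actual code in `src/main.py`:

```python
import gradio as gr

# Placeholder handlers; the real ones live in src/main.py and talk to Ollama/Elasticsearch.
def chat_fn(message, history, num_chunks):
    return f"(answer generated using {num_chunks} context chunks)"

def upload_fn(files):
    return f"Indexed {len(files)} file(s)"

def search_fn(query):
    return [{"filename": "example.md", "content": "matching passage", "score": 0.87}]

with gr.Blocks(title="Personal RAG Assistant") as demo:
    with gr.Tab("Chat"):
        chunks = gr.Slider(1, 5, value=3, step=1, label="Context chunks", render=False)
        gr.ChatInterface(fn=chat_fn, additional_inputs=[chunks])
    with gr.Tab("Upload Documents"):
        files = gr.File(file_count="multiple", file_types=[".txt", ".md", ".pdf"])
        status = gr.Textbox(label="Status")
        files.upload(upload_fn, inputs=files, outputs=status)
    with gr.Tab("Document Search"):
        query = gr.Textbox(label="Query")
        results = gr.JSON(label="Results")
        query.submit(search_fn, inputs=query, outputs=results)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```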
- Elasticsearch 8.13.0 with vector search capabilities
- Automatic text chunking with configurable overlap
- Cosine similarity search for semantic retrieval
- Document indexing and management
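The chunking-with-overlap and cosine-similarity retrieval described above can be wired against Elasticsearch's `dense_vector` field roughly as follows. The index name, field names, and default chunk sizes below are assumptions for illustration; the actual values live in `storage.py`:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200", basic_auth=("elastic", "changeme"))
INDEX = "documents"  # illustrative index name

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters with their neighbour."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def create_index() -> None:
    """768 dims matches nomic-embed-text; cosine similarity enables semantic retrieval."""
    if not es.indices.exists(index=INDEX):
        es.indices.create(
            index=INDEX,
            mappings={
                "properties": {
                    "filename": {"type": "keyword"},
                    "content": {"type": "text"},
                    "embedding": {"type": "dense_vector", "dims": 768,
                                  "index": True, "similarity": "cosine"},
                }
            },
        )

def index_chunk(filename: str, content: str, embedding: list[float]) -> None:
    """Store one chunk together with its vector."""
    es.index(index=INDEX, document={"filename": filename, "content": content,
                                    "embedding": embedding})

def search_chunks(query_vector: list[float], k: int = 3) -> list[dict]:
    """k-NN search over stored chunk embeddings, returning the closest passages."""
    resp = es.search(index=INDEX, knn={"field": "embedding", "query_vector": query_vector,
                                       "k": k, "num_candidates": 50})
    return [hit["_source"] | {"score": hit["_score"]} for hit in resp["hits"]["hits"]]
```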
- Ollama integration with streaming support
- RAG pipeline with context augmentation
- Configurable chat models (default: llama2:7b)
- Local embedding generation using nomic-embed-text
- 768-dimensional vector embeddings
- Error handling and fallback mechanisms
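On the Ollama side, local embedding generation with nomic-embed-text (768-dimensional vectors) and a streaming, context-augmented chat call with a simple fallback on error look roughly like the sketch below. The prompt wording and error handling are illustrative, not copied from `llm.py`:

```python
import os
import ollama

client = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://localhost:11434"))
CHAT_MODEL = os.getenv("CHAT_MODEL", "llama2:7b")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")

def embed(text: str) -> list[float]:
    """Generate a 768-dimensional embedding locally via Ollama."""
    return client.embeddings(model=EMBEDDING_MODEL, prompt=text)["embedding"]

def chat_with_context(question: str, context_chunks: list[str]):
    """Stream an answer token-by-token, grounding the model in the retrieved chunks."""
    if not context_chunks:
        yield "No relevant documents found; answering from the model alone.\n"
    context = "\n\n".join(context_chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    try:
        for part in client.chat(model=CHAT_MODEL,
                                messages=[{"role": "user", "content": prompt}],
                                stream=True):
            yield part["message"]["content"]
    except ollama.ResponseError as err:
        yield f"LLM error: {err}"  # fallback instead of crashing the UI
```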
- Go to the "Upload Documents" tab
- Upload your TXT or Markdown files
- Documents are automatically chunked and indexed for semantic search
- Switch to the "Chat" tab
- Ask questions about your uploaded content
- Adjust the "Context chunks" slider (1-5) to control how much context is used
- Watch responses stream in real-time
- Use the "Document Search" tab for direct semantic search
- Find relevant passages with similarity scores
- View results in JSON format with filename and content
The app can be configured using environment variables in the Docker Compose files:
```yaml
# Elasticsearch Configuration
ELASTICSEARCH_URL: "http://elasticsearch:9200"
ELASTICSEARCH_USERNAME: "elastic"
ELASTICSEARCH_PASSWORD: "changeme"

# Ollama Configuration
OLLAMA_HOST: "http://ollama:11434"

# Model Configuration (in respective Python files)
CHAT_MODEL: "llama2:7b"
EMBEDDING_MODEL: "nomic-embed-text"
```

To use different models, modify the environment variables or directly edit the Python files:
```python
# In llm.py
CHAT_MODEL = os.getenv("CHAT_MODEL", "mistral:7b")  # Change default chat model

# In embeddings.py
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', "all-minilm:l6-v2")  # Change embedding model
```

GPU Not Detected
```bash
# Check GPU detection
./scripts/start.sh --debug

# Force CPU mode
./scripts/start.sh amd  # Uses CPU-only fallback
```

Ollama Model Fails to Load
```bash
# Check available models
docker exec ollama ollama list

# Pull model manually
docker exec ollama ollama pull llama2:7b
```

Elasticsearch Health Issues
```bash
# Check Elasticsearch status
curl -u elastic:changeme http://localhost:9200/_cluster/health

# View Elasticsearch logs
docker logs elasticsearch -f
```

Port Conflicts
```bash
# Check what's using the ports
sudo lsof -i :7860   # Gradio app
sudo lsof -i :9200   # Elasticsearch
sudo lsof -i :5601   # Kibana
sudo lsof -i :11434  # Ollama
```

```bash
# View all service logs
docker compose -f docker-compose.amd.yml logs -f

# View specific service logs
docker logs rag-app -f
docker logs ollama -f
docker logs elasticsearch -f

# Check service health
docker ps
docker stats
```

- Default passwords are set to `changeme`; change these in production
- Elasticsearch security is enabled by default
- The application runs on your machine by default, but Gradio listens on all interfaces (`server_name="0.0.0.0"`)
- Consider using HTTPS and a reverse proxy for external access (see the launch sketch below)
- Regularly update Docker images to the latest versions
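If you do expose the app beyond your machine, the Gradio launch call itself can be tightened. A sketch reusing the `demo` Blocks object from the layout sketch above; `server_name`, `server_port`, and `auth` are standard Gradio options, but whether `src/main.py` wires them this way is an assumption:

```python
# Bind to localhost only, or add basic authentication before putting a
# TLS-terminating reverse proxy (e.g. nginx or Caddy) in front of the app.
demo.launch(
    server_name="127.0.0.1",                 # instead of "0.0.0.0": reachable from this machine only
    server_port=7860,
    auth=("admin", "change-this-password"),  # simple username/password prompt
)
```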
- Adjust `HSA_OVERRIDE_GFX_VERSION` in the AMD configuration for your specific GPU
- Modify `OLLAMA_GPU_LAYERS` in the NVIDIA configuration based on available VRAM
- Monitor GPU usage with `rocm-smi` (AMD) or `nvidia-smi` (NVIDIA)
- Increase the Elasticsearch heap size via `ES_JAVA_OPTS`
- Adjust chunk size and overlap in `storage.py`
- Monitor disk space for vector storage
- Additional file format support (DOCX)
- Enhanced UI/UX improvements
- More embedding model options
- Performance optimizations
- Additional vector database support
- Gradio for the excellent web interface framework
- Ollama for making local LLMs accessible
- Elasticsearch for vector search capabilities
- The open-source AI community for continuous inspiration
⭐ If this project helped you, please give it a star on GitHub!