# Conversational RAG

A Retrieval-Augmented Generation (RAG) system built with FastAPI that combines document search with conversational memory. It uses Elasticsearch for document storage and retrieval, and Anthropic's Claude for responses that draw on both retrieved documents and conversation context.

## Features

- **Document Ingestion**: Automatically chunk and embed documents into Elasticsearch
- **Semantic Search**: Vector similarity search across your document corpus
- **Conversational Memory**: Per-thread conversation history maintained across requests (in-memory)
- **Context-Aware Responses**: Leverages both document context and conversation history
- **Memory Management**: Smart conversation thread management
## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Documents    │    │    User Query    │    │  Conversation   │
│   (Markdown)    │    │                  │    │     Memory      │
└────────┬────────┘    └────────┬─────────┘    │   (In-Memory)   │
         │                      │              └─────────────────┘
         ▼                      ▼                       ▲
┌─────────────────┐    ┌──────────────────┐            │
│  Text Splitter  │    │    Embeddings    │            │
│   (LangChain)   │    │                  │            │
└────────┬────────┘    └────────┬─────────┘            │
         │                      │                      │
         ▼                      ▼                      │
┌─────────────────┐    ┌──────────────────┐            │
│  Elasticsearch  │◄───┤  Vector Search   │            │
│  (Vectors +     │    │      (kNN)       │            │
│   Documents)    │    └────────┬─────────┘            │
└─────────────────┘             │                      │
                                ▼                      │
                       ┌──────────────────┐            │
                       │    Claude LLM    │◄───────────┘
                       │   (Anthropic)    │
                       └──────────────────┘
```
## Prerequisites

- Python 3.8+
- Elasticsearch cluster (local or cloud)
- Anthropic API key
## Installation

1. **Clone the repository**

   ```bash
   git clone <your-repo-url>
   cd conversational-rag
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your configuration (a sketch of how these values are loaded follows these steps):

   ```
   ELASTICSEARCH_URL=http://localhost:9200
   ELASTICSEARCH_API_KEY=your_elasticsearch_api_key
   ANTHROPIC_API_KEY=your_anthropic_api_key
   ```

4. **Prepare your documents**

   ```bash
   mkdir datasource
   # Add your markdown files to the datasource directory
   ```

5. **Run the application**

   ```bash
   python main.py
   ```

   Or with uvicorn:

   ```bash
   uvicorn main:app --host 0.0.0.0 --port 8000 --reload
   ```
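For reference, here is a minimal sketch of how `main.py` presumably loads these settings via `python-dotenv` (which is in `requirements.txt`); the exact code may differ:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

ELASTICSEARCH_URL = os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")
ELASTICSEARCH_API_KEY = os.getenv("ELASTICSEARCH_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
```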
Once running, visit http://localhost:8000/docs for interactive API documentation.

## API Endpoints

- `GET /es/health`: Check Elasticsearch connectivity and system health.
- `GET /es/ingest`: Process and index documents from the `datasource` directory.
- `GET /es/search?query=your_question&k=5`: Search documents without conversation context.
- `GET /rag/query?query=your_question&thread_id=optional_thread_id&k=5`: Query with conversation memory. If no `thread_id` is provided, a new conversation thread is created.
- `GET /rag/clear_memory?thread_id=your_thread_id`: Clear conversation history for a specific thread.
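As a quick smoke test, the endpoints can be exercised with `curl` (the query text here is illustrative):

```bash
curl "http://localhost:8000/es/health"
curl "http://localhost:8000/es/ingest"
curl -G "http://localhost:8000/es/search" --data-urlencode "query=main characters" -d "k=5"
```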
## Configuration

### Text Chunking

- **Chunk Size**: 500 characters (configurable in `RecursiveCharacterTextSplitter`)
- **Chunk Overlap**: 50 characters
- **Separators**: `["\n\n", "\n", " ", ""]`
### Embedding Model

- **Model**: `sentence-transformers/all-MiniLM-L6-v2`
- **Dimension**: 384
- **Use Case**: Balanced performance and quality for semantic search
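For example, embedding a query with this model yields a 384-dimensional vector:

```python
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vector = embedder.encode("What are the main characters?")
print(vector.shape)  # (384,) -- must match the index's vector dimension
```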
### Language Model

- **Model**: `claude-3-5-sonnet-20240620`
- **Temperature**: 0 (deterministic responses)
- **Provider**: Anthropic via LangChain
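A minimal sketch of this setup via `langchain-anthropic` (requires `ANTHROPIC_API_KEY` in the environment):

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0)
reply = llm.invoke("Summarize the story in one sentence.")
print(reply.content)
```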
### Conversation Memory

- **Type**: `ConversationBufferMemory`
- **History Limit**: Last 6 messages (3 exchanges)
- **Storage**: In-memory (resets on restart)
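A minimal sketch of how the 6-message cap might be enforced, assuming per-thread buffers in a plain dict (the actual bookkeeping in `main.py` may differ):

```python
from langchain.memory import ConversationBufferMemory

memories = {}  # thread_id -> ConversationBufferMemory

def get_memory(thread_id):
    mem = memories.setdefault(thread_id, ConversationBufferMemory(return_messages=True))
    # Keep only the last 6 messages (3 user/assistant exchanges)
    mem.chat_memory.messages = mem.chat_memory.messages[-6:]
    return mem
```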
curl "http://localhost:8000/rag/query?query=What is the main topic of the story?"curl "http://localhost:8000/rag/query?query=Tell me more about that&thread_id=your_thread_id"import requests
base_url = "http://localhost:8000"
# Start conversation
response1 = requests.get(f"{base_url}/rag/query", params={
"query": "What are the main characters?"
})
thread_id = response1.json()["thread_id"]
# Follow-up question
response2 = requests.get(f"{base_url}/rag/query", params={
"query": "What happens to them in the end?",
"thread_id": thread_id
})
print(response2.json()["answer"])βββ main.py # FastAPI application
βββ datasource/ # Document storage directory
β βββ story1.md # Example document
βββ requirements.txt # Python dependencies
βββ .env.example # Environment variables template
βββ README.md # This file
## Extending

### Adding Document Formats

To support different document formats, extend the ingestion logic in `ingest_data()`:

```python
# Example: PDF support
from PyPDF2 import PdfReader

def load_pdf(file_path):
    reader = PdfReader(file_path)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for pages with no extractable text
        text += page.extract_text() or ""
    return text
```

### Swapping the Embedding Model

Replace the embedding model by updating:

```python
MODEL_NAME = "sentence-transformers/your-preferred-model"
embedder = SentenceTransformer(MODEL_NAME)
```

Note that the new model's output dimension must match the vector dimension in the Elasticsearch index mapping, so re-create the index after switching models.

### Persistent Memory

For persistent memory across restarts, consider:
- **Database Storage**: PostgreSQL, MongoDB
- **Vector Databases**: ChromaDB, Pinecone, Weaviate
- **Hybrid Approach**: Recent messages in memory + historical messages in a database (see the sketch below)
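A hedged sketch of the hybrid approach, appending each turn to SQLite so history survives restarts (the schema and table name are illustrative):

```python
import sqlite3

conn = sqlite3.connect("conversations.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS messages "
    "(thread_id TEXT, role TEXT, content TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
)

def save_message(thread_id, role, content):
    conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
        (thread_id, role, content),
    )
    conn.commit()

def load_recent(thread_id, limit=6):
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY ts DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return list(reversed(rows))  # oldest-first, ready for prompt assembly
```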
## Deployment

### Docker

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
### Production Environment Variables

```
ELASTICSEARCH_URL=https://your-es-cluster.com:9200
ELASTICSEARCH_API_KEY=your_production_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
LOG_LEVEL=INFO
```

### Monitoring

The `/es/health` endpoint provides system health status for monitoring tools.
## Security Considerations

- **API Keys**: Never commit API keys to version control.
- **Authentication**: Add API authentication for production use.
- **Rate Limiting**: Implement rate limiting for public deployments.
- **Input Validation**: Validate and sanitize user inputs.
- **CORS**: Configure CORS settings appropriately (see the sketch after this list).
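A minimal sketch of a restrictive CORS setup in FastAPI (the allowed origin is a placeholder):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],  # avoid "*" in production
    allow_methods=["GET"],
    allow_headers=["*"],
)
```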
## Performance Tuning

### Elasticsearch

- **Index Settings**: Adjust shards and replicas based on data size.
- **Mapping**: Define explicit mappings for better performance (see the sketch after this list).
- **Caching**: Enable query result caching.
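A hedged sketch of an explicit mapping for the document index; the index and field names are illustrative, not taken from `main.py`:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="documents",
    mappings={
        "properties": {
            "text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,          # must match the embedding model
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)
```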
### Memory Management

- **Conversation Limits**: Implement automatic cleanup of old conversations.
- **Memory Monitoring**: Monitor memory usage in production.
- **Async Processing**: Use async operations for better concurrency.
## Troubleshooting

1. **Elasticsearch Connection Failed**

   ```bash
   # Check if Elasticsearch is running
   curl -X GET "localhost:9200/_cluster/health"
   ```

2. **Empty Search Results**
   - Verify documents are ingested: `GET /es/ingest`
   - Check that the index exists in Elasticsearch
   - Verify the embedding model is working

3. **Memory Issues**
   - Restart the application to clear in-memory conversations
   - Check available system memory

4. **API Key Errors**
   - Verify environment variables are loaded
   - Check API key validity with Anthropic
### Debug Logging

Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Commit changes: `git commit -am 'Add feature'`
4. Push to the branch: `git push origin feature-name`
5. Submit a pull request
## Dependencies

```
fastapi>=0.104.0
uvicorn>=0.24.0
elasticsearch>=8.10.0
python-dotenv>=1.0.0
langchain>=0.0.300
langchain-anthropic>=0.1.0
sentence-transformers>=2.2.2
langchain-text-splitters>=0.0.1
```

## Acknowledgments

- **LangChain**: Framework for LLM applications
- **Elasticsearch**: Search and analytics engine
- **Anthropic**: Claude LLM provider
- **Sentence Transformers**: Embedding models
⭐ If this project helped you, please give it a star! ⭐