A modern RAG (Retrieval-Augmented Generation) application with separated frontend and backend architecture using LangGraph, FastAPI, and Streamlit.
- FastAPI: RESTful API server
- LangGraph: Workflow orchestration for RAG pipeline
- FAISS: Vector storage for document embeddings
- Ollama: LLM integration for response generation
- Ollama (nomic-embed-text): Text embeddings
- Streamlit: Web interface for document upload and chat
- API Client: HTTP client for backend communication
- 📄 PDF document upload and processing
- 🔍 Semantic search with vector embeddings
- 💬 Chat interface with context-aware responses
- 🤖 Integration with Ollama models (Llama2)
- 🔄 LangGraph workflow for RAG pipeline
- 🌐 Separated frontend/backend architecture
- 📊 Real-time document chunking and indexing
- 🎯 Session-based conversation continuity
- Ollama Installation: Install and start Ollama

  ```bash
  # Install Ollama
  curl -fsSL https://ollama.ai/install.sh | sh

  # Start the Ollama server
  ollama serve

  # Pull a model
  ollama pull llama2
  ```
- Python Environment: Python 3.9+

- Backend Setup:

  ```bash
  cd backend
  pip install -r requirements.txt
  ```

- Frontend Setup:

  ```bash
  cd frontend
  pip install -r requirements.txt
  ```
```bash
cd backend
python main.py
```

The backend will start on http://localhost:8000.
```bash
cd frontend
streamlit run app.py
```

The frontend will start on http://localhost:8501.
- `GET /`: Health check
- `POST /upload`: Upload and process PDF documents
- `POST /chat`: Chat with the RAG system
- `GET /documents`: Get document information
- `DELETE /documents`: Clear all documents
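The endpoints can also be exercised from Python. The sketch below is a minimal hypothetical client, assuming the `requests` library and the default backend URL; the actual frontend ships its own `api_client.py`, which may differ.

```python
"""Minimal sketch of a client for the backend API (illustrative names only)."""
import requests


class RAGClient:
    def __init__(self, base_url: str = "http://localhost:8000"):
        self.base_url = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        # Join the base URL with an endpoint path.
        return f"{self.base_url}/{path.lstrip('/')}"

    def upload(self, pdf_path: str) -> dict:
        # POST /upload expects multipart/form-data with a `file` field.
        with open(pdf_path, "rb") as f:
            resp = requests.post(self._url("/upload"), files={"file": f})
        resp.raise_for_status()
        return resp.json()

    def chat(self, message: str, model: str = "llama2") -> dict:
        # POST /chat takes a JSON body with `message` and `model`.
        resp = requests.post(
            self._url("/chat"),
            json={"message": message, "model": model},
        )
        resp.raise_for_status()
        return resp.json()
```

With the backend running, `RAGClient().chat("What is this document about?")` mirrors the second curl example above.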
```bash
curl -X POST "http://localhost:8000/upload" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"
```

```bash
curl -X POST "http://localhost:8000/chat" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is this document about?",
    "model": "llama2"
  }'
```

The RAG pipeline is implemented using LangGraph with the following nodes:
- Retrieve: Search for relevant documents using vector similarity
- Generate: Create response using Ollama with retrieved context
- Format: Format the final response with sources
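The three nodes can be sketched as plain Python functions passing a shared state dict. In the real app they are LangGraph nodes wired into a graph in `rag_workflow.py`; here the retrieval and the Ollama call are stubbed out and all function and key names are illustrative, not the app's actual code.

```python
"""Plain-Python sketch of the three-node RAG pipeline (stubs, not the real app)."""

def retrieve(state: dict) -> dict:
    # Vector-similarity search over the FAISS index (stubbed here).
    state["docs"] = [
        {"text": "Example chunk about the topic.", "source": "document.pdf"}
    ]
    return state

def generate(state: dict) -> dict:
    # Build a prompt from the retrieved context and call Ollama (stubbed here).
    context = "\n".join(d["text"] for d in state["docs"])
    state["answer"] = f"(model answer grounded in: {context[:40]}...)"
    return state

def format_response(state: dict) -> dict:
    # Attach the deduplicated source list to the final answer.
    sources = sorted({d["source"] for d in state["docs"]})
    state["response"] = {"answer": state["answer"], "sources": sources}
    return state

def run_pipeline(message: str) -> dict:
    # Sequential stand-in for compiling and invoking the LangGraph graph.
    state = {"message": message}
    for node in (retrieve, generate, format_response):
        state = node(state)
    return state["response"]
```

In LangGraph itself these functions would be registered with `add_node` and connected with edges, then compiled into a runnable graph.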
Backend:

- `OLLAMA_HOST`: Ollama server host (default: `http://localhost:11434`)

Frontend:

- `API_BASE_URL`: Backend API URL (default: `http://localhost:8000`)
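Read at startup, these settings might look like the following sketch; only the variable names and defaults come from this README, and how the services actually consume them is an assumption.

```python
"""Sketch of reading the two documented settings with their defaults."""
import os

def ollama_host() -> str:
    # Used by the backend when connecting to the Ollama server.
    return os.getenv("OLLAMA_HOST", "http://localhost:11434")

def api_base_url() -> str:
    # Used by the frontend's API client to reach the backend.
    return os.getenv("API_BASE_URL", "http://localhost:8000")
```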
- Embedding model: `nomic-embed-text`
- Chunk size: 1000 characters with 200-character overlap
- Search results: 4 most relevant chunks
- Default LLM: `llama2`
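The chunking parameters translate to a sliding window: each 1000-character chunk starts 800 characters after the previous one, so consecutive chunks share 200 characters. The app likely delegates this to a library splitter; the stand-alone sketch below just illustrates the arithmetic.

```python
"""Sketch of 1000-character chunks with 200 characters of overlap."""

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # each chunk starts `step` characters after the last
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```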
```
├── backend/
│   ├── main.py                # FastAPI application
│   ├── services/
│   │   ├── rag_workflow.py    # LangGraph RAG workflow
│   │   ├── pdf_service.py     # PDF processing
│   │   ├── vector_service.py  # Vector storage
│   │   └── ollama_service.py  # Ollama integration
│   └── requirements.txt
├── frontend/
│   ├── app.py                 # Streamlit application
│   ├── api_client.py          # API client
│   └── requirements.txt
└── README.md
```
- Ollama Connection: Ensure Ollama is running on `localhost:11434`
- Model Not Found: The app will automatically pull models if they are available
- Vector Storage: FAISS runs in-memory by default, so the index is lost on restart
- API Connection: Ensure the backend is running on `localhost:8000`
- CORS Issues: The backend includes CORS middleware for frontend access
- File Upload: Check file size limits and confirm the file is a valid PDF
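A quick connectivity check covers the first two classes of problems. This sketch uses only the standard library; the URLs are the defaults documented above.

```python
"""Quick connectivity check for the two local services (sketch)."""
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def is_up(url: str, timeout: float = 2.0) -> bool:
    # Returns True if anything is listening and answers HTTP at this URL.
    try:
        urlopen(url, timeout=timeout)
        return True
    except HTTPError:
        return True   # an HTTP error status still proves the server is up
    except (URLError, OSError):
        return False  # connection refused, DNS failure, or timeout

if __name__ == "__main__":
    print("Ollama:", is_up("http://localhost:11434"))
    print("Backend:", is_up("http://localhost:8000"))
```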
- Use `uvicorn main:app --reload` for backend development
- Use `streamlit run app.py --server.runOnSave true` for frontend development, so Streamlit reruns the app when source files change
- Check the browser console for API errors
