A modular and scalable system that combines knowledge graphs with retrieval-augmented generation (RAG) to provide intelligent question-answering over documents.
- Modular Architecture: Clean, maintainable code structure with separated concerns
- Multiple Input Sources: Support for PDF uploads and Wikipedia queries
- Knowledge Graph Construction: Automatic entity and relationship extraction
- Hybrid Search: Combines structured (graph) and unstructured (vector) search
- Interactive Web Interface: User-friendly Streamlit application
- Chat History Support: Contextual conversations with memory
- Configurable Models: Support for various LLMs via Groq API
├── config.py            # Configuration management
├── document_loader.py   # Document loading and chunking
├── entity_extractor.py  # Entity extraction from queries
├── graph_manager.py     # Knowledge graph operations
├── retrieval_system.py  # RAG retrieval and QA
├── main.py              # Main pipeline orchestration
├── streamlit_app.py     # Web interface
└── requirements.txt     # Dependencies
- Neo4j Database: Set up a Neo4j instance (local or cloud)
- Groq API Key: Get your API key from Groq
- Python 3.8+: Ensure you have Python installed
- Clone the repository:
git clone <your-repo-url>
cd graphrag-system
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
# Create a .env file or set environment variables
export NEO4J_URI="neo4j+s://your-neo4j-uri"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="your-password"
export GROQ_API_KEY="your-groq-api-key"
export HF_TOKEN="your-huggingface-token"  # Optional
- Run the web interface:
streamlit run streamlit_app.py
- Or use the pipeline programmatically:
from main import GraphRAGPipeline
# Initialize pipeline
pipeline = GraphRAGPipeline()
# Process documents (Wikipedia example)
pipeline.process_documents("Artificial Intelligence", "wikipedia")
# Ask questions
answer = pipeline.ask_question("What is machine learning?")
print(answer)
PDF Upload:
- Click on the "PDF Upload" tab
- Upload your PDF file
- Choose whether to clear existing graph data
- Click "Process PDF"
Wikipedia Query:
- Click on the "Wikipedia" tab
- Enter your search query (e.g., "Climate Change")
- Choose whether to clear existing graph data
- Click "Process Wikipedia"
Once documents are processed:
- Enter your question in the text input
- Click "Ask Question" to get an answer
- Click "Show Context" to see retrieved information
- View chat history for previous conversations
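The chat-history behaviour can be pictured with a minimal in-memory store. This is only a sketch to show the idea of contextual memory; the `ChatHistory` class and its method names are assumptions, not the actual API in `retrieval_system.py`:

```python
# Sketch of contextual chat memory: keep (question, answer) turns and
# fold the most recent ones into the context for the next question.
# Hypothetical helper -- the project's real history handling may differ.

class ChatHistory:
    def __init__(self, max_turns=5):
        self.turns = []          # list of (question, answer) tuples
        self.max_turns = max_turns

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        """Render the last max_turns exchanges as plain text for the LLM."""
        recent = self.turns[-self.max_turns:]
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in recent)

history = ChatHistory(max_turns=2)
history.add("What is machine learning?", "A subfield of AI ...")
history.add("Who uses it?", "It is used across many industries ...")
print(history.as_context())
```

Capping `max_turns` keeps the prompt from growing without bound as the conversation continues.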
The system can be configured through the GraphRAGConfig class:
config = GraphRAGConfig(
    model_name="deepseek-r1-distill-llama-70b",
    embedding_model="sentence-transformers/all-mpnet-base-v2",
    chunk_size=512,
    chunk_overlap=24,
    temperature=0.3
)

Supported models:
- deepseek-r1-distill-llama-70b (recommended)
- llama3-70b-8192
- mixtral-8x7b-32768
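A configuration object like the one above can be modelled as a plain dataclass. This sketch mirrors the fields from the usage example; the default values shown are illustrative assumptions, so check `config.py` for the project's actual defaults:

```python
from dataclasses import dataclass

@dataclass
class GraphRAGConfig:
    # Field names mirror the usage example above; the defaults here
    # are illustrative assumptions, not the project's actual values.
    model_name: str = "deepseek-r1-distill-llama-70b"
    embedding_model: str = "sentence-transformers/all-mpnet-base-v2"
    chunk_size: int = 512
    chunk_overlap: int = 24
    temperature: float = 0.3

# Override only what you need; everything else keeps its default.
config = GraphRAGConfig(temperature=0.1)
print(config.model_name, config.temperature)
```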
- Handles PDF and Wikipedia document loading
- Implements text chunking with configurable parameters
- Supports multiple input sources
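The chunking step can be illustrated with a bare sliding-window splitter. The project presumably uses a LangChain text splitter rather than this toy version; the point here is only how `chunk_size` and `chunk_overlap` interact:

```python
def chunk_text(text, chunk_size=512, chunk_overlap=24):
    """Split text into overlapping windows of at most chunk_size characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap   # each window starts `step` chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("a" * 1000, chunk_size=512, chunk_overlap=24)
print(len(chunks))  # → 2: windows starting at characters 0 and 488
```

The overlap means the last 24 characters of each chunk reappear at the start of the next one, so sentences cut at a boundary still land intact in at least one chunk.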
- Manages Neo4j knowledge graph operations
- Creates nodes and relationships from documents
- Handles vector indexing for hybrid search
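Node and relationship writes in Neo4j reduce to parameterised Cypher `MERGE` statements. The helper below only builds the query text, so no live database is needed to follow it; it is a hypothetical sketch, and `graph_manager.py` may structure this differently:

```python
def build_relationship_query(source_label, rel_type, target_label):
    """Build a parameterised Cypher MERGE for one (source)-[rel]->(target) edge.

    Labels and relationship types cannot be bound as Cypher parameters,
    so they are interpolated into the text; the node names are passed
    at run time as the $source and $target parameters.
    Hypothetical helper -- the project's actual queries may differ.
    """
    return (
        f"MERGE (a:{source_label} {{name: $source}}) "
        f"MERGE (b:{target_label} {{name: $target}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

query = build_relationship_query("Person", "WORKS_FOR", "Organization")
print(query)
# With the neo4j driver this would run as, e.g.:
#   session.run(query, source="Alice", target="Acme Corp")
```

Using `MERGE` rather than `CREATE` keeps the graph idempotent: re-processing the same document does not duplicate entities.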
- Extracts entities from user queries
- Uses structured output parsing
- Supports person, organization, and location entities
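Structured output parsing means giving the LLM a fixed schema to fill. A sketch of what that schema could look like for the three supported entity types; the real extractor presumably uses a LangChain structured-output parser, and the class and field names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedEntities:
    """Schema the LLM is asked to fill for each user query (illustrative)."""
    persons: list = field(default_factory=list)
    organizations: list = field(default_factory=list)
    locations: list = field(default_factory=list)

    def all_names(self):
        """Flat list of entity names, used to seed the graph lookup."""
        return self.persons + self.organizations + self.locations

# What a parsed response might look like for
# "Where did Marie Curie work in Paris?"
entities = ExtractedEntities(persons=["Marie Curie"], locations=["Paris"])
print(entities.all_names())  # → ['Marie Curie', 'Paris']
```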
- Implements hybrid retrieval (graph + vector search)
- Handles chat history and context
- Provides comprehensive question-answering
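Hybrid retrieval ultimately merges two result sets into one prompt context. The toy sketch below uses placeholder data, with no embedding model or Neo4j involved; the actual scoring and merging strategy in `retrieval_system.py` may differ:

```python
def hybrid_context(graph_facts, vector_hits, top_k=3):
    """Combine structured graph facts with the top-scoring vector chunks.

    graph_facts: list of "subject - REL -> object" strings from the graph.
    vector_hits: list of (similarity_score, chunk_text) pairs.
    Hypothetical helper illustrating the merge, not the project's API.
    """
    ranked = sorted(vector_hits, key=lambda pair: pair[0], reverse=True)
    best_chunks = [text for _, text in ranked[:top_k]]
    return ("Structured data:\n" + "\n".join(graph_facts)
            + "\n\nUnstructured data:\n" + "\n".join(best_chunks))

context = hybrid_context(
    graph_facts=["Marie Curie - WORKED_AT -> University of Paris"],
    vector_hits=[(0.91, "Curie pioneered research on radioactivity."),
                 (0.42, "Paris is the capital of France.")],
    top_k=1,
)
print(context)
```

The combined context is then handed to the LLM, so the answer can draw on both precise graph facts and free-text passages.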
The system creates rich knowledge graphs with:
- Entities: People, organizations, locations, concepts
- Relationships: Connections between entities
- Documents: Source traceability
graphrag-system/
│
├── config.py            # System configuration
├── document_loader.py   # Document processing
├── entity_extractor.py  # Entity extraction
├── graph_manager.py     # Graph operations
├── retrieval_system.py  # RAG system
├── main.py              # Main pipeline
├── streamlit_app.py     # Web interface
├── requirements.txt     # Dependencies
└── README.md            # Documentation
- New Document Types: Extend the DocumentLoader class
- Custom Models: Update the configuration and model initialization
- Enhanced Retrieval: Modify the RetrievalSystem class
- UI Improvements: Update streamlit_app.py
- Connection Errors:
  - Verify Neo4j credentials and connectivity
  - Check whether the Neo4j instance is running
- Model Errors:
  - Ensure the Groq API key is valid
  - Check the model name spelling
- Memory Issues:
  - Reduce chunk_size for large documents
  - Process documents in smaller batches
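Processing in smaller batches can be as simple as slicing the chunk list before handing it to the graph builder. A sketch, with illustrative names (the pipeline does not necessarily expose a batching parameter):

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices of items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# e.g. feed the graph transformer 20 chunks at a time instead of all at once
chunks = [f"chunk-{i}" for i in range(45)]
for batch in batched(chunks, batch_size=20):
    print(len(batch))   # → 20, 20, 5
```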
Enable debug information by setting:
import logging
logging.basicConfig(level=logging.DEBUG)

- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the RAG framework
- Neo4j for graph database technology
- Groq for fast LLM inference
- Streamlit for the web interface
- HuggingFace for embeddings models
For questions or support, please open an issue or contact subhadipde128@gmail.com.
Built with ❤️ for the AI community