A modular and scalable system that combines knowledge graphs with retrieval-augmented generation (RAG) to provide intelligent question-answering over documents.
- Modular Architecture: Clean, maintainable code structure with separated concerns
- Multiple Input Sources: Support for PDF uploads and Wikipedia queries
- Knowledge Graph Construction: Automatic entity and relationship extraction
- Hybrid Search: Combines structured (graph) and unstructured (vector) search
- Interactive Web Interface: User-friendly Streamlit application
- Chat History Support: Contextual conversations with memory
- Configurable Models: Support for various LLMs via Groq API
├── config.py            # Configuration management
├── document_loader.py   # Document loading and chunking
├── entity_extractor.py  # Entity extraction from queries
├── graph_manager.py     # Knowledge graph operations
├── retrieval_system.py  # RAG retrieval and QA
├── main.py              # Main pipeline orchestration
├── streamlit_app.py     # Web interface
└── requirements.txt     # Dependencies
- Neo4j Database: Set up a Neo4j instance (local or cloud)
- Groq API Key: Get your API key from Groq
- Python 3.8+: Ensure you have Python installed
- Clone the repository:
git clone <your-repo-url>
cd graphrag-system
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
# Create a .env file or set environment variables
export NEO4J_URI="neo4j+s://your-neo4j-uri"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="your-password"
export GROQ_API_KEY="your-groq-api-key"
export HF_TOKEN="your-huggingface-token"  # Optional
- Run the web interface:
streamlit run streamlit_app.py
- Or use the pipeline programmatically:
from main import GraphRAGPipeline
# Initialize pipeline
pipeline = GraphRAGPipeline()
# Process documents (Wikipedia example)
pipeline.process_documents("Artificial Intelligence", "wikipedia")
# Ask questions
answer = pipeline.ask_question("What is machine learning?")
print(answer)
PDF Upload:
- Click on the "PDF Upload" tab
- Upload your PDF file
- Choose whether to clear existing graph data
- Click "Process PDF"
Wikipedia Query:
- Click on the "Wikipedia" tab
- Enter your search query (e.g., "Climate Change")
- Choose whether to clear existing graph data
- Click "Process Wikipedia"
Once documents are processed:
- Enter your question in the text input
- Click "Ask Question" to get an answer
- Click "Show Context" to see retrieved information
- View chat history for previous conversations
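The chat-history behaviour can be pictured with a minimal in-memory store. This is only a sketch to show the idea of contextual memory; the `ChatHistory` class and its method names are assumptions, not the actual API in `retrieval_system.py`:

```python
# Sketch of contextual chat memory: keep (question, answer) turns and
# fold the most recent ones into the context for the next question.
# Hypothetical helper -- the project's real history handling may differ.

class ChatHistory:
    def __init__(self, max_turns=5):
        self.turns = []          # list of (question, answer) tuples
        self.max_turns = max_turns

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        """Render the last max_turns exchanges as plain text for the LLM."""
        recent = self.turns[-self.max_turns:]
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in recent)

history = ChatHistory(max_turns=2)
history.add("What is machine learning?", "A subfield of AI ...")
history.add("Who uses it?", "It is used across many industries ...")
print(history.as_context())
```

Capping `max_turns` keeps the prompt from growing without bound as the conversation continues.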
The system can be configured through the GraphRAGConfig class:
config = GraphRAGConfig(
    model_name="deepseek-r1-distill-llama-70b",
    embedding_model="sentence-transformers/all-mpnet-base-v2",
    chunk_size=512,
    chunk_overlap=24,
    temperature=0.3
)

Supported models:
- deepseek-r1-distill-llama-70b (recommended)
- llama3-70b-8192
- mixtral-8x7b-32768
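A configuration object like the one above can be modelled as a plain dataclass. This sketch mirrors the fields from the usage example; the default values shown are illustrative assumptions, so check `config.py` for the project's actual defaults:

```python
from dataclasses import dataclass

@dataclass
class GraphRAGConfig:
    # Field names mirror the usage example above; the defaults here
    # are illustrative assumptions, not the project's actual values.
    model_name: str = "deepseek-r1-distill-llama-70b"
    embedding_model: str = "sentence-transformers/all-mpnet-base-v2"
    chunk_size: int = 512
    chunk_overlap: int = 24
    temperature: float = 0.3

# Override only what you need; everything else keeps its default.
config = GraphRAGConfig(temperature=0.1)
print(config.model_name, config.temperature)
```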
- Handles PDF and Wikipedia document loading
- Implements text chunking with configurable parameters
- Supports multiple input sources
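The chunking step can be illustrated with a bare sliding-window splitter. The project presumably uses a LangChain text splitter rather than this toy version; the point here is only how `chunk_size` and `chunk_overlap` interact:

```python
def chunk_text(text, chunk_size=512, chunk_overlap=24):
    """Split text into overlapping windows of at most chunk_size characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap   # each window starts `step` chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("a" * 1000, chunk_size=512, chunk_overlap=24)
print(len(chunks))  # → 2: windows starting at characters 0 and 488
```

The overlap means the last 24 characters of each chunk reappear at the start of the next one, so sentences cut at a boundary still land intact in at least one chunk.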
- Manages Neo4j knowledge graph operations
- Creates nodes and relationships from documents
- Handles vector indexing for hybrid search
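Node and relationship writes in Neo4j reduce to parameterised Cypher `MERGE` statements. The helper below only builds the query text, so no live database is needed to follow it; it is a hypothetical sketch, and `graph_manager.py` may structure this differently:

```python
def build_relationship_query(source_label, rel_type, target_label):
    """Build a parameterised Cypher MERGE for one (source)-[rel]->(target) edge.

    Labels and relationship types cannot be bound as Cypher parameters,
    so they are interpolated into the text; the node names are passed
    at run time as the $source and $target parameters.
    Hypothetical helper -- the project's actual queries may differ.
    """
    return (
        f"MERGE (a:{source_label} {{name: $source}}) "
        f"MERGE (b:{target_label} {{name: $target}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

query = build_relationship_query("Person", "WORKS_FOR", "Organization")
print(query)
# With the neo4j driver this would run as, e.g.:
#   session.run(query, source="Alice", target="Acme Corp")
```

Using `MERGE` rather than `CREATE` keeps the graph idempotent: re-processing the same document does not duplicate entities.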
- Extracts entities from user queries
- Uses structured output parsing
- Supports person, organization, and location entities
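Structured output parsing means giving the LLM a fixed schema to fill. A sketch of what that schema could look like for the three supported entity types; the real extractor presumably uses a LangChain structured-output parser, and the class and field names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedEntities:
    """Schema the LLM is asked to fill for each user query (illustrative)."""
    persons: list = field(default_factory=list)
    organizations: list = field(default_factory=list)
    locations: list = field(default_factory=list)

    def all_names(self):
        """Flat list of entity names, used to seed the graph lookup."""
        return self.persons + self.organizations + self.locations

# What a parsed response might look like for
# "Where did Marie Curie work in Paris?"
entities = ExtractedEntities(persons=["Marie Curie"], locations=["Paris"])
print(entities.all_names())  # → ['Marie Curie', 'Paris']
```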
- Implements hybrid retrieval (graph + vector search)
- Handles chat history and context
- Provides comprehensive question-answering
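Hybrid retrieval ultimately merges two result sets into one prompt context. The toy sketch below uses placeholder data, with no embedding model or Neo4j involved; the actual scoring and merging strategy in `retrieval_system.py` may differ:

```python
def hybrid_context(graph_facts, vector_hits, top_k=3):
    """Combine structured graph facts with the top-scoring vector chunks.

    graph_facts: list of "subject - REL -> object" strings from the graph.
    vector_hits: list of (similarity_score, chunk_text) pairs.
    Hypothetical helper illustrating the merge, not the project's API.
    """
    ranked = sorted(vector_hits, key=lambda pair: pair[0], reverse=True)
    best_chunks = [text for _, text in ranked[:top_k]]
    return ("Structured data:\n" + "\n".join(graph_facts)
            + "\n\nUnstructured data:\n" + "\n".join(best_chunks))

context = hybrid_context(
    graph_facts=["Marie Curie - WORKED_AT -> University of Paris"],
    vector_hits=[(0.91, "Curie pioneered research on radioactivity."),
                 (0.42, "Paris is the capital of France.")],
    top_k=1,
)
print(context)
```

The combined context is then handed to the LLM, so the answer can draw on both precise graph facts and free-text passages.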
The system creates rich knowledge graphs with:
- Entities: People, organizations, locations, concepts
- Relationships: Connections between entities
- Documents: Source traceability
graphrag-system/
│
├── config.py            # System configuration
├── document_loader.py   # Document processing
├── entity_extractor.py  # Entity extraction
├── graph_manager.py     # Graph operations
├── retrieval_system.py  # RAG system
├── main.py              # Main pipeline
├── streamlit_app.py     # Web interface
├── requirements.txt     # Dependencies
└── README.md            # Documentation
- New Document Types: Extend the DocumentLoader class
- Custom Models: Update the configuration and model initialization
- Enhanced Retrieval: Modify the RetrievalSystem class
- UI Improvements: Update streamlit_app.py
- Connection Errors:
  - Verify Neo4j credentials and connectivity
  - Check whether the Neo4j instance is running
- Model Errors:
  - Ensure the Groq API key is valid
  - Check the model name spelling
- Memory Issues:
  - Reduce chunk_size for large documents
  - Process documents in smaller batches
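Processing in smaller batches can be as simple as slicing the chunk list before handing it to the graph builder. A sketch, with illustrative names (the pipeline does not necessarily expose a batching parameter):

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices of items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# e.g. feed the graph transformer 20 chunks at a time instead of all at once
chunks = [f"chunk-{i}" for i in range(45)]
for batch in batched(chunks, batch_size=20):
    print(len(batch))   # → 20, 20, 5
```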
Enable debug information by setting:
import logging
logging.basicConfig(level=logging.DEBUG)

- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the RAG framework
- Neo4j for graph database technology
- Groq for fast LLM inference
- Streamlit for the web interface
- HuggingFace for embeddings models
For questions or support, please open an issue or contact subhadipde128@gmail.com.
Built with ❤️ for the AI community