
Commit

Merge pull request #2 from fatemenajafi135/develop
Develop
fatemenajafi135 authored Dec 2, 2024
2 parents 57a35a1 + 52cbdb4 commit 14d4fcd
Showing 10 changed files with 172 additions and 39 deletions.
85 changes: 74 additions & 11 deletions README.md
@@ -1,17 +1,80 @@
# GraphRAG
An implementation of GraphRAG using graphrag-sdk, falkordb, langchain, ...

## Project Overview
This project is a robust service for managing, generating, and interacting with Knowledge Graphs (KG) using the **GraphRAG-SDK**, **FalkorDB**, and **LangChain**. The service provides a flexible and scalable platform for handling different file types, creating ontologies, generating KGs, extending them, and interacting with them via chat.

Key features of this project include:
- **GraphRAG-SDK**: Used for managing the creation and processing of Knowledge Graphs.
- **FalkorDB**: Stores and evolves the generated KGs in a highly efficient and scalable manner.
- **LangChain**: Powers the conversational chat interface, enabling interactions with the knowledge stored in the KG.
- **FastAPI**: The service is built as a FastAPI application to provide a fast and efficient API layer for interacting with the system.
- **Dockerized with Docker Compose**: The entire service is containerized for easy deployment, with CI/CD pipelines via GitHub Actions for continuous integration and delivery.

## Running Manually

- DB:
```shell
docker pull falkordb/falkordb:edge
docker run -p 6379:6379 -p 3000:3000 -it --rm -v ./knowledgebase:/knowledgebase falkordb/falkordb:edge
```
## Features
- **Ontology Creation**: Define and create an ontology to structure the data for the KG.
- **Knowledge Graph Generation**: Generate a KG based on a provided ontology and various data sources (PDF, Word, PowerPoint).
- **KG Extension**: Add new data and extend the existing KG.
- **Chat Interface**: Interact with the generated KG via a conversational chat powered by LangChain.
- **File Upload**: Upload files for future processing and integration.

## How to Run the Service

To run the API server directly (outside Docker):
```shell
fastapi run app.py
```

### Prerequisites
- Python 3.11
- Docker and Docker Compose installed
- FastAPI and other Python dependencies

### Setup Instructions

1. **Clone the repository**:
```bash
git clone https://github.com/fatemenajafi135/GraphRAG.git
cd GraphRAG
```

2. Build and run the Docker containers: The project uses Docker Compose to set up the necessary services.
```shell
docker-compose up --build
```
This will:

- Build the FastAPI service container.
- Set up the FalkorDB container for storing the Knowledge Graph.
- Use the same container configuration that the GitHub Actions CI/CD pipeline builds and deploys.

3. Access the FastAPI documentation: Once the services are running, the interactive API docs are available at `http://localhost:8000/docs`.

4. CI/CD Pipeline (GitHub Actions): The repository is integrated with GitHub Actions for CI/CD automation. It automatically builds, tests, and deploys containers upon any changes to the codebase. All relevant configuration files for GitHub Actions can be found in the `.github/workflows/` directory.

5. Available APIs
The service exposes the following APIs for interacting with the Knowledge Graph:

- `POST /create_ontology`: Create a new ontology from provided sources.
- `POST /create_knowledgebase`: Generate a new Knowledge Graph based on the defined ontology.
- `PUT /extend_knowledgebase`: Extend an existing Knowledge Graph by adding new data.
- `POST /upload_files`: Upload files for later processing and integration (URL, PDF, Word, PowerPoint).
- `POST /chat`: Chat interface to interact with the Knowledge Graph via LangChain.
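With the service up (via `docker-compose` or `fastapi run`), the endpoints can be exercised from any HTTP client. Below is a minimal sketch using only the standard library; the `message` field name is an assumption, since the request models in `src/schema.py` are only partially shown in this diff.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # FastAPI service port from the compose setup


def build_chat_request(message: str):
    """Build the URL and JSON body for a POST to the chat endpoint.

    The 'message' field name is an assumption; check ChatRequest in src/schema.py.
    """
    body = json.dumps({"message": message}).encode("utf-8")
    return f"{BASE_URL}/chat", body


def post_json(url: str, body: bytes) -> dict:
    """Send the request and decode the JSON response (requires a running service)."""
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


url, body = build_chat_request("Which movies did the director work on?")
print(url)  # http://localhost:8000/chat
```

Calling `post_json(url, body)` then performs the actual request once the containers are running.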

## Technologies Used and Why

- **GraphRAG-SDK**:
  - Framework for generating and managing Knowledge Graphs.
  - Chosen for its efficient handling of the generation and processing of complex Knowledge Graphs.
- **FalkorDB**:
  - A scalable, production-grade graph database that stores the Knowledge Graph.
  - Provides fast querying and scales with the data.
- **LangChain**:
  - Powers the conversational chain for querying and interacting with the Knowledge Graph.
- **FastAPI**:
  - A modern, high-performance web framework for the API layer.
- **Docker & Docker Compose**:
  - Containerization and orchestration for the entire service.
  - Ensures consistent deployments across environments and simplifies dependency management.
- **GitHub Actions**:
  - CI/CD for automated testing and deployment.
34 changes: 29 additions & 5 deletions app.py
@@ -1,11 +1,15 @@
import redis
from fastapi import FastAPI
from pathlib import Path
from typing import List
from fastapi import FastAPI, File, UploadFile
from src import services
from src.schema import OntologyConfig, KnowledgeGraphConfig, KGSources, ChatRequest


app = FastAPI()
redis_client = redis.StrictRedis(host="graph_db", port=6379, decode_responses=True)
sources_directory = Path("/sources")


@app.post('/chat')
def chat(chat_request: ChatRequest):
@@ -16,10 +20,30 @@ def chat(chat_request: ChatRequest):
return {'response': response}


@app.post('/upload_files')
async def upload_files(files: List[UploadFile] = File(...)):
sources_directory.mkdir(parents=True, exist_ok=True)

uploaded_file_paths = []

for file in files:
file_path = sources_directory / file.filename
with open(file_path, "wb") as buffer:
buffer.write(await file.read())
uploaded_file_paths.append(str(file_path))
print(uploaded_file_paths)
return uploaded_file_paths
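The handler above writes each upload into the shared `/sources` volume and returns the saved paths. The same save loop can be sketched as a plain synchronous function (the directory and file names here are illustrative):

```python
from pathlib import Path


def save_uploads(sources_dir: Path, files: dict) -> list:
    """Write each named payload under sources_dir, mirroring the endpoint's loop."""
    sources_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, data in files.items():
        path = sources_dir / name
        path.write_bytes(data)  # endpoint does: buffer.write(await file.read())
        saved.append(str(path))
    return saved
```

Note that in the endpoint itself, `await file.read()` loads the whole upload into memory; for very large files a chunked copy (e.g. `shutil.copyfileobj`) would be safer.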

@app.post('/create_ontology')
def create_ontology(sources: KGSources, config: OntologyConfig):
response = services.generate_ontology(sources, config)
return {'response': response}
def create_ontology(
sources: KGSources,
config: OntologyConfig,
):
try:
response = services.generate_ontology(sources, config)
return {'response': response}
except Exception as e:
print('ERROR MESSAGE:', e)


@app.post('/create_knowledgebase')
@@ -28,7 +52,7 @@ def create_knowledgebase(sources: KGSources, config: KnowledgeGraphConfig):
return {'response': response}


@app.post('/extend_knowledgebase')
@app.put('/extend_knowledgebase')
def extend_knowledgebase(sources: KGSources, config: KnowledgeGraphConfig):
response = services.extend_knowledge_graph(sources, config)
return {'response': response}
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -10,6 +10,9 @@ services:
- OPENAI_API_KEY=${OPENAI_API_KEY}
depends_on:
- graph_db
volumes:
- ./sources:/sources
- ./knowledgebase/ontologies:/knowledgebase/ontologies

graph_db:
image: falkordb/falkordb:edge
4 changes: 4 additions & 0 deletions requirements.txt
@@ -5,3 +5,7 @@ langchain_community==0.3.8
graphrag_sdk[all]==0.3.3
falkordb==1.0.10
python-dotenv==1.0.1
docx2txt==0.8
unstructured==0.16.8
python-magic==0.4.27
python-pptx==1.0.2
60 changes: 39 additions & 21 deletions src/data_loader.py
@@ -1,30 +1,48 @@
from graphrag_sdk.source import URL
from graphrag_sdk.source import URL, PDF, TEXT
from langchain_community.document_loaders import Docx2txtLoader, UnstructuredPowerPointLoader
from src.utils import save_txt


class DataLoader:

def load(self, sources):
return [self._load_source(source) for source in sources.paths]

def _load_source(self, source):
if source.endswith(".pdf"):
return self._load_pdf(source)
elif source.endswith(".docx"):
return self._load_docx(source)
elif source.startswith("http"):
print('url', source)
return self._load_url(source)
def _load_source(self, source_path):

print('Loading... ', source_path)
if source_path.endswith(".pdf"):
return self._load_pdf(source_path)
elif source_path.endswith(".docx"):
return self._load_docx(source_path)
elif source_path.endswith(".pptx"):
return self._load_pptx(source_path)
elif source_path.startswith("http"):
return self._load_url(source_path)
else:
raise ValueError(f"Unsupported source type: {source}")

def _load_pdf(self, source):
# Logic to load PDF
pass

def _load_docx(self, source):
# Logic to load DOCX
pass

def _load_url(self, source):
return URL(source)
return ''
# raise ValueError(f"Unsupported source type: {source}")

@staticmethod
def _load_pdf(source_path):
return PDF(source_path)

@staticmethod
def _load_docx(source_path):
loader = Docx2txtLoader(source_path)
data = loader.load()
new_path = save_txt(source_path=source_path, data=data)
return TEXT(new_path)
# return Document(content=data[0].page_content)

@staticmethod
def _load_pptx(source_path):
loader = UnstructuredPowerPointLoader(source_path)
data = loader.load()
new_path = save_txt(source_path=source_path, data=data)
return TEXT(new_path)

@staticmethod
def _load_url(source_path):
print(type(URL(source_path)))
return URL(source_path)
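The dispatch in `_load_source` keys on a URL prefix and the file extension. That rule can be isolated as a small pure function for testing; this is a sketch mirroring the branches above, not part of the repo:

```python
from pathlib import Path


def source_kind(source_path: str) -> str:
    """Classify a source the same way DataLoader._load_source branches."""
    if source_path.startswith("http"):
        return "url"
    kinds = {".pdf": "pdf", ".docx": "docx", ".pptx": "pptx"}
    return kinds.get(Path(source_path).suffix.lower(), "unsupported")


print(source_kind("slides.pptx"))  # pptx
```

Unlike the committed code, which silently returns `''` for unsupported sources (the `raise` is commented out), this sketch makes the fall-through case explicit.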
1 change: 1 addition & 0 deletions src/knowledge_graph_service.py
@@ -18,6 +18,7 @@ def __init__(self, ontology, config: KnowledgeGraphConfig):

def create(self, sources):
if self.kb is None:
print('creating KG')
model = OpenAiGenerativeModel(model_name=self.config.model_name)
self.kg = KnowledgeGraph(
name=self.config.name,
1 change: 1 addition & 0 deletions src/ontology_service.py
@@ -15,6 +15,7 @@ def __init__(self):
self.ontology = None

def create(self, sources, config: OntologyConfig):
print('creating ontology')
model = OpenAiGenerativeModel(model_name=config.model_name)
self.ontology = Ontology.from_sources(
sources=sources,
3 changes: 2 additions & 1 deletion src/schema.py
@@ -1,5 +1,6 @@
from pydantic import BaseModel
from typing import List
from fastapi import UploadFile


class ChatRequest(BaseModel):
@@ -14,7 +15,7 @@ class KGSources(BaseModel):
class OntologyConfig(BaseModel):

name: str = "movies-6"
path: str = "./knowledgebase/ontologies/"
path: str = "/knowledgebase/ontologies/"
model_name: str = "gpt-4o-mini"
ontology_prompt: str = """
Extract only the most relevant information about all the movies, actors, and directors over the text.
1 change: 0 additions & 1 deletion src/services.py
@@ -1,7 +1,6 @@
import os
import sys
from pathlib import Path

sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from src.ontology_service import OntologyService
19 changes: 19 additions & 0 deletions src/utils.py
@@ -0,0 +1,19 @@
from fastapi import UploadFile, File


def upload_file(sources_directory: str, file: UploadFile = File(...)):
sources_directory.mkdir(parents=True, exist_ok=True)

file_path = sources_directory / file.filename
with open(file_path, "wb") as buffer:
buffer.write(file.read())

return {"filename": file.filename, "filepath": str(file_path)}


def save_txt(source_path, data):
content = '\n\n'.join(page.page_content for page in data)
new_path = '.'.join(source_path.split('.')[:-1]) + '.txt'
with open(new_path, 'w', encoding='utf-8') as f:
f.write(content)
return new_path
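`save_txt` joins the `page_content` of every loaded page and writes the result next to the source file with a `.txt` extension. A quick usage sketch with stand-in page objects (the `Page` class is illustrative; the real objects come from the LangChain loaders):

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class Page:
    page_content: str


def save_txt(source_path, data):
    # Same logic as src/utils.py above.
    content = '\n\n'.join(page.page_content for page in data)
    new_path = '.'.join(source_path.split('.')[:-1]) + '.txt'
    with open(new_path, 'w', encoding='utf-8') as f:
        f.write(content)
    return new_path


with tempfile.TemporaryDirectory() as d:
    src = str(Path(d) / "deck.pptx")
    out = save_txt(src, [Page("slide one"), Page("slide two")])
    print(Path(out).read_text(encoding="utf-8"))  # pages joined by a blank line
```

One caveat: a source path with no extension would yield an empty output path here; in practice the loaders only hand this helper `.docx`/`.pptx` files, so that case doesn't arise.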
