
Commit

Merge pull request #2 from fatemenajafi135/develop
Develop
fatemenajafi135 authored Dec 2, 2024
2 parents 57a35a1 + 52cbdb4 commit 14d4fcd
Showing 10 changed files with 172 additions and 39 deletions.
85 changes: 74 additions & 11 deletions README.md
@@ -1,17 +1,80 @@
# GraphRAG
An implementation of GraphRAG using graphrag-sdk, falkordb, langchain, ...

## Project Overview
This project is a robust service for managing, generating, and interacting with Knowledge Graphs (KG) using the **GraphRAG-SDK**, **FalkorDB**, and **LangChain**. The service provides a flexible and scalable platform for handling different file types, creating ontologies, generating KGs, extending them, and interacting with them via chat.

Key features of this project include:
- **GraphRAG-SDK**: Used for managing the creation and processing of Knowledge Graphs.
- **FalkorDB**: Stores and evolves the generated KGs in a highly efficient and scalable manner.
- **LangChain**: Powers the conversational chat interface, enabling interactions with the knowledge stored in the KG.
- **FastAPI**: The service is built as a FastAPI application to provide a fast and efficient API layer for interacting with the system.
- **Dockerized with Docker Compose**: The entire service is containerized for easy deployment, with CI/CD pipelines via GitHub Actions for continuous integration and delivery.

## Running Manually

- DB:
```shell
docker pull falkordb/falkordb:edge
docker run -p 6379:6379 -p 3000:3000 -it --rm -v ./knowledgebase:/knowledgebase falkordb/falkordb:edge
```
## Features
- **Ontology Creation**: Define and create an ontology to structure the data for the KG.
- **Knowledge Graph Generation**: Generate a KG based on a provided ontology and various data sources (PDF, Word, PowerPoint).
- **KG Extension**: Add new data and extend the existing KG.
- **Chat Interface**: Interact with the generated KG via a conversational chat powered by LangChain.
- **File Upload**: Upload files for future processing and integration.

## How to Run the Service

To run the API server directly (outside Docker):
```shell
fastapi run app.py
```

### Prerequisites
- Python 3.11
- Docker and Docker Compose installed
- FastAPI and other Python dependencies

### Setup Instructions

1. **Clone the repository**:
```bash
git clone https://github.com/fatemenajafi135/GraphRAG.git
cd GraphRAG
```

2. Build and run the Docker containers: The project uses Docker Compose to set up the necessary services.
```shell
docker-compose up --build
```
This will:

- Build the FastAPI service container.
- Set up the FalkorDB container for storing the Knowledge Graph.
- Use the same container configuration that the GitHub Actions CI/CD pipeline builds and deploys.

3. Access the FastAPI documentation: Once the services are running, the interactive API docs are available at `http://localhost:8000/docs`.

4. CI/CD Pipeline (GitHub Actions): The repository is integrated with GitHub Actions for CI/CD automation. It automatically builds, tests, and deploys containers upon any changes to the codebase. All relevant configuration files for GitHub Actions can be found in the `.github/workflows/` directory.

5. Available APIs
The service exposes the following APIs for interacting with the Knowledge Graph:

- `POST /create_ontology`: Create a new ontology from provided sources.
- `POST /create_knowledgebase`: Generate a new Knowledge Graph based on the defined ontology.
- `PUT /extend_knowledgebase`: Extend an existing Knowledge Graph by adding new data.
- `POST /upload_files`: Upload files for later processing and integration (URL, PDF, Word, PowerPoint).
- `POST /chat`: Chat interface to interact with the Knowledge Graph via LangChain.
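With the service up (via `docker-compose` or `fastapi run`), the endpoints can be exercised from any HTTP client. Below is a minimal sketch using only the standard library; the `message` field name is an assumption, since the request models in `src/schema.py` are only partially shown in this diff.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # FastAPI service port from the compose setup


def build_chat_request(message: str):
    """Build the URL and JSON body for a POST to the chat endpoint.

    The 'message' field name is an assumption; check ChatRequest in src/schema.py.
    """
    body = json.dumps({"message": message}).encode("utf-8")
    return f"{BASE_URL}/chat", body


def post_json(url: str, body: bytes) -> dict:
    """Send the request and decode the JSON response (requires a running service)."""
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


url, body = build_chat_request("Which movies did the director work on?")
print(url)  # http://localhost:8000/chat
```

Calling `post_json(url, body)` then performs the actual request once the containers are running.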

## Technologies Used and Why

- **GraphRAG-SDK**:
  - Framework for generating and managing Knowledge Graphs.
  - Chosen for its efficient handling of the generation and processing of complex Knowledge Graphs.
- **FalkorDB**:
  - A scalable, production-grade graph database that stores the Knowledge Graph.
  - Provides fast querying and scales with the data.
- **LangChain**:
  - Powers the conversational chain for querying and interacting with the Knowledge Graph.
- **FastAPI**:
  - A modern, high-performance web framework for the API layer.
- **Docker & Docker Compose**:
  - Containerization and orchestration for the entire service.
  - Ensures consistent deployments across environments and simplifies dependency management.
- **GitHub Actions**:
  - CI/CD for automated testing and deployment.
34 changes: 29 additions & 5 deletions app.py
@@ -1,11 +1,15 @@
import redis
from fastapi import FastAPI
from pathlib import Path
from typing import List
from fastapi import FastAPI, File, UploadFile
from src import services
from src.schema import OntologyConfig, KnowledgeGraphConfig, KGSources, ChatRequest


app = FastAPI()
redis_client = redis.StrictRedis(host="graph_db", port=6379, decode_responses=True)
sources_directory = Path("/sources")


@app.post('/chat')
def chat(chat_request: ChatRequest):
@@ -16,10 +20,30 @@ def chat(chat_request: ChatRequest):
return {'response': response}


@app.post('/upload_files')
async def upload_files(files: List[UploadFile] = File(...)):
sources_directory.mkdir(parents=True, exist_ok=True)

uploaded_file_paths = []

for file in files:
file_path = sources_directory / file.filename
with open(file_path, "wb") as buffer:
buffer.write(await file.read())
uploaded_file_paths.append(str(file_path))
print(uploaded_file_paths)
return uploaded_file_paths
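The handler above writes each upload into the shared `/sources` volume and returns the saved paths. The same save loop can be sketched as a plain synchronous function (the directory and file names here are illustrative):

```python
from pathlib import Path


def save_uploads(sources_dir: Path, files: dict) -> list:
    """Write each named payload under sources_dir, mirroring the endpoint's loop."""
    sources_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, data in files.items():
        path = sources_dir / name
        path.write_bytes(data)  # endpoint does: buffer.write(await file.read())
        saved.append(str(path))
    return saved
```

Note that in the endpoint itself, `await file.read()` loads the whole upload into memory; for very large files a chunked copy (e.g. `shutil.copyfileobj`) would be safer.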

@app.post('/create_ontology')
def create_ontology(sources: KGSources, config: OntologyConfig):
response = services.generate_ontology(sources, config)
return {'response': response}
def create_ontology(
sources: KGSources,
config: OntologyConfig,
):
try:
response = services.generate_ontology(sources, config)
return {'response': response}
except Exception as e:
print('ERROR MESSAGE:', e)


@app.post('/create_knowledgebase')
@@ -28,7 +52,7 @@ def create_knowledgebase(sources: KGSources, config: KnowledgeGraphConfig):
return {'response': response}


@app.post('/extend_knowledgebase')
@app.put('/extend_knowledgebase')
def extend_knowledgebase(sources: KGSources, config: KnowledgeGraphConfig):
response = services.extend_knowledge_graph(sources, config)
return {'response': response}
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -10,6 +10,9 @@ services:
- OPENAI_API_KEY=${OPENAI_API_KEY}
depends_on:
- graph_db
volumes:
- ./sources:/sources
- ./knowledgebase/ontologies:/knowledgebase/ontologies

graph_db:
image: falkordb/falkordb:edge
4 changes: 4 additions & 0 deletions requirements.txt
@@ -5,3 +5,7 @@ langchain_community==0.3.8
graphrag_sdk[all]==0.3.3
falkordb==1.0.10
python-dotenv==1.0.1
docx2txt==0.8
unstructured==0.16.8
python-magic==0.4.27
python-pptx==1.0.2
60 changes: 39 additions & 21 deletions src/data_loader.py
@@ -1,30 +1,48 @@
from graphrag_sdk.source import URL
from graphrag_sdk.source import URL, PDF, TEXT
from langchain_community.document_loaders import Docx2txtLoader, UnstructuredPowerPointLoader
from src.utils import save_txt


class DataLoader:

def load(self, sources):
return [self._load_source(source) for source in sources.paths]

def _load_source(self, source):
if source.endswith(".pdf"):
return self._load_pdf(source)
elif source.endswith(".docx"):
return self._load_docx(source)
elif source.startswith("http"):
print('url', source)
return self._load_url(source)
def _load_source(self, source_path):

print('Loading... ', source_path)
if source_path.endswith(".pdf"):
return self._load_pdf(source_path)
elif source_path.endswith(".docx"):
return self._load_docx(source_path)
elif source_path.endswith(".pptx"):
return self._load_pptx(source_path)
elif source_path.startswith("http"):
return self._load_url(source_path)
else:
raise ValueError(f"Unsupported source type: {source}")

def _load_pdf(self, source):
# Logic to load PDF
pass

def _load_docx(self, source):
# Logic to load DOCX
pass

def _load_url(self, source):
return URL(source)
return ''
# raise ValueError(f"Unsupported source type: {source}")

@staticmethod
def _load_pdf(source_path):
return PDF(source_path)

@staticmethod
def _load_docx(source_path):
loader = Docx2txtLoader(source_path)
data = loader.load()
new_path = save_txt(source_path=source_path, data=data)
return TEXT(new_path)
# return Document(content=data[0].page_content)

@staticmethod
def _load_pptx(source_path):
loader = UnstructuredPowerPointLoader(source_path)
data = loader.load()
new_path = save_txt(source_path=source_path, data=data)
return TEXT(new_path)

@staticmethod
def _load_url(source_path):
print(type(URL(source_path)))
return URL(source_path)
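The dispatch in `_load_source` keys on a URL prefix and the file extension. That rule can be isolated as a small pure function for testing; this is a sketch mirroring the branches above, not part of the repo:

```python
from pathlib import Path


def source_kind(source_path: str) -> str:
    """Classify a source the same way DataLoader._load_source branches."""
    if source_path.startswith("http"):
        return "url"
    kinds = {".pdf": "pdf", ".docx": "docx", ".pptx": "pptx"}
    return kinds.get(Path(source_path).suffix.lower(), "unsupported")


print(source_kind("slides.pptx"))  # pptx
```

Unlike the committed code, which silently returns `''` for unsupported sources (the `raise` is commented out), this sketch makes the fall-through case explicit.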
1 change: 1 addition & 0 deletions src/knowledge_graph_service.py
@@ -18,6 +18,7 @@ def __init__(self, ontology, config: KnowledgeGraphConfig):

def create(self, sources):
if self.kb is None:
print('creating KG')
model = OpenAiGenerativeModel(model_name=self.config.model_name)
self.kg = KnowledgeGraph(
name=self.config.name,
1 change: 1 addition & 0 deletions src/ontology_service.py
@@ -15,6 +15,7 @@ def __init__(self):
self.ontology = None

def create(self, sources, config: OntologyConfig):
print('creating ontology')
model = OpenAiGenerativeModel(model_name=config.model_name)
self.ontology = Ontology.from_sources(
sources=sources,
3 changes: 2 additions & 1 deletion src/schema.py
@@ -1,5 +1,6 @@
from pydantic import BaseModel
from typing import List
from fastapi import UploadFile


class ChatRequest(BaseModel):
@@ -14,7 +15,7 @@ class KGSources(BaseModel):
class OntologyConfig(BaseModel):

name: str = "movies-6"
path: str = "./knowledgebase/ontologies/"
path: str = "/knowledgebase/ontologies/"
model_name: str = "gpt-4o-mini"
ontology_prompt: str = """
Extract only the most relevant information about all the movies, actors, and directors over the text.
1 change: 0 additions & 1 deletion src/services.py
@@ -1,7 +1,6 @@
import os
import sys
from pathlib import Path

sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from src.ontology_service import OntologyService
19 changes: 19 additions & 0 deletions src/utils.py
@@ -0,0 +1,19 @@
from fastapi import UploadFile, File


def upload_file(sources_directory: str, file: UploadFile = File(...)):
sources_directory.mkdir(parents=True, exist_ok=True)

file_path = sources_directory / file.filename
with open(file_path, "wb") as buffer:
buffer.write(file.read())

return {"filename": file.filename, "filepath": str(file_path)}


def save_txt(source_path, data):
content = '\n\n'.join(page.page_content for page in data)
new_path = '.'.join(source_path.split('.')[:-1]) + '.txt'
with open(new_path, 'w', encoding='utf-8') as f:
f.write(content)
return new_path
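`save_txt` joins the `page_content` of every loaded page and writes the result next to the source file with a `.txt` extension. A quick usage sketch with stand-in page objects (the `Page` class is illustrative; the real objects come from the LangChain loaders):

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class Page:
    page_content: str


def save_txt(source_path, data):
    # Same logic as src/utils.py above.
    content = '\n\n'.join(page.page_content for page in data)
    new_path = '.'.join(source_path.split('.')[:-1]) + '.txt'
    with open(new_path, 'w', encoding='utf-8') as f:
        f.write(content)
    return new_path


with tempfile.TemporaryDirectory() as d:
    src = str(Path(d) / "deck.pptx")
    out = save_txt(src, [Page("slide one"), Page("slide two")])
    print(Path(out).read_text(encoding="utf-8"))  # pages joined by a blank line
```

One caveat: a source path with no extension would yield an empty output path here; in practice the loaders only hand this helper `.docx`/`.pptx` files, so that case doesn't arise.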
