Merge pull request #2 from fatemenajafi135/develop
Develop
Showing 10 changed files with 172 additions and 39 deletions.
@@ -1,17 +1,80 @@
 # GraphRAG
 An implementation of GraphRAG using graphrag-sdk, falkordb, langchain, ...

+## Project Overview
+This project is a robust service for managing, generating, and interacting with Knowledge Graphs (KG) using the **GraphRAG-SDK**, **FalkorDB**, and **LangChain**. The service provides a flexible and scalable platform for handling different file types, creating ontologies, generating KGs, extending them, and interacting with them via chat.

-## Running Manually
+Key features of this project include:
+- **GraphRAG-SDK**: Used for managing the creation and processing of Knowledge Graphs.
+- **FalkorDB**: Stores and evolves the generated KGs in a highly efficient and scalable manner.
+- **LangChain**: Powers the conversational chat interface, enabling interactions with the knowledge stored in the KG.
+- **FastAPI**: The service is built as a FastAPI application to provide a fast and efficient API layer for interacting with the system.
+- **Dockerized with Docker Compose**: The entire service is containerized for easy deployment, with CI/CD pipelines via GitHub Actions for continuous integration and delivery.

-- DB:
-```shell
-docker pull falkordb/falkordb:edge
-docker run -p 6379:6379 -p 3000:3000 -it --rm -v ./knowledgebase:/knowledgebase falkordb/falkordb:edge
-```
+## Features
+- **Ontology Creation**: Define and create an ontology to structure the data for the KG.
+- **Knowledge Graph Generation**: Generate a KG based on a provided ontology and various data sources (PDF, Word, PowerPoint).
+- **KG Extension**: Add new data and extend the existing KG.
+- **Chat Interface**: Interact with the generated KG via a conversational chat powered by LangChain.
+- **File Upload**: Upload files for future processing and integration.

+## How to Run the Service

-- FASTAPI
-```shell
-fastapi run app.py
-```

+### Prerequisites
+- Python 3.11
+- Docker and Docker Compose installed
+- FastAPI and other Python dependencies

+### Setup Instructions

+1. **Clone the repository**:
+```bash
+git clone https://github.com/fatemenajafi135/GraphRAG.git
+cd GraphRAG
+```

+2. Build and run the Docker containers: The project uses Docker Compose to set up the necessary services.
+```shell
+docker-compose up --build
+```
+This will:

+- Build the FastAPI service container.
+- Set up the FalkorDB container for storing the Knowledge Graph.
+- Ensure that the CI/CD pipeline for GitHub Actions is ready for automatic deployment.

+3. Access the FastAPI documentation: Once the services are running, you can access the FastAPI documentation at:
+```shell
+http://localhost:8000/docs
+```

+4. CI/CD Pipeline (GitHub Actions): The repository is integrated with GitHub Actions for CI/CD automation. It automatically builds, tests, and deploys containers upon any changes to the codebase. All relevant configuration files for GitHub Actions can be found in the `.github/workflows/` directory.

+5. Available APIs
+The service exposes the following APIs for interacting with the Knowledge Graph:

+- POST /ontology/create: Create a new ontology based on the provided sources.
+- POST /kg/create: Generate a new Knowledge Graph based on the defined ontology.
+- PUT /kg/extend: Extend an existing Knowledge Graph by adding new data.
+- POST /kg/upload-files: Upload files (URL, PDF, Word, PowerPoint) for future processing and integration.
+- POST /chat: Chat interface to interact with the Knowledge Graph via LangChain.

+## Technologies Used and Why

+- **GraphRAG-SDK**:
+  - Framework for generating and managing Knowledge Graphs.
+  - Chosen for its ability to efficiently manage the process of generating and processing complex Knowledge Graphs.
+- **FalkorDB**:
+  - A scalable, production-grade graph database to store the Knowledge Graph.
+  - Ideal for storing graph-based data, providing fast querying and scalability.
+- **LangChain**:
+  - Conversational AI chain for querying and interacting with the Knowledge Graph.
+  - A powerful tool for building conversational AI chains, perfect for creating chat interfaces with knowledge graphs.
+- **FastAPI**:
+  - A modern web framework for creating APIs.
+  - Known for its high performance and easy integration with modern Python-based APIs.
+- **Docker & Docker Compose**:
+  - Containerization and orchestration for the entire service.
+  - Ensures consistency in deployment across environments and simplifies managing dependencies.
+- **GitHub Actions**:
+  - CI/CD for automated testing and deployment.
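The endpoint list in the README can be exercised from a small client. The sketch below only builds the requests and does not send them. The base URL is taken from the documentation address given in the README; the `ENDPOINTS` table, the `build_request` helper, and the payload field name are hypothetical, since the request models are not part of this diff. The upload endpoint is listed for completeness only: because it takes files, it would likely need multipart form data rather than a JSON body.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # host and port from the README's docs URL

# Method and path pairs as listed under "Available APIs" in the README.
ENDPOINTS = {
    "create_ontology": ("POST", "/ontology/create"),
    "create_kg": ("POST", "/kg/create"),
    "extend_kg": ("PUT", "/kg/extend"),
    "upload_files": ("POST", "/kg/upload-files"),
    "chat": ("POST", "/chat"),
}


def build_request(name: str, payload: dict) -> Request:
    """Build (but do not send) a JSON request for one of the listed endpoints."""
    method, path = ENDPOINTS[name]
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical payload: the real field names are defined by the service's
# request models, which this commit does not show.
req = build_request("chat", {"message": "What entities are in the graph?"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it once the containers are running; the interactive documentation at `/docs` is the authoritative source for the actual request bodies.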
@@ -1,30 +1,48 @@
-from graphrag_sdk.source import URL
+from graphrag_sdk.source import URL, PDF, TEXT
+from langchain_community.document_loaders import Docx2txtLoader, UnstructuredPowerPointLoader
+from src.utils import save_txt


 class DataLoader:

     def load(self, sources):
         return [self._load_source(source) for source in sources.paths]

-    def _load_source(self, source):
-        if source.endswith(".pdf"):
-            return self._load_pdf(source)
-        elif source.endswith(".docx"):
-            return self._load_docx(source)
-        elif source.startswith("http"):
-            print('url', source)
-            return self._load_url(source)
+    def _load_source(self, source_path):
+
+        print('Loading... ', source_path)
+        if source_path.endswith(".pdf"):
+            return self._load_pdf(source_path)
+        elif source_path.endswith(".docx"):
+            return self._load_docx(source_path)
+        elif source_path.endswith(".pptx"):
+            return self._load_pptx(source_path)
+        elif source_path.startswith("http"):
+            return self._load_url(source_path)
         else:
-            raise ValueError(f"Unsupported source type: {source}")
-
-    def _load_pdf(self, source):
-        # Logic to load PDF
-        pass
-
-    def _load_docx(self, source):
-        # Logic to load DOCX
-        pass
-
-    def _load_url(self, source):
-        return URL(source)
+            return ''
+            # raise ValueError(f"Unsupported source type: {source}")

+    @staticmethod
+    def _load_pdf(source_path):
+        return PDF(source_path)
+
+    @staticmethod
+    def _load_docx(source_path):
+        loader = Docx2txtLoader(source_path)
+        data = loader.load()
+        new_path = save_txt(source_path=source_path, data=data)
+        return TEXT(new_path)
+        # return Document(content=data[0].page_content)
+
+    @staticmethod
+    def _load_pptx(source_path):
+        loader = UnstructuredPowerPointLoader(source_path)
+        data = loader.load()
+        new_path = save_txt(source_path=source_path, data=data)
+        return TEXT(new_path)
+
+    @staticmethod
+    def _load_url(source_path):
+        print(type(URL(source_path)))
+        return URL(source_path)
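Two details of the new `_load_source` may be worth a second look. The `.pdf` check runs before the `http` check, so a URL that ends in `.pdf` is routed to `_load_pdf`, and unsupported inputs now return an empty string instead of raising. Whether either behaviour is intended is not clear from the diff. The sketch below shows one alternative dispatch order, kept independent of `graphrag_sdk` so it runs on its own. `classify_source` and `SUFFIX_TO_LOADER` are hypothetical names, and the string results stand in for the corresponding `_load_*` calls.

```python
from urllib.parse import urlparse

# Maps a lowercase file suffix to a loader name. The names mirror the
# DataLoader methods in the diff, but this table itself is hypothetical.
SUFFIX_TO_LOADER = {".pdf": "pdf", ".docx": "docx", ".pptx": "pptx"}


def classify_source(source_path: str) -> str:
    """Return which loader a source should go to, or raise if unsupported."""
    # Check for URLs first so that "https://example.com/paper.pdf" is
    # treated as a URL rather than as a local PDF path.
    if urlparse(source_path).scheme in ("http", "https"):
        return "url"
    # Compare case-insensitively so "REPORT.PDF" is accepted as well.
    lowered = source_path.lower()
    for suffix, loader in SUFFIX_TO_LOADER.items():
        if lowered.endswith(suffix):
            return loader
    # Fail loudly instead of returning '' so a bad path is noticed early.
    raise ValueError(f"Unsupported source type: {source_path}")


print(classify_source("slides.PPTX"))
print(classify_source("https://example.com/paper.pdf"))
```

Raising here means one bad path aborts the whole `load` call; if skipping unsupported files is the desired behaviour, filtering them out in `load` would avoid passing empty strings downstream.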
@@ -0,0 +1,19 @@
+from fastapi import UploadFile, File
+
+
+def upload_file(sources_directory: str, file: UploadFile = File(...)):
+    sources_directory.mkdir(parents=True, exist_ok=True)
+
+    file_path = sources_directory / file.filename
+    with open(file_path, "wb") as buffer:
+        buffer.write(file.read())
+
+    return {"filename": file.filename, "filepath": str(file_path)}
+
+
+def save_txt(source_path, data):
+    content = '\n\n'.join(page.page_content for page in data)
+    new_path = '.'.join(source_path.split('.')[:-1]) + '.txt'
+    with open(new_path, 'w', encoding='utf-8') as f:
+        f.write(content)
+    return new_path
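A few points in these helpers look fragile. `upload_file` annotates `sources_directory` as `str` but calls `.mkdir()` and `/` on it, which only works when the caller passes a `pathlib.Path`. `file.read()` on FastAPI's `UploadFile` is a coroutine, so in a non-async function `buffer.write(file.read())` receives a coroutine object rather than bytes. `save_txt` derives the new name by splitting on `.`, which gives `.txt` for a path such as `./knowledgebase/notes` that has no extension. The sketch below is one possible revision, not the author's code. It drops the `File(...)` default so it runs without FastAPI installed, and assumes only that `file` exposes `.filename` and `.file` as `UploadFile` does. The filename sanitising step is an addition that is not in the commit.

```python
import shutil
from pathlib import Path
from types import SimpleNamespace


def upload_file(sources_directory, file):
    """Save an uploaded file under sources_directory and report where it went."""
    # Coerce to Path so both str and Path arguments support mkdir() and "/".
    directory = Path(sources_directory)
    directory.mkdir(parents=True, exist_ok=True)

    # Keep only the final path component so a client-supplied name such as
    # "../notes.docx" cannot escape the target directory.
    file_path = directory / Path(file.filename).name
    with open(file_path, "wb") as buffer:
        # Copy from the underlying binary file object; UploadFile.read() is a
        # coroutine and would need `await` inside an async function.
        shutil.copyfileobj(file.file, buffer)

    return {"filename": file_path.name, "filepath": str(file_path)}


def save_txt(source_path, data):
    """Write the pages' text next to the source file, with a .txt suffix."""
    content = "\n\n".join(page.page_content for page in data)
    # with_suffix swaps only the final extension, so dots in directory
    # names (for example "./knowledgebase/notes") are left alone.
    new_path = Path(source_path).with_suffix(".txt")
    new_path.write_text(content, encoding="utf-8")
    return str(new_path)


# Stand-in for the loader output: any objects with a page_content attribute.
pages = [SimpleNamespace(page_content="first"), SimpleNamespace(page_content="second")]
```

Calling `save_txt("deck.pptx", pages)` would write `deck.txt` in the current directory. Returning a `str` keeps the result compatible with the existing `TEXT(new_path)` calls in the loader.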