
Flare AI RAG

Flare AI Kit template for Retrieval-Augmented Generation (RAG).

🚀 Key Features

  • Modular Architecture: Designed with independent components that can be easily extended.
  • Qdrant-Powered Retrieval: Leverages Qdrant for fast, semantic document retrieval, but can easily be adapted to other vector databases.
  • Highly Configurable & Extensible: Uses a straightforward configuration system, enabling effortless integration of new features and services.
  • Unified LLM Integration: Uses Gemini as the primary provider while remaining compatible with OpenRouter for access to a broader range of models.

🎯 Getting Started

Prerequisites

Before getting started, ensure you have:

  • Docker installed and running
  • uv for managing the Python backend dependencies
  • Node.js and npm for the chat-ui frontend
  • A Gemini API key
  • The gcloud CLI, if you plan to deploy on a TEE

Build & Run Instructions

You can deploy Flare AI RAG using Docker or set up the backend and frontend manually.

Environment Setup

  1. Prepare the Environment File: Rename .env.example to .env and update the variables accordingly (e.g., your Gemini API key).
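
    A minimal .env might look like the following. GEMINI_API_KEY is the only variable this README confirms elsewhere; treat .env.example as the authoritative list:

    GEMINI_API_KEY=your-gemini-api-key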

Build using Docker (Recommended)

  1. Build the Docker Image:

    docker build -t flare-ai-rag .
  2. Run the Docker Container:

    docker run -p 80:80 -it --env-file .env flare-ai-rag
  3. Access the Frontend: Open your browser and navigate to http://localhost:80 to interact with the Chat UI.

🛠 Build Manually

Flare AI RAG is composed of a Python-based backend and a JavaScript frontend. Follow these steps for manual setup:

Backend Setup

  1. Install Dependencies: Use uv to install backend dependencies:

    uv sync --all-extras
  2. Set up a Qdrant Service: Make sure that Qdrant is up and running before starting the backend (a quick connectivity check follows these steps). You can start a Qdrant instance using Docker:

    docker run -p 6333:6333 qdrant/qdrant
  3. Start the Backend: The backend runs by default on 0.0.0.0:8080:

    uv run start-backend
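
Before starting the backend, you can sanity-check the Qdrant connection with a short script. This is a minimal sketch, assuming the default port mapping from the Docker command above and that the qdrant-client package is installed in your environment:

    # check_qdrant.py: minimal connectivity check (illustrative)
    from qdrant_client import QdrantClient

    client = QdrantClient(host="localhost", port=6333)
    print(client.get_collections())  # prints the (possibly empty) list of collections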

Frontend Setup

  1. Install Dependencies: In the chat-ui/ directory, install the required packages using npm:

    cd chat-ui/
    npm install
  2. Configure the Frontend: Update the backend URL in chat-ui/src/App.js for testing:

    const BACKEND_ROUTE = "http://localhost:8080/api/routes/chat/";

    Note: Remember to change BACKEND_ROUTE back to 'api/routes/chat/' after testing.

  3. Start the Frontend:

    npm start

📁 Repo Structure

src/flare_ai_rag/
├── ai/                       # AI provider implementations
│   ├── base.py               # Abstract base classes
│   ├── gemini.py             # Google Gemini integration
│   ├── model.py              # Model definitions
│   └── openrouter.py         # OpenRouter integration
├── api/                      # API layer
│   ├── middleware/           # Request/response middleware
│   └── routes/               # API endpoint definitions
├── attestation/              # TEE security layer
│   ├── simulated_token.txt
│   ├── vtpm_attestation.py   # vTPM client
│   └── vtpm_validation.py    # Token validation
├── responder/                # Response generation
│   ├── base.py               # Base responder interface
│   ├── config.py             # Response configuration
│   ├── prompts.py            # System prompts
│   └── responder.py          # Main responder logic
├── retriever/                # Document retrieval
│   ├── base.py               # Base retriever interface
│   ├── config.py             # Retriever configuration
│   ├── qdrant_collection.py  # Qdrant collection management
│   └── qdrant_retriever.py   # Qdrant implementation
├── router/                   # Query routing
│   ├── base.py               # Base router interface
│   ├── config.py             # Router configuration
│   ├── prompts.py            # Router prompts
│   └── router.py             # Main routing logic
├── utils/                    # Utility functions
│   ├── file_utils.py         # File operations
│   └── parser_utils.py       # Input parsing
├── input_parameters.json     # Configuration parameters
├── main.py                   # Application entry point
├── query.txt                 # Sample queries
└── settings.py               # Environment settings
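
At a high level, these components compose into a standard RAG pipeline: the router classifies the incoming query, the retriever performs semantic search against Qdrant, and the responder generates the final answer. The sketch below illustrates that flow only; the names are simplified, and the actual interfaces live in the base.py files listed above:

    # Illustrative composition of the RAG pipeline (not the repo's real APIs)
    def answer(query: str, router, retriever, responder) -> str:
        if router.route(query) == "rag":           # classify the incoming query
            docs = retriever.retrieve(query)       # semantic search over the Qdrant collection
            return responder.respond(query, docs)  # answer grounded in the retrieved documents
        return responder.respond(query, [])        # non-RAG queries get a plain completion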

🚀 Deploy on TEE

Deploy on a Confidential Space using AMD SEV.

Prerequisites

  • The gcloud CLI installed and authenticated
  • Access to the verifiable-ai-hackathon GCP project
  • Your container image pushed to a registry (e.g., ghcr.io)

Environment Configuration

  1. Set Environment Variables: Update your .env file with:

    TEE_IMAGE_REFERENCE=ghcr.io/flare-foundation/flare-ai-rag:main  # Replace with your repo build image
    INSTANCE_NAME=<PROJECT_NAME-TEAM_NAME>
  2. Load Environment Variables:

    source .env

    Reminder: Run the above command in every new shell session or after modifying .env. On Windows, we recommend using Git Bash to access commands like source.

  3. Verify the Setup:

    echo $TEE_IMAGE_REFERENCE # Expected output: Your repo build image

Deploying to Confidential Space

Run the following command:

gcloud compute instances create $INSTANCE_NAME \
  --project=verifiable-ai-hackathon \
  --zone=us-east5-b \
  --machine-type=n2d-standard-2 \
  --network-interface=network-tier=PREMIUM,nic-type=GVNIC,stack-type=IPV4_ONLY,subnet=default \
  --metadata=tee-image-reference=$TEE_IMAGE_REFERENCE,\
tee-container-log-redirect=true,\
tee-env-GEMINI_API_KEY=$GEMINI_API_KEY \
  --maintenance-policy=MIGRATE \
  --provisioning-model=STANDARD \
  --service-account=confidential-sa@verifiable-ai-hackathon.iam.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --min-cpu-platform="AMD Milan" \
  --tags=flare-ai,http-server,https-server \
  --create-disk=auto-delete=yes,\
boot=yes,\
device-name=$INSTANCE_NAME,\
image=projects/confidential-space-images/global/images/confidential-space-debug-250100,\
mode=rw,\
size=11,\
type=pd-standard \
  --shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring \
  --reservation-affinity=any \
  --confidential-compute-type=SEV

Post-deployment

  1. After deployment, you should see an output similar to:

    NAME          ZONE           MACHINE_TYPE    PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
    rag-team1     us-east5-b     n2d-standard-2               10.128.0.18  34.41.127.200  RUNNING
    
  2. It may take a few minutes for Confidential Space to complete startup checks. You can monitor progress via the GCP Console logs: click on Compute Engine → VM Instances (in the sidebar), select your instance, then open Serial port 1 (console).

    When you see a message like:

    INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
    

    the container is ready. Navigate to the external IP of the instance (visible in the VM Instances page) to access the Chat UI.

🔧 Troubleshooting

If you encounter issues, follow these steps:

  1. Check Logs:

    gcloud compute instances get-serial-port-output $INSTANCE_NAME --project=verifiable-ai-hackathon
  2. Verify API Key(s): Ensure that all API Keys are set correctly (e.g. GEMINI_API_KEY).

  3. Check Firewall Settings: Confirm that your instance is publicly accessible on port 80.
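
If port 80 is blocked, a firewall rule targeting the http-server tag (which the deployment command above applies to the instance) can open it. The rule name below is an example; a matching rule may already exist in the project:

    gcloud compute firewall-rules create allow-http \
      --project=verifiable-ai-hackathon \
      --allow=tcp:80 \
      --target-tags=http-server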

💡 Next Steps

Design and implement a knowledge ingestion pipeline, with a demonstration interface showing practical applications for developers and users.

N.B. Other vector databases can be used, provided they run within the same Docker container as the RAG system, since the deployment will occur in a TEE.

  • Enhanced Data Ingestion & Indexing: Explore more sophisticated data structures for improved indexing and retrieval, and expand beyond a CSV format to include additional data sources (e.g., Flare's GitHub, blogs, documentation). BigQuery integration would be desirable.
  • Intelligent Query & Data Processing: Use recommended AI models to refine the data processing pipeline, including pre-processing steps that optimize and clean incoming data, ensuring higher-quality context retrieval (e.g., use an LLM to reformulate or expand user queries before passing them to the retriever, improving the precision and recall of the semantic search; see the sketch after this list).
  • Advanced Context Management: Develop an intelligent context management system that:
    • Implements Dynamic Relevance Scoring to rank documents by their contextual importance.
    • Optimizes the Context Window to balance the amount of information sent to LLMs.
    • Includes Source Verification Mechanisms to assess and validate the reliability of the data sources.
  • Improved Retrieval & Response Pipelines: Integrate hybrid search techniques (combining semantic and keyword-based methods) for better retrieval, and implement completion checks to verify that the responder's output is complete and accurate (potentially allow an iterative feedback loop for refining the final answer).
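
As an example of the query-processing idea above, an LLM can rewrite the user's query before it reaches the retriever. The sketch below is illustrative rather than part of this repo; it assumes the google-generativeai package and a GEMINI_API_KEY in the environment, and the model name is only an example:

    import os

    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")  # example model name

    def expand_query(query: str) -> str:
        """Rewrite a user query to improve semantic-search recall (illustrative)."""
        prompt = (
            "Rewrite this search query to be more specific, adding likely "
            f"synonyms. Return a single line only:\n\n{query}"
        )
        return model.generate_content(prompt).text.strip()

    # The expanded query is then embedded and sent to the retriever in place of
    # the raw user input.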
