Flare AI Kit template for Retrieval-Augmented Generation (RAG) Knowledge.
- Modular Architecture: Designed with independent components that can be easily extended.
- Qdrant-Powered Retrieval: Leverages Qdrant for fast, semantic document retrieval, but can easily be adapted to other vector databases.
- Highly Configurable & Extensible: Uses a straightforward configuration system, enabling effortless integration of new features and services.
- Unified LLM Integration: Uses Gemini as a unified provider while maintaining compatibility with OpenRouter for a broader range of models.
Before getting started, ensure you have:
- A Python 3.12 environment.
- uv installed for dependency management.
- Docker installed.
- A Gemini API key.
- Access to one of the Flare databases. (The Flare Developer Hub is included in CSV format for local testing.)
You can deploy Flare AI RAG using Docker or set up the backend and frontend manually.
- **Prepare the Environment File**: Rename `.env.example` to `.env` and update the variables accordingly (e.g., your Gemini API key).

- **Build the Docker Image**:

  ```bash
  docker build -t flare-ai-rag .
  ```

- **Run the Docker Container**:

  ```bash
  docker run -p 80:80 -it --env-file .env flare-ai-rag
  ```

- **Access the Frontend**: Open your browser and navigate to `http://localhost:80` to interact with the Chat UI.
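Once the container is running, you can also exercise the chat endpoint directly instead of going through the UI. The sketch below assumes the route is exposed at `/api/routes/chat/` (the same path used by the frontend configuration shown in the manual setup) and that it accepts a JSON body with a `message` field; check the route definitions under `src/flare_ai_rag/api/routes/` for the actual request schema.

```python
# Hedged smoke test against the running container. The payload shape
# ("message") is an assumption; verify it in src/flare_ai_rag/api/routes/.
import requests

CHAT_URL = "http://localhost:80/api/routes/chat/"

response = requests.post(
    CHAT_URL,
    json={"message": "What is the Flare Time Series Oracle?"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```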
Flare AI RAG is composed of a Python-based backend and a JavaScript frontend. Follow these steps for manual setup:
- **Install Backend Dependencies**: Use `uv` to install the backend dependencies:

  ```bash
  uv sync --all-extras
  ```

- **Set Up a Qdrant Service**: Make sure that Qdrant is up and running before starting the backend (a quick connectivity check is sketched after this list). You can quickly start a Qdrant instance using Docker:

  ```bash
  docker run -p 6333:6333 qdrant/qdrant
  ```

- **Start the Backend**: The backend runs by default on `0.0.0.0:8080`:

  ```bash
  uv run start-backend
  ```

- **Install Frontend Dependencies**: In the `chat-ui/` directory, install the required packages using npm:

  ```bash
  cd chat-ui/
  npm install
  ```

- **Configure the Frontend**: Update the backend URL in `chat-ui/src/App.js` for local testing:

  ```js
  const BACKEND_ROUTE = "http://localhost:8080/api/routes/chat/";
  ```

  **Note**: Remember to change `BACKEND_ROUTE` back to `'api/routes/chat/'` after testing.

- **Start the Frontend**:

  ```bash
  npm start
  ```
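Before starting the backend, it helps to confirm that Qdrant is actually reachable (the connectivity check referenced in the Qdrant step above). A minimal sketch, assuming the `qdrant-client` package that the Qdrant-powered retriever relies on is installed in your environment:

```python
# Minimal connectivity check against the local Qdrant instance started above.
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

# Lists existing collections; an empty list is expected on a fresh instance.
print(client.get_collections())
```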
```plaintext
src/flare_ai_rag/
├── ai/ # AI Provider implementations
│ ├── base.py # Abstract base classes
│ ├── gemini.py # Google Gemini integration
│ ├── model.py # Model definitions
│ └── openrouter.py # OpenRouter integration
├── api/ # API layer
│ ├── middleware/ # Request/response middleware
│ └── routes/ # API endpoint definitions
├── attestation/ # TEE security layer
│ ├── simulated_token.txt
│ ├── vtpm_attestation.py # vTPM client
│ └── vtpm_validation.py # Token validation
├── responder/ # Response generation
│ ├── base.py # Base responder interface
│ ├── config.py # Response configuration
│ ├── prompts.py # System prompts
│ └── responder.py # Main responder logic
├── retriever/ # Document retrieval
│ ├── base.py # Base retriever interface
│ ├── config.py # Retriever configuration
│ ├── qdrant_collection.py # Qdrant collection management
│ └── qdrant_retriever.py # Qdrant implementation
├── router/ # API routing
│ ├── base.py # Base router interface
│ ├── config.py # Router configuration
│ ├── prompts.py # Router prompts
│ └── router.py # Main routing logic
├── utils/ # Utility functions
│ ├── file_utils.py # File operations
│ └── parser_utils.py # Input parsing
├── input_parameters.json # Configuration parameters
├── main.py # Application entry point
├── query.txt # Sample queries
└── settings.py # Environment settings
```
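The `base.py` modules are what keep the components swappable. As an illustration of the pattern only, the sketch below shows how an alternative retriever backend could plug into a base interface; the real abstract class and method names live in `retriever/base.py` and may differ from the hypothetical ones used here.

```python
# Illustrative sketch only: the real interface is defined in
# src/flare_ai_rag/retriever/base.py and its names may differ.
from abc import ABC, abstractmethod


class BaseRetriever(ABC):
    """Hypothetical stand-in for the abstract retriever interface."""

    @abstractmethod
    def semantic_search(self, query: str, top_k: int = 5) -> list[dict]: ...


class InMemoryRetriever(BaseRetriever):
    """Toy retriever backed by a plain list, showing how a new backend plugs in."""

    def __init__(self, documents: list[dict]) -> None:
        self.documents = documents

    def semantic_search(self, query: str, top_k: int = 5) -> list[dict]:
        # Naive keyword overlap instead of vector similarity, kept short on purpose.
        def overlap(doc: dict) -> int:
            return len(set(query.lower().split()) & set(doc["text"].lower().split()))

        return sorted(self.documents, key=overlap, reverse=True)[:top_k]


if __name__ == "__main__":
    docs = [
        {"text": "The FTSO provides decentralized price feeds on Flare."},
        {"text": "FAssets bring non-smart-contract tokens to Flare."},
    ]
    print(InMemoryRetriever(docs).semantic_search("How does the FTSO work?", top_k=1))
```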
Deploy on a Confidential Space using AMD SEV.
- **Google Cloud Platform Account**: Access to the `verifiable-ai-hackathon` project is required.

- **Gemini API Key**: Ensure your Gemini API key is linked to the project.

- **gcloud CLI**: Install and authenticate the gcloud CLI.
- **Set Environment Variables**: Update your `.env` file with:

  ```bash
  TEE_IMAGE_REFERENCE=ghcr.io/flare-foundation/flare-ai-rag:main  # Replace with your repo build image
  INSTANCE_NAME=<PROJECT_NAME-TEAM_NAME>
  ```

- **Load Environment Variables**:

  ```bash
  source .env
  ```

  **Reminder**: Run the above command in every new shell session or after modifying `.env`. On Windows, we recommend using Git Bash to access commands like `source`.

- **Verify the Setup**:

  ```bash
  echo $TEE_IMAGE_REFERENCE  # Expected output: your repo build image
  ```
Run the following command:
```bash
gcloud compute instances create $INSTANCE_NAME \
  --project=verifiable-ai-hackathon \
  --zone=us-east5-b \
  --machine-type=n2d-standard-2 \
  --network-interface=network-tier=PREMIUM,nic-type=GVNIC,stack-type=IPV4_ONLY,subnet=default \
  --metadata=tee-image-reference=$TEE_IMAGE_REFERENCE,\
tee-container-log-redirect=true,\
tee-env-GEMINI_API_KEY=$GEMINI_API_KEY \
  --maintenance-policy=MIGRATE \
  --provisioning-model=STANDARD \
  --service-account=confidential-sa@verifiable-ai-hackathon.iam.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --min-cpu-platform="AMD Milan" \
  --tags=flare-ai,http-server,https-server \
  --create-disk=auto-delete=yes,\
boot=yes,\
device-name=$INSTANCE_NAME,\
image=projects/confidential-space-images/global/images/confidential-space-debug-250100,\
mode=rw,\
size=11,\
type=pd-standard \
  --shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring \
  --reservation-affinity=any \
  --confidential-compute-type=SEV
```
After deployment, you should see an output similar to:

```plaintext
NAME       ZONE           MACHINE_TYPE    PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
rag-team1  us-central1-b  n2d-standard-2               10.128.0.18  34.41.127.200  RUNNING
```
It may take a few minutes for Confidential Space to complete its startup checks. You can monitor progress via the GCP Console logs: click Compute Engine → VM Instances (in the sidebar) → select your instance → Serial port 1 (console).

When you see a message like:

```plaintext
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
```

the container is ready. Navigate to the external IP of the instance (visible on the VM Instances page) to access the Chat UI.
If you encounter issues, follow these steps:
- **Check Logs**:

  ```bash
  gcloud compute instances get-serial-port-output $INSTANCE_NAME --project=verifiable-ai-hackathon
  ```

- **Verify API Key(s)**: Ensure that all API keys are set correctly (e.g. `GEMINI_API_KEY`).

- **Check Firewall Settings**: Confirm that your instance is publicly accessible on port `80`.
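For the firewall check above, a quick way to confirm the instance answers on port 80 (using the example external IP from the sample output; substitute your own):

```python
# Quick reachability probe for the deployed instance on port 80.
import socket

EXTERNAL_IP = "34.41.127.200"  # replace with your instance's external IP

with socket.create_connection((EXTERNAL_IP, 80), timeout=10):
    print(f"Port 80 on {EXTERNAL_IP} is reachable")
```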
Design and implement a knowledge ingestion pipeline, with a demonstration interface showing practical applications for developers and users.
N.B. Other vector databases can be used, provided they run within the same Docker container as the RAG system, since the deployment will occur in a TEE.
- Enhanced Data Ingestion & Indexing: Explore more sophisticated data structures for improved indexing and retrieval, and expand beyond a CSV format to include additional data sources (e.g., Flare's GitHub, blogs, documentation). BigQuery integration would be desirable.
- Intelligent Query & Data Processing: Use recommended AI models to refine the data processing pipeline, including pre-processing steps that optimize and clean incoming data, ensuring higher-quality context retrieval. For example, use an LLM to reformulate or expand user queries before passing them to the retriever, improving the precision and recall of the semantic search (a minimal sketch follows this list).
- Advanced Context Management: Develop an intelligent context management system that:
- Implements Dynamic Relevance Scoring to rank documents by their contextual importance.
- Optimizes the Context Window to balance the amount of information sent to LLMs.
- Includes Source Verification Mechanisms to assess and validate the reliability of the data sources.
- Improved Retrieval & Response Pipelines: Integrate hybrid search techniques (combining semantic and keyword-based methods) for better retrieval, and implement completion checks to verify that the responder's output is complete and accurate (potentially allow an iterative feedback loop for refining the final answer).
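As a concrete starting point for the query-processing idea above, the sketch below reformulates a user query with Gemini before embedding it and searching Qdrant. The model names and the `flare_docs` collection name are assumptions; align them with `input_parameters.json` and the existing retriever configuration.

```python
# Sketch of LLM-based query expansion ahead of semantic search.
# Model names and the collection name are assumptions; adjust to your setup.
import os

import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key=os.environ["GEMINI_API_KEY"])


def expand_query(query: str) -> str:
    """Ask Gemini to rewrite the query so it is explicit and self-contained."""
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = (
        "Rewrite the following question about the Flare network so that it is "
        f"explicit and self-contained, in one sentence:\n{query}"
    )
    return model.generate_content(prompt).text.strip()


def retrieve(query: str, top_k: int = 5):
    """Embed the expanded query and search the Qdrant collection."""
    expanded = expand_query(query)
    embedding = genai.embed_content(
        model="models/text-embedding-004", content=expanded
    )["embedding"]
    client = QdrantClient(host="localhost", port=6333)
    return client.search(
        collection_name="flare_docs",  # hypothetical name; match your config
        query_vector=embedding,
        limit=top_k,
    )


if __name__ == "__main__":
    for hit in retrieve("How do I read FTSO prices?"):
        print(hit.score, hit.payload)
```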