Apex is an API-first, self-hostable Retrieval-Augmented Generation platform focused on grounded answers, clear system boundaries, and production-grade traceability. It ingests documents, builds hybrid retrieval indexes over Postgres and Qdrant, and exposes the workflow through an HTTP API plus a CLI.
Today, the open-source project covers the core RAG path end to end:
- document ingest for PDF, Markdown, and plain text
- chunking with configurable strategies
- dense, sparse, and hybrid retrieval
- chat responses with citations
- authentication (API key + OIDC), role-based access control, and rate limiting
- multi-tenant HTTP and CLI workflows
The current platform is also hardened in the areas that matter operationally:
- protected routes now fail closed by default
- /ingest rejects invalid or out-of-bounds filesystem paths as 400 requests
- search keeps client-actionable collection errors visible while sanitizing real backend faults
- secrets in rag-core config are redacted from debug output
- agent-core validates tool references and rejects ambiguous checkpoint configs at load time
- the workspace is aligned on Rust 2024 formatting and denies unsafe in the core library crates
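One common way to keep secrets out of debug output is a wrapper type with a hand-written `Debug` impl. The sketch below illustrates that general pattern only. `Secret` and `ProviderConfig` are hypothetical names and are not taken from rag-core.

```rust
use std::fmt;

/// Wrapper that keeps a secret usable in code but hides it from `{:?}` output.
/// Hypothetical type for illustration; not the rag-core implementation.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// Explicit accessor, so every read of the raw value is easy to find.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the inner value, regardless of its content.
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical config struct; the derived Debug delegates to Secret's impl.
#[derive(Debug)]
pub struct ProviderConfig {
    pub model: String,
    pub api_key: Secret,
}
```

Because the redaction lives in the type, any struct that embeds a `Secret` and derives `Debug` gets the protection without extra code at each logging site.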
Apex is the open-source migration path for Apex Accelerator, my commercial offering. That migration is still in progress.
The current open-source release spans the Phase 1 RAG foundation and the emerging Phase 2 agent workflow layer. Later phases bring over the observability, provenance, and compliance capabilities that matter in enterprise deployments.
- Phase 1: Open RAG foundation. Ingest, retrieval, chat, API, CLI, and the storage/runtime model needed for a solid self-hosted RAG stack.
- Phase 2: Agent workflows. Agent definitions, execution flows, tools, runs, and evidence-oriented orchestration on top of the RAG substrate.
- Phase 3: Observability and provenance. OpenTelemetry instrumentation, richer provenance capture, and better auditability for where answers came from and how they were produced.
- Phase 4: EU AI Act alignment. Compliance-facing controls, documentation, and governance features aimed at real-world high-assurance deployments.
Some commercial-only pieces remain out of scope for the open-source project or will land in different form, including the license server, binary obfuscation, and certain cloud-native infrastructure for authentication and persistence.
- Rust 1.94+ (edition 2024)
- Docker (for Postgres + Qdrant)
- just task runner
# 1. Clone and enter the repo
git clone https://github.com/HendrikReh/apex.git
cd apex
# 2. Start Postgres + Qdrant
just up
# 3. Create a .env with your database URL (matches docker-compose defaults)
echo 'DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5432/postgres' > .env
# 4. Run the server with mock embeddings for local development
just run-server-mock
# 5. In another terminal, install the CLI and try it out
cargo install --path crates/rag-cli
# Ingest some files
rag-cli ingest ./path/to/docs --collection my-docs
# Search
rag-cli search --query "how does authentication work?" --collection my-docs
# Collection statistics
rag-cli collection-stats --collection my-docs
just run-server-mock is useful for local ingestion and retrieval development because it removes the embedding dependency. Chat still needs a configured LLM provider.
Use just run-server-mock when you want a fast local loop for ingest and retrieval work without configuring embeddings. This is the easiest way to validate API, chunking, ingest, and search behavior on a fresh checkout.
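To make the idea of a mock embedder concrete, here is a minimal sketch of how a deterministic stand-in could work: hash each token into a fixed-size vector and normalize it. This is an assumption about the general technique, not a description of what the mock server profile actually does. `mock_embed` is a hypothetical name.

```rust
/// Deterministic stand-in for an embedding model: hashes each whitespace
/// token into one of `dim` buckets and L2-normalizes the bucket counts.
/// Illustrative only; not Apex's mock implementation.
pub fn mock_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dim must be positive");
    let mut v = vec![0.0f32; dim];
    for token in text.split_whitespace() {
        // FNV-1a hash over the lowercased token bytes.
        let mut h: u64 = 0xcbf29ce484222325;
        for b in token.to_lowercase().bytes() {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
        v[(h % dim as u64) as usize] += 1.0;
    }
    // Normalize so cosine similarity reduces to a dot product.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Vectors like these carry no semantic meaning, which is why a mock of this kind is only suitable for exercising the ingest and search plumbing, not for judging retrieval quality.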
For grounded chat responses and production-quality retrieval, add your API key and run the default server profile:
# Add your API key to .env
echo 'OPENAI_API_KEY=sk-...' >> .env
# Run with OpenAI-compatible LLM + embeddings
just run-server
# Then ask a question through the CLI
rag-cli chat --query "Summarize the main concepts" --collection my-docs
In real deployments, prefer provider-backed embeddings plus explicit filesystem boundaries for server-side ingestion.
- Rust end to end for the server, client, and core retrieval pipeline.
- Self-hostable architecture with explicit storage boundaries: Postgres for metadata, Qdrant for vectors.
- Grounded responses with retrieval-backed citations instead of opaque chat completions.
- Multi-tenant by design through consistent tenant handling across API and CLI paths.
- Built for migration to higher-assurance workflows such as provenance, observability, agents, and compliance controls.
rag-cli ──→ rag-client ──→ HTTP ──→ rag-server ──→ rag-core ──→ rag-chunking
                                        │
                                        ├──→ agent-core (graph-flow DAG runner)
                                        │
                                   ┌────┴──────┐
                                   │ Middleware│
                                   │ stack:    │
                                   │ req-id    │
                                   │ tenant    │
                                   │ auth      │
                                   │ rate-limit│
                                   │ authz     │
                                   └─────┬─────┘
                                         │
                                  ┌──────┴──────┐
                                  │             │
                              Postgres       Qdrant
                             (metadata)     (vectors)
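The middleware order in the diagram (req-id, tenant, auth, rate-limit, authz) can be read as a chain in which each layer either passes the request on or stops it. The sketch below models that chain with plain functions to show the fail-closed behavior. It is not the Axum code in rag-server. The `Request` fields, the key names, and the limit of 100 are all hypothetical.

```rust
/// Why a request was stopped. Comments give the usual HTTP status mapping.
#[derive(Debug, PartialEq)]
pub enum Rejection {
    Unauthenticated, // 401
    RateLimited,     // 429
    Forbidden,       // 403
}

/// Hypothetical, simplified request context.
#[derive(Default)]
pub struct Request {
    pub request_id: Option<String>,
    pub tenant: Option<String>,
    pub api_key: Option<String>,
    pub role: Option<String>,
    pub requests_in_window: u32,
    pub needs_admin: bool,
}

type Layer = fn(Request) -> Result<Request, Rejection>;

fn req_id(mut r: Request) -> Result<Request, Rejection> {
    if r.request_id.is_none() {
        r.request_id = Some("generated-id".to_string());
    }
    Ok(r)
}

fn tenant(mut r: Request) -> Result<Request, Rejection> {
    if r.tenant.is_none() {
        r.tenant = Some("default".to_string());
    }
    Ok(r)
}

fn auth(mut r: Request) -> Result<Request, Rejection> {
    // Fail closed: anything other than a known key is rejected.
    let role = match r.api_key.as_deref() {
        Some("reader-key") => "reader",
        Some("admin-key") => "admin",
        _ => return Err(Rejection::Unauthenticated),
    };
    r.role = Some(role.to_string());
    Ok(r)
}

fn rate_limit(r: Request) -> Result<Request, Rejection> {
    // Hypothetical fixed budget of 100 requests per window.
    if r.requests_in_window >= 100 {
        Err(Rejection::RateLimited)
    } else {
        Ok(r)
    }
}

fn authz(r: Request) -> Result<Request, Rejection> {
    if r.needs_admin && r.role.as_deref() != Some("admin") {
        return Err(Rejection::Forbidden);
    }
    Ok(r)
}

/// Run the layers in diagram order; the first rejection short-circuits.
pub fn run_stack(req: Request) -> Result<Request, Rejection> {
    let layers: [Layer; 5] = [req_id, tenant, auth, rate_limit, authz];
    let mut current = req;
    for layer in layers.iter() {
        current = layer(current)?;
    }
    Ok(current)
}
```

Placing rate limiting after auth means the limiter can key on an authenticated identity, and placing authz last means role checks only run for requests that have already passed the cheaper gates.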
| Crate | Role |
|---|---|
| rag-core | Extraction, embedding, retrieval, context assembly, stores, config |
| rag-chunking | 9 token-aware text chunking strategies (pure, no IO) |
| agent-core | Graph-flow DAG runner for agent orchestration (YAML-driven specs, checkpoints) |
| rag-server | Axum HTTP API: auth, RBAC, rate limiting, multi-tenant middleware |
| rag-client | Typed HTTP client for the server API |
| rag-cli | CLI for ingest, search, chat, stats, and API key generation |
| test-support | Ephemeral Axum test servers for integration tests |
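As an illustration of what a pure, IO-free chunking strategy can look like, here is a minimal sliding-window chunker with overlap. It is a sketch under stated assumptions: whitespace-separated words stand in for model tokens, and `chunk_with_overlap` is a hypothetical name, not one of the nine strategies in rag-chunking.

```rust
/// Split `text` into chunks of at most `max_tokens` whitespace tokens,
/// repeating the last `overlap` tokens of each chunk at the start of the next.
/// Whitespace tokens are a stand-in for real tokenizer output.
pub fn chunk_with_overlap(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_tokens > 0 && overlap < max_tokens,
        "need max_tokens > 0 and overlap < max_tokens"
    );
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + max_tokens).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break; // the final window already reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_with_overlap("a b c d e", 3, 1)` yields the two chunks `"a b c"` and `"c d e"`, with `c` shared so that context spanning the boundary is not lost.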
The server exposes a JSON API on http://localhost:8080 by default. Interactive documentation is served at /swagger-ui/ when the server is running.
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| /health | GET | No | Liveness probe |
| /readiness | GET | No | Readiness probe (checks Postgres + Qdrant) |
| /openapi.json | GET | No | OpenAPI 3.1 specification |
| /swagger-ui/ | GET | No | Interactive API documentation |
| /ingest | POST | Yes | Ingest documents from filesystem paths |
| /ingest/upload | POST | Yes | Upload and ingest a file (multipart, 50 MB) |
| /search/dense | POST | Yes | Dense vector search |
| /search/sparse | POST | Yes | BM25 sparse search |
| /search/hybrid | POST | Yes | Hybrid search with RRF fusion |
| /chat | POST | Yes | RAG chat with citations |
| /collections/:name/stats | GET | Yes | Collection statistics |
| /auth/service-accounts | POST | Yes | Create service account |
| /auth/api-keys | POST | Yes | Generate API key |
| /auth/api-keys | GET | Yes | List API keys |
| /auth/api-keys/:id | DELETE | Yes | Revoke API key |
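The hybrid endpoint is described as using RRF (Reciprocal Rank Fusion). As a reminder of how that technique combines a dense and a sparse ranking, here is a small standalone sketch of the standard formula, where each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name and the choice of `k = 60` in the usage note are assumptions; this is not the rag-core implementation.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of ranked id lists.
/// Each list is ordered best-first; ranks are 1-based.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Sort by descending score; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap()
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

With a dense ranking `[a, b, c]`, a sparse ranking `[b, d, a]`, and `k = 60.0`, the fused order is `b, a, d, c`: documents that appear in both lists rise above those found by only one retriever, without any need to calibrate the two score scales against each other.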
Multi-tenancy is built in. Pass the x-tenant header to isolate data per tenant; when the header is absent, the tenant defaults to "default".
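The header-with-default behavior described above can be sketched as follows. This is a simplified model, not the rag-server middleware: headers are a plain map keyed by lowercase name, treating a blank header value as absent is an assumption, and `scoped_collection` is a hypothetical illustration of per-tenant namespacing rather than the scheme Apex actually uses.

```rust
use std::collections::HashMap;

pub const TENANT_HEADER: &str = "x-tenant";
pub const DEFAULT_TENANT: &str = "default";

/// Resolve the tenant for a request. `headers` is keyed by lowercase name.
/// Assumption: a missing or blank header falls back to the default tenant.
pub fn resolve_tenant(headers: &HashMap<String, String>) -> String {
    headers
        .get(TENANT_HEADER)
        .map(|v| v.trim())
        .filter(|v| !v.is_empty())
        .unwrap_or(DEFAULT_TENANT)
        .to_string()
}

/// Hypothetical namespacing: prefix every collection with its tenant so two
/// tenants can use the same collection name without sharing data.
pub fn scoped_collection(tenant: &str, collection: &str) -> String {
    format!("{}__{}", tenant, collection)
}
```

Resolving the tenant once, early in the request path, and threading it through every storage call is what keeps isolation consistent between API and CLI workflows.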
| Source | Purpose | Example |
|---|---|---|
| config/app.toml | Non-secret settings | Chunking params, endpoints, model names |
| .env | Secrets and local overrides | OPENAI_API_KEY, DATABASE_URL |
| Environment variables | Overrides | Take precedence over both files |
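The precedence in the table can be modeled as a lookup across three layers. The sketch below assumes that .env values override config/app.toml (the table calls .env "local overrides") and that process environment variables override both. The `resolve` function and the flat string-keyed maps are simplifications for illustration, not the rag-core config loader.

```rust
use std::collections::HashMap;

type Settings = HashMap<String, String>;

/// Look up `key` in priority order: process environment, then .env,
/// then config/app.toml. Returns the first value found.
pub fn resolve<'a>(
    key: &str,
    env: &'a Settings,
    dotenv: &'a Settings,
    app_toml: &'a Settings,
) -> Option<&'a str> {
    env.get(key)
        .or_else(|| dotenv.get(key))
        .or_else(|| app_toml.get(key))
        .map(String::as_str)
}
```

A lookup like this makes it easy to reason about surprises: if a value in config/app.toml seems to be ignored, the first thing to check is whether the same key is set in .env or exported in the shell.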
Two configuration boundaries are worth calling out explicitly:
- ingest_allowed_roots in config/app.toml controls which server-visible filesystem paths /ingest may access. Requests outside those roots are rejected with 400.
- /ingest/upload is the safer choice when the client should send file content directly instead of referencing a path on the server host.
In practice:
# config/app.toml
ingest_allowed_roots = [
    "/srv/apex/import",
    "/var/lib/apex/dropbox",
]
Keep secrets such as OPENAI_API_KEY and DATABASE_URL in .env or environment variables rather than in config/app.toml.
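To show what an allowed-roots check for /ingest involves, here is a minimal sketch of a purely lexical path test. It rejects relative paths and any `..` component, then requires the path to sit under one of the configured roots. The function names are hypothetical and this is not the rag-server code. A real implementation would likely also canonicalize the path to resolve symlinks, which this sketch deliberately skips so that it needs no filesystem access. It assumes Unix-style absolute paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize an absolute path. Returns None for relative paths
/// and for any path containing `..`, so traversal is rejected outright.
fn normalize(path: &Path) -> Option<PathBuf> {
    if !path.is_absolute() {
        return None;
    }
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::ParentDir => return None,
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    Some(out)
}

/// True only if `requested` lies under at least one allowed root.
/// `Path::starts_with` compares whole components, so "/srv/apex/import2"
/// does not match the root "/srv/apex/import".
pub fn is_allowed(requested: &str, allowed_roots: &[&str]) -> bool {
    match normalize(Path::new(requested)) {
        Some(path) => allowed_roots.iter().any(|root| path.starts_with(root)),
        None => false,
    }
}
```

With the roots from the config example, `/srv/apex/import/docs/a.pdf` passes, while `/srv/apex/import/../secrets`, `/srv/apex/import2/a.pdf`, and `/etc/passwd` are all refused, which is the point at which a server would answer with 400.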
just test # fmt + clippy + cargo test
just unit-test # fast unit-only loop
just integration-test # Docker-backed integration suites
just smoke-test # optional provider/native smoke suites
just fmt # Check formatting
just clippy # Lint with strict settings
just up # Start docker services
just down-v # Stop services and wipe data
See docs/howto/testing.md for the full testing workflow, where to place tests, prerequisites, and concrete examples.