AI-Powered Knowledge Platform. Secure. Multi-Tenant. Scalable. Hybrid RAG.
NEXUS RAG is a production-grade Retrieval-Augmented Generation (RAG) platform designed for secure enterprise knowledge management. It enables organizations to ingest vast amounts of data and query it using state-of-the-art LLMs, with strict data isolation between teams.
It combines a high-performance Hybrid Search pipeline (Dense + Sparse) with robust enterprise security (RBAC, Audit Logging, SSO).
- Framework: Next.js 16 (App Router, Server Actions)
- Language: TypeScript 5+
- Styling: Tailwind CSS v4, Custom Glassmorphism System
- State: React Hooks, Context API
- Core: Python 3.11+, FastAPI
- Orchestration: LlamaIndex / LangChain
- Vector Database: Qdrant (Containerized)
- Database: PostgreSQL (via Neon Serverless)
- ORM: Prisma (Multi-tenant Schema)
- Caching & Rate Limiting: Upstash Redis
- Authentication: NextAuth.js v5 (Google OAuth 2.0, JWT Strategy)
- Deployment: Docker Compose (Local), Vercel (Frontend), Railway/AWS (Backend)
| Feature | Description |
|---|---|
| Hybrid Search | Dense (BGE-Small) + Sparse (BM42) vectors for semantic + keyword matching. |
| HyDE | Hypothetical Document Embeddings: the LLM generates a hypothetical answer, which is embedded in place of the raw query for better retrieval. |
| Parent-Child Indexing | Search on small chunks (children), retrieve full context (parents). |
| FlashRank Reranking | CPU-optimized reranking to order results by relevance. |
| Semantic Router | Zero-shot classification to decide if a query needs RAG or just Chat. |
| Inline Citations | Responses include clickable [Source: file.pdf] references. |
| VRAM Protection | Embeddings & Reranking run on CPU; only the LLM uses GPU. |
| RAGAS Evaluation | Built-in pipeline to test Faithfulness & Answer Relevancy. |
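The HyDE step in the table above can be sketched in a few lines. This is an illustration, not the project's actual code; `llm` and `embed` are stand-ins for the real model calls:

```python
# Minimal HyDE sketch (function names are hypothetical).
# Instead of embedding the raw query, we embed a hypothetical answer
# generated by the LLM, which tends to land closer to real documents
# in embedding space than a short question does.

def hyde_query(query: str, llm, embed) -> list[float]:
    # 1. Ask the LLM to "hallucinate" a plausible answer.
    hypothetical = llm(f"Write a short passage answering: {query}")
    # 2. Embed the hypothetical passage, not the original query.
    return embed(hypothetical)
```

The resulting vector is then used for the dense half of the hybrid search.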
| Feature | Description |
|---|---|
| Multi-Tenancy | Logic-enforced isolation ensures Team A cannot access Team B's data |
| RBAC System | System Admin (Platform control), Team Owner (Manage members), Member (Read/Write) |
| Admin Console | Dedicated dashboard (/admin) for User management, Team creation, and Analytics |
| Smart Onboarding | Multi-step wizard with "Magic Auto-Join" based on email domain (@jwtl.in) |
| Audit Logging | Comprehensive tracking of all critical actions (Signups, Data Access, Settings) |
| Rate Limiting | Token-bucket limiting using Upstash Redis to prevent abuse (50 req/hour) |
| API Keys | Scoped API key management for programmatic access (sk_...) |
| Mobile-First UX | Fully responsive design with swipeable drawers, 100dvh viewport fixes, and touch targets |
| Real-Time Presence | Live "online users" indicators per team (Heartbeat mechanism) |
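The token-bucket rate limiting in the table above works roughly like this. The sketch below is an in-memory stand-in (the real implementation lives in Upstash Redis); the class and parameter names are ours:

```python
import time

# Illustrative token bucket: capacity 50 refilled over an hour,
# matching the "50 req/hour" limit described above.
class TokenBucket:
    def __init__(self, capacity: int = 50, refill_period: float = 3600.0):
        self.capacity = capacity
        self.refill_rate = capacity / refill_period  # tokens per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token per request
            return True
        return False
```

A burst can drain the bucket immediately, after which requests are rejected until tokens trickle back in.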
```mermaid
graph TD
    User[User] -->|HTTPS| Frontend["Next.js 16 Frontend"]
    Frontend -->|NextAuth| Auth["Google OAuth 2.0"]
    Frontend -->|API Request| Backend["FastAPI Backend"]
    subgraph "Backend Services"
        Backend -->|Auth Check| API_Secret["Internal API Secret"]
        Backend -->|Route| Router{Semantic Router}
        Router -->|Chat| LLM["Qwen 2.5 3B (GPU)"]
        Router -->|RAG| HybridParams["Contextualize + HyDE"]
        HybridParams -->|Search| Qdrant[(Qdrant Vector DB)]
        Qdrant -->|Retrieve| Reranker["FlashRank (CPU)"]
        Reranker -->|Context| LLM
    end
    subgraph "Data Layer"
        Qdrant <-->|Vectors| Ingestion["Ingestion Service"]
        Frontend -->|Session| Redis[(Redis Cache)]
        Backend -->|Metadata| PG[(PostgreSQL)]
    end
```
```
NEXUS RAG/
├── backend/                  # Python FastAPI Backend
│   ├── src/
│   │   ├── api.py            # FastAPI endpoints
│   │   ├── services/
│   │   │   ├── query.py      # Main RAG pipeline
│   │   │   ├── ingestion.py  # Document ingestion
│   │   │   ├── llm.py        # Qwen LLM wrapper
│   │   │   ├── auth.py       # [NEW] Auth & User Context
│   │   │   ├── router.py     # Semantic classifier
│   │   │   └── guardrails.py # Security guardrails
│   │   ├── core/
│   │   │   └── config.py     # Settings & prompts
│   │   └── db/
│   │       ├── vector_store.py # Qdrant connection
│   │       ├── postgres.py     # PostgreSQL storage
│   │       └── redis_cache.py  # Redis caching
│   ├── evaluate_ragas.py     # RAGAS evaluation script
│   ├── debug_search.py       # Retrieval diagnostic
│   ├── SLM/                  # Local LLM model files
│   └── docker-compose.yml    # Qdrant + Redis containers
│
└── frontend-new/             # Next.js 16 UI
    ├── src/
    │   ├── app/
    │   │   ├── admin/        # [NEW] Admin Dashboard
    │   │   ├── onboarding/   # [NEW] User Onboarding Flow
    │   │   ├── profile/      # [NEW] User Profile
    │   │   └── page.tsx      # Main chat interface
    │   ├── components/       # Reusable UI (Sidebar, UserMenu)
    │   ├── lib/api/          # API client
    │   └── hooks/            # Custom Hooks (useChatStream, usePresence)
    └── tailwind.config.ts    # Design System & Breakpoints
```

Purpose: Classify queries as "chat" (general conversation) or "rag" (document retrieval).
- Method: `route(query)` returns "chat" or "rag" based on cosine similarity.
- Threshold: similarity > 0.75 routes to Chat (pre-computed embeddings for "hi", "hello", etc.).
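The routing decision can be sketched as follows. The 0.75 threshold and the greeting examples come from the text above; the embedding vectors here are stand-ins for the real pre-computed ones:

```python
import math

THRESHOLD = 0.75  # from the router's documented behavior

def cosine(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query_vec: list[float], chat_vecs: list[list[float]]) -> str:
    # Compare the query embedding against every pre-computed
    # chat example; route to plain chat only on a strong match.
    best = max(cosine(query_vec, v) for v in chat_vecs)
    return "chat" if best > THRESHOLD else "rag"
```

Anything below the threshold falls through to the full RAG pipeline, which is the safer default.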
Purpose: The main RAG pipeline orchestrator.
- Configuration:
  - Dense: `BAAI/bge-small-en-v1.5`
  - Sparse: `Qdrant/bm42-all-minilm-l6-v2-attentions`
  - Rerank: Top 3 from `FlashRank`
- Pipeline: Contextualize → HyDE → Hybrid Search → Fetch Parents → Rerank → Generate.
- System Prompt Rules: answer directly, quote text, cite claims as `[Source: filename]`, no outside knowledge.
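The pipeline order can be expressed as a simple composition. This skeleton is illustrative: each step is a placeholder callable you would wire to the real services, and the parameter names are ours:

```python
# Skeleton of the query pipeline order described above.
def answer(query, *, contextualize, hyde, hybrid_search,
           fetch_parents, rerank, generate):
    q = contextualize(query)           # resolve pronouns against chat history
    probe = hyde(q)                    # hypothetical answer used for retrieval
    children = hybrid_search(probe)    # dense + sparse search in Qdrant
    parents = fetch_parents(children)  # swap small chunks for full context
    context = rerank(q, parents)[:3]   # keep the top 3 after reranking
    return generate(q, context)        # LLM answers with citations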
Purpose: Process documents into searchable chunks (Parent-Child Indexing).
- Process: Load → Chunk (2000-char parents, 400-char children) → Embed (dual: dense + sparse) → Index → Store parents in PG/Redis.
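A minimal version of the parent-child chunking step, using the sizes from the text (the helper name and fixed-width splitting are illustrative; real chunkers usually respect sentence boundaries):

```python
# Illustrative parent-child chunker: 2000-char parents, 400-char children.
def chunk(text: str, parent_size: int = 2000, child_size: int = 400):
    index = []
    for p_start in range(0, len(text), parent_size):
        parent = text[p_start:p_start + parent_size]
        # Children are what gets embedded and searched;
        # the parent is what gets returned as context.
        children = [parent[c:c + child_size]
                    for c in range(0, len(parent), child_size)]
        index.append({
            "parent_id": p_start // parent_size,
            "parent": parent,
            "children": children,
        })
    return index
```

Search hits on a child resolve to its `parent_id`, and the full parent text is fetched from PG/Redis for the LLM.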
Purpose: Custom LlamaIndex wrapper for Qwen 2.5 3B running on GPU.
- Stack: `llama-cpp-python` with CUDA.
- Context: 4096 tokens. Configured as a singleton.
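One common way to enforce the singleton is to override `__new__`. This is a sketch of the pattern, not the project's actual wrapper (the real class loads the model via `llama-cpp-python`; loading is stubbed here):

```python
# Singleton sketch: the multi-GB model is loaded at most once per process.
class QwenLLM:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # Real code would load the GGUF model here (expensive).
            cls._instance.context_window = 4096
        return cls._instance
```

Every construction site then shares the same loaded model, which keeps GPU memory usage flat.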
Purpose: Security layer.
- Protections: Prompt Injection, Jailbreaking, Harmful Content, Output Filtering.
- Identity: Enforces consistent responses to "who are you?".
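One of these protections, prompt-injection screening, often starts with simple pattern matching before any model-based check. The patterns below are examples of the idea, not the project's actual rules:

```python
import re

# Naive illustration of one guardrail check (patterns are examples).
INJECTION_PATTERNS = [
    r"ignore (all |previous |your )*instructions",
    r"you are now",
    r"system prompt",
]

def looks_like_injection(text: str) -> bool:
    # Case-insensitive scan for known injection phrasings.
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

Flagged inputs can be refused outright or answered with the enforced identity response.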
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest/` | Upload and index documents |
| POST | `/query/` | Query with RAG (non-streaming) |
| POST | `/query/stream` | Query with streaming response |
| GET | `/teams/` | List all teams/collections |
| DELETE | `/teams/{team}` | Delete team collection & wipe vectors |
| POST | `/admin/teams` | [NEW] Create Team with Auto-Admin |
| GET | `/admin/users` | [NEW] List/Manage Users |
- Metrics: Faithfulness (0.64), Answer Relevancy (0.80)
- Run: `python evaluate_ragas.py --team engineering`
- Node.js 18+ (Running Next.js 16)
- Python 3.11+
- Docker & Docker Compose
- PostgreSQL Database (Local or Neon)
Copy the example environment files to get started quickly:
Frontend:

```bash
cp frontend-new/.env.example frontend-new/.env.local
```

Edit `.env.local` to add your keys (Google OAuth, etc.).

Backend:

```bash
cp backend/.env.example backend/.env
```

Edit `.env` to add your database and LLM paths.
```bash
# Frontend
cd frontend-new
npm install
npx prisma generate
npx prisma db push   # Sync schema

# Backend
cd backend
pip install -r requirements.txt
```

- Infrastructure: `docker-compose up -d` (starts Qdrant & Redis)
- Backend: `uvicorn src.api:app --reload --host 0.0.0.0 --port 8000`
- Frontend: `npm run dev` (runs on localhost:3000)
- Sign in with Google.
- The first user is automatically promoted to System Admin.
- Navigate to `/admin` to create teams and invite users.
- Subsequent users with matching domains (`@jwtl.in`) will auto-join their existing teams.
Diagnostic Tools:
- `python debug_search.py`: inspect the retrieved chunks for a query.
- `python evaluate_ragas.py`: run evaluation benchmarks.
Data Flow: User Query → Semantic Router → (Chat / RAG) → HyDE → Hybrid Search (Qdrant) → Rerank (FlashRank) → LLM (Qwen) → Stream.
