AI-Powered Knowledge Platform. Secure. Multi-Tenant. Scalable. Hybrid RAG.
NEXUS RAG is a production-grade Retrieval-Augmented Generation (RAG) platform designed for secure enterprise knowledge management. It enables organizations to ingest vast amounts of data and query it using state-of-the-art LLMs, with strict data isolation between teams.
It combines a high-performance Hybrid Search pipeline (Dense + Sparse) with robust enterprise security (RBAC, Audit Logging, SSO).
- Framework: Next.js 16 (App Router, Server Actions)
- Language: TypeScript 5+
- Styling: Tailwind CSS v4, Custom Glassmorphism System
- State: React Hooks, Context API
- Core: Python 3.11+, FastAPI
- Orchestration: LlamaIndex / LangChain
- Vector Database: Qdrant (Containerized)
- Database: PostgreSQL (via Neon Serverless)
- ORM: Prisma (Multi-tenant Schema)
- Caching & Rate Limiting: Upstash Redis
- Authentication: NextAuth.js v5 (Google OAuth 2.0, JWT Strategy)
- Deployment: Docker Compose (Local), Vercel (Frontend), Railway/AWS (Backend)
| Feature | Description |
|---|---|
| Hybrid Search | Dense (BGE-Small) + Sparse (BM42) vectors for semantic + keyword matching. |
| HyDE | Hypothetical Document Embeddings: the LLM generates a hypothetical answer, which is embedded in place of the raw query for better retrieval. |
| Parent-Child Indexing | Search on small chunks (children), retrieve full context (parents). |
| FlashRank Reranking | CPU-optimized reranking to order results by relevance. |
| Semantic Router | Zero-shot classification to decide if a query needs RAG or just Chat. |
| Inline Citations | Responses include clickable [Source: file.pdf] references. |
| VRAM Protection | Embeddings & Reranking run on CPU; only the LLM uses GPU. |
| RAGAS Evaluation | Built-in pipeline to test Faithfulness & Answer Relevancy. |
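The HyDE step in the table above can be sketched in a few lines. This is an illustration, not the project's actual code; `llm` and `embed` are stand-ins for the real model calls:

```python
# Minimal HyDE sketch (function names are hypothetical).
# Instead of embedding the raw query, we embed a hypothetical answer
# generated by the LLM, which tends to land closer to real documents
# in embedding space than a short question does.

def hyde_query(query: str, llm, embed) -> list[float]:
    # 1. Ask the LLM to "hallucinate" a plausible answer.
    hypothetical = llm(f"Write a short passage answering: {query}")
    # 2. Embed the hypothetical passage, not the original query.
    return embed(hypothetical)
```

The resulting vector is then used for the dense half of the hybrid search.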
| Feature | Description |
|---|---|
| Multi-Tenancy | Logic-enforced isolation ensures Team A cannot access Team B's data |
| RBAC System | System Admin (Platform control), Team Owner (Manage members), Member (Read/Write) |
| Admin Console | Dedicated dashboard (/admin) for User management, Team creation, and Analytics |
| Smart Onboarding | Multi-step wizard with "Magic Auto-Join" based on email domain (@jwtl.in) |
| Audit Logging | Comprehensive tracking of all critical actions (Signups, Data Access, Settings) |
| Rate Limiting | Token-bucket limiting using Upstash Redis to prevent abuse (50 req/hour) |
| API Keys | Scoped API key management for programmatic access (sk_...) |
| Mobile-First UX | Fully responsive design with swipeable drawers, 100dvh viewport fixes, and touch targets |
| Real-Time Presence | Live "online users" indicators per team (Heartbeat mechanism) |
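The token-bucket rate limiting in the table above works roughly like this. The sketch below is an in-memory stand-in (the real implementation lives in Upstash Redis); the class and parameter names are ours:

```python
import time

# Illustrative token bucket: capacity 50 refilled over an hour,
# matching the "50 req/hour" limit described above.
class TokenBucket:
    def __init__(self, capacity: int = 50, refill_period: float = 3600.0):
        self.capacity = capacity
        self.refill_rate = capacity / refill_period  # tokens per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token per request
            return True
        return False
```

A burst can drain the bucket immediately, after which requests are rejected until tokens trickle back in.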
```mermaid
graph TD
    User[User] -->|HTTPS| Frontend["Next.js 16 Frontend"]
    Frontend -->|NextAuth| Auth["Google OAuth 2.0"]
    Frontend -->|API Request| Backend["FastAPI Backend"]
    subgraph "Backend Services"
        Backend -->|Auth Check| API_Secret["Internal API Secret"]
        Backend -->|Route| Router{Semantic Router}
        Router -->|Chat| LLM["Qwen 2.5 3B (GPU)"]
        Router -->|RAG| HybridParams["Contextualize + HyDE"]
        HybridParams -->|Search| Qdrant[(Qdrant Vector DB)]
        Qdrant -->|Retrieve| Reranker["FlashRank (CPU)"]
        Reranker -->|Context| LLM
    end
    subgraph "Data Layer"
        Qdrant <-->|Vectors| Ingestion["Ingestion Service"]
        Frontend -->|Session| Redis[(Redis Cache)]
        Backend -->|Metadata| PG[(PostgreSQL)]
    end
```
```
NEXUS RAG/
├── backend/                  # Python FastAPI Backend
│   ├── src/
│   │   ├── api.py            # FastAPI endpoints
│   │   ├── services/
│   │   │   ├── query.py      # Main RAG pipeline
│   │   │   ├── ingestion.py  # Document ingestion
│   │   │   ├── llm.py        # Qwen LLM wrapper
│   │   │   ├── auth.py       # [NEW] Auth & User Context
│   │   │   ├── router.py     # Semantic classifier
│   │   │   └── guardrails.py # Security guardrails
│   │   ├── core/
│   │   │   └── config.py     # Settings & prompts
│   │   └── db/
│   │       ├── vector_store.py # Qdrant connection
│   │       ├── postgres.py     # PostgreSQL storage
│   │       └── redis_cache.py  # Redis caching
│   ├── evaluate_ragas.py     # RAGAS evaluation script
│   ├── debug_search.py       # Retrieval diagnostic
│   ├── SLM/                  # Local LLM model files
│   └── docker-compose.yml    # Qdrant + Redis containers
│
└── frontend-new/             # Next.js 16 UI
    ├── src/
    │   ├── app/
    │   │   ├── admin/        # [NEW] Admin Dashboard
    │   │   ├── onboarding/   # [NEW] User Onboarding Flow
    │   │   ├── profile/      # [NEW] User Profile
    │   │   └── page.tsx      # Main chat interface
    │   ├── components/       # Reusable UI (Sidebar, UserMenu)
    │   ├── lib/api/          # API client
    │   └── hooks/            # Custom Hooks (useChatStream, usePresence)
    └── tailwind.config.ts    # Design System & Breakpoints
```

Purpose: Classify queries as "chat" (general conversation) or "rag" (document retrieval).
- Method: `route(query)` returns "chat" or "rag" based on cosine similarity.
- Threshold: similarity > 0.75 routes to Chat (pre-computed embeddings for "hi", "hello", etc.).
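The routing decision can be sketched as follows. The 0.75 threshold and the greeting examples come from the text above; the embedding vectors here are stand-ins for the real pre-computed ones:

```python
import math

THRESHOLD = 0.75  # from the router's documented behavior

def cosine(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query_vec: list[float], chat_vecs: list[list[float]]) -> str:
    # Compare the query embedding against every pre-computed
    # chat example; route to plain chat only on a strong match.
    best = max(cosine(query_vec, v) for v in chat_vecs)
    return "chat" if best > THRESHOLD else "rag"
```

Anything below the threshold falls through to the full RAG pipeline, which is the safer default.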
Purpose: The main RAG pipeline orchestrator.
- Configuration:
  - Dense: `BAAI/bge-small-en-v1.5`
  - Sparse: `Qdrant/bm42-all-minilm-l6-v2-attentions`
  - Rerank: Top 3 from `FlashRank`
- Pipeline: Contextualize → HyDE → Hybrid Search → Fetch Parents → Rerank → Generate.
- System Prompt Rules: answer directly, quote text, cite claims as `[Source: filename]`, no outside knowledge.
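The pipeline order can be expressed as a simple composition. This skeleton is illustrative: each step is a placeholder callable you would wire to the real services, and the parameter names are ours:

```python
# Skeleton of the query pipeline order described above.
def answer(query, *, contextualize, hyde, hybrid_search,
           fetch_parents, rerank, generate):
    q = contextualize(query)           # resolve pronouns against chat history
    probe = hyde(q)                    # hypothetical answer used for retrieval
    children = hybrid_search(probe)    # dense + sparse search in Qdrant
    parents = fetch_parents(children)  # swap small chunks for full context
    context = rerank(q, parents)[:3]   # keep the top 3 after reranking
    return generate(q, context)        # LLM answers with citations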
Purpose: Process documents into searchable chunks (Parent-Child Indexing).
- Process: Load → Chunk (2000-char parents, 400-char children) → Embed (dual: dense + sparse) → Index → Store parents in PG/Redis.
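A minimal version of the parent-child chunking step, using the sizes from the text (the helper name and fixed-width splitting are illustrative; real chunkers usually respect sentence boundaries):

```python
# Illustrative parent-child chunker: 2000-char parents, 400-char children.
def chunk(text: str, parent_size: int = 2000, child_size: int = 400):
    index = []
    for p_start in range(0, len(text), parent_size):
        parent = text[p_start:p_start + parent_size]
        # Children are what gets embedded and searched;
        # the parent is what gets returned as context.
        children = [parent[c:c + child_size]
                    for c in range(0, len(parent), child_size)]
        index.append({
            "parent_id": p_start // parent_size,
            "parent": parent,
            "children": children,
        })
    return index
```

Search hits on a child resolve to its `parent_id`, and the full parent text is fetched from PG/Redis for the LLM.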
Purpose: Custom LlamaIndex wrapper for Qwen 2.5 3B running on GPU.
- Stack: `llama-cpp-python` with CUDA.
- Context: 4096 tokens. Configured as a singleton.
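One common way to enforce the singleton is to override `__new__`. This is a sketch of the pattern, not the project's actual wrapper (the real class loads the model via `llama-cpp-python`; loading is stubbed here):

```python
# Singleton sketch: the multi-GB model is loaded at most once per process.
class QwenLLM:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # Real code would load the GGUF model here (expensive).
            cls._instance.context_window = 4096
        return cls._instance
```

Every construction site then shares the same loaded model, which keeps GPU memory usage flat.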
Purpose: Security layer.
- Protections: Prompt Injection, Jailbreaking, Harmful Content, Output Filtering.
- Identity: Enforces consistent responses to "who are you?".
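One of these protections, prompt-injection screening, often starts with simple pattern matching before any model-based check. The patterns below are examples of the idea, not the project's actual rules:

```python
import re

# Naive illustration of one guardrail check (patterns are examples).
INJECTION_PATTERNS = [
    r"ignore (all |previous |your )*instructions",
    r"you are now",
    r"system prompt",
]

def looks_like_injection(text: str) -> bool:
    # Case-insensitive scan for known injection phrasings.
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

Flagged inputs can be refused outright or answered with the enforced identity response.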
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest/` | Upload and index documents |
| POST | `/query/` | Query with RAG (non-streaming) |
| POST | `/query/stream` | Query with streaming response |
| GET | `/teams/` | List all teams/collections |
| DELETE | `/teams/{team}` | Delete team collection & wipe vectors |
| POST | `/admin/teams` | [NEW] Create Team with Auto-Admin |
| GET | `/admin/users` | [NEW] List/Manage Users |
- Metrics: Faithfulness (0.64), Answer Relevancy (0.80)
- Run: `python evaluate_ragas.py --team engineering`
- Node.js 18+ (Running Next.js 16)
- Python 3.11+
- Docker & Docker Compose
- PostgreSQL Database (Local or Neon)
Copy the example environment files to get started quickly:
Frontend:

```bash
cp frontend-new/.env.example frontend-new/.env.local
```

Edit `.env.local` to add your keys (Google OAuth, etc.).

Backend:

```bash
cp backend/.env.example backend/.env
```

Edit `.env` to add your database and LLM paths.
```bash
# Frontend
cd frontend-new
npm install
npx prisma generate
npx prisma db push   # Sync schema

# Backend
cd backend
pip install -r requirements.txt
```

- Infrastructure: `docker-compose up -d` (starts Qdrant & Redis)
- Backend: `uvicorn src.api:app --reload --host 0.0.0.0 --port 8000`
- Frontend: `npm run dev` (runs on localhost:3000)
- Sign in with Google.
- The first user is automatically promoted to System Admin.
- Navigate to `/admin` to create teams and invite users.
- Subsequent users with matching domains (`@jwtl.in`) will auto-join their existing teams.
Diagnostic Tools:
- `python debug_search.py`: inspect the retrieved chunks for a query.
- `python evaluate_ragas.py`: run evaluation benchmarks.
Data Flow: User Query → Semantic Router → (Chat / RAG) → HyDE → Hybrid Search (Qdrant) → Rerank (FlashRank) → LLM (Qwen) → Stream.
