"Scale by Subtraction" β The smartest systems aren't the ones that compute the mostβthey're the ones that know when NOT to compute.
A comprehensive guide to modern agentic system design principles and patterns. Production-tested architectural patterns that challenge conventional wisdom about AI agents.
- Overview
- Why This Matters
- Core Concepts
- Architecture Overview
- Quick Start
- Benefits
- Examples
- Contributing
- Philosophy
This repository documents revolutionary architectural patterns for building production-grade AI agent systems. These patterns challenge conventional wisdom and provide practical, battle-tested approaches to creating reliable, cost-effective, and scalable agentic applications.
| Traditional Approach | Agentic Architecture |
|---|---|
| β LLM for every request | β 90% lookup, 10% reasoning |
| β Detect hallucinations after | β Prevent hallucinations structurally |
| β Agents chat with each other | β Silent swarms with structured data |
| β Static knowledge bases | β Self-healing recursive ontologies |
| β Add more features | β Scale by subtraction |
Results achieved:
- π 10-100x performance improvement
- π° 90%+ cost reduction
- π‘οΈ 0% policy violations (vs 26.67% for prompt-based safety)
- π 92.4% code verification accuracy
Why "Thinking" is a Technical Debt.
Engineers are falling into the Inference Trap: throwing massive reasoning models at problems that are actually just retrieval problems. This document explores:
- The misconception that AI and Search are independent
- Why reasoning must have a "reason" (compute and latency costs)
- The Scale by Subtraction philosophy (removing capabilities)
- The missing component: The Guardrail Router
- The target ratio: 80-90% Lookup, 10-20% Reasoning
Key Insight: If your agent is "thinking" for every request, you haven't built an agent; you've built a philosophy major. In production, we need engineers, not philosophers.
The Decision Module That Prevents the Inference Trap.
The Guardrail Router is a critical component that sits before your AI system and decides: "Does this actually require reasoning?" This document covers:
- Request classification without expensive processing
- Constraint enforcement to maintain healthy ratios
- Smart routing between lookup and reasoning paths
- Metrics tracking and optimization
- Real-world implementation patterns
Key Insight: The smartest systems aren't the ones that compute the mostβthey're the ones that know when NOT to compute.
Why 90% of your agent's work should be "dumb" lookup, not "smart" reasoning.
Modern agentic systems achieve optimal performance by prioritizing fast, reliable lookups over expensive LLM computation. This document explores:
- The 90/10 rule for lookup vs. computation
- Performance and cost benefits
- Implementation strategies (caching, knowledge graphs, semantic indexing)
- Real-world examples with 10x performance improvements
- Metrics to track and optimize
Key Insight: The smartest agents aren't the ones that think the hardestβthey're the ones that know where to look.
Beyond Flat Context: Scale by Subtraction Using Graph Constraints.
Context is not just a pile of documents in a Vector Database. RAG is flatβit finds similar words but doesn't understand the structure of reality. This document covers:
- The problem with flat context (RAG limitations)
- The graph as a semantic firewall (constraint wrapper)
- Six dimensions: Identity & Scope, Organizational Hierarchy, Service Ownership, Dependencies, Temporal Weight, Authority
- Real-world example: "What pending items do I have on my plate?"
- The constraint outcome: Subtracting 99% of noise before the LLM sees anything
- Comparing RAG vs. Multidimensional approaches
Key Insight: The Graph doesn't answer questions. It eliminates wrong answers. By filtering the universe through dimensional constraints, we subtract 99% of noise using deterministic graph logic, leaving the AI with the easy job of summarizing the 1% of signal that remains.
Using Multidimensional Knowledge Graphs to block hallucinations before they happen.
A defense-in-depth architecture that prevents AI hallucinations through structural validation against knowledge graphs. This document covers:
- Multidimensional knowledge graph design (entity, temporal, confidence, context)
- Six validation rules for blocking hallucinations
- Implementation patterns for proactive protection
- Benefits over post-generation detection
- Real-world implementation examples
Key Insight: Don't detect hallucinations after generationβprevent them structurally before they reach users.
Why the best agents are the ones that can't talk (Silent Swarms).
Challenging the assumption that agents must communicate through natural language, this document presents:
- The performance bottleneck of conversational interfaces
- Headless architecture with structured data exchange
- Silent Swarm patterns for agent coordination
- 10-100x performance improvements
- 90%+ cost reduction through eliminating inter-agent LLM calls
- When to use headless vs. conversational patterns
Key Insight: Language is for humans. Code is for machines. Keep them separate.
Function Over Form: Scale by Subtraction Through "Security by Silence".
The AI industry suffers from a "Chatbot Hangover"βwe design systems as if conversation is mandatory. This document challenges that assumption:
- The Code Review Paradox: We want the work, not the worker's personality
- Separation of Concerns: "The Face" (can talk, no tools) vs. "The Hands" (can execute, no talk)
- Security by Silence: Jailbreak-resistant architecture
- 90% of agents should be mute
- Function over form in multi-agent coordination
Key Insight: Stop judging agents by how well they chat. Start judging them by how well they shut up and work.
Self-Updating Semantic Firewalls (Part 4).
Static systems die. In a world where data changes every second, knowledge graphs cannot remain static. This document introduces recursive ontologiesβsystems that update themselves:
- The Feedback Loop: Agents as telemetry (failures as signals)
- Ephemeral Graphs: Event-driven, just-in-time knowledge bases
- Human Wisdom: Statistical supervision (5% review, 95% automation)
- The Analyst System: Pattern detection and self-healing
- Real-world implementation of self-updating architectures
- The death of manual knowledge curation
Key Insight: When an agent fails to find an answer, that is not an errorβit is a signal. The system heals its own knowledge gaps based on the friction points of the agents living inside it.
The new role that replaces the traditional Software Engineer.
As AI agents become capable of writing code, the human role shifts to knowledge architecture and system design. This document explores:
- Core responsibilities (knowledge architecture, cognitive orchestration, optimization, recursive ontology management)
- Key skills (information architecture, system design, performance engineering)
- Day-to-day activities and deliverables
- Tools and technologies
- Career path from junior to principal architect
- Transition guide for software engineers
Key Insight: The best code is no code. The best architect designs systems that don't need to compute what they can look up. And the best knowledge graph is one that updates itself.
10. The Mute Agent π
Capability-Based Execution: Return NULL, Don't Hallucinate.
The most reliable agent is one that knows when to say nothing. This pattern implements capability-based execution where agents return NULL for out-of-scope requests instead of fabricating answers:
- Capability manifests: What the agent CAN do (not what it might try)
- NULL responses: Silence is better than hallucination
- POSIX-inspired permissions: Fine-grained access control
- Policy enforcement: Deterministic rules, not probabilistic guardrails
- The 0% violation guarantee: Structural safety over prompt engineering
Key Insight: An agent that returns NULL when uncertain is infinitely more valuable than one that confidently hallucinates.
11. Control Planes vs Prompts π
Why Deterministic Infrastructure Beats Probabilistic Prompting.
Stop trying to "prompt engineer" your way to safety. This pattern establishes control plane architecture for AI governance:
- Prompts are suggestions, policies are laws
- Kernel-level enforcement: Safety below the LLM layer
- Permission systems: What agents CAN do, not what they SHOULD do
- Audit trails: Every action logged, every decision traceable
- Rollback capability: Undo any agent action
Key Insight: You wouldn't secure a web app with strongly-worded comments. Don't secure AI agents with strongly-worded prompts.
These concepts work together to form a complete architectural philosophy:
flowchart TB
subgraph UI["π₯οΈ User Interface Layer"]
User["Natural Language Boundaries"]
end
subgraph Router["π¦ Guardrail Router"]
Decision{"Does this need<br/>reasoning?"}
end
subgraph Paths["Processing Paths"]
Lookup["π Lookup Path<br/><b>80-90%</b>"]
Reasoning["π§ Reasoning Path<br/><b>10-20%</b>"]
end
subgraph Firewall["π‘οΈ Semantic Firewall"]
Validate["Validation & Verification<br/>Block hallucinations structurally"]
end
subgraph Swarm["π Silent Swarm"]
Headless["Headless Agents<br/>Structured coordination"]
end
subgraph Execution["β‘ Execution Layer"]
L90["Lookup<br/><b>90%</b>"]
C10["Compute<br/><b>10%</b>"]
end
subgraph Knowledge["π Knowledge Architecture"]
KG["Graphs β’ Vectors β’ Indices"]
end
User --> Decision
Decision -->|"Cached/Known"| Lookup
Decision -->|"Novel/Complex"| Reasoning
Lookup --> Validate
Reasoning --> Validate
Validate --> Headless
Headless --> L90
Headless --> C10
L90 --> KG
C10 --> KG
style UI fill:#1a1a2e,stroke:#00d4ff,color:#fff
style Router fill:#16213e,stroke:#00d4ff,color:#fff
style Firewall fill:#0f3460,stroke:#e94560,color:#fff
style Swarm fill:#1a1a2e,stroke:#00d4ff,color:#fff
style Knowledge fill:#16213e,stroke:#00d4ff,color:#fff
Static systems die. Recursive Ontologies add a self-updating layer:
flowchart TB
subgraph Telemetry["π‘ Agent Telemetry"]
Failures["Failures as Signals<br/>Every agent contributes feedback"]
end
subgraph Analyst["π Analyst System"]
Patterns["Pattern Detection<br/>& Self-Healing"]
end
subgraph Actions["π§ Healing Actions"]
Auto["Auto Heal<br/><b>95%</b>"]
Human["Human Review<br/><b>5%</b>"]
Rebuild["Rebuild<br/>Graph Sectors"]
end
subgraph Graphs["π Ephemeral Graphs"]
Org["OrgGraph<br/><i>HR events</i>"]
Product["ProductGraph<br/><i>Git events</i>"]
Context["ContextGraph<br/><i>Project TTL</i>"]
end
Failures --> Patterns
Patterns --> Auto
Patterns --> Human
Patterns --> Rebuild
Auto --> Org
Auto --> Product
Auto --> Context
Human --> Org
Human --> Product
Human --> Context
Rebuild --> Org
Rebuild --> Product
Rebuild --> Context
style Telemetry fill:#1a1a2e,stroke:#00d4ff,color:#fff
style Analyst fill:#16213e,stroke:#e94560,color:#fff
style Graphs fill:#0f3460,stroke:#00d4ff,color:#fff
Key Insight: The system doesn't need manual updates. Agent failures signal knowledge gaps. The Analyst System detects patterns and triggers automatic healing.
π¨βπ» For Developers
Read the concepts in order:
| # | Concept | Learn |
|---|---|---|
| 1 | The Inference Trap | Why "thinking" is technical debt |
| 2 | The Guardrail Router | Prevent expensive reasoning misuse |
| 3 | Compute-to-Lookup Ratio | The 90/10 performance foundation |
| 4 | Multidimensional Knowledge Graphs | Constraint-based filtering |
| 5 | Semantic Firewall | Structural hallucination prevention |
| 6 | Headless Agent | Efficient coordination |
| 7 | Silent Swarm | Security by silence |
| 8 | Recursive Ontologies | Self-updating knowledge |
| 9 | The Mute Agent | Capability-based execution |
| 10 | Control Planes vs Prompts | Deterministic safety |
| 11 | Cognitive Systems Architect | The holistic view |
- Are you falling into the Inference Trap?
- What's your compute-to-lookup ratio?
- Where are hallucinations possible?
- How much do inter-agent LLM calls cost?
- Is your knowledge architecture documented?
# Start with the examples
cd examples/
python guardrail_router_example.py
python semantic_firewall_example.pyποΈ For Architects
Knowledge-First Systems:
- Implement Guardrail Router as first line of defense
- Map your domain's knowledge requirements
- Design multidimensional knowledge graphs
- Plan pre-computation and indexing strategies
- Define validation rules and confidence thresholds
Optimize for Lookup:
- Target 80-90% lookup, 10-20% reasoning
- Implement multi-tier caching
- Build comprehensive indices
- Pre-compute common queries
Build Trust Through Structure:
- Implement semantic firewalls
- Define validation rules
- Track confidence scores
- Maintain source attribution
Coordinate Efficiently:
- Use headless agents for inter-system communication
- Reserve natural language for human boundaries
- Implement event-driven architectures
- Design for observability with structured telemetry
Systems designed with these principles achieve:
| Metric | Result | How |
|---|---|---|
| π Performance | 10-100x faster | Aggressive caching, lookup optimization |
| π° Cost | 90%+ reduction | Minimize expensive LLM calls |
| π‘οΈ Safety | 0% violations | Structural validation, not prompts |
| π Scalability | Infinite | Stateless, parallel execution |
| π Observability | Perfect | Structured telemetry |
| π― Predictability | Deterministic | Lookups over stochastic generation |
All patterns include working Python examples:
examples/
βββ guardrail_router_example.py # Request classification & routing
βββ compute_to_lookup_example.py # 90/10 optimization patterns
βββ semantic_firewall_example.py # Hallucination prevention
βββ multidimensional_kg_example.py # Knowledge graph constraints
βββ headless_agent_example.py # Structured communication
βββ silent_swarm_example.py # Multi-agent coordination
βββ recursive_ontology_example.py # Self-healing systemsThis is a living document. Contributions welcome:
- π¬ Share implementation experiences
- π Propose new patterns
- π Submit case studies
- π Improve documentation
See CONTRIBUTING.md for guidelines.
Each concept document includes:
- Detailed explanations with diagrams
- Code examples in Python
- Real-world case studies
- Implementation checklists
- Metrics to track
- Common anti-patterns to avoid
|
|
|
|
|
|
|
- Agent OS - Safety-First Kernel implementing these patterns (0% policy violations)
- AgentMesh - The Secure Nervous System for Cloud-Native Agent Ecosystems
- Agent Mesh Patterns - Identity, Trust, Governance, Reward patterns
- Production Deployment Guide - CI/CD, observability, operational best practices
Built with β€οΈ for the future of agentic systems
β Star this repo if you find it useful!