# Standard Engineering Controls for AI SAFE² Compliance

**Version:** 1.0
**Date:** February 7, 2026
**Author:** Vincent Sullivan
**Status:** Technical Specification

![Compensating Controls Spec](https://placehold.co/1200x630/1D1D1F/f6921e?text=Engineering+Controls+Specification)

---

## The Engineering Mandate

The **AI SAFE² Framework** operates on the core axiom: **"Policy is Intent. Engineering is Reality."**

To be compliant with AI SAFE² v2.1, an organization cannot simply document a risk in a risk register; it must deploy **Compensating Engineering Controls** that mechanically enforce the policy at runtime.

This specification outlines the minimum viable engineering standards for the critical control layers: **Sanitization**, **Circuit Breakers**, **State Verification**, **NHI Binding**, and **Swarm Consensus**.

---

## Control Spec 1: The Operational Circuit Breaker (Pillar 3)
**Objective:** Prevent "Runaway" agentic behavior (Infinite Loops, Financial Drain, Denial of Service).

### The Requirement
The system must possess a mechanical governor independent of the Agent's LLM logic. This governor must track the velocity and accumulation of resource consumption and sever execution automatically, **before** a human administrator could react.

### Implementation Standard
* **Location:** Must reside at the **Gateway / Proxy Layer** (between the Agent and the Tool API). It cannot reside inside the System Prompt.
* **Mechanism:** Token Bucket or Leaky Bucket algorithm.
* **Latency Budget:** < 20ms added latency.
* **Failure Mode:** **Fail-Closed.** If the metering service goes offline, all Agent traffic halts.

### The Logic Flow (Pseudo-Code)
```text
FUNCTION Check_Circuit_Breaker(Agent_ID):
    Current_Spend = Get_Redis_Counter(Agent_ID)
    Velocity_Rate = Get_Velocity(Agent_ID)

    IF Current_Spend > Hard_Cap OR Velocity_Rate > Spike_Threshold:
        Log_Event("Circuit Trip", Severity=CRITICAL)
        Sever_Connection()
        Return 402_Payment_Required  // Or 429_Too_Many_Requests
    ELSE:
        Pass_Request()
```
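
As a concrete illustration, below is a minimal Python sketch of a gateway-side breaker backed by a Redis counter (via the `redis-py` client). The key names, the dollar cap, the 60-second velocity window, and the `CircuitOpen` exception are all illustrative assumptions, not the Toolkit's implementation; the point is the fail-closed metering pattern.

```python
import redis

HARD_CAP_USD = 50.0     # hypothetical per-agent budget
SPIKE_THRESHOLD = 100   # hypothetical max calls per 60-second window

class CircuitOpen(Exception):
    """Raised when the breaker trips; the gateway maps this to HTTP 402/429."""

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def check_circuit_breaker(agent_id: str, call_cost_usd: float) -> None:
    """Meter spend and call velocity for one agent; raise CircuitOpen on breach.

    Fail-closed: any metering error also raises, halting agent traffic.
    """
    try:
        spend_key = f"spend:{agent_id}"
        window_key = f"velocity:{agent_id}"

        # Accumulate spend and count calls in a rolling 60-second window.
        current_spend = r.incrbyfloat(spend_key, call_cost_usd)
        call_count = r.incr(window_key)
        if call_count == 1:
            r.expire(window_key, 60)

        if current_spend > HARD_CAP_USD or call_count > SPIKE_THRESHOLD:
            raise CircuitOpen(f"Circuit trip for {agent_id}")
    except redis.RedisError as exc:
        # Metering service unreachable: fail-closed, block the request.
        raise CircuitOpen("Metering offline; failing closed") from exc
```

The gateway would invoke this check on every tool call and translate `CircuitOpen` into a 402 or 429 response before the request ever reaches the Tool API.

---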
## Control Spec 2: In-Line Input Sanitization (Pillar 1)
**Objective:** Prevent Prompt Injection and Jailbreaking from reaching the inference layer.

### The Requirement
All inputs from "Untrusted Sources" (Users, Web Scrapers, Email Ingestion) must be sanitized *before* entering the Context Window. Relying on the model to "refuse" a jailbreak is non-compliant.

### Implementation Standard
* **Inspection Depth:** Must scan both **User Input** and **Retrieval Context** (RAG Data).
* **Detection Method:** Hybrid approach required.
1. **Static Analysis:** Regex patterns for known jailbreak signatures (e.g., "Ignore previous instructions").
2. **Vector Similarity:** Cosine similarity check against a "Known Threat" vector database.
* **Action:** Redaction or Rejection.

### The Logic Flow (Pseudo-Code)
```text
FUNCTION Sanitize_Input(User_Text):
    // Layer 1: Static
    IF Regex_Match(User_Text, Threat_Patterns):
        Return Block_Request("Static Signature Detected")

    // Layer 2: Semantic
    Similarity_Score = Vector_DB.Compare(User_Text, Threat_Embeddings)
    IF Similarity_Score > 0.85:
        Return Block_Request("Semantic Injection Attempt")

    Return Forward_To_LLM()
```
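
A minimal Python sketch of the hybrid check follows. It assumes a pre-computed list of known-threat embeddings and an `embed()` helper that returns a vector for a piece of text; the regex signatures and the 0.85 threshold are illustrative, not a complete threat model.

```python
import re
import numpy as np

# Illustrative static signatures; a production blocklist would be far larger.
THREAT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]

SIMILARITY_THRESHOLD = 0.85

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sanitize_input(user_text: str, threat_embeddings: list[np.ndarray], embed) -> str:
    """Return the text if clean; raise ValueError if either layer flags it."""
    # Layer 1: static signature scan.
    for pattern in THREAT_PATTERNS:
        if pattern.search(user_text):
            raise ValueError("Static signature detected")

    # Layer 2: semantic similarity against known-threat embeddings.
    vec = embed(user_text)
    if any(cosine(vec, t) > SIMILARITY_THRESHOLD for t in threat_embeddings):
        raise ValueError("Semantic injection attempt")

    return user_text  # safe to forward to the LLM
```

The same routine should be applied to retrieved RAG chunks before they enter the Context Window, not only to direct user input.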
---

### **Image 2: Control Spec 3 (State Verification)**
<p align="center">
<img src="assets/AI%20Cryptographic%20State%20Verification%20Workflow%20Control%20Spec%203.png" alt="Cryptographic State Verification Workflow" width="100%">
</p>

## Control Spec 3: Cryptographic State Verification (Pillar 2)
**Objective:** Detect "Memory Poisoning" and "Semantic Drift" in RAG systems and Long-Term Memory.

### The Requirement
Agents that utilize long-term memory or vector stores must be auditable for integrity. The system must be able to prove that the agent's "Knowledge Base" has not been altered by an unauthorized entity.

### Implementation Standard
* **Method:** SHA-256 Hashing of Artifacts.
* **Trigger:** Executed before critical "Write" operations or high-stakes "Read" operations.
* **Validation:** The current hash of the Vector Index or Knowledge Graph must match the "Last Known Good" hash stored in the immutable ledger.

### The Logic Flow (Pseudo-Code)
```text
FUNCTION Verify_State(Agent_ID, Memory_Volume, Memory_Volume_ID):
    Current_Hash = SHA256(Memory_Volume)
    Trusted_Hash = Ledger.Get_Last_Commit(Memory_Volume_ID)

    IF Current_Hash != Trusted_Hash:
        Trigger_Alert("Integrity Violation: Memory Drift Detected")
        Lock_Agent(Agent_ID)  // Fail-Secure
        Require_Human_Override()
    ELSE:
        Proceed()
```
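
As an illustration, here is a minimal Python sketch that streams a serialized memory artifact (for example, a dumped vector index) through SHA-256 and compares it to the last-known-good value. The `fetch_trusted_hash` and `lock_agent` callables stand in for the immutable ledger and containment hooks and are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Stream a memory artifact through SHA-256 without loading it all into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_state(agent_id: str, artifact_path: Path, fetch_trusted_hash, lock_agent) -> bool:
    """Compare the artifact's current hash to the ledger's last-known-good hash."""
    current = sha256_of_file(artifact_path)
    trusted = fetch_trusted_hash(agent_id)  # read from the immutable ledger

    if current != trusted:
        lock_agent(agent_id)  # fail-secure: halt the agent pending human override
        return False
    return True
```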
---

### **Image 3: Control Spec 4 & 5 (NHI & Swarms)**
<p align="center">
<img src="assets/Control%20Spec%204%20%26%205%20(NHI%20%26%20Swarms)%20v2.png" alt="Control Spec 4 & 5 NHI and Swarms" width="100%">
</p>

## Control Spec 4: NHI Ephemeral Binding (Pillar 1 & 2)
**Objective:** Prevent "Stale Credential" abuse and lateral movement by Non-Human Identities (NHI).

### The Requirement
Agent identities must not use static, long-lived API keys (e.g., a hardcoded `.env` variable that lasts for years). Authentication must be bound to the specific *session* or *task*.

### Implementation Standard
* **Method:** Token Exchange / Short-Lived Certificates (TTL < 60 minutes).
* **Binding:** Credentials must be cryptographically bound to the Agent's unique Instance ID.
* **Rotation:** Automated rotation upon task completion or failure.

### The Logic Flow (Pseudo-Code)
```text
FUNCTION Mint_Agent_Credential(Agent_Manifest):
    // 1. Verify Agent Integrity
    Verify_Signature(Agent_Manifest)

    // 2. Issue Short-Lived Token
    Token = Vault.Issue(
        Role="Agent_Finance",
        TTL="300s",              // 5 Minutes
        Scopes=["read_invoice"]  // Least Privilege
    )

    Return Token
```
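
The secrets backend is organization-specific, so the following Python sketch uses a signed JWT (via the `PyJWT` package) purely to illustrate the binding and TTL properties. The claim names, the five-minute TTL, and the key handling are illustrative assumptions, not the Toolkit's API.

```python
import datetime
import jwt  # PyJWT

SIGNING_KEY = "replace-with-a-managed-secret"  # illustrative only; never hardcode in production

def mint_agent_credential(agent_instance_id: str, role: str, scopes: list[str]) -> str:
    """Mint a short-lived credential bound to a single agent instance."""
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": agent_instance_id,   # cryptographic binding to the instance
        "role": role,
        "scopes": scopes,           # least privilege
        "iat": now,
        "exp": now + datetime.timedelta(minutes=5),  # TTL well under 60 minutes
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")

# Example: a finance agent receives a 5-minute, read-only credential.
token = mint_agent_credential("agent-7f3a", role="Agent_Finance", scopes=["read_invoice"])
```

Downstream services verify the signature and reject any request whose `sub` does not match the calling instance or whose `exp` has passed.

---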
## Control Spec 5: Swarm Consensus Validation (Pillar 4)
**Objective:** Prevent "Consensus Hijacking" in Multi-Agent Systems.

### The Requirement
In a swarm architecture, no single agent should have unilateral authority to execute a high-risk action. Decisions must be validated by a "Quorum" or a "Supervisor Node" before execution.

### Implementation Standard
* **Threshold:** Defined as an N-of-M signature requirement (e.g., 2 of 3 agents must agree).
* **Topology Check:** Ensure the request originated from a valid node within the mapped swarm topology, not an external injector.

### The Logic Flow (Pseudo-Code)
```text
FUNCTION Execute_Swarm_Action(Proposed_Action):
    Signatures = Collect_Votes(Swarm_Nodes, Proposed_Action)

    IF Count(Signatures) < Quorum_Threshold:
        Return Block_Request("Consensus Failed: Insufficient Votes")

    IF Verify_Topology(Signatures) == False:
        Return Block_Request("Topology Violation: Unauthorized Node")

    Return Execute(Proposed_Action)
```
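
A minimal Python sketch of the quorum check, assuming each node signs the proposed action with a per-node HMAC key held by the supervisor; the node registry, key handling, and 2-of-3 threshold are illustrative assumptions rather than a prescribed protocol.

```python
import hashlib
import hmac

# Illustrative registry of authorized swarm nodes and their shared HMAC keys.
NODE_KEYS = {
    "node-a": b"key-a",
    "node-b": b"key-b",
    "node-c": b"key-c",
}
QUORUM_THRESHOLD = 2  # e.g., 2 of 3 nodes must agree

def valid_vote(node_id: str, action: str, signature: str) -> bool:
    """A vote counts only if the node is in the mapped topology and its HMAC verifies."""
    key = NODE_KEYS.get(node_id)
    if key is None:
        return False  # topology violation: unknown or external node
    expected = hmac.new(key, action.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def execute_swarm_action(action: str, votes: dict[str, str], execute) -> None:
    """Execute the action only when a quorum of valid, in-topology votes is present."""
    valid = [n for n, sig in votes.items() if valid_vote(n, action, sig)]
    if len(valid) < QUORUM_THRESHOLD:
        raise PermissionError("Consensus failed: insufficient valid votes")
    execute(action)
```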


---

### **Image 4: Implementation Matrix**

## Implementation Matrix

| Control Level | Circuit Breaker | Sanitization | State Verification | NHI Management |
| :--- | :--- | :--- | :--- | :--- |
| **Basic (DIY)** | Python `time.sleep()` | Basic Regex Blacklist | File timestamp checks | Static `.env` files |
| **Standard (Toolkit)** | Redis-backed Token Counter | Vector-based Threat Match | SHA-256 File Hashing | Automated Key Rotation |
| **Enterprise (Engineered)** | **API Gateway Interception** | **Zero-Dwell Inspection** | **Blockchain/Ledger Proof** | **Ephemeral Service Mesh** |

*Note: The **AI SAFE² Implementation Toolkit** provides the code patterns for the "Standard" level controls. "Enterprise" controls require custom architecture integration via the **Digital Shield Program**.*