7 changes: 5 additions & 2 deletions docs.json
@@ -2,7 +2,7 @@
"$schema": "https://mintlify.com/docs.json",
"theme": "almond",
"name": "Edgee documentation",
"description": "Edgee is a unified AI Gateway that gives you control over your LLM infrastructure.",
"description": "Edgee is an edge-native AI Gateway that reduces LLM costs by up to 50% through token compression and intelligent routing.",
"colors": {
"primary": "#8924A6",
"light": "#C876FA",
@@ -79,7 +79,10 @@
{
"group": "Features",
"pages": [
"features/overview"
"features/overview",
"features/token-compression",
"features/observability",
"features/automatic-model-selection"
]
},
{
201 changes: 199 additions & 2 deletions features/automatic-model-selection.mdx
@@ -1,9 +1,206 @@
---
title: Automatic Model Selection
description: Discover the Automatic Model Selection feature.
description: Intelligent routing that optimizes for cost, performance, or both.
icon: circuit-board
---

Edgee's automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.

## Cost-Aware Routing

Let Edgee automatically select the cheapest model that meets your quality requirements:

```typescript
const response = await edgee.send({
model: 'auto', // Enable automatic selection
strategy: 'cost', // Optimize for lowest cost
input: 'What is the capital of France?',
quality_threshold: 0.95, // Only use models with 95%+ quality score
});

console.log(`Model used: ${response.model}`); // e.g., "gpt-5.2"
console.log(`Cost: $${response.cost.toFixed(4)}`);
console.log(`Tokens saved (compression): ${response.usage.saved_tokens}`);
```

**How it works:**
1. Analyzes the request's complexity and requirements
2. Filters to the models that meet your quality threshold
3. Routes to the cheapest remaining model, using post-compression token counts
4. Tracks savings from both compression and routing
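
Steps 2 and 3 amount to a filter-then-sort over a model catalog. The sketch below is illustrative only: the model names, quality scores, and per-token prices are hypothetical placeholders, not Edgee's actual catalog or selection algorithm.

```typescript
// Hypothetical model catalog; scores and prices are illustrative only.
interface ModelInfo {
  name: string;
  qualityScore: number;     // 0..1 benchmark-derived quality score
  costPer1kTokens: number;  // USD per 1k input tokens
}

// Filter by quality threshold, then pick the cheapest survivor.
function selectCheapestModel(
  catalog: ModelInfo[],
  qualityThreshold: number,
): ModelInfo | undefined {
  return catalog
    .filter((m) => m.qualityScore >= qualityThreshold)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens)[0];
}

const catalog: ModelInfo[] = [
  { name: 'gpt-4o', qualityScore: 0.97, costPer1kTokens: 0.005 },
  { name: 'gpt-4o-mini', qualityScore: 0.92, costPer1kTokens: 0.0006 },
  { name: 'claude-3.5-sonnet', qualityScore: 0.96, costPer1kTokens: 0.003 },
];

// With a 0.95 threshold, the mini model is filtered out and the
// cheapest remaining candidate wins.
const chosen = selectCheapestModel(catalog, 0.95);
```

Loosening the threshold to 0.9 would let the cheaper mini model qualify, which is why the typical savings below depend heavily on how strict your quality requirements are.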

**Typical savings:**
- Simple queries: Route to GPT-4o-mini or Claude Haiku (60-80% cheaper)
- Complex tasks: Route to mid-tier models like GPT-4o or Claude 3.5 Sonnet
- Specialized needs: Route to task-specific models (coding, vision, etc.)

Combined with compression, you can save 60-70% on total AI costs.

<Note>
Quality thresholds are based on benchmark performance across standard tasks. You can customize thresholds per request or set defaults per project.
</Note>

## Performance-Optimized Routing

Route to the fastest model when latency matters more than cost:

```typescript
const response = await edgee.send({
model: 'auto',
strategy: 'performance', // Optimize for speed
input: 'Generate a summary of this document...',
max_latency_ms: 2000, // Must respond in under 2s
});

console.log(`Model used: ${response.model}`); // e.g., "gpt-4o"
console.log(`Latency: ${response.latency_ms}ms`);
```

**Performance routing considers:**
- Model inference speed (tokens/second)
- Provider API latency
- Time to first token (TTFT)
- Geographic proximity to provider
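
A rough way to see how these factors combine: end-to-end latency is approximately API overhead plus time to first token plus generation time at the model's throughput. The numbers below are made-up estimates for illustration, not measured provider figures.

```typescript
// Hypothetical per-model latency estimates; none are measured figures.
interface LatencyEstimate {
  ttftMs: number;          // time to first token
  tokensPerSecond: number; // inference throughput
  apiOverheadMs: number;   // provider API + network round-trip
}

// Estimated end-to-end latency for a response of `outputTokens` tokens.
function estimateLatencyMs(e: LatencyEstimate, outputTokens: number): number {
  return e.apiOverheadMs + e.ttftMs + (outputTokens / e.tokensPerSecond) * 1000;
}

const fastModel: LatencyEstimate = { ttftMs: 200, tokensPerSecond: 150, apiOverheadMs: 50 };
const slowModel: LatencyEstimate = { ttftMs: 600, tokensPerSecond: 40, apiOverheadMs: 120 };

// For a 200-token summary, only the faster profile fits a 2s budget.
const fastMs = estimateLatencyMs(fastModel, 200);
const slowMs = estimateLatencyMs(slowModel, 200);
```

This is why `max_latency_ms` can rule out models that look attractive on cost: a high-throughput model with low TTFT wins even when its per-token price is higher.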

## Balanced Strategy

Find the optimal trade-off between cost and performance:

```typescript
const response = await edgee.send({
model: 'auto',
strategy: 'balanced',
input: 'Analyze this customer feedback...',
cost_budget: 0.01, // Max $0.01 per request
quality_threshold: 0.9, // 90% quality minimum
});
```

**Balanced routing:**
- Stays within your cost budget
- Meets quality requirements
- Optimizes for best performance within constraints
- Automatically adjusts based on token compression
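
Conceptually, balanced routing is a constrained choice: discard candidates that break the cost budget or the quality floor, then take the fastest of what remains. A minimal sketch, using hypothetical per-request estimates rather than Edgee's real scoring:

```typescript
// Hypothetical per-request candidate estimates (post-compression cost).
interface Candidate {
  name: string;
  qualityScore: number;
  estCostUsd: number;
  estLatencyMs: number;
}

function selectBalanced(
  candidates: Candidate[],
  costBudgetUsd: number,
  qualityThreshold: number,
): Candidate | undefined {
  return candidates
    .filter((c) => c.estCostUsd <= costBudgetUsd && c.qualityScore >= qualityThreshold)
    .sort((a, b) => a.estLatencyMs - b.estLatencyMs)[0]; // fastest within constraints
}

const candidates: Candidate[] = [
  { name: 'premium', qualityScore: 0.98, estCostUsd: 0.02, estLatencyMs: 900 },  // over budget
  { name: 'mid', qualityScore: 0.93, estCostUsd: 0.008, estLatencyMs: 1200 },
  { name: 'cheap', qualityScore: 0.85, estCostUsd: 0.001, estLatencyMs: 700 },   // below quality floor
];

// With cost_budget 0.01 and quality_threshold 0.9, only 'mid' qualifies.
const pick = selectBalanced(candidates, 0.01, 0.9);
```

Because `estCostUsd` is a post-compression figure, better compression widens the set of affordable candidates, which is what "automatically adjusts based on token compression" means in practice.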

## Automatic Failover

When a provider fails, Edgee automatically retries with backup models:

```typescript
const response = await edgee.send({
model: 'gpt-4o',
fallback_models: ['claude-3.5-sonnet', 'gemini-pro'], // Backup chain
input: 'Your prompt here',
});

// If GPT-4o is unavailable, Edgee tries Claude 3.5, then Gemini
console.log(`Model used: ${response.model}`);
console.log(`Fallback used: ${response.fallback_used}`); // true/false
```

**Failover triggers:**
- Rate limits (429 errors)
- Provider outages (5xx errors)
- Timeout errors
- Model unavailability

**Failover behavior:**
- Instant retry with next model in chain
- No additional latency (parallel health checks)
- Preserves request context and compression
- Logs failover events for monitoring
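
Put together, the failover chain behaves like a retry loop over `[model, ...fallback_models]` that only advances on retryable errors. This sketch is illustrative: `callModel` is a hypothetical stand-in for a provider call, and the status codes mirror the triggers listed above.

```typescript
// Status codes treated as retryable, mirroring the failover triggers above.
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);

interface ProviderError extends Error {
  status?: number;
}

// Walk the chain; advance only on retryable failures.
function sendWithFallback(
  chain: string[],
  callModel: (model: string) => string, // hypothetical provider call
): { model: string; output: string; fallbackUsed: boolean } {
  let lastError: unknown;
  for (let i = 0; i < chain.length; i++) {
    try {
      return { model: chain[i], output: callModel(chain[i]), fallbackUsed: i > 0 };
    } catch (err) {
      lastError = err;
      const status = (err as ProviderError).status;
      // Non-retryable errors (e.g. a 400 bad request) surface immediately.
      if (status === undefined || !RETRYABLE_STATUS.has(status)) throw err;
    }
  }
  throw lastError; // every model in the chain failed
}
```

The `fallbackUsed` flag corresponds to the `response.fallback_used` field shown in the example above: it is `true` whenever the first model in the chain did not serve the request.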

## Cost + Compression Savings

Automatic model selection works seamlessly with token compression for maximum savings:

| Scenario | Without Edgee | With Compression Only | With Compression + Routing | **Total Savings** |
|----------|---------------|----------------------|----------------------------|-------------------|
| Simple Q&A | $0.10 (GPT-4o) | $0.05 (50% compression) | $0.02 (GPT-4o-mini + compression) | **80%** |
| RAG Pipeline | $0.50 (GPT-4o) | $0.25 (50% compression) | $0.15 (GPT-4o + compression + routing) | **70%** |
| Document Analysis | $1.00 (Claude Opus) | $0.50 (50% compression) | $0.30 (Claude Sonnet + compression) | **70%** |

<Note>
Savings vary by use case. Track your actual savings using the [observability dashboard](/features/observability).
</Note>
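
The savings column follows directly from the example figures: percentage saved is one minus the ratio of cost with Edgee to cost without. These are the table's illustrative numbers, not guaranteed outcomes.

```typescript
// Percentage saved, computed from the example figures in the table above.
const percentSaved = (withoutUsd: number, withUsd: number): number =>
  Math.round((1 - withUsd / withoutUsd) * 100);

percentSaved(0.10, 0.02); // Simple Q&A row
percentSaved(0.50, 0.15); // RAG Pipeline row
percentSaved(1.00, 0.30); // Document Analysis row
```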

## Route by Use Case

Configure default routing strategies per use case:

```typescript
// RAG Q&A: Optimize for cost
await edgee.routing.configure({
name: 'rag-qa',
strategy: 'cost',
allowed_models: ['gpt-5.2', 'gpt-5.1', 'claude-3.5-sonnet'],
quality_threshold: 0.9,
});

// Code generation: Optimize for performance
await edgee.routing.configure({
name: 'code-gen',
strategy: 'performance',
allowed_models: ['gpt-4o', 'claude-3.5-sonnet'],
quality_threshold: 0.95,
});

// Then use per request
const response = await edgee.send({
model: 'auto',
routing_profile: 'rag-qa', // Use pre-configured strategy
input: 'Answer based on these documents...',
});
```

## Custom Routing Rules

Define custom routing logic based on request properties:

```typescript
await edgee.routing.addRule({
name: 'route-by-length',
condition: {
token_count: { gt: 10000 }, // Requests over 10k tokens
},
action: {
models: ['claude-3.5-sonnet'], // Use Claude for long contexts
strategy: 'cost',
},
});

await edgee.routing.addRule({
name: 'route-critical-requests',
condition: {
metadata: { priority: 'high' }, // High-priority requests
},
action: {
models: ['gpt-4o', 'claude-opus'], // Use premium models
strategy: 'performance',
},
});
```

## What's Next

<CardGroup cols={2}>
<Card title="Token Compression" icon="dollar-sign" iconType="duotone" href="/features/token-compression">
Learn how compression reduces costs by up to 50% before routing.
</Card>

<Card title="Observability" icon="chart-line" iconType="duotone" href="/features/observability">
Track routing decisions, costs, and compression savings.
</Card>

<Card title="Quick Start" icon="rocket" iconType="duotone" href="/quickstart">
Get started with automatic model selection in 5 minutes.
</Card>

<Card title="API Reference" icon="code" iconType="duotone" href="/api-reference">
Explore the full API for routing configuration.
</Card>
</CardGroup>

<Warning>
This feature page is still under construction. We're working on it, and it will be published soon.
This feature is under active development. Some routing strategies and configuration options may be added in future releases.
</Warning>