7 changes: 5 additions & 2 deletions docs.json
@@ -2,7 +2,7 @@
"$schema": "https://mintlify.com/docs.json",
"theme": "almond",
"name": "Edgee documentation",
"description": "Edgee is a unified AI Gateway that gives you control over your LLM infrastructure.",
"description": "Edgee is an edge-native AI Gateway that reduces LLM costs by up to 50% through token compression and intelligent routing.",
"colors": {
"primary": "#8924A6",
"light": "#C876FA",
@@ -79,7 +79,10 @@
{
"group": "Features",
"pages": [
"features/overview"
"features/overview",
"features/token-compression",
"features/observability",
"features/automatic-model-selection"
]
},
{
201 changes: 199 additions & 2 deletions features/automatic-model-selection.mdx
@@ -1,9 +1,206 @@
---
title: Automatic Model Selection
description: Discover the Automatic Model Selection feature.
description: Intelligent routing that optimizes for cost, performance, or both.
icon: circuit-board
---

Edgee's automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.

## Cost-Aware Routing

Let Edgee automatically select the cheapest model that meets your quality requirements:

```typescript
const response = await edgee.send({
model: 'auto', // Enable automatic selection
strategy: 'cost', // Optimize for lowest cost
input: 'What is the capital of France?',
quality_threshold: 0.95, // Only use models with 95%+ quality score
});

console.log(`Model used: ${response.model}`); // e.g., "gpt-5.2"
console.log(`Cost: $${response.cost.toFixed(4)}`);
console.log(`Tokens saved (compression): ${response.usage.saved_tokens}`);
```

**How it works:**
1. Analyzes the request's complexity and requirements
2. Filters to the models that meet your quality threshold
3. Routes to the cheapest remaining model, using post-compression token counts
4. Tracks savings from both compression and routing
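
Steps 2 and 3 amount to a filter-then-sort over a model catalog. The sketch below is illustrative only: the model names, quality scores, and per-token prices are hypothetical placeholders, not Edgee's actual catalog or selection algorithm.

```typescript
// Hypothetical model catalog; scores and prices are illustrative only.
interface ModelInfo {
  name: string;
  qualityScore: number;     // 0..1 benchmark-derived quality score
  costPer1kTokens: number;  // USD per 1k input tokens
}

// Filter by quality threshold, then pick the cheapest survivor.
function selectCheapestModel(
  catalog: ModelInfo[],
  qualityThreshold: number,
): ModelInfo | undefined {
  return catalog
    .filter((m) => m.qualityScore >= qualityThreshold)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens)[0];
}

const catalog: ModelInfo[] = [
  { name: 'gpt-4o', qualityScore: 0.97, costPer1kTokens: 0.005 },
  { name: 'gpt-4o-mini', qualityScore: 0.92, costPer1kTokens: 0.0006 },
  { name: 'claude-3.5-sonnet', qualityScore: 0.96, costPer1kTokens: 0.003 },
];

// With a 0.95 threshold, the mini model is filtered out and the
// cheapest remaining candidate wins.
const chosen = selectCheapestModel(catalog, 0.95);
```

Loosening the threshold to 0.9 would let the cheaper mini model qualify, which is why the typical savings below depend heavily on how strict your quality requirements are.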

**Typical savings:**
- Simple queries: Route to GPT-4o-mini or Claude Haiku (60-80% cheaper)
- Complex tasks: Route to mid-tier models like GPT-4o or Claude 3.5 Sonnet
- Specialized needs: Route to task-specific models (coding, vision, etc.)

Combined with compression, you can save 60-70% on total AI costs.

<Note>
Quality thresholds are based on benchmark performance across standard tasks. You can customize thresholds per request or set defaults per project.
</Note>

## Performance-Optimized Routing

Route to the fastest model when latency matters more than cost:

```typescript
const response = await edgee.send({
model: 'auto',
strategy: 'performance', // Optimize for speed
input: 'Generate a summary of this document...',
max_latency_ms: 2000, // Must respond in under 2s
});

console.log(`Model used: ${response.model}`); // e.g., "gpt-4o"
console.log(`Latency: ${response.latency_ms}ms`);
```

**Performance routing considers:**
- Model inference speed (tokens/second)
- Provider API latency
- Time to first token (TTFT)
- Geographic proximity to provider
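
A rough way to see how these factors combine: end-to-end latency is approximately API overhead plus time to first token plus generation time at the model's throughput. The numbers below are made-up estimates for illustration, not measured provider figures.

```typescript
// Hypothetical per-model latency estimates; none are measured figures.
interface LatencyEstimate {
  ttftMs: number;          // time to first token
  tokensPerSecond: number; // inference throughput
  apiOverheadMs: number;   // provider API + network round-trip
}

// Estimated end-to-end latency for a response of `outputTokens` tokens.
function estimateLatencyMs(e: LatencyEstimate, outputTokens: number): number {
  return e.apiOverheadMs + e.ttftMs + (outputTokens / e.tokensPerSecond) * 1000;
}

const fastModel: LatencyEstimate = { ttftMs: 200, tokensPerSecond: 150, apiOverheadMs: 50 };
const slowModel: LatencyEstimate = { ttftMs: 600, tokensPerSecond: 40, apiOverheadMs: 120 };

// For a 200-token summary, only the faster profile fits a 2s budget.
const fastMs = estimateLatencyMs(fastModel, 200);
const slowMs = estimateLatencyMs(slowModel, 200);
```

This is why `max_latency_ms` can rule out models that look attractive on cost: a high-throughput model with low TTFT wins even when its per-token price is higher.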

## Balanced Strategy

Find the optimal trade-off between cost and performance:

```typescript
const response = await edgee.send({
model: 'auto',
strategy: 'balanced',
input: 'Analyze this customer feedback...',
cost_budget: 0.01, // Max $0.01 per request
quality_threshold: 0.9, // 90% quality minimum
});
```

**Balanced routing:**
- Stays within your cost budget
- Meets quality requirements
- Optimizes for best performance within constraints
- Automatically adjusts based on token compression
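
Conceptually, balanced routing is a constrained choice: discard candidates that break the cost budget or the quality floor, then take the fastest of what remains. A minimal sketch, using hypothetical per-request estimates rather than Edgee's real scoring:

```typescript
// Hypothetical per-request candidate estimates (post-compression cost).
interface Candidate {
  name: string;
  qualityScore: number;
  estCostUsd: number;
  estLatencyMs: number;
}

function selectBalanced(
  candidates: Candidate[],
  costBudgetUsd: number,
  qualityThreshold: number,
): Candidate | undefined {
  return candidates
    .filter((c) => c.estCostUsd <= costBudgetUsd && c.qualityScore >= qualityThreshold)
    .sort((a, b) => a.estLatencyMs - b.estLatencyMs)[0]; // fastest within constraints
}

const candidates: Candidate[] = [
  { name: 'premium', qualityScore: 0.98, estCostUsd: 0.02, estLatencyMs: 900 },  // over budget
  { name: 'mid', qualityScore: 0.93, estCostUsd: 0.008, estLatencyMs: 1200 },
  { name: 'cheap', qualityScore: 0.85, estCostUsd: 0.001, estLatencyMs: 700 },   // below quality floor
];

// With cost_budget 0.01 and quality_threshold 0.9, only 'mid' qualifies.
const pick = selectBalanced(candidates, 0.01, 0.9);
```

Because `estCostUsd` is a post-compression figure, better compression widens the set of affordable candidates, which is what "automatically adjusts based on token compression" means in practice.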

## Automatic Failover

When a provider fails, Edgee automatically retries with backup models:

```typescript
const response = await edgee.send({
model: 'gpt-4o',
fallback_models: ['claude-3.5-sonnet', 'gemini-pro'], // Backup chain
input: 'Your prompt here',
});

// If GPT-4o is unavailable, Edgee tries Claude 3.5, then Gemini
console.log(`Model used: ${response.model}`);
console.log(`Fallback used: ${response.fallback_used}`); // true/false
```

**Failover triggers:**
- Rate limits (429 errors)
- Provider outages (5xx errors)
- Timeout errors
- Model unavailability

**Failover behavior:**
- Instant retry with next model in chain
- No additional latency (parallel health checks)
- Preserves request context and compression
- Logs failover events for monitoring
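
Put together, the failover chain behaves like a retry loop over `[model, ...fallback_models]` that only advances on retryable errors. This sketch is illustrative: `callModel` is a hypothetical stand-in for a provider call, and the status codes mirror the triggers listed above.

```typescript
// Status codes treated as retryable, mirroring the failover triggers above.
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);

interface ProviderError extends Error {
  status?: number;
}

// Walk the chain; advance only on retryable failures.
function sendWithFallback(
  chain: string[],
  callModel: (model: string) => string, // hypothetical provider call
): { model: string; output: string; fallbackUsed: boolean } {
  let lastError: unknown;
  for (let i = 0; i < chain.length; i++) {
    try {
      return { model: chain[i], output: callModel(chain[i]), fallbackUsed: i > 0 };
    } catch (err) {
      lastError = err;
      const status = (err as ProviderError).status;
      // Non-retryable errors (e.g. a 400 bad request) surface immediately.
      if (status === undefined || !RETRYABLE_STATUS.has(status)) throw err;
    }
  }
  throw lastError; // every model in the chain failed
}
```

The `fallbackUsed` flag corresponds to the `response.fallback_used` field shown in the example above: it is `true` whenever the first model in the chain did not serve the request.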

## Cost + Compression Savings

Automatic model selection works seamlessly with token compression for maximum savings:

| Scenario | Without Edgee | With Compression Only | With Compression + Routing | **Total Savings** |
|----------|---------------|----------------------|----------------------------|-------------------|
| Simple Q&A | $0.10 (GPT-4o) | $0.05 (50% compression) | $0.02 (GPT-4o-mini + compression) | **80%** |
| RAG Pipeline | $0.50 (GPT-4o) | $0.25 (50% compression) | $0.15 (GPT-4o + compression + routing) | **70%** |
| Document Analysis | $1.00 (Claude Opus) | $0.50 (50% compression) | $0.30 (Claude Sonnet + compression) | **70%** |

<Note>
Savings vary by use case. Track your actual savings using the [observability dashboard](/features/observability).
</Note>
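
The savings column follows directly from the example figures: percentage saved is one minus the ratio of cost with Edgee to cost without. These are the table's illustrative numbers, not guaranteed outcomes.

```typescript
// Percentage saved, computed from the example figures in the table above.
const percentSaved = (withoutUsd: number, withUsd: number): number =>
  Math.round((1 - withUsd / withoutUsd) * 100);

percentSaved(0.10, 0.02); // Simple Q&A row
percentSaved(0.50, 0.15); // RAG Pipeline row
percentSaved(1.00, 0.30); // Document Analysis row
```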

## Route by Use Case

Configure default routing strategies per use case:

```typescript
// RAG Q&A: Optimize for cost
await edgee.routing.configure({
name: 'rag-qa',
strategy: 'cost',
allowed_models: ['gpt-5.2', 'gpt-5.1', 'claude-3.5-sonnet'],
quality_threshold: 0.9,
});

// Code generation: Optimize for performance
await edgee.routing.configure({
name: 'code-gen',
strategy: 'performance',
allowed_models: ['gpt-4o', 'claude-3.5-sonnet'],
quality_threshold: 0.95,
});

// Then use per request
const response = await edgee.send({
model: 'auto',
routing_profile: 'rag-qa', // Use pre-configured strategy
input: 'Answer based on these documents...',
});
```

## Custom Routing Rules

Define custom routing logic based on request properties:

```typescript
await edgee.routing.addRule({
name: 'route-by-length',
condition: {
token_count: { gt: 10000 }, // Requests over 10k tokens
},
action: {
models: ['claude-3.5-sonnet'], // Use Claude for long contexts
strategy: 'cost',
},
});

await edgee.routing.addRule({
name: 'route-critical-requests',
condition: {
metadata: { priority: 'high' }, // High-priority requests
},
action: {
models: ['gpt-4o', 'claude-opus'], // Use premium models
strategy: 'performance',
},
});
```

## What's Next

<CardGroup cols={2}>
<Card title="Token Compression" icon="dollar-sign" iconType="duotone" href="/features/token-compression">
Learn how compression reduces costs by up to 50% before routing.
</Card>

<Card title="Observability" icon="chart-line" iconType="duotone" href="/features/observability">
Track routing decisions, costs, and compression savings.
</Card>

<Card title="Quick Start" icon="rocket" iconType="duotone" href="/quickstart">
Get started with automatic model selection in 5 minutes.
</Card>

<Card title="API Reference" icon="code" iconType="duotone" href="/api-reference">
Explore the full API for routing configuration.
</Card>
</CardGroup>

<Warning>
This feature page is still under construction. We're working on it, and it will be published soon.
This feature is under active development. Some routing strategies and configuration options may be added in future releases.
</Warning>