Merged
3 changes: 1 addition & 2 deletions docs.json
@@ -81,8 +81,7 @@
"pages": [
"features/overview",
"features/token-compression",
"features/observability",
"features/automatic-model-selection"
"features/observability"
]
},
{
13 changes: 8 additions & 5 deletions features/automatic-model-selection.mdx
@@ -6,6 +6,10 @@ icon: circuit-board

Edgee's automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.

<Warning>
This feature is under active development. Some routing strategies and configuration options may be added in future releases.
</Warning>

## Cost-Aware Routing

Let Edgee automatically select the cheapest model that meets your quality requirements:
@@ -19,8 +23,9 @@ const response = await edgee.send({
});

console.log(`Model used: ${response.model}`); // e.g., "gpt-5.2"
console.log(`Cost: $${response.cost.toFixed(4)}`);
console.log(`Tokens saved (compression): ${response.usage.saved_tokens}`);
if (response.compression) {
  console.log(`Tokens saved: ${response.compression.saved_tokens}`);
}
```

**How it works:**
@@ -201,6 +206,4 @@ await edgee.routing.addRule({
</Card>
</CardGroup>

<Warning>
This feature is under active development. Some routing strategies and configuration options may be added in future releases.
</Warning>

63 changes: 23 additions & 40 deletions features/observability.mdx
@@ -6,31 +6,37 @@ icon: eye

Edgee provides complete visibility into your AI infrastructure with real-time metrics on costs, token usage, compression savings, performance, and errors. Every request is tracked and exportable for analysis, budgeting, and optimization.

## Cost Tracking
## Token Usage Tracking

Every Edgee response includes detailed cost information so you can track spending in real-time:
Every Edgee response includes detailed token usage information for tracking and cost analysis:

```typescript
const response = await edgee.send({
  model: 'gpt-4o',
  input: 'Your prompt here',
});

console.log(response.cost); // Total cost in USD (e.g., 0.0234)
console.log(response.usage.prompt_tokens); // Compressed input tokens
console.log(response.usage.completion_tokens); // Output tokens
console.log(response.usage.total_tokens); // Total for billing

// Compression savings (when applied)
if (response.compression) {
  console.log(response.compression.input_tokens); // Original tokens
  console.log(response.compression.saved_tokens); // Tokens saved
  console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate
}
```

**Track spending by:**
**Track usage by:**
- Model (GPT-4o vs Claude vs Gemini)
- Project or application
- Environment (production vs staging)
- User or tenant (for multi-tenant apps)
- Time period (daily, weekly, monthly)
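
As a rough sketch of the first dimension in the list above, token usage can be summed per model on the client side. The `UsageRecord` type and the `totalTokensByModel` helper are hypothetical names used for illustration, not part of the Edgee SDK; the `usage` fields mirror the ones shown in the example above.

```typescript
// One record per response; only the fields needed for aggregation.
interface UsageRecord {
  model: string;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

// Hypothetical helper: sums total_tokens for each model name.
function totalTokensByModel(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    const current = totals.get(record.model) ?? 0;
    totals.set(record.model, current + record.usage.total_tokens);
  }
  return totals;
}

// Sample data with made-up token counts.
const totals = totalTokensByModel([
  { model: 'gpt-4o', usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 } },
  { model: 'gpt-4o', usage: { prompt_tokens: 200, completion_tokens: 100, total_tokens: 300 } },
  { model: 'other-model', usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 } },
]);

totals.forEach((tokens, model) => {
  console.log(`${model}: ${tokens} tokens`);
});
```

The same pattern extends to the other dimensions by keying the map on a project name, environment, or tenant identifier instead of the model.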

<Note>
Costs are calculated using real-time provider pricing. Edgee automatically handles rate changes and updates your historical data accordingly.
Use token usage data with provider pricing to calculate costs. The Edgee dashboard automatically calculates costs based on real-time provider pricing.
</Note>
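
A minimal sketch of estimating cost from token usage on the client side follows. The `PRICES_PER_MILLION` table and its dollar figures are placeholders invented for illustration, not real provider pricing, and `estimateCostUsd` is a hypothetical helper rather than an Edgee SDK function.

```typescript
// Usage fields as shown in the response example above.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Placeholder prices in USD per one million tokens.
// These numbers are made up; substitute your provider's current rates.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICES_PER_MILLION: Record<string, ModelPrice> = {
  'example-model': { inputPerMillion: 2, outputPerMillion: 8 },
};

// Hypothetical helper: estimates the cost of one request from its token usage.
function estimateCostUsd(model: string, usage: Usage): number {
  const price = PRICES_PER_MILLION[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  const inputCost = (usage.prompt_tokens / 1_000_000) * price.inputPerMillion;
  const outputCost = (usage.completion_tokens / 1_000_000) * price.outputPerMillion;
  return inputCost + outputCost;
}

const cost = estimateCostUsd('example-model', {
  prompt_tokens: 1_000_000,
  completion_tokens: 500_000,
  total_tokens: 1_500_000,
});
console.log(`Estimated cost: $${cost.toFixed(4)}`);
```

Because `prompt_tokens` is the compressed input count, an estimate built this way already reflects any compression savings.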

## Request Tags for Analytics
@@ -143,13 +149,13 @@ If you're using the OpenAI or Anthropic SDKs with Edgee, add tags via the `x-edg
**Common tagging strategies:**

<CardGroup cols={2}>
<Card icon="layer-group" iconType="duotone">
<Card icon="package" iconType="duotone">
**Environment tagging**

Tag by environment: `production`, `staging`, `development`
</Card>

<Card icon="puzzle-piece" iconType="duotone">
<Card icon="tag" iconType="duotone">
**Feature tagging**

Tag by feature: `chat`, `summarization`, `code-generation`, `rag-qa`
@@ -180,13 +186,16 @@ See exactly how much token compression is saving you on every request:
const response = await edgee.send({
  model: 'gpt-4o',
  input: 'Long prompt with lots of context...',
  enable_compression: true,
});

// Compression details
console.log(response.usage.prompt_tokens_original); // Original token count
console.log(response.usage.prompt_tokens); // After compression
console.log(response.usage.saved_tokens); // Tokens saved
console.log(response.usage.compression_ratio); // Percentage reduction (e.g., 45%)
if (response.compression) {
  console.log(response.compression.input_tokens); // Original token count
  console.log(response.usage.prompt_tokens); // After compression
  console.log(response.compression.saved_tokens); // Tokens saved
  console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate (e.g., 61.0%)
}
```
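
The guard on `response.compression` can be wrapped in a small helper so logging code does not repeat the check. The sketch below assumes only the field names shown above; `EdgeeResponseLike` and `describeCompression` are hypothetical names, not SDK exports.

```typescript
// Minimal response shape based on the fields used above.
interface EdgeeResponseLike {
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
  // Present only when compression was applied to the request.
  compression?: {
    input_tokens: number;
    saved_tokens: number;
    rate: number;
  };
}

// Hypothetical helper: returns a one-line summary, or a fallback when
// the response carries no compression information.
function describeCompression(response: EdgeeResponseLike): string {
  const c = response.compression;
  if (!c) {
    return 'Compression not applied';
  }
  const percent = (c.rate * 100).toFixed(1);
  return (
    `${c.input_tokens} -> ${response.usage.prompt_tokens} prompt tokens ` +
    `(${c.saved_tokens} saved, ${percent}%)`
  );
}

// Sample values, chosen for illustration only.
const summary = describeCompression({
  usage: { prompt_tokens: 1225, completion_tokens: 300, total_tokens: 1525 },
  compression: { input_tokens: 2450, saved_tokens: 1225, rate: 0.5 },
});
console.log(summary);
```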

**Analyze compression effectiveness:**
@@ -208,13 +217,13 @@ console.log(response.usage.compression_ratio); // Percentage reduction (e.g., 45
Track compression ratios over time to identify optimization opportunities
</Card>

<Card icon="layer-group" iconType="duotone">
<Card icon="layers" iconType="duotone">
**By use case**

Compare compression effectiveness across different prompt types
</Card>

<Card icon="ranking-star" iconType="duotone">
<Card icon="badge-dollar-sign" iconType="duotone">
**Top savers**

Identify which requests generate the highest savings
@@ -265,7 +274,7 @@ Understand how your AI infrastructure is being used:
- Cost per model over time
- Model switching patterns

## Alerts & Budgets
## Alerts & Budgets (Coming Soon)

Stay in control with proactive alerts:

@@ -324,32 +333,6 @@ const data = await edgee.analytics.export({
});
```

## Dashboard Views

The Edgee dashboard provides pre-built views for common use cases:

<CardGroup cols={2}>
<Card title="Cost Overview" icon="dollar-sign" iconType="duotone">
Track spending trends, compare models, and identify cost optimization opportunities.
</Card>

<Card title="Compression Analytics" icon="compress" iconType="duotone">
Monitor token savings, compression ratios, and cumulative cost reductions.
</Card>

<Card title="Performance" icon="gauge" iconType="duotone">
Analyze latency, throughput, error rates, and provider health across regions.
</Card>

<Card title="Usage Patterns" icon="chart-bar" iconType="duotone">
Understand request volume, model distribution, and usage trends over time.
</Card>
</CardGroup>

<Note>
Dashboard access is included with all Edgee plans. Enterprise customers can customize dashboards and create team-specific views.
</Note>

## What's Next

<CardGroup cols={2}>
125 changes: 114 additions & 11 deletions features/token-compression.mdx
@@ -40,6 +40,106 @@ Token compression happens automatically on every request through a four-step pro
Compression is most effective for prompts with repeated context (RAG), long system instructions, or verbose multi-turn histories. Simple queries may see minimal compression.
</Note>

## Enabling Token Compression

Token compression can be enabled in three ways, giving you flexibility to control compression at the request, API key, or organization level:

### 1. Per Request (SDK)

Enable compression for specific requests using the SDK:

<Tabs>
<Tab title="TypeScript">
```typescript
const response = await edgee.send({
  model: 'gpt-4o',
  input: {
    "messages": [
      {"role": "user", "content": "Your prompt here"}
    ],
    "enable_compression": true,
    "compression_rate": 0.8 // Target 80% compression (optional)
  }
});
```
</Tab>

<Tab title="Python">
```python
response = edgee.send(
    model="gpt-4o",
    input={
        "messages": [
            {"role": "user", "content": "Your prompt here"}
        ],
        "enable_compression": True,
        "compression_rate": 0.8  # Target 80% compression (optional)
    }
)
```
</Tab>

<Tab title="Go">
```go
response, err := client.Send("gpt-4o", edgee.InputObject{
	Messages: []edgee.Message{
		{Role: "user", Content: "Your prompt here"},
	},
	EnableCompression: true,
	CompressionRate:   0.8, // Target 80% compression (optional)
})
```
</Tab>

<Tab title="Rust">
```rust
let input = InputObject::new(vec![Message::user("Your prompt here")])
    .with_compression(true)
    .with_compression_rate(0.8); // Target 80% compression (optional)

let response = client.send("gpt-4o", input).await?;
```
</Tab>
</Tabs>

### 2. Per API Key (Console)

Enable compression for specific API keys in your organization settings. This is useful when you want different compression settings for different applications or environments.

<Frame>
<img src="/images/compression-enabled-by-tag-light.png" alt="Enable compression for specific API keys" className="dark:hidden" />
<img src="/images/compression-enabled-by-tag-dark.png" alt="Enable compression for specific API keys" className="hidden dark:block" />
</Frame>

In the **Tools** section of your console:
1. Toggle **Enable token compression** on
2. Set your target **Compression rate** (0.7-0.9, default 0.75)
3. Under **Scope**, select **Apply to specific API keys**
4. Choose which API keys should use compression

### 3. Organization-Wide (Console)

Enable compression for all requests across your entire organization. This is the recommended setting for most users to maximize savings automatically.

<Frame>
<img src="/images/compression-enabled-org-light.png" alt="Enable compression organization-wide" className="dark:hidden" />
<img src="/images/compression-enabled-org-dark.png" alt="Enable compression organization-wide" className="hidden dark:block" />
</Frame>

In the **Tools** section of your console:
1. Toggle **Enable token compression** on
2. Set your target **Compression rate** (0.7-0.9, default 0.75)
3. Under **Scope**, select **Apply to all org requests**
4. All API keys will now use compression by default

<Tip>
**Compression rate** controls how aggressively Edgee compresses prompts. A higher rate (e.g., 0.9) attempts more compression but may be less effective, while a lower rate (e.g., 0.7) is more conservative. The default of 0.75 provides a good balance for most use cases.
</Tip>

<Note>
SDK-level configuration takes precedence over console settings. If you enable compression in your code with `enable_compression: true`, it will override the console configuration for that specific request.
</Note>
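
The precedence rule in the note above can be modeled roughly as follows. This is a sketch under stated assumptions, not a description of Edgee's internals: the field-by-field fallback (a request value wins, then the console value, then the default of 0.75) is an assumption, and `CompressionSettings` and `resolveCompression` are hypothetical names.

```typescript
// Settings that may be supplied at the request level or in the console.
interface CompressionSettings {
  enabled?: boolean;
  rate?: number;
}

// Console default mentioned in the steps above.
const DEFAULT_RATE = 0.75;

// Hypothetical resolver: a value set on the request overrides the console
// value. Falling back per field is an assumption made for this sketch.
function resolveCompression(
  request: CompressionSettings,
  consoleConfig: CompressionSettings,
): { enabled: boolean; rate: number } {
  return {
    enabled: request.enabled ?? consoleConfig.enabled ?? false,
    rate: request.rate ?? consoleConfig.rate ?? DEFAULT_RATE,
  };
}

// The request enables compression even though the console has it disabled.
const effective = resolveCompression(
  { enabled: true },
  { enabled: false, rate: 0.7 },
);
console.log(effective);
```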

## When It Works Best

Token compression delivers the highest savings for these common use cases:
@@ -89,16 +189,19 @@ const documents = [
const response = await edgee.send({
  model: 'gpt-4o',
  input: `Answer the question based on these documents:\n\n${documents.join('\n\n')}\n\nQuestion: What is the main topic?`,
  enable_compression: true, // Enable compression for this request
  compression_rate: 0.8, // Target compression ratio (0-1, e.g., 0.8 = 80%)
});

console.log(response.text);

// Compression metrics
console.log(`Original tokens: ${response.usage.prompt_tokens_original}`);
console.log(`Compressed tokens: ${response.usage.prompt_tokens}`);
console.log(`Tokens saved: ${response.usage.saved_tokens}`);
console.log(`Compression ratio: ${response.usage.compression_ratio}%`);
console.log(`Request cost: $${response.cost.toFixed(4)}`);
if (response.compression) {
  console.log(`Original tokens: ${response.compression.input_tokens}`);
  console.log(`Compressed tokens: ${response.usage.prompt_tokens}`);
  console.log(`Tokens saved: ${response.compression.saved_tokens}`);
  console.log(`Compression rate: ${(response.compression.rate * 100).toFixed(1)}%`);
}
```

**Example output:**
@@ -107,7 +210,6 @@ Original tokens: 2,450
Compressed tokens: 1,225
Tokens saved: 1,225
Compression rate: 50.0%
Request cost: $0.0184
```

## Real-World Savings
@@ -145,7 +247,7 @@ Here's what token compression means for your monthly AI bill:
<Accordion title="Configure compression per use case">
- Enable compression by default for all requests
- Compression happens automatically without configuration
- Track `compression_ratio` to understand effectiveness
- Track `compression.rate` to understand effectiveness
- Use response metrics to optimize prompt design
</Accordion>

@@ -162,14 +264,15 @@ Here's what token compression means for your monthly AI bill:
Every Edgee response includes detailed compression metrics:

```typescript
// Usage information
response.usage.prompt_tokens // Compressed token count (billed)
response.usage.prompt_tokens_original // Original token count (before compression)
response.usage.saved_tokens // Tokens saved by compression
response.usage.compression_ratio // Percentage reduction
response.usage.completion_tokens // Output tokens (unchanged)
response.usage.total_tokens // Total for billing calculation

response.cost // Total request cost in USD
// Compression information (when applied)
response.compression.input_tokens // Original token count (before compression)
response.compression.saved_tokens // Tokens saved by compression
response.compression.rate // Compression rate (0-1, e.g., 0.61 = 61%)
```

Use these fields to:
Binary file added images/compression-enabled-by-tag-dark.png
Binary file added images/compression-enabled-by-tag-light.png
Binary file added images/compression-enabled-org-dark.png
Binary file added images/compression-enabled-org-light.png