diff --git a/docs.json b/docs.json
index b19bf8b..cd6ad57 100644
--- a/docs.json
+++ b/docs.json
@@ -81,8 +81,7 @@
"pages": [
"features/overview",
"features/token-compression",
- "features/observability",
- "features/automatic-model-selection"
+ "features/observability"
]
},
{
diff --git a/features/automatic-model-selection.mdx b/features/automatic-model-selection.mdx
index b58bd66..067f5ad 100644
--- a/features/automatic-model-selection.mdx
+++ b/features/automatic-model-selection.mdx
@@ -6,6 +6,10 @@ icon: circuit-board
Edgee's automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.
+
+This feature is under active development. Some routing strategies and configuration options may be added in future releases.
+
+
## Cost-Aware Routing
Let Edgee automatically select the cheapest model that meets your quality requirements:
@@ -19,8 +23,9 @@ const response = await edgee.send({
});
console.log(`Model used: ${response.model}`); // e.g., "gpt-5.2"
-console.log(`Cost: $${response.cost.toFixed(4)}`);
-console.log(`Tokens saved (compression): ${response.usage.saved_tokens}`);
+if (response.compression) {
+ console.log(`Tokens saved: ${response.compression.saved_tokens}`);
+}
```
**How it works:**
@@ -201,6 +206,4 @@ await edgee.routing.addRule({
-
-This feature is under active development. Some routing strategies and configuration options may be added in future releases.
-
+
diff --git a/features/observability.mdx b/features/observability.mdx
index 7cdad18..886c89d 100644
--- a/features/observability.mdx
+++ b/features/observability.mdx
@@ -6,9 +6,9 @@ icon: eye
Edgee provides complete visibility into your AI infrastructure with real-time metrics on costs, token usage, compression savings, performance, and errors. Every request is tracked and exportable for analysis, budgeting, and optimization.
-## Cost Tracking
+## Token Usage Tracking
-Every Edgee response includes detailed cost information so you can track spending in real-time:
+Every Edgee response includes detailed token usage information for tracking and cost analysis:
```typescript
const response = await edgee.send({
@@ -16,13 +16,19 @@ const response = await edgee.send({
input: 'Your prompt here',
});
-console.log(response.cost); // Total cost in USD (e.g., 0.0234)
console.log(response.usage.prompt_tokens); // Compressed input tokens
console.log(response.usage.completion_tokens); // Output tokens
console.log(response.usage.total_tokens); // Total for billing
+
+// Compression savings (when applied)
+if (response.compression) {
+ console.log(response.compression.input_tokens); // Original tokens
+ console.log(response.compression.saved_tokens); // Tokens saved
+ console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate
+}
```
-**Track spending by:**
+**Track usage by:**
- Model (GPT-4o vs Claude vs Gemini)
- Project or application
- Environment (production vs staging)
@@ -30,7 +36,7 @@ console.log(response.usage.total_tokens); // Total for billing
- Time period (daily, weekly, monthly)
- Costs are calculated using real-time provider pricing. Edgee automatically handles rate changes and updates your historical data accordingly.
+ Use token usage data together with provider pricing to calculate costs in your own tooling; the Edgee dashboard does this for you automatically, using real-time provider pricing.
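+
+If you want a client-side estimate, here is a minimal sketch; the per-token prices are illustrative placeholders rather than Edgee values, so substitute your provider's current rates:
+
+```typescript
+// Illustrative prices per 1M tokens; replace with your provider's current rates.
+const pricing: Record<string, { input: number; output: number }> = {
+  'gpt-4o': { input: 2.5, output: 10.0 },
+};
+
+const price = pricing[response.model] ?? { input: 0, output: 0 };
+const estimatedCost =
+  (response.usage.prompt_tokens / 1_000_000) * price.input +
+  (response.usage.completion_tokens / 1_000_000) * price.output;
+
+console.log(`Estimated cost: $${estimatedCost.toFixed(4)}`);
+```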
## Request Tags for Analytics
@@ -143,13 +149,13 @@ If you're using the OpenAI or Anthropic SDKs with Edgee, add tags via the `x-edg
**Common tagging strategies:**
-
+
**Environment tagging**
Tag by environment: `production`, `staging`, `development`
-
+
**Feature tagging**
Tag by feature: `chat`, `summarization`, `code-generation`, `rag-qa`
@@ -180,13 +186,16 @@ See exactly how much token compression is saving you on every request:
const response = await edgee.send({
model: 'gpt-4o',
input: 'Long prompt with lots of context...',
+ enable_compression: true,
});
// Compression details
-console.log(response.usage.prompt_tokens_original); // Original token count
-console.log(response.usage.prompt_tokens); // After compression
-console.log(response.usage.saved_tokens); // Tokens saved
-console.log(response.usage.compression_ratio); // Percentage reduction (e.g., 45%)
+if (response.compression) {
+ console.log(response.compression.input_tokens); // Original token count
+ console.log(response.usage.prompt_tokens); // After compression
+ console.log(response.compression.saved_tokens); // Tokens saved
+ console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate (e.g., 61.0%)
+}
```
**Analyze compression effectiveness:**
@@ -208,13 +217,13 @@ console.log(response.usage.compression_ratio); // Percentage reduction (e.g., 45
Track compression ratios over time to identify optimization opportunities
-
+
**By use case**
Compare compression effectiveness across different prompt types
-
+
**Top savers**
Identify which requests generate the highest savings
@@ -265,7 +274,7 @@ Understand how your AI infrastructure is being used:
- Cost per model over time
- Model switching patterns
-## Alerts & Budgets
+## Alerts & Budgets (Coming Soon)
Stay in control with proactive alerts:
@@ -324,32 +333,6 @@ const data = await edgee.analytics.export({
});
```
-## Dashboard Views
-
-The Edgee dashboard provides pre-built views for common use cases:
-
-
-
- Track spending trends, compare models, and identify cost optimization opportunities.
-
-
-
- Monitor token savings, compression ratios, and cumulative cost reductions.
-
-
-
- Analyze latency, throughput, error rates, and provider health across regions.
-
-
-
- Understand request volume, model distribution, and usage trends over time.
-
-
-
-
- Dashboard access is included with all Edgee plans. Enterprise customers can customize dashboards and create team-specific views.
-
-
## What's Next
diff --git a/features/token-compression.mdx b/features/token-compression.mdx
index 2080833..9eeb1f9 100644
--- a/features/token-compression.mdx
+++ b/features/token-compression.mdx
@@ -40,6 +40,106 @@ Token compression happens automatically on every request through a four-step pro
Compression is most effective for prompts with repeated context (RAG), long system instructions, or verbose multi-turn histories. Simple queries may see minimal compression.
+## Enabling Token Compression
+
+Token compression can be enabled in three ways, giving you flexibility to control compression at the request, API key, or organization level:
+
+### 1. Per Request (SDK)
+
+Enable compression for specific requests using the SDK:
+
+
+
+ ```typescript
+ const response = await edgee.send({
+ model: 'gpt-4o',
+ input: {
+ "messages": [
+ {"role": "user", "content": "Your prompt here"}
+ ],
+ "enable_compression": true,
+ "compression_rate": 0.8 // Target 80% compression (optional)
+ }
+ });
+ ```
+
+
+
+ ```python
+ response = edgee.send(
+ model="gpt-4o",
+ input={
+ "messages": [
+ {"role": "user", "content": "Your prompt here"}
+ ],
+ "enable_compression": True,
+ "compression_rate": 0.8 # Target 80% compression (optional)
+ }
+ )
+ ```
+
+
+
+ ```go
+ response, err := client.Send("gpt-4o", edgee.InputObject{
+ Messages: []edgee.Message{
+ {Role: "user", Content: "Your prompt here"},
+ },
+ EnableCompression: true,
+ CompressionRate: 0.8, // Target 80% compression (optional)
+ })
+ ```
+
+
+
+ ```rust
+ let input = InputObject::new(vec![Message::user("Your prompt here")])
+ .with_enable_compression(true)
+ .with_compression_rate(0.8); // Target 80% compression (optional)
+
+ let response = client.send("gpt-4o", input).await?;
+ ```
+
+
+
+### 2. Per API Key (Console)
+
+Enable compression for specific API keys in your organization settings. This is useful when you want different compression settings for different applications or environments.
+
+
+
+
+
+
+In the **Tools** section of your console:
+1. Toggle **Enable token compression** on
+2. Set your target **Compression rate** (0.7-0.9, default 0.75)
+3. Under **Scope**, select **Apply to specific API keys**
+4. Choose which API keys should use compression
+
+### 3. Organization-Wide (Console)
+
+Enable compression for all requests across your entire organization. This is the recommended setting for most users to maximize savings automatically.
+
+
+
+
+
+
+In the **Tools** section of your console:
+1. Toggle **Enable token compression** on
+2. Set your target **Compression rate** (0.7-0.9, default 0.75)
+3. Under **Scope**, select **Apply to all org requests**
+4. All API keys will now use compression by default
+
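+With org-wide compression enabled, no request changes are required. A minimal sketch of checking the savings on an ordinary call:
+
+```typescript
+// No compression parameters needed when compression is enabled org-wide.
+const response = await edgee.send({
+  model: 'gpt-4o',
+  input: 'Long prompt with lots of context...',
+});
+
+if (response.compression) {
+  console.log(`Tokens saved: ${response.compression.saved_tokens}`);
+}
+```
+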
+
+ **Compression rate** controls how aggressively Edgee compresses prompts. A higher rate (e.g., 0.9) attempts more aggressive compression but may not always reach its target, while a lower rate (e.g., 0.7) is more conservative. The default of 0.75 provides a good balance for most use cases.
+
+
+
+ SDK-level configuration takes precedence over console settings. If you enable compression in your code with `enable_compression: true`, it will override the console configuration for that specific request.
+
+
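+For example, a minimal sketch of opting a single request out of org-wide compression (the field placement follows the TypeScript SDK reference):
+
+```typescript
+const response = await edgee.send({
+  model: 'gpt-4o',
+  input: {
+    messages: [
+      { role: 'user', content: 'Short prompt where compression is not worth it' }
+    ],
+    enable_compression: false, // overrides the console setting for this request only
+  },
+});
+```
+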
## When It Works Best
Token compression delivers the highest savings for these common use cases:
@@ -89,16 +189,19 @@ const documents = [
const response = await edgee.send({
model: 'gpt-4o',
input: `Answer the question based on these documents:\n\n${documents.join('\n\n')}\n\nQuestion: What is the main topic?`,
+ enable_compression: true, // Enable compression for this request
+  compression_rate: 0.8,   // Target compression rate (0-1, e.g., 0.8 = 80%)
});
console.log(response.text);
// Compression metrics
-console.log(`Original tokens: ${response.usage.prompt_tokens_original}`);
-console.log(`Compressed tokens: ${response.usage.prompt_tokens}`);
-console.log(`Tokens saved: ${response.usage.saved_tokens}`);
-console.log(`Compression ratio: ${response.usage.compression_ratio}%`);
-console.log(`Request cost: $${response.cost.toFixed(4)}`);
+if (response.compression) {
+ console.log(`Original tokens: ${response.compression.input_tokens}`);
+ console.log(`Compressed tokens: ${response.usage.prompt_tokens}`);
+ console.log(`Tokens saved: ${response.compression.saved_tokens}`);
+ console.log(`Compression rate: ${(response.compression.rate * 100).toFixed(1)}%`);
+}
```
**Example output:**
@@ -107,7 +210,6 @@ Original tokens: 2,450
Compressed tokens: 1,225
Tokens saved: 1,225
-Compression ratio: 50%
+Compression rate: 50.0%
-Request cost: $0.0184
```
## Real-World Savings
@@ -145,7 +247,7 @@ Here's what token compression means for your monthly AI bill:
- Enable compression by default for all requests
- - Compression happens automatically without configuration
+ - Once enabled, compression is applied automatically on every request
- - Track `compression_ratio` to understand effectiveness
+ - Track `compression.rate` to understand effectiveness
- Use response metrics to optimize prompt design
@@ -162,14 +264,15 @@ Here's what token compression means for your monthly AI bill:
Every Edgee response includes detailed compression metrics:
```typescript
+// Usage information
response.usage.prompt_tokens // Compressed token count (billed)
-response.usage.prompt_tokens_original // Original token count (before compression)
-response.usage.saved_tokens // Tokens saved by compression
-response.usage.compression_ratio // Percentage reduction
response.usage.completion_tokens // Output tokens (unchanged)
response.usage.total_tokens // Total for billing calculation
-response.cost // Total request cost in USD
+// Compression information (when applied)
+response.compression.input_tokens // Original token count (before compression)
+response.compression.saved_tokens // Tokens saved by compression
+response.compression.rate // Compression rate (0-1, e.g., 0.61 = 61%)
```
Use these fields to:
diff --git a/images/compression-enabled-by-tag-dark.png b/images/compression-enabled-by-tag-dark.png
new file mode 100644
index 0000000..5687134
Binary files /dev/null and b/images/compression-enabled-by-tag-dark.png differ
diff --git a/images/compression-enabled-by-tag-light.png b/images/compression-enabled-by-tag-light.png
new file mode 100644
index 0000000..3ed6f53
Binary files /dev/null and b/images/compression-enabled-by-tag-light.png differ
diff --git a/images/compression-enabled-org-dark.png b/images/compression-enabled-org-dark.png
new file mode 100644
index 0000000..957011f
Binary files /dev/null and b/images/compression-enabled-org-dark.png differ
diff --git a/images/compression-enabled-org-light.png b/images/compression-enabled-org-light.png
new file mode 100644
index 0000000..7a1e36a
Binary files /dev/null and b/images/compression-enabled-org-light.png differ
diff --git a/integrations/anthropic-sdk.mdx b/integrations/anthropic-sdk.mdx
index e19169c..8c719c3 100644
--- a/integrations/anthropic-sdk.mdx
+++ b/integrations/anthropic-sdk.mdx
@@ -138,7 +138,7 @@ Stream responses for real-time token delivery:
## Cost Tracking & Compression
-Every Edgee response includes token compression metrics through the Anthropic API's `usage` field:
+When compression is applied, Edgee responses include token compression metrics in a dedicated `compression` field:
@@ -158,15 +158,13 @@ Every Edgee response includes token compression metrics through the Anthropic AP
print(message.content[0].text)
- # Compression metrics
- usage = message.usage
- tokens_saved = usage.input_tokens_original - usage.input_tokens
- compression_ratio = (tokens_saved / usage.input_tokens_original) * 100
-
- print(f"Original input tokens: {usage.input_tokens_original}")
- print(f"Compressed input tokens: {usage.input_tokens}")
- print(f"Tokens saved: {tokens_saved}")
- print(f"Compression ratio: {compression_ratio:.1f}%")
+ # Compression metrics (if compression was applied)
+ if hasattr(message, 'compression') and message.compression:
+ compression = message.compression
+ print(f"Original input tokens: {compression.input_tokens}")
+ print(f"Compressed input tokens: {message.usage.input_tokens}")
+ print(f"Tokens saved: {compression.saved_tokens}")
+ print(f"Compression rate: {compression.rate * 100:.1f}%")
```
@@ -187,21 +185,20 @@ Every Edgee response includes token compression metrics through the Anthropic AP
console.log(message.content[0].text);
- // Compression metrics
- const usage = message.usage;
- const tokensSaved = usage.input_tokens_original - usage.input_tokens;
- const compressionRatio = (tokensSaved / usage.input_tokens_original) * 100;
-
- console.log(`Original input tokens: ${usage.input_tokens_original}`);
- console.log(`Compressed input tokens: ${usage.input_tokens}`);
- console.log(`Tokens saved: ${tokensSaved}`);
- console.log(`Compression ratio: ${compressionRatio.toFixed(1)}%`);
+ // Compression metrics (if compression was applied)
+ if (message.compression) {
+ const compression = message.compression;
+ console.log(`Original input tokens: ${compression.input_tokens}`);
+ console.log(`Compressed input tokens: ${message.usage.input_tokens}`);
+ console.log(`Tokens saved: ${compression.saved_tokens}`);
+ console.log(`Compression rate: ${(compression.rate * 100).toFixed(1)}%`);
+ }
```
- Edgee extends the Anthropic API response with `input_tokens_original` to show the token count before compression. All other fields remain standard Anthropic format.
+ Edgee extends the Anthropic API response with a `compression` field containing compression metrics (`input_tokens`, `saved_tokens`, `rate`). All standard Anthropic fields remain unchanged.
## Multi-Provider Access
diff --git a/integrations/openai-sdk.mdx b/integrations/openai-sdk.mdx
index 68ef2b7..88e040d 100644
--- a/integrations/openai-sdk.mdx
+++ b/integrations/openai-sdk.mdx
@@ -110,11 +110,12 @@ const completion = await openai.chat.completions.create({
console.log(completion.choices[0].message.content);
-// Access compression metrics
-const usage = completion.usage;
-console.log(`Tokens saved: ${usage.prompt_tokens_original - usage.prompt_tokens}`);
-console.log(`Compression ratio: ${((usage.prompt_tokens_original - usage.prompt_tokens) / usage.prompt_tokens_original * 100).toFixed(1)}%`);
-console.log(`Total tokens: ${usage.total_tokens}`);
+// Access compression metrics (if compression was applied)
+if (completion.compression) {
+ console.log(`Tokens saved: ${completion.compression.saved_tokens}`);
+ console.log(`Compression rate: ${(completion.compression.rate * 100).toFixed(1)}%`);
+}
+console.log(`Total tokens: ${completion.usage.total_tokens}`);
```
```python title="Python"
@@ -135,20 +136,17 @@ completion = client.chat.completions.create(
print(completion.choices[0].message.content)
-# Access compression metrics
-usage = completion.usage
-tokens_saved = usage.prompt_tokens_original - usage.prompt_tokens
-compression_ratio = (tokens_saved / usage.prompt_tokens_original) * 100
-
-print(f"Tokens saved: {tokens_saved}")
-print(f"Compression ratio: {compression_ratio:.1f}%")
-print(f"Total tokens: {usage.total_tokens}")
+# Access compression metrics (if compression was applied)
+if hasattr(completion, 'compression') and completion.compression:
+ print(f"Tokens saved: {completion.compression.saved_tokens}")
+ print(f"Compression rate: {completion.compression.rate * 100:.1f}%")
+print(f"Total tokens: {completion.usage.total_tokens}")
```
- Edgee extends the OpenAI API response with `prompt_tokens_original` to show the token count before compression. All other fields remain standard OpenAI format.
+ Edgee extends the OpenAI API response with a `compression` field containing compression metrics (`input_tokens`, `saved_tokens`, `rate`). All standard OpenAI fields remain unchanged.
## Advanced Usage
diff --git a/introduction.mdx b/introduction.mdx
index 6fdc24d..07b6d52 100644
--- a/introduction.mdx
+++ b/introduction.mdx
@@ -22,8 +22,9 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige
});
console.log(response.text);
- console.log(`Tokens saved: ${response.usage.saved_tokens}`);
- console.log(`Cost: $${response.cost.toFixed(4)}`);
+ if (response.compression) {
+ console.log(`Tokens saved: ${response.compression.saved_tokens}`);
+ }
```
@@ -39,8 +40,8 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige
)
print(response.text)
- print(f"Tokens saved: {response.usage.saved_tokens}")
- print(f"Cost: ${response.cost:.4f}")
+ if response.compression:
+ print(f"Tokens saved: {response.compression.saved_tokens}")
```
@@ -63,8 +64,9 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige
}
fmt.Println(response.Text())
- fmt.Printf("Tokens saved: %d\n", response.Usage.SavedTokens)
- fmt.Printf("Cost: $%.4f\n", response.Cost)
+ if response.Compression != nil {
+ fmt.Printf("Tokens saved: %d\n", response.Compression.SavedTokens)
+ }
}
```
@@ -77,8 +79,9 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige
let response = client.send("gpt-4o", "What is the capital of France?").await.unwrap();
println!("{}", response.text().unwrap_or(""));
- println!("Tokens saved: {}", response.usage.saved_tokens);
- println!("Cost: ${:.4}", response.cost);
+ if let Some(compression) = &response.compression {
+ println!("Tokens saved: {}", compression.saved_tokens);
+ }
```
diff --git a/introduction/faq.mdx b/introduction/faq.mdx
index 278fb09..c9e49c0 100644
--- a/introduction/faq.mdx
+++ b/introduction/faq.mdx
@@ -35,7 +35,7 @@ icon: message-circle-question-mark
- Multi-turn conversations with growing history
- Document analysis with redundant information
- Every response includes compression metrics (`saved_tokens`, `compression_ratio`) so you can track your savings in real-time.
+ Every response where compression is applied includes a `compression` field with metrics (`input_tokens`, `saved_tokens`, `rate`), so you can track your savings in real time.
diff --git a/package-lock.json b/package-lock.json
index 87fdbb6..1a08bb4 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -5,7 +5,7 @@
"packages": {
"": {
"dependencies": {
- "mintlify": "^4.2.310"
+ "mintlify": "^4.2.314"
}
},
"node_modules/@alcalzone/ansi-tokenize": {
@@ -85,9 +85,9 @@
}
},
"node_modules/@babel/code-frame": {
- "version": "7.28.6",
- "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.28.6.tgz",
- "integrity": "sha512-JYgintcMjRiCvS8mMECzaEn+m3PfoQiyqukOMCCVQtoJGYJw8j/8LBJEiqkHLkfwCcs74E3pbAUFNg7d9VNJ+Q==",
+ "version": "7.29.0",
+ "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.29.0.tgz",
+ "integrity": "sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==",
"license": "MIT",
"dependencies": {
"@babel/helper-validator-identifier": "^7.28.5",
@@ -982,18 +982,18 @@
}
},
"node_modules/@mintlify/cli": {
- "version": "4.0.914",
- "resolved": "https://registry.npmjs.org/@mintlify/cli/-/cli-4.0.914.tgz",
- "integrity": "sha512-L6Ls4qOedK0SkyZIBrfy8utQIt/fCyAX0sJO1yFoZjIPmGMYCSWHftIz9xXU2nWjV4924EoSg/b81+CBccZg6w==",
+ "version": "4.0.918",
+ "resolved": "https://registry.npmjs.org/@mintlify/cli/-/cli-4.0.918.tgz",
+ "integrity": "sha512-JwHE7Uhhog4xqbQmduuETZyWth/avASbGdDB1h9RhsprgIb7W8FR80YRRzUCCLUwz10M/dgqEaBEyr3/+RvKvg==",
"license": "Elastic-2.0",
"dependencies": {
"@inquirer/prompts": "7.9.0",
- "@mintlify/common": "1.0.694",
- "@mintlify/link-rot": "3.0.853",
- "@mintlify/models": "0.0.260",
- "@mintlify/prebuild": "1.0.830",
- "@mintlify/previewing": "4.0.886",
- "@mintlify/validation": "0.1.574",
+ "@mintlify/common": "1.0.697",
+ "@mintlify/link-rot": "3.0.856",
+ "@mintlify/models": "0.0.262",
+ "@mintlify/prebuild": "1.0.833",
+ "@mintlify/previewing": "4.0.889",
+ "@mintlify/validation": "0.1.576",
"adm-zip": "0.5.16",
"chalk": "5.2.0",
"color": "4.2.3",
@@ -1018,16 +1018,16 @@
}
},
"node_modules/@mintlify/common": {
- "version": "1.0.694",
- "resolved": "https://registry.npmjs.org/@mintlify/common/-/common-1.0.694.tgz",
- "integrity": "sha512-HiIL3+tZlFtrwcNHuIpjXQuBXWKIoVtIzPeFl08Pk88pLAHAphfCH2T4jBdEiaUPnx5BhDRko1HQ2T0uexRFeQ==",
+ "version": "1.0.697",
+ "resolved": "https://registry.npmjs.org/@mintlify/common/-/common-1.0.697.tgz",
+ "integrity": "sha512-xZ/arB2O60ncw+VPQg4jHqaY8huY2fhSTWvbSKSoJZyK+P7asMWXzNBCt0H6vcRf1rine4D2srlCA4ymCHnDHg==",
"license": "ISC",
"dependencies": {
"@asyncapi/parser": "3.4.0",
"@mintlify/mdx": "^3.0.4",
- "@mintlify/models": "0.0.260",
+ "@mintlify/models": "0.0.262",
"@mintlify/openapi-parser": "^0.0.8",
- "@mintlify/validation": "0.1.574",
+ "@mintlify/validation": "0.1.576",
"@sindresorhus/slugify": "2.2.0",
"@types/mdast": "4.0.4",
"acorn": "8.11.2",
@@ -1456,16 +1456,16 @@
}
},
"node_modules/@mintlify/link-rot": {
- "version": "3.0.853",
- "resolved": "https://registry.npmjs.org/@mintlify/link-rot/-/link-rot-3.0.853.tgz",
- "integrity": "sha512-zu6gr6RK7tY6WpD5/KJ3Q6zOgXlHfsJNHQp4F82cIaHDH8l4gnNGurnOW2RZcaLbDk+cj7UF3TrDbWT13RZgjg==",
+ "version": "3.0.856",
+ "resolved": "https://registry.npmjs.org/@mintlify/link-rot/-/link-rot-3.0.856.tgz",
+ "integrity": "sha512-4BFxaEJSqJtPM31zV+BD+bbXMnGWxYrtSa8ve3qFx7uSj1WYMVJ4Ag//xybmuzxPGlAN8vla8WXoLkz/6/gp3A==",
"license": "Elastic-2.0",
"dependencies": {
- "@mintlify/common": "1.0.694",
- "@mintlify/prebuild": "1.0.830",
- "@mintlify/previewing": "4.0.886",
+ "@mintlify/common": "1.0.697",
+ "@mintlify/prebuild": "1.0.833",
+ "@mintlify/previewing": "4.0.889",
"@mintlify/scraping": "4.0.522",
- "@mintlify/validation": "0.1.574",
+ "@mintlify/validation": "0.1.576",
"fs-extra": "11.1.0",
"unist-util-visit": "4.1.2"
},
@@ -1536,9 +1536,9 @@
}
},
"node_modules/@mintlify/models": {
- "version": "0.0.260",
- "resolved": "https://registry.npmjs.org/@mintlify/models/-/models-0.0.260.tgz",
- "integrity": "sha512-M7WpKC4ysrrc5M16fUPFBLbhmdxfOm3LsMeurhQJ7Jc4V8o8DCdqLKkGTs0PZEFPSKx34X1wCBp4YrDx3kBDNQ==",
+ "version": "0.0.262",
+ "resolved": "https://registry.npmjs.org/@mintlify/models/-/models-0.0.262.tgz",
+ "integrity": "sha512-9JNwnx1AtasQi3eP3yh/ffNgAB5ZS17jSE0IPa38QzBn4eMXoLvNQscPlhBp9krAYHpnOWm8VN0G5rV9lsgXvA==",
"license": "Elastic-2.0",
"dependencies": {
"axios": "1.13.2",
@@ -1583,15 +1583,15 @@
}
},
"node_modules/@mintlify/prebuild": {
- "version": "1.0.830",
- "resolved": "https://registry.npmjs.org/@mintlify/prebuild/-/prebuild-1.0.830.tgz",
- "integrity": "sha512-q696zAc5TvhKFddaIuygM8W20y/3J9D2R/EVsmzKmio9+IfWkHfY/h6WwfqVavHpYdgnFxh/sAZtA4pSb609QQ==",
+ "version": "1.0.833",
+ "resolved": "https://registry.npmjs.org/@mintlify/prebuild/-/prebuild-1.0.833.tgz",
+ "integrity": "sha512-RAnnVDplb1pdY1VzZDoJPd+unRKs0QJo/rrzMPw3ytf69iOS+Z3Ao00CSLPJm3ZQ65HOj0XFZ5qIfJHaoT6jpA==",
"license": "Elastic-2.0",
"dependencies": {
- "@mintlify/common": "1.0.694",
+ "@mintlify/common": "1.0.697",
"@mintlify/openapi-parser": "^0.0.8",
- "@mintlify/scraping": "4.0.555",
- "@mintlify/validation": "0.1.574",
+ "@mintlify/scraping": "4.0.558",
+ "@mintlify/validation": "0.1.576",
"chalk": "5.3.0",
"favicons": "7.2.0",
"front-matter": "4.0.2",
@@ -1605,12 +1605,12 @@
}
},
"node_modules/@mintlify/prebuild/node_modules/@mintlify/scraping": {
- "version": "4.0.555",
- "resolved": "https://registry.npmjs.org/@mintlify/scraping/-/scraping-4.0.555.tgz",
- "integrity": "sha512-YhxnlyirsKy4huUdUVBcPuPrIymbnu+hR9a9x0sullh7VKGEwPgxo0a0bqGSPdkoGJMBlXlJDOGLb2Ud0/gsdQ==",
+ "version": "4.0.558",
+ "resolved": "https://registry.npmjs.org/@mintlify/scraping/-/scraping-4.0.558.tgz",
+ "integrity": "sha512-CR8CBwrdcr4pQ3EHCLjuK0oet0Ag4Samwqpha3fGZ3FYad2Kaq1ZON7x89GosO4DiZTGQksBv2A9gpOVs0vpzg==",
"license": "Elastic-2.0",
"dependencies": {
- "@mintlify/common": "1.0.694",
+ "@mintlify/common": "1.0.697",
"@mintlify/openapi-parser": "^0.0.8",
"fs-extra": "11.1.1",
"hast-util-to-mdast": "10.1.0",
@@ -1786,14 +1786,14 @@
}
},
"node_modules/@mintlify/previewing": {
- "version": "4.0.886",
- "resolved": "https://registry.npmjs.org/@mintlify/previewing/-/previewing-4.0.886.tgz",
- "integrity": "sha512-WScIiiw/6QYm7CMSdfKqpW9skKDl/rw5eAk1FOzBlUdOWpVZhQ77OuWyltWaD2MepWw4Cj7UBbcYyDvsvPtyEA==",
+ "version": "4.0.889",
+ "resolved": "https://registry.npmjs.org/@mintlify/previewing/-/previewing-4.0.889.tgz",
+ "integrity": "sha512-SvVM+lzkDK2LM6G/d5mKMpIA1Kf3nlLzRhwJLDLdI01+Hq2uzKmyQvmWGWc4vqTTIZwTSNiYumbEzaMbzJx9uA==",
"license": "Elastic-2.0",
"dependencies": {
- "@mintlify/common": "1.0.694",
- "@mintlify/prebuild": "1.0.830",
- "@mintlify/validation": "0.1.574",
+ "@mintlify/common": "1.0.697",
+ "@mintlify/prebuild": "1.0.833",
+ "@mintlify/validation": "0.1.576",
"better-opn": "3.0.2",
"chalk": "5.2.0",
"chokidar": "3.5.3",
@@ -2433,13 +2433,13 @@
}
},
"node_modules/@mintlify/validation": {
- "version": "0.1.574",
- "resolved": "https://registry.npmjs.org/@mintlify/validation/-/validation-0.1.574.tgz",
- "integrity": "sha512-yqBiZJmP+7iHPiJ2h40MWO94f02Ajo/7PDve+V+mrWcw6OwetJErUdlYFsveiiPYk66HIM6BvFQBsQZt1NfSqQ==",
+ "version": "0.1.576",
+ "resolved": "https://registry.npmjs.org/@mintlify/validation/-/validation-0.1.576.tgz",
+ "integrity": "sha512-w3QWe2X2gj6oqA0jCTQialxIQz/ki+6ud0V0Y+gKAlLH5UsoIqBFCrXjFYZZLkBt/Md6wnNmADH71gjLh4481w==",
"license": "Elastic-2.0",
"dependencies": {
"@mintlify/mdx": "^3.0.4",
- "@mintlify/models": "0.0.260",
+ "@mintlify/models": "0.0.262",
"arktype": "2.1.27",
"js-yaml": "4.1.0",
"lcm": "0.0.3",
@@ -3144,12 +3144,12 @@
"license": "MIT"
},
"node_modules/@shikijs/core": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/core/-/core-3.21.0.tgz",
- "integrity": "sha512-AXSQu/2n1UIQekY8euBJlvFYZIw0PHY63jUzGbrOma4wPxzznJXTXkri+QcHeBNaFxiiOljKxxJkVSoB3PjbyA==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/core/-/core-3.22.0.tgz",
+ "integrity": "sha512-iAlTtSDDbJiRpvgL5ugKEATDtHdUVkqgHDm/gbD2ZS9c88mx7G1zSYjjOxp5Qa0eaW0MAQosFRmJSk354PRoQA==",
"license": "MIT",
"dependencies": {
- "@shikijs/types": "3.21.0",
+ "@shikijs/types": "3.22.0",
"@shikijs/vscode-textmate": "^10.0.2",
"@types/hast": "^3.0.4",
"hast-util-to-html": "^9.0.5"
@@ -3179,62 +3179,62 @@
}
},
"node_modules/@shikijs/engine-javascript": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/engine-javascript/-/engine-javascript-3.21.0.tgz",
- "integrity": "sha512-ATwv86xlbmfD9n9gKRiwuPpWgPENAWCLwYCGz9ugTJlsO2kOzhOkvoyV/UD+tJ0uT7YRyD530x6ugNSffmvIiQ==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/engine-javascript/-/engine-javascript-3.22.0.tgz",
+ "integrity": "sha512-jdKhfgW9CRtj3Tor0L7+yPwdG3CgP7W+ZEqSsojrMzCjD1e0IxIbwUMDDpYlVBlC08TACg4puwFGkZfLS+56Tw==",
"license": "MIT",
"dependencies": {
- "@shikijs/types": "3.21.0",
+ "@shikijs/types": "3.22.0",
"@shikijs/vscode-textmate": "^10.0.2",
"oniguruma-to-es": "^4.3.4"
}
},
"node_modules/@shikijs/engine-oniguruma": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/engine-oniguruma/-/engine-oniguruma-3.21.0.tgz",
- "integrity": "sha512-OYknTCct6qiwpQDqDdf3iedRdzj6hFlOPv5hMvI+hkWfCKs5mlJ4TXziBG9nyabLwGulrUjHiCq3xCspSzErYQ==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/engine-oniguruma/-/engine-oniguruma-3.22.0.tgz",
+ "integrity": "sha512-DyXsOG0vGtNtl7ygvabHd7Mt5EY8gCNqR9Y7Lpbbd/PbJvgWrqaKzH1JW6H6qFkuUa8aCxoiYVv8/YfFljiQxA==",
"license": "MIT",
"dependencies": {
- "@shikijs/types": "3.21.0",
+ "@shikijs/types": "3.22.0",
"@shikijs/vscode-textmate": "^10.0.2"
}
},
"node_modules/@shikijs/langs": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/langs/-/langs-3.21.0.tgz",
- "integrity": "sha512-g6mn5m+Y6GBJ4wxmBYqalK9Sp0CFkUqfNzUy2pJglUginz6ZpWbaWjDB4fbQ/8SHzFjYbtU6Ddlp1pc+PPNDVA==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/langs/-/langs-3.22.0.tgz",
+ "integrity": "sha512-x/42TfhWmp6H00T6uwVrdTJGKgNdFbrEdhaDwSR5fd5zhQ1Q46bHq9EO61SCEWJR0HY7z2HNDMaBZp8JRmKiIA==",
"license": "MIT",
"dependencies": {
- "@shikijs/types": "3.21.0"
+ "@shikijs/types": "3.22.0"
}
},
"node_modules/@shikijs/themes": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/themes/-/themes-3.21.0.tgz",
- "integrity": "sha512-BAE4cr9EDiZyYzwIHEk7JTBJ9CzlPuM4PchfcA5ao1dWXb25nv6hYsoDiBq2aZK9E3dlt3WB78uI96UESD+8Mw==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/themes/-/themes-3.22.0.tgz",
+ "integrity": "sha512-o+tlOKqsr6FE4+mYJG08tfCFDS+3CG20HbldXeVoyP+cYSUxDhrFf3GPjE60U55iOkkjbpY2uC3It/eeja35/g==",
"license": "MIT",
"dependencies": {
- "@shikijs/types": "3.21.0"
+ "@shikijs/types": "3.22.0"
}
},
"node_modules/@shikijs/transformers": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/transformers/-/transformers-3.21.0.tgz",
- "integrity": "sha512-CZwvCWWIiRRiFk9/JKzdEooakAP8mQDtBOQ1TKiCaS2E1bYtyBCOkUzS8akO34/7ufICQ29oeSfkb3tT5KtrhA==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/transformers/-/transformers-3.22.0.tgz",
+ "integrity": "sha512-E7eRV7mwDBjueLF6852n2oYeJYxBq3NSsDk+uyruYAXONv4U8holGmIrT+mPRJQ1J1SNOH6L8G19KRzmBawrFw==",
"license": "MIT",
"dependencies": {
- "@shikijs/core": "3.21.0",
- "@shikijs/types": "3.21.0"
+ "@shikijs/core": "3.22.0",
+ "@shikijs/types": "3.22.0"
}
},
"node_modules/@shikijs/twoslash": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/twoslash/-/twoslash-3.21.0.tgz",
- "integrity": "sha512-iH360udAYON2JwfIldoCiMZr9MljuQA5QRBivKLpEuEpmVCSwrR+0WTQ0eS1ptgGBdH9weFiIsA5wJDzsEzTYg==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/twoslash/-/twoslash-3.22.0.tgz",
+ "integrity": "sha512-GO27UPN+kegOMQvC+4XcLt0Mttyg+n16XKjmoKjdaNZoW+sOJV7FLdv2QKauqUDws6nE3EQPD+TFHEdyyoUBDw==",
"license": "MIT",
"dependencies": {
- "@shikijs/core": "3.21.0",
- "@shikijs/types": "3.21.0",
+ "@shikijs/core": "3.22.0",
+ "@shikijs/types": "3.22.0",
"twoslash": "^0.3.6"
},
"peerDependencies": {
@@ -3242,9 +3242,9 @@
}
},
"node_modules/@shikijs/types": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/@shikijs/types/-/types-3.21.0.tgz",
- "integrity": "sha512-zGrWOxZ0/+0ovPY7PvBU2gIS9tmhSUUt30jAcNV0Bq0gb2S98gwfjIs1vxlmH5zM7/4YxLamT6ChlqqAJmPPjA==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/@shikijs/types/-/types-3.22.0.tgz",
+ "integrity": "sha512-491iAekgKDBFE67z70Ok5a8KBMsQ2IJwOWw3us/7ffQkIBCyOQfm/aNwVMBUriP02QshIfgHCBSIYAl3u2eWjg==",
"license": "MIT",
"dependencies": {
"@shikijs/vscode-textmate": "^10.0.2",
@@ -3798,9 +3798,9 @@
}
},
"node_modules/@types/node": {
- "version": "25.1.0",
- "resolved": "https://registry.npmjs.org/@types/node/-/node-25.1.0.tgz",
- "integrity": "sha512-t7frlewr6+cbx+9Ohpl0NOTKXZNV9xHRmNOvql47BFJKcEG1CxtxlPEEe+gR9uhVWM4DwhnvTF110mIL4yP9RA==",
+ "version": "25.2.0",
+ "resolved": "https://registry.npmjs.org/@types/node/-/node-25.2.0.tgz",
+ "integrity": "sha512-DZ8VwRFUNzuqJ5khrvwMXHmvPe+zGayJhr2CDNiKB1WBE1ST8Djl00D0IC4vvNmHMdj6DlbYRIaFE7WHjlDl5w==",
"license": "MIT",
"peer": true,
"dependencies": {
@@ -9557,12 +9557,12 @@
}
},
"node_modules/mintlify": {
- "version": "4.2.310",
- "resolved": "https://registry.npmjs.org/mintlify/-/mintlify-4.2.310.tgz",
- "integrity": "sha512-1FBt5x1BSk0z+Shk4owGfDpEZhXsN68me1qNSZ8F6LyEXFuflADNGpQUYX36GGeGrbZoDbY76WLtiQLv7XPmtg==",
+ "version": "4.2.314",
+ "resolved": "https://registry.npmjs.org/mintlify/-/mintlify-4.2.314.tgz",
+ "integrity": "sha512-G7KSCSkPyNq377ooSSQomggCDWsDoEeG+SyB1K2eKW5sH0SnSKyaCdPYxMLJF05+lGgYGSmPJ3mGqgqQC64/Rg==",
"license": "Elastic-2.0",
"dependencies": {
- "@mintlify/cli": "4.0.914"
+ "@mintlify/cli": "4.0.918"
},
"bin": {
"mint": "index.js",
@@ -11528,17 +11528,17 @@
}
},
"node_modules/shiki": {
- "version": "3.21.0",
- "resolved": "https://registry.npmjs.org/shiki/-/shiki-3.21.0.tgz",
- "integrity": "sha512-N65B/3bqL/TI2crrXr+4UivctrAGEjmsib5rPMMPpFp1xAx/w03v8WZ9RDDFYteXoEgY7qZ4HGgl5KBIu1153w==",
+ "version": "3.22.0",
+ "resolved": "https://registry.npmjs.org/shiki/-/shiki-3.22.0.tgz",
+ "integrity": "sha512-LBnhsoYEe0Eou4e1VgJACes+O6S6QC0w71fCSp5Oya79inkwkm15gQ1UF6VtQ8j/taMDh79hAB49WUk8ALQW3g==",
"license": "MIT",
"dependencies": {
- "@shikijs/core": "3.21.0",
- "@shikijs/engine-javascript": "3.21.0",
- "@shikijs/engine-oniguruma": "3.21.0",
- "@shikijs/langs": "3.21.0",
- "@shikijs/themes": "3.21.0",
- "@shikijs/types": "3.21.0",
+ "@shikijs/core": "3.22.0",
+ "@shikijs/engine-javascript": "3.22.0",
+ "@shikijs/engine-oniguruma": "3.22.0",
+ "@shikijs/langs": "3.22.0",
+ "@shikijs/themes": "3.22.0",
+ "@shikijs/types": "3.22.0",
"@shikijs/vscode-textmate": "^10.0.2",
"@types/hast": "^3.0.4"
}
diff --git a/package.json b/package.json
index b42ef1b..45ccb81 100644
--- a/package.json
+++ b/package.json
@@ -4,6 +4,6 @@
"links": "mintlify broken-links"
},
"dependencies": {
- "mintlify": "^4.2.310"
+ "mintlify": "^4.2.314"
}
}
diff --git a/sdk/go/send.mdx b/sdk/go/send.mdx
index ea3198a..91f8b9f 100644
--- a/sdk/go/send.mdx
+++ b/sdk/go/send.mdx
@@ -43,6 +43,8 @@ When `input` is an `InputObject`, you have full control over the conversation:
| `Tools` | `[]Tool` | Array of function tools available to the model |
| `ToolChoice` | `any` | Controls which tool (if any) the model should call. Can be `string` (`"auto"`, `"none"`) or `map[string]interface{}`. See [Tools documentation](/sdk/go/tools) for details |
| `Tags` | `[]string` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `EnableCompression` | `bool` | Enable token compression for this request (overrides console settings). If `true`, the prompt is compressed using `CompressionRate` when set, otherwise the rate configured for your API key or organization. If `false`, the request is not compressed. |
+| `CompressionRate` | `float64` | Target compression rate for this request, between 0.0 and 1.0 (default 0.75). Only used when `EnableCompression` is `true`; higher values attempt more aggressive compression. |
**Example with InputObject:**
@@ -151,6 +153,7 @@ The `Send()` method returns `(SendResponse, error)`. On success, the `SendRespon
| `Model` | `string` | Model identifier used for the completion |
| `Choices` | `[]Choice` | Array of completion choices (typically one) |
| `Usage` | `*Usage` | Token usage information (if provided by the API) |
+| `Compression` | `*Compression` | Token compression metrics (if compression was applied) |
### Choice Object
@@ -195,7 +198,7 @@ Token usage information (when available):
| Property | Type | Description |
|----------|------|-------------|
-| `PromptTokens` | `int` | Number of tokens in the prompt |
+| `PromptTokens` | `int` | Number of tokens in the prompt (after compression if applied) |
| `CompletionTokens` | `int` | Number of tokens in the completion |
| `TotalTokens` | `int` | Total tokens used (prompt + completion) |
@@ -214,6 +217,41 @@ if response.Usage != nil {
}
```
+### Compression Object
+
+Token compression metrics (when compression is applied):
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `InputTokens` | `int` | Original number of input tokens before compression |
+| `SavedTokens` | `int` | Number of tokens saved by compression |
+| `Rate` | `float64` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression |
+
+**Example - Accessing Compression Metrics:**
+
+```go
+response, err := client.Send("gpt-4o", edgee.InputObject{
+ Messages: []edgee.Message{
+ {Role: "user", Content: "Analyze this long document with lots of context..."},
+ },
+ EnableCompression: true,
+ CompressionRate: 0.8,
+})
+if err != nil {
+ log.Fatal(err)
+}
+
+if response.Compression != nil {
+ fmt.Printf("Original input tokens: %d\n", response.Compression.InputTokens)
+ fmt.Printf("Tokens saved: %d\n", response.Compression.SavedTokens)
+ fmt.Printf("Compression rate: %.1f%%\n", response.Compression.Rate * 100)
+}
+```
+
+
+ The `Compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression.
+
+
## Convenience Methods
The `SendResponse` struct provides convenience methods for easier access:
diff --git a/sdk/go/stream.mdx b/sdk/go/stream.mdx
index e6d0690..61f07e1 100644
--- a/sdk/go/stream.mdx
+++ b/sdk/go/stream.mdx
@@ -57,6 +57,8 @@ When `input` is an `InputObject` or `map[string]interface{}`, you have full cont
| `Tools` | `[]Tool` | Array of function tools available to the model |
| `ToolChoice` | `any` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/go/tools) for details |
| `Tags` | `[]string` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `EnableCompression` | `bool` | Enable token compression for this request (overrides console settings). If `true`, the prompt is compressed using `CompressionRate` when set, otherwise the rate configured for your API key or organization. If `false`, the request is not compressed. |
+| `CompressionRate` | `float64` | Target compression rate for this request, between 0.0 and 1.0 (default 0.75). Only used when `EnableCompression` is `true`; higher values attempt more aggressive compression. |
For details about `Message` type, see the [Send Method documentation](/sdk/go/send#message-object).
For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/go/tools).
@@ -110,6 +112,7 @@ Each chunk received from the channel has the following structure:
| `Created` | `int64` | Unix timestamp of when the chunk was created |
| `Model` | `string` | Model identifier used for the completion |
| `Choices` | `[]StreamChoice` | Array of streaming choices (typically one) |
+| `Compression` | `*Compression` | Token compression metrics (if compression was applied) |
### StreamChoice Object
diff --git a/sdk/index.mdx b/sdk/index.mdx
index 60e4262..f8d71d6 100644
--- a/sdk/index.mdx
+++ b/sdk/index.mdx
@@ -4,7 +4,7 @@ description: Choose your SDK and start using Edgee.
icon: boxes
---
-Edgee provides official SDKs for TypeScript, Python, Go, and Rust. All SDKs offer a consistent, type-safe interface to interact with the Edgee AI Gateway, supporting OpenAI-compatible chat completions and function calling.
+Edgee provides official SDKs for TypeScript, Python, Go, and Rust. All SDKs offer a consistent, type-safe interface to interact with the Edgee AI Gateway, supporting OpenAI-compatible chat completions and function calling. All SDKs include built-in support for **token compression** and **compression metrics**, helping reduce your LLM spend by up to 50%.
## Quick Start
@@ -27,7 +27,9 @@ Choose your language and get started in minutes:
});
console.log(response.text);
- // "The capital of France is Paris."
+ if (response.compression) {
+ console.log(`Tokens saved: ${response.compression.saved_tokens}`);
+ }
```
@@ -47,7 +49,8 @@ Choose your language and get started in minutes:
)
print(response.text)
- # "The capital of France is Paris."
+ if response.compression:
+ print(f"Tokens saved: {response.compression.saved_tokens}")
```
@@ -74,7 +77,9 @@ Choose your language and get started in minutes:
}
fmt.Println(response.Text())
- // "The capital of France is Paris."
+ if response.Compression != nil {
+ fmt.Printf("Tokens saved: %d\n", response.Compression.SavedTokens)
+ }
}
```
@@ -91,7 +96,9 @@ Choose your language and get started in minutes:
let response = client.send("gpt-4o", "What is the capital of France?").await.unwrap();
println!("{}", response.text().unwrap_or(""));
- // "The capital of France is Paris."
+ if let Some(compression) = &response.compression {
+ println!("Tokens saved: {}", compression.saved_tokens);
+ }
```
@@ -100,12 +107,14 @@ Choose your language and get started in minutes:
All SDKs provide consistent functionality:
+- **Token Compression**: Built-in prompt compression with savings reporting
+- **Compression Metrics**: Real-time token savings and compression rate on compressed requests
- **OpenAI-compatible API**: Use familiar patterns across all languages
- **Function Calling**: Full support for tool/function calling
- **Type Safety**: Strong typing and autocomplete support
- **Error Handling**: Comprehensive error handling and validation
- **Environment Variables**: Support for `EDGEE_API_KEY`
-- **Token Usage**: Access to prompt, completion, and total token counts
+- **Token Usage**: Access to prompt, completion, saved tokens, and compression rates
To learn more about the SDKs, see the individual SDK pages:
diff --git a/sdk/python/send.mdx b/sdk/python/send.mdx
index 20c71cd..9e28595 100644
--- a/sdk/python/send.mdx
+++ b/sdk/python/send.mdx
@@ -42,6 +42,8 @@ When `input` is an `InputObject` or dictionary, you have full control over the c
| `tools` | `list[dict] \| None` | Array of function tools available to the model |
| `tool_choice` | `str \| dict \| None` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/python/tools) for details |
| `tags` | `list[str] \| None` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `enable_compression` | `bool` | Enable token compression for this request (overrides console settings). If `true`, the prompt is compressed using `compression_rate` when set, otherwise the rate configured for your API key or organization. If `false`, the request is not compressed. |
+| `compression_rate` | `float` | Target compression rate for this request, between 0.0 and 1.0 (default 0.75). Only used when `enable_compression` is `true`; higher values attempt more aggressive compression. |
**Example with Dictionary Input:**
@@ -124,6 +126,7 @@ The `send()` method returns a `SendResponse` object when `stream=False` (default
|----------|------|-------------|
| `choices` | `list[Choice]` | Array of completion choices (typically one) |
| `usage` | `Usage \| None` | Token usage information (if provided by the API) |
+| `compression` | `Compression \| None` | Token compression metrics (if compression was applied) |
### Choice Object
@@ -165,7 +168,7 @@ Token usage information (when available):
| Property | Type | Description |
|----------|------|-------------|
-| `prompt_tokens` | `int` | Number of tokens in the prompt |
+| `prompt_tokens` | `int` | Number of tokens in the prompt (after compression if applied) |
| `completion_tokens` | `int` | Number of tokens in the completion |
| `total_tokens` | `int` | Total tokens used (prompt + completion) |
@@ -183,6 +186,40 @@ if response.usage:
print(f"Total tokens: {response.usage.total_tokens}")
```
+### Compression Object
+
+Token compression metrics (when compression is applied):
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `input_tokens` | `int` | Original number of input tokens before compression |
+| `saved_tokens` | `int` | Number of tokens saved by compression |
+| `rate` | `float` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression |
+
+**Example - Accessing Compression Metrics:**
+
+```python
+response = edgee.send(
+ model="gpt-4o",
+ input={
+ "messages": [
+ {"role": "user", "content": "Analyze this long document with lots of context..."}
+ ],
+ "enable_compression": True,
+ "compression_rate": 0.8
+ }
+)
+
+if response.compression:
+ print(f"Original input tokens: {response.compression.input_tokens}")
+ print(f"Tokens saved: {response.compression.saved_tokens}")
+ print(f"Compression rate: {response.compression.rate * 100:.1f}%")
+```
+
+
+ The `compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression.
+
+
## Convenience Properties
The `SendResponse` class provides convenience properties for easier access:
diff --git a/sdk/python/stream.mdx b/sdk/python/stream.mdx
index 752b2b4..ec7f3d6 100644
--- a/sdk/python/stream.mdx
+++ b/sdk/python/stream.mdx
@@ -40,6 +40,8 @@ When `input` is an `InputObject` or dictionary, you have full control over the c
| `tools` | `list[dict] \| None` | Array of function tools available to the model |
| `tool_choice` | `str \| dict \| None` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/python/tools) for details |
| `tags` | `list[str] \| None` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `enable_compression` | `bool` | Enable token compression for this request (overrides console settings). If `true`, the prompt is compressed using `compression_rate` when set, otherwise the rate configured for your API key or organization. If `false`, the request is not compressed. |
+| `compression_rate` | `float` | Target compression rate for this request, between 0.0 and 1.0 (default 0.75). Only used when `enable_compression` is `true`; higher values attempt more aggressive compression. |
For details about `Message` type, see the [Send Method documentation](/sdk/python/send#message-object).
For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/python/tools).
@@ -68,6 +70,7 @@ Each chunk yielded by the generator has the following structure:
| Property | Type | Description |
|----------|------|-------------|
| `choices` | `list[StreamChoice]` | Array of streaming choices (typically one) |
+| `compression` | `Compression \| None` | Token compression metrics (if compression was applied) |
### StreamChoice Object
diff --git a/sdk/rust/send.mdx b/sdk/rust/send.mdx
index 703c9e1..99f7ad4 100644
--- a/sdk/rust/send.mdx
+++ b/sdk/rust/send.mdx
@@ -57,6 +57,8 @@ When `input` is an `InputObject`, you have full control over the conversation:
| `tools` | `Option>` | Array of function tools available to the model |
| `tool_choice` | `Option` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/rust/tools) for details |
| `tags` | `Option>` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `enable_compression` | `Option` | Enable token compression for this request (overrides console settings). If not set, uses the configuration from your API key or organization settings |
+| `compression_rate` | `Option` | Target compression rate (0.0-1.0, default 0.75). Only used if compression is enabled. Higher values attempt more aggressive compression |
**Example with InputObject:**
@@ -142,6 +144,7 @@ The `send()` method returns a `Result`. On success, it contains:
| `model` | `String` | Model identifier used for the completion |
| `choices` | `Vec` | Array of completion choices (typically one) |
| `usage` | `Option` | Token usage information (if provided by the API) |
+| `compression` | `Option` | Token compression metrics (if compression was applied) |
### Choice Object
@@ -181,7 +184,7 @@ Token usage information (when available):
| Property | Type | Description |
|----------|------|-------------|
-| `prompt_tokens` | `u32` | Number of tokens in the prompt |
+| `prompt_tokens` | `u32` | Number of tokens in the prompt (after compression if applied) |
| `completion_tokens` | `u32` | Number of tokens in the completion |
| `total_tokens` | `u32` | Total tokens used (prompt + completion) |
@@ -197,6 +200,39 @@ if let Some(usage) = &response.usage {
}
```
+### Compression Object
+
+Token compression metrics (when compression is applied):
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `input_tokens` | `u32` | Original number of input tokens before compression |
+| `saved_tokens` | `u32` | Number of tokens saved by compression |
+| `rate` | `f64` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression |
+
+**Example - Accessing Compression Metrics:**
+
+```rust
+let input = InputObject::new(vec![
+ Message::user("Analyze this long document with lots of context...")
+])
+.with_enable_compression(true)
+.with_compression_rate(0.8); // Target 80% compression
+
+let response = client.send("gpt-4o", input).await?;
+println!("{}", response.text().unwrap_or(""));
+
+if let Some(compression) = &response.compression {
+ println!("Original input tokens: {}", compression.input_tokens);
+ println!("Tokens saved: {}", compression.saved_tokens);
+ println!("Compression rate: {:.1}%", compression.rate * 100.0);
+}
+```
+
+
+ The `compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression.
+
+
## Convenience Methods
The `SendResponse` struct provides convenience methods for easier access:
diff --git a/sdk/rust/stream.mdx b/sdk/rust/stream.mdx
index f4a1ba1..4581761 100644
--- a/sdk/rust/stream.mdx
+++ b/sdk/rust/stream.mdx
@@ -55,6 +55,8 @@ When `input` is a `Vec` or `InputObject`, you have full control over th
| `tools` | `Option>` | Array of function tools available to the model |
| `tool_choice` | `Option` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/rust/tools) for details |
| `tags` | `Option>` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `enable_compression` | `Option` | Enable token compression for this request (overrides console settings). If not set, uses the configuration from your API key or organization settings |
+| `compression_rate` | `Option` | Target compression rate (0.0-1.0, default 0.75). Only used if compression is enabled. Higher values attempt more aggressive compression |
For details about `Message` type, see the [Send Method documentation](/sdk/rust/send#message-object).
For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/rust/tools).
@@ -96,6 +98,7 @@ Each chunk yielded by the stream has the following structure:
| `created` | `u64` | Unix timestamp of when the chunk was created |
| `model` | `String` | Model identifier used for the completion |
| `choices` | `Vec` | Array of streaming choices (typically one) |
+| `compression` | `Option` | Token compression metrics (if compression was applied) |
### StreamChoice Object
diff --git a/sdk/typescript/send.mdx b/sdk/typescript/send.mdx
index f3a04f8..40cdb90 100644
--- a/sdk/typescript/send.mdx
+++ b/sdk/typescript/send.mdx
@@ -45,6 +45,8 @@ When `input` is an `InputObject`, you have full control over the conversation:
| `tools` | `Tool[]` | Array of function tools available to the model |
| `tool_choice` | `ToolChoice` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/typescript/tools) for details |
| `tags` | `string[]` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `enable_compression` | `boolean` | Enable token compression for this request (overrides console settings). If `true`, the prompt is compressed using `compression_rate` when set, otherwise the rate configured for your API key or organization. If `false`, the request is not compressed. |
+| `compression_rate` | `number` | Target compression rate for this request, between 0.0 and 1.0 (default 0.75). Only used when `enable_compression` is `true`; higher values attempt more aggressive compression. |
**Example with InputObject:**
@@ -127,6 +129,7 @@ The `send()` method returns a `Promise` with the following structu
|----------|------|-------------|
| `choices` | `Choice[]` | Array of completion choices (typically one) |
| `usage` | `Usage \| undefined` | Token usage information (if provided by the API) |
+| `compression` | `Compression \| undefined` | Token compression metrics (if compression was applied) |
### Choice Object
@@ -169,7 +172,7 @@ Token usage information (when available):
| Property | Type | Description |
|----------|------|-------------|
-| `prompt_tokens` | `number` | Number of tokens in the prompt |
+| `prompt_tokens` | `number` | Number of tokens in the prompt (after compression if applied) |
| `completion_tokens` | `number` | Number of tokens in the completion |
| `total_tokens` | `number` | Total tokens used (prompt + completion) |
@@ -188,6 +191,41 @@ if (response.usage) {
}
```
+### Compression Object
+
+Token compression metrics (when compression is applied):
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `input_tokens` | `number` | Original number of input tokens before compression |
+| `saved_tokens` | `number` | Number of tokens saved by compression |
+| `rate` | `number` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression |
+
+**Example - Accessing Compression Metrics:**
+
+```typescript
+const response = await edgee.send({
+ model: 'gpt-4o',
+ input: {
+ messages: [
+ { role: 'user', content: 'Analyze this long document with lots of context...' }
+ ],
+ enable_compression: true,
+ compression_rate: 0.8
+ }
+});
+
+if (response.compression) {
+ console.log(`Original input tokens: ${response.compression.input_tokens}`);
+ console.log(`Tokens saved: ${response.compression.saved_tokens}`);
+ console.log(`Compression rate: ${(response.compression.rate * 100).toFixed(1)}%`);
+}
+```
+
+
+ The `compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression.
+
+
## Convenience Properties
The `SendResponse` class provides convenience getters for easier access:
diff --git a/sdk/typescript/stream.mdx b/sdk/typescript/stream.mdx
index 8c925de..5361802 100644
--- a/sdk/typescript/stream.mdx
+++ b/sdk/typescript/stream.mdx
@@ -45,6 +45,8 @@ When `input` is an `InputObject`, you have full control over the conversation:
| `tools` | `Tool[]` | Array of function tools available to the model |
| `tool_choice` | `ToolChoice` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/typescript/tools) for details |
| `tags` | `string[]` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) |
+| `enable_compression` | `boolean` | Enable token compression for this request (overrides console settings). If `true`, the prompt is compressed using `compression_rate` when set, otherwise the rate configured for your API key or organization. If `false`, the request is not compressed. |
+| `compression_rate` | `number` | Target compression rate for this request, between 0.0 and 1.0 (default 0.75). Only used when `enable_compression` is `true`; higher values attempt more aggressive compression. |
For details about `Message` type, see the [Send Method documentation](/sdk/typescript/send#message-object).
For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/typescript/tools).
@@ -76,6 +78,7 @@ Each chunk yielded by the generator has the following structure:
| Property | Type | Description |
|----------|------|-------------|
| `choices` | `StreamChoice[]` | Array of streaming choices (typically one) |
+| `compression` | `Compression \| null` | Token compression metrics (if compression was applied) |
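+
+A minimal sketch of reading these metrics from a stream; it assumes `stream` is the async generator returned by the streaming call documented earlier on this page, and that `compression` arrives on whichever chunk Edgee attaches it to:
+
+```typescript
+let savedTokens: number | undefined;
+
+for await (const chunk of stream) {
+  // `compression` is null on chunks that carry no metrics.
+  if (chunk.compression) {
+    savedTokens = chunk.compression.saved_tokens;
+  }
+}
+
+if (savedTokens !== undefined) {
+  console.log(`Tokens saved: ${savedTokens}`);
+}
+```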
### StreamChoice Object