diff --git a/docs.json b/docs.json index b19bf8b..cd6ad57 100644 --- a/docs.json +++ b/docs.json @@ -81,8 +81,7 @@ "pages": [ "features/overview", "features/token-compression", - "features/observability", - "features/automatic-model-selection" + "features/observability" ] }, { diff --git a/features/automatic-model-selection.mdx b/features/automatic-model-selection.mdx index b58bd66..067f5ad 100644 --- a/features/automatic-model-selection.mdx +++ b/features/automatic-model-selection.mdx @@ -6,6 +6,10 @@ icon: circuit-board Edgee's automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%. + +This feature is under active development. Some routing strategies and configuration options may be added in future releases. + + ## Cost-Aware Routing Let Edgee automatically select the cheapest model that meets your quality requirements: @@ -19,8 +23,9 @@ const response = await edgee.send({ }); console.log(`Model used: ${response.model}`); // e.g., "gpt-5.2" -console.log(`Cost: $${response.cost.toFixed(4)}`); -console.log(`Tokens saved (compression): ${response.usage.saved_tokens}`); +if (response.compression) { + console.log(`Tokens saved: ${response.compression.saved_tokens}`); +} ``` **How it works:** @@ -201,6 +206,4 @@ await edgee.routing.addRule({ - -This feature is under active development. Some routing strategies and configuration options may be added in future releases. - + diff --git a/features/observability.mdx b/features/observability.mdx index 7cdad18..886c89d 100644 --- a/features/observability.mdx +++ b/features/observability.mdx @@ -6,9 +6,9 @@ icon: eye Edgee provides complete visibility into your AI infrastructure with real-time metrics on costs, token usage, compression savings, performance, and errors. Every request is tracked and exportable for analysis, budgeting, and optimization. -## Cost Tracking +## Token Usage Tracking -Every Edgee response includes detailed cost information so you can track spending in real-time: +Every Edgee response includes detailed token usage information for tracking and cost analysis: ```typescript const response = await edgee.send({ @@ -16,13 +16,19 @@ const response = await edgee.send({ input: 'Your prompt here', }); -console.log(response.cost); // Total cost in USD (e.g., 0.0234) console.log(response.usage.prompt_tokens); // Compressed input tokens console.log(response.usage.completion_tokens); // Output tokens console.log(response.usage.total_tokens); // Total for billing + +// Compression savings (when applied) +if (response.compression) { + console.log(response.compression.input_tokens); // Original tokens + console.log(response.compression.saved_tokens); // Tokens saved + console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate +} ``` -**Track spending by:** +**Track usage by:** - Model (GPT-4o vs Claude vs Gemini) - Project or application - Environment (production vs staging) @@ -30,7 +36,7 @@ console.log(response.usage.total_tokens); // Total for billing - Time period (daily, weekly, monthly) - Costs are calculated using real-time provider pricing. Edgee automatically handles rate changes and updates your historical data accordingly. + Use token usage data with provider pricing to calculate costs. The Edgee dashboard automatically calculates costs based on real-time provider pricing. 
## Request Tags for Analytics @@ -143,13 +149,13 @@ If you're using the OpenAI or Anthropic SDKs with Edgee, add tags via the `x-edg **Common tagging strategies:** - + **Environment tagging** Tag by environment: `production`, `staging`, `development` - + **Feature tagging** Tag by feature: `chat`, `summarization`, `code-generation`, `rag-qa` @@ -180,13 +186,16 @@ See exactly how much token compression is saving you on every request: const response = await edgee.send({ model: 'gpt-4o', input: 'Long prompt with lots of context...', + enable_compression: true, }); // Compression details -console.log(response.usage.prompt_tokens_original); // Original token count -console.log(response.usage.prompt_tokens); // After compression -console.log(response.usage.saved_tokens); // Tokens saved -console.log(response.usage.compression_ratio); // Percentage reduction (e.g., 45%) +if (response.compression) { + console.log(response.compression.input_tokens); // Original token count + console.log(response.usage.prompt_tokens); // After compression + console.log(response.compression.saved_tokens); // Tokens saved + console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate (e.g., 61.0%) +} ``` **Analyze compression effectiveness:** @@ -208,13 +217,13 @@ console.log(response.usage.compression_ratio); // Percentage reduction (e.g., 45 Track compression ratios over time to identify optimization opportunities - + **By use case** Compare compression effectiveness across different prompt types - + **Top savers** Identify which requests generate the highest savings @@ -265,7 +274,7 @@ Understand how your AI infrastructure is being used: - Cost per model over time - Model switching patterns -## Alerts & Budgets +## Alerts & Budgets (Coming Soon) Stay in control with proactive alerts: @@ -324,32 +333,6 @@ const data = await edgee.analytics.export({ }); ``` -## Dashboard Views - -The Edgee dashboard provides pre-built views for common use cases: - - - - Track spending trends, compare models, and identify cost optimization opportunities. - - - - Monitor token savings, compression ratios, and cumulative cost reductions. - - - - Analyze latency, throughput, error rates, and provider health across regions. - - - - Understand request volume, model distribution, and usage trends over time. - - - - - Dashboard access is included with all Edgee plans. Enterprise customers can customize dashboards and create team-specific views. - - ## What's Next diff --git a/features/token-compression.mdx b/features/token-compression.mdx index 2080833..9eeb1f9 100644 --- a/features/token-compression.mdx +++ b/features/token-compression.mdx @@ -40,6 +40,106 @@ Token compression happens automatically on every request through a four-step pro Compression is most effective for prompts with repeated context (RAG), long system instructions, or verbose multi-turn histories. Simple queries may see minimal compression. +## Enabling Token Compression + +Token compression can be enabled in three ways, giving you flexibility to control compression at the request, API key, or organization level: + +### 1. 
Per Request (SDK) + +Enable compression for specific requests using the SDK: + + + + ```typescript + const response = await edgee.send({ + model: 'gpt-4o', + input: { + "messages": [ + {"role": "user", "content": "Your prompt here"} + ], + "enable_compression": true, + "compression_rate": 0.8 // Target 80% compression (optional) + } + }); + ``` + + + + ```python + response = edgee.send( + model="gpt-4o", + input={ + "messages": [ + {"role": "user", "content": "Your prompt here"} + ], + "enable_compression": True, + "compression_rate": 0.8 # Target 80% compression (optional) + } + ) + ``` + + + + ```go + response, err := client.Send("gpt-4o", edgee.InputObject{ + Messages: []edgee.Message{ + {Role: "user", Content: "Your prompt here"}, + }, + EnableCompression: true, + CompressionRate: 0.8, // Target 80% compression (optional) + }) + ``` + + + + ```rust + let input = InputObject::new(vec![Message::user("Your prompt here")]) + .with_compression(true) + .with_compression_rate(0.8); // Target 80% compression (optional) + + let response = client.send("gpt-4o", input).await?; + ``` + + + +### 2. Per API Key (Console) + +Enable compression for specific API keys in your organization settings. This is useful when you want different compression settings for different applications or environments. + + +Enable compression for specific API keys +Enable compression for specific API keys + + +In the **Tools** section of your console: +1. Toggle **Enable token compression** on +2. Set your target **Compression rate** (0.7-0.9, default 0.75) +3. Under **Scope**, select **Apply to specific API keys** +4. Choose which API keys should use compression + +### 3. Organization-Wide (Console) + +Enable compression for all requests across your entire organization. This is the recommended setting for most users to maximize savings automatically. + + +Enable compression organization-wide +Enable compression organization-wide + + +In the **Tools** section of your console: +1. Toggle **Enable token compression** on +2. Set your target **Compression rate** (0.7-0.9, default 0.75) +3. Under **Scope**, select **Apply to all org requests** +4. All API keys will now use compression by default + + + **Compression rate** controls how aggressively Edgee compresses prompts. A higher rate (e.g., 0.9) attempts more compression but may be less effective, while a lower rate (e.g., 0.7) is more conservative. The default of 0.75 provides a good balance for most use cases. + + + + SDK-level configuration takes precedence over console settings. If you enable compression in your code with `enable_compression: true`, it will override the console configuration for that specific request. 
+ + ## When It Works Best Token compression delivers the highest savings for these common use cases: @@ -89,16 +189,19 @@ const documents = [ const response = await edgee.send({ model: 'gpt-4o', input: `Answer the question based on these documents:\n\n${documents.join('\n\n')}\n\nQuestion: What is the main topic?`, + enable_compression: true, // Enable compression for this request + compression_rate: 0.8, // Target compression ratio (0-1, e.g., 0.8 = 80%) }); console.log(response.text); // Compression metrics -console.log(`Original tokens: ${response.usage.prompt_tokens_original}`); -console.log(`Compressed tokens: ${response.usage.prompt_tokens}`); -console.log(`Tokens saved: ${response.usage.saved_tokens}`); -console.log(`Compression ratio: ${response.usage.compression_ratio}%`); -console.log(`Request cost: $${response.cost.toFixed(4)}`); +if (response.compression) { + console.log(`Original tokens: ${response.compression.input_tokens}`); + console.log(`Compressed tokens: ${response.usage.prompt_tokens}`); + console.log(`Tokens saved: ${response.compression.saved_tokens}`); + console.log(`Compression rate: ${(response.compression.rate * 100).toFixed(1)}%`); +} ``` **Example output:** @@ -107,7 +210,6 @@ Original tokens: 2,450 Compressed tokens: 1,225 Tokens saved: 1,225 Compression ratio: 50% -Request cost: $0.0184 ``` ## Real-World Savings @@ -145,7 +247,7 @@ Here's what token compression means for your monthly AI bill: - Enable compression by default for all requests - Compression happens automatically without configuration - - Track `compression_ratio` to understand effectiveness + - Track `compression.rate` to understand effectiveness - Use response metrics to optimize prompt design @@ -162,14 +264,15 @@ Here's what token compression means for your monthly AI bill: Every Edgee response includes detailed compression metrics: ```typescript +// Usage information response.usage.prompt_tokens // Compressed token count (billed) -response.usage.prompt_tokens_original // Original token count (before compression) -response.usage.saved_tokens // Tokens saved by compression -response.usage.compression_ratio // Percentage reduction response.usage.completion_tokens // Output tokens (unchanged) response.usage.total_tokens // Total for billing calculation -response.cost // Total request cost in USD +// Compression information (when applied) +response.compression.input_tokens // Original token count (before compression) +response.compression.saved_tokens // Tokens saved by compression +response.compression.rate // Compression rate (0-1, e.g., 0.61 = 61%) ``` Use these fields to: diff --git a/images/compression-enabled-by-tag-dark.png b/images/compression-enabled-by-tag-dark.png new file mode 100644 index 0000000..5687134 Binary files /dev/null and b/images/compression-enabled-by-tag-dark.png differ diff --git a/images/compression-enabled-by-tag-light.png b/images/compression-enabled-by-tag-light.png new file mode 100644 index 0000000..3ed6f53 Binary files /dev/null and b/images/compression-enabled-by-tag-light.png differ diff --git a/images/compression-enabled-org-dark.png b/images/compression-enabled-org-dark.png new file mode 100644 index 0000000..957011f Binary files /dev/null and b/images/compression-enabled-org-dark.png differ diff --git a/images/compression-enabled-org-light.png b/images/compression-enabled-org-light.png new file mode 100644 index 0000000..7a1e36a Binary files /dev/null and b/images/compression-enabled-org-light.png differ diff --git a/integrations/anthropic-sdk.mdx 
b/integrations/anthropic-sdk.mdx index e19169c..8c719c3 100644 --- a/integrations/anthropic-sdk.mdx +++ b/integrations/anthropic-sdk.mdx @@ -138,7 +138,7 @@ Stream responses for real-time token delivery: ## Cost Tracking & Compression -Every Edgee response includes token compression metrics through the Anthropic API's `usage` field: +Every Edgee response includes token compression metrics in a dedicated `compression` field: @@ -158,15 +158,13 @@ Every Edgee response includes token compression metrics through the Anthropic AP print(message.content[0].text) - # Compression metrics - usage = message.usage - tokens_saved = usage.input_tokens_original - usage.input_tokens - compression_ratio = (tokens_saved / usage.input_tokens_original) * 100 - - print(f"Original input tokens: {usage.input_tokens_original}") - print(f"Compressed input tokens: {usage.input_tokens}") - print(f"Tokens saved: {tokens_saved}") - print(f"Compression ratio: {compression_ratio:.1f}%") + # Compression metrics (if compression was applied) + if hasattr(message, 'compression') and message.compression: + compression = message.compression + print(f"Original input tokens: {compression.input_tokens}") + print(f"Compressed input tokens: {message.usage.input_tokens}") + print(f"Tokens saved: {compression.saved_tokens}") + print(f"Compression rate: {compression.rate * 100:.1f}%") ``` @@ -187,21 +185,20 @@ Every Edgee response includes token compression metrics through the Anthropic AP console.log(message.content[0].text); - // Compression metrics - const usage = message.usage; - const tokensSaved = usage.input_tokens_original - usage.input_tokens; - const compressionRatio = (tokensSaved / usage.input_tokens_original) * 100; - - console.log(`Original input tokens: ${usage.input_tokens_original}`); - console.log(`Compressed input tokens: ${usage.input_tokens}`); - console.log(`Tokens saved: ${tokensSaved}`); - console.log(`Compression ratio: ${compressionRatio.toFixed(1)}%`); + // Compression metrics (if compression was applied) + if (message.compression) { + const compression = message.compression; + console.log(`Original input tokens: ${compression.input_tokens}`); + console.log(`Compressed input tokens: ${message.usage.input_tokens}`); + console.log(`Tokens saved: ${compression.saved_tokens}`); + console.log(`Compression rate: ${(compression.rate * 100).toFixed(1)}%`); + } ``` - Edgee extends the Anthropic API response with `input_tokens_original` to show the token count before compression. All other fields remain standard Anthropic format. + Edgee extends the Anthropic API response with a `compression` field containing compression metrics (`input_tokens`, `saved_tokens`, `rate`). All standard Anthropic fields remain unchanged. 
## Multi-Provider Access diff --git a/integrations/openai-sdk.mdx b/integrations/openai-sdk.mdx index 68ef2b7..88e040d 100644 --- a/integrations/openai-sdk.mdx +++ b/integrations/openai-sdk.mdx @@ -110,11 +110,12 @@ const completion = await openai.chat.completions.create({ console.log(completion.choices[0].message.content); -// Access compression metrics -const usage = completion.usage; -console.log(`Tokens saved: ${usage.prompt_tokens_original - usage.prompt_tokens}`); -console.log(`Compression ratio: ${((usage.prompt_tokens_original - usage.prompt_tokens) / usage.prompt_tokens_original * 100).toFixed(1)}%`); -console.log(`Total tokens: ${usage.total_tokens}`); +// Access compression metrics (if compression was applied) +if (completion.compression) { + console.log(`Tokens saved: ${completion.compression.saved_tokens}`); + console.log(`Compression rate: ${(completion.compression.rate * 100).toFixed(1)}%`); +} +console.log(`Total tokens: ${completion.usage.total_tokens}`); ``` ```python title="Python" @@ -135,20 +136,17 @@ completion = client.chat.completions.create( print(completion.choices[0].message.content) -# Access compression metrics -usage = completion.usage -tokens_saved = usage.prompt_tokens_original - usage.prompt_tokens -compression_ratio = (tokens_saved / usage.prompt_tokens_original) * 100 - -print(f"Tokens saved: {tokens_saved}") -print(f"Compression ratio: {compression_ratio:.1f}%") -print(f"Total tokens: {usage.total_tokens}") +# Access compression metrics (if compression was applied) +if hasattr(completion, 'compression') and completion.compression: + print(f"Tokens saved: {completion.compression.saved_tokens}") + print(f"Compression rate: {completion.compression.rate * 100:.1f}%") +print(f"Total tokens: {completion.usage.total_tokens}") ``` - Edgee extends the OpenAI API response with `prompt_tokens_original` to show the token count before compression. All other fields remain standard OpenAI format. + Edgee extends the OpenAI API response with a `compression` field containing compression metrics (`input_tokens`, `saved_tokens`, `rate`). All standard OpenAI fields remain unchanged. 
## Advanced Usage diff --git a/introduction.mdx b/introduction.mdx index 6fdc24d..07b6d52 100644 --- a/introduction.mdx +++ b/introduction.mdx @@ -22,8 +22,9 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige }); console.log(response.text); - console.log(`Tokens saved: ${response.usage.saved_tokens}`); - console.log(`Cost: $${response.cost.toFixed(4)}`); + if (response.compression) { + console.log(`Tokens saved: ${response.compression.saved_tokens}`); + } ``` @@ -39,8 +40,8 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige ) print(response.text) - print(f"Tokens saved: {response.usage.saved_tokens}") - print(f"Cost: ${response.cost:.4f}") + if response.compression: + print(f"Tokens saved: {response.compression.saved_tokens}") ``` @@ -63,8 +64,9 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige } fmt.Println(response.Text()) - fmt.Printf("Tokens saved: %d\n", response.Usage.SavedTokens) - fmt.Printf("Cost: $%.4f\n", response.Cost) + if response.Compression != nil { + fmt.Printf("Tokens saved: %d\n", response.Compression.SavedTokens) + } } ``` @@ -77,8 +79,9 @@ Edgee is an **AI Gateway** that reduces LLM costs by up to 50% through intellige let response = client.send("gpt-4o", "What is the capital of France?").await.unwrap(); println!("{}", response.text().unwrap_or("")); - println!("Tokens saved: {}", response.usage.saved_tokens); - println!("Cost: ${:.4}", response.cost); + if let Some(compression) = &response.compression { + println!("Tokens saved: {}", compression.saved_tokens); + } ``` diff --git a/introduction/faq.mdx b/introduction/faq.mdx index 278fb09..c9e49c0 100644 --- a/introduction/faq.mdx +++ b/introduction/faq.mdx @@ -35,7 +35,7 @@ icon: message-circle-question-mark - Multi-turn conversations with growing history - Document analysis with redundant information - Every response includes compression metrics (`saved_tokens`, `compression_ratio`) so you can track your savings in real-time. + Every response includes a `compression` field with metrics (`input_tokens`, `saved_tokens`, `rate`) so you can track your savings in real-time. 
diff --git a/package-lock.json b/package-lock.json index 87fdbb6..1a08bb4 100644 --- a/package-lock.json +++ b/package-lock.json @@ -5,7 +5,7 @@ "packages": { "": { "dependencies": { - "mintlify": "^4.2.310" + "mintlify": "^4.2.314" } }, "node_modules/@alcalzone/ansi-tokenize": { @@ -85,9 +85,9 @@ } }, "node_modules/@babel/code-frame": { - "version": "7.28.6", - "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.28.6.tgz", - "integrity": "sha512-JYgintcMjRiCvS8mMECzaEn+m3PfoQiyqukOMCCVQtoJGYJw8j/8LBJEiqkHLkfwCcs74E3pbAUFNg7d9VNJ+Q==", + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.29.0.tgz", + "integrity": "sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==", "license": "MIT", "dependencies": { "@babel/helper-validator-identifier": "^7.28.5", @@ -982,18 +982,18 @@ } }, "node_modules/@mintlify/cli": { - "version": "4.0.914", - "resolved": "https://registry.npmjs.org/@mintlify/cli/-/cli-4.0.914.tgz", - "integrity": "sha512-L6Ls4qOedK0SkyZIBrfy8utQIt/fCyAX0sJO1yFoZjIPmGMYCSWHftIz9xXU2nWjV4924EoSg/b81+CBccZg6w==", + "version": "4.0.918", + "resolved": "https://registry.npmjs.org/@mintlify/cli/-/cli-4.0.918.tgz", + "integrity": "sha512-JwHE7Uhhog4xqbQmduuETZyWth/avASbGdDB1h9RhsprgIb7W8FR80YRRzUCCLUwz10M/dgqEaBEyr3/+RvKvg==", "license": "Elastic-2.0", "dependencies": { "@inquirer/prompts": "7.9.0", - "@mintlify/common": "1.0.694", - "@mintlify/link-rot": "3.0.853", - "@mintlify/models": "0.0.260", - "@mintlify/prebuild": "1.0.830", - "@mintlify/previewing": "4.0.886", - "@mintlify/validation": "0.1.574", + "@mintlify/common": "1.0.697", + "@mintlify/link-rot": "3.0.856", + "@mintlify/models": "0.0.262", + "@mintlify/prebuild": "1.0.833", + "@mintlify/previewing": "4.0.889", + "@mintlify/validation": "0.1.576", "adm-zip": "0.5.16", "chalk": "5.2.0", "color": "4.2.3", @@ -1018,16 +1018,16 @@ } }, "node_modules/@mintlify/common": { - "version": "1.0.694", - "resolved": "https://registry.npmjs.org/@mintlify/common/-/common-1.0.694.tgz", - "integrity": "sha512-HiIL3+tZlFtrwcNHuIpjXQuBXWKIoVtIzPeFl08Pk88pLAHAphfCH2T4jBdEiaUPnx5BhDRko1HQ2T0uexRFeQ==", + "version": "1.0.697", + "resolved": "https://registry.npmjs.org/@mintlify/common/-/common-1.0.697.tgz", + "integrity": "sha512-xZ/arB2O60ncw+VPQg4jHqaY8huY2fhSTWvbSKSoJZyK+P7asMWXzNBCt0H6vcRf1rine4D2srlCA4ymCHnDHg==", "license": "ISC", "dependencies": { "@asyncapi/parser": "3.4.0", "@mintlify/mdx": "^3.0.4", - "@mintlify/models": "0.0.260", + "@mintlify/models": "0.0.262", "@mintlify/openapi-parser": "^0.0.8", - "@mintlify/validation": "0.1.574", + "@mintlify/validation": "0.1.576", "@sindresorhus/slugify": "2.2.0", "@types/mdast": "4.0.4", "acorn": "8.11.2", @@ -1456,16 +1456,16 @@ } }, "node_modules/@mintlify/link-rot": { - "version": "3.0.853", - "resolved": "https://registry.npmjs.org/@mintlify/link-rot/-/link-rot-3.0.853.tgz", - "integrity": "sha512-zu6gr6RK7tY6WpD5/KJ3Q6zOgXlHfsJNHQp4F82cIaHDH8l4gnNGurnOW2RZcaLbDk+cj7UF3TrDbWT13RZgjg==", + "version": "3.0.856", + "resolved": "https://registry.npmjs.org/@mintlify/link-rot/-/link-rot-3.0.856.tgz", + "integrity": "sha512-4BFxaEJSqJtPM31zV+BD+bbXMnGWxYrtSa8ve3qFx7uSj1WYMVJ4Ag//xybmuzxPGlAN8vla8WXoLkz/6/gp3A==", "license": "Elastic-2.0", "dependencies": { - "@mintlify/common": "1.0.694", - "@mintlify/prebuild": "1.0.830", - "@mintlify/previewing": "4.0.886", + "@mintlify/common": "1.0.697", + "@mintlify/prebuild": "1.0.833", + "@mintlify/previewing": "4.0.889", "@mintlify/scraping": 
"4.0.522", - "@mintlify/validation": "0.1.574", + "@mintlify/validation": "0.1.576", "fs-extra": "11.1.0", "unist-util-visit": "4.1.2" }, @@ -1536,9 +1536,9 @@ } }, "node_modules/@mintlify/models": { - "version": "0.0.260", - "resolved": "https://registry.npmjs.org/@mintlify/models/-/models-0.0.260.tgz", - "integrity": "sha512-M7WpKC4ysrrc5M16fUPFBLbhmdxfOm3LsMeurhQJ7Jc4V8o8DCdqLKkGTs0PZEFPSKx34X1wCBp4YrDx3kBDNQ==", + "version": "0.0.262", + "resolved": "https://registry.npmjs.org/@mintlify/models/-/models-0.0.262.tgz", + "integrity": "sha512-9JNwnx1AtasQi3eP3yh/ffNgAB5ZS17jSE0IPa38QzBn4eMXoLvNQscPlhBp9krAYHpnOWm8VN0G5rV9lsgXvA==", "license": "Elastic-2.0", "dependencies": { "axios": "1.13.2", @@ -1583,15 +1583,15 @@ } }, "node_modules/@mintlify/prebuild": { - "version": "1.0.830", - "resolved": "https://registry.npmjs.org/@mintlify/prebuild/-/prebuild-1.0.830.tgz", - "integrity": "sha512-q696zAc5TvhKFddaIuygM8W20y/3J9D2R/EVsmzKmio9+IfWkHfY/h6WwfqVavHpYdgnFxh/sAZtA4pSb609QQ==", + "version": "1.0.833", + "resolved": "https://registry.npmjs.org/@mintlify/prebuild/-/prebuild-1.0.833.tgz", + "integrity": "sha512-RAnnVDplb1pdY1VzZDoJPd+unRKs0QJo/rrzMPw3ytf69iOS+Z3Ao00CSLPJm3ZQ65HOj0XFZ5qIfJHaoT6jpA==", "license": "Elastic-2.0", "dependencies": { - "@mintlify/common": "1.0.694", + "@mintlify/common": "1.0.697", "@mintlify/openapi-parser": "^0.0.8", - "@mintlify/scraping": "4.0.555", - "@mintlify/validation": "0.1.574", + "@mintlify/scraping": "4.0.558", + "@mintlify/validation": "0.1.576", "chalk": "5.3.0", "favicons": "7.2.0", "front-matter": "4.0.2", @@ -1605,12 +1605,12 @@ } }, "node_modules/@mintlify/prebuild/node_modules/@mintlify/scraping": { - "version": "4.0.555", - "resolved": "https://registry.npmjs.org/@mintlify/scraping/-/scraping-4.0.555.tgz", - "integrity": "sha512-YhxnlyirsKy4huUdUVBcPuPrIymbnu+hR9a9x0sullh7VKGEwPgxo0a0bqGSPdkoGJMBlXlJDOGLb2Ud0/gsdQ==", + "version": "4.0.558", + "resolved": "https://registry.npmjs.org/@mintlify/scraping/-/scraping-4.0.558.tgz", + "integrity": "sha512-CR8CBwrdcr4pQ3EHCLjuK0oet0Ag4Samwqpha3fGZ3FYad2Kaq1ZON7x89GosO4DiZTGQksBv2A9gpOVs0vpzg==", "license": "Elastic-2.0", "dependencies": { - "@mintlify/common": "1.0.694", + "@mintlify/common": "1.0.697", "@mintlify/openapi-parser": "^0.0.8", "fs-extra": "11.1.1", "hast-util-to-mdast": "10.1.0", @@ -1786,14 +1786,14 @@ } }, "node_modules/@mintlify/previewing": { - "version": "4.0.886", - "resolved": "https://registry.npmjs.org/@mintlify/previewing/-/previewing-4.0.886.tgz", - "integrity": "sha512-WScIiiw/6QYm7CMSdfKqpW9skKDl/rw5eAk1FOzBlUdOWpVZhQ77OuWyltWaD2MepWw4Cj7UBbcYyDvsvPtyEA==", + "version": "4.0.889", + "resolved": "https://registry.npmjs.org/@mintlify/previewing/-/previewing-4.0.889.tgz", + "integrity": "sha512-SvVM+lzkDK2LM6G/d5mKMpIA1Kf3nlLzRhwJLDLdI01+Hq2uzKmyQvmWGWc4vqTTIZwTSNiYumbEzaMbzJx9uA==", "license": "Elastic-2.0", "dependencies": { - "@mintlify/common": "1.0.694", - "@mintlify/prebuild": "1.0.830", - "@mintlify/validation": "0.1.574", + "@mintlify/common": "1.0.697", + "@mintlify/prebuild": "1.0.833", + "@mintlify/validation": "0.1.576", "better-opn": "3.0.2", "chalk": "5.2.0", "chokidar": "3.5.3", @@ -2433,13 +2433,13 @@ } }, "node_modules/@mintlify/validation": { - "version": "0.1.574", - "resolved": "https://registry.npmjs.org/@mintlify/validation/-/validation-0.1.574.tgz", - "integrity": "sha512-yqBiZJmP+7iHPiJ2h40MWO94f02Ajo/7PDve+V+mrWcw6OwetJErUdlYFsveiiPYk66HIM6BvFQBsQZt1NfSqQ==", + "version": "0.1.576", + "resolved": 
"https://registry.npmjs.org/@mintlify/validation/-/validation-0.1.576.tgz", + "integrity": "sha512-w3QWe2X2gj6oqA0jCTQialxIQz/ki+6ud0V0Y+gKAlLH5UsoIqBFCrXjFYZZLkBt/Md6wnNmADH71gjLh4481w==", "license": "Elastic-2.0", "dependencies": { "@mintlify/mdx": "^3.0.4", - "@mintlify/models": "0.0.260", + "@mintlify/models": "0.0.262", "arktype": "2.1.27", "js-yaml": "4.1.0", "lcm": "0.0.3", @@ -3144,12 +3144,12 @@ "license": "MIT" }, "node_modules/@shikijs/core": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/core/-/core-3.21.0.tgz", - "integrity": "sha512-AXSQu/2n1UIQekY8euBJlvFYZIw0PHY63jUzGbrOma4wPxzznJXTXkri+QcHeBNaFxiiOljKxxJkVSoB3PjbyA==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/core/-/core-3.22.0.tgz", + "integrity": "sha512-iAlTtSDDbJiRpvgL5ugKEATDtHdUVkqgHDm/gbD2ZS9c88mx7G1zSYjjOxp5Qa0eaW0MAQosFRmJSk354PRoQA==", "license": "MIT", "dependencies": { - "@shikijs/types": "3.21.0", + "@shikijs/types": "3.22.0", "@shikijs/vscode-textmate": "^10.0.2", "@types/hast": "^3.0.4", "hast-util-to-html": "^9.0.5" @@ -3179,62 +3179,62 @@ } }, "node_modules/@shikijs/engine-javascript": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/engine-javascript/-/engine-javascript-3.21.0.tgz", - "integrity": "sha512-ATwv86xlbmfD9n9gKRiwuPpWgPENAWCLwYCGz9ugTJlsO2kOzhOkvoyV/UD+tJ0uT7YRyD530x6ugNSffmvIiQ==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/engine-javascript/-/engine-javascript-3.22.0.tgz", + "integrity": "sha512-jdKhfgW9CRtj3Tor0L7+yPwdG3CgP7W+ZEqSsojrMzCjD1e0IxIbwUMDDpYlVBlC08TACg4puwFGkZfLS+56Tw==", "license": "MIT", "dependencies": { - "@shikijs/types": "3.21.0", + "@shikijs/types": "3.22.0", "@shikijs/vscode-textmate": "^10.0.2", "oniguruma-to-es": "^4.3.4" } }, "node_modules/@shikijs/engine-oniguruma": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/engine-oniguruma/-/engine-oniguruma-3.21.0.tgz", - "integrity": "sha512-OYknTCct6qiwpQDqDdf3iedRdzj6hFlOPv5hMvI+hkWfCKs5mlJ4TXziBG9nyabLwGulrUjHiCq3xCspSzErYQ==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/engine-oniguruma/-/engine-oniguruma-3.22.0.tgz", + "integrity": "sha512-DyXsOG0vGtNtl7ygvabHd7Mt5EY8gCNqR9Y7Lpbbd/PbJvgWrqaKzH1JW6H6qFkuUa8aCxoiYVv8/YfFljiQxA==", "license": "MIT", "dependencies": { - "@shikijs/types": "3.21.0", + "@shikijs/types": "3.22.0", "@shikijs/vscode-textmate": "^10.0.2" } }, "node_modules/@shikijs/langs": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/langs/-/langs-3.21.0.tgz", - "integrity": "sha512-g6mn5m+Y6GBJ4wxmBYqalK9Sp0CFkUqfNzUy2pJglUginz6ZpWbaWjDB4fbQ/8SHzFjYbtU6Ddlp1pc+PPNDVA==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/langs/-/langs-3.22.0.tgz", + "integrity": "sha512-x/42TfhWmp6H00T6uwVrdTJGKgNdFbrEdhaDwSR5fd5zhQ1Q46bHq9EO61SCEWJR0HY7z2HNDMaBZp8JRmKiIA==", "license": "MIT", "dependencies": { - "@shikijs/types": "3.21.0" + "@shikijs/types": "3.22.0" } }, "node_modules/@shikijs/themes": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/themes/-/themes-3.21.0.tgz", - "integrity": "sha512-BAE4cr9EDiZyYzwIHEk7JTBJ9CzlPuM4PchfcA5ao1dWXb25nv6hYsoDiBq2aZK9E3dlt3WB78uI96UESD+8Mw==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/themes/-/themes-3.22.0.tgz", + "integrity": "sha512-o+tlOKqsr6FE4+mYJG08tfCFDS+3CG20HbldXeVoyP+cYSUxDhrFf3GPjE60U55iOkkjbpY2uC3It/eeja35/g==", "license": "MIT", "dependencies": { - "@shikijs/types": "3.21.0" + 
"@shikijs/types": "3.22.0" } }, "node_modules/@shikijs/transformers": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/transformers/-/transformers-3.21.0.tgz", - "integrity": "sha512-CZwvCWWIiRRiFk9/JKzdEooakAP8mQDtBOQ1TKiCaS2E1bYtyBCOkUzS8akO34/7ufICQ29oeSfkb3tT5KtrhA==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/transformers/-/transformers-3.22.0.tgz", + "integrity": "sha512-E7eRV7mwDBjueLF6852n2oYeJYxBq3NSsDk+uyruYAXONv4U8holGmIrT+mPRJQ1J1SNOH6L8G19KRzmBawrFw==", "license": "MIT", "dependencies": { - "@shikijs/core": "3.21.0", - "@shikijs/types": "3.21.0" + "@shikijs/core": "3.22.0", + "@shikijs/types": "3.22.0" } }, "node_modules/@shikijs/twoslash": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/twoslash/-/twoslash-3.21.0.tgz", - "integrity": "sha512-iH360udAYON2JwfIldoCiMZr9MljuQA5QRBivKLpEuEpmVCSwrR+0WTQ0eS1ptgGBdH9weFiIsA5wJDzsEzTYg==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/twoslash/-/twoslash-3.22.0.tgz", + "integrity": "sha512-GO27UPN+kegOMQvC+4XcLt0Mttyg+n16XKjmoKjdaNZoW+sOJV7FLdv2QKauqUDws6nE3EQPD+TFHEdyyoUBDw==", "license": "MIT", "dependencies": { - "@shikijs/core": "3.21.0", - "@shikijs/types": "3.21.0", + "@shikijs/core": "3.22.0", + "@shikijs/types": "3.22.0", "twoslash": "^0.3.6" }, "peerDependencies": { @@ -3242,9 +3242,9 @@ } }, "node_modules/@shikijs/types": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/@shikijs/types/-/types-3.21.0.tgz", - "integrity": "sha512-zGrWOxZ0/+0ovPY7PvBU2gIS9tmhSUUt30jAcNV0Bq0gb2S98gwfjIs1vxlmH5zM7/4YxLamT6ChlqqAJmPPjA==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/@shikijs/types/-/types-3.22.0.tgz", + "integrity": "sha512-491iAekgKDBFE67z70Ok5a8KBMsQ2IJwOWw3us/7ffQkIBCyOQfm/aNwVMBUriP02QshIfgHCBSIYAl3u2eWjg==", "license": "MIT", "dependencies": { "@shikijs/vscode-textmate": "^10.0.2", @@ -3798,9 +3798,9 @@ } }, "node_modules/@types/node": { - "version": "25.1.0", - "resolved": "https://registry.npmjs.org/@types/node/-/node-25.1.0.tgz", - "integrity": "sha512-t7frlewr6+cbx+9Ohpl0NOTKXZNV9xHRmNOvql47BFJKcEG1CxtxlPEEe+gR9uhVWM4DwhnvTF110mIL4yP9RA==", + "version": "25.2.0", + "resolved": "https://registry.npmjs.org/@types/node/-/node-25.2.0.tgz", + "integrity": "sha512-DZ8VwRFUNzuqJ5khrvwMXHmvPe+zGayJhr2CDNiKB1WBE1ST8Djl00D0IC4vvNmHMdj6DlbYRIaFE7WHjlDl5w==", "license": "MIT", "peer": true, "dependencies": { @@ -9557,12 +9557,12 @@ } }, "node_modules/mintlify": { - "version": "4.2.310", - "resolved": "https://registry.npmjs.org/mintlify/-/mintlify-4.2.310.tgz", - "integrity": "sha512-1FBt5x1BSk0z+Shk4owGfDpEZhXsN68me1qNSZ8F6LyEXFuflADNGpQUYX36GGeGrbZoDbY76WLtiQLv7XPmtg==", + "version": "4.2.314", + "resolved": "https://registry.npmjs.org/mintlify/-/mintlify-4.2.314.tgz", + "integrity": "sha512-G7KSCSkPyNq377ooSSQomggCDWsDoEeG+SyB1K2eKW5sH0SnSKyaCdPYxMLJF05+lGgYGSmPJ3mGqgqQC64/Rg==", "license": "Elastic-2.0", "dependencies": { - "@mintlify/cli": "4.0.914" + "@mintlify/cli": "4.0.918" }, "bin": { "mint": "index.js", @@ -11528,17 +11528,17 @@ } }, "node_modules/shiki": { - "version": "3.21.0", - "resolved": "https://registry.npmjs.org/shiki/-/shiki-3.21.0.tgz", - "integrity": "sha512-N65B/3bqL/TI2crrXr+4UivctrAGEjmsib5rPMMPpFp1xAx/w03v8WZ9RDDFYteXoEgY7qZ4HGgl5KBIu1153w==", + "version": "3.22.0", + "resolved": "https://registry.npmjs.org/shiki/-/shiki-3.22.0.tgz", + "integrity": "sha512-LBnhsoYEe0Eou4e1VgJACes+O6S6QC0w71fCSp5Oya79inkwkm15gQ1UF6VtQ8j/taMDh79hAB49WUk8ALQW3g==", "license": 
"MIT", "dependencies": { - "@shikijs/core": "3.21.0", - "@shikijs/engine-javascript": "3.21.0", - "@shikijs/engine-oniguruma": "3.21.0", - "@shikijs/langs": "3.21.0", - "@shikijs/themes": "3.21.0", - "@shikijs/types": "3.21.0", + "@shikijs/core": "3.22.0", + "@shikijs/engine-javascript": "3.22.0", + "@shikijs/engine-oniguruma": "3.22.0", + "@shikijs/langs": "3.22.0", + "@shikijs/themes": "3.22.0", + "@shikijs/types": "3.22.0", "@shikijs/vscode-textmate": "^10.0.2", "@types/hast": "^3.0.4" } diff --git a/package.json b/package.json index b42ef1b..45ccb81 100644 --- a/package.json +++ b/package.json @@ -4,6 +4,6 @@ "links": "mintlify broken-links" }, "dependencies": { - "mintlify": "^4.2.310" + "mintlify": "^4.2.314" } } diff --git a/sdk/go/send.mdx b/sdk/go/send.mdx index ea3198a..91f8b9f 100644 --- a/sdk/go/send.mdx +++ b/sdk/go/send.mdx @@ -43,6 +43,8 @@ When `input` is an `InputObject`, you have full control over the conversation: | `Tools` | `[]Tool` | Array of function tools available to the model | | `ToolChoice` | `any` | Controls which tool (if any) the model should call. Can be `string` (`"auto"`, `"none"`) or `map[string]interface{}`. See [Tools documentation](/sdk/go/tools) for details | | `Tags` | `[]string` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `EnableCompression` | `bool` | Enable token compression for this request. If `true`, the request will be compressed to the compression rate specified in the API key settings. If `false`, the request will not be compressed. | +| `CompressionRate` | `float64` | The compression rate to use for this request. If `EnableCompression` is `true`, this value will be used to compress the request. The value should be between 0.0 and 1.0. The default value is 0.75. | **Example with InputObject:** @@ -151,6 +153,7 @@ The `Send()` method returns `(SendResponse, error)`. On success, the `SendRespon | `Model` | `string` | Model identifier used for the completion | | `Choices` | `[]Choice` | Array of completion choices (typically one) | | `Usage` | `*Usage` | Token usage information (if provided by the API) | +| `Compression` | `*Compression` | Token compression metrics (if compression was applied) | ### Choice Object @@ -195,7 +198,7 @@ Token usage information (when available): | Property | Type | Description | |----------|------|-------------| -| `PromptTokens` | `int` | Number of tokens in the prompt | +| `PromptTokens` | `int` | Number of tokens in the prompt (after compression if applied) | | `CompletionTokens` | `int` | Number of tokens in the completion | | `TotalTokens` | `int` | Total tokens used (prompt + completion) | @@ -214,6 +217,41 @@ if response.Usage != nil { } ``` +### Compression Object + +Token compression metrics (when compression is applied): + +| Property | Type | Description | +|----------|------|-------------| +| `InputTokens` | `int` | Original number of input tokens before compression | +| `SavedTokens` | `int` | Number of tokens saved by compression | +| `Rate` | `float64` | Compression rate as a decimal (0-1). 
For example, `0.61` means 61% compression | + +**Example - Accessing Compression Metrics:** + +```go +response, err := client.Send("gpt-4o", edgee.InputObject{ + Messages: []edgee.Message{ + {Role: "user", Content: "Analyze this long document with lots of context..."}, + }, + EnableCompression: true, + CompressionRate: 0.8, +}) +if err != nil { + log.Fatal(err) +} + +if response.Compression != nil { + fmt.Printf("Original input tokens: %d\n", response.Compression.InputTokens) + fmt.Printf("Tokens saved: %d\n", response.Compression.SavedTokens) + fmt.Printf("Compression rate: %.1f%%\n", response.Compression.Rate * 100) +} +``` + + + The `Compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression. + + ## Convenience Methods The `SendResponse` struct provides convenience methods for easier access: diff --git a/sdk/go/stream.mdx b/sdk/go/stream.mdx index e6d0690..61f07e1 100644 --- a/sdk/go/stream.mdx +++ b/sdk/go/stream.mdx @@ -57,6 +57,8 @@ When `input` is an `InputObject` or `map[string]interface{}`, you have full cont | `Tools` | `[]Tool` | Array of function tools available to the model | | `ToolChoice` | `any` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/go/tools) for details | | `Tags` | `[]string` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `EnableCompression` | `bool` | Enable token compression for this request. If `true`, the request will be compressed to the compression rate specified in the API key settings. If `false`, the request will not be compressed. | +| `CompressionRate` | `float64` | The compression rate to use for this request. If `EnableCompression` is `true`, this value will be used to compress the request. The value should be between 0.0 and 1.0. The default value is 0.75. | For details about `Message` type, see the [Send Method documentation](/sdk/go/send#message-object). For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/go/tools). @@ -110,6 +112,7 @@ Each chunk received from the channel has the following structure: | `Created` | `int64` | Unix timestamp of when the chunk was created | | `Model` | `string` | Model identifier used for the completion | | `Choices` | `[]StreamChoice` | Array of streaming choices (typically one) | +| `Compression` | `*Compression` | Token compression metrics (if compression was applied) | ### StreamChoice Object diff --git a/sdk/index.mdx b/sdk/index.mdx index 60e4262..f8d71d6 100644 --- a/sdk/index.mdx +++ b/sdk/index.mdx @@ -4,7 +4,7 @@ description: Choose your SDK and start using Edgee. icon: boxes --- -Edgee provides official SDKs for TypeScript, Python, Go, and Rust. All SDKs offer a consistent, type-safe interface to interact with the Edgee AI Gateway, supporting OpenAI-compatible chat completions and function calling. +Edgee provides official SDKs for TypeScript, Python, Go, and Rust. All SDKs offer a consistent, type-safe interface to interact with the Edgee AI Gateway, supporting OpenAI-compatible chat completions and function calling. All SDKs include built-in support for **token compression** and **cost tracking**, automatically reducing your LLM spend by up to 50%. ## Quick Start @@ -27,7 +27,9 @@ Choose your language and get started in minutes: }); console.log(response.text); - // "The capital of France is Paris." 
+ if (response.compression) { + console.log(`Tokens saved: ${response.compression.saved_tokens}`); + } ``` @@ -47,7 +49,8 @@ Choose your language and get started in minutes: ) print(response.text) - # "The capital of France is Paris." + if response.compression: + print(f"Tokens saved: {response.compression.saved_tokens}") ``` @@ -74,7 +77,9 @@ Choose your language and get started in minutes: } fmt.Println(response.Text()) - // "The capital of France is Paris." + if response.Compression != nil { + fmt.Printf("Tokens saved: %d\n", response.Compression.SavedTokens) + } } ``` @@ -91,7 +96,9 @@ Choose your language and get started in minutes: let response = client.send("gpt-4o", "What is the capital of France?").await.unwrap(); println!("{}", response.text().unwrap_or("")); - // "The capital of France is Paris." + if let Some(compression) = &response.compression { + println!("Tokens saved: {}", compression.saved_tokens); + } ``` @@ -100,12 +107,14 @@ Choose your language and get started in minutes: All SDKs provide consistent functionality: +- **Token Compression**: Automatic prompt compression with savings reporting +- **Compression Metrics**: Real-time token savings and compression rate for every request - **OpenAI-compatible API**: Use familiar patterns across all languages - **Function Calling**: Full support for tool/function calling - **Type Safety**: Strong typing and autocomplete support - **Error Handling**: Comprehensive error handling and validation - **Environment Variables**: Support for `EDGEE_API_KEY` -- **Token Usage**: Access to prompt, completion, and total token counts +- **Token Usage**: Access to prompt, completion, saved tokens, and compression ratios To learn more about the SDKs, see the individual SDK pages: diff --git a/sdk/python/send.mdx b/sdk/python/send.mdx index 20c71cd..9e28595 100644 --- a/sdk/python/send.mdx +++ b/sdk/python/send.mdx @@ -42,6 +42,8 @@ When `input` is an `InputObject` or dictionary, you have full control over the c | `tools` | `list[dict] \| None` | Array of function tools available to the model | | `tool_choice` | `str \| dict \| None` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/python/tools) for details | | `tags` | `list[str] \| None` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `enable_compression` | `bool` | Enable token compression for this request. If `true`, the request will be compressed to the compression rate specified in the API key settings. If `false`, the request will not be compressed. | +| `compression_rate` | `float` | The compression rate to use for this request. If `enable_compression` is `true`, this value will be used to compress the request. The value should be between 0.0 and 1.0. The default value is 0.75. 
| **Example with Dictionary Input:** @@ -124,6 +126,7 @@ The `send()` method returns a `SendResponse` object when `stream=False` (default |----------|------|-------------| | `choices` | `list[Choice]` | Array of completion choices (typically one) | | `usage` | `Usage \| None` | Token usage information (if provided by the API) | +| `compression` | `Compression \| None` | Token compression metrics (if compression was applied) | ### Choice Object @@ -165,7 +168,7 @@ Token usage information (when available): | Property | Type | Description | |----------|------|-------------| -| `prompt_tokens` | `int` | Number of tokens in the prompt | +| `prompt_tokens` | `int` | Number of tokens in the prompt (after compression if applied) | | `completion_tokens` | `int` | Number of tokens in the completion | | `total_tokens` | `int` | Total tokens used (prompt + completion) | @@ -183,6 +186,40 @@ if response.usage: print(f"Total tokens: {response.usage.total_tokens}") ``` +### Compression Object + +Token compression metrics (when compression is applied): + +| Property | Type | Description | +|----------|------|-------------| +| `input_tokens` | `int` | Original number of input tokens before compression | +| `saved_tokens` | `int` | Number of tokens saved by compression | +| `rate` | `float` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression | + +**Example - Accessing Compression Metrics:** + +```python +response = edgee.send( + model="gpt-4o", + input={ + "messages": [ + {"role": "user", "content": "Analyze this long document with lots of context..."} + ], + "enable_compression": True, + "compression_rate": 0.8 + } +) + +if response.compression: + print(f"Original input tokens: {response.compression.input_tokens}") + print(f"Tokens saved: {response.compression.saved_tokens}") + print(f"Compression rate: {response.compression.rate * 100:.1f}%") +``` + + + The `compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression. + + ## Convenience Properties The `SendResponse` class provides convenience properties for easier access: diff --git a/sdk/python/stream.mdx b/sdk/python/stream.mdx index 752b2b4..ec7f3d6 100644 --- a/sdk/python/stream.mdx +++ b/sdk/python/stream.mdx @@ -40,6 +40,8 @@ When `input` is an `InputObject` or dictionary, you have full control over the c | `tools` | `list[dict] \| None` | Array of function tools available to the model | | `tool_choice` | `str \| dict \| None` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/python/tools) for details | | `tags` | `list[str] \| None` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `enable_compression` | `bool` | Enable token compression for this request. If `true`, the request will be compressed to the compression rate specified in the API key settings. If `false`, the request will not be compressed. | +| `compression_rate` | `float` | The compression rate to use for this request. If `enable_compression` is `true`, this value will be used to compress the request. The value should be between 0.0 and 1.0. The default value is 0.75. | For details about `Message` type, see the [Send Method documentation](/sdk/python/send#message-object). For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/python/tools). 
@@ -68,6 +70,7 @@ Each chunk yielded by the generator has the following structure: | Property | Type | Description | |----------|------|-------------| | `choices` | `list[StreamChoice]` | Array of streaming choices (typically one) | +| `compression` | `Compression \| None` | Token compression metrics (if compression was applied) | ### StreamChoice Object diff --git a/sdk/rust/send.mdx b/sdk/rust/send.mdx index 703c9e1..99f7ad4 100644 --- a/sdk/rust/send.mdx +++ b/sdk/rust/send.mdx @@ -57,6 +57,8 @@ When `input` is an `InputObject`, you have full control over the conversation: | `tools` | `Option>` | Array of function tools available to the model | | `tool_choice` | `Option` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/rust/tools) for details | | `tags` | `Option>` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `enable_compression` | `Option` | Enable token compression for this request (overrides console settings). If not set, uses the configuration from your API key or organization settings | +| `compression_rate` | `Option` | Target compression rate (0.0-1.0, default 0.75). Only used if compression is enabled. Higher values attempt more aggressive compression | **Example with InputObject:** @@ -142,6 +144,7 @@ The `send()` method returns a `Result`. On success, it contains: | `model` | `String` | Model identifier used for the completion | | `choices` | `Vec` | Array of completion choices (typically one) | | `usage` | `Option` | Token usage information (if provided by the API) | +| `compression` | `Option` | Token compression metrics (if compression was applied) | ### Choice Object @@ -181,7 +184,7 @@ Token usage information (when available): | Property | Type | Description | |----------|------|-------------| -| `prompt_tokens` | `u32` | Number of tokens in the prompt | +| `prompt_tokens` | `u32` | Number of tokens in the prompt (after compression if applied) | | `completion_tokens` | `u32` | Number of tokens in the completion | | `total_tokens` | `u32` | Total tokens used (prompt + completion) | @@ -197,6 +200,39 @@ if let Some(usage) = &response.usage { } ``` +### Compression Object + +Token compression metrics (when compression is applied): + +| Property | Type | Description | +|----------|------|-------------| +| `input_tokens` | `u32` | Original number of input tokens before compression | +| `saved_tokens` | `u32` | Number of tokens saved by compression | +| `rate` | `f64` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression | + +**Example - Accessing Compression Metrics:** + +```rust +let input = InputObject::new(vec![ + Message::user("Analyze this long document with lots of context...") +]) +.with_enable_compression(true) +.with_compression_rate(0.8); // Target 80% compression + +let response = client.send("gpt-4o", input).await?; +println!("{}", response.text().unwrap_or("")); + +if let Some(compression) = &response.compression { + println!("Original input tokens: {}", compression.input_tokens); + println!("Tokens saved: {}", compression.saved_tokens); + println!("Compression rate: {:.1}%", compression.rate * 100.0); +} +``` + + + The `compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression. 
+ + ## Convenience Methods The `SendResponse` struct provides convenience methods for easier access: diff --git a/sdk/rust/stream.mdx b/sdk/rust/stream.mdx index f4a1ba1..4581761 100644 --- a/sdk/rust/stream.mdx +++ b/sdk/rust/stream.mdx @@ -55,6 +55,8 @@ When `input` is a `Vec` or `InputObject`, you have full control over th | `tools` | `Option>` | Array of function tools available to the model | | `tool_choice` | `Option` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/rust/tools) for details | | `tags` | `Option>` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `enable_compression` | `Option` | Enable token compression for this request (overrides console settings). If not set, uses the configuration from your API key or organization settings | +| `compression_rate` | `Option` | Target compression rate (0.0-1.0, default 0.75). Only used if compression is enabled. Higher values attempt more aggressive compression | For details about `Message` type, see the [Send Method documentation](/sdk/rust/send#message-object). For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/rust/tools). @@ -96,6 +98,7 @@ Each chunk yielded by the stream has the following structure: | `created` | `u64` | Unix timestamp of when the chunk was created | | `model` | `String` | Model identifier used for the completion | | `choices` | `Vec` | Array of streaming choices (typically one) | +| `compression` | `Option` | Token compression metrics (if compression was applied) | ### StreamChoice Object diff --git a/sdk/typescript/send.mdx b/sdk/typescript/send.mdx index f3a04f8..40cdb90 100644 --- a/sdk/typescript/send.mdx +++ b/sdk/typescript/send.mdx @@ -45,6 +45,8 @@ When `input` is an `InputObject`, you have full control over the conversation: | `tools` | `Tool[]` | Array of function tools available to the model | | `tool_choice` | `ToolChoice` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/typescript/tools) for details | | `tags` | `string[]` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `enable_compression` | `boolean` | Enable token compression for this request. If `true`, the request will be compressed to the compression rate specified in the API key settings. If `false`, the request will not be compressed. | +| `compression_rate` | `number` | The compression rate to use for this request. If `enable_compression` is `true`, this value will be used to compress the request. The value should be between 0.0 and 1.0. The default value is 0.75.
| **Example with InputObject:** @@ -127,6 +129,7 @@ The `send()` method returns a `Promise` with the following structu |----------|------|-------------| | `choices` | `Choice[]` | Array of completion choices (typically one) | | `usage` | `Usage \| undefined` | Token usage information (if provided by the API) | +| `compression` | `Compression \| undefined` | Token compression metrics (if compression was applied) | ### Choice Object @@ -169,7 +172,7 @@ Token usage information (when available): | Property | Type | Description | |----------|------|-------------| -| `prompt_tokens` | `number` | Number of tokens in the prompt | +| `prompt_tokens` | `number` | Number of tokens in the prompt (after compression if applied) | | `completion_tokens` | `number` | Number of tokens in the completion | | `total_tokens` | `number` | Total tokens used (prompt + completion) | @@ -188,6 +191,41 @@ if (response.usage) { } } +### Compression Object + +Token compression metrics (when compression is applied): + +| Property | Type | Description | +|----------|------|-------------| +| `input_tokens` | `number` | Original number of input tokens before compression | +| `saved_tokens` | `number` | Number of tokens saved by compression | +| `rate` | `number` | Compression rate as a decimal (0-1). For example, `0.61` means 61% compression | + +**Example - Accessing Compression Metrics:** + +```typescript +const response = await edgee.send({ + model: 'gpt-4o', + input: { + messages: [ + { role: 'user', content: 'Analyze this long document with lots of context...' } + ], + enable_compression: true, + compression_rate: 0.8 + } +}); + +if (response.compression) { + console.log(`Original input tokens: ${response.compression.input_tokens}`); + console.log(`Tokens saved: ${response.compression.saved_tokens}`); + console.log(`Compression rate: ${(response.compression.rate * 100).toFixed(1)}%`); +} +``` + + + The `compression` object is only present when token compression is applied to the request. Simple queries may not trigger compression. + + ## Convenience Properties The `SendResponse` class provides convenience getters for easier access: diff --git a/sdk/typescript/stream.mdx b/sdk/typescript/stream.mdx index 8c925de..5361802 100644 --- a/sdk/typescript/stream.mdx +++ b/sdk/typescript/stream.mdx @@ -45,6 +45,8 @@ When `input` is an `InputObject`, you have full control over the conversation: | `tools` | `Tool[]` | Array of function tools available to the model | | `tool_choice` | `ToolChoice` | Controls which tool (if any) the model should call. See [Tools documentation](/sdk/typescript/tools) for details | | `tags` | `string[]` | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the `x-edgee-tags` header (comma-separated) | +| `enable_compression` | `boolean` | Enable token compression for this request. If `true`, the request will be compressed to the compression rate specified in the API key settings. If `false`, the request will not be compressed. | +| `compression_rate` | `number` | The compression rate to use for this request. If `enable_compression` is `true`, this value will be used to compress the request. The value should be between 0.0 and 1.0. The default value is 0.75. | For details about `Message` type, see the [Send Method documentation](/sdk/typescript/send#message-object). For details about `Tool` and `ToolChoice` types, see the [Tools documentation](/sdk/typescript/tools). 
@@ -76,6 +78,7 @@ Each chunk yielded by the generator has the following structure: | Property | Type | Description | |----------|------|-------------| | `choices` | `StreamChoice[]` | Array of streaming choices (typically one) | +| `compression` | `Compression \| null` | Token compression metrics (if compression was applied) | ### StreamChoice Object