12 changes: 12 additions & 0 deletions .changeset/kind-donuts-dream.md
@@ -0,0 +1,12 @@
---
'@tanstack/ai-openrouter': minor
'@tanstack/ai-anthropic': minor
'@tanstack/ai-gemini': minor
'@tanstack/ai-ollama': minor
'@tanstack/ai-openai': minor
'@tanstack/ai-grok': minor
'@tanstack/ai': minor
'@tanstack/ai-devtools-core': patch
---

Enhanced usage reporting for every provider
107 changes: 102 additions & 5 deletions docs/protocol/chunk-definitions.md
@@ -266,15 +266,47 @@ Emitted when the stream completes successfully.
interface DoneStreamChunk extends BaseStreamChunk {
type: 'done';
finishReason: 'stop' | 'length' | 'content_filter' | 'tool_calls' | null;
-  usage?: {
-    promptTokens: number;
-    completionTokens: number;
-    totalTokens: number;
+  usage?: TokenUsage;
}

interface TokenUsage {
// Core token counts (always present when usage is available)
promptTokens: number;
completionTokens: number;
totalTokens: number;

// Detailed prompt token breakdown
promptTokensDetails?: {
cachedTokens?: number; // Tokens from prompt cache hits
cacheWriteTokens?: number; // Tokens written to cache
cacheCreationTokens?: number; // Anthropic cache creation tokens
cacheReadTokens?: number; // Anthropic cache read tokens
audioTokens?: number; // Audio input tokens
videoTokens?: number; // Video input tokens
imageTokens?: number; // Image input tokens
textTokens?: number; // Text input tokens
};

// Detailed completion token breakdown
completionTokensDetails?: {
reasoningTokens?: number; // Reasoning/thinking tokens (o1, Claude)
audioTokens?: number; // Audio output tokens
videoTokens?: number; // Video output tokens
imageTokens?: number; // Image output tokens
textTokens?: number; // Text output tokens
acceptedPredictionTokens?: number; // Accepted prediction tokens
rejectedPredictionTokens?: number; // Rejected prediction tokens
};

// Provider-specific details
providerUsageDetails?: Record<string, unknown>;

// Duration (for some billing models)
durationSeconds?: number;
}
```

-**Example:**
+**Example (basic usage):**
```json
{
"type": "done",
@@ -290,6 +322,64 @@
}
```

**Example (with cached tokens - OpenAI):**
```json
{
"type": "done",
"id": "chatcmpl-abc123",
"model": "gpt-4o",
"timestamp": 1701234567892,
"finishReason": "stop",
"usage": {
"promptTokens": 150,
"completionTokens": 75,
"totalTokens": 225,
"promptTokensDetails": {
"cachedTokens": 100
}
}
}
```

**Example (with reasoning tokens - o1):**
```json
{
"type": "done",
"id": "chatcmpl-abc123",
"model": "o1-preview",
"timestamp": 1701234567892,
"finishReason": "stop",
"usage": {
"promptTokens": 150,
"completionTokens": 500,
"totalTokens": 650,
"completionTokensDetails": {
"reasoningTokens": 425
}
}
}
```

**Example (Anthropic with cache):**
```json
{
"type": "done",
"id": "msg_abc123",
"model": "claude-3-5-sonnet",
"timestamp": 1701234567892,
"finishReason": "stop",
"usage": {
"promptTokens": 150,
"completionTokens": 75,
"totalTokens": 225,
"promptTokensDetails": {
"cacheCreationTokens": 50,
"cacheReadTokens": 100
}
}
}
```

**Finish Reasons:**
- `stop` - Natural completion
- `length` - Reached max tokens
@@ -302,6 +392,13 @@ interface DoneStreamChunk extends BaseStreamChunk {
- Clean up streaming state
- Display token usage (if available)

**Token Usage Notes:**
- `promptTokensDetails.cachedTokens` - OpenAI prompt caching
- `promptTokensDetails.cacheCreationTokens` / `cacheReadTokens` - Anthropic caching
- `completionTokensDetails.reasoningTokens` - Internal reasoning tokens (o1, Claude thinking)
- `providerUsageDetails` - Provider-specific fields not in the standard schema
- For Gemini, modality-specific token counts (audio, video, image, text) are extracted from the response
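
For illustration, here is a minimal sketch of how a client might surface these fields when handling a `done` chunk. The `DoneStreamChunk` and `TokenUsage` shapes come from the interfaces above; the reporting helper itself is hypothetical.

```ts
// Hypothetical helper: summarize token usage from a DoneStreamChunk.
function reportUsage(chunk: DoneStreamChunk): void {
  const usage = chunk.usage;
  if (!usage) return; // usage is optional

  // Prefer OpenAI-style cachedTokens, fall back to Anthropic-style cacheReadTokens.
  const cached =
    usage.promptTokensDetails?.cachedTokens ??
    usage.promptTokensDetails?.cacheReadTokens ??
    0;
  const reasoning = usage.completionTokensDetails?.reasoningTokens ?? 0;

  console.log(
    `prompt=${usage.promptTokens} (cached=${cached}) ` +
      `completion=${usage.completionTokens} (reasoning=${reasoning}) ` +
      `total=${usage.totalTokens}`,
  );
}
```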

---

### ErrorStreamChunk
6 changes: 4 additions & 2 deletions examples/ts-react-chat/src/routes/api.tanchat.ts
@@ -129,8 +129,10 @@ export const Route = createFileRoute('/api/tanchat')({
openai: () =>
createChatOptions({
adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
-        temperature: 2,
-        modelOptions: {},
+        modelOptions: {
+          prompt_cache_key: 'user-session-12345',
+          prompt_cache_retention: '24h',
+        },
Comment on lines 129 to +135

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, examine the file to understand the context
cat -n examples/ts-react-chat/src/routes/api.tanchat.ts | head -150

Repository: TanStack/ai

Length of output: 6304


🏁 Script executed:

# Check if conversationId is available in scope at the lines in question
cat -n examples/ts-react-chat/src/routes/api.tanchat.ts | sed -n '70,140p'

Repository: TanStack/ai

Length of output: 3031


🏁 Script executed:

# Search for openai adapter imports and type definitions
rg -A 5 -B 5 "openaiText\|createChatOptions" examples/ts-react-chat/src/routes/api.tanchat.ts

Repository: TanStack/ai

Length of output: 37


🏁 Script executed:

# Look for `@tanstack/ai-openai` package structure and adapter options
find . -name "*.ts" -o -name "*.tsx" | xargs rg -l "@tanstack/ai-openai" | head -5

Repository: TanStack/ai

Length of output: 269


🌐 Web query:

OpenAI prompt caching API documentation prompt_cache_key prompt_cache_retention

πŸ’‘ Result:

Summary β€” prompt_cache_key and prompt_cache_retention (OpenAI Prompt Caching)

  • What they are:

    • prompt_cache_key: optional string you send with a request to influence cache routing and improve cache-hit rates for prompts that share the same prefix. Use it consistently for requests that share the same static prefix. [1][2]
    • prompt_cache_retention: optional setting controlling how long cached prompt prefixes are retained. Allowed values: "in_memory" (default) and "24h" (extended retention). Setting "24h" enables extended caching (up to 24 hours) on supported models. [1][3]
  • Effects / behavior:

    • Caching is only possible for prompts β‰₯ 1024 tokens; cached prefix routing/hash typically uses the first ~256 tokens (model-dependent). Cached tokens are reported in response. [1]
    • prompt_cache_key is combined with the prefix hash; keep each unique prefix+key below ~15 requests/min to avoid overflow. [1]
    • in_memory retention: cached prefixes typically expire after 5–10 minutes of inactivity (max ~1 hour) and are stored in volatile GPU memory. [1]
    • 24h (extended) retention: available on certain models (e.g., gpt-5.1, gpt-5.2, gpt-4.1, gpt-5 family variants listed in docs) and can keep cached prefixes up to 24 hours by offloading key/value tensors to GPU-local storage. Extended caching may affect Zero Data Retention eligibility. [1]
  • How to set:

    • Include prompt_cache_key and/or prompt_cache_retention in your Responses.create or chat.completions.create payload. Example:
      {"model":"gpt-5.1","input":"...","prompt_cache_key":"my-key","prompt_cache_retention":"24h"}. [1][3]
  • Notes / best practices:

    • Put static content (instructions/examples) at the beginning of prompts and dynamic content at the end for better prefix reuse. Monitor cached_tokens in response usage to measure cache hits. [1]

Sources:
[1] OpenAI Platform docs β€” Prompt Caching (platform.openai.com/docs/guides/prompt-caching).
[2] OpenAI API reference β€” Chat/Chat-completions object (platform.openai.com/docs/api-reference/chat/object).
[3] OpenAI API reference β€” Responses object (platform.openai.com/docs/api-reference/responses).
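
For concreteness, a minimal sketch of sending these parameters in a raw Chat Completions request. The field names follow the query result above; `sessionId` is a placeholder, and `prompt_cache_retention` availability depends on model and API version, so treat this as illustrative:

```ts
const sessionId = 'demo-session' // placeholder: derive from the real user/session

const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
    prompt_cache_key: `session-${sessionId}`, // scope the cache per user/session
    prompt_cache_retention: '24h', // extended retention, per the summary above
  }),
})

const json = await response.json()
// Cache hits are reported as usage.prompt_tokens_details.cached_tokens.
console.log(json.usage)
```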


Avoid a hardcoded prompt cache key (cross‑user cache collision risk).

A fixed prompt_cache_key causes all users' prompts to share the same cache routing, risking data mixing. Use conversationId when available to scope the cache key per user/session.

πŸ› οΈ Suggested change
           openai: () =>
             createChatOptions({
               adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
-              modelOptions: {
-                prompt_cache_key: 'user-session-12345',
-                prompt_cache_retention: '24h',
-              },
+              modelOptions: conversationId
+                ? {
+                    prompt_cache_key: conversationId,
+                    prompt_cache_retention: '24h',
+                  }
+                : {},
             }),
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
openai: () =>
createChatOptions({
adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
temperature: 2,
modelOptions: {},
modelOptions: {
prompt_cache_key: 'user-session-12345',
prompt_cache_retention: '24h',
},
openai: () =>
createChatOptions({
adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
modelOptions: conversationId
? {
prompt_cache_key: conversationId,
prompt_cache_retention: '24h',
}
: {},
}),
πŸ€– Prompt for AI Agents
In `@examples/ts-react-chat/src/routes/api.tanchat.ts` around lines 129 - 135, The
prompt_cache_key is hardcoded in the createChatOptions call (openai: () =>
createChatOptions({...}, modelOptions: { prompt_cache_key: 'user-session-12345'
})) which risks cross-user cache collisions; change prompt_cache_key to be
derived from the actual conversation/session identifier (e.g., conversationId or
request-specific session id) when available, falling back to a safe default only
if no conversation id exists, so construct the key using the conversationId from
the request/context and replace the fixed 'user-session-12345' value in the
createChatOptions modelOptions.

}),
}

2 changes: 1 addition & 1 deletion packages/typescript/ai-anthropic/package.json
@@ -40,7 +40,7 @@
"test:types": "tsc"
},
"dependencies": {
"@anthropic-ai/sdk": "^0.71.0"
"@anthropic-ai/sdk": "^0.71.2"
},
"peerDependencies": {
"@tanstack/ai": "workspace:^",
21 changes: 5 additions & 16 deletions packages/typescript/ai-anthropic/src/adapters/summarize.ts
@@ -1,4 +1,5 @@
import { BaseSummarizeAdapter } from '@tanstack/ai/adapters'
import { buildAnthropicUsage } from '../usage'

⚠️ Potential issue | πŸ”΄ Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, explore the repository structure
find packages/typescript/ai-anthropic -type f -name "*.ts" | head -20

# Check buildAnthropicUsage implementation
echo "=== buildAnthropicUsage implementation ==="
cat -n packages/typescript/ai-anthropic/src/usage.ts

# Check imports and usage in summarize adapter
echo "=== Summarize adapter streaming handling ==="
cat -n packages/typescript/ai-anthropic/src/adapters/summarize.ts | head -150

Repository: TanStack/ai

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Inspect buildAnthropicUsage signature and implementation
cat -n packages/typescript/ai-anthropic/src/usage.ts

# Check streaming event handling in summarize adapter
echo "=== Summarize adapter ==="
cat -n packages/typescript/ai-anthropic/src/adapters/summarize.ts

# Check streaming event handling in text adapter
echo "=== Text adapter ==="
cat -n packages/typescript/ai-anthropic/src/adapters/text.ts

# Look for message_start and message_delta handling
echo "=== Searching for streaming event handling ==="
rg -n "message_start|message_delta" packages/typescript/ai-anthropic/src/

Repository: TanStack/ai

Length of output: 33848


Streaming usage under-reports prompt tokens; must capture and merge message_start usage data.

In streaming responses, message_start events carry the initial prompt token count while message_delta events provide only output tokens. The code currently ignores message_start (summarize.ts:95 has an empty handler) and only calls buildAnthropicUsage(event.usage) on message_delta events (summarize.ts:121, text.ts:565, text.ts:590). When BetaMessageDeltaUsage lacks input_tokens, buildAnthropicUsage() defaults to zero (usage.ts:33), causing the final usage report to omit prompt tokens entirely.

Extract and cache usage data from message_start events, then merge input tokens into the final message_delta usage before calling buildAnthropicUsage().

πŸ€– Prompt for AI Agents
In `@packages/typescript/ai-anthropic/src/adapters/summarize.ts` at line 2, The
streaming handler in summarize.ts currently ignores message_start usage and only
builds usage from message_delta, causing prompt (input) tokens to be lost;
update the streaming logic (the message_start and message_delta handlers in
summarize.ts) to capture and cache the BetaMessageDeltaUsage from message_start
(store its input_tokens) and when processing message_delta merge the cached
input_tokens into the message_delta usage object before calling
buildAnthropicUsage(event.usage); ensure you reference the cached value by a
clear name (e.g., cachedStartUsage) and clear it after merging so subsequent
messages don’t reuse stale data.
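
A minimal sketch of the merge described above, using stand-in types for the SDK usage shapes. Names such as `mergeStreamingUsage` and `cachedStartUsage` are illustrative, not the adapter's actual code:

```ts
// Stand-ins for the SDK's BetaUsage / BetaMessageDeltaUsage shapes.
type StartUsage = { input_tokens: number }
type DeltaUsage = { input_tokens?: number | null; output_tokens: number }

// Carry the prompt tokens captured at message_start into the final usage report.
function mergeStreamingUsage(
  start: StartUsage | undefined,
  delta: DeltaUsage,
): DeltaUsage & { input_tokens: number } {
  return {
    ...delta,
    // Prefer the delta's own count when present, otherwise fall back to message_start.
    input_tokens: delta.input_tokens ?? start?.input_tokens ?? 0,
  }
}

// In the streaming loop (illustrative):
//   if (event.type === 'message_start') cachedStartUsage = event.message.usage
//   on message_delta:
//     usage: buildAnthropicUsage(mergeStreamingUsage(cachedStartUsage, event.usage))
```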

import {
createAnthropicClient,
generateId,
@@ -52,7 +53,7 @@ export class AnthropicSummarizeAdapter<
async summarize(options: SummarizationOptions): Promise<SummarizationResult> {
const systemPrompt = this.buildSummarizationPrompt(options)

-    const response = await this.client.messages.create({
+    const response = await this.client.beta.messages.create({
model: options.model,
messages: [{ role: 'user', content: options.text }],
system: systemPrompt,
@@ -69,11 +70,7 @@
id: response.id,
model: response.model,
summary: content,
-      usage: {
-        promptTokens: response.usage.input_tokens,
-        completionTokens: response.usage.output_tokens,
-        totalTokens: response.usage.input_tokens + response.usage.output_tokens,
-      },
+      usage: buildAnthropicUsage(response.usage),
}
}

@@ -84,10 +81,8 @@
const id = generateId(this.name)
const model = options.model
let accumulatedContent = ''
-    let inputTokens = 0
-    let outputTokens = 0

-    const stream = await this.client.messages.create({
+    const stream = await this.client.beta.messages.create({
model: options.model,
messages: [{ role: 'user', content: options.text }],
system: systemPrompt,
@@ -98,7 +93,6 @@

for await (const event of stream) {
if (event.type === 'message_start') {
-        inputTokens = event.message.usage.input_tokens
} else if (event.type === 'content_block_delta') {
if (event.delta.type === 'text_delta') {
const delta = event.delta.text
@@ -114,7 +108,6 @@
}
}
} else if (event.type === 'message_delta') {
-        outputTokens = event.usage.output_tokens
yield {
type: 'done',
id,
@@ -125,11 +118,7 @@
| 'length'
| 'content_filter'
| null,
-          usage: {
-            promptTokens: inputTokens,
-            completionTokens: outputTokens,
-            totalTokens: inputTokens + outputTokens,
-          },
+          usage: buildAnthropicUsage(event.usage),
}
}
}
20 changes: 5 additions & 15 deletions packages/typescript/ai-anthropic/src/adapters/text.ts
@@ -1,6 +1,7 @@
import { BaseTextAdapter } from '@tanstack/ai/adapters'
import { convertToolsToProviderFormat } from '../tools/tool-converter'
import { validateTextProviderOptions } from '../text/text-provider-options'
import { buildAnthropicUsage } from '../usage'
import {
createAnthropicClient,
generateId,
@@ -175,7 +176,7 @@ export class AnthropicTextAdapter<

try {
// Make non-streaming request with tool_choice forced to our structured output tool
-      const response = await this.client.messages.create(
+      const response = await this.client.beta.messages.create(
{
...requestParams,
stream: false,
@@ -222,6 +223,7 @@
return {
data: parsed,
rawText,
usage: buildAnthropicUsage(response.usage),
}
} catch (error: unknown) {
const err = error as Error
@@ -560,13 +562,7 @@
model: model,
timestamp,
finishReason: 'tool_calls',
-            usage: {
-              promptTokens: event.usage.input_tokens || 0,
-              completionTokens: event.usage.output_tokens || 0,
-              totalTokens:
-                (event.usage.input_tokens || 0) +
-                (event.usage.output_tokens || 0),
-            },
+            usage: buildAnthropicUsage(event.usage),
}
break
}
@@ -591,13 +587,7 @@
model: model,
timestamp,
finishReason: 'stop',
-            usage: {
-              promptTokens: event.usage.input_tokens || 0,
-              completionTokens: event.usage.output_tokens || 0,
-              totalTokens:
-                (event.usage.input_tokens || 0) +
-                (event.usage.output_tokens || 0),
-            },
+            usage: buildAnthropicUsage(event.usage),
}
}
}
3 changes: 3 additions & 0 deletions packages/typescript/ai-anthropic/src/index.ts
@@ -44,3 +44,6 @@ export { convertToolsToProviderFormat } from './tools/tool-converter'

// Export tool types
export type { AnthropicTool, CustomTool } from './tools'

// Export provider usage types
export type { AnthropicProviderUsageDetails } from './usage'
66 changes: 66 additions & 0 deletions packages/typescript/ai-anthropic/src/usage.ts
@@ -0,0 +1,66 @@
import { buildBaseUsage } from '@tanstack/ai'
import type { TokenUsage } from '@tanstack/ai'
import type Anthropic_SDK from '@anthropic-ai/sdk'

/**
* Anthropic-specific provider usage details.
* These fields are unique to Anthropic and placed in providerUsageDetails.
*/
export interface AnthropicProviderUsageDetails {
/**
* Server-side tool usage metrics.
* Available when using Anthropic's built-in tools like web search.
*/
serverToolUse?: {
/** Number of web search requests made during the response */
webSearchRequests?: number
/** Number of web fetch requests made during the response */
webFetchRequests?: number
}
/** Index signature for Record<string, unknown> compatibility */
[key: string]: unknown
}

/**
* Build normalized TokenUsage from Anthropic's usage object.
* Handles cache tokens and server tool use metrics.
*/
export function buildAnthropicUsage(
usage:
| Anthropic_SDK.Beta.BetaUsage
| Anthropic_SDK.Beta.BetaMessageDeltaUsage,
): TokenUsage {
const inputTokens = usage.input_tokens ?? 0
const outputTokens = usage.output_tokens

const result = buildBaseUsage({
promptTokens: inputTokens,
completionTokens: outputTokens,
totalTokens: inputTokens + outputTokens,
})

// Add prompt token details for cache tokens
const cacheCreation = usage.cache_creation_input_tokens
const cacheRead = usage.cache_read_input_tokens

result.promptTokensDetails = {
...(cacheCreation ? { cacheWriteTokens: cacheCreation } : {}),
...(cacheRead ? { cachedTokens: cacheRead } : {}),
}

// Add provider-specific usage details for server tool use
const serverToolUse = usage.server_tool_use

result.providerUsageDetails = {
serverToolUse: {
...(serverToolUse?.web_search_requests
? { webSearchRequests: serverToolUse.web_search_requests }
: {}),
...(serverToolUse?.web_fetch_requests
? { webFetchRequests: serverToolUse.web_fetch_requests }
: {}),
},
} satisfies AnthropicProviderUsageDetails

return result
}
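
As a quick illustration of the mapping, assuming `buildAnthropicUsage` is imported from this module. The input values are invented, and the exact core fields depend on `buildBaseUsage` from `@tanstack/ai`:

```ts
import { buildAnthropicUsage } from './usage' // i.e. the module shown above

// Invented values shaped like Anthropic's response.usage payload.
const normalized = buildAnthropicUsage({
  input_tokens: 150,
  output_tokens: 75,
  cache_creation_input_tokens: 50,
  cache_read_input_tokens: 100,
  server_tool_use: { web_search_requests: 2, web_fetch_requests: 0 },
} as never) // cast only because the SDK's BetaUsage declares additional fields

// Expected result, roughly:
//   promptTokens: 150, completionTokens: 75, totalTokens: 225
//   promptTokensDetails:  { cacheWriteTokens: 50, cachedTokens: 100 }
//   providerUsageDetails: { serverToolUse: { webSearchRequests: 2 } }
console.log(normalized)
```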