feat: improve usage #242
Conversation
📝 Walkthrough

Adds a unified TokenUsage model and per-provider converters; threads normalized usage through adapters, summaries, transcriptions, types, tests, and the DevTools UI to report granular token, modality, cache, reasoning, and duration metrics.
Sequence Diagram(s)

sequenceDiagram
participant Client as Client/App
participant Adapter as Provider Adapter
participant SDK as Provider SDK
participant Builder as buildXxxUsage
participant Base as buildBaseUsage
participant Stream as StreamEmitter
Client->>Adapter: request (chat/summarize/transcribe)
Adapter->>SDK: call provider API
SDK-->>Adapter: response / streaming chunks (may include provider usage)
Adapter->>Builder: buildXxxUsage(provider usage)
Builder->>Base: buildBaseUsage({promptTokens, completionTokens, totalTokens})
Base-->>Builder: core TokenUsage
Builder-->>Adapter: normalized TokenUsage (with details/provider fields)
Adapter->>Stream: emit StreamChunk / Done chunk with usage
Stream-->>Client: client receives chunk with TokenUsage
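For orientation, here is a rough sketch of the normalized TokenUsage shape the diagram refers to. Field names are collected from the review comments below; exact optionality and the concrete detail types are assumptions, and the authoritative definition lives in packages/typescript/ai/src/types.ts.

```ts
// Sketch only; not the actual definition from packages/typescript/ai/src/types.ts.
// The detail fields are modeled loosely as records; the real types are named interfaces.
export interface TokenUsage {
  promptTokens: number
  completionTokens: number
  totalTokens: number
  /** e.g. cachedTokens, cacheWriteTokens, audioTokens */
  promptTokensDetails?: Record<string, number | undefined>
  /** e.g. reasoningTokens, audioTokens, acceptedPredictionTokens, rejectedPredictionTokens, textTokens */
  completionTokensDetails?: Record<string, number | undefined>
  /** Duration-based billing, e.g. Whisper-1 transcription */
  durationSeconds?: number
  /** Provider-specific fields not covered by the standard schema */
  providerUsageDetails?: Record<string, unknown>
}
```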
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 3 failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
View your CI Pipeline Execution ↗ for commit 71ee5da
☁️ Nx Cloud last updated this comment.
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-devtools-core
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-openai/tests/model-meta.test.ts (1)
373-392: Test descriptions don't match the actual model keys. The test descriptions reference the old model names (`gpt-5.1-chat`, `gpt-5-chat`) while the type assertions use the new keys (`gpt-5.1-chat-latest`, `gpt-5-chat-latest`). Consider updating the descriptions for consistency.
📝 Suggested fix
 describe('Chat-only models WITH reasoning AND structured output but WITHOUT tools', () => {
-  it('gpt-5.1-chat should have reasoning and structured output but NOT tools', () => {
+  it('gpt-5.1-chat-latest should have reasoning and structured output but NOT tools', () => {
     type Options = OpenAIChatModelProviderOptionsByName['gpt-5.1-chat-latest']
     ...
   })
-  it('gpt-5-chat should have reasoning and structured output but NOT tools', () => {
+  it('gpt-5-chat-latest should have reasoning and structured output but NOT tools', () => {
     type Options = OpenAIChatModelProviderOptionsByName['gpt-5-chat-latest']
     ...
   })
 })
🤖 Fix all issues with AI agents
In `@examples/ts-react-chat/src/routes/api.tanchat.ts`:
- Around line 129-135: The prompt_cache_key is hardcoded in the
createChatOptions call (openai: () => createChatOptions({...}, modelOptions: {
prompt_cache_key: 'user-session-12345' })) which risks cross-user cache
collisions; change prompt_cache_key to be derived from the actual
conversation/session identifier (e.g., conversationId or request-specific
session id) when available, falling back to a safe default only if no
conversation id exists, so construct the key using the conversationId from the
request/context and replace the fixed 'user-session-12345' value in the
createChatOptions modelOptions.
In `@packages/typescript/ai-anthropic/src/adapters/text.ts`:
- Around line 50-107: The buildAnthropicUsage function leaves output_tokens
undefined when absent, which can make totalTokens become NaN; fix by defaulting
output_tokens to 0 the same way input_tokens is handled (use the same
null-coalescing pattern), update the local variable currently named outputTokens
(derived from usage.output_tokens) to default to 0, and ensure the TokenUsage
object (promptTokens, completionTokens, totalTokens) uses these safe numbers.
In `@packages/typescript/ai-ollama/src/adapters/text.ts`:
- Around line 22-73: In buildOllamaUsage, avoid treating missing token counts as
numbers and don't early-return when counts are 0 so provider duration details
aren't lost: read raw counts into e.g. promptTokensRaw =
response.prompt_eval_count and completionTokensRaw = response.eval_count,
determine if both raw counts are undefined and no provider durations exist then
return undefined; otherwise coalesce for arithmetic (promptTokens =
promptTokensRaw ?? 0, completionTokens = completionTokensRaw ?? 0), compute
totalTokens = promptTokens + completionTokens, and keep the existing
providerDetails logic so duration fields are preserved even when token counts
are zero.
In `@packages/typescript/ai-openai/src/adapters/text.ts`:
- Around line 40-75: In buildOpenAIUsage, guard optional nested fields
(usage.input_tokens_details and usage.output_tokens_details) before accessing
their properties to avoid runtime errors during streaming; replace the use of ||
with the nullish coalescing operator (??) when setting promptTokens,
completionTokens, and totalTokens so 0 is preserved, and if usage.total_tokens
is missing, compute totalTokens as (promptTokens ?? 0) + (completionTokens ??
0); when adding promptTokensDetails and completionTokensDetails check that
details and outputDetails exist (e.g., usage.input_tokens_details &&
usage.input_tokens_details.cached_tokens > 0) and only then spread into
result.promptTokensDetails/result.completionTokensDetails. A sketch of this guard pattern appears right after this list.
In `@packages/typescript/ai-openai/src/adapters/transcription.ts`:
- Around line 18-64: The buildTranscriptionUsage function incorrectly falls
through to duration-based billing for GPT-4o models when response.usage is
missing; update buildTranscriptionUsage (check model, duration, response) so
that if model.startsWith('gpt-4o') and the extracted usage is falsy, the
function returns undefined immediately (keep the existing return when usage
exists), preventing GPT-4o from ever using the Whisper-1 duration path.
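As a reference for the OpenAI text.ts item above, here is a minimal sketch of the guard-and-default pattern it describes. The snake_case field names are taken from the review text, the helper name is hypothetical, and the actual buildOpenAIUsage in this PR may differ.

```ts
import type { TokenUsage } from '@tanstack/ai'

// Hypothetical helper illustrating the suggested guards; not the PR's actual code.
function normalizeOpenAIUsage(usage?: {
  input_tokens?: number
  output_tokens?: number
  total_tokens?: number
  input_tokens_details?: { cached_tokens?: number }
  output_tokens_details?: { reasoning_tokens?: number }
}): TokenUsage | undefined {
  if (!usage) return undefined

  // Use ?? (not ||) so an explicit 0 from the API is preserved.
  const promptTokens = usage.input_tokens ?? 0
  const completionTokens = usage.output_tokens ?? 0

  const result: TokenUsage = {
    promptTokens,
    completionTokens,
    totalTokens: usage.total_tokens ?? promptTokens + completionTokens,
  }

  // Guard optional nested detail objects before reading them; streaming chunks may omit them.
  const cached = usage.input_tokens_details?.cached_tokens
  if (cached !== undefined && cached > 0) {
    result.promptTokensDetails = { cachedTokens: cached }
  }

  const reasoning = usage.output_tokens_details?.reasoning_tokens
  if (reasoning !== undefined && reasoning > 0) {
    result.completionTokensDetails = { reasoningTokens: reasoning }
  }

  return result
}
```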
🧹 Nitpick comments (4)
packages/typescript/ai-grok/src/adapters/text.ts (1)
54-63: Consider whether zero values should be included in token details. The condition `> 0` filters out zero values, but if the API explicitly returns `reasoning_tokens: 0`, this information is lost. If the intent is to only include non-zero values for cleaner output, this is fine. However, if tracking that reasoning was attempted but produced zero tokens is valuable, consider using `!== undefined` instead. This appears intentional for cleaner output, so approving as-is if that's the design choice.
packages/typescript/ai-openrouter/src/adapters/text.ts (1)
43-77: Well-structured usage normalization helper. The function correctly handles undefined input and provides sensible defaults for required fields.
One minor note: if `usage.completionTokensDetails` exists but all its fields are falsy (e.g., `{ reasoningTokens: 0 }`), the result will have an empty `completionTokensDetails: {}` object rather than omitting the field entirely. This is unlikely to cause issues in practice, but if you want strict parity with omitting the field when there are no meaningful details:
♻️ Optional: Only set completionTokensDetails if non-empty
  // Map completion tokens details (passthrough from SDK)
  if (usage.completionTokensDetails) {
    const details = usage.completionTokensDetails
-   result.completionTokensDetails = {
+   const completionDetails: TokenUsage['completionTokensDetails'] = {
      ...(details.reasoningTokens ? { reasoningTokens: details.reasoningTokens } : {}),
      ...(details.audioTokens ? { audioTokens: details.audioTokens } : {}),
      ...(details.acceptedPredictionTokens ? { acceptedPredictionTokens: details.acceptedPredictionTokens } : {}),
      ...(details.rejectedPredictionTokens ? { rejectedPredictionTokens: details.rejectedPredictionTokens } : {}),
    }
+   if (Object.keys(completionDetails).length > 0) {
+     result.completionTokensDetails = completionDetails
+   }
  }
docs/protocol/chunk-definitions.md (1)
395-401: Consider documenting `durationSeconds` in the notes.
It's defined in the schema but not surfaced in the "Token Usage Notes" bullets, so readers may miss it.
📝 Suggested doc tweak
- - `providerUsageDetails` - Provider-specific fields not in the standard schema
- - For Gemini, modality-specific token counts (audio, video, image, text) are extracted from the response
+ - `durationSeconds` - Duration-based billing (e.g., transcription or other time-billed models)
+ - `providerUsageDetails` - Provider-specific fields not in the standard schema
+ - For Gemini, modality-specific token counts (audio, video, image, text) are extracted from the response
packages/typescript/ai-gemini/tests/gemini-adapter.test.ts (1)
115-117: Optional: centralize mock payload extraction to avoid repeated `as any`.
A small helper can keep these casts in one place and improve readability.
Also applies to: 222-224, 298-300, 340-342
  openai: () =>
    createChatOptions({
      adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
      temperature: 2,
-     modelOptions: {},
+     modelOptions: {
+       prompt_cache_key: 'user-session-12345',
+       prompt_cache_retention: '24h',
+     },
🧩 Analysis chain
🏁 Script executed:
# First, examine the file to understand the context
cat -n examples/ts-react-chat/src/routes/api.tanchat.ts | head -150
Repository: TanStack/ai
Length of output: 6304
🏁 Script executed:
# Check if conversationId is available in scope at the lines in question
cat -n examples/ts-react-chat/src/routes/api.tanchat.ts | sed -n '70,140p'
Repository: TanStack/ai
Length of output: 3031
🏁 Script executed:
# Search for openai adapter imports and type definitions
rg -A 5 -B 5 "openaiText\|createChatOptions" examples/ts-react-chat/src/routes/api.tanchat.tsRepository: TanStack/ai
Length of output: 37
🏁 Script executed:
# Look for `@tanstack/ai-openai` package structure and adapter options
find . -name "*.ts" -o -name "*.tsx" | xargs rg -l "@tanstack/ai-openai" | head -5Repository: TanStack/ai
Length of output: 269
🌐 Web query:
OpenAI prompt caching API documentation prompt_cache_key prompt_cache_retention
💡 Result:
Summary — prompt_cache_key and prompt_cache_retention (OpenAI Prompt Caching)
- What they are:
- prompt_cache_key: optional string you send with a request to influence cache routing and improve cache-hit rates for prompts that share the same prefix. Use it consistently for requests that share the same static prefix. [1][2]
- prompt_cache_retention: optional setting controlling how long cached prompt prefixes are retained. Allowed values: "in_memory" (default) and "24h" (extended retention). Setting "24h" enables extended caching (up to 24 hours) on supported models. [1][3]
- Effects / behavior:
- Caching is only possible for prompts ≥ 1024 tokens; cached prefix routing/hash typically uses the first ~256 tokens (model-dependent). Cached tokens are reported in response. [1]
- prompt_cache_key is combined with the prefix hash; keep each unique prefix+key below ~15 requests/min to avoid overflow. [1]
- in_memory retention: cached prefixes typically expire after 5–10 minutes of inactivity (max ~1 hour) and are stored in volatile GPU memory. [1]
- 24h (extended) retention: available on certain models (e.g., gpt-5.1, gpt-5.2, gpt-4.1, gpt-5 family variants listed in docs) and can keep cached prefixes up to 24 hours by offloading key/value tensors to GPU-local storage. Extended caching may affect Zero Data Retention eligibility. [1]
- How to set:
- Include prompt_cache_key and/or prompt_cache_retention in your Requests.create or chat.completions.create payload. Example:
{"model":"gpt-5.1","input":"...","prompt_cache_key":"my-key","prompt_cache_retention":"24h"}. [1][3]
- Notes / best practices:
- Put static content (instructions/examples) at the beginning of prompts and dynamic content at the end for better prefix reuse. Monitor cached_tokens in response usage to measure cache hits. [1]
Sources:
[1] OpenAI Platform docs — Prompt Caching (platform.openai.com/docs/guides/prompt-caching).
[2] OpenAI API reference — Chat/Chat-completions object (platform.openai.com/docs/api-reference/chat/object).
[3] OpenAI API reference — Responses object (platform.openai.com/docs/api-reference/responses).
Avoid a hardcoded prompt cache key (cross‑user cache collision risk).
A fixed prompt_cache_key causes all users' prompts to share the same cache routing, risking data mixing. Use conversationId when available to scope the cache key per user/session.
🛠️ Suggested change
openai: () =>
createChatOptions({
adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
- modelOptions: {
- prompt_cache_key: 'user-session-12345',
- prompt_cache_retention: '24h',
- },
+ modelOptions: conversationId
+ ? {
+ prompt_cache_key: conversationId,
+ prompt_cache_retention: '24h',
+ }
+ : {},
  }),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- openai: () =>
-   createChatOptions({
-     adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
-     temperature: 2,
-     modelOptions: {},
-     modelOptions: {
-       prompt_cache_key: 'user-session-12345',
-       prompt_cache_retention: '24h',
-     },
+ openai: () =>
+   createChatOptions({
+     adapter: openaiText((model || 'gpt-4o') as 'gpt-4o'),
+     modelOptions: conversationId
+       ? {
+           prompt_cache_key: conversationId,
+           prompt_cache_retention: '24h',
+         }
+       : {},
+   }),
🤖 Prompt for AI Agents
In `@examples/ts-react-chat/src/routes/api.tanchat.ts` around lines 129 - 135, The
prompt_cache_key is hardcoded in the createChatOptions call (openai: () =>
createChatOptions({...}, modelOptions: { prompt_cache_key: 'user-session-12345'
})) which risks cross-user cache collisions; change prompt_cache_key to be
derived from the actual conversation/session identifier (e.g., conversationId or
request-specific session id) when available, falling back to a safe default only
if no conversation id exists, so construct the key using the conversationId from
the request/context and replace the fixed 'user-session-12345' value in the
createChatOptions modelOptions.
import type { StreamChunk, TextOptions, TokenUsage, Tool } from '@tanstack/ai'
import type { OllamaProviderUsageDetails } from '../usage-types'

/**
 * Build normalized TokenUsage from Ollama's ChatResponse
 */
function buildOllamaUsage(response: ChatResponse): TokenUsage | undefined {
  // Ollama provides prompt_eval_count and eval_count
  const promptTokens = response.prompt_eval_count
  const completionTokens = response.eval_count

  // If no token counts are available, return undefined
  if (promptTokens === 0 && completionTokens === 0) {
    return undefined
  }

  const result: TokenUsage = {
    promptTokens,
    completionTokens,
    totalTokens: promptTokens + completionTokens,
  }

  // Add provider-specific duration details
  const providerDetails: OllamaProviderUsageDetails = {}
  let hasProviderDetails = false

  if (response.load_duration > 0) {
    providerDetails.loadDuration = response.load_duration
    hasProviderDetails = true
  }

  if (response.prompt_eval_duration > 0) {
    providerDetails.promptEvalDuration = response.prompt_eval_duration
    hasProviderDetails = true
  }

  if (response.eval_duration > 0) {
    providerDetails.evalDuration = response.eval_duration
    hasProviderDetails = true
  }

  if (response.total_duration > 0) {
    providerDetails.totalDuration = response.total_duration
    hasProviderDetails = true
  }

  if (hasProviderDetails) {
    result.providerUsageDetails = providerDetails
  }

  return result
}
🧩 Analysis chain
🏁 Script executed:
fd -t f "text.ts" packages/typescript/ai-ollama/src/adapters/ -x cat -n {}Repository: TanStack/ai
Length of output: 16578
🏁 Script executed:
rg "ChatResponse" packages/typescript/ai-ollama/ -A 5 -B 2Repository: TanStack/ai
Length of output: 2134
🏁 Script executed:
rg "prompt_eval_count|eval_count" packages/typescript/ai-ollama/ -B 2 -A 2Repository: TanStack/ai
Length of output: 699
🏁 Script executed:
rg "buildOllamaUsage" packages/typescript/ai-ollama/ -B 2 -A 5Repository: TanStack/ai
Length of output: 2700
🏁 Script executed:
find . -name "package.json" -path "*/node_modules/ollama/*" -o -name "*.d.ts" -path "*/node_modules/ollama/*" | head -20Repository: TanStack/ai
Length of output: 37
🏁 Script executed:
rg "prompt_eval_count|eval_count" --type ts --type tsx -A 2 -B 2 | grep -E "(ChatResponse|interface|type)" -A 5 -B 5Repository: TanStack/ai
Length of output: 82
🏁 Script executed:
# Check if ollama package is listed in package.json
fd package.json packages/typescript/ai-ollama -x cat {} | jq '.dependencies'
Repository: TanStack/ai
Length of output: 79
🌐 Web query:
ollama ChatResponse typescript type definition prompt_eval_count eval_count
💡 Result:
You can represent Ollama's ChatResponse (including prompt_eval_count and eval_count) in TypeScript like this:
interface ChatMessage {
role: "user" | "assistant" | "system" | string;
content?: string;
thinking?: string;
tool_calls?: any[]; // tighten if you have schema for tool calls
images?: string[];
}
interface ChatResponse {
model: string;
created_at: string; // ISO timestamp
message: ChatMessage;
done: boolean;
done_reason?: string;
total_duration?: number; // nanoseconds
load_duration?: number; // nanoseconds
prompt_eval_count?: number; // number of input tokens
prompt_eval_duration?: number; // nanoseconds
eval_count?: number; // number of output tokens
eval_duration?: number; // nanoseconds
// any additional fields returned by your endpoint
}
Notes: Ollama documents prompt_eval_count as "how many input tokens were processed" and eval_count as "how many output tokens were processed" (timings measured in nanoseconds). [1][2]
Sources:
[1] Ollama API usage / example response.
[2] Ollama chat/generate API docs (shows fields including prompt_eval_count and eval_count).
🌐 Web query:
ollama js client ChatResponse ChatRequest types github
💡 Result:
You can find the Ollama JS client TypeScript types (including ChatRequest and ChatResponse) in the official GitHub repo under src/types.ts. The repo README/API also documents chat request/response shape and usage. [1][2]
Links:
- Repository and API/docs (includes chat request/response examples): [1]
- Issue discussing ChatResponse TypeScript interface (related typing details/bugs): [2]
Sources:
[1] https://github.com/ollama/ollama-js.
[2] https://github.com/ollama/ollama-js/issues (see "ChatResponse TypeScript interface is declared wrong?").
Handle missing token counts to prevent NaN totals and preserve provider duration details.
The prompt_eval_count and eval_count fields on Ollama's ChatResponse are optional. Assigning them directly without nullish coalescing results in undefined values, which compute to NaN when added (line 41). Additionally, the early return (lines 34–36) returns undefined when both counts are 0, discarding valid provider duration details that may exist in the response.
Proposed fix
- const promptTokens = response.prompt_eval_count
- const completionTokens = response.eval_count
+ const promptTokens = response.prompt_eval_count ?? 0
+ const completionTokens = response.eval_count ?? 0
+ const hasTokenCounts = promptTokens > 0 || completionTokens > 0
- if (promptTokens === 0 && completionTokens === 0) {
- return undefined
- }
-
const result: TokenUsage = {
promptTokens,
completionTokens,
totalTokens: promptTokens + completionTokens,
}
...
if (hasProviderDetails) {
result.providerUsageDetails = providerDetails
}
-
- return result
+ if (!hasTokenCounts && !hasProviderDetails) return undefined
+ return result📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import type { StreamChunk, TextOptions, TokenUsage, Tool } from '@tanstack/ai'
import type { OllamaProviderUsageDetails } from '../usage-types'

/**
 * Build normalized TokenUsage from Ollama's ChatResponse
 */
function buildOllamaUsage(response: ChatResponse): TokenUsage | undefined {
  // Ollama provides prompt_eval_count and eval_count
  const promptTokens = response.prompt_eval_count ?? 0
  const completionTokens = response.eval_count ?? 0
  const hasTokenCounts = promptTokens > 0 || completionTokens > 0

  const result: TokenUsage = {
    promptTokens,
    completionTokens,
    totalTokens: promptTokens + completionTokens,
  }

  // Add provider-specific duration details
  const providerDetails: OllamaProviderUsageDetails = {}
  let hasProviderDetails = false

  if (response.load_duration > 0) {
    providerDetails.loadDuration = response.load_duration
    hasProviderDetails = true
  }

  if (response.prompt_eval_duration > 0) {
    providerDetails.promptEvalDuration = response.prompt_eval_duration
    hasProviderDetails = true
  }

  if (response.eval_duration > 0) {
    providerDetails.evalDuration = response.eval_duration
    hasProviderDetails = true
  }

  if (response.total_duration > 0) {
    providerDetails.totalDuration = response.total_duration
    hasProviderDetails = true
  }

  if (hasProviderDetails) {
    result.providerUsageDetails = providerDetails
  }

  if (!hasTokenCounts && !hasProviderDetails) return undefined
  return result
}
🤖 Prompt for AI Agents
In `@packages/typescript/ai-ollama/src/adapters/text.ts` around lines 22 - 73, In
buildOllamaUsage, avoid treating missing token counts as numbers and don't
early-return when counts are 0 so provider duration details aren't lost: read
raw counts into e.g. promptTokensRaw = response.prompt_eval_count and
completionTokensRaw = response.eval_count, determine if both raw counts are
undefined and no provider durations exist then return undefined; otherwise
coalesce for arithmetic (promptTokens = promptTokensRaw ?? 0, completionTokens =
completionTokensRaw ?? 0), compute totalTokens = promptTokens +
completionTokens, and keep the existing providerDetails logic so duration fields
are preserved even when token counts are zero.
/**
 * Build TokenUsage from transcription response.
 * Whisper-1 uses duration-based billing, GPT-4o models use token-based billing.
 */
function buildTranscriptionUsage(
  model: string,
  duration?: number,
  response?: OpenAI_SDK.Audio.TranscriptionCreateResponse,
): TokenUsage | undefined {
  // GPT-4o transcription models return usage with tokens
  if (model.startsWith('gpt-4o')) {
    // Check if response has usage field (GPT-4o models may include this)
    const usage = response?.usage as
      | {
          prompt_tokens?: number
          completion_tokens?: number
          total_tokens?: number
        }
      | undefined

    if (usage) {
      return {
        promptTokens: usage.prompt_tokens ?? 0,
        completionTokens: usage.completion_tokens ?? 0,
        totalTokens: usage.total_tokens ?? 0,
        promptTokensDetails: {
          audioTokens: usage.prompt_tokens,
        },
        completionTokensDetails: {
          textTokens: usage.completion_tokens,
        },
      }
    }
  }

  // Whisper-1 uses duration-based billing
  if (duration !== undefined && duration > 0) {
    return {
      promptTokens: 0,
      completionTokens: 0,
      totalTokens: 0,
      durationSeconds: duration,
    }
  }

  return undefined
}
🧩 Analysis chain
🏁 Script executed:
cat -n packages/typescript/ai-openai/src/adapters/transcription.ts
Repository: TanStack/ai
Length of output: 10826
🏁 Script executed:
# Look for tests related to transcription
find . -type f -name "*.test.*" -o -name "*.spec.*" | grep -i transcription
# Also search for any documentation or comments about GPT-4o usage behavior
rg -i "gpt-4o.*transcri|usage.*gpt-4o" --type ts --type tsx -A 3 -B 3Repository: TanStack/ai
Length of output: 82
🏁 Script executed:
# Look for tests related to transcription
find . -type f \( -name "*.test.ts" -o -name "*.spec.ts" -o -name "*.test.js" -o -name "*.spec.js" \) | grep -i transcription
# Also search for any documentation or comments about GPT-4o usage behavior
rg "gpt-4o.*transcri|usage.*gpt-4o" -A 3 -B 3Repository: TanStack/ai
Length of output: 11224
🏁 Script executed:
# Find OpenAI type definitions or SDK usage information
rg "TranscriptionCreateResponse" -A 10 -B 5
# Also check if there are any comments or documentation about usage fields
rg "usage" packages/typescript/ai-openai/src/adapters/transcription.ts -B 3 -A 3Repository: TanStack/ai
Length of output: 2817
🏁 Script executed:
# Search for tests related to buildTranscriptionUsage
rg "buildTranscriptionUsage" -A 5 -B 5
# Also check if there are integration tests
find . -type f -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "transcription" 2>/dev/null | head -5
# Check for any OpenAI API documentation or comments about GPT-4o usage always being present
rg "usage.*gpt-4o|gpt-4o.*usage" -iRepository: TanStack/ai
Length of output: 4672
🏁 Script executed:
# Check package.json to see which OpenAI SDK version is used
rg "openai" packages/typescript/ai-openai/package.json
# Look for any documentation or comments about the response types
rg "Audio\.Transcription" -A 10 -B 5 | head -50Repository: TanStack/ai
Length of output: 4564
🏁 Script executed:
# Look for test fixtures or examples showing transcription responses
find . -type f \( -name "*.test.ts" -o -name "*.json" \) | xargs grep -l "gpt-4o.*transcri\|transcri.*gpt-4o" 2>/dev/null | head -5
# Check for any hardcoded response examples or mocks
rg "usage.*prompt_tokens\|prompt_tokens.*usage" -B 5 -A 5 | head -80Repository: TanStack/ai
Length of output: 37
🏁 Script executed:
# Check the node_modules or package info to understand the OpenAI SDK better
# Let's also look for any comments or documentation in the codebase about API behavior
rg "TranscriptionCreateResponse\|transcription.*response" --type ts -B 5 -A 15 | grep -A 15 -B 5 "usage"
# Check if there's any documentation about the OpenAI API response structure
find . -name "README*" -o -name "*.md" | xargs grep -i "usage\|transcription" 2>/dev/null | grep -i "gpt-4o\|usage" | head -20Repository: TanStack/ai
Length of output: 1240
🏁 Script executed:
# Let's trace through the usage path more carefully
rg "buildTranscriptionUsage" -B 5 -A 15 packages/typescript/ai-openai/src/adapters/transcription.tsRepository: TanStack/ai
Length of output: 1645
Return undefined for GPT-4o when usage is missing; don't fall through to duration billing.
GPT-4o models use token-based billing and should never fall back to the Whisper-1 duration path. When a GPT-4o response omits usage, the function currently continues to the duration check instead of returning early, misapplying billing logic to a token-billed model. Add return undefined after the GPT-4o check when usage is not found.
🤖 Prompt for AI Agents
In `@packages/typescript/ai-openai/src/adapters/transcription.ts` around lines 18
- 64, The buildTranscriptionUsage function incorrectly falls through to
duration-based billing for GPT-4o models when response.usage is missing; update
buildTranscriptionUsage (check model, duration, response) so that if
model.startsWith('gpt-4o') and the extracted usage is falsy, the function
returns undefined immediately (keep the existing return when usage exists),
preventing GPT-4o from ever using the Whisper-1 duration path.
packages/typescript/ai/src/types.ts
Outdated
 * Fields are populated based on provider support.
 */
export interface PromptTokensDetails {
  /** Tokens read from cache (OpenAI, OpenRouter, Gemini) */
We don't want to be updating these lists of adapters as the number of adapters grows.
you're absolutely right! jokes aside I wanted to extract some common denominators out of all the adapters and some ended up being duplicates, i've cleaned up the types now, thank you for catching this!
/**
 * Build normalized TokenUsage from Anthropic's usage object
 */
function buildAnthropicUsage(
Every one of the providers now has one or more of these functions. It would be good to see if we could extract common functionality. Also, let's consider creating a usage.ts file in each adapter that has the calculation functions. That keeps them in a known spot with a known function, and keeps the line count down in the action handlers.
done, also added tests for each adapter for every usage case
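For reference, a minimal sketch of what such a shared helper could look like. The signature is an assumption based on the walkthrough's buildBaseUsage({ promptTokens, completionTokens, totalTokens }) call; the helper that actually landed in this PR may differ.

```ts
import type { TokenUsage } from '@tanstack/ai'

// Assumed shape of a shared base-usage builder; nullish defaults keep totals from becoming NaN.
export function buildBaseUsage(counts: {
  promptTokens?: number
  completionTokens?: number
  totalTokens?: number
}): TokenUsage {
  const promptTokens = counts.promptTokens ?? 0
  const completionTokens = counts.completionTokens ?? 0
  return {
    promptTokens,
    completionTokens,
    // Prefer the provider-reported total when present, otherwise derive it.
    totalTokens: counts.totalTokens ?? promptTokens + completionTokens,
  }
}
```

Each per-provider buildXxxUsage function in a usage.ts file could then layer promptTokensDetails, completionTokensDetails, and providerUsageDetails on top of this core object.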
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-openrouter/src/adapters/summarize.ts (1)
62-81: Avoid double-normalizing usage in summarize().
`OpenRouterTextAdapter` already returns normalized `TokenUsage` in done chunks. Re-running `buildOpenRouterUsage` can strip provider-specific details (e.g., prediction tokens) because it only re-derives them from `completionTokensDetails`.
🔧 Suggested fix
- if (chunk.type === 'done' && chunk.usage) {
-   usage = buildOpenRouterUsage(chunk.usage)!
- }
+ if (chunk.type === 'done' && chunk.usage) {
+   usage = chunk.usage
+ }
🤖 Fix all issues with AI agents
In `@packages/typescript/ai-anthropic/src/adapters/summarize.ts`:
- Around line 126-130: summarizeStream() is calling the non-beta
messages.create() which yields a different event shape than summarize() and
prevents using buildAnthropicUsage/BetaMessageDeltaUsage; change
summarizeStream() to call beta.messages.create() (matching text.ts and
summarize()), then replace the current manual usage aggregation with
buildAnthropicUsage(event.usage) where you process the stream events so the
usage comes from the beta event shape; ensure event handlers expect the beta
message delta structure (BetaMessageDeltaUsage) and keep summarize() unchanged.
In `@packages/typescript/ai-gemini/src/adapters/summarize.ts`:
- Around line 111-116: The summary adapter currently always emits a usage object
by calling buildGeminiUsage(response.usageMetadata), which returns a zeroed
usage when response.usageMetadata is undefined; change the return logic in the
summarize adapter so that when response.usageMetadata is undefined the returned
object sets usage to undefined (instead of calling buildGeminiUsage), otherwise
call buildGeminiUsage(response.usageMetadata); keep the id generation via
generateId('sum') and other fields unchanged.
In `@packages/typescript/ai-gemini/src/usage.ts`:
- Around line 114-121: The buildGeminiUsage function currently defaults
totalTokens to 0 when usageMetadata.totalTokenCount is missing, causing
inconsistent totals; change the construction passed to buildBaseUsage so
totalTokens uses usageMetadata?.totalTokenCount if present, otherwise falls back
to summing usageMetadata?.promptTokenCount and
usageMetadata?.candidatesTokenCount (each defaulting to 0 if undefined), i.e.
compute totalTokens = usageMetadata?.totalTokenCount ??
((usageMetadata?.promptTokenCount ?? 0) + (usageMetadata?.candidatesTokenCount
?? 0)) and pass that into buildBaseUsage alongside promptTokens and
completionTokens. A sketch of this fallback appears right after this list.
In `@packages/typescript/ai-ollama/src/usage.ts`:
- Around line 33-46: The code reads response.prompt_eval_count and
response.eval_count directly, which can be undefined at runtime and leads to NaN
totals; update the usage calculation in the function that builds usage (the
lines using prompt_eval_count, eval_count, and buildBaseUsage) to coalesce both
values to 0 (e.g., treat undefined as 0) before the zero-check and before
computing totalTokens, so the conditional if (promptTokens === 0 &&
completionTokens === 0) and totalTokens: promptTokens + completionTokens behave
correctly.
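A small sketch of the totalTokens fallback described in the Gemini item above. The usageMetadata field names follow the review text, and the surrounding buildGeminiUsage code is omitted.

```ts
// Hypothetical excerpt; assumes usageMetadata exposes promptTokenCount,
// candidatesTokenCount, and totalTokenCount as referenced in the review.
interface GeminiUsageMetadata {
  promptTokenCount?: number
  candidatesTokenCount?: number
  totalTokenCount?: number
}

function resolveGeminiTotals(usageMetadata?: GeminiUsageMetadata) {
  const promptTokens = usageMetadata?.promptTokenCount ?? 0
  const completionTokens = usageMetadata?.candidatesTokenCount ?? 0
  return {
    promptTokens,
    completionTokens,
    // Prefer Gemini's reported total, otherwise fall back to the sum.
    totalTokens: usageMetadata?.totalTokenCount ?? promptTokens + completionTokens,
  }
}
```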
🧹 Nitpick comments (8)
packages/typescript/ai-anthropic/tests/model-meta.test.ts (1)
180-359: Consider reducing repetition with a parameterized approach. The "Detailed property type assertions" section contains significant repetition: each of the 6 properties is tested across all 9 models separately. While explicitness in test code is valuable, this could be consolidated using a parameterized helper.
♻️ Optional: Parameterized approach to reduce boilerplate
// Example helper pattern for type-level property checks
const chatModels = [
  'claude-opus-4-5',
  'claude-sonnet-4-5',
  'claude-haiku-4-5',
  'claude-opus-4-1',
  'claude-sonnet-4',
  'claude-3-7-sonnet',
  'claude-opus-4',
  'claude-3-5-haiku',
  'claude-3-haiku',
] as const

describe('all models should have base options', () => {
  it.each(chatModels)('%s should have all base properties', (model) => {
    type Options = AnthropicChatModelProviderOptionsByName[typeof model]
    expectTypeOf<Options>().toHaveProperty('container')
    expectTypeOf<Options>().toHaveProperty('context_management')
    expectTypeOf<Options>().toHaveProperty('mcp_servers')
    expectTypeOf<Options>().toHaveProperty('stop_sequences')
    expectTypeOf<Options>().toHaveProperty('tool_choice')
    expectTypeOf<Options>().toHaveProperty('top_k')
  })
})

This is optional: the current explicit approach works fine and can be easier to debug when type assertions fail.
packages/typescript/ai-grok/tsconfig.json (1)
6-6: Inconsistent path prefix in `include` patterns. The `src` patterns omit the `./` prefix while `tests` includes it. While functionally equivalent, aligning the style improves readability.
Suggested fix
- "include": ["src/**/*.ts", "src/**/*.tsx", "./tests/**/*.ts"],
+ "include": ["src/**/*.ts", "src/**/*.tsx", "tests/**/*.ts"],
packages/typescript/ai-ollama/tsconfig.json (1)
6-6: Nitpick: Inconsistent path prefix style in `include` array. Consider removing the leading `./` from `"./tests/**/*.ts"` for consistency with the other patterns.
Suggested fix
- "include": ["src/**/*.ts", "src/**/*.tsx", "./tests/**/*.ts"],
+ "include": ["src/**/*.ts", "src/**/*.tsx", "tests/**/*.ts"],
packages/typescript/ai-anthropic/src/usage.ts (2)
46-49: Consider omitting empty `promptTokensDetails`. Unlike the Grok implementation, which only adds detail objects when non-empty (see `packages/typescript/ai-grok/src/usage.ts` lines 47-52), this always assigns `promptTokensDetails` even when the object is empty (`{}`). For consistency and cleaner output, consider only adding it when there are actual details.
♻️ Suggested refactor
- result.promptTokensDetails = {
-   ...(cacheCreation ? { cacheWriteTokens: cacheCreation } : {}),
-   ...(cacheRead ? { cachedTokens: cacheRead } : {}),
- }
+ const promptTokensDetails = {
+   ...(cacheCreation ? { cacheWriteTokens: cacheCreation } : {}),
+   ...(cacheRead ? { cachedTokens: cacheRead } : {}),
+ }
+ if (Object.keys(promptTokensDetails).length > 0) {
+   result.promptTokensDetails = promptTokensDetails
+ }
54-63: Consider omitting empty `serverToolUse` in `providerUsageDetails`. The current implementation always includes `serverToolUse: {}` even when there are no server tool metrics, resulting in `{ serverToolUse: {} }`. For cleaner output and consistency with other providers, consider only adding it when metrics exist.
♻️ Suggested refactor
- result.providerUsageDetails = {
-   serverToolUse: {
-     ...(serverToolUse?.web_search_requests
-       ? { webSearchRequests: serverToolUse.web_search_requests }
-       : {}),
-     ...(serverToolUse?.web_fetch_requests
-       ? { webFetchRequests: serverToolUse.web_fetch_requests }
-       : {}),
-   },
- } satisfies AnthropicProviderUsageDetails
+ const serverToolUseDetails = {
+   ...(serverToolUse?.web_search_requests
+     ? { webSearchRequests: serverToolUse.web_search_requests }
+     : {}),
+   ...(serverToolUse?.web_fetch_requests
+     ? { webFetchRequests: serverToolUse.web_fetch_requests }
+     : {}),
+ }
+ if (Object.keys(serverToolUseDetails).length > 0) {
+   result.providerUsageDetails = {
+     serverToolUse: serverToolUseDetails,
+   } satisfies AnthropicProviderUsageDetails
+ }
packages/typescript/ai-anthropic/tests/usage-extraction.test.ts (1)
235-289: Consider adding test for undefined usage scenario. The current tests cover the happy path where `usage` is always present. Consider adding a test case for when `message_delta` has no `usage` field to verify graceful handling of missing usage data.
💡 Suggested additional test
it('handles response with missing usage gracefully', async () => {
  const mockStream = createMockStream([
    {
      type: 'message_start',
      message: {
        id: 'msg_123',
        type: 'message',
        role: 'assistant',
        content: [],
        model: 'claude-3-7-sonnet-20250219',
        usage: { input_tokens: 100, output_tokens: 0 },
      },
    },
    {
      type: 'content_block_start',
      index: 0,
      content_block: { type: 'text', text: '' },
    },
    {
      type: 'content_block_delta',
      index: 0,
      delta: { type: 'text_delta', text: 'Hello' },
    },
    {
      type: 'message_delta',
      delta: { stop_reason: 'end_turn' },
      // No usage field
    },
    { type: 'message_stop' },
  ])

  mocks.betaMessagesCreate.mockResolvedValueOnce(mockStream)

  const chunks: Array<StreamChunk> = []
  for await (const chunk of chat({
    adapter: createAdapter(),
    messages: [{ role: 'user', content: 'Hello' }],
  })) {
    chunks.push(chunk)
  }

  const doneChunk = chunks.find((c) => c.type === 'done')
  expect(doneChunk).toBeDefined()
  // Verify it doesn't crash and handles missing usage
})
packages/typescript/ai-openrouter/src/usage.ts (2)
34-36: Redundant nullish coalescing operator. The `?? undefined` on line 35 is redundant. Since the condition on line 34 already checks that `usage.promptTokensDetails` is truthy, the value is guaranteed to exist within this block.
♻️ Suggested simplification
  // Add prompt token details (passthrough from SDK)
  if (usage.promptTokensDetails) {
-   result.promptTokensDetails = usage.promptTokensDetails ?? undefined
+   result.promptTokensDetails = usage.promptTokensDetails
  }
39-61: Inconsistent handling of empty detail objects.
`completionTokensDetails` is assigned even when it results in an empty object `{}` (if neither `reasoningTokens` nor `audioTokens` are present), while `providerUsageDetails` is only assigned when non-empty (lines 58-60). Consider applying the same pattern for consistency.
♻️ Suggested fix for consistency
  // Map completion tokens details (passthrough from SDK)
  if (usage.completionTokensDetails) {
    const details = usage.completionTokensDetails
-   result.completionTokensDetails = {
+   const completionDetails = {
      ...(details.reasoningTokens ? { reasoningTokens: details.reasoningTokens } : {}),
      ...(details.audioTokens ? { audioTokens: details.audioTokens } : {}),
    }
+
+   if (Object.keys(completionDetails).length > 0) {
+     result.completionTokensDetails = completionDetails
+   }

    // Add OpenRouter-specific prediction tokens to providerUsageDetails
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/typescript/ai-anthropic/src/adapters/summarize.ts`:
- Line 2: The streaming handler in summarize.ts currently ignores message_start
usage and only builds usage from message_delta, causing prompt (input) tokens to
be lost; update the streaming logic (the message_start and message_delta
handlers in summarize.ts) to capture and cache the BetaMessageDeltaUsage from
message_start (store its input_tokens) and when processing message_delta merge
the cached input_tokens into the message_delta usage object before calling
buildAnthropicUsage(event.usage); ensure you reference the cached value by a
clear name (e.g., cachedStartUsage) and clear it after merging so subsequent
messages don’t reuse stale data.
🧹 Nitpick comments (1)
packages/typescript/ai-gemini/src/usage.ts (1)
36-42: Consider using nullish check instead of strict undefined comparison. The current check `item.tokenCount === undefined` would not catch `null` values if the API ever returns them. While this is unlikely given typical Google API patterns, using a nullish check would be more defensive.
♻️ Suggested change
  for (const item of modalities) {
-   if (!item.modality || item.tokenCount === undefined) {
+   if (!item.modality || item.tokenCount == null) {
      continue
    }
@@ -1,4 +1,5 @@
 import { BaseSummarizeAdapter } from '@tanstack/ai/adapters'
+import { buildAnthropicUsage } from '../usage'
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, explore the repository structure
find packages/typescript/ai-anthropic -type f -name "*.ts" | head -20
# Check buildAnthropicUsage implementation
echo "=== buildAnthropicUsage implementation ==="
cat -n packages/typescript/ai-anthropic/src/usage.ts
# Check imports and usage in summarize adapter
echo "=== Summarize adapter streaming handling ==="
cat -n packages/typescript/ai-anthropic/src/adapters/summarize.ts | head -150
Repository: TanStack/ai
Length of output: 166
🏁 Script executed:
#!/bin/bash
# Inspect buildAnthropicUsage signature and implementation
cat -n packages/typescript/ai-anthropic/src/usage.ts
# Check streaming event handling in summarize adapter
echo "=== Summarize adapter ==="
cat -n packages/typescript/ai-anthropic/src/adapters/summarize.ts
# Check streaming event handling in text adapter
echo "=== Text adapter ==="
cat -n packages/typescript/ai-anthropic/src/adapters/text.ts
# Look for message_start and message_delta handling
echo "=== Searching for streaming event handling ==="
rg -n "message_start|message_delta" packages/typescript/ai-anthropic/src/Repository: TanStack/ai
Length of output: 33848
Streaming usage under-reports prompt tokens; must capture and merge message_start usage data.
In streaming responses, message_start events carry the initial prompt token count while message_delta events provide only output tokens. The code currently ignores message_start (summarize.ts:95 has an empty handler) and only calls buildAnthropicUsage(event.usage) on message_delta events (summarize.ts:121, text.ts:565, text.ts:590). When BetaMessageDeltaUsage lacks input_tokens, buildAnthropicUsage() defaults to zero (usage.ts:33), causing the final usage report to omit prompt tokens entirely.
Extract and cache usage data from message_start events, then merge input tokens into the final message_delta usage before calling buildAnthropicUsage().
🤖 Prompt for AI Agents
In `@packages/typescript/ai-anthropic/src/adapters/summarize.ts` at line 2, The
streaming handler in summarize.ts currently ignores message_start usage and only
builds usage from message_delta, causing prompt (input) tokens to be lost;
update the streaming logic (the message_start and message_delta handlers in
summarize.ts) to capture and cache the BetaMessageDeltaUsage from message_start
(store its input_tokens) and when processing message_delta merge the cached
input_tokens into the message_delta usage object before calling
buildAnthropicUsage(event.usage); ensure you reference the cached value by a
clear name (e.g., cachedStartUsage) and clear it after merging so subsequent
messages don’t reuse stale data.
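A hedged sketch of the caching approach described above. The event shapes are simplified to the fields named in the review, and the returned object is what the adapter would pass to buildAnthropicUsage; the actual stream handling in summarize.ts and text.ts is more involved.

```ts
// Simplified event shapes; only the fields relevant to usage merging are modeled.
type StartEvent = { type: 'message_start'; message: { usage?: { input_tokens?: number } } }
type DeltaEvent = { type: 'message_delta'; usage?: { input_tokens?: number; output_tokens?: number } }

let cachedInputTokens: number | undefined

function mergeStreamingUsage(event: StartEvent | DeltaEvent) {
  if (event.type === 'message_start') {
    // message_start carries the prompt token count; remember it for the final delta.
    cachedInputTokens = event.message.usage?.input_tokens
    return undefined
  }
  if (!event.usage) return undefined
  const merged = {
    ...event.usage,
    // Fill in the prompt tokens that message_delta does not repeat.
    input_tokens: event.usage.input_tokens ?? cachedInputTokens ?? 0,
  }
  cachedInputTokens = undefined // avoid reusing stale data for a later message
  return merged // pass this to buildAnthropicUsage(...) in the adapter
}
```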
🎯 Changes
Add usage details that were missing
✅ Checklist
pnpm run test:pr.
🚀 Release Impact
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.