
claude/add-cost-response-header-q5ZpB #5500

Draft

chitalian wants to merge 1 commit into main from claude/add-cost-response-header-q5ZpB

Conversation

@chitalian
Contributor

When users send requests with the `Helicone-Include-Cost: true` header, the proxy will calculate the request cost synchronously and return it in the `Helicone-Cost` response header.

This allows users to get cost information directly from the LLM completion response without needing a separate API call.

Note: When this header is enabled, streaming responses will be buffered to calculate the cost before returning, which may affect latency for streaming requests.
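From the client side, the flow described above can be sketched as follows. `parseCostHeader` is a hypothetical helper (not part of any Helicone SDK), and the header value is assumed to be a plain decimal string, which is how the PR describes it:

```typescript
// Sketch: reading the Helicone-Cost header from a completion response.
// parseCostHeader is an illustrative helper, not Helicone code.
function parseCostHeader(headers: Headers): number | undefined {
  const raw = headers.get("Helicone-Cost"); // Headers.get is case-insensitive
  if (raw === null) return undefined;
  const cost = Number(raw);
  return Number.isFinite(cost) ? cost : undefined;
}

// Simulated response headers, as they might look when the request
// was sent with `Helicone-Include-Cost: true`.
const responseHeaders = new Headers({ "Helicone-Cost": "0.000042" });
console.log(parseCostHeader(responseHeaders)); // 0.000042
console.log(parseCostHeader(new Headers())); // undefined
```

Returning `undefined` rather than `NaN` for a missing or malformed header keeps the "cost unknown" case easy to branch on.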

Ticket

Link to the ticket(s) this pull request addresses.

Component/Service

What part of Helicone does this affect?

  • Web (Frontend)
  • Jawn (Backend)
  • Worker (Proxy)
  • Bifrost (Marketing)
  • AI Gateway
  • Packages
  • Infrastructure/Docker
  • Documentation

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Performance improvement
  • Refactoring

Deployment Notes

  • No special deployment steps required
  • Database migrations need to run
  • Environment variable changes required
  • Coordination with other teams needed

Screenshots / Demos

Before | After

Extra Notes

Any additional context, considerations, or notes for reviewers.

Context

Why are you making this change?

@vercel

vercel bot commented Jan 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project | Deployment | Review | Updated (UTC)
helicone | Skipped | Skipped | Jan 13, 2026 1:52am
helicone-bifrost | Skipped | Skipped | Jan 13, 2026 1:52am
helicone-eu | Skipped | Skipped | Jan 13, 2026 1:52am

@supabase

supabase bot commented Jan 13, 2026

This pull request has been ignored for the connected project bolqqmqbrciybnypvklh because no changes were detected in the supabase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@supabase

supabase bot commented Jan 13, 2026

This pull request has been ignored for the connected project lmahfbbnchpworytrrqk because no changes were detected in the .supabase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@claude
Contributor

claude bot commented Jan 13, 2026

Claude finished @chitalian's task.


PR Review Analysis

Reviewing the PR to add Helicone-Include-Cost header functionality.

Tasks:

  • Read and analyze modified files
  • Review for critical bugs and security issues
  • Check performance implications
  • Evaluate code quality and best practices
  • Provide specific suggestions and confidence score

Review Summary

Score: 7/10 - Good implementation with minor performance and error handling concerns

Key Changes:

  • Adds includeCost feature flag to HeliconeHeaders.ts (lines 29, 102, 435, 440)
  • Implements calculateCostFromResponse() function that duplicates existing cost calculation logic
  • Modifies response handling to synchronously calculate cost and add Helicone-Cost header
  • Buffers streaming responses when cost calculation is enabled
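A minimal sketch of what parsing such a feature flag might look like. The field names and the exact shape of `HeliconeHeaders.ts` are assumptions for illustration, not the PR's actual code:

```typescript
// Hypothetical feature-flag shape; the real HeliconeHeaders.ts may differ.
interface HeliconeFeatureFlags {
  includeCost: boolean; // set when Helicone-Include-Cost: true is sent
}

// Parse the flag from raw request headers.
// Header names are assumed to be pre-lowercased by the caller.
function parseFeatureFlags(headers: Map<string, string>): HeliconeFeatureFlags {
  return {
    includeCost: headers.get("helicone-include-cost") === "true",
  };
}

const flags = parseFeatureFlags(
  new Map([["helicone-include-cost", "true"]])
);
console.log(flags.includeCost); // true
```

Keeping the flag on a typed interface (rather than re-reading the raw header at each use site) matches the feature-flag approach the review praises below.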

Critical Issues Found

1. Code Duplication Risk ⚠️

Problem: The calculateCostFromResponse() function (lines 49-155) duplicates ~90% of the cost calculation logic already present in the log() function (lines 654-747).

Impact:

  • Maintenance burden - changes to cost calculation must be made in two places
  • Risk of divergence between the two implementations
  • Potential for bugs when only one implementation is updated

Suggestion: Extract the common cost calculation logic into a shared utility function:

// Create shared function in worker/src/lib/util/costCalculation.ts
async function calculateCostFromLoggable(
  loggable: DBLoggable,
  proxyRequest: HeliconeProxyRequest
): Promise<number | undefined> {
  const rawResponseResult = await loggable.readRawResponse();
  if (rawResponseResult.error !== null) {
    return undefined;
  }
  
  // Move common logic here...
}

// Then use in both places:
// In proxyForwarder for header response
const cost = await calculateCostFromLoggable(loggable, proxyRequest);

// In log function for async logging
const cost = await calculateCostFromLoggable(loggable, proxyRequest);

2. Performance Impact Not Mitigated 🚨

Problem: Lines 510-531 show that when includeCost is enabled, streaming responses lose their primary benefit - the response is completely buffered before being returned to the client.

Impact:

  • Significantly increased latency for streaming requests
  • Increased memory usage for large responses
  • Poor user experience for applications relying on streaming

Suggestion: Add clear warnings and consider alternatives:

  1. Add documentation about the performance impact
  2. Consider implementing a timeout/size limit for cost calculation
  3. Potentially return an estimated cost for streaming requests instead of an exact cost

if (includeCost) {
  // Add size/timeout limits
  const MAX_BUFFER_SIZE = 10 * 1024 * 1024; // 10MB
  const TIMEOUT_MS = 5000; // 5 seconds
  
  try {
    const cost = await Promise.race([
      calculateCostFromResponse({ loggable, proxyRequest }),
      new Promise((_, reject) => 
        setTimeout(() => reject(new Error('Cost calculation timeout')), TIMEOUT_MS)
      )
    ]);
    
    if (cost !== undefined) {
      responseBuilder.setHeader("Helicone-Cost", cost.toString());
    } else {
      responseBuilder.setHeader("Helicone-Cost-Status", "calculation-failed");
    }
  } catch (e) {
    console.error("Cost calculation failed:", e);
    responseBuilder.setHeader("Helicone-Cost-Status", "timeout-or-error");
  }
}

Minor Issues & Improvements

3. Error Handling Could Be More Granular

File: ProxyForwarder.ts:528-530

} catch (e) {
  console.error("Error calculating cost for response header:", e);
}

Suggestion: Provide more specific error information to help with debugging:

} catch (e) {
  console.error("Error calculating cost for response header:", e);
  // Optionally add header indicating cost calculation failed
  responseBuilder.setHeader("Helicone-Cost-Status", "calculation-error");
}

4. Header Name Inconsistency

Problem: The PR description documents the header as Helicone-Include-Cost: true, but the code looks it up as helicone-include-cost (lowercase).

Suggestion: HTTP header names are case-insensitive on the wire, so incoming requests will match regardless of casing; the practical fix is to pick one canonical form in the documentation (e.g. Helicone-Include-Cost) and make sure the lookup normalizes the name rather than relying on clients to send a particular casing.
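A case-tolerant lookup over raw header entries could look like the sketch below. This is a generic illustrative helper, not the worker's actual code:

```typescript
// Generic case-insensitive boolean-flag lookup over raw [name, value]
// header pairs. Illustrative only; not code from the Helicone worker.
function readBooleanHeader(
  entries: [string, string][],
  name: string
): boolean {
  const target = name.toLowerCase();
  return entries.some(
    ([key, value]) =>
      key.toLowerCase() === target && value.trim().toLowerCase() === "true"
  );
}

// Both casings resolve to the same flag.
console.log(
  readBooleanHeader([["Helicone-Include-Cost", "true"]], "helicone-include-cost")
); // true
console.log(
  readBooleanHeader([["helicone-include-cost", "TRUE"]], "Helicone-Include-Cost")
); // true
```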

5. Memory Usage Optimization

File: ProxyForwarder.ts:527

responseBody = bodyChunks.join("");

Suggestion: For large responses, consider using a more efficient concatenation method:

// More efficient for large responses. This assumes bodyChunks are the raw
// Uint8Array chunks read from the response stream (if they are strings,
// encode them first); decode to a string once at the end.
const totalLength = bodyChunks.reduce((sum, chunk) => sum + chunk.length, 0);
const buffer = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of bodyChunks) {
  buffer.set(chunk, offset);
  offset += chunk.length;
}
responseBody = new TextDecoder().decode(buffer);

Positive Aspects ✅

  1. Clean Integration: The feature flag approach is well-implemented
  2. Backwards Compatible: No breaking changes to existing functionality
  3. Consistent Error Handling: Follows existing patterns in the codebase
  4. Comprehensive Cost Calculation: Handles both AI Gateway and legacy provider flows
  5. Proper TypeScript Types: New feature flag properly typed in interfaces

Suggestions Summary:

  1. Critical: Extract duplicate cost calculation logic into shared utility function
  2. Important: Add performance safeguards (timeouts, size limits) for cost calculation
  3. Minor: Improve error handling granularity and add status headers
  4. Documentation: Clarify performance impact in API documentation

The implementation is functionally sound but would benefit from addressing the code duplication and performance concerns before merging.



@greptile-apps greptile-apps bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@chitalian chitalian marked this pull request as draft January 21, 2026 21:09
