Cut your AI costs in half. Without cutting corners.
Cost Katana is a drop-in SDK that wraps your AI calls with automatic cost tracking, smart caching, and optimization, all in one line of code.
```bash
npm install cost-katana
```

```typescript
import { ai, OPENAI } from 'cost-katana';

const response = await ai(OPENAI.GPT_4, 'Explain quantum computing in one sentence');

console.log(response.text);   // "Quantum computing uses qubits to perform..."
console.log(response.cost);   // 0.0012
console.log(response.tokens); // 47
```

That's it. No configuration files. No complex setup. Just results.
Cost Katana is completely provider-agnostic. Never lock yourself into a single vendor.
Select by capability:

```typescript
import { ai, ModelCapability } from 'cost-katana';

// Automatically selects the best model for each task
const code = await ai(ModelCapability.CODE_GENERATION, 'Write a React component');
const chat = await ai(ModelCapability.CONVERSATION, 'Hello!');
const vision = await ai(ModelCapability.VISION, 'Describe this image', { image });
```

Select by preference:

```typescript
import { ai } from 'cost-katana';

// Fastest model available
const fast = await ai({ speed: 'fastest' }, prompt);

// Cheapest model available
const cheap = await ai({ cost: 'cheapest' }, prompt);

// Best quality model
const best = await ai({ quality: 'best' }, prompt);

// Balanced approach
const balanced = await ai({ speed: 'fast', cost: 'cheap' }, prompt);
```

Benefits:
- Automatic Failover - Seamlessly switch providers if one goes down
- Cost Optimization - Routes to the cheapest provider automatically
- Future-Proof - New providers added without code changes
- Zero Lock-In - Switch providers anytime, no refactoring needed
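The failover benefit boils down to trying providers in preference order until one call succeeds. A minimal sketch of the idea, for illustration only: the SDK implements this internally, and `withFailover` below is a hypothetical helper, not part of the Cost Katana API.

```typescript
// Minimal failover sketch (illustrative only; the SDK handles this internally).
// Tries each provider call in preference order and returns the first success.
async function withFailover<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error('no providers configured');
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // provider down or rate-limited; fall through to the next
    }
  }
  throw lastError;
}
```

Usage would look like `withFailover([callOpenAI, callAnthropic, callGemini])`, where each entry wraps one provider's request.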
Read the full Provider-Agnostic Guide →
Let's build something real. In this tutorial, you'll create a chatbot that:
- Tracks every dollar spent
- Caches repeated questions (saving 100% on duplicates)
- Optimizes long responses (40-75% savings)
```typescript
import { chat, OPENAI } from 'cost-katana';

// Create a persistent chat session
const session = chat(OPENAI.GPT_4);

// Send messages and track costs
await session.send('Hello! What can you help me with?');
await session.send('Tell me a programming joke');
await session.send('Now explain it');

// See exactly what you spent
console.log(`Total cost: $${session.totalCost.toFixed(4)}`);
console.log(`Messages: ${session.messages.length}`);
console.log(`Tokens used: ${session.totalTokens}`);
```

Cache identical questions to avoid paying twice:
```typescript
import { ai, OPENAI } from 'cost-katana';

// First call - hits the API
const response1 = await ai(OPENAI.GPT_4, 'What is 2+2?', { cache: true });
console.log(`Cached: ${response1.cached}`); // false
console.log(`Cost: $${response1.cost}`);    // $0.0008

// Second call - served from cache (free)
const response2 = await ai(OPENAI.GPT_4, 'What is 2+2?', { cache: true });
console.log(`Cached: ${response2.cached}`); // true
console.log(`Cost: $${response2.cost}`);    // $0.0000
```

For long-form content, Cortex compresses prompts intelligently:
```typescript
import { ai, OPENAI } from 'cost-katana';

const response = await ai(
  OPENAI.GPT_4,
  'Write a comprehensive guide to machine learning for beginners',
  {
    cortex: true, // Enable 40-75% cost reduction
    maxTokens: 2000
  }
);

console.log(`Optimized: ${response.optimized}`);
console.log(`Saved: $${response.savedAmount}`);
```

Find the best price-to-quality ratio for your use case:
```typescript
import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';

const prompt = 'Summarize the theory of relativity in 50 words';

const models = [
  { name: 'GPT-4', id: OPENAI.GPT_4 },
  { name: 'Claude 3.5 Sonnet', id: ANTHROPIC.CLAUDE_3_5_SONNET_20241022 },
  { name: 'Gemini 2.5 Pro', id: GOOGLE.GEMINI_2_5_PRO },
  { name: 'GPT-3.5 Turbo', id: OPENAI.GPT_3_5_TURBO }
];

console.log('Model Cost Comparison\n');
for (const model of models) {
  const response = await ai(model.id, prompt);
  console.log(`${model.name.padEnd(20)} $${response.cost.toFixed(6)}`);
}
```

Sample output:

```
Model Cost Comparison

GPT-4                $0.001200
Claude 3.5 Sonnet    $0.000900
Gemini 2.5 Pro       $0.000150
GPT-3.5 Turbo        $0.000080
```
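The per-request numbers above are just token counts multiplied by per-million-token rates. A quick sketch of the arithmetic; `requestCost` is a hypothetical helper and the rates passed in are placeholders, not live pricing.

```typescript
// Cost of one request given per-million-token rates.
// Rates are placeholders for illustration, not current provider pricing.
function requestCost(
  promptTokens: number,
  completionTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number
): number {
  return (promptTokens * inputPricePerMillion + completionTokens * outputPricePerMillion) / 1_000_000;
}

// 1,000 prompt tokens at $30/M plus 500 completion tokens at $60/M is $0.06
```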
Stop guessing model names. Get autocomplete and catch typos at compile time:
```typescript
import { OPENAI, ANTHROPIC, GOOGLE, AWS_BEDROCK, XAI, DEEPSEEK } from 'cost-katana';

// OpenAI
OPENAI.GPT_5
OPENAI.GPT_4
OPENAI.GPT_4O
OPENAI.GPT_3_5_TURBO
OPENAI.O1
OPENAI.O3

// Anthropic
ANTHROPIC.CLAUDE_SONNET_4_5
ANTHROPIC.CLAUDE_3_5_SONNET_20241022
ANTHROPIC.CLAUDE_3_5_HAIKU_20241022

// Google
GOOGLE.GEMINI_2_5_PRO
GOOGLE.GEMINI_2_5_FLASH
GOOGLE.GEMINI_1_5_PRO

// AWS Bedrock
AWS_BEDROCK.NOVA_PRO
AWS_BEDROCK.NOVA_LITE
AWS_BEDROCK.CLAUDE_SONNET_4_5

// Others
XAI.GROK_2_1212
DEEPSEEK.DEEPSEEK_CHAT
```

Why constants over strings?
| Feature | String `'gpt-4'` | Constant `OPENAI.GPT_4` |
|---|---|---|
| Autocomplete | ❌ | ✅ |
| Typo protection | ❌ | ✅ |
| Refactor safely | ❌ | ✅ |
| Self-documenting | ❌ | ✅ |
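A sketch of how such constants are typically defined, for illustration; `OPENAI_MODELS` below is a hypothetical stand-in, not the SDK's actual source. An `as const` object gives autocomplete, and typos fail at compile time instead of at runtime.

```typescript
// Hypothetical model-constant definition (not Cost Katana's real source).
// `as const` makes each value a string literal type, so typos are compile errors.
const OPENAI_MODELS = {
  GPT_4: 'gpt-4',
  GPT_3_5_TURBO: 'gpt-3.5-turbo'
} as const;

// Union of the literal values: 'gpt-4' | 'gpt-3.5-turbo'
type OpenAIModel = (typeof OPENAI_MODELS)[keyof typeof OPENAI_MODELS];

// OPENAI_MODELS.GPT4  <- compile-time error: property does not exist
```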
```bash
# Recommended: Use Cost Katana API key for all features
COST_KATANA_API_KEY=dak_your_key_here

# Or use provider keys directly
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...

# For AWS Bedrock
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
```

```typescript
import { configure } from 'cost-katana';

await configure({
  apiKey: 'dak_your_key',
  cortex: true,   // 40-75% cost savings
  cache: true,    // Smart caching
  firewall: true  // Block prompt injections
});
```

Per-request options:

```typescript
const response = await ai(OPENAI.GPT_4, 'Your prompt', {
  temperature: 0.7,                      // Creativity (0-2)
  maxTokens: 500,                        // Response limit
  systemMessage: 'You are a helpful AI', // System prompt
  cache: true,                           // Enable caching
  cortex: true,                          // Enable optimization
  retry: true                            // Auto-retry on failures
});
```

Next.js:

```typescript
// app/api/chat/route.ts
import { ai, OPENAI } from 'cost-katana';

export async function POST(request: Request) {
  const { prompt } = await request.json();
  const response = await ai(OPENAI.GPT_4, prompt);
  return Response.json(response);
}
```

Express:

```typescript
import express from 'express';
import { ai, OPENAI } from 'cost-katana';

const app = express();
app.use(express.json());

app.post('/api/chat', async (req, res) => {
  const response = await ai(OPENAI.GPT_4, req.body.prompt);
  res.json(response);
});

app.listen(3000);
```

Fastify:

```typescript
import fastify from 'fastify';
import { ai, OPENAI } from 'cost-katana';

const app = fastify();

app.post('/api/chat', async (request) => {
  const { prompt } = request.body as { prompt: string };
  return await ai(OPENAI.GPT_4, prompt);
});

app.listen({ port: 3000 });
```

NestJS:

```typescript
import { Controller, Post, Body } from '@nestjs/common';
import { ai, OPENAI } from 'cost-katana';

@Controller('api')
export class ChatController {
  @Post('chat')
  async chat(@Body() body: { prompt: string }) {
    return await ai(OPENAI.GPT_4, body.prompt);
  }
}
```

Block prompt injection attacks automatically:
```typescript
import { configure, ai, OPENAI } from 'cost-katana';

await configure({ firewall: true });

try {
  await ai(OPENAI.GPT_4, 'Ignore all previous instructions and...');
} catch (error) {
  console.log('Blocked:', error.message);
}
```

Protects against:
- Prompt injection attacks
- Jailbreak attempts
- Data exfiltration
- Malicious content generation
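To make the idea concrete, here is a toy pattern check of the kind a firewall might start from. This is a hypothetical illustration only; `looksLikeInjection` is not part of the SDK, and the real firewall goes far beyond keyword matching.

```typescript
// Toy injection heuristic, for illustration only.
// A production firewall uses much more than regex keyword matching.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /reveal (your )?system prompt/i
];

function looksLikeInjection(prompt: string): boolean {
  return INJECTION_PATTERNS.some((pattern) => pattern.test(prompt));
}
```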
Never let provider outages break your app:
```typescript
import { ai, OPENAI } from 'cost-katana';

// If OpenAI is down, automatically switches to Claude or Gemini
const response = await ai(OPENAI.GPT_4, 'Hello');
console.log(`Provider used: ${response.provider}`);
// Could be 'openai', 'anthropic', or 'google' depending on availability
```

Cost Katana now provides comprehensive tracking of every request, including network performance, client environment, and optimization opportunities:
```typescript
import { AICostTracker, OPENAI } from 'cost-katana';

const tracker = new AICostTracker({
  apiKey: process.env.COST_KATANA_API_KEY,
  // Enable comprehensive tracking
  comprehensiveTracking: true,
  // Optional: configure tracking endpoints
  trackingEndpoint: 'https://api.costkatana.com/usage/track-comprehensive'
});

const response = await tracker.chat(OPENAI.GPT_4, 'Explain quantum computing');

console.log('Response:', response.text);
console.log('Cost:', response.cost);
console.log('Tokens:', response.tokens);
console.log('Response time:', response.responseTime);

// Comprehensive tracking data is automatically sent to your dashboard,
// including network metrics, client environment, and optimization suggestions
```

Once tracking is enabled, you can view detailed analytics on your dashboard:
- Network Performance: DNS lookup time, TCP connection time, total response time
- Client Environment: User agent, platform, IP geolocation
- Request/Response Data: Full request and response payloads (sanitized)
- Optimization Opportunities: AI-powered suggestions to reduce costs
- Performance Metrics: Real-time monitoring with anomaly detection
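The response-time metric above can be pictured as a stopwatch around each call. A minimal sketch; `timed` is a hypothetical helper shown for illustration, since the tracker records this automatically.

```typescript
// Sketch: capturing per-request response time the way a tracker might.
// Illustrative only; the SDK does this internally.
async function timed<T>(fn: () => Promise<T>): Promise<{ result: T; responseTime: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, responseTime: Date.now() - start };
}
```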
For custom implementations or additional tracking:
```typescript
import { AICostTracker } from 'cost-katana';

const tracker = new AICostTracker({
  apiKey: process.env.COST_KATANA_API_KEY
});

// Manually track usage with additional metadata
await tracker.trackUsage({
  model: 'gpt-4',
  provider: 'openai',
  prompt: 'Hello, world!',
  completion: 'Hello! How can I help you today?',
  promptTokens: 3,
  completionTokens: 9,
  totalTokens: 12,
  cost: 0.00036,
  responseTime: 850,
  userId: 'user_123',
  sessionId: 'session_abc',
  tags: ['chat', 'greeting'],
  // Additional metadata for comprehensive tracking
  requestMetadata: {
    userAgent: navigator?.userAgent,
    clientIP: await fetch('https://api.ipify.org').then(r => r.text()),
    feature: 'chat-interface'
  }
});
```

```typescript
import { AICostTracker, OPENAI } from 'cost-katana';

const tracker = new AICostTracker({
  apiKey: process.env.COST_KATANA_API_KEY,
  sessionReplay: true,
  distributedTracing: true
});

// Start a traced session
const sessionId = tracker.startSession({
  userId: 'user_123',
  feature: 'customer-support',
  metadata: {
    source: 'web-app',
    version: '1.2.3'
  }
});

// All requests in this session will be automatically traced
const response = await tracker.chat(OPENAI.GPT_4, 'How can I cancel my subscription?', {
  sessionId,
  tags: ['support', 'billing']
});

// End session and get analytics
const sessionStats = await tracker.endSession(sessionId);
console.log('Session cost:', sessionStats.totalCost);
console.log('Session duration:', sessionStats.duration);
console.log('Requests made:', sessionStats.requestCount);
```

| Strategy | Savings | When to Use |
|---|---|---|
| Use GPT-3.5 over GPT-4 | 90% | Simple tasks, translations |
| Enable caching | 100% on hits | Repeated queries, FAQs |
| Enable Cortex | 40-75% | Long-form content |
| Batch in sessions | 10-20% | Related queries |
| Use Gemini Flash | 95% vs GPT-4 | High-volume, cost-sensitive |
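These strategies stack multiplicatively: cached requests cost nothing, and Cortex reduces the remaining spend by a further factor. Back-of-envelope arithmetic only; `effectiveSpend` is a hypothetical helper, and actual savings depend on your workload.

```typescript
// Illustrative savings arithmetic: with cache hit rate h, hits cost $0,
// and Cortex reduces the remaining (uncached) spend by factor r.
function effectiveSpend(baseSpend: number, cacheHitRate: number, cortexReduction: number): number {
  return baseSpend * (1 - cacheHitRate) * (1 - cortexReduction);
}

// e.g. 30% cache hits plus 50% Cortex reduction: $100 of raw spend becomes $35
```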
```typescript
// ❌ Expensive: Using GPT-4 for everything
await ai(OPENAI.GPT_4, 'What is 2+2?'); // $0.001

// ✅ Smart: Match model to task
await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?'); // $0.0001

// ✅ Smarter: Cache common queries
await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?', { cache: true }); // $0 on repeat

// ✅ Smartest: Cortex for long content
await ai(OPENAI.GPT_4, 'Write a 2000-word essay', { cortex: true }); // 40-75% off
```

```typescript
import { ai, OPENAI } from 'cost-katana';

try {
  const response = await ai(OPENAI.GPT_4, 'Hello');
  console.log(response.text);
} catch (error) {
  switch (error.code) {
    case 'NO_API_KEY':
      console.log('Set COST_KATANA_API_KEY or OPENAI_API_KEY');
      break;
    case 'RATE_LIMIT':
      console.log('Rate limited. Retrying...');
      break;
    case 'INVALID_MODEL':
      console.log('Model not found. Available:', error.availableModels);
      break;
    default:
      console.log('Error:', error.message);
  }
}
```

Explore 45+ complete examples in our examples repository:
github.com/Hypothesize-Tech/costkatana-examples
| Category | Examples |
|---|---|
| Cost Tracking | Basic tracking, budgets, alerts |
| Gateway | Routing, load balancing, failover |
| Optimization | Cortex, caching, compression |
| Observability | OpenTelemetry, tracing, metrics |
| Security | Firewall, rate limiting, moderation |
| Workflows | Multi-step AI orchestration |
| Frameworks | Express, Next.js, Fastify, NestJS, FastAPI |
From the OpenAI SDK:

```typescript
// Before
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: 'sk-...' });
const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(completion.choices[0].message.content);

// After
import { ai, OPENAI } from 'cost-katana';

const response = await ai(OPENAI.GPT_4, 'Hello');
console.log(response.text);
console.log(`Cost: $${response.cost}`); // Bonus: cost tracking!
```

From the Anthropic SDK:

```typescript
// Before
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: 'sk-ant-...' });
const message = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }]
});

// After
import { ai, ANTHROPIC } from 'cost-katana';

const response = await ai(ANTHROPIC.CLAUDE_3_5_SONNET_20241022, 'Hello');
```

From LangChain:

```typescript
// Before
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { HumanMessage } from 'langchain/schema';

const model = new ChatOpenAI({ modelName: 'gpt-4' });
const response = await model.call([new HumanMessage('Hello')]);

// After
import { ai, OPENAI } from 'cost-katana';

const response = await ai(OPENAI.GPT_4, 'Hello');
```

We welcome contributions! See our Contributing Guide.
```bash
git clone https://github.com/Hypothesize-Tech/costkatana-core.git
cd costkatana-core
npm install

npm run lint      # Check code style
npm run lint:fix  # Auto-fix issues
npm run format    # Format code
npm test          # Run tests
npm run build     # Build
```

| Channel | Link |
|---|---|
| Dashboard | costkatana.com |
| Documentation | docs.costkatana.com |
| GitHub | github.com/Hypothesize-Tech |
| Discord | discord.gg/D8nDArmKbY |
| Email | support@costkatana.com |
MIT © Cost Katana
Start cutting AI costs today.

```bash
npm install cost-katana
```

```typescript
import { ai, OPENAI } from 'cost-katana';

await ai(OPENAI.GPT_4, 'Hello, world!');
```