The missing middleware for LLM-powered applications
Slash 30-70% off your API costs instantly with zero contract changes.
Features • Quick Start • Usage • Documentation • Contributing
- ⚡ Zero-Config Integration - Add one line to your Express app, start saving immediately
- 💰 Instant Cost Savings - 30-70% reduction in token usage for LLM API responses
- 🔍 Smart Detection - Automatically identifies LLM clients vs. regular browsers
- 📊 Built-in Analytics - Real-time savings tracking and metrics
- 🚀 High Performance - <3ms middleware overhead, >2000 req/s throughput
- 🧩 Pluggable Architecture - Swap cache, logger, or add custom detectors
- 🏗️ Functional Core - Pure, deterministic, side-effect-free business logic
- 📦 Framework Ready - Express, NestJS & Fastify all available
TOON (Token-Oriented Object Notation) is a compact serialization format designed for LLMs. It uses 30-60% fewer tokens than JSON by:
- Using indentation instead of braces
- Declaring field names once for arrays
- Removing redundant punctuation
- Maintaining human readability
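A minimal sketch of the uniform-array encoding makes these rules concrete (illustrative only — the real TOON format and the middleware's converter also handle nesting, quoting, and escaping):

```javascript
// Illustrative TOON-style encoder for a flat, uniform array of records.
// Assumes every record has the same keys and values need no quoting.
function encodeUniformArray(key, records) {
  const fields = Object.keys(records[0]);
  const header = `${key}[${records.length}]{${fields.join(',')}}:`;
  const rows = records.map((rec) => fields.map((f) => String(rec[f])).join(','));
  return [header, ...rows].join('\n');
}

console.log(encodeUniformArray('users', [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' }
]));
// users[2]{id,name,role}:
// 1,Alice,admin
// 2,Bob,user
```

The field names appear once in the header, so per-row cost grows only with the values.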
Example:
// Standard JSON (86 characters ≈ 22 tokens)
{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"}]}
// TOON format (52 characters ≈ 13 tokens - 41% savings!)
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user

Express:
npm install @toon-middleware/express
# or
pnpm add @toon-middleware/express

NestJS:
npm install @toon-middleware/nest
# or
pnpm add @toon-middleware/nest

Fastify:
npm install @toon-middleware/fastify
# or
pnpm add @toon-middleware/fastify

import express from 'express';
import { createExpressToonMiddleware } from '@toon-middleware/express';
const app = express();
// Add TOON middleware (that's it!)
app.use(createExpressToonMiddleware());
// Your existing routes work unchanged
app.get('/api/users', (req, res) => {
res.json({
users: [
{ id: 1, name: 'Alice', email: 'alice@example.com' },
{ id: 2, name: 'Bob', email: 'bob@example.com' }
]
});
});
app.listen(3000);

That's it! LLM clients now automatically receive TOON responses, while browsers get JSON.
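Detection of this kind is header-driven. The sketch below shows the idea with a hypothetical allow-list of User-Agent substrings plus the X-Accept-Toon header described later in this README; the middleware's actual detectors are pluggable and report a confidence score rather than a boolean:

```javascript
// Illustrative LLM-client check. The substrings are examples only,
// not the middleware's real detection list.
function looksLikeLLMClient(headers) {
  if (headers['x-accept-toon'] === 'true') return true;
  const ua = (headers['user-agent'] || '').toLowerCase();
  return ['openai', 'anthropic', 'langchain'].some((s) => ua.includes(s));
}

console.log(looksLikeLLMClient({ 'user-agent': 'OpenAI-API-Client/1.0' })); // true
console.log(looksLikeLLMClient({ 'user-agent': 'Mozilla/5.0 (Macintosh)' })); // false
```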
import express from 'express';
import { createExpressToonMiddleware } from '@toon-middleware/express';
const app = express();
app.use(express.json());
app.use(createExpressToonMiddleware());
// 🤖 LLM Inference Endpoint - Benefits from TOON compression
app.post('/api/chat/completions', (req, res) => {
// Simulate chat completion response (large, repetitive structure)
res.json({
id: 'chatcmpl-123',
object: 'chat.completion',
created: Date.now(),
model: 'gpt-4',
choices: [
{
index: 0,
message: {
role: 'assistant',
content: 'Here are the analysis results...'
},
finish_reason: 'stop'
}
],
usage: {
prompt_tokens: 50,
completion_tokens: 200,
total_tokens: 250
}
});
});
// 📊 Analytics Endpoint - Perfect for TOON (uniform array data)
app.get('/api/users', (req, res) => {
res.json({
users: [
{ id: 1, name: 'Alice', email: 'alice@example.com', role: 'admin', active: true },
{ id: 2, name: 'Bob', email: 'bob@example.com', role: 'user', active: true },
{ id: 3, name: 'Carol', email: 'carol@example.com', role: 'user', active: false }
],
total: 3,
page: 1
});
});
// 🌐 Regular Endpoint - Browsers get JSON, LLMs get TOON automatically
app.get('/api/health', (req, res) => {
res.json({ status: 'ok', timestamp: Date.now() });
});
app.listen(3000);

What happens:
- 🤖 LLM clients (detected by User-Agent or headers) → Get TOON format, save 30-70% tokens
- 🌐 Browser clients → Get regular JSON, everything works as expected
- No code changes needed - The middleware handles everything automatically!
Example Response Comparison:
When an LLM client requests /api/users, they receive:
users[3]{active,email,id,name,role}:
true,alice@example.com,1,Alice,admin
true,bob@example.com,2,Bob,user
false,carol@example.com,3,Carol,user
total: 3
page: 1
When a browser requests the same endpoint, they receive:
{
"users": [
{"id": 1, "name": "Alice", "email": "alice@example.com", "role": "admin", "active": true},
{"id": 2, "name": "Bob", "email": "bob@example.com", "role": "user", "active": true},
{"id": 3, "name": "Carol", "email": "carol@example.com", "role": "user", "active": false}
],
"total": 3,
"page": 1
}

Same data, different format, automatic detection! 🎯
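Reading TOON back on the client is mechanical. Below is a minimal decoder for the flat shape shown above — a sketch, not the library's parser: it assumes unquoted values without embedded commas, one tabular block per key, and simple `key: value` scalars:

```javascript
// Illustrative decoder for the flat TOON shape shown above.
function decodeToon(text) {
  const result = {};
  const lines = text.split('\n').filter((l) => l.trim() !== '');
  let i = 0;
  while (i < lines.length) {
    const header = lines[i].match(/^(\w+)\[(\d+)\]\{([^}]*)\}:$/);
    if (header) {
      const [, key, count, fieldList] = header;
      const fields = fieldList.split(',');
      const rows = lines.slice(i + 1, i + 1 + Number(count));
      result[key] = rows.map((row) => {
        const values = row.trim().split(',');
        return Object.fromEntries(fields.map((f, j) => [f, coerce(values[j])]));
      });
      i += 1 + Number(count);
    } else {
      const [key, value] = lines[i].split(':').map((s) => s.trim());
      result[key] = coerce(value);
      i += 1;
    }
  }
  return result;
}

// Best-effort scalar coercion: booleans and numbers, else string.
function coerce(v) {
  if (v === 'true') return true;
  if (v === 'false') return false;
  return Number.isNaN(Number(v)) ? v : Number(v);
}

const toon = [
  'users[3]{active,email,id,name,role}:',
  'true,alice@example.com,1,Alice,admin',
  'true,bob@example.com,2,Bob,user',
  'false,carol@example.com,3,Carol,user',
  'total: 3',
  'page: 1'
].join('\n');

console.log(decodeToon(toon).users[0]);
// { active: true, email: 'alice@example.com', id: 1, name: 'Alice', role: 'admin' }
```

Applied to the /api/users response, this yields the same object the JSON branch returns.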
import { createExpressToonMiddleware } from '@toon-middleware/express';
app.use(createExpressToonMiddleware({
// Auto-convert responses for detected LLM clients (default: true)
autoConvert: true,
// Enable caching (default: true)
cache: true,
cacheOptions: {
maxSize: 1000, // Max cached entries
ttl: 300000, // 5 minutes
checkPeriod: 60000 // Cleanup every minute
},
// Enable analytics tracking (default: true)
analytics: true,
// LLM detection confidence threshold (0-1, default: 0.8)
confidenceThreshold: 0.8,
// Token pricing for savings calculation
pricing: {
per1K: 0.002 // $0.002 per 1K tokens (default)
},
// Custom logger (default: built-in logger)
logger: customLogger,
// Log level: 'error' | 'warn' | 'info' | 'debug' | 'trace'
logLevel: 'info'
}));

TOON middleware adds helpful headers to responses:
X-TOON-Mode: toon # 'toon', 'passthrough', or 'fallback'
X-TOON-Savings: 42.5% # Percentage of tokens saved
X-TOON-Tokens: 240->138 # Original -> Converted token count
X-TOON-Cost-Saved: $0.0002 # Estimated cost savings
X-Request-ID: req-1699564823456-abc # Unique request identifier

import { createExpressToonMiddleware } from '@toon-middleware/express';
const middleware = createExpressToonMiddleware({
analytics: true
});
// Access the analytics tracker
middleware.analytics?.on('conversion', (data) => {
console.log('Conversion:', data);
// {
// requestId: 'req-123',
// path: '/api/users',
// method: 'GET',
// savings: { percentage: 42.5, tokens: 102, cost: 0.0002 },
// timestamp: '2024-11-13T10:30:00.000Z'
// }
});
middleware.analytics?.on('error', (error) => {
console.error('Analytics error:', error);
});
app.use(middleware);

import { createExpressToonMiddleware } from '@toon-middleware/express';
import { createHeaderDetector } from '@toon-middleware/core';
app.use(createExpressToonMiddleware({
customDetectors: [
// Detect custom header
createHeaderDetector('x-my-llm-client', () => true, {
confidence: 1.0
}),
// Custom detection function
({ headers, userAgent }) => {
if (headers['x-api-key']?.startsWith('llm-')) {
return { isLLM: true, confidence: 0.95 };
}
return { isRegular: true, confidence: 0.5 };
}
]
}));

import { createExpressToonMiddleware } from '@toon-middleware/express';
import { convertToTOON } from '@toon-middleware/core';
app.use(createExpressToonMiddleware({
autoConvert: false // Disable automatic conversion
}));
app.get('/api/data', (req, res) => {
const data = { users: [...] };
// Manually convert to TOON
const result = convertToTOON(data);
if (result.success) {
res.set('Content-Type', 'text/plain; charset=utf-8');
res.send(result.data);
} else {
res.json(data); // Fallback to JSON
}
});

// app.module.ts
import { Module } from '@nestjs/common';
import { ToonModule } from '@toon-middleware/nest';
@Module({
imports: [
ToonModule.forRoot({
autoConvert: true,
cache: true,
analytics: true
})
],
controllers: [UsersController]
})
export class AppModule {}

// users.controller.ts
import { Controller, Get } from '@nestjs/common';
@Controller('api')
export class UsersController {
@Get('users')
getUsers() {
return {
users: [
{ id: 1, name: 'Alice', email: 'alice@example.com', role: 'admin' },
{ id: 2, name: 'Bob', email: 'bob@example.com', role: 'user' }
]
};
}
}

That's it! LLM clients automatically receive TOON format, browsers get JSON.
import { ToonModule } from '@toon-middleware/nest';
@Module({
imports: [
ToonModule.forRoot({
autoConvert: true,
confidenceThreshold: 0.8,
cache: true,
cacheOptions: {
maxSize: 1000,
ttl: 300000
},
analytics: true,
analyticsOptions: {
enabled: true
},
pricing: {
per1K: 0.002
},
// Make module global (optional)
global: true
})
]
})
export class AppModule {}

import { ConfigModule, ConfigService } from '@nestjs/config';
import { ToonModule } from '@toon-middleware/nest';
@Module({
imports: [
ConfigModule.forRoot(),
ToonModule.forRootAsync({
imports: [ConfigModule],
inject: [ConfigService],
useFactory: async (configService: ConfigService) => ({
autoConvert: configService.get('TOON_AUTO_CONVERT', true),
cache: configService.get('TOON_CACHE_ENABLED', true),
analytics: configService.get('TOON_ANALYTICS_ENABLED', true)
})
})
]
})
export class AppModule {}

import { Injectable, OnModuleInit, Inject } from '@nestjs/common';
import { AnalyticsTracker } from '@toon-middleware/nest';
@Injectable()
export class AnalyticsService implements OnModuleInit {
constructor(
@Inject('TOON_ANALYTICS') private analytics: AnalyticsTracker
) {}
onModuleInit() {
if (this.analytics) {
this.analytics.on('conversion', (payload) => {
console.log('TOON Conversion:', {
path: payload.path,
savings: payload.savings.percentage,
tokensSaved: payload.savings.tokens
});
});
}
}
}

import Fastify from 'fastify';
import toonPlugin from '@toon-middleware/fastify';
const fastify = Fastify({ logger: true });
// Register TOON plugin
await fastify.register(toonPlugin, {
autoConvert: true,
cache: true,
analytics: true
});
// Your routes work unchanged
fastify.get('/api/users', async () => {
return {
users: [
{ id: 1, name: 'Alice', role: 'admin' },
{ id: 2, name: 'Bob', role: 'user' }
]
};
});
await fastify.listen({ port: 3000 });

That's it! LLM clients automatically receive TOON format, browsers get JSON.
import Fastify from 'fastify';
import toonPlugin from '@toon-middleware/fastify';
const fastify = Fastify();
await fastify.register(toonPlugin, {
autoConvert: true,
confidenceThreshold: 0.8,
cache: true,
cacheOptions: {
maxSize: 1000,
ttl: 300000
},
analytics: true,
analyticsOptions: {
enabled: true
},
pricing: {
per1K: 0.002
}
});
fastify.get('/api/data', async () => {
return { items: [1, 2, 3] };
});
await fastify.listen({ port: 3000 });

import Fastify from 'fastify';
import toonPlugin from '@toon-middleware/fastify';
const fastify = Fastify();
await fastify.register(toonPlugin, {
analytics: true
});
// Access analytics via decorated property
fastify.toonAnalytics.on('conversion', (payload) => {
console.log('TOON Conversion:', {
path: payload.path,
savings: payload.savings.percentage,
tokensSaved: payload.savings.tokens
});
});
fastify.get('/api/test', async () => {
return { message: 'test' };
});
await fastify.listen({ port: 3000 });

LLM clients should include headers to request TOON format:
// Using fetch
const response = await fetch('http://localhost:3000/api/users', {
headers: {
'User-Agent': 'OpenAI-API-Client/1.0',
'Accept': 'application/json, text/toon',
'X-Accept-Toon': 'true'
}
});
const toonData = await response.text();
console.log(toonData); // TOON formatted response

import { convertToTOON } from '@toon-middleware/core';
const data = { users: [...] };
const result = convertToTOON(data);
const response = await fetch('http://localhost:3000/api/ingest', {
method: 'POST',
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'X-Accept-Toon': 'true'
},
body: result.data
});

toon-middleware/
├── packages/
│   ├── core/              # Pure business logic (converters, detectors, analytics)
│   ├── integrations/      # Framework-specific adapters
│   │   ├── express/       # Express middleware ✅
│   │   ├── nest/          # NestJS module ✅ (TypeScript)
│   │   └── fastify/       # Fastify plugin ✅
│   ├── plugins/           # Pluggable infrastructure
│   │   ├── cache/         # Cache manager implementation
│   │   └── logger/        # Logger factory and transports
│   ├── utils/             # Shared helpers
│   └── examples/          # Example applications
│       └── express-basic/ # Express demo with dashboard
└── tools/                 # Benchmarks, scripts, configs
Core:
- @toon-middleware/core - TOON converters, client detectors, analytics, optimizers, validators
- @toon-middleware/utils - Shared helpers for request IDs, validation, header detection
Integrations:
- @toon-middleware/express - Express middleware (JavaScript)
- @toon-middleware/nest - NestJS module with interceptors and DI (TypeScript)
- @toon-middleware/fastify - Fastify plugin (JavaScript)
Plugins:
- @toon-middleware/cache - Event-driven TTL cache with LRU eviction
- @toon-middleware/logger - Level-aware structured logger
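The cache plugin's behavior (TTL expiry plus LRU eviction, as configured by maxSize and ttl in cacheOptions) can be illustrated with a small Map-backed store. This is a conceptual sketch, not the @toon-middleware/cache implementation; the injectable `now` function exists only to make expiry deterministic in tests:

```javascript
// Illustrative TTL + LRU cache. A JavaScript Map iterates in insertion
// order, so re-inserting on every access keeps the first key as the
// least recently used.
class TtlLruCache {
  constructor({ maxSize = 1000, ttl = 300000 } = {}, now = Date.now) {
    this.maxSize = maxSize;
    this.ttl = ttl;
    this.now = now;
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry (first key in the Map).
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // Expired: drop and report a miss.
      return undefined;
    }
    // Refresh recency by moving the entry to the end of the Map.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }
}
```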
- Node.js 24+ (LTS) - Use nvm for version management
- PNPM 9+ - Fast, disk space efficient package manager
# Clone the repository
git clone https://github.com/yourusername/toon-middleware.git
cd toon-middleware
# Use the correct Node version (if you have nvm installed)
nvm use
# Install dependencies
pnpm install
# Run tests
pnpm test
# Run benchmarks
pnpm benchmark
# Start the demo server
pnpm demo

Visit http://localhost:5050/dashboard to see live savings metrics.
pnpm build # Build all packages
pnpm test # Run all tests (node:test)
pnpm test:coverage # Generate experimental coverage
pnpm test:watch # Run tests in watch mode
pnpm benchmark # Execute performance benchmarks
pnpm lint # Lint all packages
pnpm typecheck # Type check JS with TypeScript
pnpm dev # Start demo in development mode
pnpm demo # Start demo server
pnpm clean # Clean all build artifacts and node_modules

- Functional Core, Imperative Shell - Pure business logic in core, side effects in integrations and plugins
- Workspace Discipline - Internal packages use the workspace protocol (workspace:*)
- Test Coverage - Every pure function has tests for determinism and immutability
- Performance First - Benchmarks validate <3ms overhead and >2000 req/s throughput
- Documentation - Every feature includes examples and API documentation
Targets:
- ✅ Core conversions: <1 ms average
- ✅ Middleware overhead: <3 ms
- ✅ Throughput: >2000 requests/second
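A quick timing loop can sanity-check the overhead target on your own hardware. This sketch uses JSON.stringify as a stand-in for the conversion work — it is not the project's benchmark suite:

```javascript
// Times a function over many iterations and reports the mean in ms.
function measureMean(fn, iterations = 10000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return elapsedMs / iterations;
}

const payload = { users: [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }] };
console.log(`mean per call: ${measureMean(() => JSON.stringify(payload)).toFixed(6)} ms`);
```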
Run benchmarks:
pnpm benchmark

- Express middleware integration
- Intelligent LLM client detection
- In-memory caching with TTL
- Real-time analytics and savings tracking
- Performance benchmarks
- NestJS module with TypeScript support
- Fastify plugin
- Redis cache adapter
- OpenTelemetry integration
- Metrics exporters (Prometheus, Datadog)
- Distributed load testing harness
- Architecture Guide - Functional core, imperative shell pattern
- API Reference - Complete API documentation
- Examples - Usage examples and patterns
We welcome contributions! Please follow these steps:
- Fork and clone the repository
- Use the correct Node version:
nvm use (requires Node 24+)
- Install dependencies:
pnpm install - Create a feature branch:
git checkout -b feature/amazing-feature - Make your changes following our architecture principles:
- Keep business logic pure in
packages/core - Isolate side effects in
integrationsandplugins - Add tests for new functionality
- Keep business logic pure in
- Run tests and linting:
pnpm test && pnpm lint - Commit your changes:
git commit -m 'Add amazing feature' - Push to your fork:
git push origin feature/amazing-feature - Open a Pull Request
- Keep business logic pure and deterministic inside packages/core
- Isolate side effects (HTTP, caching, logging, timers) within integrations and plugins
- Reuse shared helpers from packages/utils to avoid duplication
- Maintain documentation alongside features (docs/ and package READMEs)
- Enforce workspace consistency via shared linting, formatting, and type checking
TOON Middleware is released under the MIT License.
- TOON Format - The compact serialization format powering this middleware
- The Node.js and Express communities for building amazing tools
Built with ❤️ for the LLM ecosystem
⭐ Star us on GitHub • 🐛 Report a Bug • 💡 Request a Feature