The missing middleware for reducing LLM API costs: it automatically converts JSON responses to TOON format, yielding 30-60% token savings with no code changes.

hwclass/toon-middleware


🎯 TOON Middleware

The missing middleware for LLM-powered applications

Slash 30-70% off your API costs instantly with zero contract changes.

License: MIT Node Version PNPM

Features • Quick Start • Usage • Documentation • Contributing


🌟 Features

  • ⚡ Zero-Config Integration - Add one line to your Express app, start saving immediately
  • 💰 Instant Cost Savings - 30-70% reduction in token usage for LLM API responses
  • 🔍 Smart Detection - Automatically identifies LLM clients vs. regular browsers
  • 📊 Built-in Analytics - Real-time savings tracking and metrics
  • 🚀 High Performance - <3ms middleware overhead, >2000 req/s throughput
  • 🧩 Pluggable Architecture - Swap cache, logger, or add custom detectors
  • 🏗️ Functional Core - Pure, deterministic, side-effect-free business logic
  • 📦 Framework Ready - Express, NestJS, and Fastify adapters available

📖 What is TOON?

TOON (Token-Oriented Object Notation) is a compact serialization format designed for LLMs. It achieves 30-60% fewer tokens than JSON by:

  • Using indentation instead of braces
  • Declaring field names once for arrays
  • Removing redundant punctuation
  • Maintaining human readability

Example:

// Standard JSON (86 characters ≈ 22 tokens)
{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"}]}

// TOON format (52 characters ≈ 13 tokens - 41% savings!)
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
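Most of the savings come from the tabular array encoding: field names are declared once, then each row is just comma-separated values. A minimal, illustrative encoder for the uniform-array case (not the real @toon-middleware/core implementation, which also handles nesting, escaping, and non-uniform data) might look like:

```javascript
// Illustrative sketch only: encodes an array of flat, uniform objects
// into TOON-style tabular text.
function encodeUniformArray(key, rows) {
  const fields = Object.keys(rows[0]);
  // Header declares the key, row count, and field names once.
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`;
  // Each row becomes an indented line of comma-separated values.
  const lines = rows.map(row => '  ' + fields.map(f => row[f]).join(','));
  return [header, ...lines].join('\n');
}

console.log(encodeUniformArray('users', [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' }
]));
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
```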

🚀 Quick Start

Installation

Express:

npm install @toon-middleware/express
# or
pnpm add @toon-middleware/express

NestJS:

npm install @toon-middleware/nest
# or
pnpm add @toon-middleware/nest

Fastify:

npm install @toon-middleware/fastify
# or
pnpm add @toon-middleware/fastify

Basic Usage

import express from 'express';
import { createExpressToonMiddleware } from '@toon-middleware/express';

const app = express();

// Add TOON middleware (that's it!)
app.use(createExpressToonMiddleware());

// Your existing routes work unchanged
app.get('/api/users', (req, res) => {
  res.json({
    users: [
      { id: 1, name: 'Alice', email: 'alice@example.com' },
      { id: 2, name: 'Bob', email: 'bob@example.com' }
    ]
  });
});

app.listen(3000);

That's it! LLM clients now automatically receive TOON responses, while browsers get JSON.


💡 Usage

Express Integration

Basic Setup

import express from 'express';
import { createExpressToonMiddleware } from '@toon-middleware/express';

const app = express();

app.use(express.json());
app.use(createExpressToonMiddleware());

// 🤖 LLM Inference Endpoint - Benefits from TOON compression
app.post('/api/chat/completions', (req, res) => {
  // Simulate chat completion response (large, repetitive structure)
  res.json({
    id: 'chatcmpl-123',
    object: 'chat.completion',
    created: Date.now(),
    model: 'gpt-4',
    choices: [
      {
        index: 0,
        message: {
          role: 'assistant',
          content: 'Here are the analysis results...'
        },
        finish_reason: 'stop'
      }
    ],
    usage: {
      prompt_tokens: 50,
      completion_tokens: 200,
      total_tokens: 250
    }
  });
});

// 📊 Analytics Endpoint - Perfect for TOON (uniform array data)
app.get('/api/users', (req, res) => {
  res.json({
    users: [
      { id: 1, name: 'Alice', email: 'alice@example.com', role: 'admin', active: true },
      { id: 2, name: 'Bob', email: 'bob@example.com', role: 'user', active: true },
      { id: 3, name: 'Carol', email: 'carol@example.com', role: 'user', active: false }
    ],
    total: 3,
    page: 1
  });
});

// 🌐 Regular Endpoint - Browsers get JSON, LLMs get TOON automatically
app.get('/api/health', (req, res) => {
  res.json({ status: 'ok', timestamp: Date.now() });
});

app.listen(3000);

What happens:

  • 🤖 LLM clients (detected by User-Agent or headers) → get TOON format and save 30-70% of tokens
  • 🌐 Browser clients → get regular JSON; everything works as expected
  • No code changes needed - the middleware handles everything automatically

Example Response Comparison:

When an LLM client requests /api/users, they receive:

users[3]{active,email,id,name,role}:
  true,alice@example.com,1,Alice,admin
  true,bob@example.com,2,Bob,user
  false,carol@example.com,3,Carol,user
total: 3
page: 1

When a browser requests the same endpoint, they receive:

{
  "users": [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "role": "admin", "active": true},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "role": "user", "active": true},
    {"id": 3, "name": "Carol", "email": "carol@example.com", "role": "user", "active": false}
  ],
  "total": 3,
  "page": 1
}

Same data, different format, automatic detection! 🎯

Advanced Configuration

import { createExpressToonMiddleware } from '@toon-middleware/express';

app.use(createExpressToonMiddleware({
  // Auto-convert responses for detected LLM clients (default: true)
  autoConvert: true,

  // Enable caching (default: true)
  cache: true,
  cacheOptions: {
    maxSize: 1000,        // Max cached entries
    ttl: 300000,          // 5 minutes
    checkPeriod: 60000    // Cleanup every minute
  },

  // Enable analytics tracking (default: true)
  analytics: true,

  // LLM detection confidence threshold (0-1, default: 0.8)
  confidenceThreshold: 0.8,

  // Token pricing for savings calculation
  pricing: {
    per1K: 0.002  // $0.002 per 1K tokens (default)
  },

  // Custom logger (default: built-in logger)
  logger: customLogger,

  // Log level: 'error' | 'warn' | 'info' | 'debug' | 'trace'
  logLevel: 'info'
}));

Response Headers

TOON middleware adds helpful headers to responses:

X-TOON-Mode: toon                    # 'toon', 'passthrough', or 'fallback'
X-TOON-Savings: 42.5%                # Percentage of tokens saved
X-TOON-Tokens: 240->138              # Original -> Converted token count
X-TOON-Cost-Saved: $0.0002           # Estimated cost savings
X-Request-ID: req-1699564823456-abc  # Unique request identifier
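The savings headers are straightforward arithmetic over token counts and the configured per-1K price. Recomputing the example values for the 240->138 case above (a sketch, not the middleware's internal code):

```javascript
// Recomputes the example savings headers from token counts.
// per1K defaults to the $0.002 pricing shown in the configuration section.
function savingsHeaders(originalTokens, convertedTokens, per1K = 0.002) {
  const tokensSaved = originalTokens - convertedTokens;
  const percentage = (tokensSaved / originalTokens) * 100;
  const costSaved = (tokensSaved / 1000) * per1K;
  return {
    'X-TOON-Savings': `${percentage.toFixed(1)}%`,
    'X-TOON-Tokens': `${originalTokens}->${convertedTokens}`,
    'X-TOON-Cost-Saved': `$${costSaved.toFixed(4)}`
  };
}

console.log(savingsHeaders(240, 138));
// { 'X-TOON-Savings': '42.5%',
//   'X-TOON-Tokens': '240->138',
//   'X-TOON-Cost-Saved': '$0.0002' }
```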

Listening to Analytics Events

import { createExpressToonMiddleware } from '@toon-middleware/express';

const middleware = createExpressToonMiddleware({
  analytics: true
});

// Access the analytics tracker
middleware.analytics?.on('conversion', (data) => {
  console.log('Conversion:', data);
  // {
  //   requestId: 'req-123',
  //   path: '/api/users',
  //   method: 'GET',
  //   savings: { percentage: 42.5, tokens: 102, cost: 0.0002 },
  //   timestamp: '2024-11-13T10:30:00.000Z'
  // }
});

middleware.analytics?.on('error', (error) => {
  console.error('Analytics error:', error);
});

app.use(middleware);

Custom Client Detection

import { createExpressToonMiddleware } from '@toon-middleware/express';
import { createHeaderDetector } from '@toon-middleware/core';

app.use(createExpressToonMiddleware({
  customDetectors: [
    // Detect custom header
    createHeaderDetector('x-my-llm-client', () => true, {
      confidence: 1.0
    }),

    // Custom detection function
    ({ headers, userAgent }) => {
      if (headers['x-api-key']?.startsWith('llm-')) {
        return { isLLM: true, confidence: 0.95 };
      }
      return { isRegular: true, confidence: 0.5 };
    }
  ]
}));

Disable Auto-Conversion (Manual Mode)

import { createExpressToonMiddleware } from '@toon-middleware/express';
import { convertToTOON } from '@toon-middleware/core';

app.use(createExpressToonMiddleware({
  autoConvert: false  // Disable automatic conversion
}));

app.get('/api/data', (req, res) => {
  const data = { users: [...] };

  // Manually convert to TOON
  const result = convertToTOON(data);

  if (result.success) {
    res.set('Content-Type', 'text/plain; charset=utf-8');
    res.send(result.data);
  } else {
    res.json(data);  // Fallback to JSON
  }
});

NestJS Integration

Basic Setup

// app.module.ts
import { Module } from '@nestjs/common';
import { ToonModule } from '@toon-middleware/nest';

@Module({
  imports: [
    ToonModule.forRoot({
      autoConvert: true,
      cache: true,
      analytics: true
    })
  ],
  controllers: [UsersController]
})
export class AppModule {}

// users.controller.ts
import { Controller, Get } from '@nestjs/common';

@Controller('api')
export class UsersController {
  @Get('users')
  getUsers() {
    return {
      users: [
        { id: 1, name: 'Alice', email: 'alice@example.com', role: 'admin' },
        { id: 2, name: 'Bob', email: 'bob@example.com', role: 'user' }
      ]
    };
  }
}

That's it! LLM clients automatically receive TOON format, browsers get JSON.

Advanced Configuration

import { ToonModule } from '@toon-middleware/nest';

@Module({
  imports: [
    ToonModule.forRoot({
      autoConvert: true,
      confidenceThreshold: 0.8,

      cache: true,
      cacheOptions: {
        maxSize: 1000,
        ttl: 300000
      },

      analytics: true,
      analyticsOptions: {
        enabled: true
      },

      pricing: {
        per1K: 0.002
      },

      // Make module global (optional)
      global: true
    })
  ]
})
export class AppModule {}

Async Configuration with ConfigService

import { ConfigModule, ConfigService } from '@nestjs/config';
import { ToonModule } from '@toon-middleware/nest';

@Module({
  imports: [
    ConfigModule.forRoot(),
    ToonModule.forRootAsync({
      imports: [ConfigModule],
      inject: [ConfigService],
      useFactory: async (configService: ConfigService) => ({
        autoConvert: configService.get('TOON_AUTO_CONVERT', true),
        cache: configService.get('TOON_CACHE_ENABLED', true),
        analytics: configService.get('TOON_ANALYTICS_ENABLED', true)
      })
    })
  ]
})
export class AppModule {}

Listening to Analytics Events

import { Injectable, OnModuleInit, Inject } from '@nestjs/common';
import { AnalyticsTracker } from '@toon-middleware/nest';

@Injectable()
export class AnalyticsService implements OnModuleInit {
  constructor(
    @Inject('TOON_ANALYTICS') private analytics: AnalyticsTracker
  ) {}

  onModuleInit() {
    if (this.analytics) {
      this.analytics.on('conversion', (payload) => {
        console.log('TOON Conversion:', {
          path: payload.path,
          savings: payload.savings.percentage,
          tokensSaved: payload.savings.tokens
        });
      });
    }
  }
}

Fastify Integration

Basic Setup

import Fastify from 'fastify';
import toonPlugin from '@toon-middleware/fastify';

const fastify = Fastify({ logger: true });

// Register TOON plugin
await fastify.register(toonPlugin, {
  autoConvert: true,
  cache: true,
  analytics: true
});

// Your routes work unchanged
fastify.get('/api/users', async () => {
  return {
    users: [
      { id: 1, name: 'Alice', role: 'admin' },
      { id: 2, name: 'Bob', role: 'user' }
    ]
  };
});

await fastify.listen({ port: 3000 });

That's it! LLM clients automatically receive TOON format, browsers get JSON.

Advanced Configuration

import Fastify from 'fastify';
import toonPlugin from '@toon-middleware/fastify';

const fastify = Fastify();

await fastify.register(toonPlugin, {
  autoConvert: true,
  confidenceThreshold: 0.8,

  cache: true,
  cacheOptions: {
    maxSize: 1000,
    ttl: 300000
  },

  analytics: true,
  analyticsOptions: {
    enabled: true
  },

  pricing: {
    per1K: 0.002
  }
});

fastify.get('/api/data', async () => {
  return { items: [1, 2, 3] };
});

await fastify.listen({ port: 3000 });

Listening to Analytics Events

import Fastify from 'fastify';
import toonPlugin from '@toon-middleware/fastify';

const fastify = Fastify();

await fastify.register(toonPlugin, {
  analytics: true
});

// Access analytics via decorated property
fastify.toonAnalytics.on('conversion', (payload) => {
  console.log('TOON Conversion:', {
    path: payload.path,
    savings: payload.savings.percentage,
    tokensSaved: payload.savings.tokens
  });
});

fastify.get('/api/test', async () => {
  return { message: 'test' };
});

await fastify.listen({ port: 3000 });

Client Usage

Making Requests

LLM clients should include headers to request TOON format:

// Using fetch
const response = await fetch('http://localhost:3000/api/users', {
  headers: {
    'User-Agent': 'OpenAI-API-Client/1.0',
    'Accept': 'application/json, text/toon',
    'X-Accept-Toon': 'true'
  }
});

const toonData = await response.text();
console.log(toonData);  // TOON formatted response

Sending TOON Data

import { convertToTOON } from '@toon-middleware/core';

const data = { users: [...] };
const result = convertToTOON(data);

const response = await fetch('http://localhost:3000/api/ingest', {
  method: 'POST',
  headers: {
    'Content-Type': 'text/plain; charset=utf-8',
    'X-Accept-Toon': 'true'
  },
  body: result.data
});

πŸ—οΈ Architecture

Monorepo Structure

toon-middleware/
├── packages/
│   ├── core/                  # Pure business logic (converters, detectors, analytics)
│   ├── integrations/          # Framework-specific adapters
│   │   ├── express/           # Express middleware ✅
│   │   ├── nest/              # NestJS module ✅ (TypeScript)
│   │   └── fastify/           # Fastify plugin ✅
│   ├── plugins/               # Pluggable infrastructure
│   │   ├── cache/             # Cache manager implementation
│   │   └── logger/            # Logger factory and transports
│   ├── utils/                 # Shared helpers
│   └── examples/              # Example applications
│       └── express-basic/     # Express demo with dashboard
└── tools/                     # Benchmarks, scripts, configs

Packages

Core:

  • @toon-middleware/core – TOON converters, client detectors, analytics, optimizers, validators
  • @toon-middleware/utils – Shared helpers for request IDs, validation, header detection

Integrations:

  • @toon-middleware/express – Express middleware (JavaScript)
  • @toon-middleware/nest – NestJS module with interceptors and DI (TypeScript)
  • @toon-middleware/fastify – Fastify plugin (JavaScript)

Plugins:

  • @toon-middleware/cache – Event-driven TTL cache with LRU eviction
  • @toon-middleware/logger – Level-aware structured logger

🧪 Development

Prerequisites

  • Node.js 24+ (LTS) - Use nvm for version management
  • PNPM 9+ - Fast, disk-space-efficient package manager

Setup

# Clone the repository
git clone https://github.com/yourusername/toon-middleware.git
cd toon-middleware

# Use the correct Node version (if you have nvm installed)
nvm use

# Install dependencies
pnpm install

# Run tests
pnpm test

# Run benchmarks
pnpm benchmark

# Start the demo server
pnpm demo

Visit http://localhost:5050/dashboard to see live savings metrics.

Available Scripts

pnpm build          # Build all packages
pnpm test           # Run all tests (node:test)
pnpm test:coverage  # Generate experimental coverage
pnpm test:watch     # Run tests in watch mode
pnpm benchmark      # Execute performance benchmarks
pnpm lint           # Lint all packages
pnpm typecheck      # Type check JS with TypeScript
pnpm dev            # Start demo in development mode
pnpm demo           # Start demo server
pnpm clean          # Clean all build artifacts and node_modules

Project Principles

  • Functional Core, Imperative Shell - Pure business logic in core, side effects in integrations and plugins
  • Workspace Discipline - Internal packages use workspace protocol (workspace:*)
  • Test Coverage - Every pure function has tests for determinism and immutability
  • Performance First - Benchmarks validate <3ms overhead and >2000 req/s throughput
  • Documentation - Every feature includes examples and API documentation

📊 Performance Benchmarks

Targets:

  • ✅ Core conversions: <1 ms average
  • ✅ Middleware overhead: <3 ms
  • ✅ Throughput: >2000 requests/second

Run benchmarks:

pnpm benchmark
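For a rough local sanity check outside the repo's benchmark suite, a hypothetical micro-benchmark harness is easy to write. This sketch times a trivial JSON round-trip as a stand-in for a conversion step; the numbers it prints are machine-dependent:

```javascript
// Illustrative micro-benchmark sketch, not the pnpm benchmark suite.
const now = () => (globalThis.performance?.now() ?? Date.now());

function bench(fn, iterations = 10000) {
  const start = now();
  for (let i = 0; i < iterations; i++) fn();
  return (now() - start) / iterations; // average ms per call
}

const payload = { users: [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }] };
const avgMs = bench(() => JSON.parse(JSON.stringify(payload)));
console.log(`avg ${avgMs.toFixed(4)} ms per conversion`);
```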

πŸ—ΊοΈ Roadmap

  • Express middleware integration
  • Intelligent LLM client detection
  • In-memory caching with TTL
  • Real-time analytics and savings tracking
  • Performance benchmarks
  • NestJS module with TypeScript support
  • Fastify plugin
  • Redis cache adapter
  • OpenTelemetry integration
  • Metrics exporters (Prometheus, Datadog)
  • Distributed load testing harness

📚 Documentation


🤝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork and clone the repository
  2. Use the correct Node version: nvm use (requires Node 24+)
  3. Install dependencies: pnpm install
  4. Create a feature branch: git checkout -b feature/amazing-feature
  5. Make your changes following our architecture principles:
    • Keep business logic pure in packages/core
    • Isolate side effects in integrations and plugins
    • Add tests for new functionality
  6. Run tests and linting: pnpm test && pnpm lint
  7. Commit your changes: git commit -m 'Add amazing feature'
  8. Push to your fork: git push origin feature/amazing-feature
  9. Open a Pull Request

Development Guidelines

  • Keep business logic pure and deterministic inside packages/core
  • Isolate side effects (HTTP, caching, logging, timers) within integrations and plugins
  • Reuse shared helpers from packages/utils to avoid duplication
  • Maintain documentation alongside features (docs/ and package READMEs)
  • Enforce workspace consistency via shared linting, formatting, and type checking

📄 License

TOON Middleware is released under the MIT License.


πŸ™ Acknowledgments

  • TOON Format - The compact serialization format powering this middleware
  • The Node.js and Express communities for building amazing tools

Built with ❤️ for the LLM ecosystem

⭐ Star us on GitHub • 🐛 Report a Bug • 💡 Request a Feature
