A comparison of Token-Oriented Object Notation (TOON) and JSON as LLM input formats: token count, context-window usage, token cost, readability for LLMs and for humans, and how well LLMs retrieve context from each.

264Gaurav/toon_json_for_llm

TOON vs JSON (for LLM Inputs)

TOON (Token‑Oriented Object Notation) is a compact, LLM‑friendly format that typically reduces token count by 30–60% compared with JSON when the data is a uniform array of objects. Fewer tokens mean lower cost and more room in the context window.

Why TOON is Better for LLMs

  • Fewer tokens → lower cost: Keys aren’t repeated for every row.
  • More context per request: Save tokens for what matters (instructions + data).
  • LLM‑friendly structure: Simple, row‑oriented text with predictable patterns.
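
The cost argument above is plain arithmetic, sketched here under stated assumptions. The token count and the price per million tokens are hypothetical inputs chosen for illustration, not measurements from this repository; only the 30–60% reduction range comes from the text above.

```javascript
// Cost of a prompt given its token count and a price per million input tokens.
function tokenCost(tokens, pricePerMillionTokens) {
  return (tokens / 1_000_000) * pricePerMillionTokens;
}

// Compare JSON vs TOON cost for an assumed fractional token reduction
// (0.3 and 0.6 correspond to the 30-60% range quoted above).
function compareCost(jsonTokens, reduction, pricePerMillionTokens) {
  const toonTokens = Math.round(jsonTokens * (1 - reduction));
  const jsonCost = tokenCost(jsonTokens, pricePerMillionTokens);
  const toonCost = tokenCost(toonTokens, pricePerMillionTokens);
  return { jsonTokens, toonTokens, jsonCost, toonCost, saved: jsonCost - toonCost };
}

// Hypothetical workload: 1,000,000 JSON tokens at a price of 2 per million.
const low = compareCost(1_000_000, 0.3, 2);
const high = compareCost(1_000_000, 0.6, 2);
console.log(low, high);
```

The saved tokens are also freed context-window space, which is the second benefit listed above.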

How TOON Achieves Savings

  1. Declare keys once per uniform array: {id,name,role} instead of repeating keys.
  2. State length upfront: [N] (for example [2]) signals the row count.
  3. Minimal syntax: No braces around rows, and quotes only when a value requires them.
  4. Delimited rows: Values appear in a fixed key order (CSV/TSV‑like), which tokenizes well.
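
The four steps above can be sketched as a small function. This is a minimal illustration, not the repository's `encode()`: `encodeUniformArray` is a hypothetical helper that assumes flat rows with identical keys, uses a comma delimiter only, and does no quoting. The empty-array form it returns is also an assumption.

```javascript
// Hypothetical helper: renders one uniform array of flat objects as a
// TOON-style block (length and keys declared once, then delimited rows).
function encodeUniformArray(name, rows) {
  if (rows.length === 0) return `${name}[0]:`; // assumed form for empty arrays
  const keys = Object.keys(rows[0]);           // keys are taken from the first row
  const header = `${name}[${rows.length}]{${keys.join(',')}}:`;
  // Each row lists its values in the same key order as the header.
  const lines = rows.map((row) => '  ' + keys.map((k) => String(row[k])).join(','));
  return [header, ...lines].join('\n');
}

const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
];
const toonText = encodeUniformArray('users', users);
console.log(toonText);
```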

Example

// JSON
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
// TOON
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
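
To get a rough feel for the size difference in this example, the two representations can be compared by character count. This is only a proxy: characters are not tokens, and the real savings depend on the tokenizer. The TOON text is written out by hand here rather than produced by the repository's encoder.

```javascript
const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' },
  ],
};

// Compact JSON (no indentation), the most favorable case for JSON.
const jsonText = JSON.stringify(data);

// The TOON form of the same data, copied from the example above.
const toonExample = ['users[2]{id,name,role}:', '  1,Alice,admin', '  2,Bob,user'].join('\n');

const jsonChars = jsonText.length;
const toonChars = toonExample.length;
const reductionPercent = Math.round((1 - toonChars / jsonChars) * 100);
console.log({ jsonChars, toonChars, reductionPercent });
```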

When TOON Fits

Use TOON when your data is:

  • Uniform arrays of objects (same keys per row)
  • Flat or shallow (1–2 nesting levels)
  • Large and sent directly to an LLM
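
Whether an array is "uniform" can be checked before encoding. The sketch below is one possible check; `isUniformArray` is a hypothetical helper, and treating an empty array as non-uniform is an assumption of this example.

```javascript
// Returns true when every element is a plain object with the same set of keys.
function isUniformArray(value) {
  if (!Array.isArray(value) || value.length === 0) return false;
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  if (!value.every(isPlainObject)) return false;
  // Sort keys so that key order within a row does not matter.
  const signature = (row) => Object.keys(row).sort().join('\u0000');
  const first = signature(value[0]);
  return value.every((row) => signature(row) === first);
}

const uniform = isUniformArray([
  { id: 1, name: 'Alice', role: 'admin' },
  { role: 'user', id: 2, name: 'Bob' },
]);
const mixedKeys = isUniformArray([{ id: 1 }, { id: 2, name: 'Bob' }]);
console.log({ uniform, mixedKeys });
```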

Limitations (When NOT to use TOON)

  • Non‑uniform arrays (varying keys across rows)
  • Deeply nested objects (>3 levels) where readability drops
  • General storage/transport for arbitrary data (JSON/MsgPack are better)
  • Schemas with many optional fields (header becomes sparse/misleading)
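
A depth check is one way to apply the nesting limit above before choosing a format. This is a sketch under assumptions: `objectDepth` and `chooseFormat` are hypothetical helpers, only objects count as a nesting level (arrays pass through), and the default threshold of 3 is taken from the limitation listed above.

```javascript
// Depth of object nesting; arrays do not add a level in this convention.
function objectDepth(value) {
  if (value === null || typeof value !== 'object') return 0;
  const childDepth = Math.max(0, ...Object.values(value).map(objectDepth));
  return Array.isArray(value) ? childDepth : 1 + childDepth;
}

// Fall back to JSON when the data is nested more deeply than maxDepth.
function chooseFormat(data, maxDepth = 3) {
  return objectDepth(data) > maxDepth ? 'json' : 'toon';
}

const shallow = { users: [{ id: 1, name: 'Alice' }] };
const deep = { a: { b: { c: { d: 1 } } } };
console.log(objectDepth(shallow), chooseFormat(shallow));
console.log(objectDepth(deep), chooseFormat(deep));
```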

Quick Start

Prerequisites: Node.js 18+. Ollama is optional and only needed for local model tests.

Install

npm install

Run

npm start          # Quick comparison (JSON vs TOON)
npm run compare    # Detailed metrics (bytes, lines, tokens)
npm run test       # Test with an Ollama model (if installed)

Use in Code

import { encode } from './src/toon.js';

const data = { users: [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' }
]};

console.log(encode(data));                 // default comma delimiter
console.log(encode(data, { delimiter: '\t' })); // often tokenizes better

Notes

  • Token counting uses tiktoken. Actual savings depend on model/tokenizer.
  • Delimiters supported: comma, tab, pipe. Tab often yields best tokenization.
  • TOON is designed for LLM prompts, not as a general‑purpose data store.
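
The delimiter choice interacts with quoting: a value only needs quotes when it would otherwise be ambiguous under the active delimiter. The sketch below illustrates that idea with a simplified quoting rule. `formatCell` and `formatRow` are hypothetical helpers, and the rule shown is an assumption that may differ from what `encode()` in `src/toon.js` actually does.

```javascript
const DELIMITERS = { comma: ',', tab: '\t', pipe: '|' };

// Quote a value only when leaving it bare would be ambiguous (simplified rule).
function formatCell(value, delimiter) {
  const text = String(value);
  const needsQuotes =
    text === '' ||
    text.includes(delimiter) ||
    text.includes('"') ||
    text.includes('\n') ||
    text !== text.trim();
  // JSON.stringify adds the surrounding quotes and escapes inner quotes/newlines.
  return needsQuotes ? JSON.stringify(text) : text;
}

function formatRow(values, delimiter) {
  return values.map((v) => formatCell(v, delimiter)).join(delimiter);
}

const row = [1, 'Smith, Alice', 'admin'];
const commaRow = formatRow(row, DELIMITERS.comma); // the name contains a comma, so it is quoted
const tabRow = formatRow(row, DELIMITERS.tab);     // no tab in any value, so nothing is quoted
const pipeRow = formatRow(row, DELIMITERS.pipe);
console.log(commaRow);
console.log(tabRow);
console.log(pipeRow);
```

With tab or pipe as the delimiter, values that contain commas need no quotes, which is one reason a non-comma delimiter can produce shorter rows.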

License

Apache License 2.0
