TOON vs JSON (for LLM Inputs)

TOON (Token‑Oriented Object Notation) is a compact, LLM‑friendly format that typically reduces token count by 30–60% compared with JSON on uniform, tabular data. Fewer tokens mean lower cost and more room in the context window.

Why TOON is Better for LLMs

Fewer tokens → lower cost: Keys aren’t repeated for every row.
More context per request: Save tokens for what matters (instructions + data).
LLM‑friendly structure: Simple, row‑oriented text with predictable patterns.

How TOON Achieves Savings

Declare keys once per uniform array: {id,name,role} instead of repeating keys.
State length upfront: [3] clearly signals row count.
Minimal syntax: No braces around rows, and quotes only when a value requires them.
Delimited rows: Values appear in a fixed key order (CSV/TSV‑like), which tokenizes well.
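
The four ideas above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the encoder in this repository: `encodeUniformArray` is a hypothetical helper that assumes a non-empty array of flat objects sharing the same keys, and it does no quoting or escaping.

```javascript
// Hypothetical helper for illustration only (not the repository's encode()).
// Assumes: rows is a non-empty array of flat objects with identical keys.
function encodeUniformArray(name, rows) {
  if (!Array.isArray(rows) || rows.length === 0) {
    throw new Error('This sketch only handles non-empty arrays');
  }

  // Declare keys once, taken from the first row.
  const keys = Object.keys(rows[0]);

  // State the length upfront: name[N]{key1,key2,...}:
  const header = `${name}[${rows.length}]{${keys.join(',')}}:`;

  // One delimited line per row, values in the same fixed key order.
  const lines = rows.map(
    (row) => '  ' + keys.map((key) => String(row[key])).join(',')
  );

  return [header, ...lines].join('\n');
}

const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
];

console.log(encodeUniformArray('users', users));
```

Because the keys are read from the first row only, a row with missing or extra keys would silently produce a misaligned line, which is why uniformity matters for this layout.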

Example

// JSON
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

// TOON
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
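
As a rough sanity check, the two representations above can be compared by character count. This is only a proxy: characters are not tokens, and real savings depend on the tokenizer, so treat the sketch below as an illustration rather than a benchmark. The TOON text is copied by hand from the example.

```javascript
// Same data as the example above.
const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' },
  ],
};

// Compact JSON (no whitespace) is the fairest baseline for JSON.
const compactJson = JSON.stringify(data);

// TOON text copied from the example above.
const toon = [
  'users[2]{id,name,role}:',
  '  1,Alice,admin',
  '  2,Bob,user',
].join('\n');

// Character counts are a proxy only; tokens are what an LLM API bills.
const saving = 1 - toon.length / compactJson.length;
console.log(`JSON: ${compactJson.length} chars`);
console.log(`TOON: ${toon.length} chars`);
console.log(`Character saving: ${(saving * 100).toFixed(1)}%`);
```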

When TOON Fits

Use TOON when your data is:

Uniform arrays of objects (same keys per row)
Flat or shallow (1–2 nesting levels)
Large and sent directly to an LLM

Limitations (When NOT to use TOON)

Non‑uniform arrays (varying keys across rows)
Deeply nested objects (>3 levels) where readability drops
General storage/transport for arbitrary data (JSON/MsgPack are better)
Schemas with many optional fields (header becomes sparse/misleading)
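
One way to apply the fit and limitation criteria above is a pre-check that decides whether an array is a good TOON candidate before encoding it. The helper below is a hypothetical sketch, not part of this repository; it treats an array as suitable only when every element is a plain object with the same keys and only primitive values.

```javascript
// Hypothetical pre-check (not part of the repository).
// Returns true only for non-empty arrays of flat objects with identical keys.
function isUniformFlatArray(value) {
  if (!Array.isArray(value) || value.length === 0) return false;

  const isPlainObject = (v) =>
    v !== null && typeof v === 'object' && !Array.isArray(v);
  const isPrimitive = (v) => v === null || typeof v !== 'object';

  if (!value.every(isPlainObject)) return false;

  // Compare every row's sorted key list against the first row's.
  const expectedKeys = Object.keys(value[0]).sort();
  return value.every((row) => {
    const keys = Object.keys(row).sort();
    return (
      keys.length === expectedKeys.length &&
      keys.every((key, i) => key === expectedKeys[i]) &&
      keys.every((key) => isPrimitive(row[key]))
    );
  });
}

// Uniform rows: a good TOON candidate.
console.log(isUniformFlatArray([{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]));

// Varying keys: better left as JSON.
console.log(isUniformFlatArray([{ id: 1 }, { id: 2, name: 'Bob' }]));
```

A caller could use this check to fall back to `JSON.stringify` for data that does not pass.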

Quick Start

Prerequisites: Node.js 18+. Ollama is optional and only needed for local model tests.

Install

npm install

Run

npm start          # Quick comparison (JSON vs TOON)
npm run compare    # Detailed metrics (bytes, lines, tokens)
npm run test       # Test with an Ollama model (if installed)

Example Results:

Detailed Comparison Results (compareResult.md): token counts, byte sizes, and format comparisons across multiple datasets
LLM Test Results (testResult.md): real-world testing with Ollama (llama3.1:8b), showing response times and token usage

Use in Code

import { encode } from './src/toon.js';

const data = { users: [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' }
]};

console.log(encode(data));                      // default comma delimiter
console.log(encode(data, { delimiter: '\t' })); // tab delimiter; often tokenizes better

Notes

Token counting uses tiktoken. Actual savings depend on model/tokenizer.
Delimiters supported: comma, tab, pipe. Tab often yields best tokenization.
TOON is designed for LLM prompts, not as a general‑purpose data store.
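
The choice of delimiter also affects which values need quoting. The sketch below shows one plausible rule set; it is an assumption for illustration, not the quoting logic used by this repository. `formatValue` is a hypothetical helper that quotes a string only when it is empty, has surrounding whitespace, or contains the active delimiter, a double quote, or a line break.

```javascript
// Hypothetical helper: the quoting rules here are assumptions for illustration.
function formatValue(value, delimiter = ',') {
  // Numbers, booleans, and null are written as-is in this sketch.
  if (typeof value !== 'string') return String(value);

  const needsQuotes =
    value === '' ||               // empty string would otherwise vanish
    value !== value.trim() ||     // leading or trailing whitespace
    value.includes(delimiter) ||  // would be read as a column break
    /["\n\r]/.test(value);        // quotes and line breaks

  // JSON.stringify gives a double-quoted, escaped string.
  return needsQuotes ? JSON.stringify(value) : value;
}

console.log(formatValue('Alice'));             // no quoting needed
console.log(formatValue('Smith, Jane'));       // contains the comma delimiter
console.log(formatValue('Smith, Jane', '\t')); // comma is harmless with tabs
```

With a tab delimiter, values containing commas stay unquoted, which is one reason tab-delimited rows can end up shorter. This sketch does not handle strings that look like numbers or booleans (such as '42' or 'true'), which a real encoder would need to disambiguate.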

License

Apache License 2.0
