
All because the team passed a 500-row customer table to the model as plain JSON. The same payload in TOON would have cost roughly a third of that.
That’s when it hits you: JSON wasn’t built for this world.
It came from 2001, a time of web round-trips and browser consoles. Every brace, quote, comma, and repeated key made sense back then.
In 2025, those characters are tokens. Tokens are money. And every repeated "id": and "name": is a tax you pay for no extra information. TOON is a format built to remove that tax.
It keeps the full JSON data model but strips away the syntax models don’t need.
It replaces braces with indentation, turns repeated keys into a single header row, and makes array sizes explicit so the model can’t hallucinate extra entries.
- Same data.
- Less noise.
- Fewer tokens.
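Concretely, here's a tiny version of that customer table in both forms. This is a sketch of TOON's tabular layout as I understand it; the exact quoting and escaping rules live in the spec, and the field values are invented for illustration:

```
JSON (keys repeated on every row):
[
  {"id": 1, "name": "Ada"},
  {"id": 2, "name": "Linus"}
]

TOON (keys declared once, length explicit):
customers[2]{id,name}:
  1,Ada
  2,Linus
```

The `[2]` is the explicit array length, `{id,name}` is the one-time header, and every row after that is just values.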
In real workloads, the difference is big.
We saw 61 percent token savings on common datasets. Accuracy jumped as well, because the clearer structure is harder for the model to misinterpret.
TOON isn’t a new database. It isn’t compression. It’s simply a way to present structured data in a form that LLMs read more efficiently than JSON. For APIs, logs, and storage systems, JSON is still perfect. Inside prompts, it quietly becomes the most expensive part of your pipeline.
If you care about tokens, or if your context often includes tables, logs, or structured objects, this is worth a look.
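If you want to sanity-check the savings on your own data first, a rough comparison takes a few lines. The `to_toon` helper below is my own minimal sketch of the tabular layout described above, not the official encoder; it assumes a flat list of objects that all share the same keys and contain no commas or newlines in their values:

```python
import json
import tiktoken  # pip install tiktoken

def to_toon(name, rows):
    # Sketch of TOON's tabular array: one header with the length and
    # field names, then one comma-separated line of values per row.
    keys = list(rows[0])
    header = f"{name}[{len(rows)}]{{{','.join(keys)}}}:"
    lines = ["  " + ",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header, *lines])

customers = [
    {"id": 1, "name": "Ada", "plan": "pro"},
    {"id": 2, "name": "Linus", "plan": "free"},
]

enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(json.dumps(customers))), "tokens as JSON")
print(len(enc.encode(to_toon("customers", customers))), "tokens as TOON")
```

JSON pays for the keys on every row while TOON pays for them once, so the gap widens as the row count grows.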
I wrote up the full notes and benchmarks here.
Happy to answer questions or share examples if anyone wants to test TOON on their own datasets.
