ONTO: A Token-Efficient Columnar Notation for LLM Input Optimization

Harshavardhanan Deekeswar

arXiv: 2604.17512 · v1 · submitted 2026-04-19 · cs.CL · cs.LG

Pith reviewed 2026-05-10 05:18 UTC · model grok-4.3

classification: cs.CL, cs.LG

keywords: ONTO notation, token reduction, serialization format, LLM input optimization, columnar notation, JSON alternative, data serialization efficiency


The pith

ONTO notation cuts LLM input tokens by 46-51% versus JSON on synthetic datasets while keeping task accuracy essentially intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ONTO as a serialization format that lists field names only once per entity type and then packs record values into pipe-delimited rows, using simple indentation for nested levels. This removes the repeated key strings, braces, and punctuation that inflate JSON when the same structure appears many times in operational data. Evaluation on synthetic datasets shows the savings hold steady as record counts grow from 100 to 1,000, and models given a short description of the format perform lookup, counting, extraction, and aggregation tasks with no material loss of accuracy relative to JSON. If the approach generalizes, a fixed context window can hold substantially more records, cutting both the number of model calls and the cost of feeding large tables or sensor logs into language models.

Core claim

ONTO declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks on Qwen2.5-7B show a corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided.

What carries the argument

ONTO notation, which declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy to eliminate repeated structural tokens.
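A minimal Python sketch of the schema-once, data-many idea follows. The exact ONTO grammar lives in the linked specification and is not reproduced on this page, so the header syntax `name[field|field]`, the two-space indent step, and the function name `render_columnar` are hypothetical stand-ins, not the published format. Scalar fields are declared once as a header, each record becomes one pipe-delimited row, and fields holding lists of records become indented child blocks.

```python
import json
from typing import Any


def render_columnar(name: str, records: list[dict[str, Any]], depth: int = 0) -> str:
    """Render records in a hypothetical ONTO-like layout (not the official grammar).

    Scalar fields are declared once in a header; each record becomes one
    pipe-delimited row. Fields holding lists of records are rendered as
    indented child blocks beneath their parent row.
    """
    pad = "  " * depth
    if not records:
        return f"{pad}{name}[]"
    # Assumes a homogeneous schema: field order is taken from the first record.
    first = records[0]
    scalar_fields = [k for k, v in first.items() if not isinstance(v, list)]
    nested_fields = [k for k, v in first.items() if isinstance(v, list)]
    lines = [f"{pad}{name}[{'|'.join(scalar_fields)}]"]
    for rec in records:
        # Values only: no repeated keys, braces, or quotes. Escaping of "|"
        # inside values is deliberately omitted in this sketch.
        lines.append(pad + "  " + "|".join(str(rec[f]) for f in scalar_fields))
        for f in nested_fields:
            if rec[f]:
                lines.append(render_columnar(f, rec[f], depth + 2))
    return "\n".join(lines)


readings = [
    {"id": 1, "temp": 21.5, "unit": "C"},
    {"id": 2, "temp": 22.0, "unit": "C"},
]
print(render_columnar("readings", readings))
print(len(json.dumps(readings)), len(render_columnar("readings", readings)))
```

On this two-record input the JSON text is 76 characters and the columnar text 44. Characters are only a rough proxy: the paper's figures are token counts, and the gap widens as more records share one header.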

Load-bearing premise

That the token savings and accuracy results measured on synthetic datasets will hold for real operational data once a format description is added to the prompt.

What would settle it

Token-count and accuracy measurements on genuine IoT sensor logs or business transaction tables, checking whether net savings remain above 40% after including the required format instructions.
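The settling test reduces to simple arithmetic once the format description is counted. The sketch below assumes a fixed instruction prefix (`prefix_tokens`) sent once per prompt; the function names, the 300-token prefix, and the per-record costs are hypothetical. The 80,000-token JSON baseline is the abstract's 1,000-reading IoT example, and the ONTO figure applies the low end (46%) of the reported range to it purely for illustration; the paper does not report that exact pairing.

```python
import math


def net_reduction(json_tokens: int, onto_tokens: int, prefix_tokens: int) -> float:
    """Fraction of the JSON token count saved once the ONTO prefix is paid."""
    return (json_tokens - onto_tokens - prefix_tokens) / json_tokens


def break_even_records(json_per_record: int, onto_per_record: int, prefix_tokens: int) -> int:
    """Smallest record count at which per-record savings cover the prefix."""
    saving = json_per_record - onto_per_record
    if saving <= 0:
        raise ValueError("ONTO must be cheaper per record for a break-even point to exist")
    return math.ceil(prefix_tokens / saving)


# Illustrative numbers: 80,000 JSON tokens for 1,000 readings (from the abstract),
# ONTO at 46% fewer tokens, and a hypothetical 300-token format description.
r = net_reduction(80_000, 43_200, 300)
print(f"net reduction: {r:.1%}")  # compare against the 40% threshold
# Hypothetical per-record costs of 80 (JSON) and 43 (ONTO) tokens.
print(break_even_records(80, 43, 300))
```

Because the prefix is paid once while the savings grow with the record count, the net figure approaches the raw reduction as the dataset grows; for small inputs the prefix can dominate.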

Figures

Figures reproduced from arXiv: 2604.17512 by Harshavardhanan Deekeswar.

Original abstract

Serialization formats designed for document interchange impose structural overhead that becomes prohibitive when large language models consume operational data at scale. A modest dataset of 1,000 IoT sensor readings serialized as JSON requires approximately 80,000 tokens, the majority spent on repeated field names, nested braces, and structural punctuation rather than semantic content. We present ONTO (Object Notation for Token Optimization), a columnar notation that declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks on Qwen2.5-7B show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided. Ablation analysis reveals that key repetition accounts for the majority of JSON overhead, with indentation costs in nested structures explaining the 4-percentage-point gap between flat and hierarchical data. ONTO occupies a previously unfilled position in the serialization landscape: columnar efficiency with hierarchical structure, optimized for LLM context windows rather than document interchange. Code and specification are available at https://github.com/harsh-aranga/onto.
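The ablation's claim that key repetition dominates JSON overhead can be illustrated with a character-level breakdown. This is a rough proxy under stated assumptions: it counts characters rather than tokens (the paper's figures are token counts), it handles only flat records with scalar values, and the function name `json_overhead_breakdown` and the sample readings are hypothetical.

```python
import json
from typing import Any


def json_overhead_breakdown(records: list[dict[str, Any]]) -> dict[str, int]:
    """Split the characters of json.dumps(records) into keys, values, and structure.

    Flat records only: each value is assumed to be a scalar (str, int, float, bool, None).
    """
    text = json.dumps(records)
    # Every key is re-serialized (with its quotes) once per record.
    key_chars = sum(len(json.dumps(k)) for rec in records for k in rec)
    value_chars = sum(len(json.dumps(v)) for rec in records for v in rec.values())
    # Whatever remains is braces, brackets, colons, commas, and spaces.
    structure_chars = len(text) - key_chars - value_chars
    return {"keys": key_chars, "values": value_chars,
            "structure": structure_chars, "total": len(text)}


# Hypothetical sensor readings with short values and descriptive field names.
readings = [
    {"sensor_id": f"s{i:03d}", "temperature_c": 21.5, "humidity_pct": 40, "status": "ok"}
    for i in range(100)
]
print(json_overhead_breakdown(readings))
```

For these made-up readings, key characters outnumber value characters three to one, which is exactly the repetition a schema-once header removes.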

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

ONTO is a straightforward columnar tweak that avoids repeating keys in structured LLM inputs, but the headline savings rest on synthetic data and ignore prompt overhead.


ONTO declares the schema once, then packs values into pipe-delimited rows with indentation for nesting. That design directly targets the repeated field names that bloat JSON when you feed tables or records to an LLM. The paper measures 46-51% token cuts on three synthetic operational datasets at scales from 100 to 1,000 records and reports small latency wins on Qwen2.5-7B. It also runs quick checks showing no large drop in lookup, counting, extraction, or aggregation accuracy once the model sees the format rules. The GitHub link supplies the spec and code, which is useful for anyone who wants to try it immediately.

The combination of a columnar layout with hierarchy support, aimed at LLM context windows, looks new enough on the surface. The measurements are concrete and the ablation on key repetition is easy to follow. The main gaps: all tests use synthetic data, so there is no evidence on noisy real traces. The token counts compare only the raw serializations; they leave out the schema text and the minimal instructions needed to teach the model how to parse ONTO. Those added tokens could shrink the reported gains, and the numbers may shift under other tokenizers. The accuracy tests cover only a narrow set of tasks on one model.

This is for practitioners who already manage large structured inputs and want a lighter serialization option. A reader focused on prompt engineering or context-window budgeting would get practical value from the format and the open code. It is not a foundational result, but the idea is simple to test and the current evidence is honest enough to justify referee time. I would send it for review with a request to add real datasets and full prompt token accounting.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ONTO (Object Notation for Token Optimization), a columnar serialization format that declares field names once per entity and arranges values in pipe-delimited rows with indentation for hierarchy. The central claim is that this achieves 46-51% token reduction compared to JSON on three synthetic operational datasets ranging from 100 to 1,000 records, with stable scaling, corresponding 5-10% latency improvements on Qwen2.5-7B, and no material degradation in LLM accuracy for lookup, counting, extraction, and aggregation tasks when format context is supplied. The work provides code and specification on GitHub.

Significance. If the reported token reductions prove robust once full prompt overhead is included and the results generalize beyond synthetic data, the contribution would be significant for practical LLM context-window optimization in data-heavy applications. The empirical measurements on scaling behavior and task accuracy, combined with the open-sourced code and specification, provide a reproducible starting point for further work in LLM-specific serialization formats.

major comments (2)

[Evaluation / abstract claims] The 46-51% token-reduction claim (abstract and evaluation description) is based on raw serializations of ONTO versus JSON. The manuscript does not report the token cost of the schema declaration plus the minimal instruction prompt needed for an LLM to parse ONTO. Because the design's advantage is precisely the elimination of per-record key repetition, any added prefix tokens directly affect the net savings and must be quantified to support the headline efficiency numbers.
[Comprehension validation and ablation analysis] The comprehension validation and ablation analysis lack details on the exact tokenization method, dataset construction, number of trials, or statistical measures such as error bars. This absence makes the claims of 'no material degradation' and 'key repetition accounts for the majority of JSON overhead' difficult to verify or reproduce from the given text.

minor comments (2)

[Abstract] The abstract states that ONTO 'occupies a previously unfilled position in the serialization landscape' but provides no explicit comparison table or positioning against related columnar or LLM-optimized formats.
[Controlled inference benchmarks] Clarify whether the reported latency improvements on Qwen2.5-7B include the full end-to-end prompt (schema + data + instruction) or only the data portion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will incorporate revisions to strengthen the evaluation and reproducibility of the work.


Referee: [Evaluation / abstract claims] The 46-51% token-reduction claim (abstract and evaluation description) is based on raw serializations of ONTO versus JSON. The manuscript does not report the token cost of the schema declaration plus the minimal instruction prompt needed for an LLM to parse ONTO. Because the design's advantage is precisely the elimination of per-record key repetition, any added prefix tokens directly affect the net savings and must be quantified to support the headline efficiency numbers.

Authors: We agree that net token savings, inclusive of the one-time schema declaration and the instruction prompt, provide a more complete assessment of practical utility. The ONTO design intentionally places the schema cost once per entity so that it is amortized over the data records; for the 100- to 1,000-record scales examined, this fixed overhead should not eliminate the reported advantage. We will add explicit token counts for the schema and minimal instruction prompt, compute the resulting net reductions for each dataset size, and present these figures in a new table within the evaluation section. The abstract will be updated to clarify that the headline percentages refer to data serialization while the full-prompt results are reported in the body. These changes will directly address the concern without altering the core experimental outcomes. revision: yes
Referee: [Comprehension validation and ablation analysis] The comprehension validation and ablation analysis lack details on the exact tokenization method, dataset construction, number of trials, or statistical measures such as error bars. This absence makes the claims of 'no material degradation' and 'key repetition accounts for the majority of JSON overhead' difficult to verify or reproduce from the given text.

Authors: We recognize that additional methodological transparency is required for independent verification. We will expand the relevant sections to specify the tokenizer employed (the native Qwen2.5 tokenizer), the exact procedure used to generate the synthetic flat and hierarchical datasets, the number of independent trials performed for each task, and the inclusion of standard deviations or error bars on all accuracy and token-overhead figures. These additions will allow readers to confirm both the absence of material accuracy degradation and the attribution of JSON overhead primarily to key repetition. revision: yes
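Reporting error bars as the rebuttal promises needs only per-trial accuracies for each format. A minimal sketch, assuming accuracy is recorded as a fraction per independent trial; the function names, the trial values (made up, not results from the paper), and the 2-point tolerance standing in for "material" degradation are all hypothetical, since the page does not define that threshold.

```python
import statistics


def summarize(trials: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-trial accuracies (needs >= 2 trials)."""
    return statistics.mean(trials), statistics.stdev(trials)


def within_tolerance(json_trials: list[float], onto_trials: list[float], tol: float = 0.02) -> bool:
    """True if ONTO's mean accuracy is at most `tol` below JSON's (hypothetical criterion)."""
    json_mean, _ = summarize(json_trials)
    onto_mean, _ = summarize(onto_trials)
    return json_mean - onto_mean <= tol


# Made-up per-trial accuracies for one task, five trials per format.
json_acc = [0.94, 0.96, 0.95, 0.93, 0.97]
onto_acc = [0.93, 0.95, 0.96, 0.94, 0.95]
for label, acc in (("JSON", json_acc), ("ONTO", onto_acc)):
    mean, sd = summarize(acc)
    print(f"{label}: {mean:.3f} ± {sd:.3f}")
print(within_tolerance(json_acc, onto_acc))
```

A mean-difference threshold is the simplest possible criterion; a paired test across trials would be a natural refinement if the same prompts are reused for both formats.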

Circularity Check

0 steps flagged

No circularity: claims rest on direct empirical token counts, not derivations or fitted inputs


The paper introduces ONTO as a columnar serialization format and validates its token-efficiency claims via straightforward measurements of token counts on three synthetic datasets (100-1000 records) plus controlled inference benchmarks. No equations, parameters, or predictive models are described whose outputs are then re-used as inputs. The 46-51% reduction is obtained by comparing raw serializations of identical data in JSON vs. ONTO; this is a direct count, not a self-referential prediction. No self-citations are invoked to justify uniqueness theorems or ansatzes. The evaluation is therefore self-contained against external benchmarks (actual tokenizers on the provided data) and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on empirical token counts and accuracy measurements on synthetic data rather than any mathematical derivation; the main unstated premises are that the chosen synthetic datasets behave like real operational data and that the tokenizer used matches common LLM practice.

axioms (2)

domain assumption: Synthetic operational datasets adequately represent the token-overhead behavior of real IoT and log data
Evaluation is performed exclusively on three synthetic datasets.
domain assumption: Providing format context to the LLM adds negligible extra tokens compared with the savings
Accuracy validation assumes format context is supplied but does not quantify its token cost.

invented entities (1)

ONTO notation (no independent evidence)
purpose: Columnar serialization format optimized for LLM token efficiency while retaining hierarchical structure
Newly introduced format with no independent existence outside the paper.

pith-pipeline@v0.9.0 · 5553 in / 1412 out tokens · 34972 ms · 2026-05-10T05:18:00.434751+00:00 · methodology


Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Bray, T. (2017). The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259, IETF

[2] Ben-Kiki, O., Evans, C., and döt Net, I. (2021). YAML Ain't Markup Language Version 1.2. yaml.org/spec/1.2

[3] Google (2008). Protocol Buffers: Developer Guide. https://protobuf.dev/

[4] Apache Software Foundation (2013). Apache Parquet. https://parquet.apache.org/

[5] Jiang, H., Wu, Q., Lin, C.-Y., Yang, Y., and Qiu, L. (2023). LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. arXiv:2310.05736

[6] Pan, Z., Wu, Q., Jiang, H., et al. (2024). LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression. arXiv:2403.12968

[7] Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020

[8] Hao, Y., et al. (2022). Structured Prompting: Scaling In-Context Learning to 1,000 Examples. arXiv:2212.06713

[9] Dziri, N., et al. (2023). Faith and Fate: Limits of Transformers on Compositionality. NeurIPS 2023

[10] Cheng, Z., Kasai, J., and Yu, T. (2023). Batch Prompting: Efficient Inference with Large Language Model APIs. arXiv:2301.08721

[11] Petrov, A., et al. (2023). Language Model Tokenizers Introduce Unfairness Between Languages. arXiv:2305.15425

[12] Liskavets, B., Ushakov, M., Roy, S., Klibanov, M., Etemad, A., and Luke, S. (2024). Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference. arXiv:2409.01227

[13] TOON: Schema-Aware JSON Optimization. https://github.com/toon-format/toon