ONTO: A Token-Efficient Columnar Notation for LLM Input Optimization
Pith reviewed 2026-05-10 05:18 UTC · model grok-4.3
The pith
ONTO notation cuts LLM input tokens by 46-51% versus JSON while keeping task accuracy intact.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ONTO declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided.
What carries the argument
ONTO notation, which declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy to eliminate repeated structural tokens.
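The schema-once, data-many idea can be illustrated with a minimal sketch. This is not the ONTO grammar itself (that lives in the project's specification); the header layout, the two-space indent, the `to_columnar` helper, and the sample readings are all assumptions made for illustration, and the sketch handles flat records only.

```python
import json
from typing import Any


def to_columnar(entity: str, records: list[dict[str, Any]]) -> str:
    """Serialize flat records as one header line plus pipe-delimited rows.

    Illustrative layout only: assumes every record has the same fields and
    that no value contains a pipe character (no escaping is attempted).
    """
    if not records:
        return f"{entity}:"
    fields = list(records[0].keys())
    lines = [f"{entity}: " + "|".join(fields)]  # field names appear once
    for record in records:
        # Rows are indented under the entity header and carry values only.
        lines.append("  " + "|".join(str(record[name]) for name in fields))
    return "\n".join(lines)


# Hypothetical sample data, not taken from the paper's datasets.
readings = [
    {"sensor_id": "s1", "temp_c": 21.5, "status": "ok"},
    {"sensor_id": "s2", "temp_c": 19.0, "status": "ok"},
]
onto_like = to_columnar("readings", readings)
json_text = json.dumps(readings)

print(onto_like)
print("key 'sensor_id' occurrences:", json_text.count("sensor_id"), "in JSON,",
      onto_like.count("sensor_id"), "in the columnar form")
```

In the JSON form each key is repeated once per record, while the columnar form states it once in the header; that repetition is the overhead the paper targets.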
Load-bearing premise
That the token savings and accuracy results measured on synthetic datasets will hold for real operational data once a format description is added to the prompt.
What would settle it
Token-count and accuracy measurements on genuine IoT sensor logs or business transaction tables, checking whether net savings remain above 40% after including the required format instructions.
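The net-savings check described above can be sketched as a small calculation. The function names, the per-record figures for ONTO, and the 200-token overhead are assumptions for illustration; only the roughly 80,000 JSON tokens per 1,000 records comes from the paper's abstract.

```python
import math
from typing import Optional


def net_reduction(json_tokens: int, onto_tokens: int, overhead_tokens: int = 0) -> float:
    """Fraction of JSON tokens saved once fixed prompt overhead is charged to ONTO."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    return 1.0 - (onto_tokens + overhead_tokens) / json_tokens


def min_records_for_target(json_per_record: float, onto_per_record: float,
                           overhead_tokens: float, target: float) -> Optional[int]:
    """Smallest record count n at which the net reduction reaches `target`.

    Solves 1 - (F + n*o) / (n*j) >= target for n, where F is the fixed
    overhead, o the ONTO tokens per record and j the JSON tokens per record.
    Returns None when the target cannot be reached at any n.
    """
    margin = json_per_record * (1.0 - target) - onto_per_record
    if margin <= 0:
        return None
    return max(1, math.ceil(overhead_tokens / margin))


# 80,000 JSON tokens for 1,000 records is the abstract's estimate; the ONTO
# total and the 200-token overhead are placeholders, not measured values.
print(net_reduction(json_tokens=80_000, onto_tokens=40_000, overhead_tokens=200))
print(min_records_for_target(80, 40, 200, target=0.40))
```

Because the overhead is a fixed cost, its effect shrinks as the record count grows; the second function makes the break-even point explicit for whatever overhead a real measurement reports.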
Original abstract
Serialization formats designed for document interchange impose structural overhead that becomes prohibitive when large language models consume operational data at scale. A modest dataset of 1,000 IoT sensor readings serialized as JSON requires approximately 80,000 tokens - the majority spent on repeated field names, nested braces, and structural punctuation rather than semantic content. We present ONTO (Object Notation for Token Optimization), a columnar notation that declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks on Qwen2.5-7B show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided. Ablation analysis reveals that key repetition accounts for the majority of JSON overhead, with indentation costs in nested structures explaining the 4-percentage-point gap between flat and hierarchical data. ONTO occupies a previously unfilled position in the serialization landscape: columnar efficiency with hierarchical structure, optimized for LLM context windows rather than document interchange. Code and specification are available at https://github.com/harsh-aranga/onto.
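The abstract's ablation claim, that key repetition accounts for most of JSON's overhead, can be approximated without a tokenizer by counting characters. The sketch below is a rough proxy under stated assumptions: it counts characters rather than tokens, handles flat records only, and uses a hypothetical `key_char_share` helper and made-up sample data. A faithful replication would use the model's own tokenizer.

```python
import json
from typing import Any


def key_char_share(records: list[dict[str, Any]]) -> float:
    """Share of compact-JSON characters spent on quoted key names and their colons."""
    text = json.dumps(records, separators=(",", ":"))
    # Each key costs its quoted form plus one colon, once per record.
    key_chars = sum(len(json.dumps(key)) + 1 for record in records for key in record)
    return key_chars / len(text)


# Hypothetical sample record, repeated to mimic a homogeneous dataset.
sample = [{"sensor_id": "s1", "temp_c": 21.5}] * 100
print(f"{key_char_share(sample):.1%} of characters are key names and colons")
```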
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ONTO (Object Notation for Token Optimization), a columnar serialization format that declares field names once per entity and arranges values in pipe-delimited rows with indentation for hierarchy. The central claim is that this achieves 46-51% token reduction compared to JSON on three synthetic operational datasets ranging from 100 to 1,000 records, with stable scaling, corresponding 5-10% latency improvements on Qwen2.5-7B, and no material degradation in LLM accuracy for lookup, counting, extraction, and aggregation tasks when format context is supplied. The work provides code and specification on GitHub.
Significance. If the reported token reductions prove robust once full prompt overhead is included and the results generalize beyond synthetic data, the contribution would be significant for practical LLM context-window optimization in data-heavy applications. The empirical measurements on scaling behavior and task accuracy, combined with the open-sourced code and specification, provide a reproducible starting point for further work in LLM-specific serialization formats.
major comments (2)
- [Evaluation / abstract claims] The 46-51% token-reduction claim (abstract and evaluation description) is based on raw serializations of ONTO versus JSON. The manuscript does not report the token cost of the schema declaration plus the minimal instruction prompt needed for an LLM to parse ONTO. Because the design's advantage is precisely the elimination of per-record key repetition, any added prefix tokens directly affect the net savings and must be quantified to support the headline efficiency numbers.
- [Comprehension validation and ablation analysis] The comprehension validation and ablation analysis lack details on the exact tokenization method, dataset construction, number of trials, or statistical measures such as error bars. This absence makes the claims of 'no material degradation' and 'key repetition accounts for the majority of JSON overhead' difficult to verify or reproduce from the given text.
minor comments (2)
- [Abstract] The abstract states that ONTO 'occupies a previously unfilled position in the serialization landscape' but provides no explicit comparison table or positioning against related columnar or LLM-optimized formats.
- [Controlled inference benchmarks] Clarify whether the reported latency improvements on Qwen2.5-7B include the full end-to-end prompt (schema + data + instruction) or only the data portion.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will incorporate revisions to strengthen the evaluation and reproducibility of the work.
- Referee: [Evaluation / abstract claims] The 46-51% token-reduction claim (abstract and evaluation description) is based on raw serializations of ONTO versus JSON. The manuscript does not report the token cost of the schema declaration plus the minimal instruction prompt needed for an LLM to parse ONTO. Because the design's advantage is precisely the elimination of per-record key repetition, any added prefix tokens directly affect the net savings and must be quantified to support the headline efficiency numbers.
  Authors: We agree that net token savings, inclusive of the one-time schema declaration and the instruction prompt, provide a more complete assessment of practical utility. The ONTO design intentionally places the schema cost once per entity so that it is amortized over the data records; for the 100- to 1,000-record scales examined, this fixed overhead should not eliminate the reported advantage. We will add explicit token counts for the schema and minimal instruction prompt, compute the resulting net reductions for each dataset size, and present these figures in a new table within the evaluation section. The abstract will be updated to clarify that the headline percentages refer to data serialization while the full-prompt results are reported in the body. These changes will directly address the concern without altering the core experimental outcomes.
  Revision: yes
- Referee: [Comprehension validation and ablation analysis] The comprehension validation and ablation analysis lack details on the exact tokenization method, dataset construction, number of trials, or statistical measures such as error bars. This absence makes the claims of 'no material degradation' and 'key repetition accounts for the majority of JSON overhead' difficult to verify or reproduce from the given text.
  Authors: We recognize that additional methodological transparency is required for independent verification. We will expand the relevant sections to specify the tokenizer employed (the native Qwen2.5 tokenizer), the exact procedure used to generate the synthetic flat and hierarchical datasets, the number of independent trials performed for each task, and the inclusion of standard deviations or error bars on all accuracy and token-overhead figures. These additions will allow readers to confirm both the absence of material accuracy degradation and the attribution of JSON overhead primarily to key repetition.
  Revision: yes
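The error-bar reporting promised in the rebuttal could take a form like the following sketch. It assumes accuracy is recorded once per independent trial for each format; the `summarize_trials` helper and the per-trial values are placeholders for illustration, not results from the paper.

```python
from statistics import mean, stdev


def summarize_trials(accuracies: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) over independent trials."""
    if len(accuracies) < 2:
        raise ValueError("need at least two trials to estimate spread")
    return mean(accuracies), stdev(accuracies)


# Placeholder per-trial accuracies, NOT results from the paper.
trials = {"json": [0.9, 1.0, 0.8], "onto": [0.9, 0.9, 0.9]}
for fmt, values in trials.items():
    avg, spread = summarize_trials(values)
    print(f"{fmt}: {avg:.3f} ± {spread:.3f}")
```

Reporting the spread alongside the mean is what lets a reader judge whether a gap between formats is "material" or within trial-to-trial noise.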
Circularity Check
No circularity: claims rest on direct empirical token counts, not derivations or fitted inputs
The paper introduces ONTO as a columnar serialization format and validates its token-efficiency claims via straightforward measurements of token counts on three synthetic datasets (100-1000 records) plus controlled inference benchmarks. No equations, parameters, or predictive models are described whose outputs are then re-used as inputs. The 46-51% reduction is obtained by comparing raw serializations of identical data in JSON vs. ONTO; this is a direct count, not a self-referential prediction. No self-citations are invoked to justify uniqueness theorems or ansatzes. The evaluation is therefore self-contained against external benchmarks (actual tokenizers on the provided data) and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Synthetic operational datasets adequately represent the token-overhead behavior of real IoT and log data
- domain assumption: Providing format context to the LLM adds negligible extra tokens compared with the savings
invented entities (1)
- ONTO notation: no independent evidence
Reference graph
Works this paper leans on
- [1] Bray, T. (2017). The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259, IETF
- [2] Ben-Kiki, O., Evans, C., and döt Net, I. (2021). YAML Ain't Markup Language Version 1.2. yaml.org/spec/1.2
- [3] Google (2008). Protocol Buffers: Developer Guide. https://protobuf.dev/
- [4] Apache Software Foundation (2013). Apache Parquet. https://parquet.apache.org/
- [7] Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020
- [9] Dziri, N., et al. (2023). Faith and Fate: Limits of Transformers on Compositionality. NeurIPS 2023
- [13] TOON: Schema-Aware JSON Optimization. https://github.com/toon-format/toon