arxiv: 2604.17512 · v1 · submitted 2026-04-19 · 💻 cs.CL · cs.LG

ONTO: A Token-Efficient Columnar Notation for LLM Input Optimization

Pith reviewed 2026-05-10 05:18 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords ONTO notation · token reduction · serialization format · LLM input optimization · columnar notation · JSON alternative · data serialization efficiency

The pith

ONTO notation cuts LLM input tokens by 46-51% versus JSON on synthetic datasets, with no material loss in task accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ONTO as a serialization format that lists field names once per entity type, then packs record values into pipe-delimited rows, with indentation marking nested levels. This removes the repeated key strings, braces, and punctuation that inflate JSON when the same structure appears many times in operational data. On synthetic datasets, the savings hold steady as record counts grow from 100 to 1,000, and models given a short description of the format perform lookup, counting, extraction, and aggregation tasks with no material loss of accuracy relative to JSON. If the approach generalizes, a fixed context window could hold substantially more records, reducing both the number of model calls and the cost of feeding large tables or sensor logs into language models.

Core claim

ONTO declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided.
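
To make the schema-once, data-many idea concrete, here is a minimal Python sketch of a header-plus-rows serializer for flat records. The exact ONTO syntax is not reproduced on this page, so the layout below (an `entity: field|field` header followed by indented rows), the function name `to_columnar`, and the sample readings are all illustrative assumptions rather than the paper's specification. Character counts stand in for token counts, which depend on the tokenizer.

```python
import json


def to_columnar(entity, records):
    """Serialize flat dict records as one header line plus one row per record.

    Hypothetical layout, not the official ONTO grammar. Values containing
    '|' or newlines are not escaped in this sketch.
    """
    if not records:
        return f"{entity}:"
    fields = list(records[0].keys())  # field names are declared once
    lines = [f"{entity}: " + "|".join(fields)]
    for rec in records:
        # each row carries values only, in header order
        lines.append("  " + "|".join(str(rec[f]) for f in fields))
    return "\n".join(lines)


# Made-up sample records for illustration only (not data from the paper).
readings = [
    {"sensor_id": "s1", "temp_c": 21.5, "status": "ok"},
    {"sensor_id": "s2", "temp_c": 19.0, "status": "ok"},
    {"sensor_id": "s3", "temp_c": 35.2, "status": "alert"},
]

columnar = to_columnar("reading", readings)
as_json = json.dumps(readings)

print(columnar)
print(len(columnar), "characters columnar vs", len(as_json), "characters as JSON")
```

In the columnar form each key string appears once, in the header, while the JSON form repeats every key in every record. That repetition is the overhead the paper attributes most of JSON's cost to.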

What carries the argument

ONTO notation, which declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy to eliminate repeated structural tokens.

Load-bearing premise

That the token savings and accuracy results measured on synthetic datasets will hold for real operational data once a format description is added to the prompt.

What would settle it

Token-count and accuracy measurements on genuine IoT sensor logs or business transaction tables, checking whether net savings remain above 40% after including the required format instructions.
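
The check described above can be sketched as a small calculation that charges the one-time format instructions to the ONTO side. The function names and the 300-token instruction size are hypothetical. The 80,000-token JSON figure comes from the abstract, and 43,200 is simply that figure reduced by the low-end 46%.

```python
def net_reduction(json_tokens, onto_tokens, instruction_tokens):
    """Fractional saving versus JSON once format instructions are counted."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    return 1.0 - (onto_tokens + instruction_tokens) / json_tokens


def max_instruction_budget(json_tokens, onto_tokens, threshold):
    """Largest instruction size that keeps the net saving at or above threshold."""
    return json_tokens * (1.0 - threshold) - onto_tokens


json_tokens = 80_000          # abstract: 1,000 IoT readings as JSON
onto_tokens = 43_200          # 80,000 reduced by the low-end 46%
instruction_tokens = 300      # hypothetical size of the format description

net = net_reduction(json_tokens, onto_tokens, instruction_tokens)
budget = max_instruction_budget(json_tokens, onto_tokens, 0.40)
print(f"net reduction: {net:.4f}")
print(f"instruction budget for a 40% net saving: {budget:.0f} tokens")
```

Under these assumed numbers the fixed prefix costs well under one percentage point at 1,000 records. The same prefix weighs roughly ten times more at 100 records, which is why the measurement has to be made at each dataset size.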

Figures

Figures reproduced from arXiv: 2604.17512 by Harshavardhanan Deekeswar.

Figure 1: Token scaling from 100 to 1,000 records. Both […]

Original abstract

Serialization formats designed for document interchange impose structural overhead that becomes prohibitive when large language models consume operational data at scale. A modest dataset of 1,000 IoT sensor readings serialized as JSON requires approximately 80,000 tokens - the majority spent on repeated field names, nested braces, and structural punctuation rather than semantic content. We present ONTO (Object Notation for Token Optimization), a columnar notation that declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks on Qwen2.5-7B show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided. Ablation analysis reveals that key repetition accounts for the majority of JSON overhead, with indentation costs in nested structures explaining the 4-percentage-point gap between flat and hierarchical data. ONTO occupies a previously unfilled position in the serialization landscape: columnar efficiency with hierarchical structure, optimized for LLM context windows rather than document interchange. Code and specification are available at https://github.com/harsh-aranga/onto.
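
As a worked example of what the abstract's figures imply for context-window capacity, the sketch below converts the 80,000-tokens-per-1,000-readings estimate into a per-record cost and applies the reported 46-51% reduction. The 32,768-token window is a hypothetical choice, and applying the synthetic-dataset reduction to the IoT example is an assumption made here, not a result from the paper.

```python
def records_per_window(window_tokens, tokens_per_record):
    """Whole records that fit in a window, ignoring prompt and instruction tokens."""
    return int(window_tokens // tokens_per_record)


def capacity_gain(reduction):
    """Multiplier on records per window for a given fractional token reduction."""
    return 1.0 / (1.0 - reduction)


JSON_TOKENS_PER_RECORD = 80_000 / 1_000   # from the abstract's IoT example
WINDOW = 32_768                           # hypothetical context window size

print("JSON:", records_per_window(WINDOW, JSON_TOKENS_PER_RECORD), "records")
for reduction in (0.46, 0.51):            # reported range versus JSON
    per_record = JSON_TOKENS_PER_RECORD * (1.0 - reduction)
    fit = records_per_window(WINDOW, per_record)
    print(f"{reduction:.0%} reduction: {fit} records, "
          f"{capacity_gain(reduction):.2f}x capacity")
```

A 46-51% reduction corresponds to roughly 1.85x to 2.04x as many records per window, before accounting for the format description that ONTO requires in the prompt.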

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ONTO (Object Notation for Token Optimization), a columnar serialization format that declares field names once per entity and arranges values in pipe-delimited rows with indentation for hierarchy. The central claim is that this achieves 46-51% token reduction compared to JSON on three synthetic operational datasets ranging from 100 to 1,000 records, with stable scaling, corresponding 5-10% latency improvements on Qwen2.5-7B, and no material degradation in LLM accuracy for lookup, counting, extraction, and aggregation tasks when format context is supplied. The work provides code and specification on GitHub.

Significance. If the reported token reductions prove robust once full prompt overhead is included and the results generalize beyond synthetic data, the contribution would be significant for practical LLM context-window optimization in data-heavy applications. The empirical measurements on scaling behavior and task accuracy, combined with the open-sourced code and specification, provide a reproducible starting point for further work in LLM-specific serialization formats.

major comments (2)
  1. [Evaluation / abstract claims] The 46-51% token-reduction claim (abstract and evaluation description) is based on raw serializations of ONTO versus JSON. The manuscript does not report the token cost of the schema declaration plus the minimal instruction prompt needed for an LLM to parse ONTO. Because the design's advantage is precisely the elimination of per-record key repetition, any added prefix tokens directly affect the net savings and must be quantified to support the headline efficiency numbers.
  2. [Comprehension validation and ablation analysis] The comprehension validation and ablation analysis lack details on the exact tokenization method, dataset construction, number of trials, or statistical measures such as error bars. This absence makes the claims of 'no material degradation' and 'key repetition accounts for the majority of JSON overhead' difficult to verify or reproduce from the given text.
minor comments (2)
  1. [Abstract] The abstract states that ONTO 'occupies a previously unfilled position in the serialization landscape' but provides no explicit comparison table or positioning against related columnar or LLM-optimized formats.
  2. [Controlled inference benchmarks] Clarify whether the reported latency improvements on Qwen2.5-7B include the full end-to-end prompt (schema + data + instruction) or only the data portion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will incorporate revisions to strengthen the evaluation and reproducibility of the work.

  1. Referee: [Evaluation / abstract claims] The 46-51% token-reduction claim (abstract and evaluation description) is based on raw serializations of ONTO versus JSON. The manuscript does not report the token cost of the schema declaration plus the minimal instruction prompt needed for an LLM to parse ONTO. Because the design's advantage is precisely the elimination of per-record key repetition, any added prefix tokens directly affect the net savings and must be quantified to support the headline efficiency numbers.

    Authors: We agree that net token savings, inclusive of the one-time schema declaration and the instruction prompt, provide a more complete assessment of practical utility. The ONTO design intentionally places the schema cost once per entity so that it is amortized over the data records; for the 100- to 1,000-record scales examined, this fixed overhead should not eliminate the reported advantage. We will add explicit token counts for the schema and minimal instruction prompt, compute the resulting net reductions for each dataset size, and present these figures in a new table within the evaluation section. The abstract will be updated to clarify that the headline percentages refer to data serialization while the full-prompt results are reported in the body. These changes will directly address the concern without altering the core experimental outcomes. revision: yes

  2. Referee: [Comprehension validation and ablation analysis] The comprehension validation and ablation analysis lack details on the exact tokenization method, dataset construction, number of trials, or statistical measures such as error bars. This absence makes the claims of 'no material degradation' and 'key repetition accounts for the majority of JSON overhead' difficult to verify or reproduce from the given text.

    Authors: We recognize that additional methodological transparency is required for independent verification. We will expand the relevant sections to specify the tokenizer employed (the native Qwen2.5 tokenizer), the exact procedure used to generate the synthetic flat and hierarchical datasets, the number of independent trials performed for each task, and the inclusion of standard deviations or error bars on all accuracy and token-overhead figures. These additions will allow readers to confirm both the absence of material accuracy degradation and the attribution of JSON overhead primarily to key repetition. revision: yes
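
The per-trial reporting promised in this response could look like the following sketch, which summarizes accuracy across repeated trials and applies a crude check on whether the gap between formats is within trial-to-trial spread. The function names, the decision rule, and the trial values are all assumptions for illustration. The values are placeholders, not results from the paper, and the rule is a heuristic rather than a significance test.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-trial accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two trials to estimate spread")
    return mean(accuracies), stdev(accuracies)


def gap_within_noise(json_acc, onto_acc):
    """True when the mean gap does not exceed the larger per-format std dev."""
    json_mean, json_sd = summarize(json_acc)
    onto_mean, onto_sd = summarize(onto_acc)
    return abs(json_mean - onto_mean) <= max(json_sd, onto_sd)


# Placeholder per-trial accuracies, NOT measurements from the paper.
json_trials = [0.90, 0.95, 1.00]
onto_trials = [0.90, 1.00, 0.95]

print("JSON mean, sd:", summarize(json_trials))
print("ONTO mean, sd:", summarize(onto_trials))
print("gap within noise:", gap_within_noise(json_trials, onto_trials))
```

Reporting the number of trials alongside the mean and standard deviation is what lets a reader judge whether "no material degradation" reflects a small effect or a small sample.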

Circularity Check

0 steps flagged

No circularity: claims rest on direct empirical token counts, not derivations or fitted inputs

The paper introduces ONTO as a columnar serialization format and validates its token-efficiency claims via straightforward measurements of token counts on three synthetic datasets (100-1,000 records) plus controlled inference benchmarks. No equations, parameters, or predictive models are described whose outputs are then re-used as inputs. The 46-51% reduction is obtained by comparing raw serializations of identical data in JSON vs. ONTO; this is a direct count, not a self-referential prediction. No self-citations are invoked to justify uniqueness theorems or ansatzes. The evaluation is therefore self-contained against external benchmarks (actual tokenizers on the provided data) and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on empirical token counts and accuracy measurements on synthetic data rather than any mathematical derivation; the main unstated premises are that the chosen synthetic datasets behave like real operational data and that the tokenizer used matches common LLM practice.

axioms (2)
  • domain assumption: Synthetic operational datasets adequately represent the token-overhead behavior of real IoT and log data.
    Evaluation is performed exclusively on three synthetic datasets.
  • domain assumption: Providing format context to the LLM adds negligible extra tokens compared with the savings.
    Accuracy validation assumes format context is supplied but does not quantify its token cost.
invented entities (1)
  • ONTO notation (no independent evidence)
    purpose: Columnar serialization format optimized for LLM token efficiency while retaining hierarchical structure.
    Newly introduced format with no independent existence outside the paper.

pith-pipeline@v0.9.0 · 5553 in / 1412 out tokens · 34972 ms · 2026-05-10T05:18:00.434751+00:00 · methodology
