pith. machine review for the scientific record.

arxiv: 2604.07502 · v1 · submitted 2026-04-08 · 💻 cs.SE · cs.AI

Recognition: no theorem link

Beyond Human-Readable: Rethinking Software Engineering Conventions for the Agentic Development Era

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:06 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords: agentic development · LLM agents · software engineering conventions · semantic density · token optimization · log formats · AI code consumption · program skeletons

The pith

Software conventions built for human readers raise total costs when LLM agents take over code consumption.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that six decades of human-optimized software engineering practices create inefficiencies once LLM-based agents become the main readers and writers of code. It advances semantic density optimization as the guiding rule: remove tokens that carry no information while retaining those with high semantic value. A controlled experiment comparing log formats shows that aggressive compression cut input tokens by 17 percent yet raised overall session cost by 67 percent because the model had to spend extra reasoning steps to recover meaning. The work therefore proposes rehabilitating certain classical anti-patterns, introducing program skeletons for faster navigation, and separating a program's semantic intent from any human-readable surface form. If these adjustments hold, agentic development sessions could run at lower total token expense without loss of correctness.

Core claim

Human-centric conventions such as verbose logs and conventional formatting impose extra interpretive work on LLM agents; replacing them with representations that maximize semantic density per token lowers total session cost. Density is not mere brevity: in the paper's experiment, aggressively compressed logs reduced input volume by 17 percent but increased reasoning-phase expense enough to raise the overall bill by 67 percent.
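The contrast between the experimental conditions can be pictured with a toy log event. The three formats and the whitespace token count below are illustrative stand-ins, not the paper's exact conditions or the model's real tokenizer:

```python
# Illustrative sketch: one log event rendered three ways, with a crude
# whitespace split as a stand-in for a real LLM tokenizer.

def token_count(s: str) -> int:
    """Rough proxy: the paper's experiment would use the model's tokenizer."""
    return len(s.split())

event = {"ts": "2026-04-08T12:00:00Z", "level": "ERROR",
         "module": "auth", "msg": "token refresh failed", "retries": 3}

human_readable = (f"[{event['ts']}] {event['level']} in {event['module']}: "
                  f"{event['msg']} (retries={event['retries']})")
structured = ('{"ts":"%s","level":"%s","module":"%s","msg":"%s","retries":%d}'
              % (event["ts"], event["level"], event["module"],
                 event["msg"], event["retries"]))
compressed = "E|auth|token refresh failed|r3"  # meaning must be inferred

for name, line in [("human", human_readable),
                   ("structured", structured),
                   ("compressed", compressed)]:
    print(f"{name:>10}: {token_count(line):2d} tokens  {line}")
```

The compressed line is cheapest to read, but an agent must now reconstruct that `E` means `ERROR` and `r3` means three retries; that reconstruction is exactly the reasoning-phase cost the experiment measures.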

What carries the argument

Semantic density optimization: the deliberate elimination of zero-information tokens while preserving high-value semantic content so that the agent's total token budget across input and reasoning phases is minimized.
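A minimal sketch of what "eliminating zero-information tokens" could look like in practice. The filler set and stripping rules here are assumptions for illustration, not the paper's algorithm:

```python
# Hypothetical densifier: drop tokens that carry no information for an
# agent (decorative brackets, filler words) and keep the semantic ones.
# The FILLER set is an assumption made for this sketch.

FILLER = {"in", "at", "the", "a", "an", "==>", "---"}

def densify(line: str) -> str:
    """Remove punctuation-only and filler tokens; keep semantic content."""
    kept = []
    for tok in line.split():
        bare = tok.strip("[](){}:,")
        if not bare or bare.lower() in FILLER:
            continue
        kept.append(bare)
    return " ".join(kept)

print(densify("[2026-04-08T12:00:00Z] ERROR in auth: token refresh failed"))
# → 2026-04-08T12:00:00Z ERROR auth token refresh failed
```

The timestamp, level, module, and message survive; only the decoration goes, so no interpretive burden shifts to the reasoning phase.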

If this is right

  • Certain classical anti-patterns can become advantageous once agents rather than humans navigate code.
  • Program skeletons provide compact structural maps that let agents locate and modify relevant sections without reading full files.
  • Decoupling semantic intent from human-readable syntax permits more compact yet still correct code representations.
  • Logging and documentation standards should be redesigned to favor semantic density over immediate human legibility.
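One way such a program skeleton might be generated, sketched here with Python's `ast` module. The output format is an assumption; the paper's own skeleton representation may differ:

```python
# Sketch of the "program skeleton" idea: emit only top-level signatures
# plus the first docstring line, so an agent can locate relevant code
# without reading full files.

import ast

def skeleton(source: str) -> str:
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node)
            summary = f"  # {doc.splitlines()[0]}" if doc else ""
            lines.append(f"def {node.name}({args}): ...{summary}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: ...")
    return "\n".join(lines)

src = '''
def refresh_token(user_id, retries=3):
    """Refresh an auth token, retrying on transient failures."""
    ...

class SessionStore:
    pass
'''
print(skeleton(src))
```

Unlike an abstract syntax tree or call graph, the skeleton stays in the surface language the agent already reads, just with bodies elided.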

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same density principle may apply to other artifacts agents consume, such as error messages, configuration files, or API documentation.
  • Development environments could automatically generate dual representations—one dense for agents and one expanded for humans.
  • Longer-term agent training might reward models that operate efficiently on dense inputs, reducing reliance on post-hoc compression.

Load-bearing premise

The controlled experiment on log formats accurately models the token and reasoning dynamics that appear in real-world agentic development tasks with current or future LLMs.

What would settle it

A follow-up trial that measures cumulative input-plus-reasoning tokens across complete multi-step agentic coding sessions (for example, an agent fixing a bug in an actual repository) under the same four log-format conditions.

Figures

Figures reproduced from arXiv: 2604.07502 by Dmytro Ustynov.

Figure 1. Taxonomy of software engineering conventions under agentic pressure.
Figure 2. The compression paradox: input tokens decrease with compression (blue), but total session cost rises.
Figure 3. Stacked token breakdown. The orange "reasoning + output" component grows as input tokens shrink.
Original abstract

For six decades, software engineering principles have been optimized for a single consumer: the human developer. The rise of agentic AI development, where LLM-based agents autonomously read, write, navigate, and debug codebases, introduces a new primary consumer with fundamentally different constraints. This paper presents a systematic analysis of human-centric conventions under agentic pressure and proposes a key design principle: semantic density optimization, eliminating tokens that carry zero information while preserving tokens that carry high semantic value. We validate this principle through a controlled experiment on log format token economy across four conditions (human-readable, structured, compressed, and tool-assisted compressed), demonstrating a counterintuitive finding: aggressive compression increased total session cost by 67% despite reducing input tokens by 17%, because it shifted interpretive burden to the model's reasoning phase. We extend this principle to propose the rehabilitation of classical anti-patterns, introduce the program skeleton concept for agentic code navigation, and argue for a fundamental decoupling of semantic intent from human-readable representation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that software engineering conventions optimized for human readers are ill-suited to agentic LLM-based development. It introduces the principle of semantic density optimization—eliminating zero-information tokens while retaining high-semantic-value ones—and validates it via a four-condition controlled experiment on log formats (human-readable, structured, compressed, tool-assisted compressed). The key empirical claim is that aggressive compression cut input tokens by 17% yet raised total session cost by 67% by shifting interpretive load into the model's reasoning phase. The work further proposes rehabilitating classical anti-patterns, introduces the 'program skeleton' concept for agentic navigation, and advocates decoupling semantic intent from human-readable representations.

Significance. If the experimental result and its causal attribution hold under fuller scrutiny, the paper could prompt a re-examination of logging, documentation, and code organization practices for AI agents, potentially improving efficiency in agentic workflows. The semantic-density framing and program-skeleton idea supply concrete, falsifiable design heuristics that future work could test across languages and agent architectures. The absence of detailed token breakdowns and iteration statistics, however, currently limits the strength of the central claim.

major comments (2)
  1. [Abstract / Experimental Validation] The abstract and experimental validation section: the 67% total-session-cost increase is attributed to a shift of interpretive burden into reasoning, yet no per-condition data are supplied on output-token counts, number of reasoning steps/tool calls, or error/failure rates. Without these metrics it is impossible to distinguish the proposed mechanism from confounds such as poorer parseability producing more failed attempts or longer recovery trajectories.
  2. [§5 / Extensions] The generalization of semantic density optimization beyond log formats: the principle is presented as broadly applicable to codebases and documentation, but the only quantitative support is the log-format experiment. No additional measurements or case studies are provided for other artifacts (e.g., source files, API schemas, or commit messages), leaving the scope of the claim unsupported.
minor comments (2)
  1. [Experimental Setup] The four experimental conditions are named but not fully specified (exact tokenization method, model version, prompt templates, or stopping criteria). Adding a table or appendix with these parameters would improve reproducibility.
  2. [§6] The 'program skeleton' concept is introduced without a concrete example or pseudocode illustration; a short listing would clarify how it differs from existing notions such as abstract syntax trees or call graphs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the strength of our empirical claims and the scope of our proposals. We respond to each major point below, agreeing to strengthen the experimental reporting where data gaps exist and to refine the framing of generalizations.

Point-by-point responses
  1. Referee: [Abstract / Experimental Validation] The abstract and experimental validation section: the 67% total-session-cost increase is attributed to a shift of interpretive burden into reasoning, yet no per-condition data are supplied on output-token counts, number of reasoning steps/tool calls, or error/failure rates. Without these metrics it is impossible to distinguish the proposed mechanism from confounds such as poorer parseability producing more failed attempts or longer recovery trajectories.

    Authors: We agree that the current manuscript would benefit from additional per-condition metrics to isolate the mechanism. The experiment measured end-to-end session cost and input token reduction (17%), with the 67% cost increase observed despite lower input tokens. In the revision we will add tables reporting output token counts, average reasoning steps and tool calls, and error/failure rates with recovery trajectories for each of the four conditions. This will allow direct evaluation of whether the cost increase arises from reasoning load or alternative factors such as parse failures. revision: yes

  2. Referee: [§5 / Extensions] The generalization of semantic density optimization beyond log formats: the principle is presented as broadly applicable to codebases and documentation, but the only quantitative support is the log-format experiment. No additional measurements or case studies are provided for other artifacts (e.g., source files, API schemas, or commit messages), leaving the scope of the claim unsupported.

    Authors: The quantitative validation is intentionally limited to the controlled log-format experiment. The proposals in §5 for codebases, documentation, and other artifacts are presented as conceptual extensions of the semantic-density principle rather than fully supported empirical claims. We will revise the text to explicitly separate the validated scope (log formats) from the proposed applications, and we will add a short section outlining how the principle could be tested on source files and commit messages in future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; principle proposed then independently validated by experiment.

Full rationale

The paper first identifies the mismatch between human-optimized SE conventions and agentic LLM consumers, then proposes semantic density optimization (eliminating zero-information tokens) as a design principle. It subsequently validates the principle via a controlled experiment on four log-format conditions, reporting the empirical result that aggressive compression cut input tokens 17% but raised total session cost 67% due to shifted reasoning burden. This result is presented as an observation supporting the principle rather than being entailed by it through definition, fitted parameters, or self-citation. No equations, uniqueness theorems, ansatzes smuggled via prior work, or renamings of known results appear in the derivation chain. The central claim therefore retains independent empirical content outside its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The paper introduces new concepts, semantic density optimization and program skeletons, without prior independent evidence, relying on the experiment for validation. No free parameters are declared. The analysis rests on domain assumptions about agent behavior and token costs.

axioms (2)
  • domain assumption AI agents process code differently from humans, prioritizing semantic content over readability.
    Central to the proposal that human-centric conventions are suboptimal.
  • standard math Token count directly correlates with cost in LLM usage.
    Assumed in the experiment on token economy.
invented entities (2)
  • semantic density optimization no independent evidence
    purpose: Design principle for optimizing code for agents by eliminating zero-information tokens.
    New principle introduced in the paper.
  • program skeleton concept no independent evidence
    purpose: For agentic code navigation.
    Introduced as an extension of the principle.
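The token-cost axiom can be made concrete with a two-phase cost model. The prices and token counts below are invented for illustration, chosen so the numbers reproduce the paper's reported 17 percent input reduction and 67 percent cost increase; current LLM APIs do price output and reasoning tokens higher than input tokens, though exact ratios vary:

```python
# Hypothetical two-phase cost model: total session cost is input tokens
# plus (more expensive) reasoning/output tokens, so shrinking the input
# can still raise the bill when the reasoning phase grows.

def session_cost(in_tokens: int, out_tokens: int,
                 p_in: float = 1.0, p_out: float = 4.0) -> float:
    """Cost in arbitrary units; the 4x output price is an assumption."""
    return in_tokens * p_in + out_tokens * p_out

readable = session_cost(in_tokens=10_000, out_tokens=1_000)    # 14,000
compressed = session_cost(in_tokens=8_300, out_tokens=3_770)   # 23,380

# 17% fewer input tokens, yet a 67% higher total bill.
print(f"compressed / readable = {compressed / readable:.2f}")
# prints compressed / readable = 1.67
```

The point of the sketch is only that the axiom makes the paradox arithmetic: once reasoning tokens are priced, "fewer input tokens" and "cheaper session" come apart.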

pith-pipeline@v0.9.0 · 5465 in / 1378 out tokens · 53178 ms · 2026-05-10T17:06:07.381557+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

22 extracted references · 4 canonical work pages · 1 internal anchor

  1. G. A. Miller. The magical number seven, plus or minus two. Psychological Review, 63(2):81–97, 1956.
  2. Anthropic. Claude Code. https://code.claude.com, 2025.
  3. GitHub. GitHub Copilot. https://github.com/features/copilot, 2024.
  4. Cursor. The AI Code Editor. https://cursor.com, 2024.
  5. OpenAI. Codex CLI. https://github.com/openai/codex, 2025.
  6. CodeScene. Agentic AI coding: Best practice patterns for speed with quality. February 2026.
  7. A. Ronacher. Agentic coding recommendations. https://lucumr.pocoo.org/2025/6/12/agentic-coding/, June 2025.
  8. DEV Community. Coding agents as a first-class consideration in project structures. January 2026.
  9. F. Matsen. Agentic coding from first principles. https://matsen.fhcrc.org/general/2025/10/30/agentic-coding-principles.html, October 2025.
  10. B. Houston. Agentic coding best practices. https://benhouston3d.com/blog/agentic-coding-best-practices, March 2025.
  11. D. Haupt. Ideas for an agent-oriented programming language. https://davi.sh/blog/2026/02/markov-ideas/, February 2026.
  12. A. Ronacher. A language for agents. https://lucumr.pocoo.org/2026/2/9/a-language-for-agents/, February 2026.
  13. TOON Format. Token-Oriented Object Notation. https://github.com/toon-format/toon, 2024.
  14. DigitalOcean. TOON vs. JSON. https://www.digitalocean.com/community/tutorials/toon-vs-json, December 2025.
  15. The New Stack. A guide to token-efficient data prep for LLM workloads. December 2025.
  16. S. Mell et al. A fast, reliable, and secure programming language for LLM agents with code actions. arXiv:2506.12202, June 2025.
  17. B. Mohammadi. Pel, a programming language for orchestrating AI agents. arXiv:2505.13453, May 2025.
  18. CoRE. LLM as interpreter for natural language programming. arXiv:2405.06907, May 2024.
  19. Configuring agentic AI coding tools: An exploratory study. arXiv:2602.14690, February 2026.
  20. Augment Code. How to build your AGENTS.md (2026). https://www.augmentcode.com/guides/how-to-build-agents-md, April 2026.
  21. DEV Community. You're slicing your architecture wrong!
  22. N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 2024.