pith.

arxiv: 2606.26924 · v1 · pith:JH7WLF3T · submitted 2026-06-25 · 💻 cs.SE · cs.AI · cs.CR

A Deterministic Control Plane for LLM Coding Agents

Pith reviewed 2026-06-26 03:54 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CR
keywords LLM coding agents · agent configuration files · supply chain management · deterministic control plane · permission enforcement · state machine gating · prompt drift detection · GitHub repository analysis

The pith

LLM coding agent configurations propagate as unmanaged duplicates across repositories and require a deterministic control plane to enforce supply-chain integrity and permissions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A study of 10,008 GitHub repositories finds that 10.1 percent of tracked agent configuration paths are exact SHA-256 duplicates across independent projects, with most clone pairs (75.5 percent) crossing organizational boundaries; configurations are also rarely revised, and fewer than 1 percent declare permission boundaries. The paper proposes a deterministic control plane that maps directly onto these gaps by treating agent definitions as a managed supply chain. It comprises SHA-256 content addressing, HMAC-stamped lockfiles, tiered permissions enforced before LLM invocation, a phase state machine that traces requirements to files to tests, compilation to seven IDE targets, and Jaccard-based drift detection. A sympathetic reader would care because the layer sits above the harness and aims to replace ad-hoc or further LLM-based management with tool-agnostic, enforceable invariants.
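To make the drift-detection idea concrete, here is a minimal Python sketch of Jaccard similarity between two versions of an agent definition. The tokenisation (lower-case word tokens) and the 0.8 threshold are hypothetical choices for illustration; the paper's own tokenisation and thresholds are not reproduced here.

```python
import re


def token_set(text: str) -> set[str]:
    """Lower-case word tokens (a hypothetical tokenisation, not the paper's)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; two empty sets are treated as identical (1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drifted(baseline: str, current: str, threshold: float = 0.8) -> bool:
    """Flag drift when similarity drops below a hypothetical threshold."""
    return jaccard(token_set(baseline), token_set(current)) < threshold
```

Because token sets ignore order and repetition, a definition whose instructions are merely reordered scores 1.0 and is not flagged under this measure.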

Core claim

Agent configurations propagate as undeclared shared components: 10.1 percent of tracked paths are SHA-256 exact duplicates across independent repositories, with 75.5 percent of clone pairs crossing organisational boundaries, 58 percent of files having single-commit histories, and fewer than 1 percent declaring permission boundaries. The central claim is that a deterministic control plane addresses these gaps: it treats definitions as a managed supply chain with content addressing and audit logs, enforces tiered permissions and blocklists, gates work through a requirement-to-file-to-test state machine, compiles one canonical definition to seven IDE targets, and detects prompt drift via Jaccard similarity.
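The prevalence figures rest on grouping files by content hash. A minimal sketch of how such statistics could be computed, assuming each record is a `(repo, path, content)` tuple with `repo` in `owner/name` form and forks already filtered out (the paper's fork adjustment is not reproduced); `duplicate_stats` is a hypothetical helper name:

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def duplicate_stats(files):
    """files: iterable of (repo, path, content_bytes), repo as 'owner/name'.
    Assumes forks were removed beforehand."""
    by_hash = defaultdict(set)  # SHA-256 digest -> repos holding that exact content
    entries = []                # (repo, path, digest) for every tracked path
    for repo, path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        by_hash[digest].add(repo)
        entries.append((repo, path, digest))

    # A path counts as a duplicate if its exact content appears in 2+ distinct repos.
    dup_paths = sum(1 for _, _, d in entries if len(by_hash[d]) >= 2)

    # Clone pairs: unordered pairs of distinct repos sharing a digest.
    pairs = cross_org = 0
    for repos in by_hash.values():
        for a, b in combinations(sorted(repos), 2):
            pairs += 1
            if a.split("/")[0] != b.split("/")[0]:  # different owners
                cross_org += 1

    return {
        "dup_path_rate": dup_paths / len(entries) if entries else 0.0,
        "clone_pairs": pairs,
        "cross_org_rate": cross_org / pairs if pairs else 0.0,
    }
```

Two identical files inside the same repository do not count as a cross-repository duplicate here, which matches the "across independent repositories" framing.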

What carries the argument

The Rel(AI)Build deterministic control plane, which provides a one-to-one mapping from the identified configuration gaps to supply-chain primitives, tiered permissions, and state-machine gating.

If this is right

  • Agent definitions receive SHA-256 content addressing, HMAC-stamped lockfiles, and hash-chained audit logs.
  • Tiered permissions and attack-derived blocklists are enforced before any LLM invocation occurs.
  • Feature work is gated by a phase state machine that maintains requirement-to-file-to-test traceability.
  • A single canonical definition compiles to seven different IDE targets.
  • Prompt drift is detected automatically through Jaccard similarity on the definitions.
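The first bullet can be pictured with a small sketch: each definition is addressed by its SHA-256 digest, and the set of digests is stamped with an HMAC so that neither a definition nor the lockfile itself can be altered silently. The lockfile layout, the `sha256:` prefix, and the helper names are hypothetical, and key management (where the HMAC key lives and who holds it) is out of scope here.

```python
import hashlib
import hmac
import json


def digest(content: bytes) -> str:
    """Content address of a definition: its SHA-256 digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def _payload(entries: dict[str, str]) -> bytes:
    """Canonical JSON so the same entries always produce the same bytes."""
    return json.dumps(entries, sort_keys=True, separators=(",", ":")).encode()


def build_lockfile(defs: dict[str, bytes], key: bytes) -> dict:
    """defs maps a definition name to its raw bytes; the layout is hypothetical."""
    entries = {name: digest(body) for name, body in sorted(defs.items())}
    stamp = hmac.new(key, _payload(entries), hashlib.sha256).hexdigest()
    return {"entries": entries, "hmac": stamp}


def verify(lock: dict, defs: dict[str, bytes], key: bytes) -> list[str]:
    """Return a list of problems; an empty list means the install may proceed."""
    expected = hmac.new(key, _payload(lock["entries"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, lock["hmac"]):
        return ["lockfile HMAC mismatch (lockfile edited or wrong key)"]
    problems = []
    for name, want in lock["entries"].items():
        body = defs.get(name)
        if body is None:
            problems.append(f"{name}: missing")
        elif digest(body) != want:
            problems.append(f"{name}: content hash mismatch")
    for name in sorted(defs.keys() - lock["entries"].keys()):
        problems.append(f"{name}: not in lockfile")
    return problems
```

Comparing stamps with `hmac.compare_digest` rather than `==` avoids leaking timing information about how much of the stamp matched.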

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same supply-chain and gating approach could extend to configuration files for non-coding LLM agents.
  • Widespread use would likely lower the observed cross-repository duplication rate by making definitions versioned and unique.
  • Integration points with existing CI/CD systems would allow agent setups to inherit the same governance level as workflow files.
  • Developer productivity and security metrics from actual deployments would be required to quantify the practical gains beyond the conformance tests.

Load-bearing premise

That the specific combination of supply-chain primitives, tiered permissions, and state-machine gating will produce better real-world outcomes than continued reliance on LLM orchestration or ad-hoc config management.

What would settle it

A controlled comparison measuring rates of unauthorized file access, configuration drift, or security incidents in projects that adopt the control plane versus matched projects that continue with unmanaged configurations.

Figures

Figures reproduced from arXiv: 2606.26924 by Padmaraj Madatha.

Figure 1: The thin-governance gap. Runtime augmentation (indexing, tools, …
Figure 2: The deterministic control plane architecture. Four horizontal layers …
Figure 3: The agent-definition install pipeline. Three independent integrity …
Figure 4: Conceptual illustration of tokenisation drift risk zones. Thresholds …
Figure 5: The deterministic phase-gated lifecycle. Four HITL gates (pause …
Figure 6: Requirement→file→test traceability. Each acceptance criterion (AC) in the content-addressed spec maps to implementing files and verifying tests; a verified AC is green, an unverified AC is amber. A file changed outside any AC scope surfaces as a spec-drift warning (red), making scope creep detectable automatically. File-to-AC linkage requires cooperative agent invocation of trace-update (§4.5 trust bounda…
Figure 7: Threat → deterministic control mapping. Seven identified threats (red, left) each have a primary deterministic control (teal, right); T-numbers correspond to …
Figure 8: Raw lifetime version-control depth by file category. AI agent …
Figure 9: Median commits/month by category. Left panel: all files; …
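The traceability check in Figure 6 can be sketched as a pure function. This assumes the file-to-AC and test-to-AC links are already recorded (the caption notes that, in the paper, this linkage depends on the agent cooperatively invoking trace-update); the `AcceptanceCriterion` class and the "verified"/"unverified" labels, standing in for the figure's green and amber, are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    ac_id: str
    files: set[str] = field(default_factory=set)  # implementing files
    tests: set[str] = field(default_factory=set)  # verifying tests


def trace_report(acs, test_results, changed_files):
    """test_results: test name -> passed?; changed_files: files touched in this change.
    Returns per-AC status plus files outside every AC's scope (spec drift)."""
    status = {}
    for ac in acs:
        # An AC is verified only if it has at least one test and all of them pass.
        ok = bool(ac.tests) and all(test_results.get(t, False) for t in ac.tests)
        status[ac.ac_id] = "verified" if ok else "unverified"
    in_scope = set().union(*(ac.files for ac in acs))
    spec_drift = sorted(set(changed_files) - in_scope)
    return status, spec_drift
```

A test that never ran counts as failing (`test_results.get(t, False)`), so an AC cannot be marked verified by omission.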
Original abstract

LLM coding harnesses grant agents broad file and shell access, yet the configuration layer that steers them (rules files, agent definitions, IDE-specific markdown) is largely unmanaged. A prevalence study of 10,008 public GitHub repositories (n=6,145 agent config files) finds that agent configurations propagate as undeclared shared components: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (fork-adjusted, threshold-independent), with 75.5% of clone pairs crossing organisational boundaries. Two further patterns are indicative: configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised against CI/CD workflows), and rarely declare permission boundaries (<1% of agent configs vs 33% of Actions workflows, n=31 true positives). We propose a deterministic control plane above the harness that maps one-to-one to these gaps. Rel(AI)Build treats agent definitions as a managed supply chain (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs); enforces tiered permissions and attack-derived blocklists before LLM invocation; gates feature work through a phase state machine with requirement-to-file-to-test traceability; compiles a single canonical definition to seven IDE targets; and detects prompt drift via Jaccard similarity. Conformance tests on injected violations confirm each mechanism enforces its stated invariant; developer outcomes remain future work. Governance of this layer must be deterministic and tool-agnostic, not delegated to further LLM orchestration.
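A deterministic pre-invocation gate, as the abstract describes, can be sketched as a check that runs before any model call and either raises or lets the call through. The tier names, action names, and blocklist patterns below are hypothetical placeholders, not the paper's attack-derived blocklist; the point is that the decision involves no LLM and unknown actions are denied by default.

```python
import re
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tiers; a higher tier includes the rights of lower ones."""
    READ = 1
    WRITE = 2
    SHELL = 3


# Minimum tier each (hypothetical) action requires.
REQUIRED = {"read_file": Tier.READ, "edit_file": Tier.WRITE, "run_shell": Tier.SHELL}

# Illustrative blocklist entries only, not the paper's list.
BLOCKLIST = [re.compile(p) for p in (r"rm\s+-rf\s+/", r"curl[^|]*\|\s*(ba)?sh", r"(^|/)\.env$")]


class PermissionDenied(Exception):
    pass


def check(tier: Tier, action: str, target: str) -> None:
    """Raise PermissionDenied unless the action is allowed; no model is consulted."""
    need = REQUIRED.get(action)
    if need is None:
        raise PermissionDenied(f"unknown action {action!r}")  # default-deny
    if tier < need:
        raise PermissionDenied(f"{action} needs {need.name}, agent has {tier.name}")
    for pattern in BLOCKLIST:
        if pattern.search(target):
            raise PermissionDenied(f"blocked by pattern {pattern.pattern!r}")


def gated_invoke(tier, action, target, llm_call):
    """Run the deterministic check first; llm_call is reached only if it passes."""
    check(tier, action, target)
    return llm_call(action, target)
```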

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript reports a prevalence study across 10,008 public GitHub repositories (yielding 6,145 agent config files) that identifies three gaps in LLM coding agent configuration management: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (75.5% crossing organisational boundaries), configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised), and permission boundaries are rarely declared (<1% of agent configs vs 33% of Actions workflows). It proposes the Rel(AI)Build deterministic control plane that maps one-to-one to these gaps via supply-chain primitives (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs), tiered permissions and attack-derived blocklists, a phase state machine with requirement-to-file-to-test traceability, compilation to seven IDE targets, and Jaccard-based prompt-drift detection. Conformance tests on injected violations are reported to confirm that each mechanism enforces its stated invariant; developer outcomes and real-world efficacy are explicitly scoped as future work.

Significance. If the design and its conformance properties hold, the work supplies a concrete, tool-agnostic alternative to ad-hoc or LLM-orchestrated configuration management for LLM coding agents. The prevalence statistics provide empirical motivation for the three gaps, the design relies on standard cryptographic primitives and introduces no fitted parameters, and the explicit scoping of claims to the mechanisms (rather than asserted outcome improvements) is a strength. The approach could influence secure configuration practices in the growing LLM-agent tooling ecosystem.

minor comments (2)
  1. The abstract is unusually long and dense; a shorter version focused on the three gaps, the one-to-one mapping, and the scope limitation would improve readability while preserving all technical content.
  2. A high-level architecture diagram of the control plane (showing the relationship between the supply-chain layer, permission gate, state machine, and multi-target compiler) would help readers visualise the one-to-one mapping claimed in the abstract.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of the manuscript, the positive assessment of its significance, and the recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper reports an empirical prevalence study (n=10,008 repos) that surfaces three observable patterns in agent configs, then presents an explicit design proposal (Rel(AI)Build) that addresses those patterns using standard, externally defined primitives (SHA-256 addressing, HMAC lockfiles, phase state machines, Jaccard drift detection). Conformance is verified by synthetic injection tests that check stated invariants. No equations, fitted parameters, or predictions appear; the mapping is by construction of the proposal itself rather than a reduction. No self-citations are load-bearing, no uniqueness theorems are invoked, and no ansatz or renaming of known results is used. The derivation chain is therefore self-contained against external benchmarks.
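The circularity audit treats the phase state machine as a standard, externally defined primitive. A minimal sketch of such a machine, which allows only the next phase and only when exit checks pass and a human has approved: the five phase names are hypothetical, and while five phases give four transitions, which would line up with the four HITL gates in Figure 5, that correspondence is an assumption.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names; the paper's actual phases may differ.
    REQUIREMENTS = 1
    DESIGN = 2
    IMPLEMENTATION = 3
    VERIFICATION = 4
    DONE = 5


ORDER = list(Phase)  # members in definition order


class GateError(Exception):
    pass


class FeatureLifecycle:
    def __init__(self):
        self.phase = Phase.REQUIREMENTS
        self.history = [self.phase]

    def advance(self, target: Phase, human_approved: bool, checks_passed: bool) -> None:
        """Move exactly one phase forward, and only through an approved, passing gate."""
        idx = ORDER.index(self.phase)
        if idx + 1 >= len(ORDER) or ORDER[idx + 1] is not target:
            raise GateError(f"illegal transition {self.phase.name} -> {target.name}")
        if not checks_passed:
            raise GateError(f"exit checks for {self.phase.name} have not passed")
        if not human_approved:
            raise GateError(f"{self.phase.name} -> {target.name} awaits human approval")
        self.phase = target
        self.history.append(target)
```

Because every transition is checked against a fixed order, skipping a phase is rejected the same way on every run, which is the determinism the paper argues for.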

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Relies on standard cryptographic assumptions and introduces one new system entity without fitted parameters or additional axioms beyond domain conventions for permissions and hashing.

axioms (1)
  • standard math: SHA-256 and HMAC provide reliable content addressing and tamper evidence for configuration files.
    Invoked for lockfiles and audit logs.
invented entities (1)
  • Rel(AI)Build control plane (no independent evidence)
    purpose: Deterministic management layer for LLM agent configurations
    New system proposed to address identified gaps.
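The axiom above underwrites the hash-chained audit log. A minimal sketch, assuming a hypothetical all-zero genesis value and JSON-serialisable events: each entry commits to its predecessor's hash, so editing any past event is detected at that entry. A bare SHA-256 chain only detects tampering if the latest hash is anchored where an attacker cannot rewrite it (otherwise the whole chain can be recomputed), which is one reason to key it with HMAC or publish the head hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical fixed starting value


def entry_hash(prev: str, event: dict) -> str:
    """Hash of the previous link plus the canonical JSON of this event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode()).hexdigest()


def append(log: list[dict], event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": entry_hash(prev, event)})


def verify_chain(log: list[dict]) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, rec in enumerate(log):
        if rec["prev"] != prev or rec["hash"] != entry_hash(prev, rec["event"]):
            return i
        prev = rec["hash"]
    return -1
```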

pith-pipeline@v0.9.1-grok · 5801 in / 1245 out tokens · 60257 ms · 2026-06-26T03:54:38.229057+00:00 · methodology

