
arxiv: 2605.12863 · v1 · pith:NGAEA3IJ · submitted 2026-05-13 · 💻 cs.PL · cs.AI · cs.CR

Language-Based Agent Control

Pith reviewed 2026-06-30 21:39 UTC · model grok-4.3

classification 💻 cs.PL · cs.AI · cs.CR
keywords language-based agent control · type systems · agent safety · information flow control · access control · programming models · data provenance · AI agents

The pith

Requiring agents to generate well-typed programs allows safety policies to apply uniformly to both agent and developer code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes language-based agent control as a model in which agents must produce programs that type-check in the context of the surrounding scaffolding code. Unsafe outputs are rejected by the type checker before any execution occurs. A sympathetic reader would care because this extends established language-based security techniques to agentic applications while still permitting side-effect-free computation and recursive subagent calls under the same or stricter policies. The method is shown through case studies covering I/O sandboxing, data provenance, and information-flow control.

Core claim

LBAC requires agents to generate programs that are well typed in the context of the surrounding scaffolding code. Unsafe programs are rejected by the type-checker before execution, allowing policies to apply uniformly across the entire application, including both agent-generated behavior and developer-written scaffolding. Agents may still perform arbitrary side-effect-free computation and recursively invoke subagents, which retain full tool access subject to the same or potentially more restrictive policies.

What carries the argument

Language-based agent control (LBAC), the requirement that agent-generated programs must themselves be well typed within the scaffolding code so that static and runtime enforcement can cover the whole application.
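The check-before-execute step can be pictured with a toy model. The sketch below assumes a tiny expression language encoded as Python tuples, a hypothetical scaffolding that exposes two tools (`search` and `length`), and simple first-order tool types. None of these names come from the paper, and Python only stands in for whatever typed host language an LBAC implementation would use. The point is narrow: the agent's whole program is checked against the scaffolding's signatures, and no tool runs unless the program is well typed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Fn:
    """Type of a (hypothetical) scaffolding tool: argument types and return type."""
    args: Tuple[str, ...]
    ret: str


class IllTyped(Exception):
    """Raised when an agent program fails to type-check."""


def type_of(expr: tuple, sigs: Dict[str, Fn]) -> str:
    """Infer the type of a toy expression against the scaffolding's tool signatures."""
    tag = expr[0]
    if tag == "lit":
        value = expr[1]
        if isinstance(value, bool):
            raise IllTyped("booleans are not part of this toy language")
        if isinstance(value, int):
            return "Int"
        if isinstance(value, str):
            return "Str"
        raise IllTyped(f"unsupported literal {value!r}")
    if tag == "call":
        name, args = expr[1], expr[2]
        if name not in sigs:
            # The agent may only call tools that the scaffolding exposes.
            raise IllTyped(f"unknown tool {name!r}")
        sig = sigs[name]
        if len(args) != len(sig.args):
            raise IllTyped(f"{name} expects {len(sig.args)} argument(s)")
        for arg, expected in zip(args, sig.args):
            actual = type_of(arg, sigs)
            if actual != expected:
                raise IllTyped(f"{name}: expected {expected}, got {actual}")
        return sig.ret
    raise IllTyped(f"unknown expression form {tag!r}")


def evaluate(expr: tuple, impls: Dict[str, Callable[..., Any]]) -> Any:
    """Evaluate an expression that has already passed type_of."""
    if expr[0] == "lit":
        return expr[1]
    name, args = expr[1], expr[2]
    return impls[name](*(evaluate(a, impls) for a in args))


def run_agent_program(expr: tuple, expected: str, sigs: Dict[str, Fn],
                      impls: Dict[str, Callable[..., Any]]) -> Any:
    """Type-check the agent's program first; evaluate it only if it is well typed."""
    actual = type_of(expr, sigs)  # raises IllTyped before any tool implementation runs
    if actual != expected:
        raise IllTyped(f"program has type {actual}, scaffolding expects {expected}")
    return evaluate(expr, impls)


# Hypothetical scaffolding: the only tools visible to the agent.
SIGS = {"search": Fn(("Str",), "Str"), "length": Fn(("Str",), "Int")}
IMPLS = {"search": lambda q: f"results for {q}", "length": len}

if __name__ == "__main__":
    program = ("call", "length", [("call", "search", [("lit", "LBAC")])])
    print(run_agent_program(program, "Int", SIGS, IMPLS))  # 16
    try:
        run_agent_program(("call", "delete_file", [("lit", "/")]), "Int", SIGS, IMPLS)
    except IllTyped as err:
        print("rejected:", err)  # rejected: unknown tool 'delete_file'
```

A program that calls a tool the scaffolding never exposed, such as the hypothetical `delete_file`, is rejected inside `type_of`, so no tool implementation is ever invoked. Checking the whole program up front, rather than vetting each tool call as it happens, is the design choice that on this page's reading lets one policy cover every path through agent-generated code.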

If this is right

  • Policies for access control, information flow, and data provenance apply uniformly to both scaffolding and agent-generated code.
  • Unsafe agent outputs are rejected before execution by the existing type checker.
  • Agents retain the ability to perform arbitrary side-effect-free computation.
  • Subagents can be invoked recursively and retain full tool access, subject to the same or more restrictive policies.
  • The same mechanisms demonstrated for filesystem capabilities, provenance tracking, and information-flow control become available for other policies.
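The subagent point can be made concrete with capability attenuation, a standard way to express "the same or more restrictive" access. This is an illustrative sketch under stated assumptions, not the paper's I/O sandboxing case study: `DirCap`, `attenuate`, `notes_subagent`, and the in-memory filesystem are all hypothetical. Python enforces the boundary at run time, whereas a typed LBAC host would presumably make the capability part of a tool's type, so that code holding no capability would not type-check at all.

```python
from pathlib import PurePosixPath
from typing import Dict


class CapabilityError(Exception):
    """Raised when a path would reach outside the subtree a capability covers."""


class DirCap:
    """Hypothetical read capability for one subtree of an in-memory filesystem."""

    def __init__(self, fs: Dict[str, str], root: str):
        self._fs = fs
        self._root = PurePosixPath(root)

    def _resolve(self, rel: str) -> PurePosixPath:
        path = PurePosixPath(rel)
        # Reject absolute paths and any ".." component, so no path can climb out.
        if path.is_absolute() or ".." in path.parts:
            raise CapabilityError(f"{rel!r} escapes {self._root}")
        return self._root.joinpath(*path.parts)

    def read(self, rel: str) -> str:
        key = str(self._resolve(rel))
        if key not in self._fs:
            raise FileNotFoundError(key)
        return self._fs[key]

    def attenuate(self, subdir: str) -> "DirCap":
        """Derive a capability for a subdirectory; it can never be broader than self."""
        return DirCap(self._fs, str(self._resolve(subdir)))


def notes_subagent(cap: DirCap) -> str:
    """Hypothetical subagent: it can read only what the capability it receives covers."""
    return cap.read("todo.txt").upper()


if __name__ == "__main__":
    fs = {"/project/notes/todo.txt": "ship it", "/project/secrets.env": "KEY=abc"}
    root = DirCap(fs, "/project")
    notes = root.attenuate("notes")
    print(notes_subagent(notes))  # SHIP IT
    try:
        notes.read("../secrets.env")
    except CapabilityError as err:
        print("blocked:", err)
```

Because `attenuate` builds the narrower root through the same `_resolve` check, a subagent handed `notes` cannot widen its reach back to `/project`; the parent keeps its broader capability, and the child gets the same or less.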

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Existing typed languages could host agent scaffolding directly, letting developers reuse familiar type systems for agent safety.
  • Runtime monitoring might become less necessary if type enforcement covers the agent portion of the program.
  • The model could be tested by porting an existing agent framework into a statically typed language and measuring how often agents succeed in producing well-typed solutions.

Load-bearing premise

Agents can generate programs that satisfy the type system of the scaffolding code while retaining substantial expressiveness including arbitrary side-effect-free computation and recursive subagent invocation.

What would settle it

An experiment in which every agent attempt at a useful task either yields an ill-typed program that the checker rejects or loses necessary functionality once restricted to well-typed programs.

Figures

Figures reproduced from arXiv: 2605.12863 by Loris D'Antoni, Nadia Polikarpova, Timothy Zhou.

Figure 1: Implementing a literature search agent using a code interpreter, specialized tools, and ...
Figure 2: Overview of the Language-Based Agent Control (LBAC) model. The entire agentic system ...
Original abstract

This paper introduces language-based agent control (LBAC), a new programming model for agentic applications that brings techniques from programming languages and language-based security to the problem of agent control. In conventional programming, combinations of static typing and runtime enforcement have long been used to guarantee that well-typed programs satisfy user-specified policies, including policies for access control, information flow, data provenance, and more. The key idea behind LBAC is to extend these guarantees to agentic applications by requiring agents to generate programs that are themselves well typed in the context of the surrounding scaffolding code. Unsafe programs are rejected by the type-checker before execution, allowing policies to apply uniformly across the entire application, including both agent-generated behavior and developer-written scaffolding. At the same time, LBAC preserves substantial expressiveness: agents may perform arbitrary side-effect-free computation and recursively invoke subagents, which retain full tool access subject to the same (or potentially more restrictive) policies. We demonstrate LBAC with three case studies: I/O sandboxing via filesystem capabilities, data provenance, and information-flow control.
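Information-flow control and data provenance can both be pictured as labels that travel with data. The sketch below assumes a two-level confidentiality lattice and records provenance as a set of source names; `Level`, `Labeled`, `combine`, `post_public`, and the source strings are hypothetical. The check here is purely dynamic, a simplification of whatever mix of static typing and runtime enforcement the paper's case studies actually use. What it shows is the join rule: a value computed from secret input is itself secret, and it remembers every source it was derived from.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, FrozenSet, List


class Level(IntEnum):
    """Two-point confidentiality lattice: PUBLIC <= SECRET."""
    PUBLIC = 0
    SECRET = 1


class FlowError(Exception):
    """Raised when labeled data would reach a sink that may not observe it."""


@dataclass(frozen=True)
class Labeled:
    value: Any
    level: Level
    sources: FrozenSet[str]  # provenance: where the data came from


def combine(f: Callable[..., Any], *inputs: Labeled) -> Labeled:
    """Apply a pure function; the result carries the join of the input labels."""
    level = max((x.level for x in inputs), default=Level.PUBLIC)
    sources = frozenset().union(*(x.sources for x in inputs))
    return Labeled(f(*(x.value for x in inputs)), level, sources)


def post_public(msg: Labeled, outbox: List[Any]) -> None:
    """A public sink: only PUBLIC data may be written here."""
    if msg.level > Level.PUBLIC:
        raise FlowError(f"secret data from {sorted(msg.sources)} cannot reach a public sink")
    outbox.append(msg.value)


if __name__ == "__main__":
    secret_doc = Labeled("quarterly numbers", Level.SECRET, frozenset({"drive:/finance.xlsx"}))
    note = Labeled("hello", Level.PUBLIC, frozenset({"user"}))
    summary = combine(lambda a, b: f"{a}: {b}", note, secret_doc)
    print(summary.level.name, sorted(summary.sources))  # SECRET ['drive:/finance.xlsx', 'user']
    outbox: List[Any] = []
    try:
        post_public(summary, outbox)
    except FlowError as err:
        print("rejected:", err)
```

In a statically typed setting the label would typically live in the value's type, so a program that routes a secret-labeled value to a public sink would be rejected by the checker before running, rather than at the sink as it is here.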

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Language-Based Agent Control (LBAC), a programming model for agentic applications that extends language-based security techniques from conventional programming. The core idea is to require agents to generate programs that are well-typed in the context of the surrounding scaffolding code, so that type checkers can reject unsafe programs before execution and enforce policies (access control, information flow, provenance) uniformly across both developer-written and agent-generated code. The approach claims to preserve substantial expressiveness, allowing arbitrary side-effect-free computation and recursive invocation of subagents that retain full tool access, subject to the same or more restrictive policies. Three case studies are presented to demonstrate the idea: I/O sandboxing via filesystem capabilities, data provenance, and information-flow control.

Significance. If the central claims are substantiated with concrete type-system design and evidence that agents can reliably synthesize the required programs, LBAC would represent a meaningful application of established PL techniques to the emerging problem of controlling LLM-based agents. It could provide static, uniform security guarantees without requiring separate runtime monitors for agent behavior, which would be of interest to both the programming-languages and AI-safety communities.

major comments (2)
  1. [Abstract] The manuscript asserts that a single type discipline can simultaneously (a) enforce uniform policies across scaffolding and agent code and (b) still permit arbitrary side-effect-free computation plus recursive subagent calls with equal or stricter tool access. No formal type-system definition, no generation algorithm, and no success-rate or expressiveness measurements are supplied, making it impossible to evaluate whether the two requirements are compatible. This balance is load-bearing for the central claim.
  2. [Case studies] The three demonstrations (I/O sandboxing, data provenance, information-flow control) are cited as evidence that the approach works, yet the text contains no implementation details, no description of how agents are prompted or constrained to produce well-typed programs, and no analysis of type-checking outcomes or policy violations. Without these, the case studies cannot support the feasibility or expressiveness claims.
minor comments (1)
  1. [Abstract] The abstract would benefit from a short related-work paragraph situating LBAC with respect to prior language-based security systems (e.g., those using capabilities or IFC) and existing agent-control frameworks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript introducing Language-Based Agent Control (LBAC). We agree that the central claims require more formal and concrete substantiation through a defined type system and expanded case study details. We will revise the manuscript accordingly. We respond point-by-point to the major comments below.

  1. Referee: [Abstract] The manuscript asserts that a single type discipline can simultaneously (a) enforce uniform policies across scaffolding and agent code and (b) still permit arbitrary side-effect-free computation plus recursive subagent calls with equal or stricter tool access. No formal type-system definition, no generation algorithm, and no success-rate or expressiveness measurements are supplied, making it impossible to evaluate whether the two requirements are compatible. This balance is load-bearing for the central claim.

    Authors: We acknowledge that the abstract presents the key claims at a high level without the supporting formalisms. The manuscript focuses on introducing the LBAC model and illustrating it through case studies, but does not include a complete formal type system or quantitative measurements. In the revised manuscript, we will add a formal definition of the type system that demonstrates how policies are enforced uniformly while allowing the specified expressiveness. We will also describe the program generation approach and include any expressiveness analysis possible from the case studies. revision: yes

  2. Referee: [Case studies] The three demonstrations (I/O sandboxing, data provenance, information-flow control) are cited as evidence that the approach works, yet the text contains no implementation details, no description of how agents are prompted or constrained to produce well-typed programs, and no analysis of type-checking outcomes or policy violations. Without these, the case studies cannot support the feasibility or expressiveness claims.

    Authors: The case studies are currently described conceptually to show how LBAC can be applied to different policies. We agree that they lack the necessary implementation details to fully support the claims. In the revision, we will provide detailed descriptions of the implementations, including agent prompting strategies to ensure well-typed outputs, the type checking process, and analysis of any policy violations or type-checking results. This will strengthen the evidence for feasibility and expressiveness. revision: yes

Circularity Check

0 steps flagged

No circularity: conceptual model with case studies only


The paper introduces LBAC as a new programming model extending type systems and language-based security to agents, with claims supported by three case studies (I/O sandboxing, data provenance, IFC). No equations, derivations, fitted parameters, or load-bearing self-citations appear in the abstract or described structure. The central idea (that agents generate programs that are well typed in the context of the scaffolding code) is presented as a design choice demonstrated through case studies, not derived from prior self-referential results or true by construction. The argument is self-contained as a proposal and does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract contains no mathematical derivations, fitted parameters, or new postulated entities; the contribution is described at a conceptual level only.

pith-pipeline@v0.9.1-grok · 5712 in / 1163 out tokens · 31327 ms · 2026-06-30T21:39:15.588331+00:00 · methodology

Appendix A: CaMeL Comparison Data

Table 1: Number of AgentDojo Slack tasks completed (out of 21) by TypeGuard and CaMeL when not under attack, with and without IFC policies.

Metric           No Policies             With Policies
                 TypeGuard   CaMeL       TypeGuard   CaMeL
Utility (/21)    15          15          8           7

Table 2: Utility (benign tasks completed) and security (in...