pith. sign in

arxiv: 2606.20510 · v1 · pith:E2ZCJKSXnew · submitted 2026-06-18 · 💻 cs.CR · cs.AI

Efficient and Sound Probabilistic Verification for AI Agents

Pith reviewed 2026-06-26 16:54 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords probabilistic verificationAI agentsDatalog policiesdistributionally robust optimizationruntime monitoringsecurity policiestool calling agents
0
0 comments X

The pith

A distributionally robust optimization framework computes sound upper bounds on the probability that AI agents violate Datalog security policies, even when predicates are probabilistic and correlations are unknown.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AI agents often need to follow security policies written in Datalog, but real systems involve probabilistic elements like imperfect detectors. Previous methods required assuming independence between these probabilities to calculate violation risks. This paper introduces a method using distributionally robust optimization to find upper bounds on violation probabilities that remain valid no matter how the probabilities correlate. The approach is tested on benchmarks for terminal and tool-calling agents, showing better performance than earlier techniques while providing rigorous guarantees. This matters because it allows enforcing policies in uncertain environments without overly conservative or unsafe assumptions.

Core claim

The central claim is that probabilistic verification for Datalog policies in AI agents can be reduced to a distributionally robust optimization problem, which yields efficiently computable sound upper bounds on the probability of policy violation that hold regardless of any correlations between the probabilistic predicates or state transitions.

What carries the argument

Distributionally robust optimization formulated over the probabilistic predicates and state transitions to bound Datalog policy violations.

If this is right

  • Sound upper bounds on policy violation probabilities are obtained without needing independence assumptions between predicates.
  • The method outperforms prior art on standard benchmarks for terminal and tool calling agents.
  • It improves the security-utility trade-off while maintaining rigorous bounds.
  • Verification remains efficient and sound for both terminal and tool calling agents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the framework scales to more complex policies, it could enable runtime monitoring in production AI systems with probabilistic components.
  • Connections to other formal verification domains might allow similar robust bounds for non-Datalog logics.
  • Testing on real-world AI agent deployments could reveal practical limitations not seen in benchmarks.

Load-bearing premise

The probabilistic predicates and state transitions can be formulated such that distributionally robust optimization produces sound and efficiently computable bounds on Datalog policy violations.

What would settle it

Running the method on a benchmark where the actual observed violation probability exceeds the computed upper bound would falsify the soundness claim.

read the original abstract

Securing AI agents that operate in complex digital environments has become a critical need, and runtime monitoring approaches that formulate and enforce policies expressed in a formal language like Datalog offer a promising solution. However, existing approaches are restricted to deterministic policies. In many practical applications of AI agents, there is a need to enforce security policies in the face of ambiguity, leading to probabilistic predicates or state transitions (for example, a declassifier or Personally Identifiable Information (PII) detector that has some failure probability on each invocation). Furthermore, in many such applications, one cannot easily make the independence assumptions necessary to invoke prior work on probabilistic inference in Datalog. We address this by introducing a sound and efficient framework for such verification based on distributionally robust optimization, computing sound upper bounds on the probability of policy violation regardless of possible correlations between predicates. On standard benchmarks for terminal and tool calling agents, we demonstrate that our approach outperforms prior art and improves the security-utility trade-off while ensuring rigorous bounds on the probability of policy violation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces a framework for probabilistic verification of AI agents enforcing Datalog policies that may involve probabilistic predicates or state transitions. It uses distributionally robust optimization (DRO) to derive sound upper bounds on the probability of policy violations that hold regardless of correlations among the predicates, avoiding independence assumptions required by prior probabilistic Datalog inference methods. The approach is evaluated on standard benchmarks involving terminal and tool-calling agents, where it is reported to outperform prior art while improving the security-utility trade-off under rigorous probabilistic guarantees.

Significance. If the soundness of the DRO-based bounds and the reported efficiency gains are substantiated, the work would meaningfully advance runtime monitoring for AI agents by extending formal policy enforcement into probabilistic settings common in practice (e.g., imperfect PII detectors). The explicit handling of arbitrary correlations via DRO is a clear technical contribution over independence-based methods, and reproducible benchmark comparisons would strengthen its practical relevance for security-utility trade-offs.

minor comments (3)
  1. [Abstract] Abstract: the description of the DRO formulation could include one sentence on how the ambiguity set is constructed from the probabilistic predicates to make the soundness claim immediately traceable.
  2. [Evaluation] Evaluation section: the security-utility trade-off plots would benefit from explicit error bars or confidence intervals on the reported probability bounds to allow direct comparison with prior art.
  3. [Framework] Notation: ensure that the probabilistic predicate failure probabilities are uniformly denoted (e.g., p_i) across the framework description and the benchmark tables.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary and recommendation of minor revision. We appreciate the recognition that our DRO-based approach provides sound bounds without independence assumptions and improves security-utility trade-offs on the evaluated benchmarks.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents a DRO-based framework for computing sound upper bounds on Datalog policy violation probabilities under probabilistic predicates without independence assumptions. The abstract and description indicate reliance on external distributionally robust optimization theory for the bounds, with no exhibited equations, fitted parameters, or self-citations that reduce the claimed predictions or bounds to inputs by construction. The central result is described as building on prior DRO methods rather than self-referential definitions or renamings, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or ad-hoc axioms are stated. The central claim rests on the applicability of distributionally robust optimization to Datalog policies.

axioms (1)
  • domain assumption Distributionally robust optimization yields sound upper bounds on policy violation probability for the given Datalog setting
    Invoked to guarantee bounds without independence assumptions.

pith-pipeline@v0.9.1-grok · 5726 in / 1253 out tokens · 33925 ms · 2026-06-26T16:54:41.197540+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

62 extracted references · 17 canonical work pages

  1. [1]

    Probabilistic Inference for Datalog with Correlated Inputs , year =

    Wang, Jingbo and Halalingaiah, Shashin and Chen, Weiyi and Wang, Chao and Dillig, I. Probabilistic Inference for Datalog with Correlated Inputs , year =. Proc. ACM Program. Lang. , month = oct, articleno =. doi:10.1145/3763058 , abstract =

  2. [2]

    The Eleventh International Conference on Learning Representations , year=

    ReAct: Synergizing Reasoning and Acting in Language Models , author=. The Eleventh International Conference on Learning Representations , year=

  3. [3]

    AutoGen: Enabling Next-Gen

    Qingyun Wu and Gagan Bansal and Jieyu Zhang and Yiran Wu and Beibin Li and Erkang Zhu and Li Jiang and Xiaoyun Zhang and Shaokun Zhang and Jiale Liu and Ahmed Hassan Awadallah and Ryen W White and Doug Burger and Chi Wang , booktitle=. AutoGen: Enabling Next-Gen. 2024 , url=

  4. [4]

    2025 , eprint=

    Securing AI Agents with Information-Flow Control , author=. 2025 , eprint=

  5. [5]

    Operations Research , month = dec, pages =

    Wiesemann, Wolfram and Kuhn, Daniel and Sim, Melvyn , title =. Operations Research , month = dec, pages =. 2014 , issue_date =

  6. [6]

    Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =

    Yang, John and Prabhakar, Akshara and Narasimhan, Karthik and Yao, Shunyu , title =. Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =. 2023 , publisher =

  7. [7]

    NL 2 B ash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System

    Lin, Xi Victoria and Wang, Chenglong and Zettlemoyer, Luke and Ernst, Michael D. NL 2 B ash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018

  8. [8]

    2024 , eprint=

    System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective , author=. 2024 , eprint=

  9. [9]

    2026 , eprint=

    Formal Policy Enforcement for Real-World Agentic Systems , author=. 2026 , eprint=

  10. [10]

    Defeating Prompt Injections by Design , url =

    Debenedetti, Edoardo and Shumailov, Ilia and Fan, Tianqi and Hayes, Jamie and Carlini, Nicholas and Fabian, Daniel and Kern, Christoph and Shi, Chongyang and Terzis, Andreas and Tram\`er, Florian , booktitle =. Defeating Prompt Injections by Design , url =

  11. [11]

    Forty-second International Conference on Machine Learning , year=

    ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning , author=. Forty-second International Conference on Machine Learning , year=

  12. [12]

    2026 , volume =

    Haoyu Wang and Chris Poskitt and Jun Sun , booktitle =. 2026 , volume =

  13. [13]

    2025 , eprint=

    Enforcing Temporal Constraints for LLM Agents , author=. 2025 , eprint=

  14. [14]

    2025 , eprint=

    VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation , author=. 2025 , eprint=

  15. [15]

    2016 , issue_date =

    O'donoghue, Brendan and Chu, Eric and Parikh, Neal and Boyd, Stephen , title =. 2016 , issue_date =. doi:10.1007/s10957-016-0892-3 , journal =

  16. [16]

    Proceedings of the 20th International Joint Conference on Artifical Intelligence , pages =

    De Raedt, Luc and Kimmig, Angelika and Toivonen, Hannu , title =. Proceedings of the 20th International Joint Conference on Artifical Intelligence , pages =. 2007 , publisher =

  17. [17]

    2026 , eprint=

    AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security , author=. 2026 , eprint=

  18. [18]

    2019 , isbn =

    Wang, Jingbo and Sung, Chungha and Wang, Chao , title =. 2019 , isbn =. doi:10.1145/3338906.3338913 , booktitle =

  19. [19]

    Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 , pages =

    Sang, Tian and Bearne, Paul and Kautz, Henry , title =. Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 , pages =. 2005 , isbn =

  20. [20]

    Workshops at the Thirtieth AAAI Conference on Artificial Intelligence , year=

    JudgeD: a probabilistic datalog with dependencies , author=. Workshops at the Thirtieth AAAI Conference on Artificial Intelligence , year=

  21. [21]

    2023 , issue_date =

    Li, Ziyang and Huang, Jiani and Naik, Mayur , title =. 2023 , issue_date =. doi:10.1145/3591280 , journal =

  22. [22]

    Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =

    Manhaeve, Robin and Dumancic, Sebastijan and Kimmig, Angelika and Demeester, Thomas and Raedt, Luc De , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

  23. [23]

    Theory and Practice of Logic Programming , volume=

    Inference and learning in probabilistic logic programs using weighted boolean formulas , author=. Theory and Practice of Logic Programming , volume=. 2015 , publisher=

  24. [24]

    2020 , issue_date =

    Holtzen, Steven and Van den Broeck, Guy and Millstein, Todd , title =. 2020 , issue_date =. doi:10.1145/3428208 , journal =

  25. [25]

    Decision Support Systems , volume=

    A linear programming framework for logics of uncertainty , author=. Decision Support Systems , volume=. 1996 , publisher=

  26. [26]

    Proceedings of the 34th USENIX Conference on Security Symposium , articleno =

    Chen, Sizhe and Piet, Julien and Sitawarin, Chawin and Wagner, David , title =. Proceedings of the 34th USENIX Conference on Security Symposium , articleno =. 2025 , isbn =

  27. [27]

    2025 , isbn =

    Chen, Sizhe and Zharmagambetov, Arman and Mahloujifar, Saeed and Chaudhuri, Kamalika and Wagner, David and Guo, Chuan , title =. 2025 , isbn =. doi:10.1145/3719027.3744836 , booktitle =

  28. [28]

    and Liskov, Barbara , title =

    Myers, Andrew C. and Liskov, Barbara , title =. 1997 , isbn =. doi:10.1145/268998.266669 , booktitle =

  29. [29]

    Schneider

    Schneider, Fred B. , title =. 2000 , issue_date =. doi:10.1145/353323.353382 , journal =

  30. [30]

    , title =

    Denning, Dorothy E. , title =. 1976 , issue_date =. doi:10.1145/360051.360056 , journal =

  31. [31]

    2024 , booktitle =

    Zheng, Chujie and Yin, Fan and Zhou, Hao and Meng, Fandong and Zhou, Jie and Chang, Kai-Wei and Huang, Minlie and Peng, Nanyun , title =. 2024 , booktitle =

  32. [32]

    Toolformer: language models can teach themselves to use tools , year =

    Schick, Timo and Dwivedi-Yu, Jane and Dess\'. Toolformer: language models can teach themselves to use tools , year =. Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =

  33. [33]

    IEEE Journal on selected areas in communications , volume=

    Language-based information-flow security , author=. IEEE Journal on selected areas in communications , volume=. 2003 , publisher=

  34. [34]

    2024 , eprint=

    Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents , author=. 2024 , eprint=

  35. [35]

    2026 , eprint=

    CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution , author=. 2026 , eprint=

  36. [36]

    Proceedings of the AAAI Conference on Artificial Intelligence , author=

    Beyond the Grounding Bottleneck: Datalog Techniques for Inference in Probabilistic Logic Programs , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2020 , month=. doi:10.1609/aaai.v34i06.6591 , abstractNote=

  37. [37]

    Declarative Probabilistic Programming with Datalog , year =

    B\'. Declarative Probabilistic Programming with Datalog , year =. doi:10.1145/3132700 , journal =

  38. [38]

    2005 , isbn =

    Crampton, Jason , title =. 2005 , isbn =. doi:10.1145/1063979.1063986 , booktitle =

  39. [39]

    2023 , eprint=

    Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations , author=. 2023 , eprint=

  40. [40]

    NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails

    Rebedea, Traian and Dinu, Razvan and Sreedhar, Makesh Narsimhan and Parisien, Christopher and Cohen, Jonathan. N e M o Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023. doi:10.18653/v1/2023.emnlp-demo.40

  41. [41]

    2024 , eprint=

    Operationalizing Contextual Integrity in Privacy-Conscious Assistants , author=. 2024 , eprint=

  42. [42]

    2026 , url=

    Hao Li and Xiaogeng Liu and CHIU Hung Chun and Dianqi Li and Ning Zhang and Chaowei Xiao , booktitle=. 2026 , url=

  43. [43]

    Several-Dream9346

    Tsai, Lillian and Bagdasarian, Eugene , title =. 2025 , isbn =. doi:10.1145/3713082.3730378 , booktitle =

  44. [44]

    2026 , eprint=

    Progent: Securing AI Agents with Privilege Control , author=. 2026 , eprint=

  45. [45]

    2025 , eprint=

    MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents , author=. 2025 , eprint=

  46. [46]

    Agent Security Bench (

    Hanrong Zhang and Jingyuan Huang and Kai Mei and Yifei Yao and Zhenting Wang and Chenlu Zhan and Hongwei Wang and Yongfeng Zhang , booktitle=. Agent Security Bench (. 2025 , url=

  47. [47]

    Findings of the Association for Computational Linguistics: ACL 2024 , pages=

    Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

  48. [48]

    AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for

    Edoardo Debenedetti and Jie Zhang and Mislav Balunovic and Luca Beurer-Kellner and Marc Fischer and Florian Tram. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for. The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track , year=

  49. [49]

    Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

    Trustagent: Towards safe and trustworthy llm-based agents , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

  50. [50]

    Microsoft Security Blog , year =

    Microsoft Defender Security Research Team and Uri Oren and Amit Eliahu and Dor Edry , title =. Microsoft Security Blog , year =

  51. [51]

    O’Brien, Carrie J

    Park, Joon Sung and O'Brien, Joseph and Cai, Carrie Jun and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , title =. 2023 , isbn =. doi:10.1145/3586183.3606763 , booktitle =

  52. [52]

    The Trail of Bits Blog , year =

    Lucas Bourtoule , title =. The Trail of Bits Blog , year =

  53. [53]

    Souffl \'e : On Synthesis of Program Analyzers

    Jordan, Herbert and Scholz, Bernhard and Suboti \' c , Pavle. Souffl \'e : On Synthesis of Program Analyzers. Computer Aided Verification. 2016

  54. [54]

    The Fourteenth International Conference on Learning Representations , year=

    Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces , author=. The Fourteenth International Conference on Learning Representations , year=

  55. [55]

    2024 , url=

    Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan , booktitle=. 2024 , url=

  56. [56]

    Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems , series =

    Constraint Integer Programming: a New Approach to Integrate CP and MIP , author =. Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems , series =. 2008 , publisher =

  57. [57]

    Z3: An Efficient SMT Solver

    de Moura, Leonardo and Bj rner, Nikolaj. Z3: An Efficient SMT Solver. Tools and Algorithms for the Construction and Analysis of Systems. 2008

  58. [58]

    Proceedings of the 9th ACM Conference on Computer and Communications Security , pages =

    Wagner, David and Soto, Paolo , title =. Proceedings of the 9th ACM Conference on Computer and Communications Security , pages =. 2002 , isbn =. doi:10.1145/586110.586145 , abstract =

  59. [59]

    , biburl =

    Anderson, James P. , biburl =

  60. [60]

    McCormick

    Mccormick, Garth P. , title =. 1976 , issue_date =. doi:10.1007/BF01580665 , journal =

  61. [61]

    Cryptography and Data Security

    Denning, Dorothy Elizabeth Robling. Cryptography and Data Security

  62. [62]

    , booktitle=

    Gray, J.W. , booktitle=. Toward a mathematical foundation for information flow security , year=