Efficient and Sound Probabilistic Verification for AI Agents

Alaia Solko-Breslin; Krishnamurthy Dj Dvijotham; Mihai Christodorescu; Pramod Kaushik Mudrakarta; Somesh Jha

arxiv: 2606.20510 · v1 · pith:E2ZCJKSXnew · submitted 2026-06-18 · 💻 cs.CR · cs.AI

Efficient and Sound Probabilistic Verification for AI Agents

Alaia Solko-Breslin , Pramod Kaushik Mudrakarta , Mihai Christodorescu , Somesh Jha , Krishnamurthy Dj Dvijotham This is my paper

Pith reviewed 2026-06-26 16:54 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords probabilistic verificationAI agentsDatalog policiesdistributionally robust optimizationruntime monitoringsecurity policiestool calling agents

0 comments

The pith

A distributionally robust optimization framework computes sound upper bounds on the probability that AI agents violate Datalog security policies, even when predicates are probabilistic and correlations are unknown.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AI agents often need to follow security policies written in Datalog, but real systems involve probabilistic elements like imperfect detectors. Previous methods required assuming independence between these probabilities to calculate violation risks. This paper introduces a method using distributionally robust optimization to find upper bounds on violation probabilities that remain valid no matter how the probabilities correlate. The approach is tested on benchmarks for terminal and tool-calling agents, showing better performance than earlier techniques while providing rigorous guarantees. This matters because it allows enforcing policies in uncertain environments without overly conservative or unsafe assumptions.

Core claim

The central claim is that probabilistic verification for Datalog policies in AI agents can be reduced to a distributionally robust optimization problem, which yields efficiently computable sound upper bounds on the probability of policy violation that hold regardless of any correlations between the probabilistic predicates or state transitions.

What carries the argument

Distributionally robust optimization formulated over the probabilistic predicates and state transitions to bound Datalog policy violations.

If this is right

Sound upper bounds on policy violation probabilities are obtained without needing independence assumptions between predicates.
The method outperforms prior art on standard benchmarks for terminal and tool calling agents.
It improves the security-utility trade-off while maintaining rigorous bounds.
Verification remains efficient and sound for both terminal and tool calling agents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the framework scales to more complex policies, it could enable runtime monitoring in production AI systems with probabilistic components.
Connections to other formal verification domains might allow similar robust bounds for non-Datalog logics.
Testing on real-world AI agent deployments could reveal practical limitations not seen in benchmarks.

Load-bearing premise

The probabilistic predicates and state transitions can be formulated such that distributionally robust optimization produces sound and efficiently computable bounds on Datalog policy violations.

What would settle it

Running the method on a benchmark where the actual observed violation probability exceeds the computed upper bound would falsify the soundness claim.

read the original abstract

Securing AI agents that operate in complex digital environments has become a critical need, and runtime monitoring approaches that formulate and enforce policies expressed in a formal language like Datalog offer a promising solution. However, existing approaches are restricted to deterministic policies. In many practical applications of AI agents, there is a need to enforce security policies in the face of ambiguity, leading to probabilistic predicates or state transitions (for example, a declassifier or Personally Identifiable Information (PII) detector that has some failure probability on each invocation). Furthermore, in many such applications, one cannot easily make the independence assumptions necessary to invoke prior work on probabilistic inference in Datalog. We address this by introducing a sound and efficient framework for such verification based on distributionally robust optimization, computing sound upper bounds on the probability of policy violation regardless of possible correlations between predicates. On standard benchmarks for terminal and tool calling agents, we demonstrate that our approach outperforms prior art and improves the security-utility trade-off while ensuring rigorous bounds on the probability of policy violation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a DRO method for sound upper bounds on probabilistic Datalog policy violations in AI agents without independence assumptions.

read the letter

The paper's main move is to apply distributionally robust optimization to get sound upper bounds on the chance that a Datalog policy for an AI agent gets violated, even when the underlying probabilistic predicates can be correlated in arbitrary ways.

What stands out is the shift away from deterministic policies or independence assumptions that earlier probabilistic Datalog work relied on. The authors target real cases like PII detectors or declassifiers that have per-invocation failure probabilities, and they claim the DRO setup produces rigorous bounds while improving the security-utility tradeoff on terminal and tool-calling agent benchmarks.

The approach looks like a direct lift of DRO ideas to this verification setting, which makes sense if the Datalog rules can be encoded into the robust optimization problem without blowing up the computation. The benchmarks are standard ones, so the outperformance claim is at least testable.

The soft spot is that the abstract gives almost no derivation or encoding details, so it is hard to see exactly how the Datalog clauses turn into the DRO instance or how tight the resulting bounds are in practice. If the full paper has clean reductions and reproducible experimental numbers, that gap closes; otherwise the efficiency and practicality claims stay hard to assess.

This is for people working on runtime security for agents that must handle noisy or probabilistic components. A reader already familiar with DRO or formal policy languages would get the most out of it.

It deserves peer review because the problem is concrete and the proposed fix is specific enough to check.

Referee Report

0 major / 3 minor

Summary. The paper introduces a framework for probabilistic verification of AI agents enforcing Datalog policies that may involve probabilistic predicates or state transitions. It uses distributionally robust optimization (DRO) to derive sound upper bounds on the probability of policy violations that hold regardless of correlations among the predicates, avoiding independence assumptions required by prior probabilistic Datalog inference methods. The approach is evaluated on standard benchmarks involving terminal and tool-calling agents, where it is reported to outperform prior art while improving the security-utility trade-off under rigorous probabilistic guarantees.

Significance. If the soundness of the DRO-based bounds and the reported efficiency gains are substantiated, the work would meaningfully advance runtime monitoring for AI agents by extending formal policy enforcement into probabilistic settings common in practice (e.g., imperfect PII detectors). The explicit handling of arbitrary correlations via DRO is a clear technical contribution over independence-based methods, and reproducible benchmark comparisons would strengthen its practical relevance for security-utility trade-offs.

minor comments (3)

[Abstract] Abstract: the description of the DRO formulation could include one sentence on how the ambiguity set is constructed from the probabilistic predicates to make the soundness claim immediately traceable.
[Evaluation] Evaluation section: the security-utility trade-off plots would benefit from explicit error bars or confidence intervals on the reported probability bounds to allow direct comparison with prior art.
[Framework] Notation: ensure that the probabilistic predicate failure probabilities are uniformly denoted (e.g., p_i) across the framework description and the benchmark tables.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary and recommendation of minor revision. We appreciate the recognition that our DRO-based approach provides sound bounds without independence assumptions and improves security-utility trade-offs on the evaluated benchmarks.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents a DRO-based framework for computing sound upper bounds on Datalog policy violation probabilities under probabilistic predicates without independence assumptions. The abstract and description indicate reliance on external distributionally robust optimization theory for the bounds, with no exhibited equations, fitted parameters, or self-citations that reduce the claimed predictions or bounds to inputs by construction. The central result is described as building on prior DRO methods rather than self-referential definitions or renamings, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or ad-hoc axioms are stated. The central claim rests on the applicability of distributionally robust optimization to Datalog policies.

axioms (1)

domain assumption Distributionally robust optimization yields sound upper bounds on policy violation probability for the given Datalog setting
Invoked to guarantee bounds without independence assumptions.

pith-pipeline@v0.9.1-grok · 5726 in / 1253 out tokens · 33925 ms · 2026-06-26T16:54:41.197540+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

62 extracted references · 17 canonical work pages

[1]

Probabilistic Inference for Datalog with Correlated Inputs , year =

Wang, Jingbo and Halalingaiah, Shashin and Chen, Weiyi and Wang, Chao and Dillig, I. Probabilistic Inference for Datalog with Correlated Inputs , year =. Proc. ACM Program. Lang. , month = oct, articleno =. doi:10.1145/3763058 , abstract =

work page doi:10.1145/3763058
[2]

The Eleventh International Conference on Learning Representations , year=

ReAct: Synergizing Reasoning and Acting in Language Models , author=. The Eleventh International Conference on Learning Representations , year=
[3]

AutoGen: Enabling Next-Gen

Qingyun Wu and Gagan Bansal and Jieyu Zhang and Yiran Wu and Beibin Li and Erkang Zhu and Li Jiang and Xiaoyun Zhang and Shaokun Zhang and Jiale Liu and Ahmed Hassan Awadallah and Ryen W White and Doug Burger and Chi Wang , booktitle=. AutoGen: Enabling Next-Gen. 2024 , url=

2024
[4]

2025 , eprint=

Securing AI Agents with Information-Flow Control , author=. 2025 , eprint=

2025
[5]

Operations Research , month = dec, pages =

Wiesemann, Wolfram and Kuhn, Daniel and Sim, Melvyn , title =. Operations Research , month = dec, pages =. 2014 , issue_date =

2014
[6]

Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =

Yang, John and Prabhakar, Akshara and Narasimhan, Karthik and Yao, Shunyu , title =. Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =. 2023 , publisher =

2023
[7]

NL 2 B ash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System

Lin, Xi Victoria and Wang, Chenglong and Zettlemoyer, Luke and Ernst, Michael D. NL 2 B ash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018

2018
[8]

2024 , eprint=

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective , author=. 2024 , eprint=

2024
[9]

2026 , eprint=

Formal Policy Enforcement for Real-World Agentic Systems , author=. 2026 , eprint=

2026
[10]

Defeating Prompt Injections by Design , url =

Debenedetti, Edoardo and Shumailov, Ilia and Fan, Tianqi and Hayes, Jamie and Carlini, Nicholas and Fabian, Daniel and Kern, Christoph and Shi, Chongyang and Terzis, Andreas and Tram\`er, Florian , booktitle =. Defeating Prompt Injections by Design , url =
[11]

Forty-second International Conference on Machine Learning , year=

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning , author=. Forty-second International Conference on Machine Learning , year=
[12]

2026 , volume =

Haoyu Wang and Chris Poskitt and Jun Sun , booktitle =. 2026 , volume =

2026
[13]

2025 , eprint=

Enforcing Temporal Constraints for LLM Agents , author=. 2025 , eprint=

2025
[14]

2025 , eprint=

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation , author=. 2025 , eprint=

2025
[15]

2016 , issue_date =

O'donoghue, Brendan and Chu, Eric and Parikh, Neal and Boyd, Stephen , title =. 2016 , issue_date =. doi:10.1007/s10957-016-0892-3 , journal =

work page doi:10.1007/s10957-016-0892-3 2016
[16]

Proceedings of the 20th International Joint Conference on Artifical Intelligence , pages =

De Raedt, Luc and Kimmig, Angelika and Toivonen, Hannu , title =. Proceedings of the 20th International Joint Conference on Artifical Intelligence , pages =. 2007 , publisher =

2007
[17]

2026 , eprint=

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security , author=. 2026 , eprint=

2026
[18]

2019 , isbn =

Wang, Jingbo and Sung, Chungha and Wang, Chao , title =. 2019 , isbn =. doi:10.1145/3338906.3338913 , booktitle =

work page doi:10.1145/3338906.3338913 2019
[19]

Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 , pages =

Sang, Tian and Bearne, Paul and Kautz, Henry , title =. Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 , pages =. 2005 , isbn =

2005
[20]

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence , year=

JudgeD: a probabilistic datalog with dependencies , author=. Workshops at the Thirtieth AAAI Conference on Artificial Intelligence , year=
[21]

2023 , issue_date =

Li, Ziyang and Huang, Jiani and Naik, Mayur , title =. 2023 , issue_date =. doi:10.1145/3591280 , journal =

work page doi:10.1145/3591280 2023
[22]

Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =

Manhaeve, Robin and Dumancic, Sebastijan and Kimmig, Angelika and Demeester, Thomas and Raedt, Luc De , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

2018
[23]

Theory and Practice of Logic Programming , volume=

Inference and learning in probabilistic logic programs using weighted boolean formulas , author=. Theory and Practice of Logic Programming , volume=. 2015 , publisher=

2015
[24]

2020 , issue_date =

Holtzen, Steven and Van den Broeck, Guy and Millstein, Todd , title =. 2020 , issue_date =. doi:10.1145/3428208 , journal =

work page doi:10.1145/3428208 2020
[25]

Decision Support Systems , volume=

A linear programming framework for logics of uncertainty , author=. Decision Support Systems , volume=. 1996 , publisher=

1996
[26]

Proceedings of the 34th USENIX Conference on Security Symposium , articleno =

Chen, Sizhe and Piet, Julien and Sitawarin, Chawin and Wagner, David , title =. Proceedings of the 34th USENIX Conference on Security Symposium , articleno =. 2025 , isbn =

2025
[27]

2025 , isbn =

Chen, Sizhe and Zharmagambetov, Arman and Mahloujifar, Saeed and Chaudhuri, Kamalika and Wagner, David and Guo, Chuan , title =. 2025 , isbn =. doi:10.1145/3719027.3744836 , booktitle =

work page doi:10.1145/3719027.3744836 2025
[28]

and Liskov, Barbara , title =

Myers, Andrew C. and Liskov, Barbara , title =. 1997 , isbn =. doi:10.1145/268998.266669 , booktitle =

work page doi:10.1145/268998.266669 1997
[29]

Schneider

Schneider, Fred B. , title =. 2000 , issue_date =. doi:10.1145/353323.353382 , journal =

work page doi:10.1145/353323.353382 2000
[30]

, title =

Denning, Dorothy E. , title =. 1976 , issue_date =. doi:10.1145/360051.360056 , journal =

work page doi:10.1145/360051.360056 1976
[31]

2024 , booktitle =

Zheng, Chujie and Yin, Fan and Zhou, Hao and Meng, Fandong and Zhou, Jie and Chang, Kai-Wei and Huang, Minlie and Peng, Nanyun , title =. 2024 , booktitle =

2024
[32]

Toolformer: language models can teach themselves to use tools , year =

Schick, Timo and Dwivedi-Yu, Jane and Dess\'. Toolformer: language models can teach themselves to use tools , year =. Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =
[33]

IEEE Journal on selected areas in communications , volume=

Language-based information-flow security , author=. IEEE Journal on selected areas in communications , volume=. 2003 , publisher=

2003
[34]

2024 , eprint=

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents , author=. 2024 , eprint=

2024
[35]

2026 , eprint=

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution , author=. 2026 , eprint=

2026
[36]

Proceedings of the AAAI Conference on Artificial Intelligence , author=

Beyond the Grounding Bottleneck: Datalog Techniques for Inference in Probabilistic Logic Programs , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2020 , month=. doi:10.1609/aaai.v34i06.6591 , abstractNote=

work page doi:10.1609/aaai.v34i06.6591 2020
[37]

Declarative Probabilistic Programming with Datalog , year =

B\'. Declarative Probabilistic Programming with Datalog , year =. doi:10.1145/3132700 , journal =

work page doi:10.1145/3132700
[38]

2005 , isbn =

Crampton, Jason , title =. 2005 , isbn =. doi:10.1145/1063979.1063986 , booktitle =

work page doi:10.1145/1063979.1063986 2005
[39]

2023 , eprint=

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations , author=. 2023 , eprint=

2023
[40]

NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails

Rebedea, Traian and Dinu, Razvan and Sreedhar, Makesh Narsimhan and Parisien, Christopher and Cohen, Jonathan. N e M o Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023. doi:10.18653/v1/2023.emnlp-demo.40

work page doi:10.18653/v1/2023.emnlp-demo.40 2023
[41]

2024 , eprint=

Operationalizing Contextual Integrity in Privacy-Conscious Assistants , author=. 2024 , eprint=

2024
[42]

2026 , url=

Hao Li and Xiaogeng Liu and CHIU Hung Chun and Dianqi Li and Ning Zhang and Chaowei Xiao , booktitle=. 2026 , url=

2026
[43]

Several-Dream9346

Tsai, Lillian and Bagdasarian, Eugene , title =. 2025 , isbn =. doi:10.1145/3713082.3730378 , booktitle =

work page doi:10.1145/3713082.3730378 2025
[44]

2026 , eprint=

Progent: Securing AI Agents with Privilege Control , author=. 2026 , eprint=

2026
[45]

2025 , eprint=

MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents , author=. 2025 , eprint=

2025
[46]

Agent Security Bench (

Hanrong Zhang and Jingyuan Huang and Kai Mei and Yifei Yao and Zhenting Wang and Chenlu Zhan and Hongwei Wang and Yongfeng Zhang , booktitle=. Agent Security Bench (. 2025 , url=

2025
[47]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024
[48]

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for

Edoardo Debenedetti and Jie Zhang and Mislav Balunovic and Luca Beurer-Kellner and Marc Fischer and Florian Tram. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for. The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track , year=
[49]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

Trustagent: Towards safe and trustworthy llm-based agents , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

2024
[50]

Microsoft Security Blog , year =

Microsoft Defender Security Research Team and Uri Oren and Amit Eliahu and Dor Edry , title =. Microsoft Security Blog , year =
[51]

O’Brien, Carrie J

Park, Joon Sung and O'Brien, Joseph and Cai, Carrie Jun and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , title =. 2023 , isbn =. doi:10.1145/3586183.3606763 , booktitle =

work page doi:10.1145/3586183.3606763 2023
[52]

The Trail of Bits Blog , year =

Lucas Bourtoule , title =. The Trail of Bits Blog , year =
[53]

Souffl \'e : On Synthesis of Program Analyzers

Jordan, Herbert and Scholz, Bernhard and Suboti \' c , Pavle. Souffl \'e : On Synthesis of Program Analyzers. Computer Aided Verification. 2016

2016
[54]

The Fourteenth International Conference on Learning Representations , year=

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces , author=. The Fourteenth International Conference on Learning Representations , year=
[55]

2024 , url=

Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan , booktitle=. 2024 , url=

2024
[56]

Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems , series =

Constraint Integer Programming: a New Approach to Integrate CP and MIP , author =. Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems , series =. 2008 , publisher =

2008
[57]

Z3: An Efficient SMT Solver

de Moura, Leonardo and Bj rner, Nikolaj. Z3: An Efficient SMT Solver. Tools and Algorithms for the Construction and Analysis of Systems. 2008

2008
[58]

Proceedings of the 9th ACM Conference on Computer and Communications Security , pages =

Wagner, David and Soto, Paolo , title =. Proceedings of the 9th ACM Conference on Computer and Communications Security , pages =. 2002 , isbn =. doi:10.1145/586110.586145 , abstract =

work page doi:10.1145/586110.586145 2002
[59]

, biburl =

Anderson, James P. , biburl =
[60]

McCormick

Mccormick, Garth P. , title =. 1976 , issue_date =. doi:10.1007/BF01580665 , journal =

work page doi:10.1007/bf01580665 1976
[61]

Cryptography and Data Security

Denning, Dorothy Elizabeth Robling. Cryptography and Data Security
[62]

, booktitle=

Gray, J.W. , booktitle=. Toward a mathematical foundation for information flow security , year=

[1] [1]

Probabilistic Inference for Datalog with Correlated Inputs , year =

Wang, Jingbo and Halalingaiah, Shashin and Chen, Weiyi and Wang, Chao and Dillig, I. Probabilistic Inference for Datalog with Correlated Inputs , year =. Proc. ACM Program. Lang. , month = oct, articleno =. doi:10.1145/3763058 , abstract =

work page doi:10.1145/3763058

[2] [2]

The Eleventh International Conference on Learning Representations , year=

ReAct: Synergizing Reasoning and Acting in Language Models , author=. The Eleventh International Conference on Learning Representations , year=

[3] [3]

AutoGen: Enabling Next-Gen

Qingyun Wu and Gagan Bansal and Jieyu Zhang and Yiran Wu and Beibin Li and Erkang Zhu and Li Jiang and Xiaoyun Zhang and Shaokun Zhang and Jiale Liu and Ahmed Hassan Awadallah and Ryen W White and Doug Burger and Chi Wang , booktitle=. AutoGen: Enabling Next-Gen. 2024 , url=

2024

[4] [4]

2025 , eprint=

Securing AI Agents with Information-Flow Control , author=. 2025 , eprint=

2025

[5] [5]

Operations Research , month = dec, pages =

Wiesemann, Wolfram and Kuhn, Daniel and Sim, Melvyn , title =. Operations Research , month = dec, pages =. 2014 , issue_date =

2014

[6] [6]

Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =

Yang, John and Prabhakar, Akshara and Narasimhan, Karthik and Yao, Shunyu , title =. Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =. 2023 , publisher =

2023

[7] [7]

NL 2 B ash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System

Lin, Xi Victoria and Wang, Chenglong and Zettlemoyer, Luke and Ernst, Michael D. NL 2 B ash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018

2018

[8] [8]

2024 , eprint=

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective , author=. 2024 , eprint=

2024

[9] [9]

2026 , eprint=

Formal Policy Enforcement for Real-World Agentic Systems , author=. 2026 , eprint=

2026

[10] [10]

Defeating Prompt Injections by Design , url =

Debenedetti, Edoardo and Shumailov, Ilia and Fan, Tianqi and Hayes, Jamie and Carlini, Nicholas and Fabian, Daniel and Kern, Christoph and Shi, Chongyang and Terzis, Andreas and Tram\`er, Florian , booktitle =. Defeating Prompt Injections by Design , url =

[11] [11]

Forty-second International Conference on Machine Learning , year=

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning , author=. Forty-second International Conference on Machine Learning , year=

[12] [12]

2026 , volume =

Haoyu Wang and Chris Poskitt and Jun Sun , booktitle =. 2026 , volume =

2026

[13] [13]

2025 , eprint=

Enforcing Temporal Constraints for LLM Agents , author=. 2025 , eprint=

2025

[14] [14]

2025 , eprint=

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation , author=. 2025 , eprint=

2025

[15] [15]

2016 , issue_date =

O'donoghue, Brendan and Chu, Eric and Parikh, Neal and Boyd, Stephen , title =. 2016 , issue_date =. doi:10.1007/s10957-016-0892-3 , journal =

work page doi:10.1007/s10957-016-0892-3 2016

[16] [16]

Proceedings of the 20th International Joint Conference on Artifical Intelligence , pages =

De Raedt, Luc and Kimmig, Angelika and Toivonen, Hannu , title =. Proceedings of the 20th International Joint Conference on Artifical Intelligence , pages =. 2007 , publisher =

2007

[17] [17]

2026 , eprint=

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security , author=. 2026 , eprint=

2026

[18] [18]

2019 , isbn =

Wang, Jingbo and Sung, Chungha and Wang, Chao , title =. 2019 , isbn =. doi:10.1145/3338906.3338913 , booktitle =

work page doi:10.1145/3338906.3338913 2019

[19] [19]

Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 , pages =

Sang, Tian and Bearne, Paul and Kautz, Henry , title =. Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 , pages =. 2005 , isbn =

2005

[20] [20]

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence , year=

JudgeD: a probabilistic datalog with dependencies , author=. Workshops at the Thirtieth AAAI Conference on Artificial Intelligence , year=

[21] [21]

2023 , issue_date =

Li, Ziyang and Huang, Jiani and Naik, Mayur , title =. 2023 , issue_date =. doi:10.1145/3591280 , journal =

work page doi:10.1145/3591280 2023

[22] [22]

Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =

Manhaeve, Robin and Dumancic, Sebastijan and Kimmig, Angelika and Demeester, Thomas and Raedt, Luc De , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

2018

[23] [23]

Theory and Practice of Logic Programming , volume=

Inference and learning in probabilistic logic programs using weighted boolean formulas , author=. Theory and Practice of Logic Programming , volume=. 2015 , publisher=

2015

[24] [24]

2020 , issue_date =

Holtzen, Steven and Van den Broeck, Guy and Millstein, Todd , title =. 2020 , issue_date =. doi:10.1145/3428208 , journal =

work page doi:10.1145/3428208 2020

[25] [25]

Decision Support Systems , volume=

A linear programming framework for logics of uncertainty , author=. Decision Support Systems , volume=. 1996 , publisher=

1996

[26] [26]

Proceedings of the 34th USENIX Conference on Security Symposium , articleno =

Chen, Sizhe and Piet, Julien and Sitawarin, Chawin and Wagner, David , title =. Proceedings of the 34th USENIX Conference on Security Symposium , articleno =. 2025 , isbn =

2025

[27] [27]

2025 , isbn =

Chen, Sizhe and Zharmagambetov, Arman and Mahloujifar, Saeed and Chaudhuri, Kamalika and Wagner, David and Guo, Chuan , title =. 2025 , isbn =. doi:10.1145/3719027.3744836 , booktitle =

work page doi:10.1145/3719027.3744836 2025

[28] [28]

and Liskov, Barbara , title =

Myers, Andrew C. and Liskov, Barbara , title =. 1997 , isbn =. doi:10.1145/268998.266669 , booktitle =

work page doi:10.1145/268998.266669 1997

[29] [29]

Schneider

Schneider, Fred B. , title =. 2000 , issue_date =. doi:10.1145/353323.353382 , journal =

work page doi:10.1145/353323.353382 2000

[30] [30]

, title =

Denning, Dorothy E. , title =. 1976 , issue_date =. doi:10.1145/360051.360056 , journal =

work page doi:10.1145/360051.360056 1976

[31] [31]

2024 , booktitle =

Zheng, Chujie and Yin, Fan and Zhou, Hao and Meng, Fandong and Zhou, Jie and Chang, Kai-Wei and Huang, Minlie and Peng, Nanyun , title =. 2024 , booktitle =

2024

[32] [32]

Toolformer: language models can teach themselves to use tools , year =

Schick, Timo and Dwivedi-Yu, Jane and Dess\'. Toolformer: language models can teach themselves to use tools , year =. Proceedings of the 37th International Conference on Neural Information Processing Systems , articleno =

[33] [33]

IEEE Journal on selected areas in communications , volume=

Language-based information-flow security , author=. IEEE Journal on selected areas in communications , volume=. 2003 , publisher=

2003

[34] [34]

2024 , eprint=

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents , author=. 2024 , eprint=

2024

[35] [35]

2026 , eprint=

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution , author=. 2026 , eprint=

2026

[36] [36]

Proceedings of the AAAI Conference on Artificial Intelligence , author=

Beyond the Grounding Bottleneck: Datalog Techniques for Inference in Probabilistic Logic Programs , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2020 , month=. doi:10.1609/aaai.v34i06.6591 , abstractNote=

work page doi:10.1609/aaai.v34i06.6591 2020

[37] [37]

Declarative Probabilistic Programming with Datalog , year =

B\'. Declarative Probabilistic Programming with Datalog , year =. doi:10.1145/3132700 , journal =

work page doi:10.1145/3132700

[38] [38]

2005 , isbn =

Crampton, Jason , title =. 2005 , isbn =. doi:10.1145/1063979.1063986 , booktitle =

work page doi:10.1145/1063979.1063986 2005

[39] [39]

2023 , eprint=

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations , author=. 2023 , eprint=

2023

[40] [40]

NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails

Rebedea, Traian and Dinu, Razvan and Sreedhar, Makesh Narsimhan and Parisien, Christopher and Cohen, Jonathan. N e M o Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023. doi:10.18653/v1/2023.emnlp-demo.40

work page doi:10.18653/v1/2023.emnlp-demo.40 2023

[41] [41]

2024 , eprint=

Operationalizing Contextual Integrity in Privacy-Conscious Assistants , author=. 2024 , eprint=

2024

[42] [42]

2026 , url=

Hao Li and Xiaogeng Liu and CHIU Hung Chun and Dianqi Li and Ning Zhang and Chaowei Xiao , booktitle=. 2026 , url=

2026

[43] [43]

Several-Dream9346

Tsai, Lillian and Bagdasarian, Eugene , title =. 2025 , isbn =. doi:10.1145/3713082.3730378 , booktitle =

work page doi:10.1145/3713082.3730378 2025

[44] [44]

2026 , eprint=

Progent: Securing AI Agents with Privilege Control , author=. 2026 , eprint=

2026

[45] [45]

2025 , eprint=

MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents , author=. 2025 , eprint=

2025

[46] [46]

Agent Security Bench (

Hanrong Zhang and Jingyuan Huang and Kai Mei and Yifei Yao and Zhenting Wang and Chenlu Zhan and Hongwei Wang and Yongfeng Zhang , booktitle=. Agent Security Bench (. 2025 , url=

2025

[47] [47]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024

[48] [48]

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for

Edoardo Debenedetti and Jie Zhang and Mislav Balunovic and Luca Beurer-Kellner and Marc Fischer and Florian Tram. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for. The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track , year=

[49] [49]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

Trustagent: Towards safe and trustworthy llm-based agents , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

2024

[50] [50]

Microsoft Security Blog , year =

Microsoft Defender Security Research Team and Uri Oren and Amit Eliahu and Dor Edry , title =. Microsoft Security Blog , year =

[51] [51]

O’Brien, Carrie J

Park, Joon Sung and O'Brien, Joseph and Cai, Carrie Jun and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , title =. 2023 , isbn =. doi:10.1145/3586183.3606763 , booktitle =

work page doi:10.1145/3586183.3606763 2023

[52] [52]

The Trail of Bits Blog , year =

Lucas Bourtoule , title =. The Trail of Bits Blog , year =

[53] [53]

Souffl \'e : On Synthesis of Program Analyzers

Jordan, Herbert and Scholz, Bernhard and Suboti \' c , Pavle. Souffl \'e : On Synthesis of Program Analyzers. Computer Aided Verification. 2016

2016

[54] [54]

The Fourteenth International Conference on Learning Representations , year=

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces , author=. The Fourteenth International Conference on Learning Representations , year=

[55] [55]

2024 , url=

Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan , booktitle=. 2024 , url=

2024

[56] [56]

Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems , series =

Constraint Integer Programming: a New Approach to Integrate CP and MIP , author =. Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems , series =. 2008 , publisher =

2008

[57] [57]

Z3: An Efficient SMT Solver

de Moura, Leonardo and Bj rner, Nikolaj. Z3: An Efficient SMT Solver. Tools and Algorithms for the Construction and Analysis of Systems. 2008

2008

[58] [58]

Proceedings of the 9th ACM Conference on Computer and Communications Security , pages =

Wagner, David and Soto, Paolo , title =. Proceedings of the 9th ACM Conference on Computer and Communications Security , pages =. 2002 , isbn =. doi:10.1145/586110.586145 , abstract =

work page doi:10.1145/586110.586145 2002

[59] [59]

, biburl =

Anderson, James P. , biburl =

[60] [60]

McCormick

Mccormick, Garth P. , title =. 1976 , issue_date =. doi:10.1007/BF01580665 , journal =

work page doi:10.1007/bf01580665 1976

[61] [61]

Cryptography and Data Security

Denning, Dorothy Elizabeth Robling. Cryptography and Data Security

[62] [62]

, booktitle=

Gray, J.W. , booktitle=. Toward a mathematical foundation for information flow security , year=