pith. machine review for the scientific record.

arxiv: 2605.03034 · v1 · submitted 2026-05-04 · 💻 cs.AI · cs.CR · cs.SY · eess.SY

Recognition: 3 theorem links · Lean Theorem

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

Pith reviewed 2026-05-08 17:47 UTC · model grok-4.3

classification 💻 cs.AI · cs.CR · cs.SY · eess.SY
keywords agentic LLM control · tool-mediated architecture · cyber defense · Lyapunov stability · input-to-state stability · Stackelberg best response · formal verification · autonomous security operations

The pith

Tool-mediated LLM architecture for cyber defense provides machine-checked stability guarantees and reduces the attacker's expected payoff by 59 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an architecture in which LLM agents for cyber defense are restricted to deterministic tools and finite action catalogs enforced at the interface. This structure supports a composite Lyapunov function that has been fully machine-checked in Lean 4, establishing controllability, observability from asymmetric sensor data, and input-to-state stability against adversarial disturbances. The same claims are verified to hold with margin across 282 real enterprise attack graphs. In paired offensive and defensive telemetry experiments the architecture yields a 59 percent drop in the attacker's expected payoff relative to a deterministic greedy baseline, with zero variance over 40 runs at four temperatures. The non-determinism of the LLM is retained for strategy exploration while the enforced tools keep the overall system bounded.

Core claim

A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff by 59 percent relative to a deterministic greedy baseline with zero variance across 40 runs at four temperatures, while a Claude Haiku 4.5 controller remains catalog-bounded.
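For intuition about what an ISS guarantee promises, here is a minimal sketch on a toy plant that is an assumption, not the paper's model: a scalar system x_{k+1} = a·x_k + w_k with |a| < 1 and a disturbance bounded by w_sup. Unrolling the recursion gives |x_k| ≤ |a|^k·|x_0| + w_sup/(1 − |a|): the effect of the initial condition decays, and the adversary can only hold the state inside a ball whose radius grows with the disturbance size. The function names and parameters are hypothetical.

```python
import random
from typing import List


def simulate(a: float, x0: float, disturbances: List[float]) -> List[float]:
    """Roll out the toy plant x_{k+1} = a * x_k + w_k and return the trajectory."""
    xs = [x0]
    for w in disturbances:
        xs.append(a * xs[-1] + w)
    return xs


def iss_bound(a: float, x0: float, w_sup: float, k: int) -> float:
    """Decaying initial-condition term plus a linear gain on the disturbance bound."""
    return abs(a) ** k * abs(x0) + w_sup / (1.0 - abs(a))


def check_iss(a: float = 0.8, x0: float = 10.0, w_sup: float = 0.5,
              steps: int = 200, seed: int = 0) -> bool:
    """Simulate a bounded random disturbance and test the ISS bound at every step."""
    assert abs(a) < 1.0, "the toy bound only holds for a contracting plant"
    rng = random.Random(seed)
    ws = [rng.uniform(-w_sup, w_sup) for _ in range(steps)]  # |w_k| <= w_sup
    xs = simulate(a, x0, ws)
    # Small tolerance absorbs floating-point rounding in the rollout.
    return all(abs(x) <= iss_bound(a, x0, w_sup, k) + 1e-9
               for k, x in enumerate(xs))
```

The bound is the discrete-time analogue of an ISS estimate: a class-KL term in the initial state plus a gain on the disturbance. A Lyapunov certificate is one way to obtain such an estimate without solving the dynamics explicitly.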

What carries the argument

The tool-mediated architecture that forces LLM agents to select only from finite action catalogs at the tool-output interface, using deterministic tools such as Stackelberg best-response, Bayesian observer updates, and attack-graph primitives, together with the machine-checked composite Lyapunov function that certifies stability properties.
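The Stackelberg best-response tool is only named here, so the following is a minimal sketch under stated assumptions rather than the paper's implementation. It assumes a small directed attack graph with ENTRY and OBJECTIVE nodes, per-edge attacker payoffs, and per-policy block effectiveness capped at 0.95 (mirroring the edge attributes described in the appendix text that appears in the reference list below). It also assumes an attacker who collects each edge's payoff until blocked and a defender who picks a single policy from a finite catalog. The payoff model, policy IDs and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Graph = Dict[str, List[str]]
BLOCK_CAP = 0.95  # block probabilities are capped at 0.95 (per the appendix excerpt)


def simple_paths(graph: Graph, src: str, dst: str) -> List[List[str]]:
    """Enumerate every simple src -> dst path (adequate for small graphs)."""
    found: List[List[str]] = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            found.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return found


def path_value(path: List[str], payoff: Dict[Edge, float],
               effect: Dict[Edge, Dict[str, float]], policy: str) -> float:
    """Expected attacker payoff: an edge pays out only if it and all earlier edges pass."""
    value, survive = 0.0, 1.0
    for edge in zip(path, path[1:]):
        block = min(BLOCK_CAP, effect.get(edge, {}).get(policy, 0.0))
        survive *= 1.0 - block
        value += survive * payoff[edge]
    return value


def attacker_best_response(graph, payoff, effect, policy) -> Tuple[List[str], float]:
    """Follower move: the ENTRY -> OBJECTIVE path with the highest expected payoff."""
    scored = [(path_value(p, payoff, effect, policy), p)
              for p in simple_paths(graph, "ENTRY", "OBJECTIVE")]
    value, path = max(scored, key=lambda vp: vp[0])
    return path, value


def defender_stackelberg(graph, payoff, effect, catalog: List[str]) -> str:
    """Leader move: the catalog policy that minimizes the attacker's best-response value."""
    return min(catalog,
               key=lambda pol: attacker_best_response(graph, payoff, effect, pol)[1])


graph = {"ENTRY": ["h1", "h2"], "h1": ["OBJECTIVE"], "h2": ["OBJECTIVE"]}
payoff = {("ENTRY", "h1"): 1.0, ("h1", "OBJECTIVE"): 5.0,
          ("ENTRY", "h2"): 1.0, ("h2", "OBJECTIVE"): 4.0}
effect = {("h1", "OBJECTIVE"): {"P1": 0.9}, ("h2", "OBJECTIVE"): {"P2": 0.99}}
print(defender_stackelberg(graph, payoff, effect, ["NOOP", "P1", "P2"]))  # P1
```

Blocking the higher-value h1 route with P1 pushes the attacker onto h2 and lowers its best-response value from 6.0 to 5.0. P2 blocks h2 (its 0.99 effectiveness is capped at 0.95), but the attacker simply switches to h1. This is the leader-follower logic such a tool encodes. The greedy baseline the paper compares against is not specified on this page, so it is not reproduced here.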

If this is right

  • The two corollaries allow the stability certificate to apply to every controller and every adversary drawn from the provided catalogs.
  • The formal claims remain valid with margin on all 282 tested real enterprise attack graphs.
  • The 59 percent reduction in attacker expected payoff holds with zero variance across the reported runs and temperatures.
  • Architectural stability persists even when a less capable model such as Claude Haiku 4.5 is substituted, as long as catalog bounds are respected.
  • LLM non-determinism can still generate creative strategies while the tool enforcement layer maintains overall controllability and robustness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same catalog-and-tool pattern could be applied to other adversarial control settings such as autonomous vehicle response or industrial process protection if suitable deterministic primitives are supplied.
  • Zero observed variance across temperatures indicates that the enforcement layer successfully removes output stochasticity from the decision loop.
  • Expanding the action catalog while preserving the Lean-checked Lyapunov properties would be a direct testable extension.
  • Deployment in live security operations centers would reveal whether the reported payoff reduction persists when sensor data and graph structures deviate from the 282-graph test set.

Load-bearing premise

That the LLM will always interface correctly with the tools and choose only from the finite action catalog without bypassing the enforcement mechanism, and that the deterministic tools accurately model the real-world cyber environment, including asymmetric sensor data.
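The second half of this premise can be made concrete with a textbook discrete Bayes update for a single host's compromise state. In this sketch, which is assumed rather than taken from the paper, a sensor raises an alert with probability p_detect when the host is compromised and with probability p_false when it is clean. The default p_detect = 0.1 borrows the flat detection baseline mentioned in the appendix excerpt; p_false = 0.01 and both function names are purely illustrative. The asymmetry shows up in the arithmetic: one alert is strong evidence of compromise, while silence is only weak evidence of safety.

```python
from typing import Iterable


def bayes_update(prior: float, alert: bool, p_detect: float, p_false: float) -> float:
    """Posterior P(compromised | one sensor reading) for a single host.

    p_detect = P(alert | compromised), p_false = P(alert | clean).
    """
    like_comp = p_detect if alert else 1.0 - p_detect
    like_clean = p_false if alert else 1.0 - p_false
    num = like_comp * prior
    den = num + like_clean * (1.0 - prior)
    return num / den if den > 0 else prior  # degenerate evidence: keep the prior


def filter_readings(prior: float, readings: Iterable[bool],
                    p_detect: float = 0.1, p_false: float = 0.01) -> float:
    """Apply the update sequentially, treating readings as conditionally independent."""
    belief = prior
    for reading in readings:
        belief = bayes_update(belief, reading, p_detect, p_false)
    return belief
```

With a prior of 0.5, a single alert moves the posterior to about 0.91, whereas one quiet reading only lowers it to about 0.48. If real sensors deviate from the assumed likelihoods, the observer's beliefs, and any observability argument built on them, inherit that error.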

What would settle it

Repeating the 40-run paired telemetry experiment with a different LLM controller or on new attack graphs and observing that the attacker's expected payoff reduction falls below 59 percent or exhibits non-zero variance would falsify the empirical performance claim; finding a step in the Lean 4 certificate that cannot be discharged would falsify the formal stability guarantee.

Figures

Figures reproduced from arXiv: 2605.03034 by Amy Villaseñor, Anton Foltz, Cameron Denton, Joshua Knox, Kerri Prinos, Lilianne Brush, Snehal Antani, Zhanqi Wang.

Figure 1: Experiment 1 results on 282 graphs. (a) Plant trajectory ...

Figure 2: Within-family scaling of LLM stability. (a) Sonnet 4: all 40 runs converge to ...
Original abstract

Agentic systems involved in high-stakes decision-making under adversarial pressure need formal guarantees not offered by existing approaches. Motivated by the operational needs of security operations centers (SOCs) that must configure endpoint detection and response (EDR) policies under adversarial pressure, we present a tool-mediated architecture: LLM agents use deterministic tools (Stackelberg best-response, Bayesian observer updates, attack-graph primitives) and select from finite action catalogs enforced at the tool-output interface. A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs, the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff (game value) by 59% relative to a deterministic greedy baseline, with zero variance across 40 runs at four temperatures. A Claude Haiku 4.5 controller converges to suboptimal game values but stays catalog-bounded over an additional 40 runs, demonstrating that architectural stability does not depend on controller capability. The LLM agent's non-determinism furthers creative exploration of strategies, while the tool-mediated architecture ensures system stability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a tool-mediated architecture for LLM-based agents in autonomous cyber defense, where LLMs select actions from finite catalogs enforced at the deterministic tool interface (including Stackelberg best-response, Bayesian observer updates, and attack-graph primitives). A composite Lyapunov function, machine-checked in Lean 4 with zero 'sorry' statements, certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) under intelligent adversarial disturbances, with two corollaries extending the certificate to arbitrary catalog controllers or adversaries. On 282 real enterprise attack graphs the claims hold with margin; on paired offensive/defensive telemetry a Claude Sonnet 4 controller reduces attacker expected payoff by 59% versus a deterministic greedy baseline with zero variance over 40 runs at four temperatures, while a weaker Haiku model remains catalog-bounded.

Significance. If the formal certificate applies to the full agentic loop and the empirical gains prove robust, the work offers a concrete route to stable high-stakes agentic control by combining LLM exploration with provable robustness. The zero-sorry Lean proof, explicit corollaries for catalog generality, and evaluation on real attack graphs constitute clear strengths that raise the bar for future agentic cyber-defense research.

major comments (2)
  1. [Architecture and Formal Model sections] The composite Lyapunov function and its ISS certificate are derived under the assumption that every LLM output is a catalog-compliant action executed exactly by the deterministic tools. No lemma, probability bound, or interface invariant is supplied establishing that the LLM will always produce such outputs rather than malformed strings that could bypass enforcement. This assumption is load-bearing for transferring the controllability/ISS guarantees to the actual agentic controller.
  2. [Empirical Evaluation section] The 59% reduction in game value is reported with zero variance across 40 runs, yet the manuscript provides no per-run game-value distributions, confidence intervals, or statistical comparison (e.g., paired t-test or Wilcoxon) against the greedy baseline. Without these, it is impossible to assess whether the reported margin is statistically reliable or sensitive to modeling choices in the Stackelberg/Bayesian components.
minor comments (2)
  1. [Abstract and Results] The abstract and results text repeatedly state 'zero variance' without clarifying the precise metric (game value, payoff difference, or trajectory length) or whether the zero is exact or within floating-point tolerance.
  2. [Formal Model] Notation for the composite Lyapunov function (V = V_c + V_o + V_iss) is introduced without an explicit equation number or cross-reference to the Lean definitions, making it harder to trace the machine-checked statements.
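The referee's notation can be read against the standard discrete-time ISS-Lyapunov conditions of Jiang and Wang (reference [19] below). The block states those generic conditions for a composite function over a closed loop whose control input u_k ranges over the action catalog U and whose disturbance w_k ranges over an adversary catalog W. It is an interpretive template only, since the paper's own Theorem 1 and its Lean definitions are not reproduced on this page.

```latex
\begin{aligned}
&x_{k+1} = f(x_k, u_k, w_k), \qquad u_k \in \mathcal{U},\; w_k \in \mathcal{W},\\
&V = V_c + V_o + V_{\mathrm{iss}},\\
&\alpha_1(\lVert x \rVert) \le V(x) \le \alpha_2(\lVert x \rVert) \quad \forall x,\\
&V\bigl(f(x, u, w)\bigr) - V(x) \le -\alpha_3(\lVert x \rVert) + \sigma(\lVert w \rVert)
  \quad \forall x,\; u \in \mathcal{U},\; w \in \mathcal{W}.
\end{aligned}
```

Here α₁, α₂, α₃ are class-K∞ functions and σ is a class-K gain. Requiring the decrease inequality for every u in U and every w in W is what would let a corollary cover any catalog controller or adversary.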

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of the formal and empirical contributions and for the constructive major comments. We address each point below, indicating the changes we will make to the manuscript.

  1. Referee: The composite Lyapunov function and its ISS certificate are derived under the assumption that every LLM output is a catalog-compliant action executed exactly by the deterministic tools. No lemma, probability bound, or interface invariant is supplied establishing that the LLM will always produce such outputs rather than malformed strings that could bypass enforcement. This assumption is load-bearing for transferring the controllability/ISS guarantees to the actual agentic controller.

    Authors: We agree that the transfer of guarantees relies on the interface enforcement. The architecture includes a deterministic parser at the tool interface that validates each LLM-generated string against the finite catalog; malformed or out-of-catalog outputs are rejected and replaced by a default catalog action (the no-op or a pre-specified safe action). This replacement is itself a catalog member, preserving the assumptions of the Lyapunov certificate. We will revise the Architecture section to explicitly describe this validation and replacement mechanism and add a remark that the ISS certificate holds under this interface invariant. A probabilistic model of LLM compliance rates is outside the paper's scope, as the focus is on the stability provided by the tool mediation regardless of the specific LLM. revision: yes

  2. Referee: The 59% reduction in game value is reported with zero variance across 40 runs, yet the manuscript provides no per-run game-value distributions, confidence intervals, or statistical comparison (e.g., paired t-test or Wilcoxon) against the greedy baseline. Without these, it is impossible to assess whether the reported margin is statistically reliable or sensitive to modeling choices in the Stackelberg/Bayesian components.

    Authors: The reported zero variance means that the game value was identical in all 40 runs (across four temperatures), which already indicates strong robustness to the LLM's sampling variability. We will add a supplementary table listing the per-run values (all equal to the reported figure) and include a note that, with zero sample variance, the 59% reduction is a point estimate with zero-width confidence interval. Since the baseline is deterministic, a paired comparison is not applicable in the usual sense, but the consistent superiority across runs supports reliability. We will also add a brief discussion confirming that the Stackelberg best-response and Bayesian updates are fixed components and that the result is insensitive to their specific parameterizations within the evaluated range. revision: yes
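The arithmetic behind the rebuttal's second point can be checked in a few lines. A relative reduction is (baseline − mean)/baseline, and when every run returns the same game value, the sample variance and a normal-approximation confidence interval both collapse to zero width. The numbers below (a baseline of 10.0 and forty identical runs at 4.1) are hypothetical placeholders chosen only to give a 59 percent reduction. The paper's actual game values are not reported on this page.

```python
import math
import statistics
from typing import List, Tuple


def summarize(run_values: List[float], baseline: float,
              z: float = 1.96) -> Tuple[float, float, float, float]:
    """Return (mean, sample variance, relative reduction, CI half-width).

    Needs at least two runs (statistics.variance requires n >= 2) and a
    nonzero baseline.
    """
    mean = statistics.mean(run_values)
    var = statistics.variance(run_values)          # n - 1 denominator
    reduction = (baseline - mean) / baseline       # a fraction, e.g. 0.59
    half_width = z * math.sqrt(var / len(run_values))
    return mean, var, reduction, half_width


runs = [4.1] * 40                        # hypothetical: 40 identical game values
mean, var, reduction, half_width = summarize(runs, baseline=10.0)
exactly_constant = len(set(runs)) == 1   # exact zero, not merely within tolerance
print(f"reduction={reduction:.0%} variance={var} half_width={half_width}")
```

Reporting a flag like exactly_constant alongside the variance would also answer the minor comment about whether "zero variance" is exact or only within floating-point tolerance.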

Circularity Check

0 steps flagged

No significant circularity; formal certificate is independent of LLM behavior.

The paper's central derivation is a composite Lyapunov function that is machine-checked in Lean 4 with zero sorry, establishing controllability, observability from asymmetric sensors, and ISS for the deterministic tool-mediated closed-loop system. Two corollaries extend the certificate to any catalog controller or adversary without additional assumptions beyond catalog membership. This formal content is self-contained and does not reduce to a fit, self-definition, or self-citation chain; the experimental results on independent attack graphs and telemetry are reported separately and do not feed back into the certificate. The unformalized assumption of LLM catalog compliance is an architectural claim rather than a load-bearing step in the mathematical derivation itself.
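The shape of a "holds for any catalog controller" corollary can be illustrated with a toy Lean 4 statement. It uses core Lean only (no Mathlib) and a natural-number-valued V for simplicity. If every catalog action is non-increasing for V, then any policy that only emits catalog actions yields a trajectory along which V never exceeds its initial value. All names are hypothetical. This is not the paper's certificate, which according to the appendix excerpt is built on Mathlib.

```lean
theorem any_catalog_policy_nonincreasing
    {State Action : Type} (inCatalog : Action → Prop)
    (step : State → Action → State) (V : State → Nat)
    -- every catalog action is non-increasing for V
    (hdec : ∀ s a, inCatalog a → V (step s a) ≤ V s)
    -- an arbitrary (possibly time-varying) controller that only emits catalog actions
    (policy : Nat → State → Action)
    (hpol : ∀ k s, inCatalog (policy k s))
    (x : Nat → State)
    (hx : ∀ k, x (k + 1) = step (x k) (policy k (x k))) :
    ∀ n, V (x n) ≤ V (x 0) := by
  intro n
  induction n with
  | zero => exact Nat.le_refl _
  | succ k ih =>
    have h1 : V (x (k + 1)) ≤ V (x k) := by
      rw [hx k]
      exact hdec (x k) (policy k (x k)) (hpol k (x k))
    exact Nat.le_trans h1 ih
```

The proof never inspects how policy chooses its action, only that the choice lies in the catalog. That is why a certificate of this form is independent of LLM behavior once catalog membership is enforced.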

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach relies on assumptions about tool determinism and the validity of the custom Lyapunov certificate rather than introducing new free parameters or entities.

axioms (2)
  • domain assumption: Deterministic tools and finite action catalogs can be enforced at the LLM output interface.
    This is central to preventing non-determinism from affecting stability.
  • ad hoc to paper: The composite Lyapunov function correctly captures controllability, observability, and ISS under adversarial conditions.
    The paper constructs this function and claims machine-checking, but the details are not available.
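The first axiom, together with the authors' rebuttal, describes enforcement as a deterministic parser. It validates each LLM-generated string against the finite catalog and substitutes a default catalog action (a no-op or pre-specified safe action) for anything malformed or out of catalog. A minimal sketch of such a total mapping follows. It assumes a JSON reply format of the form {"action": "<id>"}, and the schema, the function name and the action identifiers are hypothetical.

```python
import json
from typing import FrozenSet

SAFE_DEFAULT = "noop"  # hypothetical pre-specified safe action; must be a catalog member


def enforce_catalog(raw: str, catalog: FrozenSet[str], default: str = SAFE_DEFAULT) -> str:
    """Map an arbitrary LLM output string onto the finite action catalog.

    Only a JSON object {"action": "<id>"} whose id is in the catalog is accepted.
    Malformed JSON, wrong types, or unknown ids fall back to `default`, so the
    returned action is always a catalog member.
    """
    if default not in catalog:
        raise ValueError("fallback action must itself be in the catalog")
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return default
    action = parsed.get("action") if isinstance(parsed, dict) else None
    return action if isinstance(action, str) and action in catalog else default


catalog = frozenset({"noop", "isolate_host", "enable_asr_rule"})  # hypothetical IDs
print(enforce_catalog('{"action": "isolate_host"}', catalog))  # isolate_host
print(enforce_catalog("please isolate the host", catalog))     # noop
```

Because the function is total and every branch returns a catalog member, the invariant "executed action ∈ catalog" holds by construction, whatever string the model emits. That is the interface property the ISS certificate needs. It says nothing about how often the fallback fires, so it bounds stability rather than performance.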

pith-pipeline@v0.9.0 · 5577 in / 1571 out tokens · 85788 ms · 2026-05-08T17:47:57.578801+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 26 canonical work pages · 5 internal anchors

  [1] CrowdStrike. 2026 global threat report: Year of the evasive adversary, 2026. URL https://www.crowdstrike.com/explore/2026-global-threat-report/2026-global-threat-report?utm_medium=org

  [2] Kim Hammar and Rolf Stadler. Finding efficient security strategies through reinforcement learning and self-play. arXiv, 2020. URL https://arxiv.org/pdf/2009.08120

  [3] Anthropic. Disrupting the first reported AI-orchestrated cyber espionage campaign.

  [4] URL https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf

  [5] SentinelOne. How SentinelOne’s AI EDR autonomously discovered and stopped Anthropic’s Claude from executing a zero-day supply chain attack globally, 2026. URL https://www.sentinelone.com/blog/how-sentinelones-ai-edr-autonomously-discovered-and-stopped-anthropics-claude-from-executing-a-zero-day-supply-chain-attack-globally/

  [6] Ali Eslami and Jiangbo Yu. A control-theoretic foundation for agentic systems, 2026. URL https://arxiv.org/html/2603.10779

  [7] Non-Determinism of “Deterministic” LLM System Settings in Hosted Environments, 2025. URL https://aclanthology.org/2025.eval4nlp-1.12/

  [8] Agents of Chaos. Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody, Adam Belfki, Alex Loftus, Aditya Ratan Jannali, Nikhil Prakash, Jasmine Cui, Giordano Rogers, Jannik Brinkmann, Can Rager, Amir Zur, Michael Ripa, Aruna Sankaranarayanan, David Atkinson, Rohit Gandikota, Jaden Fiotto-Kaufman, EunJeong Hwang, Hadas Orgad, P Sam Sahil, Neg...

  [9] Quanyan Zhu. Game theory meets LLM and agentic AI: Reimagining cybersecurity for the age of intelligent threats. arXiv, 2025. URL https://arxiv.org/abs/2507.10621

  [10] David “davidad” Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, and Joshua Tenenbaum. Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems. arXiv preprint arXiv:2405.06624.

  [11] Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. arXiv, 2025. URL https://arxiv.org/abs/2503.18666

  [12] Adharsh Kamath, Sishen Zhang, Calvin Xu, Shubham Ugare, Gagandeep Singh, and Sasa Misailovic. Enforcing temporal constraints for LLM agents. arXiv, 2025. URL https://arxiv.org/pdf/2512.23738

  [13] Varun Pratap Bhardwaj. Agent behavioral contracts: Formal specification and runtime enforcement for reliable autonomous AI agents. arXiv, 2026. URL https://arxiv.org/pdf/2602.22302

  [14] Devakh Rashie and Veda Rashi. Type-checked compliance: Deterministic guardrails for agentic financial systems using Lean 4 theorem proving. arXiv, 2026. URL https://arxiv.org/abs/2604.01483

  [15] A Lyapunov-based Approach to Safe Reinforcement Learning, 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/4fe5149039b52765bde64beb9f674940-Paper.pdf

  [16] Safe Model-Based Reinforcement Learning with Stability Guarantees, 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/766ebcd59621e305170616ba3d3dac32-Paper.pdf

  [17] Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, and Andreas Krause. ActSafe: Active exploration with safety constraints for reinforcement learning. arXiv, 2024. URL https://arxiv.org/abs/2410.09486

  [18] Tomohisa Hayakawa, Wassim M. Haddad, and Alexander Leonessa. A Lyapunov-based adaptive control framework for discrete-time non-linear systems with exogenous disturbances. International Journal of Control, 77(3):250–263, 2004. URL https://scispace.com/papers/a-lyapunov-based-adaptive-control-framework-for-discrete-1mzw8c3u14

  [19] Zhong-Ping Jiang and Yuan Wang. Input-to-state stability for discrete-time nonlinear systems. Automatica, 37(6):857–869, 2001. URL https://www.sciencedirect.com/science/article/abs/pii/S0005109801000280

  [20] Quanyan Zhu and Tamer Basar. Games-in-games principle for cyber-physical resilience. IEEE Control Systems Magazine, 35(1):46–65, 2015. URL https://ieeexplore.ieee.org/document/7011006

  [21] G. Leitmann. On generalized Stackelberg strategies. Journal of Optimization Theory and Applications, 26:637–643, 1978. URL https://link.springer.com/article/10.1007/BF00933155

  [22] Survey of Stackelberg Security Games, 2018. URL https://dl.acm.org/doi/10.5555/3304652.3304789

  [23] Optimal network security hardening using attack graph games, 2015. URL https://dl.acm.org/doi/10.5555/2832249.2832322

  [24] Erik Miehling, Mohammad Rasouli, and Demosthenis Teneketzis. Input-to-state stability for discrete-time nonlinear systems. IEEE Transactions on Information Forensics and Security, 13(10):2490–2505, 2018. URL https://ieeexplore.ieee.org/document/8325528

  [25] Learning to Search Better than Your Teacher, 2003. URL https://dl.acm.org/doi/10.5555/3041838.3041906

  [26] Manish Jain, Dmytro Korzhyk, Ondřej Vanek, Vincent Conitzer, Michal Pechoucek, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs, 2011. URL https://www.cs.cmu.edu/~conitzer/graph_securityAAMAS11.pdf

  [27] A Scalable Double Oracle for Hardening Large Active Directory Systems, 2023. URL https://dl.acm.org/doi/10.1145/3579856.3590343

  [28] R. Kevin Wood. Deterministic network interdiction. Mathematical and Computer Modelling, 17(2):1–18,

  [29] URL https://apps.dtic.mil/sti/pdfs/ADA487308.pdf

  [30] Andrew R. Romano and Lacra Pavel. Dynamic NE seeking for multi-integrator networked agents with disturbance rejection. arXiv, 2019. URL https://arxiv.org/pdf/1903.02587

  [31] Anne Aarness. What is EDR? Endpoint detection & response defined | CrowdStrike,

  [32] URL https://www.crowdstrike.com/en-us/cybersecurity-101/endpoint-security/endpoint-detection-and-response-edr/

  [33] Decoding the MITRE ATT&CK Enterprise Evaluation: An Analysis of EDR Performance in Real-World Environments, 2024. URL https://dl.acm.org/doi/10.1145/3634737.3645012

  [34] Alexander V. Outkin, Patricia V. Schulz, Timothy Schulz, Thomas D. Tarman, and Ali Pinar. Defender policy evaluation and resource allocation with MITRE ATT&CK evaluation data. IEEE Transactions on Dependable and Secure Computing, 20(3):1909–1926, 2023. URL https://ieeexplore.ieee.org/document/9758675

  [35] Mohammad Aleiadeh and Mustafa Abdallah. CBDRA-IS: Centrality-based defense resource allocation for securing interdependent systems. ACM Transactions on Privacy and Security, 28(3):1–44, 2025. URL https://dl.acm.org/doi/10.1145/3736760

  [36] Jinghan Zhang, Enrico Zio, Chiye Ma, Kang Liu, and Wei Wang. A probabilistic cost-benefit analysis for cyberattack path evaluation. Reliability Engineering & System Safety, 263, 2025. URL https://www.sciencedirect.com/science/article/abs/pii/S0951832025004569

  [37] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023. URL https://arxiv.org/abs/2210.03629

  [38] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, K...

  [39] Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. Inner monologue: Embodied reasoning through planning with language models. arXiv, 2022. URL https://arxiv.org/abs/2207.05608

  [40] Victor Mayoral-Vilches, Maria Sanz-Gomez, Francesco Balassone, Stefan Rass, Lidia Salas-Espejo, Benjamin Jablonski, Luis Javier Navarrete-Lozano, Maite del Mundo de Torres, and Cristobal R. J. Veas Chavez. Cybersecurity AI: A game-theoretic AI for guiding attack and defense. arXiv, 2026. URL https://arxiv.org/abs/2601.05887

  [41] Jonathan Nöther, Adish Singla, and Goran Radanovic. MaMa: A game-theoretic approach for designing safe agentic systems. arXiv, 2026. URL https://arxiv.org/abs/2602.04431

  [42] Horizon3.ai. NodeZero: The ultimate endpoint security platform, 2026. URL https://www.horizon3.ai/nodezero/

  [43] MITRE Corporation. MITRE ATT&CK: Enterprise matrix, 2024. URL https://attack.mitre.org/matrices/enterprise/

  [44] Microsoft Corporation. Microsoft Defender XDR: Security Configuration and Attack Surface Reduction Guidance. https://learn.microsoft.com/en-us/defender-xdr/, 2026. Accessed 2026-04.

  [45] EDR Telemetry Project Contributors. EDR Telemetry Comparison Matrix. https://github.com/tsale/EDR-Telemetry, 2026. Community-maintained telemetry-fidelity reference.

  [46] Microsoft. Configure security settings in Microsoft Defender for Endpoint on Linux. https://learn.microsoft.com/en-us/defender-endpoint/linux-preferences, 2026. Accessed: 2026-04-27.
       A Formal Verification of Closed-Loop Stability. We formally verify the stability guarantees of Theorem 1 using the Lean 4 proof assistant with the Mathlib mathematical librar...

  [47] Node set. Vertices correspond to attack events (one per logged action), plus two virtual nodes: ENTRY (representing the attacker’s initial access point) and OBJECTIVE (representing the compromise goal, typically domain admin or sensitive data exfiltration). Table A3: Distribution statistics for the 282 valid benchmark graphs. Quantity Min Median Mean Max...

  [48] Edge derivation. Edges are derived from three sources: (i) temporal ordering within each host (foothold → post-exploitation → objective), (ii) cross-host credential flow inferred from credential dumps matched to subsequent logons, and (iii) causal parent-child links from the penetration test platform’s attack chain data.

  [49] Edge attributes. Each edge carries a MITRE ATT&CK technique label, an attacker payoff (derived from technique impact score and host criticality), a block probability (policy effectiveness from the enrichment pipeline, capped at 0.95), a detection probability (flat baseline 0.1), and a mapping from policy IDs to effectiveness values.

  [50] Sanitized output. The final artifact is a JSON file per graph consumable by the experiment runner without access to raw pentest data. B.3 Filtering criteria: Of the 300 exported graphs, 18 are excluded as degenerate inputs and 282 are retained for evaluation: • 14 graphs excluded for S < 0.01: the attacker has no viable path to the objective before any poli...
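The appendix text that spilled into entries [47] to [50] describes one sanitized JSON file per graph, edge attributes that include a block probability capped at 0.95 and a flat 0.1 detection baseline, and a filter that drops graphs with S < 0.01. A minimal loader in that spirit is sketched below. The field names (S, edges, block_prob, detect_prob) are hypothetical, and S is read from the file as given because its definition is cut off in the excerpt.

```python
import json
from typing import List

BLOCK_CAP = 0.95      # block probabilities are capped at 0.95
DEFAULT_DETECT = 0.1  # flat baseline detection probability
MIN_S = 0.01          # graphs with S < 0.01 are excluded as degenerate


def normalize_edge(edge: dict) -> dict:
    """Clamp the block probability and fill in the flat detection baseline."""
    out = dict(edge)
    out["block_prob"] = min(BLOCK_CAP, float(edge.get("block_prob", 0.0)))
    out["detect_prob"] = float(edge.get("detect_prob", DEFAULT_DETECT))
    return out


def load_graphs(raw_docs: List[str]) -> List[dict]:
    """Parse per-graph JSON documents, drop degenerate graphs, normalize edges."""
    kept = []
    for doc in raw_docs:
        graph = json.loads(doc)
        if graph.get("S", 0.0) < MIN_S:  # hypothetical field holding the S score
            continue
        graph["edges"] = [normalize_edge(e) for e in graph.get("edges", [])]
        kept.append(graph)
    return kept
```

Filtering before normalization keeps the degenerate-input check independent of the edge-level clamping, so the 282-graph evaluation set would not change if the cap were tuned.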