Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense
Pith reviewed 2026-05-08 17:47 UTC · model grok-4.3
The pith
Tool-mediated LLM architecture for cyber defense provides machine-checked stability guarantees and reduces attacker expected payoff by 59 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff by 59 percent relative to a deterministic greedy baseline with zero variance across 40 runs at four temperatures, while a Claude Haiku 4.5 controller remains catalog-bounded.
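For readers unfamiliar with the ISS property named above, the standard discrete-time definition (in the style of Jiang and Wang, 2001) can be written as follows. This is the textbook form, not the paper's Lean statement, which is not reproduced on this page; the symbols f, x, u, and the comparison functions are generic placeholders rather than the paper's notation.

```latex
% System: x(k+1) = f(x(k), u(k)), with u the (adversarial) disturbance input.
% V is an ISS-Lyapunov function if there exist
% \alpha_1, \alpha_2, \alpha_3 \in \mathcal{K}_\infty and \sigma \in \mathcal{K} such that
\begin{align}
  \alpha_1(\lvert x \rvert) \;\le\; V(x) \;&\le\; \alpha_2(\lvert x \rvert), \\
  V\bigl(f(x,u)\bigr) - V(x) \;&\le\; -\alpha_3(\lvert x \rvert) + \sigma(\lvert u \rvert).
\end{align}
% Existence of such a V implies input-to-state stability:
% for some \beta \in \mathcal{KL} and \gamma \in \mathcal{K},
\begin{equation}
  \lvert x(k) \rvert \;\le\; \beta\bigl(\lvert x(0) \rvert, k\bigr)
  + \gamma\Bigl(\sup_{0 \le j < k} \lvert u(j) \rvert\Bigr)
  \quad \text{for all } k \ge 0.
\end{equation}
```

Read this way, "robustness under intelligent adversarial disturbance" means the state stays bounded by a term that decays with time plus a term that scales with the size of the adversary's input.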
What carries the argument
The tool-mediated architecture that forces LLM agents to select only from finite action catalogs at the tool-output interface, using deterministic tools such as Stackelberg best-response, Bayesian observer updates, and attack-graph primitives, together with the machine-checked composite Lyapunov function that certifies stability properties.
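As an illustration of what a deterministic Stackelberg best-response tool over finite catalogs might look like, here is a minimal sketch. It assumes a zero-sum setting in which the defender commits first and the attacker then picks the payoff-maximizing reply. The action names, the payoff table, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical table: attacker payoff for each (defender_action, attacker_action) pair.
Payoffs = Dict[Tuple[str, str], float]


def attacker_best_response(payoffs: Payoffs, defender_action: str,
                           attacker_catalog: List[str]) -> str:
    """Follower step: the attacker maximizes its payoff given the defender's commitment."""
    return max(attacker_catalog, key=lambda a: payoffs[(defender_action, a)])


def stackelberg_defender(payoffs: Payoffs, defender_catalog: List[str],
                         attacker_catalog: List[str]) -> Tuple[str, float]:
    """Leader step: choose the catalog action that minimizes the attacker's best-response payoff."""
    best_action, best_value = None, float("inf")
    for d in defender_catalog:
        a = attacker_best_response(payoffs, d, attacker_catalog)
        value = payoffs[(d, a)]
        if value < best_value:
            best_action, best_value = d, value
    return best_action, best_value


# Illustrative (made-up) catalogs and payoffs.
DEFENDER_CATALOG = ["no_op", "block_lateral", "harden_creds"]
ATTACKER_CATALOG = ["phish", "dump_creds"]
PAYOFFS = {
    ("no_op", "phish"): 0.8, ("no_op", "dump_creds"): 0.9,
    ("block_lateral", "phish"): 0.6, ("block_lateral", "dump_creds"): 0.5,
    ("harden_creds", "phish"): 0.7, ("harden_creds", "dump_creds"): 0.2,
}

action, game_value = stackelberg_defender(PAYOFFS, DEFENDER_CATALOG, ATTACKER_CATALOG)
print(action, game_value)
```

Because both catalogs are finite, the tool is an exhaustive search and returns the same answer for the same inputs, which is the sense in which such a tool is deterministic regardless of how the LLM that calls it samples.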
If this is right
- The two corollaries allow the stability certificate to apply to every controller and every adversary drawn from the provided catalogs.
- The formal claims remain valid with margin on all 282 tested real enterprise attack graphs.
- The 59 percent reduction in attacker expected payoff holds with zero variance across the reported runs and temperatures.
- Architectural stability persists even when a less capable model such as Claude Haiku 4.5 is substituted, as long as catalog bounds are respected.
- LLM non-determinism can still generate creative strategies while the tool enforcement layer maintains overall controllability and robustness.
Where Pith is reading between the lines
- The same catalog-and-tool pattern could be applied to other adversarial control settings such as autonomous vehicle response or industrial process protection if suitable deterministic primitives are supplied.
- Zero observed variance across temperatures suggests that, for the Claude Sonnet 4 runs, the enforcement layer removes the LLM's sampling stochasticity from the final game value.
- Expanding the action catalog while preserving the Lean-checked Lyapunov properties would be a direct testable extension.
- Deployment in live security operations centers would reveal whether the reported payoff reduction persists when sensor data and graph structures deviate from the 282-graph test set.
Load-bearing premise
That the LLM will always interface correctly with the tools and choose only from the finite action catalog without bypassing the enforcement mechanism, and that the deterministic tools accurately model the real-world cyber environment, including asymmetric sensor data.
What would settle it
Repeating the 40-run paired telemetry experiment with a different LLM controller or on new attack graphs and observing that the attacker's expected payoff reduction falls below 59 percent or exhibits non-zero variance would falsify the empirical performance claim; finding a step in the Lean 4 certificate that cannot be discharged would falsify the formal stability guarantee.
Original abstract
Agentic systems involved in high-stake decision-making under adversarial pressure need formal guarantees not offered by existing approaches. Motivated by the operational needs of security operations centers (SOCs) that must configure endpoint detection and response (EDR) policies under adversarial pressure, we present a tool-mediated architecture: LLM agents use deterministic tools (Stackelberg best-response, Bayesian observer updates, attack-graph primitives) and select from finite action catalogs enforced at the tool-output interface. A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs, the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff (game value) by 59% relative to a deterministic greedy baseline, with zero variance across 40 runs at four temperatures. A Claude Haiku 4.5 controller converges to suboptimal game values but stays catalog-bounded over an additional 40 runs, demonstrating that architectural stability is not dependent on the controller capability. The LLM agent's non-determinism furthers creative exploration of strategies, while the tool-mediated architecture ensures system stability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a tool-mediated architecture for LLM-based agents in autonomous cyber defense, where LLMs select actions from finite catalogs enforced at the deterministic tool interface (including Stackelberg best-response, Bayesian observer updates, and attack-graph primitives). A composite Lyapunov function, machine-checked in Lean 4 with zero 'sorry' statements, certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) under intelligent adversarial disturbances, with two corollaries extending the certificate to arbitrary catalog controllers or adversaries. On 282 real enterprise attack graphs the claims hold with margin; on paired offensive/defensive telemetry a Claude Sonnet 4 controller reduces attacker expected payoff by 59% versus a deterministic greedy baseline with zero variance over 40 runs at four temperatures, while a weaker Haiku model remains catalog-bounded.
Significance. If the formal certificate applies to the full agentic loop and the empirical gains prove robust, the work offers a concrete route to stable high-stakes agentic control by combining LLM exploration with provable robustness. The zero-sorry Lean proof, explicit corollaries for catalog generality, and evaluation on real attack graphs constitute clear strengths that raise the bar for future agentic cyber-defense research.
major comments (2)
- [Architecture and Formal Model sections] The composite Lyapunov function and its ISS certificate are derived under the assumption that every LLM output is a catalog-compliant action executed exactly by the deterministic tools. No lemma, probability bound, or interface invariant is supplied establishing that the LLM will always produce such outputs rather than malformed strings that could bypass enforcement. This assumption is load-bearing for transferring the controllability/ISS guarantees to the actual agentic controller.
- [Empirical Evaluation section] The 59% reduction in game value is reported with zero variance across 40 runs, yet the manuscript provides no per-run game-value distributions, confidence intervals, or statistical comparison (e.g., paired t-test or Wilcoxon) against the greedy baseline. Without these, it is impossible to assess whether the reported margin is statistically reliable or sensitive to modeling choices in the Stackelberg/Bayesian components.
minor comments (2)
- [Abstract and Results] The abstract and results text repeatedly state 'zero variance' without clarifying the precise metric (game value, payoff difference, or trajectory length) or whether the zero is exact or within floating-point tolerance.
- [Formal Model] Notation for the composite Lyapunov function (V = V_c + V_o + V_iss) is introduced without an explicit equation number or cross-reference to the Lean definitions, making it harder to trace the machine-checked statements.
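The questions raised in these comments about per-run reporting and about whether "zero variance" is exact or within floating-point tolerance can be made concrete with a small summary routine. The sketch below is illustrative only: the baseline and per-run game values are hypothetical placeholders chosen to give a 59% reduction, not the paper's data, and the function names are invented for this example.

```python
import statistics
from typing import Dict, List


def relative_reduction(baseline: float, value: float) -> float:
    """Fractional reduction of `value` relative to a deterministic baseline."""
    return (baseline - value) / baseline


def summarize(baseline: float, runs: List[float], tol: float = 0.0) -> Dict[str, object]:
    """Report the reduction plus two distinct notions of 'zero variance'."""
    spread = max(runs) - min(runs)
    return {
        "mean_reduction": relative_reduction(baseline, statistics.fmean(runs)),
        "sample_variance": statistics.variance(runs),
        "range": spread,
        "exactly_constant": spread == 0.0,     # bit-identical game values
        "constant_within_tol": spread <= tol,  # equal up to a stated tolerance
    }


# Hypothetical values, NOT taken from the paper.
baseline = 1.00
identical_runs = [0.41] * 40
summary = summarize(baseline, identical_runs, tol=1e-9)

# Two runs differing by floating-point-scale noise: constant within tolerance, not exactly.
noisy_summary = summarize(baseline, [0.41, 0.41 + 1e-12], tol=1e-9)

print(summary)
print(noisy_summary)
```

When every run is bit-identical and the baseline is deterministic, the paired differences have zero sample standard deviation, so a paired t-statistic is undefined. Reporting the range and the tolerance used is then more informative than a test statistic.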
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the formal and empirical contributions and for the constructive major comments. We address each point below, indicating the changes we will make to the manuscript.
Point-by-point responses
- Referee: The composite Lyapunov function and its ISS certificate are derived under the assumption that every LLM output is a catalog-compliant action executed exactly by the deterministic tools. No lemma, probability bound, or interface invariant is supplied establishing that the LLM will always produce such outputs rather than malformed strings that could bypass enforcement. This assumption is load-bearing for transferring the controllability/ISS guarantees to the actual agentic controller.
  Authors: We agree that the transfer of guarantees relies on the interface enforcement. The architecture includes a deterministic parser at the tool interface that validates each LLM-generated string against the finite catalog; malformed or out-of-catalog outputs are rejected and replaced by a default catalog action (the no-op or a pre-specified safe action). This replacement is itself a catalog member, preserving the assumptions of the Lyapunov certificate. We will revise the Architecture section to explicitly describe this validation and replacement mechanism and add a remark that the ISS certificate holds under this interface invariant. A probabilistic model of LLM compliance rates is outside the paper's scope, as the focus is on the stability provided by the tool mediation regardless of the specific LLM. Revision: yes.
- Referee: The 59% reduction in game value is reported with zero variance across 40 runs, yet the manuscript provides no per-run game-value distributions, confidence intervals, or statistical comparison (e.g., paired t-test or Wilcoxon) against the greedy baseline. Without these, it is impossible to assess whether the reported margin is statistically reliable or sensitive to modeling choices in the Stackelberg/Bayesian components.
  Authors: The reported zero variance means that the game value was identical in all 40 runs (across four temperatures), which already indicates strong robustness to the LLM's sampling variability. We will add a supplementary table listing the per-run values (all equal to the reported figure) and include a note that, with zero sample variance, the 59% reduction is a point estimate with zero-width confidence interval. Since the baseline is deterministic, a paired comparison is not applicable in the usual sense, but the consistent superiority across runs supports reliability. We will also add a brief discussion confirming that the Stackelberg best-response and Bayesian updates are fixed components and that the result is insensitive to their specific parameterizations within the evaluated range. Revision: yes.
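The validation-and-replacement mechanism described in the first response can be sketched in a few lines. This is a hypothetical illustration of the stated behavior (reject anything outside the catalog, substitute a default that is itself a catalog member), not the authors' implementation. The action names and the choice to strip surrounding whitespace before matching are assumptions.

```python
from typing import FrozenSet


def enforce_catalog(raw_output: str, catalog: FrozenSet[str], safe_default: str) -> str:
    """Return a catalog action for any LLM output string.

    Outputs that are not catalog members (malformed, empty, or out-of-catalog)
    are replaced by `safe_default`, which must itself belong to the catalog.
    """
    if safe_default not in catalog:
        raise ValueError("safe_default must be a member of the catalog")
    candidate = raw_output.strip()  # assumption: tolerate surrounding whitespace
    return candidate if candidate in catalog else safe_default


# Hypothetical action catalog, for illustration only.
CATALOG = frozenset({"no_op", "isolate_host", "enable_asr_rule"})

print(enforce_catalog(" isolate_host\n", CATALOG, "no_op"))       # in-catalog action passes through
print(enforce_catalog("rm -rf / && exfiltrate", CATALOG, "no_op"))  # out-of-catalog string is replaced
print(enforce_catalog("", CATALOG, "no_op"))                      # empty output is replaced
```

The property the certificate needs is visible in the return statement: every possible input string maps to a catalog member, so downstream tools never see an action outside the set the proof quantifies over. Whether a no-op is actually safe in a given state is a separate modeling question that this check does not answer.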
Circularity Check
No significant circularity; formal certificate is independent of LLM behavior.
The paper's central derivation is a composite Lyapunov function that is machine-checked in Lean 4 with zero sorry, establishing controllability, observability from asymmetric sensors, and ISS for the deterministic tool-mediated closed-loop system. Two corollaries extend the certificate to any catalog controller or adversary without additional assumptions beyond catalog membership. This formal content is self-contained and does not reduce to a fit, self-definition, or self-citation chain; the experimental results on independent attack graphs and telemetry are reported separately and do not feed back into the certificate. The unformalized assumption of LLM catalog compliance is an architectural claim rather than a load-bearing step in the mathematical derivation itself.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Deterministic tools and finite action catalogs can be enforced at the LLM output interface.
- ad hoc to paper: The composite Lyapunov function correctly captures controllability, observability, and ISS under adversarial conditions.
Lean theorems connected to this paper
- IndisputableMonolith.Cost (Jcost = ½(x + x⁻¹) − 1) · washburn_uniqueness_aczel · tag: unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We define a composite Lyapunov function V(k) as the sum of the game value S(k) and a weighted aggregate of edge uncertainties Pe(k)... V(k) = S(k) + λθ(k), λ > 0"
- IndisputableMonolith.Foundation (parameter-free forcing chain) · reality_from_one_distinction · tag: unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Shared hyperparameters. B = 3, R = 0.05, ε_innov = 0.05, ε_V = 10^-4, max 10 rounds, λ = 1.0, seed 42."
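The two quoted passages can be combined into a small numerical sketch of how V(k) = S(k) + λθ(k) might be tracked across rounds. Several things here are assumptions rather than statements from the paper: θ(k) is taken to be the scalar aggregate of edge uncertainties, ε_V is interpreted as a stopping tolerance on successive values of V, and the trajectory values are invented for illustration.

```python
from typing import Iterable, List, Tuple


def composite_lyapunov(game_value: float, uncertainty: float, lam: float = 1.0) -> float:
    """V(k) = S(k) + lambda * theta(k), with lambda > 0."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return game_value + lam * uncertainty


def track_v(trajectory: Iterable[Tuple[float, float]], lam: float = 1.0,
            eps_v: float = 1e-4, max_rounds: int = 10) -> List[float]:
    """Evaluate V over (S(k), theta(k)) pairs.

    Stops after `max_rounds` rounds, or once successive V values differ by
    less than `eps_v` (an assumed reading of the epsilon_V hyperparameter).
    """
    values: List[float] = []
    for k, (s, theta) in enumerate(trajectory):
        if k >= max_rounds:
            break
        values.append(composite_lyapunov(s, theta, lam))
        if len(values) >= 2 and abs(values[-1] - values[-2]) < eps_v:
            break
    return values


def is_nonincreasing(values: List[float], tol: float = 1e-12) -> bool:
    """Check the descent property V(k+1) <= V(k) up to a small tolerance."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


# Made-up trajectory of (game value, aggregate uncertainty) pairs.
trajectory = [(0.9, 0.5), (0.6, 0.3), (0.5, 0.2), (0.5, 0.2), (0.5, 0.2)]
values = track_v(trajectory)
print(values, is_nonincreasing(values))
```

A numerical check like this only inspects one trajectory. The Lean certificate described on this page is a proof over all catalog controllers and adversaries, which a simulation cannot replace.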
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix excerpts
A Formal Verification of Closed-Loop Stability
We formally verify the stability guarantees of Theorem 1 using the Lean 4 proof assistant with the Mathlib mathematical librar...
- Node set. Vertices correspond to attack events (one per logged action), plus two virtual nodes: ENTRY (representing the attacker's initial access point) and OBJECTIVE (representing the compromise goal, typically domain admin or sensitive data exfiltration).
- Edge derivation. Edges are derived from three sources: (i) temporal ordering within each host (foothold → post-exploitation → objective), (ii) cross-host credential flow inferred from credential dumps matched to subsequent logons, and (iii) causal parent-child links from the penetration test platform's attack chain data.
- Edge attributes. Each edge carries a MITRE ATT&CK technique label, an attacker payoff (derived from technique impact score and host criticality), a block probability (policy effectiveness from the enrichment pipeline, capped at 0.95), a detection probability (flat baseline 0.1), and a mapping from policy IDs to effectiveness values.
- Sanitized output. The final artifact is a JSON file per graph consumable by the experiment runner without access to raw pentest data.
Table A3: Distribution statistics for the 282 valid benchmark graphs. Quantity Min Median Mean Max...
B.3 Filtering criteria
Of the 300 exported graphs, 18 are excluded as degenerate inputs and 282 are retained for evaluation:
- 14 graphs excluded for S < 0.01: the attacker has no viable path to the objective before any poli...