A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense
Pith reviewed 2026-05-10 16:42 UTC · model grok-4.3
The pith
Modeling the attack surface as a queue whose backlog is the set of active vulnerabilities, the paper reports that an RL-based adaptive defense cuts the average number of active vulnerabilities by over 90 percent without raising the maintenance budget.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the temporal evolution of cyber attack surfaces can be captured by a queue whose backlog represents active vulnerabilities, with arrival, exploit, and patching rates scaled by an AI amplification factor. Analysis shows that even symmetric automation can increase the rate of successful exploits. The data reveal heavy-tailed patching times, which induce long-range dependence in the backlog and help explain persistent risk. Formulating defense as a constrained MDP and applying RL yields policies that reduce the average number of active vulnerabilities by over 90% in trace-driven experiments on the ARVO software supply chain dataset, without increasing the overall maintenance budget.
What carries the argument
The queueing model of the attack surface as a vulnerability backlog whose arrival, exploit, and patching rates are scaled by an AI amplification factor.
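As a minimal sketch of this queue, the Python below simulates the backlog under an assumption the paper does not confirm: Poisson arrivals and independent exponential patch and exploit clocks on every active vulnerability. The function name, parameters, and example rates are hypothetical; `alpha` stands in for the AI amplification factor and multiplies all three rates.

```python
import random


def simulate_backlog(arrival_rate, patch_rate, exploit_rate, alpha=1.0,
                     horizon=2000.0, seed=0):
    """Event-driven simulation of the vulnerability backlog.

    Assumption (not taken from the paper): arrivals are Poisson and each
    active vulnerability has independent exponential patch and exploit clocks.
    Returns (time-averaged backlog, successful exploits per unit time).
    """
    rng = random.Random(seed)
    lam = alpha * arrival_rate    # discovery/creation rate
    mu = alpha * patch_rate       # per-vulnerability patch rate
    theta = alpha * exploit_rate  # per-vulnerability exploit rate

    t, backlog, area, exploits = 0.0, 0, 0.0, 0
    while t < horizon:
        total = lam + backlog * (mu + theta)  # total event rate
        dt = rng.expovariate(total)
        if t + dt > horizon:
            area += backlog * (horizon - t)
            break
        area += backlog * dt
        t += dt
        u = rng.random() * total
        if u < lam:
            backlog += 1          # new vulnerability arrives
        elif u < lam + backlog * mu:
            backlog -= 1          # vulnerability patched
        else:
            backlog -= 1          # vulnerability exploited
            exploits += 1
    return area / horizon, exploits / horizon


# Illustrative rates only; they are not fitted to any dataset.
for alpha in (1.0, 2.0):
    mean_backlog, exploit_rate = simulate_backlog(5.0, 0.4, 0.1, alpha=alpha)
    print(f"alpha={alpha}: backlog={mean_backlog:.2f}, exploits/time={exploit_rate:.2f}")
```

Under these assumptions the mean backlog is λ/(μ+θ), which does not depend on `alpha`, while the exploit throughput λθ/(μ+θ) grows linearly with it. That is one simple way to see how symmetric scaling can leave the backlog unchanged yet raise the rate of successful exploits.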
If this is right
- Even symmetric AI automation can increase the rate of successful exploits.
- Heavy-tailed patching times induce long-range dependence in the vulnerability backlog.
- The RL algorithm for the constrained MDP achieves provably near-optimal regret.
- Adaptive RL policies significantly reduce successful exploits and mitigate heavy-tail queue events.
- Defenders can quantify cumulative exposure risk under long-range dependent dynamics.
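The link between heavy-tailed patching and long-range dependence in the list above can be illustrated with a standard worked equation. This is a sketch under assumptions that may differ from the paper's proof: Poisson arrivals at rate λ, i.i.d. patching times S with a Pareto tail of index a, and vulnerabilities that do not compete for service (an infinite-server queue). The symbols N(t), x_m, and a are introduced here for illustration.

```latex
% Assumptions (hypothetical): Poisson arrivals at rate \lambda, i.i.d. patching
% times S, and no interaction between vulnerabilities (infinite-server queue).
% Stationary autocovariance of the backlog N(t):
\[
  \operatorname{Cov}\bigl(N(s),\, N(s+t)\bigr)
    = \lambda \int_{t}^{\infty} \Pr(S > u)\, du .
\]
% Pareto tail: \Pr(S > u) = (x_m / u)^{a} for u \ge x_m, with 1 < a < 2.
\[
  \operatorname{Cov}\bigl(N(s),\, N(s+t)\bigr)
    = \frac{\lambda\, x_m^{a}}{a - 1}\; t^{\,1-a}, \qquad t \ge x_m ,
\]
% The covariance is not integrable, which is the usual definition of
% long-range dependence:
\[
  \int_{0}^{\infty} \operatorname{Cov}\bigl(N(s),\, N(s+t)\bigr)\, dt = \infty
  \quad \text{because } 0 < a - 1 < 1 .
\]
```

In this setting the decay exponent 1 − a corresponds to a Hurst parameter H = (3 − a)/2, which exceeds 1/2 whenever a < 2, so heavier tails give more persistent backlog correlations.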
Where Pith is reading between the lines
- Similar queueing models could be used to analyze other stochastic security processes, such as the accumulation of technical debt or unpatched endpoints.
- Confirming heavy-tailed patching in more environments would imply that defense policies should target tail events rather than average times.
- The constrained optimization approach may generalize to other budget-limited adaptive security decisions beyond vulnerability patching.
Load-bearing premise
The single AI amplification factor together with heavy-tailed patching times in the queueing model accurately captures how real-world attack surfaces evolve over time, and the ARVO dataset is representative of broader cases so that the 90 percent reduction generalizes.
What would settle it
If trace-driven experiments on vulnerability data from other software supply chains show that the RL-based policy reduces active vulnerabilities by less than 50 percent, the claimed performance improvement would not hold.
Original abstract
We develop a queueing-theoretic framework to model the temporal evolution of cyber-attack surfaces, where the number of active vulnerabilities is represented as the backlog of a queue. Vulnerabilities arrive as they are discovered or created, and leave the system when they are patched or successfully exploited. Building on this model, we study how automation affects attack and defense dynamics by introducing an AI amplification factor that scales arrival, exploit, and patching rates. Our analysis shows that even symmetric automation can increase the rate of successful exploits. We validate the model using vulnerability data collected from an open source software supply chain and show that it closely matches real-world attack surface dynamics. Empirical results reveal heavy-tailed patching times, which we prove induce long-range dependence in vulnerability backlog and help explain persistent cyber risk. Utilizing our queueing abstraction for the attack surface, we develop a systematic approach for cyber risk mitigation. We formulate the dynamic defense problem as a constrained Markov decision process with resource-budget and switching-cost constraints, and develop a reinforcement learning (RL) algorithm that achieves provably near-optimal regret. Numerical experiments validate the approach and demonstrate that our adaptive RL-based defense policies significantly reduce successful exploits and mitigate heavy-tail queue events. Using trace-driven experiments on the ARVO dataset, we show that the proposed RL-based defense policy reduces the average number of active vulnerabilities in a software supply chain by over 90% compared to existing defense practices, without increasing the overall maintenance budget. Our results allow defenders to quantify cumulative exposure risk under long-range dependent attack dynamics and to design adaptive defense strategies with provable efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a queueing-theoretic model of cyber attack surfaces in which the number of active vulnerabilities is represented as the backlog of an M/G/1-like queue with arrivals from discoveries/creations and departures via patching or successful exploits. An AI amplification factor is introduced that symmetrically scales arrival, exploit, and patching rates; analysis shows this can increase the rate of successful exploits. The model is validated on vulnerability data from an open-source software supply chain, where heavy-tailed patching times are observed and proved (via renewal theory) to induce long-range dependence in the backlog process. The dynamic defense problem is formulated as a constrained MDP with maintenance-budget and switching-cost constraints; a reinforcement-learning algorithm is derived that achieves provably near-optimal regret. Trace-driven experiments on the ARVO dataset are reported to show that the learned RL policy reduces average active vulnerabilities by over 90% relative to existing practices while respecting the budget.
Significance. If the empirical claims hold, the work supplies a mathematically grounded abstraction for quantifying cumulative exposure under long-range-dependent vulnerability dynamics and for designing adaptive defenses with regret guarantees. The explicit link between heavy-tailed service times and persistent backlog, together with the constrained-MDP formulation, offers a reusable template for supply-chain risk analysis. The reported 90% reduction under fixed budget would be a practically significant result if reproducible; the absence of released data, code, or fitting details currently prevents independent verification of that magnitude.
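To make the constrained-MDP formulation concrete, here is a minimal sketch of Lagrangian relaxation, a common way to fold a budget constraint into the objective. It is not the paper's RL algorithm: the transition kernel is assumed known, switching costs are omitted, and the discounted criterion, the two-level patch effort, and all probabilities are hypothetical choices made for illustration.

```python
def solve_lagrangian_mdp(lam, max_backlog=10, p_arrival=0.3,
                         p_patch=(0.2, 0.5), gamma=0.95, tol=1e-8):
    """Value iteration on a toy backlog MDP with a priced budget constraint.

    State: backlog size 0..max_backlog. Action a: 0 = baseline effort,
    1 = surge effort. Per-step cost: backlog + lam * a, where `lam` is the
    Lagrange multiplier on the maintenance budget (all values hypothetical).
    Returns (greedy policy, value function).
    """
    n_states = max_backlog + 1
    n_actions = len(p_patch)

    def q_value(s, a, v):
        up = p_arrival if s < max_backlog else 0.0  # one new vulnerability
        down = p_patch[a] if s > 0 else 0.0         # one vulnerability patched
        stay = 1.0 - up - down
        expected_next = stay * v[s]
        if up:
            expected_next += up * v[s + 1]
        if down:
            expected_next += down * v[s - 1]
        return s + lam * a + gamma * expected_next

    values = [0.0] * n_states
    while True:
        new_values = [min(q_value(s, a, values) for a in range(n_actions))
                      for s in range(n_states)]
        gap = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if gap < tol:
            break

    # Greedy policy; ties resolve to the cheaper (lower-index) action.
    policy = [min(range(n_actions), key=lambda a: q_value(s, a, values))
              for s in range(n_states)]
    return policy, values


# Sweep the multiplier to see how a tighter budget changes the policy.
for lam in (0.0, 2.0, 1000.0):
    policy, _ = solve_lagrangian_mdp(lam)
    print(f"lam={lam}: policy by backlog = {policy}")
```

Raising `lam` makes surge effort more expensive, so the policy uses it in fewer states. A full treatment would tune the multiplier until the policy's long-run effort meets the budget, and would learn the kernel from data rather than assume it.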
major comments (3)
- [Abstract / trace-driven experiments] Abstract and trace-driven experiments section: the central claim of a >90% reduction in average active vulnerabilities is presented without error bars, confidence intervals, statistical significance tests, or explicit definitions of the baseline defense policies being compared. This omission is load-bearing for the empirical contribution.
- [Model validation] Model validation section: arrival, service, and exploit rates are stated to be fitted to the ARVO dataset, yet no description is given of the fitting procedure, goodness-of-fit metrics, or sensitivity of the 90% figure to those fitted parameters. The AI amplification factor is likewise introduced as a free parameter without reported calibration or robustness checks.
- [RL formulation and regret analysis] RL algorithm and regret analysis: while near-optimal regret is claimed for the constrained MDP, it is unclear whether the analysis accounts for the long-range dependence induced by heavy-tailed patching times (proved earlier in the paper). Standard regret bounds for MDPs can degrade under LRD; a concrete statement of the assumptions under which the bound continues to hold is needed.
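The first major comment asks for confidence intervals on the reported reduction. One way to produce them, sketched below, is a percentile bootstrap over independent experiment runs. The function name and the per-run numbers are hypothetical and are not results from the paper.

```python
import random
import statistics


def bootstrap_reduction_ci(baseline, treated, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the percent reduction in mean backlog.

    `baseline` and `treated` hold one average-backlog value per independent
    run. Returns (point estimate, lower bound, upper bound), all in percent.
    """
    rng = random.Random(seed)

    def reduction(b, t):
        return 100.0 * (1.0 - statistics.fmean(t) / statistics.fmean(b))

    point = reduction(baseline, treated)
    draws = []
    for _ in range(n_boot):
        b = rng.choices(baseline, k=len(baseline))  # resample runs with replacement
        t = rng.choices(treated, k=len(treated))
        draws.append(reduction(b, t))
    draws.sort()
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return point, draws[lo_idx], draws[hi_idx]


# Synthetic per-run averages, invented for illustration only.
baseline_runs = [100.0, 110.0, 90.0, 105.0, 95.0]
rl_runs = [8.0, 10.0, 9.0, 11.0, 7.0]
point, lo, hi = bootstrap_reduction_ci(baseline_runs, rl_runs)
print(f"reduction = {point:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
```

Resampling whole runs rather than individual time steps matters here: if the backlog is long-range dependent, observations within a single trace are strongly correlated and an i.i.d. bootstrap over time steps would understate the interval width.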
minor comments (2)
- [Model definition] Notation for the AI amplification factor and the three scaled rates should be introduced once with a single symbol and then used consistently; the current abstract-level description risks reader confusion about which rates are affected.
- [Data description] The ARVO dataset is referenced repeatedly but never characterized (size, time span, number of projects, vulnerability types). A brief table or paragraph summarizing its statistics would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below with clarifications and commit to revisions that strengthen the empirical and theoretical contributions without altering the core claims.
Point-by-point responses
- Referee: [Abstract / trace-driven experiments] Abstract and trace-driven experiments section: the central claim of a >90% reduction in average active vulnerabilities is presented without error bars, confidence intervals, statistical significance tests, or explicit definitions of the baseline defense policies being compared. This omission is load-bearing for the empirical contribution.
  Authors: We agree that the presentation of the 90% reduction would be strengthened by additional statistical detail. In the revised version we will add error bars and confidence intervals to all reported figures in the trace-driven experiments section, include p-values or other significance tests against the baselines, and explicitly define the baseline policies (static priority patching, uniform random allocation under the same budget, and myopic greedy patching). These additions will also be summarized in the abstract.
  Revision: yes
- Referee: [Model validation] Model validation section: arrival, service, and exploit rates are stated to be fitted to the ARVO dataset, yet no description is given of the fitting procedure, goodness-of-fit metrics, or sensitivity of the 90% figure to those fitted parameters. The AI amplification factor is likewise introduced as a free parameter without reported calibration or robustness checks.
  Authors: We acknowledge the omission of fitting details. The revision will include a dedicated subsection describing the estimation procedure (maximum-likelihood fitting of the heavy-tailed inter-arrival and patching distributions), report Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit statistics, and present a sensitivity table showing how the 90% reduction varies under ±20% perturbations of the fitted rates. For the AI amplification factor we will add calibration against observed automation adoption rates in the ARVO traces together with robustness plots across a range of amplification values.
  Revision: yes
- Referee: [RL formulation and regret analysis] RL algorithm and regret analysis: while near-optimal regret is claimed for the constrained MDP, it is unclear whether the analysis accounts for the long-range dependence induced by heavy-tailed patching times (proved earlier in the paper). Standard regret bounds for MDPs can degrade under LRD; a concrete statement of the assumptions under which the bound continues to hold is needed.
  Authors: The regret bound is derived for the constrained MDP whose transition kernel is obtained from the queueing model; the heavy-tailed patching times and resulting LRD are therefore embedded in the state dynamics. Nevertheless, standard MDP regret analyses assume sufficient mixing. We will revise the section to state explicitly that the bound holds under the assumption that the adaptive policy induces geometric ergodicity on the effective state space (or provide a mixing-time correction term). If the original proof requires additional technical conditions, we will add them and, if necessary, weaken the claim to “near-optimal regret under the stated mixing assumption.”
  Revision: partial
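The fitting procedure promised in the second response (maximum-likelihood estimation plus a Kolmogorov-Smirnov check) can be sketched as follows for a Pareto tail. The choice of a pure Pareto model, the known threshold `x_min`, and both function names are assumptions for illustration; the paper's actual distribution family is not stated here.

```python
import math
import random


def fit_pareto_tail(samples, x_min):
    """Maximum-likelihood tail index for a Pareto law with known x_min.

    Uses the closed form alpha_hat = n / sum(log(x_i / x_min)) over the
    samples at or above x_min. Returns (alpha_hat, samples used).
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one sample strictly above x_min")
    return len(tail) / log_sum, tail


def ks_statistic_pareto(tail, x_min, alpha):
    """Kolmogorov-Smirnov distance between the data and a Pareto(x_min, alpha) CDF."""
    xs = sorted(tail)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - (x_min / x) ** alpha
        # Compare the model CDF with the empirical CDF just after and just before x.
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance


# Synthetic patching times drawn from a Pareto law (x_min = 1), not real data.
rng = random.Random(0)
patch_times = [rng.paretovariate(1.5) for _ in range(5000)]
alpha_hat, tail = fit_pareto_tail(patch_times, x_min=1.0)
print(f"alpha_hat={alpha_hat:.3f}, KS={ks_statistic_pareto(tail, 1.0, alpha_hat):.4f}")
```

A tail index between 1 and 2 would indicate a finite mean but infinite variance, the regime usually associated with long-range dependence. Because the index is estimated from the same data, standard KS critical values do not apply directly; a parametric bootstrap is one way to calibrate the test.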
Circularity Check
No significant circularity in derivation chain
The paper's core chain proceeds from a standard queueing abstraction (backlog as M/G/1-like process) to an AI-scaled rate model whose exploit-increase result follows directly from the scaling assumption, then to a constrained MDP whose RL solution carries standard regret bounds. The heavy-tailed patching times and resulting LRD are derived via renewal theory (external to the paper) and validated against the independent ARVO dataset. The 90% reduction claim is obtained from trace-driven simulation of the learned policy on real supply-chain traces under fixed budget, not from re-fitting or re-using the same modeled trajectories as both training and test. No equation reduces to its own input by construction, no fitted parameter is relabeled as a prediction, and no load-bearing step relies on self-citation of an unverified uniqueness result. The framework therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- AI amplification factor
- Arrival, service, and exploit rates
axioms (2)
- Domain assumption: vulnerabilities behave as customers in a single-server or multi-server queue with independent arrivals and departures.
- Domain assumption: patching times follow a heavy-tailed distribution that induces long-range dependence in the backlog process.