
arXiv: 2604.10427 · v2 · submitted 2026-04-12 · 💻 cs.CR · cs.AI · cs.LG · cs.SY · eess.SY · math.OC

A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense

Pith reviewed 2026-05-10 16:42 UTC · model grok-4.3

classification 💻 cs.CR, cs.AI, cs.LG, cs.SY, eess.SY, math.OC
keywords queueing theory, attack surface, reinforcement learning, cyber risk, vulnerability patching, long-range dependence, software supply chain, adaptive defense

The pith

A queueing model of attack surfaces as vulnerability backlogs shows RL adaptive defense reduces active vulnerabilities by over 90 percent without raising maintenance costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a queueing-theoretic framework to model cyber-attack surfaces as the backlog of a queue where vulnerabilities arrive upon discovery and leave when patched or exploited. It incorporates an AI amplification factor to study automation effects and proves that even symmetric automation raises the rate of successful exploits. Validation on real vulnerability data shows heavy-tailed patching times induce long-range dependence in the backlog. The model is used to cast dynamic defense as a constrained Markov decision process, solved by an RL algorithm with near-optimal regret. Experiments on the ARVO dataset demonstrate that the RL policies reduce active vulnerabilities by over 90 percent without increasing the maintenance budget, allowing better quantification of exposure risk.

Core claim

The central claim is that the temporal evolution of cyber attack surfaces can be captured by a queue whose backlog represents active vulnerabilities, with dynamics scaled by an AI amplification factor. Analysis shows symmetric automation increases successful exploits. Data reveals heavy-tailed patching times that induce long-range dependence, explaining persistent risk. Formulating defense as a constrained MDP and applying RL yields policies that reduce the average number of active vulnerabilities by over 90% in trace-driven experiments on a software supply chain dataset, without increasing the overall maintenance budget.
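
To make the constrained-MDP formulation concrete, the following is a minimal sketch and not the paper's algorithm. It swaps the paper's RL method for plain value iteration on a Lagrangian relaxation of a toy patching problem, in which the budget constraint is priced into the per-step cost through a multiplier `price`. The state cap `n_max`, the per-step patch limit `a_max`, the Bernoulli arrival probability `p_arrival`, and the discount `gamma` are all hypothetical values chosen for illustration; switching costs and heavy-tailed patching times are left out.

```python
"""Toy Lagrangian relaxation of a budget-constrained patching MDP (illustrative only)."""


def solve_lagrangian(price, n_max=10, a_max=3, p_arrival=0.6, gamma=0.95, sweeps=500):
    """Value iteration on the per-step cost  backlog + price * patches.

    State s  = number of active vulnerabilities, capped at n_max (assumption).
    Action a = patches applied this step, with 0 <= a <= min(a_max, s).
    At most one new vulnerability arrives per step, with probability p_arrival.
    Returns (policy, value), both lists indexed by state.
    """
    value = [0.0] * (n_max + 1)
    policy = [0] * (n_max + 1)

    def q(s, a, v):
        # Backlog left after patching, then a possible arrival (capped at n_max).
        left = s - a
        after_arrival = min(n_max, left + 1)
        expected = p_arrival * v[after_arrival] + (1.0 - p_arrival) * v[left]
        return s + price * a + gamma * expected

    for _ in range(sweeps):
        new_value = []
        for s in range(n_max + 1):
            actions = range(min(a_max, s) + 1)
            best = min(actions, key=lambda a: q(s, a, value))
            policy[s] = best
            new_value.append(q(s, best, value))
        value = new_value
    return policy, value


if __name__ == "__main__":
    # A higher price on patching effort stands in for a tighter budget.
    for price in (0.0, 2.0, 1000.0):
        policy, _ = solve_lagrangian(price)
        print(f"price={price:7.1f}  patches per backlog level: {policy}")
```

Sweeping `price` traces out a trade-off between backlog and patching effort; in principle a budget would be enforced by searching for the smallest price whose policy stays within it.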

What carries the argument

The queueing model of the attack surface as a vulnerability backlog whose arrival, exploit, and patching rates are scaled by an AI amplification factor.
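
One way to see why symmetric scaling can raise successful exploits is a simple race model, sketched below under assumptions that are not taken from the paper: Poisson arrivals at rate `lam`, and each active vulnerability removed by whichever comes first of an exponential patch clock (rate `mu`) and an exponential exploit clock (rate `eps`). Multiplying all three rates by an amplification factor `amp` leaves the fraction of vulnerabilities that end in an exploit, and the mean backlog, unchanged, but the number of exploits per unit time grows in proportion to `amp`. The paper's own result may rest on a different mechanism.

```python
import random


def exploit_rate_closed_form(lam, mu, eps, amp=1.0):
    """Long-run successful exploits per unit time when every rate is scaled by amp."""
    # Arrival rate times the probability that the exploit clock beats the patch clock.
    return (amp * lam) * (amp * eps) / (amp * mu + amp * eps)


def mean_backlog_closed_form(lam, mu, eps, amp=1.0):
    """Mean number of active vulnerabilities (Little's law for the race model)."""
    return (amp * lam) / (amp * mu + amp * eps)


def simulate_exploit_rate(lam, mu, eps, amp=1.0, horizon=2000.0, seed=0):
    """Monte Carlo estimate of exploits per unit time for arrivals in [0, horizon]."""
    rng = random.Random(seed)
    t, exploits = 0.0, 0
    while True:
        t += rng.expovariate(amp * lam)  # next discovery time
        if t > horizon:
            break
        # The vulnerability is exploited if its exploit clock fires first.
        if rng.expovariate(amp * eps) < rng.expovariate(amp * mu):
            exploits += 1
    return exploits / horizon


if __name__ == "__main__":
    lam, mu, eps = 5.0, 1.0, 0.25  # hypothetical rates
    for amp in (1.0, 2.0, 4.0):
        print(
            f"amp={amp}: exploit rate (closed form) = {exploit_rate_closed_form(lam, mu, eps, amp):.3f}, "
            f"simulated = {simulate_exploit_rate(lam, mu, eps, amp):.3f}, "
            f"mean backlog = {mean_backlog_closed_form(lam, mu, eps, amp):.3f}"
        )
```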

If this is right

  • Symmetric AI automation increases the rate of successful exploits.
  • Heavy-tailed patching times induce long-range dependence in the vulnerability backlog.
  • The RL algorithm for the constrained MDP achieves provably near-optimal regret.
  • Adaptive RL policies significantly reduce successful exploits and mitigate heavy-tail queue events.
  • Defenders can quantify cumulative exposure risk under long-range dependent dynamics.
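
The link between heavy-tailed patching times and long-range dependence can be illustrated with a textbook special case, offered here as a sketch rather than the paper's proof. Assume an infinite-server (M/G/∞) backlog with Poisson arrivals at rate `lam`, so that the stationary autocovariance at lag τ equals `lam` times the integral of the patching-time survival function from τ to infinity. With exponential patching the autocovariance decays geometrically and is summable; with Pareto patching times of shape `alpha` between 1 and 2 it decays like τ^(1 − alpha) and its partial sums keep growing, which is the usual definition of long-range dependence. The Pareto family and all parameter values are assumptions.

```python
import math


def acov_exponential(lam, mu, lag):
    """Backlog autocovariance at `lag` for exponential patching times with rate mu."""
    return lam * math.exp(-mu * lag) / mu


def acov_pareto(lam, x_min, alpha, lag):
    """Backlog autocovariance at `lag` for Pareto(x_min, alpha) patching times, alpha > 1."""
    tail = x_min / (alpha - 1.0)  # integral of the survival function over [x_min, inf)
    if lag < x_min:
        # Survival equals 1 on [lag, x_min), then follows the power-law tail.
        return lam * ((x_min - lag) + tail)
    return lam * x_min**alpha * lag ** (1.0 - alpha) / (alpha - 1.0)


def partial_sum(acov, max_lag):
    """Sum of autocovariances over integer lags 1..max_lag."""
    return sum(acov(k) for k in range(1, max_lag + 1))


if __name__ == "__main__":
    lam, x_min, alpha = 1.0, 1.0, 1.5  # hypothetical parameters
    mu = (alpha - 1.0) / (alpha * x_min)  # match the Pareto mean patching time
    for n in (100, 1_000, 10_000):
        s_exp = partial_sum(lambda k: acov_exponential(lam, mu, k), n)
        s_par = partial_sum(lambda k: acov_pareto(lam, x_min, alpha, k), n)
        print(f"lags 1..{n}: exponential sum = {s_exp:.4f}, Pareto sum = {s_par:.2f}")
```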

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar queueing models could be used to analyze other stochastic security processes, such as the accumulation of technical debt or unpatched endpoints.
  • Confirming heavy-tailed patching in more environments would imply that defense policies should target tail events rather than average times.
  • The constrained optimization approach may generalize to other budget-limited adaptive security decisions beyond vulnerability patching.

Load-bearing premise

The single AI amplification factor together with heavy-tailed patching times in the queueing model accurately captures how real-world attack surfaces evolve over time, and the ARVO dataset is representative of broader cases so that the 90 percent reduction generalizes.

What would settle it

If trace-driven experiments on vulnerability data from other software supply chains show that the RL-based policy reduces active vulnerabilities by less than 50 percent, the claimed performance improvement would not hold.

Figures

Figures reproduced from arXiv: 2604.10427 by Abdullah Yasin Etcibasi, C. Emre Koksal, Jihyeon Yun, Ming Shi.

Figure 1: Attack surface modeled as a queueing system. Vulnerabilities arrive via …
Figure 2: Steady-state distribution of the normalized attack surface size under …
Figure 3: Steady-state distribution of the normalized attack surface size under …
Figure 4: Temporal evolution of the attack surface size, …
Figure 6: IA and ST distributions for Component 1 (weeks 0–64). Loglogistic …
Figure 7: Final integrated model: empirical QLD compared with segmented, …
Figure 8: Successful exploit rate vs. per-step defense budget (patches per unit …
Figure 9: Successful exploit rate vs. per-step defense budget (patches per unit …
Figure 10: Trace-driven comparison of queue-length probability densities on …
Figure 11: Queue-length histogram under the empirical aggregate baseline …
Original abstract

We develop a queueing-theoretic framework to model the temporal evolution of cyber-attack surfaces, where the number of active vulnerabilities is represented as the backlog of a queue. Vulnerabilities arrive as they are discovered or created, and leave the system when they are patched or successfully exploited. Building on this model, we study how automation affects attack and defense dynamics by introducing an AI amplification factor that scales arrival, exploit, and patching rates. Our analysis shows that even symmetric automation can increase the rate of successful exploits. We validate the model using vulnerability data collected from an open source software supply chain and show that it closely matches real-world attack surface dynamics. Empirical results reveal heavy-tailed patching times, which we prove induce long-range dependence in vulnerability backlog and help explain persistent cyber risk. Utilizing our queueing abstraction for the attack surface, we develop a systematic approach for cyber risk mitigation. We formulate the dynamic defense problem as a constrained Markov decision process with resource-budget and switching-cost constraints, and develop a reinforcement learning (RL) algorithm that achieves provably near-optimal regret. Numerical experiments validate the approach and demonstrate that our adaptive RL-based defense policies significantly reduce successful exploits and mitigate heavy-tail queue events. Using trace-driven experiments on the ARVO dataset, we show that the proposed RL-based defense policy reduces the average number of active vulnerabilities in a software supply chain by over 90% compared to existing defense practices, without increasing the overall maintenance budget. Our results allow defenders to quantify cumulative exposure risk under long-range dependent attack dynamics and to design adaptive defense strategies with provable efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops a queueing-theoretic model of cyber attack surfaces in which the number of active vulnerabilities is represented as the backlog of an M/G/1-like queue with arrivals from discoveries/creations and departures via patching or successful exploits. An AI amplification factor is introduced that symmetrically scales arrival, exploit, and patching rates; analysis shows this can increase the rate of successful exploits. The model is validated on vulnerability data from an open-source software supply chain, where heavy-tailed patching times are observed and proved (via renewal theory) to induce long-range dependence in the backlog process. The dynamic defense problem is formulated as a constrained MDP with maintenance-budget and switching-cost constraints; a reinforcement-learning algorithm is derived that achieves provably near-optimal regret. Trace-driven experiments on the ARVO dataset are reported to show that the learned RL policy reduces average active vulnerabilities by over 90% relative to existing practices while respecting the budget.

Significance. If the empirical claims hold, the work supplies a mathematically grounded abstraction for quantifying cumulative exposure under long-range-dependent vulnerability dynamics and for designing adaptive defenses with regret guarantees. The explicit link between heavy-tailed service times and persistent backlog, together with the constrained-MDP formulation, offers a reusable template for supply-chain risk analysis. The reported 90% reduction under fixed budget would be a practically significant result if reproducible; the absence of released data, code, or fitting details currently prevents independent verification of that magnitude.

major comments (3)
  1. [Abstract / trace-driven experiments] Abstract and trace-driven experiments section: the central claim of a >90% reduction in average active vulnerabilities is presented without error bars, confidence intervals, statistical significance tests, or explicit definitions of the baseline defense policies being compared. This omission is load-bearing for the empirical contribution.
  2. [Model validation] Model validation section: arrival, service, and exploit rates are stated to be fitted to the ARVO dataset, yet no description is given of the fitting procedure, goodness-of-fit metrics, or sensitivity of the 90% figure to those fitted parameters. The AI amplification factor is likewise introduced as a free parameter without reported calibration or robustness checks.
  3. [RL formulation and regret analysis] RL algorithm and regret analysis: while near-optimal regret is claimed for the constrained MDP, it is unclear whether the analysis accounts for the long-range dependence induced by heavy-tailed patching times (proved earlier in the paper). Standard regret bounds for MDPs can degrade under LRD; a concrete statement of the assumptions under which the bound continues to hold is needed.
minor comments (2)
  1. [Model definition] Notation for the AI amplification factor and the three scaled rates should be introduced once with a single symbol and then used consistently; the current abstract-level description risks reader confusion about which rates are affected.
  2. [Data description] The ARVO dataset is referenced repeatedly but never characterized (size, time span, number of projects, vulnerability types). A brief table or paragraph summarizing its statistics would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below with clarifications and commit to revisions that strengthen the empirical and theoretical contributions without altering the core claims.

  1. Referee: [Abstract / trace-driven experiments] Abstract and trace-driven experiments section: the central claim of a >90% reduction in average active vulnerabilities is presented without error bars, confidence intervals, statistical significance tests, or explicit definitions of the baseline defense policies being compared. This omission is load-bearing for the empirical contribution.

    Authors: We agree that the presentation of the 90% reduction would be strengthened by additional statistical detail. In the revised version we will add error bars and confidence intervals to all reported figures in the trace-driven experiments section, include p-values or other significance tests against the baselines, and explicitly define the baseline policies (static priority patching, uniform random allocation under the same budget, and myopic greedy patching). These additions will also be summarized in the abstract.
    Revision: yes

  2. Referee: [Model validation] Model validation section: arrival, service, and exploit rates are stated to be fitted to the ARVO dataset, yet no description is given of the fitting procedure, goodness-of-fit metrics, or sensitivity of the 90% figure to those fitted parameters. The AI amplification factor is likewise introduced as a free parameter without reported calibration or robustness checks.

    Authors: We acknowledge the omission of fitting details. The revision will include a dedicated subsection describing the estimation procedure (maximum-likelihood fitting of the heavy-tailed inter-arrival and patching distributions), report Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit statistics, and present a sensitivity table showing how the 90% reduction varies under ±20% perturbations of the fitted rates. For the AI amplification factor we will add calibration against observed automation adoption rates in the ARVO traces together with robustness plots across a range of amplification values.
    Revision: yes

  3. Referee: [RL formulation and regret analysis] RL algorithm and regret analysis: while near-optimal regret is claimed for the constrained MDP, it is unclear whether the analysis accounts for the long-range dependence induced by heavy-tailed patching times (proved earlier in the paper). Standard regret bounds for MDPs can degrade under LRD; a concrete statement of the assumptions under which the bound continues to hold is needed.

    Authors: The regret bound is derived for the constrained MDP whose transition kernel is obtained from the queueing model; the heavy-tailed patching times and resulting LRD are therefore embedded in the state dynamics. Nevertheless, standard MDP regret analyses assume sufficient mixing. We will revise the section to state explicitly that the bound holds under the assumption that the adaptive policy induces geometric ergodicity on the effective state space (or provide a mixing-time correction term). If the original proof requires additional technical conditions, we will add them and, if necessary, weaken the claim to “near-optimal regret under the stated mixing assumption.”
    Revision: partial
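
The estimation steps promised in response 2 can be sketched as follows. This is a hedged illustration, not the authors' procedure: it uses a Pareto distribution with a known scale `x_min` as a stand-in for whichever heavy-tailed family the paper fits, draws synthetic patching times instead of ARVO data, and covers only the Kolmogorov-Smirnov statistic, not Anderson-Darling. All function names and parameter values are hypothetical.

```python
import math
import random


def synthetic_patch_times(n, x_min, alpha, seed=0):
    """Draw n Pareto(x_min, alpha) samples by inverse-CDF sampling (synthetic data)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def fit_pareto_shape(samples, x_min):
    """Maximum-likelihood estimate of the Pareto shape for a known scale x_min."""
    return len(samples) / sum(math.log(x / x_min) for x in samples)


def pareto_cdf(x, x_min, alpha):
    return 0.0 if x < x_min else 1.0 - (x_min / x) ** alpha


def ks_statistic(samples, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the model with the empirical CDF just after and just before x.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


if __name__ == "__main__":
    x_min, true_alpha = 1.0, 1.5  # hypothetical ground truth
    samples = synthetic_patch_times(5000, x_min, true_alpha)
    alpha_hat = fit_pareto_shape(samples, x_min)
    print(f"fitted shape: {alpha_hat:.3f}")
    # Sensitivity check: perturb the fitted shape by -20%, 0%, +20%.
    for factor in (0.8, 1.0, 1.2):
        a = factor * alpha_hat
        d = ks_statistic(samples, lambda x: pareto_cdf(x, x_min, a))
        print(f"shape x{factor}: KS distance = {d:.4f}")
```

The same ±20% perturbation loop could wrap a full policy evaluation to produce the sensitivity table the authors describe; here it only shows how quickly the fit degrades when the shape is misspecified.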

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's core chain proceeds from a standard queueing abstraction (backlog as M/G/1-like process) to an AI-scaled rate model whose exploit-increase result follows directly from the scaling assumption, then to a constrained MDP whose RL solution carries standard regret bounds. The heavy-tailed patching times are observed in the independent ARVO dataset, and the resulting LRD is derived via renewal theory (external to the paper). The 90% reduction claim is obtained from trace-driven simulation of the learned policy on real supply-chain traces under fixed budget, not from re-fitting or re-using the same modeled trajectories as both training and test. No equation reduces to its own input by construction, no fitted parameter is relabeled as a prediction, and no load-bearing step relies on self-citation of an unverified uniqueness result. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard queueing assumptions plus a new scaling factor and heavy-tail claim; no new physical entities are postulated.

free parameters (2)
  • AI amplification factor
    Single multiplier that scales arrival, exploit, and patching rates; value fitted or chosen to represent automation level.
  • Arrival, service, and exploit rates
    Parameters estimated from vulnerability data to match observed backlog dynamics.
axioms (2)
  • domain assumption Vulnerabilities behave as customers in a single-server or multi-server queue with independent arrivals and departures.
    Core modeling choice stated in the abstract; standard in queueing theory. Arrivals and departures are assumed independent, but patching cannot be memoryless, since the paper reports heavy-tailed patching times.
  • domain assumption Patching times follow a heavy-tailed distribution that induces long-range dependence in the backlog process.
    Heavy tails are observed in the data; the induced long-range dependence is claimed to be proven and is used to explain persistent risk.

pith-pipeline@v0.9.0 · 5615 in / 1634 out tokens · 37992 ms · 2026-05-10T16:42:02.084693+00:00 · methodology
