Organizational Security Resource Estimation via Vulnerability Queueing
Pith reviewed 2026-05-10 15:33 UTC · model grok-4.3
The pith
A queueing model estimates an organization's active security personnel directly from vulnerability timestamps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that a non-stationary queueing framework recovers time-varying active personnel counts and per-person output rates solely from bug-report and fix timestamps. The framework segments vulnerability data into piecewise-stationary regimes by applying Gaussian mixture modeling to the observed queue-length distributions, then fits segment-specific arrival, service, and resource parameters by minimizing Kullback-Leibler divergence to the empirical distributions. The reported accuracy is 91 to 96 percent on both large-scale software supply-chain records and multi-year enterprise cyber-ticket logs.
What carries the argument
A non-stationary queueing abstraction of the attack surface that is segmented into piecewise-stationary regimes via Gaussian mixture modeling on queue-length distributions and whose arrival, service, and resource parameters are recovered by minimizing Kullback-Leibler divergence to the empirical distributions.
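The fitting step can be illustrated with a minimal sketch. It assumes a textbook M/M/c queue and a plain grid search, neither of which is confirmed as the authors' exact procedure; `mmc_pmf`, `kl_divergence`, and `fit_segment` are hypothetical names. The sketch fits only the offered load a = λ/μ and the personnel count c, because those are the only quantities the stationary queue-length distribution of an M/M/c queue depends on.

```python
import math

def mmc_pmf(a, c, n_max):
    """Stationary M/M/c queue-length pmf on 0..n_max for offered load a = lambda/mu.

    Assumes n_max >= c so the tail beyond n_max is purely geometric.
    """
    if not 0 < a < c:
        raise ValueError("need 0 < a < c for a stable queue")
    weights, w = [], 1.0
    for n in range(n_max + 1):
        weights.append(w)
        w *= a / min(n + 1, c)       # ratio between consecutive state weights
    tail = w / (1 - a / c)           # geometric tail mass beyond n_max
    total = sum(weights) + tail
    return [x / total for x in weights]

def kl_divergence(p_emp, q):
    """KL(empirical || model) over the states observed in the data."""
    total = 0.0
    for n, p in p_emp.items():
        if p > 0:
            if q[n] <= 0:
                return math.inf
            total += p * (math.log(p) - math.log(q[n]))
    return total

def fit_segment(p_emp, c_max=20, steps=200):
    """Grid search over (a, c); returns (best KL, a, c) for one segment."""
    n_max = max(max(p_emp), c_max)
    best = None
    for c in range(1, c_max + 1):
        for i in range(1, steps):
            a = c * i / steps        # offered load strictly inside (0, c)
            score = kl_divergence(p_emp, mmc_pmf(a, c, n_max))
            if best is None or score < best[0]:
                best = (score, a, c)
    return best

# Synthetic check: build a distribution from known parameters, then refit it.
synthetic = dict(enumerate(mmc_pmf(2.4, 3, 60)))
print(fit_segment(synthetic, c_max=10))
```

The synthetic check only shows that the search can recover parameters from data generated by the same model; it says nothing about how well real timestamp data fit an M/M/c queue.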
If this is right
- The fitted model directly exposes periods when available resources are insufficient to keep up with incoming vulnerabilities.
- The same timestamps can be used to forecast future staffing requirements under different vulnerability-arrival scenarios.
- The framework supplies the inputs needed for patch-race simulations and proactive risk calculations.
- Resource estimates become possible for any organization that already records discovery and remediation times, without extra data collection.
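The staffing-forecast idea can be sketched with the classical Erlang C formula, assuming an M/M/c queue and a target on the probability that a new vulnerability has to wait for a free person. The target value, the scenario rates, and the function names are illustrative assumptions, not values from the paper.

```python
import math

def erlang_c(a, c):
    """Probability that an arriving job must wait in an M/M/c queue (needs a < c)."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)      # Erlang B recursion
    rho = a / c
    return b / (1 - rho * (1 - b))   # convert Erlang B to Erlang C

def min_staff(lam, mu, max_wait_prob=0.2):
    """Smallest personnel count keeping the wait probability at or below the target."""
    a = lam / mu
    c = math.floor(a) + 1            # smallest c that keeps the queue stable
    while erlang_c(a, c) > max_wait_prob:
        c += 1
    return c

# Hypothetical arrival scenarios (vulnerabilities per day) at a fixed per-person rate.
for lam in (2.4, 4.8, 9.6):
    print(lam, min_staff(lam, mu=1.0))
```

A different service target, such as a bound on mean resolution time, would change the stopping condition but not the overall structure.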
Where Pith is reading between the lines
- Public vulnerability databases could be mined to produce external benchmarks of how many staff competitors allocate to security.
- The same segmentation and fitting procedure could be tested on other timestamped workflows such as IT incident tickets or customer-support queues.
- Organizations could run what-if simulations inside the fitted model to quantify how adding or removing personnel would change resolution times.
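A what-if run of the kind described in the last item might look like the following sketch. It simulates a first-come-first-served queue with exponential inter-arrival and service times, which is an assumption about the fitted model rather than something this page confirms; the rates and the function name are placeholders.

```python
import heapq
import random

def simulate_mean_resolution_time(lam, mu, c, n_jobs=20000, seed=0):
    """Mean time from discovery to fix in a simulated FIFO queue with c people."""
    rng = random.Random(seed)
    free_at = [0.0] * c                  # min-heap: when each person next becomes free
    t, total = 0.0, 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)        # next discovery time
        start = max(t, heapq.heappop(free_at))   # wait for the earliest free person
        finish = start + rng.expovariate(mu)
        heapq.heappush(free_at, finish)
        total += finish - t              # waiting time plus service time
    return total / n_jobs

# Compare staffing levels under the same hypothetical arrival and service rates.
for c in (3, 4, 5):
    print(c, round(simulate_mean_resolution_time(lam=2.4, mu=1.0, c=c), 3))
```

Because the simulation is stochastic, differences between staffing levels should be read against run-to-run variation, for example by repeating with several seeds.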
Load-bearing premise
That the real sequence of vulnerability discoveries and fixes behaves enough like a queueing system that matching the observed queue-length distributions recovers the true underlying number of active personnel rather than merely reproducing the statistics.
What would settle it
An independent count of active security personnel over the same time windows that differs substantially from the numbers recovered by fitting the queueing model to the same timestamp data.
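Such a comparison could be scored as in the sketch below. The page does not state how the paper defines accuracy, so the metric here (one minus the relative error per time window) is an assumption, and the headcounts in the example are made-up placeholders rather than results.

```python
def headcount_accuracy(estimated, observed):
    """Per-window accuracy 1 - |c_hat - c| / c and its mean across windows."""
    per_window = [1 - abs(e - o) / o for e, o in zip(estimated, observed)]
    return per_window, sum(per_window) / len(per_window)

# Placeholder numbers only: model estimates vs. an independent HR-style count.
per_window, mean_acc = headcount_accuracy([9, 12, 11], [10, 12, 10])
print(per_window, mean_acc)
```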
Original abstract
We provide an approach that closely estimates an organization's cyber resources directly from vulnerability timestamps, using a non-stationary queueing framework. Traditional attack-surface metrics operate on static snapshots, ignoring the core attack-defense dynamics within information systems, which exhibit bursty, heavy-tailed, and capacity-constrained behavior. Our approach to modeling such dynamics is based on a queueing abstraction of attack surfaces. We utilize a segmentation method to identify piecewise-stationary regimes via Gaussian mixture modeling (GMM) of queue length distributions. We fit segment-specific arrival, service, and resource parameters through the minimization of Kullback-Leibler divergence (KL) between the empirical and estimated distributions. Applied to both large-scale software supply chain data and multi-year private logistics enterprise cyber-ticket workflows, the model estimates organizational resources, measured in the time-varying active personnel and output rate per personnel, solely from bug report and fix timings for software supply chains, and discovery and patch timestamps in the enterprise setting. Our results provide 91–96% accuracy in resource estimation, making the dynamic queueing framework a compelling approach for understanding attack surface dynamics. Further, our framework exposes resource bottlenecks, establishing a foundation for predictive workforce planning, patch-race modeling, and proactive cyber-risk management.
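The queue-length distributions mentioned in the abstract have to be built from the timestamps first. A minimal sketch follows, assuming that every item has both a discovery and a fix time and that queue length means the number of items open at a given moment (the paper may define it differently); `queue_length_distribution` is a hypothetical helper name.

```python
from collections import Counter

def queue_length_distribution(opened, closed):
    """Time-weighted distribution of the number of open items.

    opened[i] and closed[i] are the discovery and fix times of item i,
    in any consistent numeric unit (for example days since a reference date).
    """
    # +1 at each discovery, -1 at each fix; on ties, fixes sort before discoveries.
    events = sorted([(t, +1) for t in opened] + [(t, -1) for t in closed])
    time_at_level = Counter()
    level, prev_t = 0, events[0][0]
    for t, delta in events:
        time_at_level[level] += t - prev_t   # time spent at the current level
        level += delta
        prev_t = t
    total = sum(time_at_level.values())
    return {n: w / total for n, w in sorted(time_at_level.items()) if w > 0}

# Three items discovered at t = 0, 1, 2 and fixed at t = 3, 4, 6.
print(queue_length_distribution([0, 1, 2], [3, 4, 6]))
```

Weighting by time rather than by event count matters here: a level held for a long quiet period should count for more than one passed through during a burst.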
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a non-stationary queueing model to estimate an organization's cyber resources (time-varying active personnel count and per-person output rate) directly from vulnerability timestamps. Vulnerabilities are abstracted as jobs in an M/M/c-style queue; piecewise-stationary regimes are identified via GMM clustering on empirical queue-length distributions; segment-specific arrival rate λ, service rate μ, and resource parameters are then recovered by minimizing KL divergence between the empirical and theoretical steady-state queue-length distributions. The method is applied to software supply-chain bug data and multi-year enterprise cyber-ticket workflows, with the central claim that the recovered resource values achieve 91–96 % accuracy.
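For reference, the textbook M/M/c quantities that the summary alludes to can be written out as below. This assumes the standard M/M/c queue with offered load a = λ/μ and the KL direction "empirical relative to model"; the manuscript's exact formulation is not reproduced on this page and may differ. Here \hat{p}_n denotes the empirical probability of observing queue length n within a segment.

```latex
\begin{align}
  a &= \frac{\lambda}{\mu}, \qquad \rho = \frac{a}{c} < 1, \\
  p_0 &= \left[ \sum_{k=0}^{c-1} \frac{a^k}{k!} + \frac{a^c}{c!\,(1-\rho)} \right]^{-1}, \\
  p_n &=
  \begin{cases}
    p_0 \, \dfrac{a^n}{n!}, & 0 \le n < c, \\[2ex]
    p_0 \, \dfrac{a^n}{c!\, c^{\,n-c}}, & n \ge c,
  \end{cases} \\
  (\hat{\lambda}, \hat{\mu}, \hat{c}) &\in \arg\min_{\lambda,\,\mu,\,c}
    \sum_{n} \hat{p}_n \log \frac{\hat{p}_n}{p_n(\lambda, \mu, c)}.
\end{align}
```

Note that p_n depends on λ and μ only through their ratio a, which is relevant to the identifiability concern raised in the major comments.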
Significance. If the fitted resource parameters can be shown to recover ground-truth organizational headcount and productivity rather than merely reproducing the observed queue-length statistics, the framework would constitute a meaningful advance over static attack-surface metrics. It would enable inference of hidden defensive capacity, identification of bottlenecks, and data-driven workforce planning from observable timestamps alone, with potential extensions to patch-race modeling and predictive risk management.
major comments (2)
- Abstract and Evaluation section: the claim of 91–96 % accuracy in resource estimation is stated without any description of the validation protocol (hold-out sets, cross-validation, baseline comparators, or sensitivity to GMM segmentation). Because the accuracy metric is central to the paper's contribution, the absence of these details prevents assessment of whether the result is robust or an artifact of in-sample fitting.
- Method section (parameter-fitting procedure): resource parameters (active personnel c and output rate) are obtained by minimizing KL divergence between the model and the empirical queue-length distribution computed from the same timestamps used to define each segment. Multiple (λ, μ, c) triples can yield similar steady-state distributions, so the procedure risks recovering values that match statistics without corresponding to true personnel levels; no identifiability analysis or external validation against known organizational data is provided.
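The identifiability concern in the second comment can be made concrete under the assumption of a textbook M/M/c queue: its stationary queue-length distribution depends on λ and μ only through the ratio λ/μ, so scaling both rates by the same factor leaves the distribution unchanged. The sketch below checks this numerically; `mmc_pmf` is a hypothetical helper and the rates are arbitrary.

```python
import math

def mmc_pmf(lam, mu, c, n_max):
    """Stationary M/M/c pmf on 0..n_max (textbook formula, needs lam < c * mu)."""
    a = lam / mu                      # offered load
    rho = a / c                       # utilisation per person
    if rho >= 1:
        raise ValueError("unstable queue: need lam < c * mu")
    p0 = 1.0 / (sum(a ** k / math.factorial(k) for k in range(c))
                + a ** c / (math.factorial(c) * (1 - rho)))
    pmf = []
    for n in range(n_max + 1):
        if n < c:
            pmf.append(p0 * a ** n / math.factorial(n))
        else:
            pmf.append(p0 * a ** c / math.factorial(c) * rho ** (n - c))
    return pmf

# Same personnel count, rates ten times apart, identical ratio lam / mu.
slow = mmc_pmf(lam=2.4, mu=1.0, c=3, n_max=50)
fast = mmc_pmf(lam=24.0, mu=10.0, c=3, n_max=50)
max_gap = max(abs(x - y) for x, y in zip(slow, fast))
print(max_gap)   # largest pointwise difference between the two distributions
```

Under this assumption, a fit to the queue-length distribution alone cannot separate a slow team facing few arrivals from a fast team facing many. Pinning the time scale needs extra information, for example an arrival rate estimated directly from discovery timestamps; whether the manuscript does this is not stated on this page.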
minor comments (1)
- Abstract: the datasets are described only as 'large-scale software supply chain data' and 'multi-year private logistics enterprise cyber-ticket workflows'; adding dataset names, sizes, and public availability statements would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us identify areas for improvement in the presentation of our results. We provide point-by-point responses below and outline the revisions we will make to address the concerns raised.
- Referee: Abstract and Evaluation section: the claim of 91–96 % accuracy in resource estimation is stated without any description of the validation protocol (hold-out sets, cross-validation, baseline comparators, or sensitivity to GMM segmentation). Because the accuracy metric is central to the paper's contribution, the absence of these details prevents assessment of whether the result is robust or an artifact of in-sample fitting.
  Authors: We thank the referee for pointing out the lack of detail on the validation protocol. Upon review, we recognize that while the accuracy is reported based on our internal evaluations using both in-sample fits and out-of-sample predictions on the datasets, the manuscript does not sufficiently describe the exact protocol. In the revised version, we will expand the Evaluation section to fully detail the validation approach, including the use of hold-out sets, cross-validation procedures, baseline comparators, and sensitivity analyses to the GMM segmentation. This will allow readers to assess the robustness of the 91–96% accuracy claims.
  revision: yes
- Referee: Method section (parameter-fitting procedure): resource parameters (active personnel c and output rate) are obtained by minimizing KL divergence between the model and the empirical queue-length distribution computed from the same timestamps used to define each segment. Multiple (λ, μ, c) triples can yield similar steady-state distributions, so the procedure risks recovering values that match statistics without corresponding to true personnel levels; no identifiability analysis or external validation against known organizational data is provided.
  Authors: We acknowledge the referee's concern regarding potential non-identifiability of the parameters and the lack of explicit analysis. The minimization is indeed performed on the segmented data, and while we use constraints in the optimization, no formal identifiability proof or sensitivity study is currently included. We will add such an analysis in the Method section, along with explicit statements on how external validation was conducted for the enterprise data using known organizational records. This will clarify that the recovered values correspond to true personnel levels rather than just statistical matches.
  revision: yes
Circularity Check
Resource parameters fitted by KL minimization to empirical queue-length distributions; reported accuracy measures in-sample reproduction of input statistics
specific steps
- fitted input called prediction [Abstract]
  "We fit segment-specific arrival, service, and resource parameters through the minimization of Kullback-Leibler divergence (KL) between the empirical and estimated distributions. [...] the model estimates organizational resources, measured in the time-varying active personnel and output rate per personnel, solely from bug report and fix timings [...] Our results provide 91–96% accuracy in resource estimation"
  Arrival, service, and resource parameters (including active personnel count c) are obtained by direct numerical minimization of KL(empirical queue-length distribution, model distribution) within each GMM-identified segment. The empirical distributions are computed from the same vulnerability timestamps that constitute the input data. Consequently the reported accuracy quantifies how closely the fitted parameters reproduce the input statistics rather than recovering independently observed personnel levels.
full rationale
The paper's core claim is that organizational resources (time-varying active personnel c and per-person output rate) can be estimated solely from vulnerability timestamps via a non-stationary queueing model. However, the estimation procedure itself consists of segmenting the data with GMM on queue-length distributions and then optimizing arrival/service/resource parameters to minimize KL divergence to those same empirical distributions. The 91-96% accuracy is therefore the goodness-of-fit of this optimization on the training data rather than an independent prediction or validation against externally measured headcount. This matches the fitted-input-called-prediction pattern: the 'estimated' quantities are the direct output of the fitting step whose success is evaluated on the identical data.
Axiom & Free-Parameter Ledger
free parameters (3)
- segment-specific arrival rate
- segment-specific service rate
- segment-specific resource parameters (active personnel count and output rate per person)
axioms (2)
- domain assumption: Vulnerability discovery and remediation can be modeled as a queueing process with time-varying arrival and service rates
- domain assumption: Queue-length distributions admit piecewise-stationary regimes identifiable by Gaussian mixture modeling
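The second assumption can be illustrated with a small expectation-maximization sketch for a one-dimensional Gaussian mixture. It reflects one possible reading, in which windowed queue-length observations are clustered and each cluster is treated as a regime; the features the paper actually feeds to the GMM are not given on this page. The sample values and function names are hypothetical.

```python
import math

def fit_gmm_1d(xs, k=2, iters=100):
    """EM for a k-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n or 1.0
    means = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]  # spread over the range
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total if total > 0 else 1.0 / k for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue                     # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances

def label_regimes(xs, weights, means, variances):
    """Assign each observation to the component with the highest weighted log-density."""
    labels = []
    for x in xs:
        scores = [math.log(w) - 0.5 * math.log(v) - (x - m) ** 2 / (2 * v)
                  for w, m, v in zip(weights, means, variances)]
        labels.append(scores.index(max(scores)))
    return labels

# Hypothetical windowed queue lengths: a quiet regime followed by a backlog regime.
queue_lengths = [2, 3, 2, 3, 2, 3, 20, 21, 19, 20, 21, 20]
params = fit_gmm_1d(queue_lengths)
print(label_regimes(queue_lengths, *params))
```

Runs of identical labels along the time axis would then play the role of piecewise-stationary segments. On real data the number of components would itself need to be chosen, which this sketch does not address.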