
arXiv: 2604.10250 · v1 · submitted 2026-04-11 · 💻 cs.CR · cs.SE · eess.SP

Organizational Security Resource Estimation via Vulnerability Queueing

Pith reviewed 2026-05-10 15:33 UTC · model grok-4.3

classification 💻 cs.CR · cs.SE · eess.SP
keywords vulnerability queueing · resource estimation · cyber security · attack surface · non-stationary queueing · Gaussian mixture modeling · personnel estimation · patch management

The pith

A queueing model estimates an organization's active security personnel directly from vulnerability timestamps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to count how many people are actively fixing vulnerabilities inside an organization by modeling the entire flow of reports and resolutions as a queue. Traditional metrics only count open issues at one moment and miss how arrivals and fixes actually compete for limited staff time. The method splits the timeline into stretches where the queue length behaves consistently, then tunes arrival rates, service rates, and the number of active workers until the simulated queue lengths match the real data. A reader would care because the same timestamps already collected for bug tracking could reveal staffing levels, bottlenecks, and output per person without needing separate headcount reports.
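As a minimal sketch of the first step, the queue length over time can be rebuilt from timestamps alone. The function name `queue_length_distribution` and the `(opened, closed)` tuple layout are assumptions made for this illustration and are not taken from the paper.

```python
from collections import defaultdict


def queue_length_distribution(tickets):
    """Time-weighted distribution of the number of open tickets.

    tickets: list of (opened, closed) times in one consistent unit.
    Returns {queue_length: fraction_of_observed_time}.
    """
    if not tickets:
        raise ValueError("need at least one ticket")
    # Each report raises the open count by one, each fix lowers it by one.
    events = []
    for opened, closed in tickets:
        events.append((opened, +1))
        events.append((closed, -1))
    # Sort by time; at equal times, fixes (-1) are processed before reports (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    time_at = defaultdict(float)
    open_count = 0
    prev_time = events[0][0]
    for time, delta in events:
        time_at[open_count] += time - prev_time  # time spent at the old level
        open_count += delta
        prev_time = time

    total = sum(time_at.values())
    return {n: t / total for n, t in sorted(time_at.items()) if t > 0}


# Hypothetical toy data: four tickets with (opened, closed) times in days.
tickets = [(0, 4), (1, 3), (2, 6), (8, 10)]
dist = queue_length_distribution(tickets)
print(dist)  # {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}
```

The resulting distribution is the kind of empirical object that the segmentation and fitting steps described here would then operate on.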

Core claim

The authors claim that a non-stationary queueing framework recovers time-varying active personnel counts and per-person output rates solely from bug-report and fix timestamps, attaining 91 to 96 percent accuracy on both large-scale software supply-chain records and multi-year enterprise cyber-ticket logs. The framework segments vulnerability data into piecewise-stationary regimes by applying Gaussian mixture modeling to the observed queue-length distributions, then fits segment-specific arrival, service, and resource parameters by minimizing Kullback-Leibler divergence to the empirical distributions.

What carries the argument

A non-stationary queueing abstraction of the attack surface that is segmented into piecewise-stationary regimes via Gaussian mixture modeling on queue-length distributions and whose arrival, service, and resource parameters are recovered by minimizing Kullback-Leibler divergence to the empirical distributions.
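The fitting step can be sketched under a strong simplifying assumption: each segment behaves like a textbook M/M/c queue, and a plain grid search stands in for whatever optimizer the authors use. All names here (`mmc_pmf`, `kl_divergence`, `fit_segment`) are hypothetical. In this simplified model the queue-length distribution depends only on the server count and the offered load (arrival rate divided by service rate), so the sketch fits those two quantities rather than separate rates.

```python
import math


def mmc_pmf(load, servers, n_max):
    """Steady-state queue-length pmf of an M/M/c queue, truncated at n_max.

    load is the arrival rate divided by the per-server service rate.
    """
    weights, w = [], 1.0  # w is the unnormalized weight of state n
    for n in range(n_max + 1):
        weights.append(w)
        w *= load / min(n + 1, servers)  # ratio p(n+1) / p(n)
    total = sum(weights)
    return [x / total for x in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) on a shared support; q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def fit_segment(empirical, server_grid, load_grid):
    """Grid search for the (servers, load) pair with the smallest KL divergence."""
    best = None
    for servers in server_grid:
        for load in load_grid:
            if load >= servers:  # unstable queue, no steady state
                continue
            model = mmc_pmf(load, servers, len(empirical) - 1)
            score = kl_divergence(empirical, model)
            if best is None or score < best[0]:
                best = (score, servers, load)
    return best


# Synthetic check: generate a distribution from known parameters, then refit.
target = mmc_pmf(load=2.5, servers=4, n_max=60)
score, servers, load = fit_segment(target, range(1, 9), [i / 10 for i in range(1, 80)])
print(servers, load)
```

Recovering the generating parameters from synthetic data only shows that the search works when the model is exactly right; it says nothing about real organizations.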

If this is right

  • The fitted model directly exposes periods when available resources are insufficient to keep up with incoming vulnerabilities.
  • The same timestamps can be used to forecast future staffing requirements under different vulnerability-arrival scenarios.
  • The framework supplies the inputs needed for patch-race simulations and proactive risk calculations.
  • Resource estimates become possible for any organization that already records discovery and remediation times, without extra data collection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Public vulnerability databases could be mined to produce external benchmarks of how many staff competitors allocate to security.
  • The same segmentation and fitting procedure could be tested on other timestamped workflows such as IT incident tickets or customer-support queues.
  • Organizations could run what-if simulations inside the fitted model to quantify how adding or removing personnel would change resolution times.
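The what-if idea in the last bullet can be sketched with the standard Erlang C formula, assuming a stationary M/M/c queue inside one segment. The function names and the example rates are hypothetical and are not drawn from the paper's data.

```python
import math


def erlang_c(servers, load):
    """Probability that an arriving job has to wait (M/M/c, load < servers)."""
    term, partial = 1.0, 0.0
    for k in range(servers):
        partial += term            # accumulates load**k / k! for k < servers
        term *= load / (k + 1)
    # After the loop, term equals load**servers / servers!
    tail = term * servers / (servers - load)
    return tail / (partial + tail)


def mean_wait(servers, arrival_rate, service_rate):
    """Mean time a job waits before someone starts working on it."""
    load = arrival_rate / service_rate
    if load >= servers:
        return math.inf            # backlog grows without bound
    return erlang_c(servers, load) / (servers * service_rate - arrival_rate)


# Hypothetical segment: 2 vulnerabilities arrive per day, each person fixes 0.5 per day.
for staff in range(4, 8):
    print(staff, mean_wait(staff, arrival_rate=2.0, service_rate=0.5))
```

With these example rates, four people exactly match the incoming load and the wait is unbounded, so the interesting comparison starts at five.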

Load-bearing premise

That the real sequence of vulnerability discoveries and fixes behaves enough like a queueing system that matching the observed queue-length distributions recovers the true underlying number of active personnel rather than merely reproducing the statistics.

What would settle it

An independent count of active security personnel over the same time windows that differs substantially from the numbers recovered by fitting the queueing model to the same timestamp data.

Figures

Figures reproduced from arXiv: 2604.10250 by Abdullah Y. Etcibasi, C. Emre Koksal, Zachary Dobos.

Figure 1: Queueing representation of the attack surface.
Figure 2: Temporal dynamics in the ARVO and logistics enter…
Figure 3: Segmentation and mixture-model fitting results for the ARVO dataset.
Figure 4: Segment-level fitting and overall QLD validation for ARVO. The segmented…
Figure 5: Segmentation and mixture-model fitting results for the logistics enterprise dataset.
Figure 6: Segment-level fitting and overall QLD validation for the logistics enterprise dataset. The aggregated segmented model…
Original abstract

We provide an approach that closely estimates an organization's cyber resources directly from vulnerability timestamps, using a non-stationary queueing framework. Traditional attack-surface metrics operate on static snapshots, ignoring the core attack-defense dynamics within information systems, which exhibit bursty, heavy-tailed, and capacity-constrained behavior. Our approach to modeling such dynamics is based on a queueing abstraction of attack surfaces. We utilize a segmentation method to identify piecewise-stationary regimes via Gaussian mixture modeling (GMM) of queue length distributions. We fit segment-specific arrival, service, and resource parameters through the minimization of Kullback–Leibler divergence (KL) between the empirical and estimated distributions. Applied to both large-scale software supply chain data and multi-year private logistics enterprise cyber-ticket workflows, the model estimates organizational resources, measured in the time-varying active personnel and output rate per personnel, solely from bug report and fix timings for software supply chains, and discovery and patch timestamps in the enterprise setting. Our results provide 91–96% accuracy in resource estimation, making the dynamic queueing framework a compelling approach for understanding attack surface dynamics. Further, our framework exposes resource bottlenecks, establishing a foundation for predictive workforce planning, patch-race modeling, and proactive cyber-risk management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a non-stationary queueing model to estimate an organization's cyber resources (time-varying active personnel count and per-person output rate) directly from vulnerability timestamps. Vulnerabilities are abstracted as jobs in an M/M/c-style queue; piecewise-stationary regimes are identified via GMM clustering on empirical queue-length distributions; segment-specific arrival rate λ, service rate μ, and resource parameters are then recovered by minimizing KL divergence between the empirical and theoretical steady-state queue-length distributions. The method is applied to software supply-chain bug data and multi-year enterprise cyber-ticket workflows, with the central claim that the recovered resource values achieve 91–96 % accuracy.
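For reference, the textbook steady-state distribution of an M/M/c queue is written out below. This assumes the standard form; the summary says only "M/M/c-style", so the paper's exact model may differ. Here a is the offered load and π_n is the probability of n open vulnerabilities.

```latex
\[
\pi_n =
\begin{cases}
\pi_0 \,\dfrac{a^n}{n!}, & 0 \le n < c,\\[1.5ex]
\pi_0 \,\dfrac{a^n}{c!\,c^{\,n-c}}, & n \ge c,
\end{cases}
\qquad
a = \frac{\lambda}{\mu},
\qquad
\pi_0 = \left[\,\sum_{k=0}^{c-1} \frac{a^k}{k!} + \frac{a^c}{c!}\cdot\frac{c}{c-a}\right]^{-1},
\quad a < c .
\]
```

In this form λ and μ enter only through their ratio a, which matters for the identifiability question raised in the major comments.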

Significance. If the fitted resource parameters can be shown to recover ground-truth organizational headcount and productivity rather than merely reproducing the observed queue-length statistics, the framework would constitute a meaningful advance over static attack-surface metrics. It would enable inference of hidden defensive capacity, identification of bottlenecks, and data-driven workforce planning from observable timestamps alone, with potential extensions to patch-race modeling and predictive risk management.

major comments (2)
  1. Abstract and Evaluation section: the claim of 91–96 % accuracy in resource estimation is stated without any description of the validation protocol (hold-out sets, cross-validation, baseline comparators, or sensitivity to GMM segmentation). Because the accuracy metric is central to the paper's contribution, the absence of these details prevents assessment of whether the result is robust or an artifact of in-sample fitting.
  2. Method section (parameter-fitting procedure): resource parameters (active personnel c and output rate) are obtained by minimizing KL divergence between the model and the empirical queue-length distribution computed from the same timestamps used to define each segment. Multiple (λ, μ, c) triples can yield similar steady-state distributions, so the procedure risks recovering values that match statistics without corresponding to true personnel levels; no identifiability analysis or external validation against known organizational data is provided.
minor comments (1)
  1. Abstract: the datasets are described only as 'large-scale software supply chain data' and 'multi-year private logistics enterprise cyber-ticket workflows'; adding dataset names, sizes, and public availability statements would improve reproducibility.
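Major comment 2 can be illustrated numerically, assuming a plain M/M/c queue: doubling both the arrival rate and the service rate leaves the steady-state queue-length distribution unchanged, so that distribution alone cannot separate the two rates. The helper `mmc_pmf` and the example values are hypothetical.

```python
def mmc_pmf(lam, mu, c, n_max=50):
    """Steady-state queue-length pmf of an M/M/c queue, truncated at n_max."""
    load = lam / mu
    weights, w = [], 1.0
    for n in range(n_max + 1):
        weights.append(w)
        w *= load / min(n + 1, c)  # ratio p(n+1) / p(n)
    total = sum(weights)
    return [x / total for x in weights]


# Two hypothetical teams of five: one sees half the arrivals and works half as fast.
slow_team = mmc_pmf(lam=3.0, mu=1.0, c=5)
fast_team = mmc_pmf(lam=6.0, mu=2.0, c=5)
max_gap = max(abs(p - q) for p, q in zip(slow_team, fast_team))
print(max_gap)  # 0.0

# Changing the head count, by contrast, does change the distribution.
smaller_team = mmc_pmf(lam=3.0, mu=1.0, c=4)
head_count_gap = max(abs(p - q) for p, q in zip(slow_team, smaller_team))
print(head_count_gap)
```

Separating λ from μ would need timing information beyond the queue-length distribution, for example the raw count of arrivals per unit time.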

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us identify areas for improvement in the presentation of our results. We provide point-by-point responses below and outline the revisions we will make to address the concerns raised.

Point-by-point responses
  1. Referee: Abstract and Evaluation section: the claim of 91–96 % accuracy in resource estimation is stated without any description of the validation protocol (hold-out sets, cross-validation, baseline comparators, or sensitivity to GMM segmentation). Because the accuracy metric is central to the paper's contribution, the absence of these details prevents assessment of whether the result is robust or an artifact of in-sample fitting.

    Authors: We thank the referee for pointing out the lack of detail on the validation protocol. Upon review, we recognize that while the accuracy is reported based on our internal evaluations using both in-sample fits and out-of-sample predictions on the datasets, the manuscript does not sufficiently describe the exact protocol. In the revised version, we will expand the Evaluation section to fully detail the validation approach, including the use of hold-out sets, cross-validation procedures, baseline comparators, and sensitivity analyses to the GMM segmentation. This will allow readers to assess the robustness of the 91–96% accuracy claims.
    Revision: yes

  2. Referee: Method section (parameter-fitting procedure): resource parameters (active personnel c and output rate) are obtained by minimizing KL divergence between the model and the empirical queue-length distribution computed from the same timestamps used to define each segment. Multiple (λ, μ, c) triples can yield similar steady-state distributions, so the procedure risks recovering values that match statistics without corresponding to true personnel levels; no identifiability analysis or external validation against known organizational data is provided.

    Authors: We acknowledge the referee's concern regarding potential non-identifiability of the parameters and the lack of explicit analysis. The minimization is indeed performed on the segmented data, and while we use constraints in the optimization, no formal identifiability proof or sensitivity study is currently included. We will add such an analysis in the Method section, along with explicit statements on how external validation was conducted for the enterprise data using known organizational records. This will clarify that the recovered values correspond to true personnel levels rather than just statistical matches.
    Revision: yes

Circularity Check

1 steps flagged

Resource parameters fitted by KL minimization to empirical queue-length distributions; reported accuracy measures in-sample reproduction of input statistics

specific steps
  1. fitted input called prediction [Abstract]
    "We fit segment-specific arrival, service, and resource parameters through the minimization of Kullback--Leibler divergence (KL) between the empirical and estimated distributions. [...] the model estimates organizational resources, measured in the time-varying active personnel and output rate per personnel, solely from bug report and fix timings [...] Our results provide 91--96% accuracy in resource estimation"

    Arrival, service, and resource parameters (including active personnel count c) are obtained by direct numerical minimization of KL(empirical queue-length distribution, model distribution) within each GMM-identified segment. The empirical distributions are computed from the same vulnerability timestamps that constitute the input data. Consequently the reported accuracy quantifies how closely the fitted parameters reproduce the input statistics rather than recovering independently observed personnel levels.

full rationale

The paper's core claim is that organizational resources (time-varying active personnel c and per-person output rate) can be estimated solely from vulnerability timestamps via a non-stationary queueing model. However, the estimation procedure itself consists of segmenting the data with GMM on queue-length distributions and then optimizing arrival/service/resource parameters to minimize KL divergence to those same empirical distributions. The 91-96% accuracy is therefore the goodness-of-fit of this optimization on the training data rather than an independent prediction or validation against externally measured headcount. This matches the fitted-input-called-prediction pattern: the 'estimated' quantities are the direct output of the fitting step whose success is evaluated on the identical data.
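The fitting objective discussed in this audit can be written compactly. The notation is an assumption for illustration: s indexes a segment, \hat p_s is the empirical queue-length distribution in that segment, and the direction of the divergence follows the audit's wording rather than a formula quoted from the paper.

```latex
\[
(\hat\lambda_s, \hat\mu_s, \hat c_s)
= \arg\min_{\lambda,\,\mu,\,c}\;
D_{\mathrm{KL}}\!\left(\hat p_s \,\middle\|\, p_{\lambda,\mu,c}\right),
\qquad
D_{\mathrm{KL}}\!\left(\hat p_s \,\middle\|\, p\right)
= \sum_{n \ge 0} \hat p_s(n)\,\log\frac{\hat p_s(n)}{p(n)} .
\]
```

Because the minimum is evaluated on the same \hat p_s that drives the fit, a small value shows that the model reproduces the data; turning it into an accuracy figure for personnel would need an independently measured head count.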

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on three fitted quantities per segment and two domain assumptions about queueing behavior and segmentation; no new physical entities are postulated.

free parameters (3)
  • segment-specific arrival rate
    Fitted per piecewise-stationary regime to minimize KL divergence to empirical queue-length distribution
  • segment-specific service rate
    Fitted per regime via the same KL minimization
  • segment-specific resource parameters (active personnel count and output rate per person)
    Obtained as the fitted values that make the model queue-length distribution match the data
axioms (2)
  • domain assumption: Vulnerability discovery and remediation can be modeled as a queueing process with time-varying arrival and service rates
    Core modeling abstraction stated in the abstract
  • domain assumption: Queue-length distributions admit piecewise-stationary regimes identifiable by Gaussian mixture modeling
    Used to segment the data before parameter fitting
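The segmentation assumption can be sketched with a small one-dimensional Gaussian mixture fitted by EM and used to label consecutive queue-length samples; runs of equal labels then act as candidate regimes. This is an illustrative stand-in, not the authors' procedure: the function names, the two-component choice, the initialization, and the toy series are all assumptions.

```python
import math


def fit_gmm_1d(xs, k=2, iters=100):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    lo, hi = min(xs), max(xs)
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs) or 1.0
    # Deterministic start: means spread evenly over the data range.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def label_runs(xs, weights, means, variances):
    """Assign each sample to its most likely component and merge equal neighbors."""
    def best_component(x):
        scores = [math.log(w) - 0.5 * math.log(v) - (x - m) ** 2 / (2 * v)
                  for w, m, v in zip(weights, means, variances)]
        return scores.index(max(scores))

    labels = [best_component(x) for x in xs]
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))  # (start, end_exclusive, label)
            start = i
    return runs


# Hypothetical queue-length series: quiet period, backlog period, quiet period.
queue_lengths = [2, 3, 2, 4, 3, 2, 3, 3, 20, 22, 19, 21, 23, 20, 21, 22, 3, 2, 3, 4]
weights, means, variances = fit_gmm_1d(queue_lengths)
runs = label_runs(queue_lengths, weights, means, variances)
print(runs)
```

On real data the number of components and the handling of short runs would need care; this toy series is deliberately well separated.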

pith-pipeline@v0.9.0 · 5524 in / 1639 out tokens · 66203 ms · 2026-05-10T15:33:37.028179+00:00 · methodology

