
arxiv: 2604.16038 · v1 · submitted 2026-04-17 · 💻 cs.CR

Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Pith reviewed 2026-05-10 08:14 UTC · model grok-4.3

classification 💻 cs.CR
keywords: vulnerability forecasting · Poisson regression · SARIMAX · sparse data · cyber threat intelligence · severity scores · count modeling · bursty events

The pith

Poisson regression models offer more stable forecasts than SARIMAX for sparse and bursty vulnerability sightings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether vulnerability sightings can be forecast with time-series methods, using severity scores from a transformer model as inputs. It finds that traditional SARIMAX approaches struggle with the sparse, short, and bursty nature of the data and often yield unrealistic results such as negative values. In contrast, Poisson regression on weekly aggregated counts produces more reliable and interpretable predictions. This matters because better forecasts could help teams anticipate cyber threats and prioritize responses in vulnerability management.

Core claim

Vulnerability sightings exhibit sparse and bursty patterns that standard autoregressive models like SARIMAX cannot adequately capture, leading to wide confidence intervals and invalid predictions. Poisson regression models, when applied to weekly aggregated data and augmented with severity scores derived from textual descriptions, yield more stable and interpretable forecasts. Simpler methods like exponential decay functions also offer practical alternatives for short-term horizons without needing extensive historical data.
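As a rough illustration of the count-based approach described above, the sketch below fits a Poisson regression with a log link to a short weekly series, using Newton-Raphson in plain Python. The weekly counts, the choice of the week index as the only covariate, and the function name `fit_poisson_trend` are assumptions made for this example; they are not the paper's data or code. In a model pooled across several vulnerabilities, a severity score could enter as an additional covariate.

```python
import math

def fit_poisson_trend(weeks, counts, max_iter=50, tol=1e-10):
    """Fit log(mu_t) = b0 + b1 * t by Newton-Raphson; returns (b0, b1)."""
    # Start at the log of the mean count with no trend.
    b0 = math.log(max(sum(counts) / len(counts), 1e-8))
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * t) for t in weeks]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * t for t, y, m in zip(weeks, counts, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(m * t for t, m in zip(weeks, mu))
        h11 = sum(m * t * t for t, m in zip(weeks, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * d = g in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical weekly sighting counts for one vulnerability (not from the paper).
weeks = list(range(8))
counts = [12, 9, 7, 4, 3, 1, 2, 0]

b0, b1 = fit_poisson_trend(weeks, counts)
fitted = [math.exp(b0 + b1 * t) for t in weeks]
# Forecast the next three weeks; exp() keeps every expected count positive.
forecast = [math.exp(b0 + b1 * t) for t in range(8, 11)]
print([round(f, 2) for f in forecast])
```

Because the mean is modeled on the log scale, the forecast can approach zero but never cross it, which is the property that distinguishes it from an unconstrained Gaussian forecast.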

What carries the argument

Comparison of SARIMAX time-series models and Poisson regression for modeling sighting counts, using VLAI-derived severity scores as exogenous variables.

If this is right

  • Aggregating to weekly counts improves stability for bursty sighting data.
  • Severity scores serve as useful exogenous inputs for better forecasts.
  • Exponential decay functions enable short-horizon estimates without long histories.
  • Predictive models can support vulnerability intelligence workflows under data constraints.
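The first and third points above can be sketched in a few lines. The example below groups individual sighting dates into Monday-based weeks (keeping empty weeks as zeros) and projects a count forward with an exponential decay. The sighting dates, the half-life parameterization, and the function names are assumptions for illustration; the paper's exact decay function is not reproduced here.

```python
import math
from collections import Counter
from datetime import date, timedelta

def week_start(d):
    """Return the Monday of the week that contains date d."""
    return d - timedelta(days=d.weekday())

def weekly_counts(sighting_dates):
    """Count sightings per week, keeping weeks without sightings as 0."""
    if not sighting_dates:
        return {}
    counts = Counter(week_start(d) for d in sighting_dates)
    week, last = min(counts), max(counts)
    series = {}
    while week <= last:
        series[week] = counts.get(week, 0)
        week += timedelta(weeks=1)
    return series

def decay_forecast(last_count, half_life_weeks, horizon):
    """Project last_count forward, assuming it halves every half_life_weeks."""
    rate = math.log(2) / half_life_weeks
    return [last_count * math.exp(-rate * h) for h in range(1, horizon + 1)]

# Hypothetical sighting dates for one vulnerability (not from the paper).
sightings = [
    date(2025, 10, 20), date(2025, 10, 20), date(2025, 10, 22),
    date(2025, 10, 26), date(2025, 11, 5),
]
series = weekly_counts(sightings)
print(list(series.values()))

# Hypothetical decay: start from 8 sightings per week with a two-week half-life.
projection = decay_forecast(8, half_life_weeks=2, horizon=4)
print([round(p, 2) for p in projection])
```

The decay projection needs only a starting level and one rate parameter, which is why it remains usable when a vulnerability has just a handful of observations.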

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Zero-inflated Poisson models could handle the high number of zero sightings more effectively.
  • Improved forecasts might allow security teams to prioritize patching for vulnerabilities likely to see activity soon.
  • The preference for count models suggests cyber events follow discrete rather than continuous dynamics.
  • Testing on multi-year data could validate if patterns persist across different threat landscapes.
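A quick way to judge whether a zero-inflated model might be worth trying is to compare the observed share of zero counts with the share a plain Poisson distribution with the same mean would imply, which is exp(-mean). The sketch below does this for a made-up series; the data and the function name `zero_excess` are hypothetical and not taken from the paper.

```python
import math

def zero_excess(counts):
    """Return (observed zero fraction, zero fraction implied by Poisson(mean))."""
    mean = sum(counts) / len(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected = math.exp(-mean)  # P(Y = 0) for a Poisson with this mean
    return observed, expected

# Hypothetical weekly counts: mostly silent, with one short burst.
counts = [0, 0, 0, 0, 0, 0, 0, 9, 3, 0, 0, 0]
observed, expected = zero_excess(counts)
print(f"observed zeros: {observed:.2f}, Poisson-implied zeros: {expected:.2f}")
```

An observed zero share well above the Poisson-implied one is a hint, not a test; it only suggests that a zero-inflated or overdispersed model deserves a formal comparison.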

Load-bearing premise

That severity scores derived from textual descriptions can serve as useful exogenous variables and that sighting counts are adequately described by standard Poisson or time-series assumptions despite sparsity.

What would settle it

Run both models on a held-out set of vulnerabilities and compare the mean absolute error of their forecasts against the actual sighting counts, while counting how often SARIMAX produces negative predictions.
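The comparison could be scored along the following lines. All numbers in this sketch are invented to show the mechanics and are not results from the paper; the two forecast lists merely stand in for the outputs of a Gaussian time-series model and a count model.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def count_negative(predicted):
    """Number of forecasts below zero, which are invalid for count data."""
    return sum(1 for p in predicted if p < 0)

# Made-up held-out counts and forecasts, for illustration only.
actual = [5, 3, 0, 1]
gaussian_like = [4.0, 1.5, -0.8, -0.2]   # stand-in for a SARIMAX forecast
count_like = [4.5, 2.5, 1.0, 0.5]        # stand-in for a Poisson forecast

for name, forecast in [("gaussian_like", gaussian_like), ("count_like", count_like)]:
    mae = mean_absolute_error(actual, forecast)
    print(name, round(mae, 3), "negative forecasts:", count_negative(forecast))
```

Reporting the number of negative forecasts next to the error metric keeps the validity problem visible even when the average error of the two models is similar.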

Figures

Figures reproduced from arXiv: 2604.16038 by Alexandre Dulaunoy, Cedric Bonhomme.

Figure 1: Observed sightings over time for CVE-2025-61932.
Figure 2: SARIMAX with log-transformed counts, without seasonal components.
Figure 3: Poisson regression.
Figure 4: Exponential decay.
    With a sufficient number of sightings, the Poisson regression typically produces results comparable to the exponential decay method, as illustrated in fig. 3 and fig. 4.
Figure 5: Logistic model.
    Fig. 6 was produced with the same logistic model, excluding all observations of the vulnerability after 2025-11-01, in order to assess the accuracy of its forecast.
Figure 6: Logistic model with sightings up to 2025-11-01.
Figure 7: Observed sightings over time for CVE-2025-59287.
Figure 8: SARIMAX with log-transformed counts, without seasonal components.
Figure 9: Poisson regression.
Figure 10: Poisson regression.
    Fig. 10 was generated using the same Poisson model as fig. 9, but with all observations after 2025-11-01 removed to evaluate the forecast's accuracy. We observe that the growth appears stronger in fig. 10, which is expected, but the final prediction overestimates the reality. Another observation is that a sudden drop in sightings collection is often not visible with the Poiss…
Figure 11: Exponential decay.
    The adaptive solution would have selected the exponential decay model.
Figure 12: Observed sightings over time for CVE-2022-26134.
Figure 13: Poisson regression.
Figure 14: Logistic model.
    While the decay model proves inadequate here, the logistic model appears largely unaffected by recent bursts of sightings for vulnerabilities monitored over an extended period (several months or years) with regular observations. We confirmed this behavior using sightings primarily sourced from the Shadowserver project.
Original abstract

Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals and sometimes unrealistic negative values. To better capture the discrete and event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, to estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper investigates forecasting sparse, bursty vulnerability sightings (e.g., PoC releases, discussions) using SARIMAX models (with/without log(x+1) transforms and VLAI severity scores as exogenous inputs) and count-based alternatives like Poisson regression. It claims SARIMAX is poorly suited, often yielding negative values and wide intervals, while Poisson regression (especially on weekly aggregates) produces more stable, interpretable forecasts; simpler exponential decay is also discussed for short horizons. The work builds on prior VLAI severity prediction and aims to provide practical guidance for cyber threat intelligence under data constraints.

Significance. If the empirical comparison holds after adding quantitative validation, the result would be useful for practitioners by demonstrating the mismatch between standard time-series tools and rare-event cyber data, while showing how severity scores from text models can be integrated as covariates. The emphasis on operational simplicity (e.g., decay functions) is a strength for real-world deployment where long histories are unavailable.

major comments (2)
  1. Abstract: the central claim that 'Poisson regression models produce more stable and interpretable forecasts' is unsupported by any reported metrics (MAE, RMSE, coverage, dispersion statistic, or comparison to negative binomial), error bars, or full experimental details, leaving the superiority over SARIMAX unverified.
  2. Abstract: no overdispersion diagnostics (e.g., variance-to-mean ratio or likelihood ratio test against negative binomial) are described despite the explicitly bursty nature of the counts; violation of the Poisson equidispersion assumption would bias standard errors and miscalibrate forecast intervals, directly undermining the stability claim.
minor comments (1)
  1. Abstract: the phrase 'limited improvements' from SARIMAX adjustments is imprecise; specify which performance aspect (bias, interval width, or predictive accuracy) was evaluated.
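The simplest of the overdispersion diagnostics the referee asks for, the variance-to-mean ratio, can be computed as sketched below. A Poisson distribution has a ratio of 1, so values well above 1 point toward a negative binomial or similar alternative. The series and the function name `dispersion_index` are hypothetical, not taken from the paper.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical bursty weekly counts for one vulnerability.
counts = [0, 0, 1, 0, 12, 7, 2, 0, 0, 1]
ratio = dispersion_index(counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

On short series this ratio is noisy, so it is better read as a screening statistic than as a substitute for a likelihood-ratio test against the negative binomial.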

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for quantitative validation and overdispersion checks. We agree that the original claims in the abstract required stronger empirical support and have revised the manuscript accordingly by adding the requested metrics, diagnostics, and model comparisons.

  1. Referee: Abstract: the central claim that 'Poisson regression models produce more stable and interpretable forecasts' is unsupported by any reported metrics (MAE, RMSE, coverage, dispersion statistic, or comparison to negative binomial), error bars, or full experimental details, leaving the superiority over SARIMAX unverified.

    Authors: We agree that the abstract's reference to 'early results' was insufficiently supported by quantitative evidence in the initial submission. In the revised manuscript we have expanded the experimental evaluation to report MAE, RMSE, and interval coverage for SARIMAX versus Poisson regression on both daily and weekly aggregates. We also include bootstrap-derived error bars, a direct comparison to negative binomial regression, and explicit counts of invalid negative forecasts produced by SARIMAX. These additions confirm that Poisson (and negative binomial) models yield lower error and more stable intervals, particularly after weekly aggregation, while SARIMAX frequently produces negative values and overly wide intervals.
    Revision: yes

  2. Referee: Abstract: no overdispersion diagnostics (e.g., variance-to-mean ratio or likelihood ratio test against negative binomial) are described despite the explicitly bursty nature of the counts; violation of the Poisson equidispersion assumption would bias standard errors and miscalibrate forecast intervals, directly undermining the stability claim.

    Authors: We acknowledge the omission of formal overdispersion diagnostics. The revised version now includes variance-to-mean ratios computed for each vulnerability sighting series and likelihood-ratio tests comparing Poisson to negative binomial specifications. Where moderate overdispersion is detected, we report that negative binomial regression further improves interval calibration without altering the overall finding that count-based models remain more stable and interpretable than SARIMAX. These diagnostics refine rather than contradict our recommendation for count-based approaches under data constraints.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison of standard models

The manuscript is an empirical study comparing off-the-shelf SARIMAX and Poisson regression models on sparse vulnerability sighting counts, with VLAI severity scores used only as exogenous inputs. No equations, derivations, or 'predictions' are presented that reduce by construction to fitted parameters or self-citations. The reference to prior VLAI work supplies an auxiliary feature and does not carry the central claim about model stability. The analysis therefore contains no load-bearing self-definition, fitted-input renaming, or uniqueness theorem imported from the authors' own prior results.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The modeling rests on standard statistical assumptions about time series and count data; no new entities are postulated.

free parameters (2)
  • SARIMAX order parameters
    p, d, q orders and seasonal terms chosen or fitted to the sighting series
  • VLAI severity scores
    Used as exogenous regressors; their scaling and inclusion decisions are model choices
axioms (1)
  • Domain assumption: vulnerability sighting counts follow distributions amenable to SARIMAX or Poisson modeling, possibly after transformation
    Invoked when choosing and evaluating the regression families for sparse bursty series
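One way to see why the two model families in this ledger behave differently on sparse counts is to write out their mean structure. The notation below ($y_t$ for the sighting count in period $t$, $s$ for the severity score, $\beta_0$, $\beta_1$, $\gamma$ for coefficients, $z_t$ for the transformed series) is assumed for illustration and may not match the paper's.

```latex
% Poisson regression with a log link: the expected count is always positive.
y_t \sim \mathrm{Poisson}(\mu_t), \qquad
\log \mu_t = \beta_0 + \beta_1 t + \gamma s
\;\Longrightarrow\;
\mu_t = e^{\beta_0 + \beta_1 t + \gamma s} > 0 .

% Gaussian time-series model fitted to z_t = \log(y_t + 1):
% the forecast \hat{z}_t is unbounded, so the back-transformed
% forecast can fall below zero.
\hat{y}_t = e^{\hat{z}_t} - 1 \in (-1, \infty), \qquad
\hat{y}_t < 0 \iff \hat{z}_t < 0 .
```

Without the log(x+1) transform the Gaussian forecast itself is unbounded below, so negative values can be larger in magnitude; with the transform they are confined to the interval between -1 and 0 but are still possible.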



Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

  1. Cédric Bonhomme and Alexandre Dulaunoy. Scoring vulnerabilities by leveraging activity data from the fediverse. In Cyber Threat Intelligence Conference, 2025.

  2. Cédric Bonhomme and Alexandre Dulaunoy. VLAI: A RoBERTa-based model for automated vulnerability severity classification, 2025.

  3. Jay Jacobs, Sasha Romanosky, Octavian Suciu, Benjamin Edwards, and Armin Sarabi. Enhancing vulnerability prioritization: Data-driven exploit predictions with community-driven insights, 2023.

  4. Éireann Leverett, Matilda Rhode, and Adam Wedgbury. Vulnerability forecasting: Theory and practice. Digital Threats, 3(4), March 2022.