Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment

Alim Igilik

arXiv: 2605.21437 · v1 · pith:PW3CDIIInew · submitted 2026-05-20 · ⚛️ physics.geo-ph · cs.LG · stat.ML

Pith reviewed 2026-05-21 02:30 UTC · model grok-4.3

classification ⚛️ physics.geo-ph · cs.LG · stat.ML

keywords seismicity forecasting, negative binomial regression, per-cell dispersion, spatial heterogeneity, overdispersion, tail risk, neural networks, probabilistic forecasting

The pith

A neural network learns a unique overdispersion parameter for each grid cell to forecast weekly earthquake counts and improve tail-risk alerts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard Poisson and negative binomial models for weekly seismicity assume one global dispersion value, yet this assumption fails for Central Asia earthquake data from 2010-2024. The paper introduces an architecture that computes a separate dispersion parameter per spatial cell from embeddings and a multilayer perceptron. This per-cell approach captures local differences in earthquake clustering and supports quantile-based probabilistic forecasts for risk assessment. Walk-forward tests over 2018-2023 show an 8.6 percent reduction in mean pinball deviation relative to a global-alpha negative binomial GLM baseline, with the strongest gains in weeks with five or more events.

Core claim

The EarthquakeNet architecture supplies an endogenous per-cell estimate of the negative binomial overdispersion parameter alpha through spatial embeddings plus an MLP, replacing the single global alpha used in prior negative binomial regression for seismological forecasting; the resulting distribution adapts to spatial heterogeneity in clustering and yields quantiles for risk-aware alerts.

What carries the argument

Per-cell overdispersion parameter alpha produced by a spatial-embedding MLP that replaces the uniform alpha of standard negative binomial regression.
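To make the mechanism concrete, here is a minimal sketch of a per-cell dispersion head, assuming a lookup-table embedding, one hidden ReLU layer, and a softplus output. The class name `DispersionHead`, the layer sizes, and the softplus activation are illustrative assumptions; the paper's actual layer layout and output activation are not stated in this summary. The weights are random and untrained, so the printed values carry no seismological meaning.

```python
import math
import random

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); positive for moderate inputs.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

class DispersionHead:
    """Hypothetical head: cell id -> embedding -> hidden layer -> alpha."""

    def __init__(self, n_cells: int, emb_dim: int = 4, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.emb = init(n_cells, emb_dim)   # one learned vector per grid cell
        self.w1 = init(hidden, emb_dim)     # hidden-layer weights
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def alpha(self, cell: int) -> float:
        e = self.emb[cell]
        # ReLU hidden layer applied to the cell embedding.
        h = [max(0.0, sum(w * x for w, x in zip(row, e)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Softplus keeps the dispersion estimate positive (an assumed choice).
        return softplus(sum(w * x for w, x in zip(self.w2, h)) + self.b2)

head = DispersionHead(n_cells=5)
alphas = [head.alpha(c) for c in range(5)]
print([round(a, 3) for a in alphas])
```

Because each cell has its own embedding vector, two cells can receive different alpha values even with identical input features, which is the property the per-cell formulation relies on.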

If this is right

Quantiles of the cell-specific negative binomial distribution can be used directly for probabilistic risk alerts.
Forecast accuracy improves most in the tail regime where weekly counts reach five or higher.
The model identifies spatial patterns in seismic clustering that a global dispersion parameter cannot resolve.
In walk-forward evaluation, mean pinball deviation drops by 8.6 percent and tail CRPS (Y >= 5) is 12.5 percent lower, both relative to a negative binomial GLM baseline.
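The quantile-based alert and the pinball loss mentioned above can be sketched as follows. This assumes the mean-dispersion (NB2) form with variance mu + alpha * mu^2, so that alpha -> 0 recovers Poisson. The function names, the mean of 1.5 events per week, and the two alpha values are hypothetical, and the paper's exact definition of mean pinball deviation may differ from a plain average of this loss.

```python
import math

def nb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of a negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # alpha > 0 assumed
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def nb_quantile(q: float, mu: float, alpha: float, y_max: int = 10_000) -> int:
    """Smallest count whose cumulative probability reaches level q."""
    cdf = 0.0
    for y in range(y_max + 1):
        cdf += math.exp(nb_log_pmf(y, mu, alpha))
        if cdf >= q:
            return y
    return y_max

def pinball(y: int, q_pred: float, q: float) -> float:
    """Pinball (quantile) loss of a predicted quantile q_pred at level q."""
    return q * (y - q_pred) if y >= q_pred else (1.0 - q) * (q_pred - y)

mu = 1.5  # hypothetical expected weekly count, identical for both cells
q90_low = nb_quantile(0.90, mu, alpha=0.5)   # weakly clustered cell
q90_high = nb_quantile(0.90, mu, alpha=4.0)  # strongly clustered cell
print(q90_low, q90_high)
print(pinball(7, q90_low, 0.90), pinball(7, q90_high, 0.90))
```

With the same mean, the more dispersed cell gets a higher 90 percent threshold, so an observed burst is penalized less under its forecast than under the low-dispersion one.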

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same per-cell dispersion idea could apply to other spatial count forecasting tasks where clustering strength varies by location.
High-dispersion cells identified by the model might serve as targets for denser monitoring networks.
Adding temporal features or known fault data to the embedding stage would likely refine the dispersion estimates further.

Load-bearing premise

Spatial embeddings plus a standard multilayer perceptron suffice to recover meaningful local dispersion values without explicit spatial covariance terms or extra geophysical covariates.

What would settle it

A map of the learned per-cell alpha values compared cell-by-cell with independent clustering statistics computed directly from historical event sequences in those same cells; systematic mismatch would falsify the claim that the network extracts genuine heterogeneity.
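One way to run such a comparison is sketched below: estimate alpha per cell by the method of moments, then rank-correlate it with the learned values. The counts and the "learned" alphas are made-up illustration data, not values from the paper, and the moment estimator assumes a constant mean within each cell, which the neural model does not.

```python
import math

def moment_alpha(counts):
    """Method-of-moments alpha from Var = mu + alpha * mu**2, floored at zero."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return max((v - m) / (m * m), 0.0) if m > 0 else 0.0

def ranks(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)

# Hypothetical weekly counts for four cells (illustration only).
cells = {
    "A": [0, 0, 1, 0, 2, 0, 1, 0],
    "B": [0, 0, 0, 6, 0, 0, 1, 1],
    "C": [1, 2, 1, 3, 2, 1, 2, 0],
    "D": [0, 0, 9, 0, 0, 0, 3, 0],
}
# Hypothetical learned alphas for the same cells (not from the paper).
learned = {"A": 1.6, "B": 3.4, "C": 1.2, "D": 5.1}

empirical = [moment_alpha(cells[c]) for c in cells]
rho = spearman(empirical, [learned[c] for c in cells])
print(round(rho, 3))
```

A rank correlation near zero or negative across many cells would point toward the mismatch described above; a high value would be consistent with the network tracking real heterogeneity, though it would not prove it.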

Figures

Figures reproduced from arXiv: 2605.21437 by Alim Igilik.

**Figure 1.** Overdispersion diagnostics following Theorem 3.15: the heavy-tailed marginal distribution of $Y$, the prevalence of local dispersion indices $D = \mathrm{Var}(Y)/\mathrm{E}(Y) > 1$, and the systematic deviation from the Poisson relation $\mathrm{Var} = \mathrm{E}$. Proof. The density of the random effect $\lambda^* \sim \mathrm{Gamma}(r, \beta)$ is $f_{\lambda^*}(\ell) = \frac{\beta^r}{\Gamma(r)}\, \ell^{r-1} e^{-\beta \ell}$, $\ell > 0$. By the law of total probability, $P(Y = y) = \int_0^\infty P(Y = y \mid \lambda^* = \ell)\, f_{\lambda^*}(\ell)\, d\ell$. …

**Figure 2.** Walk-forward MPD stability by test year (four systems). … (grid of 60 points): $\mathrm{LR} = 2(\log L_{\mathrm{NB}} - \log L_{\mathrm{Poisson}}) = 820.21$, $\hat{\alpha}_{\mathrm{MLE}} = 2.98$. Since $H_0$ corresponds to $\alpha = 0$, which lies on the boundary of the NB parameter space, the null distribution of the LR statistic under $H_0$ is the boundary mixture [10] $\tfrac{1}{2}\delta_0 + \tfrac{1}{2}\chi^2_1$, rather than $\chi^2_1$. The standard $\chi^2_1$ critical value would overstate significance; the bou…

**Figure 3.** Randomized PIT histograms. The red horizontal line indicates the expected level under uniformity. Both models exhibit PIT histograms close to uniform. The empirical moments are consistent with the theoretical targets: $\mathrm{E}[\mathrm{PIT}] \approx 0.5$ and $\mathrm{Var}(\mathrm{PIT}) \approx 0.084$, compared to the uniform reference of $1/12 \approx 0.083$. At the marginal level, Neural Poisson is marginally better ($L_1 = 0.00448$ vs. $0.00466$), indicating slight…

**Figure 4.** MPD by stratum and model (quartiles + $Y \geq 5$). … by adapting the dispersion to each cell's seismotectonic regime, the model assigns higher probability mass to large counts where the GLM, constrained to a global $\hat{\alpha} \approx 2.98$, systematically underestimates tail probabilities. Notably, Neural Poisson Enhanced achieves lower MPD than Hybrid DL NB in the $Y \geq 5$ stratum (26.426 vs. 25.643; wait, NB is better here), co…

**Figure 5.** Moran's I of Pearson residuals (red indicates significance at $p < 0.05$). 4.8 Audit of Parameter $\alpha$ and Identifiability. Global statistics of the predicted $\alpha$ for Hybrid_DL_Enhanced (seed 42): $n = 2448$, $\bar{\alpha} = 3.44$, $\mathrm{median}(\alpha) = 3.61$, $q_{0.1} = 1.63$, $q_{0.9} = 5.17$, $P(\alpha < 10^{-2}) = 0$. (a) Distribution of predicted $\alpha$. (b) Boxplot of $\alpha$ across 5 seeds.

**Figure 6.** Audit of the overdispersion parameter $\alpha$ for Hybrid DL NB Enhanced: marginal distribution and stability across independent seeds. The absence of a near-zero regime ($P(\alpha < 10^{-2}) = 0$ across all seeds) confirms that the network does not collapse to the Poisson limit ($\alpha \to 0$), as established theoretically in Proposition 3.17. The mean predicted $\bar{\alpha} = 3.44$ is consistent with the GLM profile-MLE estimate $\hat{\alpha}_{\mathrm{MLE}}$ = 2.…

Original abstract

Standard approaches to forecasting the weekly number of earthquakes on a spatial grid rely on the Poisson distribution with a single global dispersion assumption. We show that this assumption is systematically violated in seismic data from Central Asia (2010-2024), where a likelihood-ratio test with boundary correction strongly rejects the Poisson hypothesis (p < 10^{-179}). The main contribution of this work is the EarthquakeNet architecture, which provides an endogenous per-cell estimate of the overdispersion parameter alpha via a neural network (spatial embeddings + MLP), without explicit spatial covariance specification. In contrast to existing negative binomial regression approaches in seismological forecasting, which typically assume a single global alpha, the proposed per-cell formulation allows the model to identify spatial heterogeneity in seismic clustering and to construct probabilistic risk-aware alerts via quantiles of the predicted distribution. A walk-forward evaluation (2018-2023) over four systems shows an 8.6 percent reduction in mean pinball deviation (MPD) relative to a negative binomial GLM baseline. The strongest improvements are observed in the tail regime (Y >= 5), where the continuous ranked probability score (CRPS) of the proposed model is 12.5 percent lower than that of the baseline, indicating improved calibration in extreme-event forecasting.
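The boundary-corrected p-value quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the likelihood-ratio statistic LR = 820.21 from the Figure 2 excerpt and the half-and-half mixture of a point mass at zero and a chi-square with one degree of freedom. The function names are illustrative, and the survival function uses the identity P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.

```python
import math

def chi2_1_sf(x: float) -> float:
    """Survival function of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def boundary_lr_pvalue(lr: float) -> float:
    """p-value under the null mixture 0.5 * delta_0 + 0.5 * chi2_1.

    For lr > 0 the point mass at zero contributes nothing to the tail,
    so the p-value is half the chi2_1 tail probability.
    """
    return 1.0 if lr <= 0.0 else 0.5 * chi2_1_sf(lr)

lr = 820.21  # statistic quoted in the Figure 2 excerpt
p_boundary = boundary_lr_pvalue(lr)
p_unadjusted = chi2_1_sf(lr)
print(f"mixture p-value:    {p_boundary:.3e}")
print(f"unadjusted chi2_1:  {p_unadjusted:.3e}")
```

Under these assumptions the mixture p-value falls below 10^{-179}, in line with the figure reported in the abstract.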

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces EarthquakeNet, a neural architecture that uses spatial embeddings and an MLP to produce per-cell estimates of the negative binomial dispersion parameter alpha for weekly earthquake count forecasting on a spatial grid in Central Asia (2010-2024). It reports a strong likelihood-ratio rejection of the Poisson model (p < 10^{-179}) and, in walk-forward validation (2018-2023), an 8.6% reduction in mean pinball deviation and 12.5% lower tail CRPS (Y >= 5) relative to a negative binomial GLM baseline that assumes a single global alpha.

Significance. If the per-cell alpha estimates genuinely capture spatial heterogeneity in seismic clustering rather than collapsing or fitting noise, the approach could improve tail-risk calibration and probabilistic alert systems in seismology. The walk-forward design and emphasis on tail-specific metrics are appropriate; however, the central claim that the neural per-cell formulation drives the reported gains rests on the unverified premise that spatial embeddings alone suffice to recover meaningful dispersion variation without explicit covariance structure or geophysical covariates.

major comments (2)

[Section 3] Section 3 (EarthquakeNet architecture description): the claim that spatial embeddings plus a standard MLP recover distinct per-cell alpha values reflecting genuine clustering heterogeneity is not yet supported by direct evidence. Because the negative binomial likelihood couples the mean and dispersion parameters, and no spatial covariance (e.g., GP, convolutional layers) or auxiliary covariates (fault maps, strain rates) are included, the per-cell estimates risk non-identifiability or collapse to near-global values; this directly undermines the attribution of the 8.6% MPD and 12.5% tail-CRPS gains to the per-cell dispersion mechanism.
[Section 4] Results, walk-forward evaluation (Section 4): an ablation isolating the contribution of per-cell alpha is missing. A comparison against a neural model that retains per-cell means but enforces a single global alpha would be required to confirm that the observed improvements in MPD and tail CRPS arise from spatially varying dispersion rather than from the neural network's added flexibility in modeling the mean rate.

minor comments (2)

[Methods] The exact negative binomial parameterization (mean-dispersion vs. other forms) and the precise output activation used for alpha should be stated explicitly, together with any constraints applied to keep alpha positive.
[Results] A map or summary statistic (e.g., histogram, spatial autocorrelation) of the learned per-cell alpha values would help readers assess whether the estimates exhibit plausible spatial structure rather than random variation.
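The spatial-autocorrelation summary suggested in the last comment could be computed as Moran's I over the grid of learned alpha values. The sketch below assumes binary rook adjacency (cells sharing an edge) and uses two toy 3-by-4 grids; neither the weighting scheme nor the values come from the paper.

```python
def morans_i(grid):
    """Moran's I with binary rook-adjacency weights on a rectangular grid."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    num = 0.0      # sum of w_ij * (x_i - mean) * (x_j - mean)
    w_sum = 0      # sum of all weights (directed neighbor pairs)
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    num += (grid[i][j] - mean) * (grid[a][b] - mean)
                    w_sum += 1
    den = sum((v - mean) ** 2 for v in values)
    return (n / w_sum) * (num / den)

# Toy alpha maps (illustration only).
clustered = [[1, 1, 5, 5],
             [1, 1, 5, 5],
             [1, 1, 5, 5]]
checkerboard = [[1, 5, 1, 5],
                [5, 1, 5, 1],
                [1, 5, 1, 5]]

i_clustered = morans_i(clustered)
i_checker = morans_i(checkerboard)
print(round(i_clustered, 3), round(i_checker, 3))
```

A clearly positive value on the learned map would indicate that neighboring cells receive similar dispersion estimates; a value near zero would be what random, unstructured variation looks like.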

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on identifiability and the need for targeted ablations. These points help clarify the attribution of performance gains to the per-cell dispersion mechanism. We respond to each major comment below and indicate the revisions that will be incorporated.

Referee: Section 3 (EarthquakeNet architecture description): the claim that spatial embeddings plus a standard MLP recover distinct per-cell alpha values reflecting genuine clustering heterogeneity is not yet supported by direct evidence. Because the negative binomial likelihood couples the mean and dispersion parameters, and no spatial covariance (e.g., GP, convolutional layers) or auxiliary covariates (fault maps, strain rates) are included, the per-cell estimates risk non-identifiability or collapse to near-global values; this directly undermines the attribution of the 8.6% MPD and 12.5% tail-CRPS gains to the per-cell dispersion mechanism.

Authors: We acknowledge that direct evidence is required to demonstrate that the learned alphas are distinct and not collapsed. Although the negative binomial parameterization treats the mean rate mu and the dispersion alpha as separate outputs of the network (with variance = mu + alpha * mu^2, so that alpha -> 0 recovers the Poisson limit), the coupling through the likelihood does create a risk of non-identifiability when no explicit spatial structure is present. To address this concern, the revised manuscript will add: (i) a spatial map of the estimated per-cell alpha values, (ii) quantitative summary statistics (mean, variance, min/max) of alpha across the grid to show deviation from a single global value, and (iii) a brief discussion of how the per-cell formulation remains identifiable under the full likelihood when the data exhibit sufficient heterogeneity. These additions will provide the missing direct evidence. revision: yes
Referee: Results, walk-forward evaluation (Section 4): an ablation isolating the contribution of per-cell alpha is missing. A comparison against a neural model that retains per-cell means but enforces a single global alpha would be required to confirm that the observed improvements in MPD and tail CRPS arise from spatially varying dispersion rather than from the neural network's added flexibility in modeling the mean rate.

Authors: We agree that the current comparison to the GLM baseline does not fully isolate the effect of per-cell dispersion from the added flexibility of the neural mean model. In the revised manuscript we will add an ablation study that trains an otherwise identical neural architecture (same spatial embeddings and MLP for the mean) but replaces the per-cell alpha head with a single shared global alpha parameter. Results for mean pinball deviation and tail CRPS (Y >= 5) will be reported for this global-alpha neural variant alongside the original per-cell model and the GLM baseline. This will allow a direct assessment of whether the reported 8.6% MPD and 12.5% tail-CRPS improvements are attributable to spatially varying dispersion. revision: yes
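The bookkeeping for such a comparison can be sketched as follows, with the mean held fixed so that only the dispersion treatment differs. The sketch uses made-up counts for two cells, a grid search in place of neural training, and the NB2 form with variance mu + alpha * mu^2; all names are hypothetical. It scores in-sample, where the per-cell fit can never be worse than the global one, so it shows the mechanics only. The ablation described above would need held-out weeks to be informative.

```python
import math

def nb_nll(counts, mu, alpha):
    """Negative log-likelihood of counts under NB with mean mu, dispersion alpha."""
    r = 1.0 / alpha
    return -sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )

# Candidate alpha values from 0.05 to 10.0 (assumed search range).
GRID = [0.05 * k for k in range(1, 201)]

# Hypothetical weekly counts: one mildly and one strongly clustered cell.
cells = {
    "A": [0, 0, 1, 0, 2, 0, 1, 0],
    "D": [0, 0, 9, 0, 0, 0, 3, 0],
}
# Same mean model in both variants: the per-cell sample mean.
mus = {c: sum(v) / len(v) for c, v in cells.items()}

def best_alpha(names):
    """Grid value minimizing the summed NLL over the named cells."""
    return min(GRID, key=lambda a: sum(nb_nll(cells[c], mus[c], a) for c in names))

# Variant 1: one shared alpha for all cells.
global_alpha = best_alpha(list(cells))
nll_global = sum(nb_nll(cells[c], mus[c], global_alpha) for c in cells)

# Variant 2: a separate alpha for each cell.
per_cell_alpha = {c: best_alpha([c]) for c in cells}
nll_per_cell = sum(nb_nll(cells[c], mus[c], per_cell_alpha[c]) for c in cells)

print(round(global_alpha, 2), {c: round(a, 2) for c, a in per_cell_alpha.items()})
print(round(nll_global, 3), round(nll_per_cell, 3))
```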

Circularity Check

0 steps flagged

No circularity: results from out-of-sample walk-forward evaluation

The paper's central claims rest on a neural architecture (spatial embeddings + MLP) trained to produce per-cell negative binomial dispersion parameters, followed by explicit walk-forward validation (2018-2023) that computes MPD and tail CRPS against an independent negative binomial GLM baseline on held-out seismic counts. These metrics are not algebraically forced by the fitted parameters themselves; the likelihood-ratio rejection of global Poisson and the reported 8.6% / 12.5% gains are data-driven comparisons. No self-citation chain, uniqueness theorem, or ansatz is invoked to derive the per-cell alpha values or the risk quantiles; the architecture is presented as a modeling choice whose value is assessed externally. The derivation chain is therefore self-contained against the evaluation data and does not reduce to its inputs by construction.
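For readers unfamiliar with the protocol, a walk-forward split can be sketched as below. It assumes an expanding training window by calendar year with data starting in 2010 and test years 2018 to 2023. The paper's exact windowing (expanding versus rolling, yearly versus weekly refits) is not specified in this summary, so the function is a hypothetical illustration.

```python
def walk_forward_splits(first_year, test_years):
    """Yield (train_years, test_year) pairs with an expanding training window."""
    for test_year in test_years:
        # Train only on years strictly before the test year (no look-ahead).
        yield list(range(first_year, test_year)), test_year

splits = list(walk_forward_splits(2010, range(2018, 2024)))
for train_years, test_year in splits:
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```

Because every test year is scored by a model that never saw it, the reported metrics cannot be reproduced from the fitted parameters alone, which is the basis of the no-circularity finding.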

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The model rests on the domain assumption that negative binomial is the right family once overdispersion is acknowledged, plus a large number of fitted neural-network parameters that define the per-cell mapping; no new physical entities are postulated.

free parameters (1)

neural network parameters (embeddings and MLP weights)
All weights and biases of the spatial embedding layer and subsequent MLP are fitted to the training seismic counts to produce cell-specific alpha values.

axioms (1)

domain assumption: The negative binomial distribution adequately captures the overdispersion present in weekly earthquake counts once a per-cell alpha is supplied.
Invoked when replacing the rejected Poisson model and when constructing the predictive distributions for quantiles and CRPS.

invented entities (1)

EarthquakeNet · no independent evidence
purpose: Neural architecture that maps spatial embeddings to per-cell dispersion parameters.
New model name and structure introduced to implement the per-cell formulation.

pith-pipeline@v0.9.0 · 5760 in / 1477 out tokens · 41201 ms · 2026-05-21T02:30:52.501895+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1] Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401), 9–27.

[2] Helmstetter, A., & Sornette, D. (2002). Subcritical and supercritical regimes in epidemic models of earthquake aftershocks. Journal of Geophysical Research: Solid Earth, 107(B10), ESE 10-1–ESE 10-21.

[3] Zhuang, J. (2011). Next-day earthquake forecasts for the Japan region generated by the ETAS model. Earth, Planets and Space, 63(3), 207–216.

[4] Cameron, A.C., & Trivedi, P.K. (2013). Regression Analysis of Count Data (2nd ed.). Cambridge University Press.

[5] Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics, 15(3), 209–225.

[6] Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M., & Song, L. (2016). Recurrent marked temporal point processes: Embedding event history to vector. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1555–1564).

[7] Mei, H., & Eisner, J.M. (2017). The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems (Vol. 30).

[8] Shchur, O., Türkmen, A.C., Januschowski, T., & Günnemann, S. (2020). Intensity-free learning of temporal point processes. In International Conference on Learning Representations.

[9] Wiemer, S., & Wyss, M. (2000). Minimum magnitude of completeness in earthquake catalogs: Examples from Alaska, the western United States, and Japan. Bulletin of the Seismological Society of America, 90(4), 859–869.

[10] Self, S.G., & Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398), 605–610.

[11] DeVries, P.M.R., Viégas, F., Wattenberg, M., & Meade, B.J. (2018). Deep learning of aftershock patterns following large earthquakes. Nature, 560, 632–634.

[12] Mignan, A., & Broccardo, M. (2019). One neuron versus deep learning in aftershock prediction. Nature, 574, E1–E3.

[13] Cliff, A.D., & Ord, J.K. (1981). Spatial Processes: Models & Applications. Pion.

[14] Czado, C., Gneiting, T., & Held, L. (2009). Predictive model assessment for count data. Biometrics, 65(4), 1254–1261.
