Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment
Pith reviewed 2026-05-21 02:30 UTC · model grok-4.3
The pith
A neural network learns a unique overdispersion parameter for each grid cell to forecast weekly earthquake counts and improve tail-risk alerts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The EarthquakeNet architecture supplies an endogenous per-cell estimate of the negative binomial overdispersion parameter alpha through spatial embeddings plus an MLP, replacing the single global alpha used in prior negative binomial regression for seismological forecasting; the resulting distribution adapts to spatial heterogeneity in clustering and yields quantiles for risk-aware alerts.
What carries the argument
Per-cell overdispersion parameter alpha produced by a spatial-embedding MLP that replaces the uniform alpha of standard negative binomial regression.
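A minimal sketch of what such a dispersion head could look like, assuming one learned embedding per grid cell, a single hidden ReLU layer, and a softplus output that keeps alpha positive. None of these choices is confirmed by the abstract: the class name `AlphaHead`, the layer sizes, and the random untrained weights are hypothetical and only illustrate the data flow from cell index to alpha.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); non-negative for every real x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class AlphaHead:
    """Hypothetical per-cell dispersion head: embedding -> hidden ReLU layer -> softplus."""

    def __init__(self, n_cells, emb_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        g = lambda: rng.gauss(0.0, 0.5)  # untrained placeholder weights
        self.emb = [[g() for _ in range(emb_dim)] for _ in range(n_cells)]
        self.w1 = [[g() for _ in range(emb_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [g() for _ in range(hidden)]
        self.b2 = 0.0

    def alpha(self, cell):
        e = self.emb[cell]  # spatial embedding of this grid cell
        h = [max(0.0, sum(w * x for w, x in zip(row, e)) + b)
             for row, b in zip(self.w1, self.b1)]
        raw = sum(w * x for w, x in zip(self.w2, h)) + self.b2
        return softplus(raw) + 1e-6  # small floor keeps alpha strictly positive


head = AlphaHead(n_cells=5)
alphas = [head.alpha(c) for c in range(5)]
```

In a trained model the embeddings and weights would be fitted jointly with the mean head under the negative binomial likelihood; here they are random, so the values carry no seismological meaning.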
If this is right
- Quantiles of the cell-specific negative binomial distribution can be used directly for probabilistic risk alerts.
- Forecast accuracy improves most in the tail regime where weekly counts reach five or higher.
- The model identifies spatial patterns in seismic clustering that a global dispersion parameter cannot resolve.
- Relative to a negative binomial GLM baseline, mean pinball deviation drops by 8.6 percent in the walk-forward evaluation, and CRPS is 12.5 percent lower in the tail regime.
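The quantile-based alerts mentioned in the list above can be sketched as follows. The code assumes the mean-dispersion form quoted in the simulated rebuttal (variance = mu + mu^2/alpha); the function names, the 0.95 level, and the alert rule itself are hypothetical illustrations, not the paper's procedure.

```python
import math


def nb_logpmf(y, mu, alpha):
    # Assumed parameterization: Var[Y] = mu + mu**2 / alpha (alpha acts as the NB size).
    p = alpha / (alpha + mu)
    return (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(p) + y * math.log1p(-p))


def nb_quantile(q, mu, alpha, y_max=10_000):
    # Smallest count y whose cumulative probability reaches q.
    cdf = 0.0
    for y in range(y_max + 1):
        cdf += math.exp(nb_logpmf(y, mu, alpha))
        if cdf >= q:
            return y
    return y_max


def pinball(y, q_pred, q):
    # Pinball (quantile) loss of the predicted quantile q_pred at level q.
    diff = y - q_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def alert(mu, alpha, level=0.95, threshold=5):
    # Hypothetical rule: alert when the upper quantile reaches the tail threshold.
    return nb_quantile(level, mu, alpha) >= threshold


# Two cells with the same mean but different (made-up) dispersion values.
q_bursty = nb_quantile(0.95, mu=2.0, alpha=0.5)
q_steady = nb_quantile(0.95, mu=2.0, alpha=50.0)
```

With equal means, the cell with the smaller alpha has the heavier tail and therefore the higher upper quantile, which is why a single global alpha would issue the same alert for both cells.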
Where Pith is reading between the lines
- The same per-cell dispersion idea could apply to other spatial count forecasting tasks where clustering strength varies by location.
- High-dispersion cells identified by the model might serve as targets for denser monitoring networks.
- Adding temporal features or known fault data to the embedding stage would likely refine the dispersion estimates further.
Load-bearing premise
Spatial embeddings plus a standard multilayer perceptron suffice to recover meaningful local dispersion values without explicit spatial covariance terms or extra geophysical covariates.
What would settle it
A map of the learned per-cell alpha values compared cell-by-cell with independent clustering statistics computed directly from historical event sequences in those same cells; systematic mismatch would falsify the claim that the network extracts genuine heterogeneity.
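One way such a cell-by-cell comparison could be set up is sketched below. It uses the variance-to-mean ratio of historical weekly counts as the independent clustering statistic and compares its ranking with the ratio implied by the learned alpha, assuming variance = mu + mu^2/alpha as quoted in the simulated rebuttal. The choice of statistic, the use of Spearman rank correlation, and all numbers are illustrative assumptions; the counts and alpha values are made up.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    # Sample variance over mean; close to 1 for Poisson-like counts, above 1 when bursty.
    m = mean(counts)
    return variance(counts) / m if m > 0 else float("nan")


def ranks(values):
    # Ranks starting at 1, with tied values sharing their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def spearman(x, y):
    # Pearson correlation of the ranks (undefined if either input is constant).
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up weekly counts for three cells and made-up learned alpha values.
history = [[0, 1, 0, 2, 1, 0, 1, 1], [0, 0, 9, 0, 0, 1, 0, 6], [3, 2, 4, 3, 2, 3, 4, 3]]
learned_alpha = [5.0, 0.3, 30.0]

empirical = [dispersion_index(c) for c in history]
implied = [1.0 + mean(c) / a for c, a in zip(history, learned_alpha)]  # Var/mean = 1 + mu/alpha
rho = spearman(empirical, implied)
```

A rank correlation near zero or negative on real data would be the kind of systematic mismatch described above; a strongly positive one would support the heterogeneity claim without proving it.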
Original abstract
Standard approaches to forecasting the weekly number of earthquakes on a spatial grid rely on the Poisson distribution with a single global dispersion assumption. We show that this assumption is systematically violated in seismic data from Central Asia (2010-2024), where a likelihood-ratio test with boundary correction strongly rejects the Poisson hypothesis (p < 10^{-179}). The main contribution of this work is the EarthquakeNet architecture, which provides an endogenous per-cell estimate of the overdispersion parameter alpha via a neural network (spatial embeddings + MLP), without explicit spatial covariance specification. In contrast to existing negative binomial regression approaches in seismological forecasting, which typically assume a single global alpha, the proposed per-cell formulation allows the model to identify spatial heterogeneity in seismic clustering and to construct probabilistic risk-aware alerts via quantiles of the predicted distribution. A walk-forward evaluation (2018-2023) over four systems shows an 8.6 percent reduction in mean pinball deviation (MPD) relative to a negative binomial GLM baseline. The strongest improvements are observed in the tail regime (Y >= 5), where the continuous ranked probability score (CRPS) of the proposed model is 12.5 percent lower than that of the baseline, indicating improved calibration in extreme-event forecasting.
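The boundary-corrected likelihood-ratio test mentioned in the abstract can be illustrated with a short sketch. It assumes the correction is the standard one for a single parameter tested on the boundary of its space, where the null distribution of the statistic is an equal mixture of a point mass at zero and a chi-square with one degree of freedom. The abstract does not spell out the correction, so this is an assumption, and the function name is hypothetical.

```python
import math


def boundary_lr_pvalue(loglik_poisson, loglik_nb):
    # LR statistic for Poisson (null) against negative binomial (alternative).
    lr = max(0.0, 2.0 * (loglik_nb - loglik_poisson))
    if lr == 0.0:
        return 1.0
    # Survival function of chi-square(1) is erfc(sqrt(x / 2)); the mixture halves it.
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))


# Sanity check at the familiar chi-square(1) 5% critical value (3.8415):
# the boundary-corrected p-value is half of the naive 0.05.
p_at_critical = boundary_lr_pvalue(0.0, 3.8415 / 2.0)
```

The correction matters mostly near the decision threshold; for a rejection as extreme as the one reported, halving the p-value does not change the conclusion.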
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EarthquakeNet, a neural architecture that uses spatial embeddings and an MLP to produce per-cell estimates of the negative binomial dispersion parameter alpha for weekly earthquake count forecasting on a spatial grid in Central Asia (2010-2024). It reports a strong likelihood-ratio rejection of the Poisson model (p < 10^{-179}) and, in walk-forward validation (2018-2023), an 8.6% reduction in mean pinball deviation and 12.5% lower tail CRPS (Y >= 5) relative to a negative binomial GLM baseline that assumes a single global alpha.
Significance. If the per-cell alpha estimates genuinely capture spatial heterogeneity in seismic clustering rather than collapsing or fitting noise, the approach could improve tail-risk calibration and probabilistic alert systems in seismology. The walk-forward design and emphasis on tail-specific metrics are appropriate; however, the central claim that the neural per-cell formulation drives the reported gains rests on the unverified premise that spatial embeddings alone suffice to recover meaningful dispersion variation without explicit covariance structure or geophysical covariates.
major comments (2)
- [Section 3] Section 3 (EarthquakeNet architecture description): the claim that spatial embeddings plus a standard MLP recover distinct per-cell alpha values reflecting genuine clustering heterogeneity is not yet supported by direct evidence. Because the negative binomial likelihood couples the mean and dispersion parameters, and no spatial covariance (e.g., GP, convolutional layers) or auxiliary covariates (fault maps, strain rates) are included, the per-cell estimates risk non-identifiability or collapse to near-global values; this directly undermines the attribution of the 8.6% MPD and 12.5% tail-CRPS gains to the per-cell dispersion mechanism.
- [Section 4] Results, walk-forward evaluation (Section 4): an ablation isolating the contribution of per-cell alpha is missing. A comparison against a neural model that retains per-cell means but enforces a single global alpha would be required to confirm that the observed improvements in MPD and tail CRPS arise from spatially varying dispersion rather than from the neural network's added flexibility in modeling the mean rate.
minor comments (2)
- [Methods] The exact negative binomial parameterization (mean-dispersion vs. other forms) and the precise output activation used for alpha should be stated explicitly, together with any constraints applied to keep alpha positive.
- [Results] A map or summary statistic (e.g., histogram, spatial autocorrelation) of the learned per-cell alpha values would help readers assess whether the estimates exhibit plausible spatial structure rather than random variation.
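As an illustration of the spatial-autocorrelation summary suggested in the last comment, the sketch below computes Moran's I for a rectangular grid of alpha values with rook (edge-sharing) neighbours and binary weights. The neighbourhood definition and both example grids are assumptions made for illustration; the paper's grid geometry is not given in the abstract.

```python
def morans_i(grid):
    # Moran's I with binary rook-adjacency weights on a rectangular grid.
    rows, cols = len(grid), len(grid[0])
    vals = [v for row in grid for v in row]
    n = len(vals)
    xbar = sum(vals) / n
    denom = sum((v - xbar) ** 2 for v in vals)  # zero if the grid is constant
    num, w_sum = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - xbar) * (grid[rr][cc] - xbar)
                    w_sum += 1
    return (n / w_sum) * num / denom


# Made-up alpha maps: a smooth north-south gradient and a checkerboard.
smooth = [[float(r)] * 4 for r in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]

i_smooth = morans_i(smooth)
i_checker = morans_i(checker)
```

Values well above zero indicate that neighbouring cells have similar alpha, which is what plausible spatial structure would look like; values near zero are consistent with the random variation the referee is concerned about.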
Simulated Author's Rebuttal
We thank the referee for the constructive comments on identifiability and the need for targeted ablations. These points help clarify the attribution of performance gains to the per-cell dispersion mechanism. We respond to each major comment below and indicate the revisions that will be incorporated.
- Referee: Section 3 (EarthquakeNet architecture description): the claim that spatial embeddings plus a standard MLP recover distinct per-cell alpha values reflecting genuine clustering heterogeneity is not yet supported by direct evidence. Because the negative binomial likelihood couples the mean and dispersion parameters, and no spatial covariance (e.g., GP, convolutional layers) or auxiliary covariates (fault maps, strain rates) are included, the per-cell estimates risk non-identifiability or collapse to near-global values; this directly undermines the attribution of the 8.6% MPD and 12.5% tail-CRPS gains to the per-cell dispersion mechanism.
Authors: We acknowledge that direct evidence is required to demonstrate that the learned alphas are distinct and not collapsed. Although the negative binomial parameterization treats the mean rate and dispersion alpha as separate outputs of the network (with variance = mu + mu^2/alpha), the coupling through the likelihood does create a risk of non-identifiability when no explicit spatial structure is present. To address this concern, the revised manuscript will add: (i) a spatial map of the estimated per-cell alpha values, (ii) quantitative summary statistics (mean, variance, min/max) of alpha across the grid to show deviation from a single global value, and (iii) a brief discussion of how the per-cell formulation remains identifiable under the full likelihood when the data exhibit sufficient heterogeneity. These additions will provide the missing direct evidence. revision: yes
- Referee: Results, walk-forward evaluation (Section 4): an ablation isolating the contribution of per-cell alpha is missing. A comparison against a neural model that retains per-cell means but enforces a single global alpha would be required to confirm that the observed improvements in MPD and tail CRPS arise from spatially varying dispersion rather than from the neural network's added flexibility in modeling the mean rate.
Authors: We agree that the current comparison to the GLM baseline does not fully isolate the effect of per-cell dispersion from the added flexibility of the neural mean model. In the revised manuscript we will add an ablation study that trains an otherwise identical neural architecture (same spatial embeddings and MLP for the mean) but replaces the per-cell alpha head with a single shared global alpha parameter. Results for mean pinball deviation and tail CRPS (Y >= 5) will be reported for this global-alpha neural variant alongside the original per-cell model and the GLM baseline. This will allow a direct assessment of whether the reported 8.6% MPD and 12.5% tail-CRPS improvements are attributable to spatially varying dispersion. revision: yes
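The scoring side of such an ablation can be sketched as follows, assuming the count-data form of CRPS (the sum over k of the squared difference between the forecast CDF and the step function at the observation) and the variance = mu + mu^2/alpha parameterization quoted earlier in the rebuttal. The observations, means, and alpha values are made up; only the mechanics of comparing a per-cell and a global-alpha variant on tail weeks (Y >= 5) are shown.

```python
import math


def nb_pmf(k, mu, alpha):
    # Assumed parameterization: Var[Y] = mu + mu**2 / alpha.
    p = alpha / (alpha + mu)
    return math.exp(math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
                    + alpha * math.log(p) + k * math.log1p(-p))


def crps_count(y, mu, alpha, k_max=500):
    # CRPS for an integer outcome, truncated at k_max: sum_k (F(k) - 1{y <= k})^2.
    cdf, total = 0.0, 0.0
    for k in range(k_max + 1):
        cdf += nb_pmf(k, mu, alpha)
        total += (cdf - (1.0 if y <= k else 0.0)) ** 2
    return total


def tail_crps(obs, mus, alphas, threshold=5):
    # Mean CRPS over observations in the tail regime only.
    scores = [crps_count(y, m, a) for y, m, a in zip(obs, mus, alphas) if y >= threshold]
    return sum(scores) / len(scores) if scores else float("nan")


# Made-up held-out week: same predicted means, two dispersion variants.
obs = [0, 7, 1, 9]
mus = [1.0, 2.0, 1.0, 3.0]
per_cell_alpha = [10.0, 0.5, 10.0, 0.4]
global_alpha = [2.0] * len(obs)

score_per_cell = tail_crps(obs, mus, per_cell_alpha)
score_global = tail_crps(obs, mus, global_alpha)
```

Holding `mus` fixed across the two variants is the point of the design: any difference between the two scores then comes from the dispersion head alone.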
Circularity Check
No circularity: results from out-of-sample walk-forward evaluation
The paper's central claims rest on a neural architecture (spatial embeddings + MLP) trained to produce per-cell negative binomial dispersion parameters, followed by explicit walk-forward validation (2018-2023) that computes MPD and tail CRPS against an independent negative binomial GLM baseline on held-out seismic counts. These metrics are not algebraically forced by the fitted parameters themselves; the likelihood-ratio rejection of global Poisson and the reported 8.6% / 12.5% gains are data-driven comparisons. No self-citation chain, uniqueness theorem, or ansatz is invoked to derive the per-cell alpha values or the risk quantiles; the architecture is presented as a modeling choice whose value is assessed externally. The derivation chain is therefore self-contained against the evaluation data and does not reduce to its inputs by construction.
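The out-of-sample property this check relies on can be made concrete with a small sketch of walk-forward folds. It assumes an expanding training window that starts in 2010 and is refitted once per test year from 2018 to 2023; the abstract gives only the year ranges, so the yearly refit and the expanding window are assumptions, and the function name is hypothetical.

```python
def walk_forward_folds(first_year, first_test_year, last_test_year):
    # Each fold trains on all years strictly before the test year.
    for test_year in range(first_test_year, last_test_year + 1):
        train_years = list(range(first_year, test_year))
        yield train_years, test_year


folds = list(walk_forward_folds(2010, 2018, 2023))

for train_years, test_year in folds:
    # No test year ever appears in its own training window.
    assert test_year not in train_years
```

Because every score is computed on a year the model has not seen, the reported metrics cannot be reproduced from the fitted parameters alone.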
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network parameters (embeddings and MLP weights)
axioms (1)
- domain assumption: Negative binomial distribution adequately captures the overdispersion present in weekly earthquake counts once a per-cell alpha is supplied.
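For reference, the negative binomial form implied by the variance quoted in the simulated rebuttal (variance = mu + mu^2/alpha) can be written out as below. This is a reconstruction from that one stated formula, not the paper's confirmed parameterization.

```latex
P(Y = y \mid \mu, \alpha)
  = \frac{\Gamma(y + \alpha)}{\Gamma(\alpha)\, y!}
    \left(\frac{\alpha}{\alpha + \mu}\right)^{\alpha}
    \left(\frac{\mu}{\alpha + \mu}\right)^{y},
\qquad y = 0, 1, 2, \dots

\mathbb{E}[Y] = \mu, \qquad
\operatorname{Var}[Y] = \mu + \frac{\mu^{2}}{\alpha}, \qquad
\lim_{\alpha \to \infty} \operatorname{Var}[Y] = \mu \quad \text{(Poisson limit)}
```

In this form a smaller alpha means stronger overdispersion. The other common convention, variance = mu + alpha * mu^2, reverses the direction, which is one reason the referee's request for an explicit parameterization matters when reading a map of alpha values.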
invented entities (1)
- EarthquakeNet: no independent evidence