Earthquake magnitudes depend on seismic history, as revealed by a neural network analysis
Pith reviewed 2026-05-23 22:14 UTC · model grok-4.3
The pith
Earthquake magnitudes carry information from past seismic activity, allowing better predictions than the standard Gutenberg-Richter model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MAGNET, a multi-encoder neural network with LSTM units, ingests spatiotemporal patterns from seismic catalogs and outputs magnitude distributions that outperform the time-independent Gutenberg-Richter model by an average of 0.07 bits per earthquake. The advantage persists across three regional catalogs after explicit controls for detection artifacts. These outcomes establish that standard hypocenter data carry measurable information about future magnitudes, contradicting the separability assumption that underpins most operational earthquake forecasts.
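The headline figure is an information gain measured in bits per earthquake. This page does not give the paper's exact definition, so the sketch below assumes the common one: the mean difference in log2-likelihood between the model's predicted magnitude density and the Gutenberg-Richter density, evaluated at the observed magnitudes. The function names, the b-value, and the completeness magnitude `m_c` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gr_log2_density(m, b=1.0, m_c=2.0):
    """log2 of the GR density p(m) = beta * exp(-beta * (m - m_c)) for m >= m_c.

    b and m_c are placeholder values chosen for illustration only.
    """
    beta = b * np.log(10.0)
    m = np.asarray(m, dtype=float)
    return (np.log(beta) - beta * (m - m_c)) / np.log(2.0)

def information_gain_bits(model_log2_density, benchmark_log2_density):
    """Mean per-event log2-likelihood difference (model minus benchmark)."""
    model = np.asarray(model_log2_density, dtype=float)
    bench = np.asarray(benchmark_log2_density, dtype=float)
    return float(np.mean(model - bench))

# Usage: a model identical to the benchmark gains nothing.
observed = np.array([2.1, 2.4, 3.0, 2.2, 4.1])
baseline = gr_log2_density(observed)
gain = information_gain_bits(baseline, baseline)
```

Under this definition a positive gain means the model assigns, on average, higher probability density to the magnitudes that actually occurred than the time-independent benchmark does.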
What carries the argument
MAGNET, a multi-encoder neural network with LSTM units that ingests hypocenter locations, occurrence times, and past magnitudes to produce history-dependent magnitude probability distributions.
If this is right
- Magnitude forecasts can be improved by conditioning on the preceding sequence of events rather than treating magnitudes as independent draws.
- The separability assumption between occurrence times, locations, and magnitudes does not hold in the examined catalogs.
- Seismic hazard models can incorporate magnitude predictions that vary with recent seismic history.
- The information gain remains detectable after standard controls for catalog artifacts in multiple independent regions.
Where Pith is reading between the lines
- Point-process models of seismicity could be extended to include explicit magnitude-history coupling terms.
- The same neural architecture might be tested on other catalogs or on laboratory earthquake data to check generality.
- If the dependence is physical, it would constrain possible mechanisms that link stress history to rupture size.
Load-bearing premise
The measured information gain arises from real physical dependence on seismic history rather than from residual catalog incompleteness, detection thresholds, or model overfitting.
What would settle it
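Applying the same model to a fully complete synthetic catalog generated strictly under the time-independent Gutenberg-Richter assumption would test the central claim. A gain near zero on such a catalog would show that the model does not manufacture spurious signal, whereas a gain comparable to the reported 0.07 bits would indicate that the advantage on real catalogs could be a methodological artifact.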
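A minimal sketch of such a null experiment follows. It is not the authors' procedure: it draws independent GR magnitudes, optionally removes small events in a short window after each large event to imitate short-term incompleteness, and then measures a crude history-dependence proxy (mean magnitude of the event following a large one, minus the catalog mean). All function names, thresholds, and the event-count window are hypothetical choices for illustration; a real test would use time-based windows and the full forecasting model.

```python
import numpy as np

def gr_magnitudes(n, b=1.0, m_c=2.0, rng=None):
    """Draw n independent magnitudes from a GR law above m_c (assumed values)."""
    rng = np.random.default_rng() if rng is None else rng
    beta = b * np.log(10.0)
    return m_c + rng.exponential(scale=1.0 / beta, size=n)

def apply_short_term_incompleteness(mags, big=4.0, raised_mc=3.0, window=5):
    """Drop events below raised_mc among the `window` events after any event >= big."""
    mags = np.asarray(mags, dtype=float)
    keep = np.ones(len(mags), dtype=bool)
    blocked = 0  # number of upcoming events still subject to the raised threshold
    for i, m in enumerate(mags):
        if blocked > 0 and m < raised_mc:
            keep[i] = False
        blocked = max(blocked - 1, 0)
        if m >= big:
            blocked = window
    return mags[keep]

def mean_shift_after_large(mags, big=4.0):
    """Mean magnitude of events directly following a large event, minus the catalog mean."""
    mags = np.asarray(mags, dtype=float)
    follows_big = mags[:-1] >= big
    return float(mags[1:][follows_big].mean() - mags.mean())

rng = np.random.default_rng(0)
complete = gr_magnitudes(200_000, rng=rng)
thinned = apply_short_term_incompleteness(complete)
shift_complete = mean_shift_after_large(complete)  # expected to be close to zero
shift_thinned = mean_shift_after_large(thinned)    # expected to be clearly positive
```

The point of the sketch is that a catalog with strictly history-independent magnitudes can still show apparent history dependence once detection gaps are imposed, which is why the null has to include realistic incompleteness and not only a complete catalog.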
Original abstract
Earthquake occurrence is notoriously difficult to predict. While some aspects of their spatiotemporal statistics can be relatively well captured by point-process models, very little is known regarding the magnitude of future events, and it is deeply debated whether it is possible to predict the magnitude of an earthquake before it starts. Most operational forecasting models assume that earthquake magnitudes follow a time-independent Gutenberg-Richter (GR) distribution, effectively treating magnitudes as independent of seismic history. We address this fundamental question by demonstrating that standard hypocenter catalogs carry information about future earthquake magnitudes, making them more predictable than previously considered. We present MAGNET (MAGnitude Neural EsTimation model), which uses a multi-encoder neural network architecture with LSTM units to process spatiotemporal patterns in seismic history. By analyzing hypocenter locations, occurrence times, and magnitudes of past events, MAGNET generates probabilistic magnitude forecasts that demonstrate information gains in predicting magnitudes of future events over GR-based models, after controlling for detection artifacts. Our model achieves an information gain of approximately 0.07 bit per earthquake on average over the GR benchmark in Southern California, Japan, and New Zealand catalogs, with this advantage persisting. These results demonstrate that hypocentral earthquake catalogs contain extractable information about future magnitudes, challenging the conventional separability assumption in earthquake forecasting and offering new approaches for seismic hazard assessment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MAGNET, a multi-encoder LSTM neural network that ingests hypocenter times, locations, and past magnitudes to produce probabilistic forecasts of future earthquake magnitudes. It reports an average information gain of ~0.07 bits per event over the time-independent Gutenberg-Richter benchmark across Southern California, Japan, and New Zealand catalogs, after stated controls for detection artifacts, and concludes that standard catalogs contain extractable information about magnitude dependence on seismic history.
Significance. If the result is robust, the modest but consistent gain would challenge the conventional assumption that magnitudes are independent of seismic history, with direct implications for point-process forecasting and hazard assessment. Credit is due for testing three independent regional catalogs and for attempting artifact controls; however, the small effect size makes the finding sensitive to unstated methodological choices.
major comments (2)
- [Methods (artifact controls)] The controls for detection artifacts (described in the abstract and the methods section on catalog processing) do not include an end-to-end synthetic null test that injects only realistic, spatially/temporally varying detection thresholds and incompleteness while keeping magnitudes strictly GR-distributed and independent of history. Without this test the 0.07-bit gain cannot be confidently attributed to physical history dependence rather than residual catalog artifacts.
- [Methods (model training and evaluation)] No details are provided on training/validation splits, hyperparameter search procedure, or quantitative checks that the reported gain survives alternative controls or different random seeds. Given the small effect size and the number of free parameters in the LSTM architecture, these omissions leave the central claim vulnerable to overfitting or data-processing artifacts.
minor comments (1)
- [Abstract] The abstract states the gain 'persists' but does not specify the time window or catalog subset over which persistence is measured; a brief clarification would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the opportunity to improve the manuscript. We address each major comment below and will revise the paper to incorporate the requested methodological details and tests.
- Referee: [Methods (artifact controls)] The controls for detection artifacts (described in the abstract and the methods section on catalog processing) do not include an end-to-end synthetic null test that injects only realistic, spatially/temporally varying detection thresholds and incompleteness while keeping magnitudes strictly GR-distributed and independent of history. Without this test the 0.07-bit gain cannot be confidently attributed to physical history dependence rather than residual catalog artifacts.
  Authors: We agree that an explicit end-to-end synthetic null test would provide stronger evidence that the reported gain is not an artifact of catalog incompleteness. Our existing controls address magnitude-of-completeness variations and temporal detection changes, but we will add a dedicated synthetic experiment in the revised Methods section. Synthetic catalogs will be generated with strictly history-independent GR magnitudes and realistic spatially and temporally varying detection thresholds derived from the real data, then processed identically to the observed catalogs to confirm that MAGNET yields no spurious information gain under the null.
  revision: yes
- Referee: [Methods (model training and evaluation)] No details are provided on training/validation splits, hyperparameter search procedure, or quantitative checks that the reported gain survives alternative controls or different random seeds. Given the small effect size and the number of free parameters in the LSTM architecture, these omissions leave the central claim vulnerable to overfitting or data-processing artifacts.
  Authors: We will add a new subsection in Methods that fully specifies the temporal training/validation/test splits (chosen to avoid forward leakage), the hyperparameter search procedure and selection criteria, and quantitative robustness results across multiple random seeds and alternative preprocessing pipelines. These additions will demonstrate that the average 0.07-bit gain remains stable under the reported controls.
  revision: yes
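The rebuttal mentions temporal splits chosen to avoid forward leakage. As a hedged illustration of what that usually means, the sketch below orders events by occurrence time and cuts the catalog into consecutive training, validation, and test blocks, so that no evaluated event precedes any event the model was fitted on. The function name and the split fractions are assumptions; the paper's actual split is not described on this page.

```python
import numpy as np

def chronological_split(times, train_frac=0.7, val_frac=0.15):
    """Return index arrays (train, val, test) ordered in time with no overlap.

    Fractions are illustrative defaults, not values from the paper.
    """
    times = np.asarray(times)
    order = np.argsort(times, kind="stable")  # earliest events first
    n = len(order)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test

# Usage on an unsorted toy catalog of occurrence times.
times = np.array([5.0, 1.0, 3.0, 2.0, 4.0, 9.0, 7.0, 8.0, 6.0, 10.0])
train_idx, val_idx, test_idx = chronological_split(times, 0.6, 0.2)
```

A random shuffle would let events from the same aftershock sequence land on both sides of the split, which can inflate apparent gains for sequence models; a chronological cut avoids that particular failure mode.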
Circularity Check
No circularity; empirical out-of-sample comparison to fixed baseline
The paper trains a neural network (MAGNET) on hypocenter catalogs to produce probabilistic magnitude forecasts and reports an information gain versus the fixed Gutenberg-Richter benchmark on held-out events. This is a standard supervised-learning evaluation against an independent null model; the reported gain is not obtained by fitting a parameter to the test set and relabeling it a prediction, nor does any step reduce to a self-definition or self-citation chain. No equations or load-bearing claims in the provided text exhibit the enumerated circular patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- LSTM hidden sizes and learning rate
axioms (1)
- domain assumption: Earthquake catalogs after detection-artifact correction are sufficiently complete for magnitude forecasting
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "Most operational forecasting models assume that earthquake magnitudes follow a time-independent Gutenberg-Richter (GR) distribution ... separability assumption"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.