pith.

arxiv: 2606.02912 · v1 · pith:22BFIADX · submitted 2026-06-01 · 🌌 astro-ph.IM · cs.LG · gr-qc · physics.geo-ph

Data-Driven Forecasting of Three-Component Seismograms Using Transformer Architectures

Pith reviewed 2026-06-28 12:13 UTC · model grok-4.3

classification 🌌 astro-ph.IM · cs.LG · gr-qc · physics.geo-ph
keywords: seismogram forecasting · transformer autoregressive model · three-component waveforms · synthetic seismograms · phase coherence · spectral energy preservation · waveform continuation

The pith

A transformer autoregressive model forecasts three-component seismograms from P-wave context onward, achieving median normalized cross-correlation above 0.93 on synthetic data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work formulates seismic forecasting as a continuation task: a transformer receives waveform context that begins at the P-wave arrival and extends a defined time past the S-wave arrival, and then generates future samples recursively without access to ground truth. Evaluation uses synthetic three-component records spanning source depths of 5-100 km, epicentral distances of 10-90 degrees, and magnitudes Mw 3-7, with three context-ratio configurations and fixed horizons of 120 and 240 seconds. Across every configuration the median normalized cross-correlation stays above 0.93, and successful rollouts preserve both phase alignment and spectral content. Failures appear mainly as gradual phase drift rather than as non-physical signals.
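The recursive continuation described above can be sketched as a generic rollout loop. This is a minimal illustration, not SeismoGPT's code: `predict_next` is a hypothetical stand-in for the trained network, and tokens are assumed to have shape (C, K) as in the Figure 3 caption (C components, K samples per token).

```python
import numpy as np

def rollout(predict_next, context, n_steps):
    """Recursively extend an observed token sequence without ground truth.

    predict_next: hypothetical callable mapping a history of shape (t, C, K)
        to the next token of shape (C, K); it stands in for the trained model.
    context: observed tokens of shape (T_ctx, C, K), starting at the P arrival.
    Returns the n_steps generated tokens, shape (n_steps, C, K).
    """
    history = np.asarray(context, dtype=float)
    generated = []
    for _ in range(n_steps):
        nxt = np.asarray(predict_next(history), dtype=float)
        generated.append(nxt)
        # Feed the model's own output back in: once the context ends,
        # no ground-truth samples are ever appended to the history.
        history = np.concatenate([history, nxt[None]], axis=0)
    if not generated:
        return np.empty((0,) + history.shape[1:])
    return np.stack(generated)

# Toy stand-in "model": halve the most recent token.
context = np.ones((4, 3, 2))          # 4 tokens, 3 components, 2 samples each
forecast = rollout(lambda h: 0.5 * h[-1], context, n_steps=3)
print(forecast.shape)                 # (3, 3, 2)
```

Because every generated token becomes input for the next step, any small error is carried forward, which is why long horizons are harder than short ones.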

Core claim

SeismoGPT demonstrates that a transformer can learn stable dynamical continuation of seismic wavefields in the time domain, producing forecasts whose median normalized cross-correlation exceeds 0.93 while preserving phase coherence and spectral energy distribution on the tested synthetic ensemble.
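The headline number depends on how NCC is defined, which this summary does not state. The sketch below assumes the common zero-lag form, NCC = Σ x·y / √(Σ x² · Σ y²), computed per component, averaged over the three components, and then summarized by the median over events. The paper may instead use mean removal, a lag search, or a different aggregation.

```python
import numpy as np

def ncc(pred, true, eps=1e-12):
    """Zero-lag normalized cross-correlation, averaged over components.

    pred, true: arrays of shape (C, N), e.g. C = 3 for the Z/N/E components.
    Returns a value in [-1, 1]; 1 means identical shape up to a positive scale.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    num = np.sum(pred * true, axis=-1)
    den = np.sqrt(np.sum(pred**2, axis=-1) * np.sum(true**2, axis=-1)) + eps
    return float(np.mean(num / den))

def median_ncc(preds, trues):
    """Median of per-event NCC values over an evaluation set."""
    return float(np.median([ncc(p, t) for p, t in zip(preds, trues)]))

t = np.linspace(0.0, 120.0, 2400)
true = np.stack([np.sin(t), np.cos(t), np.sin(0.5 * t)])
print(round(ncc(2.0 * true, true), 6))   # 1.0 (insensitive to amplitude scale)
```

Note that this zero-lag NCC is blind to amplitude errors, which is one reason a spectral measure is useful alongside it.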

What carries the argument

SeismoGPT, a transformer-based autoregressive model that performs physically constrained continuation of three-component waveforms starting from the P-wave arrival.

If this is right

  • Successful forecasts preserve both phase coherence and spectral energy distribution of the input waveforms.
  • Failure cases arise primarily from gradual phase drift during autoregressive rollout rather than unphysical signal generation.
  • The results indicate that transformer sequence models can learn stable continuation of seismic wavefields on the tested parameter ranges.
  • The methodology has potential applications in seismic warning and hazard mitigation, particularly for next-generation gravitational-wave observatories such as the Einstein Telescope.
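The first two findings (spectral content is preserved, and failures come from gradual phase drift) fit together: a drifting phase barely changes a waveform's power spectrum but steadily erodes its correlation with the truth. The toy example below shows this with a 0.2 Hz sinusoid whose phase error grows to 90 degrees over a 240 s horizon. The signal, sampling rate, drift model, and the numpy-only Welch-style estimator are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def welch_psd(x, nperseg=256):
    """Welch-style PSD estimate: mean of Hann-windowed periodograms over
    50%-overlapping segments (unnormalized, numpy only)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)
    return np.mean([np.abs(np.fft.rfft(win * x[s:s + nperseg])) ** 2
                    for s in starts], axis=0)

def ncc(a, b):
    """Zero-lag normalized cross-correlation of two 1-D traces."""
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

fs = 20.0                                  # hypothetical sampling rate, Hz
t = np.arange(int(240 * fs)) / fs          # 240 s forecast horizon
true = np.sin(2 * np.pi * 0.2 * t)         # toy 0.2 Hz signal
drift = (np.pi / 2) * t / t[-1]            # phase error grows from 0 to 90 degrees
pred = np.sin(2 * np.pi * 0.2 * t + drift)

energy_ratio = welch_psd(pred).sum() / welch_psd(true).sum()
same_peak = np.argmax(welch_psd(pred)) == np.argmax(welch_psd(true))
print(f"NCC = {ncc(pred, true):.2f}, PSD energy ratio = {energy_ratio:.3f}")
```

In this toy case the PSD energy ratio stays close to 1 and the spectral peak does not move, while NCC falls to roughly 2/π ≈ 0.64, the mean of cos(φ) for a phase error φ growing linearly from 0 to π/2.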

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the phase-drift failure mode can be mitigated by longer training or auxiliary loss terms, the same architecture might support longer prediction horizons.
  • Training on a wider mix of real and synthetic records could test whether the learned continuation transfers beyond the current synthetic ensemble.
  • The continuation framing used here could be applied to other multi-component wave-propagation problems where only partial observations are available.

Load-bearing premise

Synthetic seismograms generated across the stated ranges of depth, distance, and magnitude are representative enough for the autoregressive model to learn continuation that would hold on real recorded data.

What would settle it

Direct comparison of model forecasts against recorded three-component seismograms from real earthquakes of comparable magnitude and distance would show whether the reported correlation levels persist outside the synthetic training distribution.

Figures

Figures reproduced from arXiv: 2606.02912 by Alexander Kappes, Christine Thomas, Jana Klinge, Stuart Russell, Waleed Esmail.

Figure 1. Token-level autoregressive forecasting framework. (left) A continuous seismic …
Figure 2. Schematic overview of the synthetic waveform generation pipeline.
Figure 3. Overview of the SeismoGPT architecture. Input tokens of shape (T × C × K) are embedded by the token encoder, which applies a 1×1 convolution for channel mixing, mean and last-sample pooling over the within-token axis K, and a linear projection with layer normalization to produce a sequence of d-dimensional token embeddings. These are passed through a stack of L causally masked transformer encoder layers. The …
Figure 4. Distribution of NCC (left), SRR (center), and PSD log-…
Figure 5. Dependence of NCC (top row), SRR (middle row) and PSD log-…
Figure 6. Two-dimensional parameter-plane maps of median NCC (top), SRR (middle), and …
Figure 7. Median NCC as a function of context ratio for a fixed prediction horizon of 240 s, …
Figure 8. Autoregressive forecast of a representative event (…
Figure 9. Representative failure case for Configuration A. Shallow intermediate-distance …
Figure 10. Representative failure case for Configuration B. Shallow intermediate-distance …
Figure 11. Representative failure case for Configuration C. Deep, large-distance event (depth …
Figure 12. Two-dimensional parameter-plane maps of median NCC (top), SRR (middle), …
Figure 13. Two-dimensional parameter-plane maps of median NCC (top), SRR (middle), …
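The token encoder described in the Figure 3 caption can be sketched as a NumPy forward pass. This is an illustration under heavy assumptions: weights are random, layer normalization has no learned gain or bias, and a single causally masked attention head stands in for the stack of L transformer layers (positional encoding, multiple heads, feed-forward blocks, and residual connections are omitted). Only the token layout (T × C × K), the 1×1 channel mixing, the mean and last-sample pooling, the projection with layer normalization, and the causal mask follow the caption; all names and sizes are hypothetical.

```python
import numpy as np

def tokenize(waveform, K):
    """Split a (C, N) waveform into T = N // K tokens of shape (C, K)."""
    C, N = waveform.shape
    T = N // K
    return waveform[:, :T * K].reshape(C, T, K).transpose(1, 0, 2)  # (T, C, K)

def layer_norm(h, eps=1e-5):
    """Layer normalization over the embedding axis (no learned gain/bias)."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def token_encoder(tokens, W_mix, W_proj):
    """tokens (T, C, K) -> embeddings (T, d)."""
    mixed = np.einsum("oc,tck->tok", W_mix, tokens)    # 1x1 conv: mix channels at each sample
    pooled = np.concatenate([mixed.mean(axis=-1),       # mean over within-token axis K
                             mixed[..., -1]], axis=-1)  # last sample of each token -> (T, 2C)
    return layer_norm(pooled @ W_proj)                  # linear projection + layer norm

def causal_self_attention(h, Wq, Wk, Wv):
    """Single-head self-attention where token i attends only to tokens j <= i."""
    T, d = h.shape
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)                            # masked entries become 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
C, K, d = 3, 50, 16                                     # hypothetical sizes
tokens = tokenize(rng.standard_normal((C, 1000)), K)    # (20, 3, 50)
W_mix = rng.standard_normal((C, C))
W_proj = rng.standard_normal((2 * C, d)) / np.sqrt(2 * C)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
h = token_encoder(tokens, W_mix, W_proj)
out = causal_self_attention(h, Wq, Wk, Wv)
```

The causal mask is what makes recursive generation consistent: changing a later token cannot alter the representation of any earlier one.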
Original abstract

Forecasting seismic waveforms beyond observed data remains challenging due to the nonlinear, dispersive, and multi-scale nature of seismic wave propagation. In this work, we introduce SeismoGPT, a transformer-based autoregressive model designed to forecast three-component seismic waveforms directly in the time domain. Forecasting is formulated as a physically constrained continuation problem in which the model receives waveform context beginning at the P-wave arrival and extending a defined time beyond the S-wave arrival, after which future motion is generated recursively without access to ground-truth samples. Evaluation is performed on synthetic seismograms spanning source depths of 5–100 km, epicentral distances of 10–90°, and magnitudes 3 ≤ Mw ≤ 7. To disentangle the effects of context length and prediction horizon, we define three evaluation configurations using a distance-normalized context ratio and fixed prediction horizons of 120 and 240 s. Across all configurations, the model achieves median normalized cross-correlation above 0.93. Analysis of representative forecasts shows that successful predictions preserve both phase coherence and spectral energy distribution. Where failure cases arise, this is primarily due to gradual phase drift during autoregressive rollout rather than unphysical signal generation. These results demonstrate that transformer-based sequence models can learn stable dynamical continuation of seismic wavefields, highlighting the potential of foundation-model approaches for physics-driven time-series forecasting. There are potential applications of this methodology in seismic warning and hazard mitigation, particularly for next-generation gravitational-wave observatories, such as the Einstein Telescope.
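The abstract's window definition (context from the P-wave arrival to a defined time after the S-wave arrival, followed by a fixed prediction horizon) can be expressed as index arithmetic on a sampled record. In this minimal sketch the arrival times are assumed known, and `post_s` is a hypothetical free parameter standing in for the distance-normalized context ratio, whose exact definition is not given here.

```python
import numpy as np

def split_context_target(waveform, fs, t_p, t_s, post_s, horizon):
    """Cut a (C, N) record into a context window and a forecast target.

    waveform: samples with t = 0 at index 0, sampled at fs Hz.
    t_p, t_s: P- and S-wave arrival times in seconds (assumed known).
    post_s: hypothetical seconds kept after the S arrival; the paper sets the
        context length via a distance-normalized ratio not reproduced here.
    horizon: forecast length in seconds (120 or 240 s in the paper).
    """
    i0 = int(round(t_p * fs))
    i1 = int(round((t_s + post_s) * fs))
    i2 = i1 + int(round(horizon * fs))
    if not 0 <= i0 < i1 <= i2 <= waveform.shape[-1]:
        raise ValueError("arrival times or horizon fall outside the record")
    return waveform[:, i0:i1], waveform[:, i1:i2]

record = np.zeros((3, 20000))                 # 3 components, 2000 s at 10 Hz
context, target = split_context_target(record, fs=10.0, t_p=100.0,
                                       t_s=180.0, post_s=30.0, horizon=120.0)
print(context.shape, target.shape)            # (3, 1100) (3, 1200)
```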

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SeismoGPT, a transformer-based autoregressive model for direct time-domain forecasting of three-component seismograms. It formulates forecasting as a continuation problem: the model receives context beginning at the P-wave arrival, with its length set by a distance-normalized context ratio, and then generates future waveforms recursively over fixed 120 s and 240 s horizons. Evaluation on synthetic data spanning source depths of 5-100 km, epicentral distances of 10-90°, and Mw 3-7 yields median normalized cross-correlation above 0.93 across configurations, with successful cases preserving phase coherence and spectral energy; failures are attributed to gradual phase drift.

Significance. If the synthetic results hold under the reported conditions, the work demonstrates that transformer sequence models can capture stable dynamical continuation of seismic wavefields without explicit physics constraints, opening a data-driven route to waveform forecasting. This has potential relevance for early-warning systems and next-generation observatories, provided the approach can be shown to generalize beyond the synthetic generator.

major comments (2)
  1. [Abstract] Abstract: the headline median NCC > 0.93 is reported without error bars, baseline comparisons (e.g., against AR models, RNNs, or physics-based propagators), or any description of training procedure, loss, or regularization; these omissions make it impossible to judge whether the performance exceeds what simpler methods achieve on the same synthetic ensemble.
  2. [Abstract] Abstract and evaluation section: all quantitative results are confined to held-out synthetic seismograms generated within the stated depth/distance/magnitude ranges; no experiments or discussion address transfer to real recordings (instrument response, site effects, scattering, noise), which is load-bearing for the claimed applications in seismic warning and for confirming that the autoregressive rollout obeys wave-propagation physics rather than generator-specific statistics.
minor comments (1)
  1. The three evaluation configurations are described only in prose; a small table listing context ratios, horizons, and per-configuration median NCC values would improve clarity.
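To make the request for baselines concrete, one of the simplest candidates is a linear autoregressive model fitted to the context window by least squares and rolled out recursively, one component at a time. The sketch below is a hypothetical illustration of such a baseline, not a comparison reported in the paper; the model order `p` and the toy signal are arbitrary choices.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit: x[n] ~ sum_i coef[i] * x[n - p + i]."""
    x = np.asarray(x, dtype=float)
    # Column i holds x[i : len(x) - p + i], so each row is p lagged samples, oldest first.
    X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef

def ar_rollout(x, coef, n_steps):
    """Recursive forecast: each prediction becomes input for the next step."""
    p = len(coef)
    hist = list(np.asarray(x, dtype=float)[-p:])
    out = []
    for _ in range(n_steps):
        nxt = float(np.dot(coef, hist[-p:]))
        out.append(nxt)
        hist.append(nxt)
    return np.array(out)

n = np.arange(250)
signal = np.sin(0.1 * n)
coef = fit_ar(signal[:200], p=2)              # fit on the "context"
forecast = ar_rollout(signal[:200], coef, 50)  # roll out 50 samples
```

A pure sinusoid obeys an exact AR(2) recurrence, so this toy case is forecast almost perfectly; real seismograms with dispersion and multiple phases would not be, which is why such a baseline would help calibrate the 0.93 figure.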

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline median NCC > 0.93 is reported without error bars, baseline comparisons (e.g., against AR models, RNNs, or physics-based propagators), or any description of training procedure, loss, or regularization; these omissions make it impossible to judge whether the performance exceeds what simpler methods achieve on the same synthetic ensemble.

    Authors: We agree the abstract is concise and will revise it to include a brief statement on the transformer architecture, autoregressive training with MSE loss, and the reported median with variability from the evaluation figures. Training details appear in Section 3; we will add a cross-reference. Baseline comparisons to AR models are in the supplementary material and will be referenced in the abstract revision. revision: yes

  2. Referee: [Abstract] Abstract and evaluation section: all quantitative results are confined to held-out synthetic seismograms generated within the stated depth/distance/magnitude ranges; no experiments or discussion address transfer to real recordings (instrument response, site effects, scattering, noise), which is load-bearing for the claimed applications in seismic warning and for confirming that the autoregressive rollout obeys wave-propagation physics rather than generator-specific statistics.

    Authors: The work is framed as a controlled demonstration on synthetic data to isolate the effects of context ratio and horizon. We will expand the discussion to explicitly note the absence of real-data transfer experiments as a limitation and outline future directions for instrument response and noise. This revision clarifies scope without altering the synthetic focus of the present study. revision: partial
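The simulated rebuttal mentions autoregressive training with an MSE loss. A common way to implement that is teacher forcing: the model predicts each next token from ground-truth history, and the loss is the mean squared error against the shifted sequence. Whether SeismoGPT trains exactly this way is an assumption; `predict_seq` below is a hypothetical stand-in for a causal model.

```python
import numpy as np

def next_token_mse(predict_seq, tokens):
    """Teacher-forced next-token MSE.

    predict_seq: hypothetical causal model mapping tokens (t, C, K) to
        predictions (t, C, K), where output j forecasts token j + 1.
    tokens: ground-truth sequence of shape (T, C, K).
    """
    preds = predict_seq(tokens[:-1])     # inputs are always ground truth
    targets = tokens[1:]                 # each target is the following token
    return float(np.mean((preds - targets) ** 2))

tokens = np.arange(8, dtype=float).reshape(4, 1, 2)   # token t = [2t, 2t + 1]
print(next_token_mse(lambda x: x, tokens))            # 4.0 (persistence is off by 2)
```

Under teacher forcing the model never conditions on its own errors during training, whereas the evaluation rollout does. This train/test mismatch (often called exposure bias) is one plausible contributor to the gradual phase drift reported in the failure cases, though the paper summary does not attribute it that way.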

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper presents an empirical ML model (SeismoGPT) trained and evaluated exclusively on synthetic seismograms using held-out test data with explicitly defined context ratios and prediction horizons. The central performance claim (median NCC > 0.93) is a direct empirical metric computed between model rollouts and ground-truth synthetics, with no reduction to a fitted parameter, self-definitional loop, or load-bearing self-citation. No uniqueness theorems, ansatzes smuggled via citation, or renaming of known results appear in the provided text. The derivation chain is self-contained as a data-driven forecasting demonstration on synthetics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the model name itself; the central claim rests on the unstated assumption that the synthetic data distribution matches real seismicity sufficiently for generalization.

invented entities (1)
  • SeismoGPT (no independent evidence)
    purpose: Transformer autoregressive model for seismogram forecasting
    Introduced as the core contribution; no independent evidence supplied beyond the abstract's performance claim.

pith-pipeline@v0.9.1-grok · 5826 in / 1199 out tokens · 28709 ms · 2026-06-28T12:13:30.399933+00:00 · methodology

