Data-Driven Forecasting of Three-Component Seismograms Using Transformer Architectures

Alexander Kappes; Christine Thomas; Jana Klinge; Stuart Russell; Waleed Esmail

arXiv: 2606.02912 · v1 · pith:22BFIADXnew · submitted 2026-06-01 · 🌌 astro-ph.IM · cs.LG · gr-qc · physics.geo-ph

Pith reviewed 2026-06-28 12:13 UTC · model grok-4.3

classification 🌌 astro-ph.IM · cs.LG · gr-qc · physics.geo-ph

keywords seismogram forecasting · transformer autoregressive model · three-component waveforms · synthetic seismograms · phase coherence · spectral energy preservation · waveform continuation

The pith

A transformer autoregressive model forecasts three-component seismograms from P-wave context onward, achieving median normalized cross-correlation above 0.93 on synthetic data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work formulates seismic forecasting as a continuation task: a transformer receives waveform context that begins at the P-wave arrival and extends a defined time beyond the S-wave arrival, then generates future samples recursively without access to ground truth. Evaluation uses synthetic three-component records spanning source depths of 5-100 km, epicentral distances of 10-90 degrees, and magnitudes 3-7, in three evaluation configurations defined by a distance-normalized context ratio with fixed prediction horizons of 120 and 240 seconds. Across all configurations the median normalized cross-correlation stays above 0.93, and successful rollouts keep both phase coherence and spectral energy distribution intact. Failures appear mainly as gradual phase drift rather than as non-physical signals.
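
The recursive generation step can be illustrated with a minimal sketch. This is not the paper's code: `rollout`, `toy_predictor`, and `OMEGA` are hypothetical names, the signal is a single-channel sinusoid rather than a tokenized three-component record, and the "model" is an exact two-tap recurrence standing in for a trained network. The point is only the loop structure, in which each prediction is appended to the history and no ground-truth sample is ever fed back.

```python
import math
from typing import Callable, List, Sequence


def rollout(predict_next: Callable[[Sequence[float]], float],
            context: Sequence[float], n_steps: int) -> List[float]:
    """Generate n_steps future samples, feeding each prediction back as input."""
    history = list(context)
    forecast = []
    for _ in range(n_steps):
        nxt = predict_next(history)  # sees only the context plus its own outputs
        forecast.append(nxt)
        history.append(nxt)          # ground truth is never appended
    return forecast


# Hypothetical stand-in for a trained network: for a pure sinusoid,
# x[n+1] = 2*cos(omega)*x[n] - x[n-1] holds exactly.
OMEGA = 2 * math.pi / 50  # assumed angular frequency per sample


def toy_predictor(history: Sequence[float]) -> float:
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


signal = [math.sin(OMEGA * n) for n in range(300)]
context, truth = signal[:100], signal[100:]
forecast = rollout(toy_predictor, context, n_steps=len(truth))
max_err = max(abs(f - t) for f, t in zip(forecast, truth))
```

With a learned predictor in place of `toy_predictor`, small per-step errors are fed back into the input, which is why rollout quality is typically judged over the whole horizon rather than one step at a time.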

Core claim

SeismoGPT demonstrates that a transformer can learn stable dynamical continuation of seismic wavefields in the time domain, producing forecasts whose median normalized cross-correlation exceeds 0.93 while preserving phase coherence and spectral energy distribution on the tested synthetic ensemble.
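
For readers unfamiliar with the headline metric, here is a minimal sketch of a normalized cross-correlation (NCC) score. The text above does not give the paper's exact formula, so this assumes the common zero-lag, mean-removed definition; the function name `ncc` and the test signals are illustrative only.

```python
import math
from statistics import median


def ncc(pred, target):
    """Zero-lag normalized cross-correlation of two equal-length traces."""
    if len(pred) != len(target):
        raise ValueError("traces must have equal length")
    mp = sum(pred) / len(pred)
    mt = sum(target) / len(target)
    num = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    den = math.sqrt(sum((p - mp) ** 2 for p in pred)
                    * sum((t - mt) ** 2 for t in target))
    return num / den if den > 0 else 0.0


# Illustrative traces: 10 full periods of a sinusoid, 40 samples per period.
x = [math.sin(2 * math.pi * k / 40) for k in range(400)]
scaled = [3.0 * v for v in x]                                    # same shape, larger amplitude
flipped = [-v for v in x]                                        # polarity reversed
quarter = [math.cos(2 * math.pi * k / 40) for k in range(400)]   # 90 degree phase shift

scores = [ncc(scaled, x), ncc(flipped, x), ncc(quarter, x)]
summary = median(scores)  # a median over events would be formed the same way
```

Under this definition the score ignores overall amplitude (the scaled trace scores 1) but is sensitive to phase (a quarter-period shift scores about 0), which is one reason a separate spectral or energy measure is useful alongside it.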

What carries the argument

SeismoGPT, a transformer-based autoregressive model that treats forecasting as physically constrained continuation: it reads three-component waveform context starting at the P-wave arrival and generates the samples that follow.

If this is right

Successful forecasts preserve both phase coherence and spectral energy distribution of the input waveforms.
Failure cases arise primarily from gradual phase drift during autoregressive rollout rather than unphysical signal generation.
The results indicate that transformer sequence models can learn stable continuation of seismic wavefields on the tested parameter ranges.
The methodology carries potential applications in seismic warning and hazard mitigation, including for next-generation gravitational-wave observatories.
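
A toy calculation shows why gradual phase drift can lower correlation while leaving energy intact. Everything here is hypothetical and not taken from the paper: the sampling rate `FS`, the dominant frequency `F0`, and a 1% frequency error `DETUNE` are assumed values, and NCC is taken in its zero-lag form. The "forecast" has the same amplitude as the "truth" but a slightly wrong frequency, so its phase lag grows linearly with time.

```python
import math


def ncc(a, b):
    """Zero-lag normalized cross-correlation (assumed definition)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


FS = 20.0       # assumed sampling rate in Hz
F0 = 0.2        # assumed dominant frequency in Hz
DETUNE = 0.002  # hypothetical 1% frequency error in the forecast

n = int(240 * FS)  # a 240 s horizon
truth = [math.sin(2 * math.pi * F0 * k / FS) for k in range(n)]
drift = [math.sin(2 * math.pi * (F0 + DETUNE) * k / FS) for k in range(n)]

# NCC in four consecutive 60 s windows: the score falls as the lag accumulates.
win = int(60 * FS)
window_ncc = [ncc(truth[i:i + win], drift[i:i + win]) for i in range(0, n, win)]
energy_ratio = rms(drift) / rms(truth)  # stays close to 1 throughout
```

In this toy case the first window still correlates well and the last is nearly anti-correlated, yet the RMS amplitude of the two traces is almost identical.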

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the phase-drift failure mode can be mitigated by longer training or auxiliary loss terms, the same architecture might support longer prediction horizons.
Training on a wider mix of real and synthetic records could test whether the learned continuation transfers beyond the current synthetic ensemble.
The continuation framing used here could be applied to other multi-component wave-propagation problems where only partial observations are available.

Load-bearing premise

Synthetic seismograms generated across the stated ranges of depth, distance, and magnitude are representative enough for the autoregressive model to learn continuation that would hold on real recorded data.

What would settle it

Direct comparison of model forecasts against recorded three-component seismograms from real earthquakes of comparable magnitude and distance would show whether the reported correlation levels persist outside the synthetic training distribution.

Figures

Figures reproduced from arXiv: 2606.02912 by Alexander Kappes, Christine Thomas, Jana Klinge, Stuart Russell, Waleed Esmail.

**Figure 2.** Schematic overview of the synthetic waveform generation pipeline.

**Figure 3.** Overview of the SeismoGPT architecture. Input tokens of shape (T × C × K) are embedded by the token encoder, which applies a 1×1 convolution for channel mixing, mean and last-sample pooling over the within-token axis K, and a linear projection with layer normalization to produce a sequence of d-dimensional token embeddings. These are passed through a stack of L causally masked transformer encoder layers. The…

**Figure 4.** Distribution of NCC (left), SRR (center), and PSD log-…

**Figure 5.** Dependence of NCC (top row), SRR (middle row) and PSD log-…

**Figure 6.** Two-dimensional parameter-plane maps of median NCC (top), SRR (middle), and…

**Figure 7.** Median NCC as a function of context ratio for a fixed prediction horizon of 240 s,…

**Figure 8.** Autoregressive forecast of a representative event (…

**Figure 9.** Representative failure case for Configuration A. Shallow intermediate-distance…

**Figure 10.** Representative failure case for Configuration B. Shallow intermediate-distance…

**Figure 11.** Representative failure case for Configuration C. Deep, large-distance event (depth…

**Figure 12.** Two-dimensional parameter-plane maps of median NCC (top), SRR (middle),…

**Figure 13.** Two-dimensional parameter-plane maps of median NCC (top), SRR (middle),…
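
The token encoder described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a reading of the caption, not the authors' implementation. Several details are assumptions: the two pooled vectors are concatenated (the caption does not say how they are combined), the hidden width `H` and all weights are arbitrary, and biases, activations, and learned layer-norm parameters are omitted. The helper `causal_mask` shows the attention pattern implied by "causally masked".

```python
import numpy as np


def token_encoder(tokens, w_mix, w_proj, eps=1e-5):
    """Map tokens of shape (T, C, K) to embeddings of shape (T, d)."""
    # A 1x1 convolution applies the same channel-mixing matrix at every
    # within-token sample: (H, C) x (T, C, K) -> (T, H, K).
    mixed = np.einsum("hc,tck->thk", w_mix, tokens)
    mean_pool = mixed.mean(axis=-1)   # (T, H), mean over the within-token axis K
    last_pool = mixed[:, :, -1]       # (T, H), last sample of each token
    feats = np.concatenate([mean_pool, last_pool], axis=-1)  # (T, 2H), assumed combination
    emb = feats @ w_proj              # linear projection to (T, d)
    # Layer normalization over the embedding axis, without learned scale or shift.
    mu = emb.mean(axis=-1, keepdims=True)
    var = emb.var(axis=-1, keepdims=True)
    return (emb - mu) / np.sqrt(var + eps)


def causal_mask(T):
    """Boolean (T, T) mask: True where token i may attend to token j <= i."""
    return np.tril(np.ones((T, T), dtype=bool))


# Hypothetical sizes: 8 tokens, 3 components, 16 samples per token.
T, C, K, H, d = 8, 3, 16, 4, 32
rng = np.random.default_rng(0)
tokens = rng.standard_normal((T, C, K))
w_mix = rng.standard_normal((H, C))
w_proj = rng.standard_normal((2 * H, d))

emb = token_encoder(tokens, w_mix, w_proj)
mask = causal_mask(T)
```

Because every operation in `token_encoder` acts on one token at a time, information can only flow between tokens inside the transformer layers, where the mask restricts it to earlier positions.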

Original abstract

Forecasting seismic waveforms beyond observed data remains challenging due to the nonlinear, dispersive, and multi-scale nature of seismic wave propagation. In this work, we introduce SeismoGPT, a transformer-based autoregressive model designed to forecast three-component seismic waveforms directly in the time domain. Forecasting is formulated as a physically constrained continuation problem in which the model receives waveform context beginning at the P-wave arrival and extending a defined time beyond the S-wave arrival, after which future motion is generated recursively without access to ground-truth samples. Evaluation is performed on synthetic seismograms spanning source depths of 5-100 km, epicentral distances of 10-90°, and magnitudes 3 ≤ Mw ≤ 7. To disentangle the effects of context length and prediction horizon, we define three evaluation configurations using a distance-normalized context ratio and fixed prediction horizons of 120 and 240 s. Across all configurations, the model achieves median normalized cross correlation above 0.93. Analysis of representative forecasts shows that successful predictions preserve both phase coherence and spectral energy distribution. Where failure cases arise, this is primarily due to gradual phase drift during autoregressive rollout rather than unphysical signal generation. These results demonstrate that transformer-based sequence models can learn stable dynamical continuation of seismic wavefields, highlighting the potential of foundation-model approaches for physics-driven time-series forecasting. There are potential applications of this methodology in seismic warning and hazard mitigation, particularly for next-generation gravitational-wave observatories, such as the Einstein Telescope.
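
The context and forecast windows described in the abstract can be made concrete with a small indexing sketch. It rests on stated assumptions: the function name `split_indices`, the sampling rate `fs`, and the parameter `extra_after_s` (the time the context extends past the S arrival) are hypothetical, and the abstract does not say how the paper derives that extension from its distance-normalized context ratio, so it is treated here as a plain input.

```python
def split_indices(t_p, t_s, extra_after_s, horizon, fs, n_samples):
    """Return (context, target) slices for one trace.

    Context runs from the P arrival to `extra_after_s` seconds past the
    S arrival; the target is the next `horizon` seconds. Times are in
    seconds from the start of the record, fs is in Hz.
    """
    start = round(t_p * fs)
    cut = round((t_s + extra_after_s) * fs)
    end = cut + round(horizon * fs)
    if not 0 <= start < cut:
        raise ValueError("context window is empty or starts before the record")
    if end > n_samples:
        raise ValueError("record too short for the requested horizon")
    return slice(start, cut), slice(cut, end)


# Hypothetical record: 600 s at 20 Hz, P at 100 s, S at 180 s,
# context extended 30 s past S, forecast horizon 120 s.
trace = list(range(12000))
ctx, tgt = split_indices(t_p=100.0, t_s=180.0, extra_after_s=30.0,
                         horizon=120.0, fs=20.0, n_samples=len(trace))
context_samples, target_samples = trace[ctx], trace[tgt]
```

The same slices would be applied to each of the three components so that context and target stay aligned across channels.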

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SeismoGPT gets median correlations above 0.93 on synthetic seismogram continuation but shows no real-data results.

The main point is that this transformer model achieves median normalized cross correlation above 0.93 when doing autoregressive forecasting of three-component seismograms on held-out synthetic data. The setup uses context from P-wave arrival onward and rolls out 120 or 240 seconds ahead across three distance-normalized configurations, with source parameters spanning 5-100 km depth, 10-90 degrees distance, and magnitudes 3-7.

What works is the clean problem framing as a continuation task and the observation that good predictions keep both phase and spectral energy. The identified failure mode of gradual phase drift during rollout is plausible for this kind of model. Using synthetics with explicit ranges lets them control context length and horizon without immediate confounding from real-world noise.

The clear gap is the complete absence of real recorded seismograms. All numbers come from synthetic waveforms, so it remains open whether the model has learned actual wave propagation rules or just the statistics of the generator. No baseline comparisons appear in the abstract, and training details are not summarized, which makes it hard to judge how much the transformer architecture adds over simpler sequence models.

This is for groups working on machine learning for geophysical signals or early-warning applications. Readers focused on whether sequence models can handle dispersive wavefields will see a concrete proof-of-concept on synthetics; anyone needing evidence that the approach survives instrument response or site effects will not find it here.

The synthetic results are concrete enough to justify peer review, mainly to check the training procedure and to see whether the authors can add real-data tests.

Referee Report

2 major / 1 minor

Summary. The paper introduces SeismoGPT, a transformer-based autoregressive model for direct time-domain forecasting of three-component seismograms. It formulates forecasting as a continuation problem starting from P-wave arrival context (with defined distance-normalized ratios) and generates future waveforms recursively over fixed 120 s and 240 s horizons. Evaluation on synthetic data spanning 5-100 km depths, 10-90° distances, and Mw 3-7 yields median normalized cross-correlation above 0.93 across configurations, with successful cases preserving phase coherence and spectral energy; failures are attributed to gradual phase drift.

Significance. If the synthetic results hold under the reported conditions, the work demonstrates that transformer sequence models can capture stable dynamical continuation of seismic wavefields without explicit physics constraints, opening a data-driven route to waveform forecasting. This has potential relevance for early-warning systems and next-generation observatories, provided the approach can be shown to generalize beyond the synthetic generator.

major comments (2)

[Abstract] The headline median NCC > 0.93 is reported without error bars, baseline comparisons (e.g., against AR models, RNNs, or physics-based propagators), or any description of training procedure, loss, or regularization; these omissions make it impossible to judge whether the performance exceeds what simpler methods achieve on the same synthetic ensemble.
[Abstract and evaluation section] All quantitative results are confined to held-out synthetic seismograms generated within the stated depth/distance/magnitude ranges; no experiments or discussion address transfer to real recordings (instrument response, site effects, scattering, noise), which is load-bearing for the claimed applications in seismic warning and for confirming that the autoregressive rollout obeys wave-propagation physics rather than generator-specific statistics.

minor comments (1)

The three evaluation configurations are described only in prose; a small table listing context ratios, horizons, and per-configuration median NCC values would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.

Referee: [Abstract] The headline median NCC > 0.93 is reported without error bars, baseline comparisons (e.g., against AR models, RNNs, or physics-based propagators), or any description of training procedure, loss, or regularization; these omissions make it impossible to judge whether the performance exceeds what simpler methods achieve on the same synthetic ensemble.

Authors: We agree the abstract is concise and will revise it to include a brief statement on the transformer architecture, autoregressive training with MSE loss, and the reported median with variability from the evaluation figures. Training details appear in Section 3; we will add a cross-reference. Baseline comparisons to AR models are in the supplementary material and will be referenced in the abstract revision.
Revision: yes

Referee: [Abstract and evaluation section] All quantitative results are confined to held-out synthetic seismograms generated within the stated depth/distance/magnitude ranges; no experiments or discussion address transfer to real recordings (instrument response, site effects, scattering, noise), which is load-bearing for the claimed applications in seismic warning and for confirming that the autoregressive rollout obeys wave-propagation physics rather than generator-specific statistics.

Authors: The work is framed as a controlled demonstration on synthetic data to isolate the effects of context ratio and horizon. We will expand the discussion to explicitly note the absence of real-data transfer experiments as a limitation and outline future directions for instrument response and noise. This revision clarifies scope without altering the synthetic focus of the present study.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical ML model (SeismoGPT) trained and evaluated exclusively on synthetic seismograms using held-out test data with explicitly defined context ratios and prediction horizons. The central performance claim (median NCC > 0.93) is a direct empirical metric computed between model rollouts and ground-truth synthetics, with no reduction to a fitted parameter, self-definitional loop, or load-bearing self-citation. No uniqueness theorems, ansatzes smuggled via citation, or renaming of known results appear in the provided text. The derivation chain is self-contained as a data-driven forecasting demonstration on synthetics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the model name itself; the central claim rests on the unstated assumption that the synthetic data distribution matches real seismicity sufficiently for generalization.

invented entities (1)

SeismoGPT · no independent evidence
purpose: Transformer autoregressive model for seismogram forecasting
Introduced as the core contribution; no independent evidence supplied beyond the abstract's performance claim.

Reference graph

Works this paper leans on

65 extracted references · 26 canonical work pages · 2 internal anchors

[1]

Springer Science & Business Media, 2012

Haruo Sato, Michael C Fehler, and Takuto Maeda.Seismic wave propagation and scattering in the heterogeneous earth. Springer Science & Business Media, 2012

2012
[2]

Oxford University Press, 2017

Heiner Igel.Computational seismology: a practical introduction. Oxford University Press, 2017

2017
[3]

Robert W. Graves. Simulating seismic wave propagation in 3d elastic media using staggered-grid finite differences.Bulletin of the Seismological Society of America, 86(4): 1091–1106, 1996. doi: 10.1785/bssa0860041091

work page doi:10.1785/bssa0860041091 1996
[4]

Introduction to the spectral element method for three-dimensional seismic wave propagation.Geophysical Journal International, 139 (3):806–822, 1999

Dimitri Komatitsch and Jeroen Tromp. Introduction to the spectral element method for three-dimensional seismic wave propagation.Geophysical Journal International, 139 (3):806–822, 1999. doi: 10.1046/j.1365-246x.1999.00967.x

work page doi:10.1046/j.1365-246x.1999.00967.x 1999
[5]

Spectral-element simulations of global seismic wave propagation—i

Dimitri Komatitsch and Jeroen Tromp. Spectral-element simulations of global seismic wave propagation—i. validation.Geophysical Journal International, 149(2):390–412, 2002

2002
[6]

AxiSEM: broadband 3-D seismic wavefields in axisymmetric media.Solid Earth, 5(1):425–445, 2014

Tarje Nissen-Meyer, Martin van Driel, Simon C Stähler, Kasra Hosseini, Stefanie Hempel, Ludwig Auer, Andrea Colombi, and Alexandre Fournier. AxiSEM: broadband 3-D seismic wavefields in axisymmetric media.Solid Earth, 5(1):425–445, 2014

2014
[7]

Kuangdai Leng, Tarje Nissen-Meyer, and Martin van Driel. Efficient global wave prop- agation adapted to 3-d structural complexity: a pseudospectral/spectral-element ap- proach.Geophysical Supplements to the Monthly Notices of the Royal Astronomical Society, 207(3):1700–1721, 2016

2016
[8]

Axisem3d: broad-band seismic wavefields in 3-d global earth models with undulating discontinuities.Geophysical Journal International, 217(3):2125–2146, 2019

Kuangdai Leng, Tarje Nissen-Meyer, Martin Van Driel, Kasra Hosseini, and David Al-Attar. Axisem3d: broad-band seismic wavefields in 3-d global earth models with undulating discontinuities.Geophysical Journal International, 217(3):2125–2146, 2019

2019
[9]

Xu and Z

T. Xu and Z. Zhang. Numerical simulation of 3-d seismic wave based on alternative flux finite-difference weno scheme.Geophysical Journal International, 238(1):496–512,
[10]

doi: 10.1093/gji/ggae167

work page doi:10.1093/gji/ggae167
[11]

Spectral-element simulations of global seismic wave propagation—ii

Dimitri Komatitsch and Jeroen Tromp. Spectral-element simulations of global seismic wave propagation—ii. three-dimensional models, oceans, rotation and self-gravitation. Geophysical journal international, 150(1):303–318, 2002

2002
[12]

On the modelling of self- gravitation for full 3-d global seismic wave propagation.Geophysical Journal Interna- tional, 227(1):632–643, 2021

Martin van Driel, Johannes Kemper, and Christian Boehm. On the modelling of self- gravitation for full 3-d global seismic wave propagation.Geophysical Journal Interna- tional, 227(1):632–643, 2021

2021
[13]

Lyu et al

D. Lyu et al. Rapid wavefield forecasting for earthquake early warning via deep sequence to sequence learning.Nature Communications, 16(1), 2025. 30

2025
[14]

Mostafa Mousavi and Gregory C

S. Mostafa Mousavi and Gregory C. Beroza. Machine learning in earthquake seis- mology.Annual Review of Earth and Planetary Sciences, 50:641–666, 2022. doi: 10.1146/annurev-earth-032320-041749

work page doi:10.1146/annurev-earth-032320-041749 2022
[15]

SeisLM: a foundation model for seismic waveforms.arXiv preprint arXiv:2410.15765, 2024

Tianlin Liu et al. SeisLM: a foundation model for seismic waveforms.arXiv preprint arXiv:2410.15765, 2024

arXiv 2024
[16]

H. Kubo. Recent advances in earthquake seismology using machine learning.Earth, Planets and Space, 76(1):1–22, 2024. doi: 10.1186/s40623-024-01966-w

work page doi:10.1186/s40623-024-01966-w 2024
[17]

S. M. Mousavi, W. L. Ellsworth, W. Zhu, L. Y. Chuang, and G. C. Beroza. Earth- quake transformer—an attentive deep-learning model for simultaneous earthquake de- tection and phase picking.Nature Communications, 11(1):1–12, 2020. doi: 10.1038/ s41467-020-17591-w

2020
[18]

Weiqiang Zhu and Gregory C. Beroza. PhaseNet: a deep-neural-network-based seismic arrival-time picking method.Geophysical Journal International, 216(1):261–273, 2019. doi: 10.1093/gji/ggy423

work page doi:10.1093/gji/ggy423 2019
[19]

Mostafa Mousavi, and Gregory C

Weiqiang Zhu, S. Mostafa Mousavi, and Gregory C. Beroza. Seismic signal denoising and decomposition using deep neural networks.IEEE Transactions on Geoscience and Remote Sensing, 57(11):9476–9488, 2019. doi: 10.1109/TGRS.2019.2926772

work page doi:10.1109/tgrs.2019.2926772 2019
[20]

Q. Kong, R. M. Allen, L. Schreier, and Y. W. Kwon. Machine learning aspects of the MyShake global smartphone seismic network.Seismological Research Letters, 89(5): 1887–1896, 2018. doi: 10.1785/0220180037

work page doi:10.1785/0220180037 2018
[21]

Suppression of wind turbine noise from seismologi- cal data using nonlinear thresholding and denoising autoencoder.Journal of Seismology, 26(5):913–934, 2022

Janis Heuel and Wolfgang Friederich. Suppression of wind turbine noise from seismologi- cal data using nonlinear thresholding and denoising autoencoder.Journal of Seismology, 26(5):913–934, 2022

2022
[22]

Ross, Men-Andrin Meier, Egill Hauksson, and Thomas H

Zachary E. Ross, Men-Andrin Meier, Egill Hauksson, and Thomas H. Heaton. Gener- alized seismic phase detection with deep learning.Bulletin of the Seismological Society of America, 108(5A):2894–2901, 2018. doi: 10.1785/0120180080

work page doi:10.1785/0120180080 2018
[23]

F. F. Mojtahedi. Deep learning for time series forecasting: Review and applications in geotechnics and geosciences.Current Trends in Geotechnical Engineering, 2025

2025
[24]

Q. Kong, R. M. Allen, L. Schreier, and Y. W. Kwon. Machine learning in seismology: Turning data into insights.Seismological Research Letters, 89(4):1429–1441, 2018. doi: 10.1785/0220180112

work page doi:10.1785/0220180112 2018
[25]

SeismicNet: Physics-informed neural networks for seismic wave modeling in semi-infinite domain

PuRen, ChengpingRao, SuChen, Jian-XunWang, HaoSun, andYangLiu. SeismicNet: Physics-informed neural networks for seismic wave modeling in semi-infinite domain. arXiv preprint arXiv:2210.14044, 2022. URL https://arxiv.org/abs/2210.14044. 31

arXiv 2022
[26]

Applications of physics-informed neural networks in geosciences: From basic seismology to comprehensive environmental monitoring.Journal of Applied Geophysics, 2024

Maan Habib et al. Applications of physics-informed neural networks in geosciences: From basic seismology to comprehensive environmental monitoring.Journal of Applied Geophysics, 2024. doi: 10.1016/j.jappgeo.2024.105342

work page doi:10.1016/j.jappgeo.2024.105342 2024
[27]

Forecasting Seismic Waveforms: A Deep Learning Approach for Einstein Telescope.arXiv preprint arXiv:2509.21446, 2025

Waleed Esmail, Alexander Kappes, Stuart Russell, and Christine Thomas. Forecasting Seismic Waveforms: A Deep Learning Approach for Einstein Telescope.arXiv preprint arXiv:2509.21446, 2025. URL https://arxiv.org/abs/2509.21446

arXiv 2025
[28]

A survey of large language models.arXiv preprint arXiv:2303.18223, 2023

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models.arXiv preprint arXiv:2303.18223, 2023. URL https://arxiv.org/abs/ 2303.18223

Pith/arXiv arXiv 2023
[29]

The status of earthquake early warning around the world: An introductory overview.Seismological Research Letters, 80, 09 2009

Richard Allen, Paolo Gasparini, Osamu Kamigaichi, and Maren Böse. The status of earthquake early warning around the world: An introductory overview.Seismological Research Letters, 80, 09 2009. doi: 10.1785/gssrl.80.5.682

work page doi:10.1785/gssrl.80.5.682 2009
[30]

( year 2026 ), month Mar

Adrian Abac et al. The Science of the Einstein Telescope.JCAP, 03:081, 2026. doi: 10.1088/1475-7516/2026/03/081

work page doi:10.1088/1475-7516/2026/03/081 2026
[31]

Harms, Living Reviews in Relativity22, 6 (2019), URL https://doi.org/10.1007/s41114-019-0022-2

Jan Harms. Terrestrial gravity fluctuations.Living Rev. Rel., 22(1):6, 2019. doi: 10.1007/s41114-019-0022-2

work page doi:10.1007/s41114-019-0022-2 2019
[32]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems, pages 5998–6008, 2017. URL https://arxiv. org/abs/1706.03762

Pith/arXiv arXiv 2017
[33]

TOTEM: TOkenized time series EMbeddings for general time series analysis.Transactions on Machine Learning Re- search, 2024

Sabera Talukder, Yisong Yue, and Georgia Gkioxari. TOTEM: TOkenized time series EMbeddings for general time series analysis.Transactions on Machine Learning Re- search, 2024. URL https://arxiv.org/abs/2402.16412. arXiv preprint arXiv:2402.16412

arXiv 2024
[34]

Chronos: Learning the language of time series.arXiv preprint arXiv:2403.07815, 2024

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series.arXiv preprint arXiv:2403.07815, 2024. URL https://arxiv.org/abs/2403.07815

Pith/arXiv arXiv 2024
[36]

URL https://arxiv.org/abs/2412.05244

arXiv
[37]

Anjos, Sebastian Lautz, and Aleksandar Kolev

Egon Peršak, Miguel F. Anjos, Sebastian Lautz, and Aleksandar Kolev. Multiple- resolution tokenization for time series forecasting with an application to pricing.arXiv preprint arXiv:2407.03185, 2024. URL https://arxiv.org/abs/2407.03185. 32

arXiv 2024
[38]

TOKON: TOKenization-optimized normalization for time series anal- ysis with a large language model.arXiv preprint arXiv:2502.05701, 2025

Janghoon Yang. TOKON: TOKenization-optimized normalization for time series anal- ysis with a large language model.arXiv preprint arXiv:2502.05701, 2025. URL https://arxiv.org/abs/2502.05701

arXiv 2025
[39]

Ergodic seismic precursors and transfer learning for short term eruption forecasting at data scarce volcanoes.Nature Commu- nications, 16, 02 2025

Alberto Ardid Segura, David Dempsey, Corentin Caudron, Shane Cronin, Ben Kennedy, Társilo Girona, Diana Roman, Craig Miller, Sally Potter, Oliver Lamb, Martanto Mar- tanto, Yesim Cubuk, Leoncio Cabrera, Sergio Ruiz, Rodrigo Contreras-Arratia, Javier Pacheco, Mauricio Mora, and Silvio De Angelis. Ergodic seismic precursors and transfer learning for short t...

work page doi:10.1038/s41467-025-56689-x 2025
[40]

Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, and Hamid R. Arabnia. A brief re- view of domain adaptation. In Robert Stahlbock, Gary M. Weiss, Mahmoud Abou-Nasr, Cheng-Ying Yang, Hamid R. Arabnia, and Leonidas Deligiannidis, editors,Advances in Data Science and Information Engineering, pages 877–894, Cham, 2021. Springer In- ternational Publishing. IS...

2021
[41]

Instaseis: instant global seismograms based on a broadband waveform database

Martin van Driel, Lion Krischer, Simon C Stähler, Kambod Hosseini, and Tarje Nissen- Meyer. Instaseis: instant global seismograms based on a broadband waveform database. Solid Earth, 6(2):701–717, 2015

2015
[42]

Data products at the IRIS-DMC: Growth and usage

Alexander R Hutko, Manochehr Bahavar, Chad Trabant, Robert T Weekly, Mick Van Fossen, and Timothy Ahern. Data products at the IRIS-DMC: Growth and usage. Seismological Research Letters, 88(3):892–903, 2017. doi: 10.1785/0220160190

work page doi:10.1785/0220160190 2017
[43]

Hutko, Martin van Driel, Simon Stähler, Manochehr Ba- havar, Chad Trabant, and Tarje Nissen-Meyer

Lion Krischer, Alexander R. Hutko, Martin van Driel, Simon Stähler, Manochehr Ba- havar, Chad Trabant, and Tarje Nissen-Meyer. On-demand custom broadband syn- thetic seismograms.Seismological Research Letters, 88(4):1127–1140, 04 2017. ISSN 0895-0695. doi: 10.1785/0220160210. URL https://doi.org/10.1785/0220160210

work page doi:10.1785/0220160210 2017
[44]

Montagner and B

J.-P. Montagner and B. L. N. Kennett. How to reconcile body-wave and normal-mode reference earth models.Geophysical Journal International, 125(1):229–248, 04 1996. ISSN 0956-540X. doi: 10.1111/j.1365-246X.1996.tb06548.x. URL https://doi.org/10. 1111/j.1365-246X.1996.tb06548.x

work page doi:10.1111/j.1365-246x.1996.tb06548.x 1996
[45]

Determination of earthquake source parameters from waveform data for studies of global and regional seismicity

Adam M Dziewonski, T-A Chou, and John H Woodhouse. Determination of earthquake source parameters from waveform data for studies of global and regional seismicity. Journal of Geophysical Research: Solid Earth, 86(B4):2825–2852, 1981

1981
[46]

The global CMT project 2004–2010: Centroid-moment tensors for 13,017 earthquakes.Physics of the Earth and Planetary Interiors, 200:1–9, 2012

Göran Ekström, Meredith Nettles, and Adam M Dziewoński. The global CMT project 2004–2010: Centroid-moment tensors for 13,017 earthquakes.Physics of the Earth and Planetary Interiors, 200:1–9, 2012

2004
[47]

The taup toolkit: Flexible seismic travel-time and ray-path utilities.Seismological Research Letters, 70:154–160, 1999

H Philip Crotwell, Thomas J Owens, Jeroen Ritsema, et al. The taup toolkit: Flexible seismic travel-time and ray-path utilities.Seismological Research Letters, 70:154–160, 1999. 33

1999
[48]

Obspy: A python toolbox for seismology.Seismological Research Letters, 81(3):530–533, 2010

Moritz Beyreuther, Robert Barsch, Lion Krischer, Tobias Megies, Yannik Behr, and Joachim Wassermann. Obspy: A python toolbox for seismology.Seismological Research Letters, 81(3):530–533, 2010

2010
[49]

Improving language understanding by genera- tive pre-training

Alec Radford and Karthik Narasimhan. Improving language understanding by genera- tive pre-training. 2018. URL https://api.semanticscholar.org/CorpusID:49313245

2018
[50]

Language models are unsupervised multitask learners

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019. URL https://api. semanticscholar.org/CorpusID:160025533

2019
[51]

AudioLM: A language modeling approach to audio generation

Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, and Neil Zeghidour. AudioLM: A language modeling approach to audio generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:2523–2533,
[52]

doi: 10.1109/TASLP.2023.3288409

work page doi:10.1109/taslp.2023.3288409 2023
[53]

ProtGPT2 is a deep unsupervised language model for protein design.Nature Communications, 13(1):4348, 2022

Noelia Ferruz, Steffen Schmidt, and Birte Höcker. ProtGPT2 is a deep unsupervised language model for protein design.Nature Communications, 13(1):4348, 2022. doi: 10.1038/s41467-022-32007-7

work page doi:10.1038/s41467-022-32007-7 2022
[54]

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization.arXiv preprint arXiv:1607.06450, 2016. doi: 10.48550/arXiv.1607.06450

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1607.06450 2016
[55]

Szegedy, C., Vanhoucke, V ., Ioffe, S., Shlens, J., and Wojna, Z

Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Ro- Former: Enhanced transformer with rotary position embedding.Neurocomputing, 568: 127063, 2024. doi: 10.1016/j.neucom.2023.127063

work page doi:10.1016/j.neucom.2023.127063 2024
[56] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016. doi: 10.48550/arXiv.1606.08415
[57] Ryuichi Yamamoto, Eunwoo Song, and Jae-Min Kim. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6199–6203, 2020. doi: 10.1109/ICASSP40776.2020.9053795
[58] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. ... 2019
[59] William Falcon and The PyTorch Lightning team. PyTorch Lightning, March 2019. URL https://github.com/Lightning-AI/lightning
[60] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019. URL https://arxiv.org/abs/1711.05101
[61] Konstantinos Papasotiriou, Nikolaos Bakas, and Andreas Langousis. Similarity measures for time series forecasting: A survey. Algorithms, 15(10):354, 2022. doi: 10.3390/a15100354
[62] Dmitry Bobrov, Ivan Kitov, and Lassina Zerbo. Perspectives of cross-correlation in seismic monitoring at the international data centre. Pure and Applied Geophysics, 171(3):439–468, 2014. doi: 10.1007/s00024-012-0626-x
[63] Peter Welch. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2):70–73, 1967. doi: 10.1109/TAU.1967.1161901
[64] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling Laws for Neural Language Models. January 2020
[65] S. W. French and B. Romanowicz. Whole-mantle radially anisotropic shear velocity structure from spectral-element waveform tomography. Geophysical Journal International, 199:1303–1327, 2014
[66] W. Lei, Y. Ruan, E. Bozdağ, D. Peter, M. Lefebvre, D. Komatitsch, J. Tromp, J. Hill, N. Podhorszki, and D. Pugmire. Global adjoint tomography—model GLAD-M25. Geophysical Journal International, 223(1):1–21, 2020
