The Seismic Wavefield Common Task Framework

Alexey Yermakov, Yue Zhao, Marine Denolle, Yiyu Ni, Philippe M. Wyder, Judah Goldfeder, Stefano Riva, Jan Williams, David Zoro, Amy Sara Rude, Matteo Tomasetto, Joe Germany, Joseph Bakarji, Georg Maierhofer, Miles Cranmer, J. Nathan Kutz

arxiv: 2512.19927 · v2 · submitted 2025-12-22 · 💻 cs.LG

Pith reviewed 2026-05-16 20:08 UTC · model grok-4.3

classification 💻 cs.LG

keywords: seismic wavefields, machine learning, common task framework, benchmarking, wavefield reconstruction, forecasting, generalization, seismology

The pith

A Common Task Framework standardizes machine learning comparisons for seismic wavefield forecasting and reconstruction across global, crustal, and local scales.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a Common Task Framework to enable consistent, reproducible evaluations of machine learning methods on seismic wavefield data. It supplies curated datasets at three scales together with metrics that test forecasting, reconstruction from sparse sensors, and generalization under noise and limited samples. This structure mirrors successful frameworks in other fields and replaces scattered ad hoc tests with hidden test sets. Readers care because reliable head-to-head results can speed progress on practical tasks such as earthquake early warning and ground-motion prediction.

Core claim

The authors introduce a Common Task Framework (CTF) for ML for seismic wavefields, demonstrated on three distinct wavefield datasets at global, crustal, and local scales. The CTF supplies task-specific metrics for forecasting, reconstruction, and generalization under realistic constraints such as noise and limited data. The framework supports standardized head-to-head algorithm evaluation on hidden test sets, replacing ad hoc comparisons with rigorous, reproducible benchmarks.

What carries the argument

The Common Task Framework (CTF), a collection of multi-scale curated datasets paired with task-specific metrics that enforce standardized evaluation of forecasting, reconstruction, and generalization performance.

If this is right

Methods for wavefield reconstruction from sparse sensors can be directly ranked by accuracy under controlled noise and data limits.
Algorithm suitability for specific scales (global versus local) becomes measurable rather than anecdotal.
Hidden test sets prevent overfitting to public data and raise the standard for reported results.
Progress across forecasting and generalization tasks can be tracked uniformly over time.
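The first point above can be made concrete with a small, self-contained sketch. It is not the paper's benchmark: the 1-D sine "wavefield", the two baseline reconstructions (`nearest_sensor`, `linear_interp`), the relative L2 error, and every parameter value are assumptions chosen only to show what ranking methods under controlled sensor noise looks like.

```python
import math
import random


def relative_l2_error(truth, estimate):
    """Relative L2 error ||truth - estimate|| / ||truth||."""
    num = math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)))
    den = math.sqrt(sum(t ** 2 for t in truth))
    return num / den


def nearest_sensor(sensor_idx, sensor_vals, n):
    """Hypothetical baseline: copy the value of the closest sensor."""
    return [
        sensor_vals[min(range(len(sensor_idx)),
                        key=lambda k: abs(sensor_idx[k] - i))]
        for i in range(n)
    ]


def linear_interp(sensor_idx, sensor_vals, n):
    """Hypothetical baseline: piecewise-linear interpolation between sensors."""
    out = []
    for i in range(n):
        if i <= sensor_idx[0]:
            out.append(sensor_vals[0])
        elif i >= sensor_idx[-1]:
            out.append(sensor_vals[-1])
        else:
            # Index of the last sensor located at or before grid point i.
            k = max(j for j, s in enumerate(sensor_idx) if s <= i)
            left, right = sensor_idx[k], sensor_idx[k + 1]
            w = (i - left) / (right - left)
            out.append((1 - w) * sensor_vals[k] + w * sensor_vals[k + 1])
    return out


def rank_methods(truth, sensor_idx, noise_std, seed=0):
    """Return (method, error) pairs sorted from best to worst."""
    rng = random.Random(seed)  # fixed seed keeps the comparison reproducible
    sensor_vals = [truth[i] + rng.gauss(0.0, noise_std) for i in sensor_idx]
    methods = {"nearest_sensor": nearest_sensor, "linear_interp": linear_interp}
    scores = {
        name: relative_l2_error(truth, fn(sensor_idx, sensor_vals, len(truth)))
        for name, fn in methods.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Assumed toy setup: 101 grid points, one sensor every 10 points.
n = 101
truth = [math.sin(2 * math.pi * i / (n - 1)) for i in range(n)]
sensor_idx = list(range(0, n, 10))

for noise_std in (0.0, 0.1):
    print(noise_std, rank_methods(truth, sensor_idx, noise_std))
```

Fixing the random seed and the sensor layout is what makes the ranking comparable across methods; a real benchmark would additionally keep the ground truth hidden from participants.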

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread adoption could accelerate development of real-time seismic monitoring tools by making cross-paper claims directly comparable.
The multi-scale design may naturally support transfer-learning studies between global and local regimes.
Future extensions could incorporate source-mechanism or full-waveform tasks once the core reconstruction benchmarks stabilize.

Load-bearing premise

The chosen datasets and metrics capture real-world parametric variability, noise characteristics, and generalization challenges without selection bias or unintended favoritism toward particular algorithm classes.

What would settle it

Run the same set of algorithms on an independent collection of real seismic recordings never seen during CTF construction; if the performance rankings reverse or collapse, the framework does not yet reflect practical conditions.
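One way to quantify whether rankings "reverse or collapse" is a rank correlation between the two leaderboards. The sketch below uses Spearman's rank correlation (the tie-free formula); the method names and error values are placeholders invented for illustration, not results from the paper, and the choice of Spearman's coefficient is an assumption rather than something the paper prescribes.

```python
def ranks(scores):
    """Map each method to its rank (1 = lowest error). Assumes no ties."""
    ordered = sorted(scores, key=scores.get)
    return {name: r for r, name in enumerate(ordered, start=1)}


def spearman(scores_a, scores_b):
    """Spearman rank correlation for two score tables over the same methods."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((ra[m] - rb[m]) ** 2 for m in ra)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


# Placeholder errors (hypothetical methods, made-up numbers).
ctf_errors = {"method_a": 0.10, "method_b": 0.20, "method_c": 0.30, "method_d": 0.40}
independent_same = {"method_a": 0.15, "method_b": 0.22, "method_c": 0.35, "method_d": 0.50}
independent_reversed = {"method_a": 0.50, "method_b": 0.35, "method_c": 0.22, "method_d": 0.15}

print(spearman(ctf_errors, independent_same))      # rankings agree
print(spearman(ctf_errors, independent_reversed))  # rankings fully reversed
```

A coefficient near 1 would indicate that the benchmark ranking carries over to the independent recordings, while a value near 0 or below would correspond to the collapse or reversal described above.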

Figures

Figures reproduced from arXiv: 2512.19927 by Alexey Yermakov, Amy Sara Rude, David Zoro, Georg Maierhofer, Jan Williams, J. Nathan Kutz, Joe Germany, Joseph Bakarji, Judah Goldfeder, Marine Denolle, Matteo Tomasetto, Miles Cranmer, Philippe M. Wyder, Stefano Riva, Yiyu Ni, Yue Zhao.

**Figure 2.** The Seismic Wavefield CTF scores the performance of methods on seismic wavefield ...

**Figure 3.** Architecture of the Deep Operator Network. The target field at the evaluation point ...

**Figure 4.** Schematic of the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm from ...

**Figure 5.** Scheme of the Dynamic Mode Decomposition algorithm from J. Nathan Kutz et al. [2016].

**Figure 6.** Architecture of the Fourier Neural Operator from Li et al. [2021].

**Figure 7.** Sample architecture of a Kolmogorov-Arnold Network with three layers of size ...

Original abstract

Seismology faces fundamental challenges in state forecasting and reconstruction (e.g., earthquake early warning and ground motion prediction) and managing the parametric variability of source locations, mechanisms, and Earth models (e.g., subsurface structure and topography effects). Addressing these with simulations is hindered by their massive scale, both in synthetic data volumes and numerical complexity, while real-data efforts are constrained by models that inadequately reflect the Earth's complexity and by sparse sensor measurements from the field. Recent machine learning (ML) efforts offer promise, but progress is obscured by a lack of proper characterization, fair reporting, and rigorous comparisons. To address this, we introduce a Common Task Framework (CTF) for ML for seismic wavefields, demonstrated here on three distinct wavefield datasets. Our CTF features a curated set of datasets at various scales (global, crustal, and local) and task-specific metrics spanning forecasting, reconstruction, and generalization under realistic constraints such as noise and limited data. Inspired by CTFs in fields like natural language processing, this framework provides a structured and rigorous foundation for head-to-head algorithm evaluation. We evaluate various methods for reconstructing seismic wavefields from sparse sensor measurements, with results illustrating the CTF's utility in revealing strengths, limitations, and suitability for specific problem classes. Our vision is to replace ad hoc comparisons with standardized evaluations on hidden test sets, raising the bar for rigor and reproducibility in scientific ML.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a CTF for seismic ML with three multi-scale datasets and task-specific metrics for forecasting, reconstruction, and generalization. This organizes comparisons better than ad hoc work, but the text reviewed here reports no quantitative results yet.

The main takeaway is that this paper sets up a Common Task Framework for machine learning on seismic wavefields. It supplies three curated datasets at global, crustal, and local scales along with metrics that target forecasting, reconstruction, and generalization under noise and limited-data conditions. The goal is to replace scattered comparisons with hidden test sets, modeled on CTFs from other fields like NLP.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Common Task Framework (CTF) for machine learning applied to seismic wavefields. It curates datasets at global, crustal, and local scales and defines task-specific metrics for forecasting, reconstruction, and generalization under constraints such as noise and limited data. The framework is demonstrated via evaluations of methods for sparse-sensor wavefield reconstruction, with the stated goal of replacing ad hoc comparisons with standardized evaluations on hidden test sets to improve rigor and reproducibility.

Significance. If implemented with full quantitative benchmarks and community adoption, the CTF could raise standards for ML in seismology by providing multi-scale datasets and realistic metrics that address parametric variability and data sparsity. The proposal draws useful parallels to CTFs in NLP and offers a structured path for head-to-head algorithm testing. However, the current manuscript presents the framework largely as a definition plus illustration rather than a fully exercised benchmark with reported results.

major comments (2)

[Abstract and demonstration] Abstract and demonstration section: the manuscript describes framework construction and illustrative evaluations on sparse-sensor reconstruction but supplies no quantitative results, error bars, explicit metric definitions, or data-split details. This leaves the central claim of enabling rigorous comparisons without supporting evidence in the provided text.
[Datasets and metrics] Datasets and metrics description: the selection of global/crustal/local datasets and the specific metrics for forecasting, reconstruction, and generalization under noise/limited-data constraints are not detailed enough to evaluate whether they capture real-world variability without selection bias or favoritism toward particular algorithm classes.

minor comments (2)

[Evaluation] Clarify the exact algorithms evaluated in the demonstration and provide references or implementation details for reproducibility.
[Conclusion] Include a brief roadmap for hidden test-set maintenance and community contribution to support long-term adoption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript introducing the Seismic Wavefield Common Task Framework. We address the major comments point by point below and will revise the manuscript to strengthen the quantitative support and details as suggested.

Referee: [Abstract and demonstration] Abstract and demonstration section: the manuscript describes framework construction and illustrative evaluations on sparse-sensor reconstruction but supplies no quantitative results, error bars, explicit metric definitions, or data-split details. This leaves the central claim of enabling rigorous comparisons without supporting evidence in the provided text.

Authors: We agree that the current demonstration section is primarily illustrative and lacks the quantitative results, error bars, explicit metric definitions, and data-split details needed to fully support the central claims. In the revised manuscript, we will add these elements, including detailed quantitative benchmarks from the sparse-sensor reconstruction evaluations along with error bars, precise metric formulas, and data-split specifications. This will provide concrete evidence for the framework's utility in enabling rigorous, standardized comparisons.

Revision: yes

Referee: [Datasets and metrics] Datasets and metrics description: the selection of global/crustal/local datasets and the specific metrics for forecasting, reconstruction, and generalization under noise/limited-data constraints are not detailed enough to evaluate whether they capture real-world variability without selection bias or favoritism toward particular algorithm classes.

Authors: We will expand the datasets and metrics sections to include explicit selection criteria and justifications for the global, crustal, and local datasets, with discussion of how they represent real-world parametric variability. We will also detail the metric definitions for forecasting, reconstruction, and generalization under noise and limited-data constraints, and add analysis of potential biases along with the framework's use of hidden test sets to promote fairness across algorithm classes.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces a Common Task Framework by curating three wavefield datasets (global, crustal, local) and defining task-specific metrics for forecasting, reconstruction, and generalization under noise and limited-data constraints. No derivation chain, equations, fitted parameters, or predictions exist that could reduce to inputs by construction. The demonstration of sparse-sensor reconstruction methods is presented as an illustrative evaluation of the framework rather than a self-referential result. No self-citations are load-bearing for any central claim, and the framework is defined independently of its own evaluation outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that standardized tasks and metrics can fairly expose algorithm strengths and limitations across realistic seismology constraints; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Standardized tasks and metrics enable fair head-to-head comparison of ML methods for seismic wavefields. Invoked in the design of the CTF to replace ad hoc evaluations.

pith-pipeline@v0.9.0 · 5601 in / 1296 out tokens · 39927 ms · 2026-05-16T20:08:51.039782+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: We introduce a Common Task Framework (CTF) for ML for seismic wavefields... task-specific metrics spanning forecasting, reconstruction, and generalization under realistic constraints such as noise and limited data.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: The forecasting score combines two components: a short-term forecast score E_ST and a long-term forecast score E_LT... S_ST and S_LT via RMSE and power-spectrum error.
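The quoted passage elides the exact formulas, so the following is only a sketch of how a short-term error based on RMSE and a long-term error based on power spectra might be computed. The window length `k`, the relative normalization, the naive DFT, and the names `e_st` and `e_lt` are all assumptions made here for illustration; the mapping from errors to the paper's scores is deliberately left out.

```python
import cmath
import math


def relative_rmse(truth, pred):
    """Root-mean-square error normalized by the RMS of the truth."""
    num = math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)))
    den = math.sqrt(sum(t ** 2 for t in truth))
    return num / den


def power_spectrum(x):
    """|DFT|^2 at non-negative frequencies, using a naive O(n^2) DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
        for f in range(n // 2 + 1)
    ]


def forecast_errors(truth, pred, k):
    """Assumed split: first k steps for short-term, last k steps for long-term."""
    e_st = relative_rmse(truth[:k], pred[:k])  # pointwise accuracy
    e_lt = relative_rmse(power_spectrum(truth[-k:]),
                         power_spectrum(pred[-k:]))  # spectral content only
    return e_st, e_lt


# Toy signals: same frequency (period 10 samples), forecast shifted by a quarter period.
steps = 200
truth = [math.sin(2 * math.pi * t / 10) for t in range(steps)]
pred = [math.sin(2 * math.pi * t / 10 + math.pi / 2) for t in range(steps)]

e_st, e_lt = forecast_errors(truth, pred, k=50)
print("short-term error:", e_st)
print("long-term error:", e_lt)
```

In this toy case the phase-shifted forecast is penalized heavily by the pointwise error but hardly at all by the spectral error, which illustrates why a long-horizon metric might compare power spectra rather than individual samples.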

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
