
arxiv: 2511.06731 · v2 · submitted 2025-11-10 · ⚛️ physics.geo-ph · cs.AI

Recovering Sub-threshold S-wave Arrivals in Deep Learning Phase Pickers via Shape-Aware Loss

Pith reviewed 2026-05-18 00:22 UTC · model grok-4.3

classification ⚛️ physics.geo-ph cs.AI
keywords seismic phase picking · deep learning · S-wave arrivals · conditional GAN · shape-aware loss · loss landscape · sub-threshold signals · earthquake monitoring

The pith

Deep learning seismic pickers miss clear S-waves because pointwise losses create an optimization trap; shape-then-align via conditional GAN recovers them and boosts detections by 64%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper diagnoses why deep learning phase pickers produce only sub-threshold distorted peaks for some unambiguous S-wave arrivals while handling P-waves correctly. It traces the failure to three interacting factors: temporal uncertainty in S-wave labels, CNN bias toward amplitude boundaries, and pointwise losses that supply no lateral corrective forces. The authors argue that arrival labels are coherent shapes rather than independent point estimates, so training must first preserve shape then align position. They formalize this as the shape-then-align strategy and show a conditional GAN implementation that recovers the missed signals. The result is a 64% increase in effective S-phase detections, and the work supplies loss-landscape visualization tools that turn label and loss choices into objects of systematic analysis.

Core claim

Phase arrival labels are structured shapes rather than independent probability estimates, requiring training objectives that preserve coherence. Temporal uncertainty in S-wave arrivals, CNN bias toward amplitude boundaries, and pointwise loss limitations interact to trap predictions below the detection threshold. The shape-then-align strategy, implemented through a conditional GAN, recovers previously sub-threshold signals and achieves a 64% increase in effective S-phase detections. Loss landscape visualization and numerical simulation provide a general methodology for analyzing how label designs and loss functions interact with temporal uncertainty.
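One part of this claim can be illustrated numerically: when label positions are jittered, pointwise BCE treats every time sample independently, so the curve that minimizes the average loss is the pointwise mean of the labels, which is broader and lower than any single label. The sketch below is a minimal illustration under stated assumptions, not the authors' simulation; the time axis, `label_sigma`, `jitter_sigma`, and the sample count are hypothetical values, and the CNN amplitude-boundary bias is not modeled.

```python
import numpy as np

def gaussian_label(t, center, sigma):
    """Gaussian-shaped pick label with unit peak amplitude."""
    return np.exp(-0.5 * ((t - center) / sigma) ** 2)

def bce(pred, target, eps=1e-7):
    """Mean pointwise binary cross-entropy between two curves."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

# Hypothetical settings chosen only for illustration.
t = np.linspace(-3.0, 3.0, 601)   # time axis in seconds
label_sigma = 0.1                 # width of each Gaussian label
jitter_sigma = 0.3                # spread of annotated arrival times

rng = np.random.default_rng(0)
centers = rng.normal(0.0, jitter_sigma, size=2000)
labels = np.stack([gaussian_label(t, c, label_sigma) for c in centers])

def expected_bce(pred):
    """Average BCE of one fixed prediction against all jittered labels."""
    return float(np.mean([bce(pred, y) for y in labels]))

# BCE is linear in the target, so the per-sample minimizer of the
# averaged loss is the mean label: a smeared, low-amplitude curve.
mean_label = labels.mean(axis=0)
sharp_guess = gaussian_label(t, 0.0, label_sigma)  # full-height peak at the mean arrival

loss_mean = expected_bce(mean_label)
loss_sharp = expected_bce(sharp_guess)
peak_mean = float(mean_label.max())

print(f"loss(mean label)={loss_mean:.4f}  loss(sharp peak)={loss_sharp:.4f}")
print(f"peak of loss-optimal curve={peak_mean:.3f}")
```

With these hypothetical widths the loss-optimal curve peaks well below a 0.5 threshold even though each individual label reaches 1.0, which is one way to picture a prediction that is "trapped" below the detection threshold.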

What carries the argument

The shape-then-align strategy, which first enforces coherence between the predicted waveform shape and the label shape before refining temporal alignment, implemented as a conditional GAN.
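The figure captions on this page describe a generator guided by two terms, a BCE-based data loss weighted by λ and a GAN loss from the discriminator. A minimal NumPy sketch of how such a weighted objective could be assembled follows. The non-saturating adversarial form, the function names, and the stand-in discriminator scores are assumptions of this sketch, not details confirmed by the source; only the roles of the two terms and the weight λ come from the captions.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Mean pointwise binary cross-entropy (the data loss)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def generator_objective(pred, label, d_fake, lam):
    """Return (total, data_loss, gan_loss) for the generator.

    d_fake is the discriminator's probability that the predicted curve,
    paired with its waveform, is a real label. The non-saturating form
    -log(d_fake) is an assumption of this sketch.
    """
    data_loss = bce(pred, label)
    gan_loss = float(-np.log(np.clip(d_fake, 1e-7, 1.0)))
    return lam * data_loss + gan_loss, data_loss, gan_loss

def discriminator_objective(d_real, d_fake, eps=1e-7):
    """Standard GAN discriminator loss on one real and one generated pair."""
    d_real = float(np.clip(d_real, eps, 1.0 - eps))
    d_fake = float(np.clip(d_fake, eps, 1.0 - eps))
    return float(-(np.log(d_real) + np.log(1.0 - d_fake)))

# Toy usage: a suppressed prediction with the correct position but low amplitude.
t = np.arange(400)
label = np.exp(-0.5 * ((t - 200) / 10.0) ** 2)
suppressed = 0.3 * label

# d_fake=0.2 is a made-up score standing in for a trained discriminator.
total, data_loss, gan_loss = generator_objective(suppressed, label, d_fake=0.2, lam=4000.0)
d_loss = discriminator_objective(d_real=0.5, d_fake=0.5)
print(total, data_loss, gan_loss, d_loss)
```

The design point the sketch makes is only structural: the data term acts sample by sample, while the adversarial term scores the curve as a whole, so λ sets how strongly position-wise agreement competes with whole-shape plausibility.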

Load-bearing premise

The three diagnosed factors are the dominant causes of the sub-threshold failure mode and the conditional GAN version of shape-then-align generalizes beyond the tested datasets and models.

What would settle it

Retrain a standard phase picker on the same data using the shape-aware loss and measure whether the fraction of sub-threshold S-wave predictions drops by roughly 64% on an independent test set; if the gain disappears, the claim is falsified.
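A check of this kind needs an explicit counting rule. The sketch below shows one possible rule, assuming a detection is "effective" when the prediction peak clears a threshold and falls within a tolerance of the reference pick. This definition, the threshold of 0.5, the tolerance in samples, and the toy curves are all assumptions for illustration; the page does not state how the paper defines the metric.

```python
import numpy as np

def effective_detections(pred_curves, true_idx, threshold=0.5, tol_samples=10):
    """Count traces whose peak clears the threshold near the reference pick."""
    pred_curves = np.asarray(pred_curves, dtype=float)
    true_idx = np.asarray(true_idx)
    peak_idx = pred_curves.argmax(axis=1)
    peak_val = pred_curves.max(axis=1)
    ok = (peak_val >= threshold) & (np.abs(peak_idx - true_idx) <= tol_samples)
    return int(ok.sum())

def relative_gain(baseline_count, new_count):
    """Fractional increase of new_count over baseline_count."""
    return (new_count - baseline_count) / baseline_count

# Toy data: three traces with the pick at sample 100 (not from the paper).
n = np.arange(200)
shape = np.exp(-0.5 * ((n - 100) / 10.0) ** 2)
true_idx = [100, 100, 100]
baseline = np.stack([0.9 * shape, 0.3 * shape, 0.4 * shape])   # two peaks sub-threshold
retrained = np.stack([0.9 * shape, 0.8 * shape, 0.4 * shape])  # one peak recovered

base_count = effective_detections(baseline, true_idx)
new_count = effective_detections(retrained, true_idx)
print(base_count, new_count, relative_gain(base_count, new_count))
```

Under this rule, a 64% increase would correspond to, for example, 100 baseline detections rising to 164 on the same test set; the counts in the toy data above are unrelated to the paper's results.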

Figures

Figures reproduced from arXiv: 2511.06731 by An-Sheng Lee, Chun-Ming Huang, Hao Kuo-Chen, I-Hsin Chang, Li-Heng Chang.

Figure 1: The dynamic process of amplitude suppression. Color coding: P-wave (blue), S-wave (orange), Noise (green); light/dark colors show labels/predictions. This figure visualizes key training steps from a conventional supervised model. The model exhibits a specific learning sequence: initially producing right-flanked responses at regions of sharpest amplitude change, then learning left flanks to form half-Gaussi…
Figure 2: Visual diagnosis and correction of amplitude suppression. Color coding: P-wave (blue), S-wave (orange), Noise (green); light/dark colors show labels/predictions. (a) Representative waveform with phase arrival markers: longest line (reference label), upper short line (conventional prediction, panel b), lower short line (our framework, panel c). Note temporal delay between S-wave onset and high-amplitude wav…
Figure 3: Large-scale quantitative statistics of S-phase predictions: Confirmation and mitigation of suppression. Black Gaussian curves represent reference labels (see Supplementary…
Figure 4: The optimization trap: Actual appearance, geometric origin, and conceptual solution. This figure reveals the geometric nature of amplitude suppression. (a) Magnified view of a suppressed S-phase prediction at step 2692, showing characteristic flattened-top appearance. (b) Theoretical loss landscape of point-wise BCE loss: low-loss region at label peak is a non-global minimum due to additional low-loss regi…
Figure 5: Numerical simulation: Deconstructing the key factors of the dynamic optimization trap. This experiment removes the neural network so optimizer updates directly reflect on prediction curves. The 2×4 grid varies distribution width (σ = 0.1, 0.2, 0.3, 0.5, columns) representing temporal uncertainty and skewness (0 vs. -10, rows) representing systematic bias. Each subplot shows three panels: green (target dist…
Figure 1: Comparative Analysis of P-wave and S-wave Arrival Time Labels. (a) P-wave: The amplitude boundary coincides with true arrival time, providing a distinct marker with low temporal uncertainty. (b) S-wave: Increasing epicentral distance causes waveform broadening and blurred amplitude boundaries. Longer propagation paths produce multiple overlapping waves, reducing amplitude gradients and annotation accuracy.…
Figure 2: The choice of hyperparameter λ and its impact on the convergence trajectory. Yellow dots represent temporally accurate predictions (±0.1 s), gray dots fall outside this range. S-wave prediction trajectories under different λ settings reveal BCE-GAN loss balance: (a) λ=5000 (too high): excessive BCE induces rightward bias, overpowering GAN's lateral correction; (b) λ=4000 (optimal): forces reach equilibrium…
Figure 3: Comparative analysis of training dynamics. This figure compares three training methods across six panels: Data Loss only (a, d), Hybrid Loss (b, e), and GAN Loss only (c, f), showing curve evolution (top row) and peak trajectories (bottom row). Data Loss only: gradual amplitude development but eventual suppression; trajectory straight but peak never surpasses threshold. Hybrid Loss implementing shape-then-…
Figure 4: Potential and challenges as a general shape learner. To test framework generality, we replaced Gaussian labels with tapered boxcar functions (start at P-wave arrival, end at S + (S − P) × 1.7, taper width σ = 20) without changing architecture. At step 6000, the model successfully learned this non-Gaussian, dynamically-sized shape. However, this required extremely low λ = 10, making generator-discriminator …
Figure 5: Diverse case analysis: Evidence for robustness, limitations, and mode collapse. This figure demonstrates framework performance across diverse scenarios. The hybrid loss framework shows robustness in common cases: near-field earthquakes (a, b) and pure noise (f) with correct near-zero baseline. However, limitations emerge in extreme cases: regional event (c) shows raised amplitude but miscalibrated timing; …
Figure 6: Complete quantitative statistical comparison of P- and S-wave predictions. Black Gaussian curves represent reference labels. This figure compares prediction peak distributions across the test set for three methods (Data Loss only, Hybrid Loss, GAN Loss only). P-wave results (a, c, e): most methods converge well, but GAN Loss only (e) shows pathological diffusion from mode collapse. S-wave results (b, d, f)…
Figure 7: Internal dynamics of generative adversarial training. This figure reveals the continuous minimax game between generator and discriminator across five panels. Discriminator scores for real and fake labels (a, b, c) for three training methods show divergence when the discriminator successfully distinguishes labels and convergence when the generator deceives it. Discriminator loss (d) and generator loss (e) c…
Figure 8: Model Architecture. (a) Complete cGAN training framework showing how the generator (G) is guided by two loss functions for task decoupling: BCE-based Data Loss controls temporal position, while GAN Loss from discriminator (D) controls shape. (b) Generator architecture: PhaseNet model based on U-Net's encoder-decoder structure. (c) Discriminator (BlueDisc): lightweight CNN with three repeating blocks. Abbre…
Original abstract

Deep learning has transformed seismic phase picking, but a systematic failure mode persists: for some S-wave arrivals that appear unambiguous to human analysts, the model produces only a distorted peak trapped below the detection threshold, even as the P-wave prediction on the same record appears flawless. By examining training dynamics and loss landscape geometry, we diagnose this amplitude suppression as an optimization trap arising from three interacting factors. Temporal uncertainty in S-wave arrivals, CNN bias toward amplitude boundaries, and the inability of pointwise loss to provide lateral corrective forces combine to create the trap. The diagnosis reveals that phase arrival labels are structured shapes rather than independent probability estimates, requiring training objectives that preserve coherence. We formalize this as the shape-then-align strategy and validate it through a conditional GAN proof of concept, recovering previously sub-threshold signals and achieving a 64% increase in effective S-phase detections. Beyond this implementation, the loss landscape visualization and numerical simulation techniques we introduce provide a general methodology for analyzing how label designs and loss functions interact with temporal uncertainty, transforming these choices from trial-and-error into principled analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper diagnoses a systematic failure mode in deep learning seismic phase pickers: unambiguous S-wave arrivals produce only distorted sub-threshold peaks despite flawless P-wave predictions on the same record. It attributes this to three interacting factors (temporal uncertainty in S arrivals, CNN amplitude bias, and pointwise loss limitations) that create an optimization trap, then formalizes a shape-then-align strategy and validates it via a conditional GAN proof-of-concept that recovers signals and yields a 64% increase in effective S-phase detections. The work also introduces loss-landscape visualization and numerical simulation techniques for analyzing label-loss interactions under temporal uncertainty.

Significance. If the central claim holds after proper isolation of the mechanism, the result would be significant for improving automated S-phase detection in seismic monitoring, with potential carry-over to other time-series tasks involving structured labels and uncertainty. Credit is given for the empirical cGAN demonstration, the formalization of shape-then-align, and the methodological tools for principled loss-function analysis.

major comments (2)
  1. [Abstract] Abstract: the reported 64% gain in effective S-phase detections is presented without data-split details, baseline model specification, error bars, or tests on alternative architectures, leaving the magnitude and robustness of the improvement unverified.
  2. [Results] Results / proof-of-concept section: the attribution of recovery specifically to the shape-then-align component lacks an isolating ablation (e.g., replacing the conditional GAN with a direct shape-preserving regularizer such as trace-wise cross-correlation or SSIM on the probability sequence while holding model and data fixed); without this control the observed gain cannot be confidently distinguished from generic adversarial regularization.
minor comments (2)
  1. [Abstract] Abstract: define 'effective S-phase detections' explicitly and state how the 64% figure is computed relative to the baseline.
  2. [Methods] Methods: clarify the precise architecture of the conditional GAN generator and discriminator and the form of the shape-aware loss term.
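The ablation proposed in major comment 2 names trace-wise cross-correlation as one candidate shape-preserving regularizer. A minimal sketch of such a score is given here, assuming the peak of the normalized cross-correlation over all lags is used so that the score depends on shape and not on position. The function name and the toy curves are hypothetical, and this is not a method taken from the paper.

```python
import numpy as np

def max_normalized_xcorr(pred, label):
    """Return (score, lag): peak normalized cross-correlation and its lag.

    A positive lag means pred is delayed relative to label.
    """
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    denom = np.linalg.norm(pred) * np.linalg.norm(label)
    if denom == 0.0:
        return 0.0, 0
    xc = np.correlate(pred, label, mode="full") / denom
    k = int(xc.argmax())
    # In "full" mode, output index k corresponds to lag k - (len(label) - 1).
    return float(xc[k]), k - (len(label) - 1)

# Toy curves (hypothetical): Gaussian label, a delayed copy, a scaled copy,
# and a flattened-top version like the suppressed peaks described above.
n = np.arange(400)
label = np.exp(-0.5 * ((n - 200) / 10.0) ** 2)
shifted = np.exp(-0.5 * ((n - 215) / 10.0) ** 2)
scaled = 0.3 * shifted
flattened = np.minimum(label, 0.3)

score_shifted, lag_shifted = max_normalized_xcorr(shifted, label)
score_scaled, _ = max_normalized_xcorr(scaled, label)
score_flat, _ = max_normalized_xcorr(flattened, label)
print(score_shifted, lag_shifted, score_scaled, score_flat)
```

Two properties matter for the ablation. The score ignores pure time shifts, so alignment would still have to come from the data loss. It also ignores uniform amplitude scaling, so a regularizer of the form 1 minus this score penalizes a flattened top but not a uniformly scaled-down peak, and a control built on it would need a separate amplitude term.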

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the positive assessment of our work's significance and for the constructive major comments. We address each point below, proposing revisions where appropriate to enhance the clarity and robustness of our claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the reported 64% gain in effective S-phase detections is presented without data-split details, baseline model specification, error bars, or tests on alternative architectures, leaving the magnitude and robustness of the improvement unverified.

    Authors: We acknowledge that the abstract prioritizes conciseness and omits specific experimental details. The manuscript details the data splits, baseline model (standard phase picker with pointwise loss), and includes error bars in the results section. To address this, we will revise the abstract to include a brief statement on the evaluation setup and baseline for better context. Tests on alternative architectures are noted as future work given the proof-of-concept nature, but we will add a discussion on potential generalizability.

    Revision: partial

  2. Referee: [Results] Results / proof-of-concept section: the attribution of recovery specifically to the shape-then-align component lacks an isolating ablation (e.g., replacing the conditional GAN with a direct shape-preserving regularizer such as trace-wise cross-correlation or SSIM on the probability sequence while holding model and data fixed); without this control the observed gain cannot be confidently distinguished from generic adversarial regularization.

    Authors: This is a valid point regarding the need for stronger isolation of the mechanism. Our cGAN implementation serves as a proof-of-concept for the shape-then-align strategy, where the adversarial component encourages shape coherence. To better distinguish from generic adversarial effects, we agree to include an ablation study using a direct shape-preserving regularizer such as SSIM on the probability sequence in the revised version, while keeping the model and data fixed. This will help confirm the specific contribution of the shape-aware approach.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical validation stands independent of inputs

Full rationale

The paper diagnoses three interacting factors (temporal uncertainty, CNN amplitude bias, pointwise loss limitations) from training dynamics and loss landscape geometry, then formalizes a shape-then-align strategy validated empirically via conditional GAN on seismic data, reporting a 64% increase in S-phase detections. No equations, derivations, or self-citations are shown that reduce the claimed recovery or performance gain to a fitted parameter, self-referential quantity, or ansatz by construction. The central result is presented as an outcome of the proposed loss strategy on held-out records rather than a closed loop equivalent to the input diagnoses. The derivation chain remains self-contained as a methodological proposal with external empirical support.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that phase-arrival labels possess coherent shape structure that pointwise losses fail to preserve, plus the empirical effectiveness of the conditional-GAN implementation; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption: Phase arrival labels are structured shapes rather than independent probability estimates
    Explicitly stated in the abstract as the basis for requiring coherence-preserving training objectives.

pith-pipeline@v0.9.0 · 5504 in / 1347 out tokens · 59985 ms · 2026-05-18T00:22:03.826857+00:00 · methodology



Supplementary Information

Supplementary Note: cGAN Training Challenges and Solutions

The cGAN training process is challenging, primarily because it relies on maintaining a precise yet fragile dynamic equilibrium between the generator and the discriminator. The stability of this frame...