Evaluating PhaseNet on Teleseismic Data with MsPASS
Pith reviewed 2026-05-25 00:55 UTC · model grok-4.3
The pith
Retraining PhaseNet from scratch on 1.6 million teleseismic picks raises recall by 741.5 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors assembled a control dataset of 1.6 million teleseismic waveforms linked to P-wave picks made by analysts at the USArray Array Network Facility. The original PhaseNet model trained on regional signals performs poorly on these data. Training PhaseNet from scratch on the training split of the ANF control dataset and evaluating it on a non-overlapping held-out test split increased P-pick recall by 741.5 percent and yielded 683.9 percent more picks within a 0.1 s residual window. Increasing model size by about 120 times improved precision and recall by 15.6 percent and 23.2 percent respectively, but reduced inference throughput by 87.2 percent on an NVIDIA A100 GPU and by 97.3 percent on a 128-core high-performance CPU node.
What carries the argument
The ANF control dataset of 1.6 million analyst-labeled teleseismic waveforms, used for supervised retraining of PhaseNet and quantitative before-after evaluation on held-out data.
If this is right
- Domain-specific retraining on teleseismic data is required for PhaseNet to achieve high recall on distant events.
- Larger model sizes deliver only modest accuracy gains while sharply lowering throughput.
- GPUs make scaled PhaseNet models far more practical than high-core-count CPU nodes.
- Reproducible workflows enable systematic large-scale training and testing on archived seismic data.
Where Pith is reading between the lines
- The same control-dataset approach could be applied to adapt other machine-learning phase pickers to teleseismic signals.
- The ANF dataset could serve as a public benchmark for comparing future teleseismic picking algorithms.
- Widespread use of retrained models might allow global networks to handle substantially larger data volumes with existing analyst resources.
Load-bearing premise
The 1.6 million teleseismic waveforms labeled by USArray ANF analysts provide sufficiently accurate and unbiased ground-truth P-wave picks for both supervised training and quantitative evaluation of model performance.
What would settle it
Re-evaluation of the retrained model on an independent collection of teleseismic P-picks made by a different analyst group that shows recall gains below 200 percent would falsify the reported improvement from domain-specific training.
Original abstract
Numerous studies have shown that the machine-learning picker PhaseNet produces accurate P and S picks on local earthquake signals, but its performance can degrade sharply on teleseismic signals. To address this limitation, we present a reproducible MsPASS workflow that (i) enables scalable data preparation and management for large seismic archives and (ii) supports standardized PhaseNet training and inference. We assembled a control dataset of 1.6 million waveforms linked to teleseismic P-wave picks made by analysts at the USArray Array Network Facility (ANF). The control dataset confirms that the PhaseNet model trained on regional signals performs poorly on these data. We then trained PhaseNet from scratch on the training split of the ANF control dataset and evaluated it on a non-overlapping held-out test split, increasing P-pick recall by 741.5% and yielding 683.9% more picks within a 0.1s residual window. We also evaluated PhaseNet across different model sizes on both CPUs and GPUs. Increasing the model size by about 120 times improved precision and recall by 15.6% and 23.2%, respectively. However, the scaled model reduced inference throughput by 87.2% on an NVIDIA A100 GPU and by 97.3% on a 128-core high-performance CPU node. These results indicate that scaling PhaseNet is more practical on GPUs than on CPUs, and that simply enlarging the model is not an efficient way to achieve large accuracy gains.
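The percentages in the abstract are easier to weigh as multiplicative factors. The sketch below assumes every figure is a relative change against the baseline model, which the abstract implies but does not state outright; the function names are illustrative only.

```python
def increase_to_factor(percent_increase: float) -> float:
    """A +p% relative change multiplies the baseline by (1 + p/100)."""
    return 1.0 + percent_increase / 100.0


def reduction_to_slowdown(percent_reduction: float) -> float:
    """A -p% throughput change means the same job takes 1 / (1 - p/100) times as long."""
    return 1.0 / (1.0 - percent_reduction / 100.0)


# Figures quoted in the abstract.
recall_factor = increase_to_factor(741.5)         # about 8.4x the baseline recall
within_window_factor = increase_to_factor(683.9)  # about 7.8x the picks within 0.1 s
gpu_slowdown = reduction_to_slowdown(87.2)        # about 7.8x slower on the A100
cpu_slowdown = reduction_to_slowdown(97.3)        # about 37x slower on the CPU node

# Recall cannot exceed 1, so the baseline recall is bounded above by 1 / factor.
max_baseline_recall = 1.0 / recall_factor         # about 0.119

print(f"recall factor:        {recall_factor:.1f}x")
print(f"GPU slowdown:         {gpu_slowdown:.1f}x")
print(f"CPU slowdown:         {cpu_slowdown:.1f}x")
print(f"max baseline recall:  {max_baseline_recall:.3f}")
```

Read this way, the retrained model recovers roughly eight times as many analyst picks as the regional model, and the regional model's recall on this test split cannot have been higher than about 12 percent. The scaled model costs roughly 8x in GPU time and 37x in CPU time for gains of 15.6 and 23.2 percent.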
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a reproducible MsPASS-based workflow for assembling and managing a control dataset of 1.6 million teleseismic waveforms with P-wave picks from USArray ANF analysts. It shows that the original PhaseNet model (trained on regional data) performs poorly on this teleseismic set, then retrains PhaseNet from scratch on a training split and evaluates on a non-overlapping held-out test split, reporting a 741.5% increase in P-pick recall and 683.9% more picks inside a 0.1 s residual window. Additional scaling experiments compare model sizes, showing modest accuracy gains at the cost of substantially reduced inference throughput on both CPU and GPU.
Significance. If the central empirical claims hold after addressing label quality, the work supplies a concrete, scalable example of domain adaptation for ML seismic pickers and demonstrates that retraining on teleseismic data yields far larger gains than simply enlarging the model. The use of a large held-out split and the open MsPASS workflow are strengths that support reproducibility and allow future comparisons.
major comments (2)
- [Abstract and Results] Abstract and Results section: The headline metrics (741.5% recall increase and 683.9% more picks within the 0.1 s residual window) are computed by treating ANF analyst picks as exact ground truth for both training and evaluation. The manuscript contains no quantification of timing uncertainty on these teleseismic picks, no cross-check against an independent catalog (e.g., ISC or NEIC), and no sensitivity test varying the residual tolerance. Because teleseismic P onsets are typically emergent and low-SNR, documented analyst uncertainties of 0.2–0.5 s would place a non-negligible fraction of labels outside the 0.1 s acceptance window, undermining the interpretation of relative improvement.
- [Methods] Methods (dataset construction): The paper states that the 1.6 M waveforms are linked to ANF analyst picks and that the test split is non-overlapping, but provides no details on how event overlap was prevented (e.g., by event ID, origin time window, or station clustering) or on the distribution of pick quality flags. This information is required to assess whether the reported gains could be inflated by label noise or split leakage.
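The first major comment can be given a rough scale with a back-of-envelope calculation. The sketch below assumes, purely for illustration, that analyst timing error is zero-mean Gaussian and that the quoted 0.2–0.5 s uncertainties are standard deviations; neither assumption comes from the manuscript.

```python
import math


def fraction_within(tolerance_s: float, sigma_s: float) -> float:
    """P(|e| <= tolerance) for a label error e ~ Normal(0, sigma)."""
    return math.erf(tolerance_s / (sigma_s * math.sqrt(2.0)))


# Fraction of analyst labels that would fall within +/-0.1 s of the true onset.
for sigma in (0.2, 0.5):
    share = fraction_within(0.1, sigma)
    print(f"sigma = {sigma:.1f} s -> {share:.3f} of labels within 0.1 s")
# sigma = 0.2 s -> 0.383 of labels within 0.1 s
# sigma = 0.5 s -> 0.159 of labels within 0.1 s
```

Under these assumptions, most labels would sit more than 0.1 s from the true onset. This measures label error against the true arrival, not model residual against the label, so it bounds how much physical meaning the 0.1 s window can carry and does not predict the reported metric itself.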
minor comments (2)
- [Abstract] The abstract reports percentage improvements without accompanying absolute numbers (e.g., baseline recall, total picks) or confidence intervals; adding these would improve interpretability.
- [Figures and Results] Figure captions and text should explicitly state the exact definition of the 0.1 s residual window (one-sided or two-sided) and whether picks are evaluated only on events that have an ANF label.
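The definition of the residual window matters because the counts change with it. The following is a minimal sketch of one common convention, assumed here and not taken from the manuscript: a two-sided window, one analyst label per waveform, and at most one model pick per waveform (`None` when the model makes no pick). The toy pick times are invented for illustration.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(
    labels: Sequence[float],
    preds: Sequence[Optional[float]],
    tol: float,
) -> Tuple[float, float]:
    """Score picks against labels using a two-sided window |pred - label| <= tol."""
    tp = fp = fn = 0
    for label, pred in zip(labels, preds):
        if pred is None:
            fn += 1                      # label missed entirely
        elif abs(pred - label) <= tol:
            tp += 1                      # pick inside the window
        else:
            fp += 1                      # pick made, but outside the window
            fn += 1                      # and the label is still unmatched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (seconds): residuals are +0.05, +0.3, no pick, -0.4.
labels = [10.0, 20.0, 30.0, 40.0]
preds = [10.05, 20.3, None, 39.6]

for tol in (0.1, 0.2, 0.5):
    p, r = precision_recall(labels, preds, tol)
    print(f"tol = {tol:.1f} s  precision = {p:.2f}  recall = {r:.2f}")
```

On this toy input, widening the window from 0.1 s to 0.5 s moves recall from 0.25 to 0.75, which is why a tolerance sweep says more than a single threshold does.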
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important considerations regarding label quality and dataset construction for teleseismic picks. We respond to each major comment below, indicating planned revisions where appropriate.
point-by-point responses
- Referee: [Abstract and Results] Abstract and Results section: The headline metrics (741.5% recall increase and 683.9% more picks within the 0.1 s residual window) are computed by treating ANF analyst picks as exact ground truth for both training and evaluation. The manuscript contains no quantification of timing uncertainty on these teleseismic picks, no cross-check against an independent catalog (e.g., ISC or NEIC), and no sensitivity test varying the residual tolerance. Because teleseismic P onsets are typically emergent and low-SNR, documented analyst uncertainties of 0.2–0.5 s would place a non-negligible fraction of labels outside the 0.1 s acceptance window, undermining the interpretation of relative improvement.
Authors: We agree that ANF analyst picks for teleseismic events carry larger inherent timing uncertainties than picks for regional events, owing to emergent onsets and lower SNR. Since both the baseline and retrained models are evaluated on the identical label set, the relative improvements still demonstrate the value of domain adaptation. In the revised manuscript we will add an explicit discussion of expected teleseismic pick uncertainties (citing literature values of 0.2–0.5 s) and include a sensitivity analysis reporting recall and precision at residual tolerances of 0.1 s, 0.2 s, and 0.5 s. A direct cross-check against ISC/NEIC catalogs is outside the current scope because our MsPASS workflow does not include the required event-matching metadata; we will note this as a limitation for future work.
revision: partial
- Referee: [Methods] Methods (dataset construction): The paper states that the 1.6 M waveforms are linked to ANF analyst picks and that the test split is non-overlapping, but provides no details on how event overlap was prevented (e.g., by event ID, origin time window, or station clustering) or on the distribution of pick quality flags. This information is required to assess whether the reported gains could be inflated by label noise or split leakage.
Authors: The held-out test split was formed by partitioning on unique event IDs from the ANF catalog so that no event appears in both training and test sets; this eliminates event-level leakage. We will revise the Methods section to document this event-ID-based partitioning procedure and to report the distribution of ANF pick quality flags (e.g., fraction labeled high-quality).
revision: yes
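The event-ID partitioning described in the second response can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' MsPASS code: the record layout (dicts with hypothetical `event_id` and `station` keys), the test fraction, and the seed are all invented here.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_by_event(
    records: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Assign whole events to train or test so no event straddles the split."""
    event_ids = sorted({r["event_id"] for r in records})  # sort for reproducibility
    rng = random.Random(seed)
    rng.shuffle(event_ids)
    n_test = max(1, round(len(event_ids) * test_fraction))
    test_events = set(event_ids[:n_test])
    train = [r for r in records if r["event_id"] not in test_events]
    test = [r for r in records if r["event_id"] in test_events]
    return train, test


# Toy catalog: 5 events, each recorded at 3 stations (hypothetical names).
records = [
    {"event_id": f"ev{e}", "station": f"TA.S{s}"}
    for e in range(5)
    for s in range(3)
]
train, test = split_by_event(records)

# Leakage check: the two splits must share no event ID.
shared = {r["event_id"] for r in train} & {r["event_id"] for r in test}
print(len(train), len(test), shared)
```

Splitting on waveforms instead of events would let recordings of the same earthquake at neighbouring stations land on both sides of the split, which is the leakage the referee asks about. Event-level splitting does not address station-level or source-region overlap.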
Circularity Check
No circularity: purely empirical train/test evaluation on held-out data
full rationale
The paper reports an empirical ML experiment: assemble 1.6M teleseismic waveforms with ANF analyst labels, train PhaseNet from scratch on a training split, evaluate recall/precision on a non-overlapping test split, and compare model sizes on CPU/GPU. No derivations, no first-principles predictions, no fitted parameters renamed as independent results, and no self-citation chains are invoked to justify any claim. The performance numbers are direct measurements against the held-out labels; they do not reduce to the inputs by construction. This matches the default case of a self-contained empirical study with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Analyst picks at the USArray ANF constitute reliable ground-truth labels for teleseismic P waves