Parameter-Efficient Transfer Learning for Microseismic Phase Picking Using a Neural Operator
Pith reviewed 2026-05-16 22:23 UTC · model grok-4.3
The pith
Fine-tuning just 3.6 percent of a pre-trained neural operator's parameters adapts it to microseismic phase picking and delivers up to 30 percent gains over the original model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that parameter-efficient fine-tuning of PhaseNO, which adjusts only 3.6 percent of its weights on 200 microseismic examples, produces a model that raises F1 and accuracy scores by up to 30 percent absolute on three independent microseismic test sets, while also exceeding both STA/LTA and current state-of-the-art networks such as PhaseNet and EQTransformer.
What carries the argument
Parameter-efficient fine-tuning of the Phase Neural Operator (PhaseNO), which selectively updates a small fraction of parameters to specialize the model's pre-learned global representations for low-SNR, sparse-geometry microseismic data.
If this is right
- The adapted model maintains high accuracy on low signal-to-noise microseismic events where traditional and other deep-learning methods degrade.
- Only a modest calibration set is required, making the workflow feasible for new campaign deployments without building large labeled catalogs.
- Performance gains hold across multiple independent test collections drawn from hydraulic-fracturing environments.
- The same pre-trained earthquake backbone can be reused for microseismic tasks by updating a tiny parameter subset rather than retraining from scratch.
Where Pith is reading between the lines
- The approach could extend to other geophysical monitoring tasks that suffer from limited labeled data but share underlying wave-propagation physics with earthquakes.
- Further reduction of the tunable parameter fraction below 3.6 percent might still preserve gains, offering even lighter adaptation for edge deployment.
- Success here indicates that large-scale earthquake pre-training captures transferable features that microseismic signals can leverage with minimal additional supervision.
Load-bearing premise
The 200 labeled microseismic recordings are sufficient and representative enough for the fine-tuned model to generalize across three independent test datasets without overfitting or data leakage.
What would settle it
Running the adapted model on a fourth independent microseismic dataset collected with different sensor spacing or higher noise levels and finding that its F1 scores fall to or below those of the original PhaseNO would falsify the claim of consistent outperformance.
read the original abstract
Seismic phase picking is fundamental for microseismic monitoring and subsurface imaging. Manual processing is impractical for real-time applications and large sensor arrays, motivating the use of deep learning-based pickers trained on extensive earthquake catalogs. On a broader scale, these models are generally tuned to perform optimally in high signal-to-noise and long-duration networks and often fail to perform satisfactorily when applied to campaign-based microseismic datasets, which are characterized by low signal-to-noise ratios, sparse geometries, and limited labeled data. In this study, we present a microseismic adaptation of a network-wide earthquake phase picker, Phase Neural Operator (PhaseNO), using transfer learning and parameter-efficient fine-tuning. Starting from a model pre-trained on more than 57,000 three-component earthquake and noise records, we fine-tune it using only 200 labeled and noisy microseismic recordings from hydraulic fracturing settings. We present a parameter-efficient adaptation of PhaseNO that fine-tunes a small fraction of its parameters (only 3.6%) while retaining its global spatiotemporal representations learned from a large dataset of earthquake recordings. We then evaluate our adapted model on three independent microseismic datasets and compare its performance against the original pre-trained PhaseNO, a STA/LTA-based workflow, and two state-of-the-art deep learning models, PhaseNet and EQTransformer. We demonstrate that our adapted model significantly outperforms the original PhaseNO in F1 and accuracy metrics, achieving up to 30% absolute improvements in all test sets and consistently performing better than STA/LTA and state-of-the-art models. With our adaptation being based on a small calibration set, our proposed workflow is a practical and efficient tool to deploy network-wide models in data-limited microseismic applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a parameter-efficient transfer learning approach to adapt the PhaseNO neural operator, pre-trained on over 57,000 earthquake records, to microseismic phase picking by fine-tuning only 3.6% of its parameters on 200 labeled noisy recordings from hydraulic fracturing. The adapted model is evaluated on three independent microseismic datasets, showing up to 30% absolute improvements in F1 and accuracy metrics over the original PhaseNO, STA/LTA, PhaseNet, and EQTransformer.
Significance. If the reported gains hold without data leakage or overfitting, this work demonstrates a practical method for deploying large-scale seismic models in data-scarce microseismic settings, potentially improving real-time monitoring efficiency. The use of parameter-efficient fine-tuning preserves the model's global representations while adapting to low-SNR, sparse geometries.
major comments (3)
- [Methods/Dataset description] The independence of the 200 fine-tuning recordings from the three test datasets is asserted in the abstract and evaluation sections but lacks explicit verification; details on station/event overlap checks, cross-validation strategy, or data leakage prevention are needed to support the generalization claim and rule out site-specific memorization.
- [Methods/Training procedure] The training procedure for the parameter-efficient fine-tuning (specific parameters selected for update, exact PEFT technique such as LoRA or adapters, optimizer, learning rate, epochs, and regularization) is insufficiently detailed, making it difficult to assess overfitting risk on the small 200-sample set or reproduce the 30% gains.
- [Results/Evaluation metrics] The headline claim of 'up to 30% absolute improvements' in F1 and accuracy requires per-dataset numerical breakdowns, standard deviations or confidence intervals, and statistical significance tests against baselines to substantiate consistent outperformance across all three test sets.
minor comments (2)
- [Introduction] Clarify the exact parameter-efficient technique (e.g., which layers or modules are adapted) and add citations to the relevant PEFT literature in the introduction or methods.
- [Figures] Ensure figures showing performance comparisons include error bars, clear legends, and consistent axis scaling for F1/accuracy metrics.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.
read point-by-point responses
-
Referee: [Methods/Dataset description] The independence of the 200 fine-tuning recordings from the three test datasets is asserted in the abstract and evaluation sections but lacks explicit verification; details on station/event overlap checks, cross-validation strategy, or data leakage prevention are needed to support the generalization claim and rule out site-specific memorization.
Authors: We appreciate the referee's emphasis on rigorously establishing dataset independence. The 200 fine-tuning recordings originate from a distinct hydraulic fracturing campaign in one geographic region, while the three test datasets come from separate microseismic monitoring projects with no shared events, stations, or temporal overlap. We will add a dedicated paragraph in the Methods section that explicitly documents the data provenance, confirms the absence of overlap via event identifiers and station codes, and describes the internal 80/20 train-validation split used during fine-tuning to mitigate leakage. These additions will directly support the generalization claims. revision: yes
-
Referee: [Methods/Training procedure] The training procedure for the parameter-efficient fine-tuning (specific parameters selected for update, exact PEFT technique such as LoRA or adapters, optimizer, learning rate, epochs, and regularization) is insufficiently detailed, making it difficult to assess overfitting risk on the small 200-sample set or reproduce the 30% gains.
Authors: We agree that greater specificity on the fine-tuning protocol is required for reproducibility and to allow assessment of overfitting. The adaptation employs LoRA adapters inserted into the attention layers of PhaseNO, updating only the low-rank matrices (rank 8) for 3.6% of total parameters. Optimization used Adam with a learning rate of 1e-4, a batch size of 8, and training for a maximum of 80 epochs with early stopping (patience of 15 epochs) on a held-out validation subset. L2 weight decay of 1e-5 was applied. We will expand the Methods section with these hyperparameters, a summary table, and a brief description of the PEFT implementation to enable full reproduction. revision: yes
-
Referee: [Results/Evaluation metrics] The headline claim of 'up to 30% absolute improvements' in F1 and accuracy requires per-dataset numerical breakdowns, standard deviations or confidence intervals, and statistical significance tests against baselines to substantiate consistent outperformance across all three test sets.
Authors: We concur that more granular and statistically supported reporting will strengthen the results. We will revise the Results section to include a table with per-dataset F1, accuracy, precision, and recall values for the adapted model and all baselines. Standard deviations will be reported from five independent fine-tuning runs using different random seeds. We will also add paired statistical significance tests (McNemar's test for classification metrics) with p-values to confirm that the observed improvements are consistent and not due to chance. These changes will substantiate the headline performance gains across all test sets. revision: yes
Circularity Check
No circularity: empirical metrics measured on independent held-out sets
full rationale
The paper's central claims rest on direct experimental measurement: a PhaseNO model is pre-trained on an external corpus of >57k records, a small subset of parameters (3.6%) is fine-tuned on 200 labeled microseismic examples, and F1/accuracy are then computed on three explicitly independent test datasets. No equation or derivation reduces to its own inputs by construction; performance deltas are not fitted parameters renamed as predictions, nor are they justified solely by self-citation chains. The evaluation protocol (comparison to STA/LTA, PhaseNet, EQTransformer) is externally falsifiable and does not rely on any uniqueness theorem or ansatz smuggled from prior author work. This is a standard empirical transfer-learning result whose validity hinges on data independence rather than tautological re-derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- fine-tuning hyperparameters
axioms (1)
- domain assumption Microseismic data shares transferable spatiotemporal features with earthquake data learned by PhaseNO
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We employ the Phase Neural Operator (PhaseNO) model ... Temporal information is processed using layers of Fourier neural operators (FNO) ... Spatial information ... using graph neural operators (GNO)
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
fine-tune it using only 200 labeled and noisy microseismic recordings ... fine-tunes a small fraction of its parameters (only 3.6%)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Earth-Science Reviews239, 104371 (2023)
Anikiev, D., Birnie, C., Waheed, U., Alkhali- fah, T., Gu, C., Verschuur, D.J., Eisner, L.: Machine learning in microseismic monitoring. Earth-Science Reviews239, 104371 (2023)
work page 2023
-
[2]
Bul- letin of the Seismological Society of America 113(6), 2323–2344 (2023)
Armstrong, A.D., Claerhout, Z., Baker, B., Koper, K.D.: A deep-learning phase picker with cali- brated bayesian-derived uncertainties for earth- quakes in the yellowstone volcanic region. Bul- letin of the Seismological Society of America 113(6), 2323–2344 (2023)
work page 2023
-
[3]
Bulletin of the Seismological Society 9 Fig
Baillard, C., Crawford, W.C., Ballu, V., Hibert, C., Mangeney, A.: An automatic kurtosis-based p-and s-phase picker designed for local seismic networks. Bulletin of the Seismological Society 9 Fig. 8Comparison of picking arrival time differences (top row) and predicted uncertainty computed from the probability function of a corresponding pick (bottom row)...
work page 2014
-
[4]
Seismological Society of America92(2A), 787–801 (2021)
Baker, B., Holt, M.M., Pankow, K.L., Koper, K.D., Farrell, J.: Monitoring the 2020 magna, utah, earthquake sequence with nodal seis- mometers and machine learning. Seismological Society of America92(2A), 787–801 (2021)
work page 2020
-
[5]
Seismological Society of America93(5), 2554–2569 (2022)
Chen, G., Li, J.: Cubenet: Array-based seismic phase picking with deep learning. Seismological Society of America93(5), 2554–2569 (2022)
work page 2022
-
[6]
Geophysical Research Letters47(16), 2020– 088651 (2020)
Venkatakrishnan, S.V., Schoenball, M., Zhu, W., Beroza, G.C., Thurber, C., Team, E.C.: Using a deep neural network and transfer learn- ing to bridge scales for seismic phase picking. Geophysical Research Letters47(16), 2020– 088651 (2020)
work page 2020
-
[7]
Geochemistry, Geophysics, Geosystems23(11), 2022–010453 (2022)
Feng, T., Mohanna, S., Meng, L.: Edgephase: A deep learning model for multi-station seis- mic phase picking. Geochemistry, Geophysics, Geosystems23(11), 2022–010453 (2022)
work page 2022
-
[8]
In: International Confer- ence on Machine Learning, pp
Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry. In: International Confer- ence on Machine Learning, pp. 1263–1272 (2017). PMLR
work page 2017
-
[9]
IEEE Geoscience and Remote Sensing Letters18(4), 617–621 (2020)
He, Z., Peng, P., Wang, L., Jiang, Y.: Pickcapsnet: Capsule network for automatic p-wave arrival picking. IEEE Geoscience and Remote Sensing Letters18(4), 617–621 (2020)
work page 2020
-
[10]
Geophysical journal international224(1), 230–240 (2021)
Koper, K.D.: Application of a convolutional 11 neural network for seismic phase picking of mining-induced seismicity. Geophysical journal international224(1), 230–240 (2021)
work page 2021
-
[11]
In: SEG Inter- national Exposition and Annual Meeting, pp
Kirschner, D., Howes, N., Daly, C., Mukherjee, J., Li, J.: Detecting p-and s-wave arrivals with a recurrent neural network. In: SEG Inter- national Exposition and Annual Meeting, pp. 043–119005 (2019). SEG
work page 2019
-
[12]
Jour- nal of Machine Learning Research24(89), 1–97 (2023)
Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., Anandkumar, A.: Neural operator: Learning maps between function spaces with applications to pdes. Jour- nal of Machine Learning Research24(89), 1–97 (2023)
work page 2023
-
[13]
Frontiers in big Data6, 1174478 (2023)
Kolar, P., Waheed, U.b., Eisner, L., Matousek, P.: Arrival times by recurrent neural network for induced seismic events from a permanent net- work. Frontiers in big Data6, 1174478 (2023)
work page 2023
-
[14]
Fourier Neural Operator for Parametric Partial Differential Equations
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandku- mar, A.: Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[15]
Neural Operator: Graph Kernel Network for Partial Differential Equations
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandku- mar, A.: Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2003
-
[16]
Nature communications11(1), 3952 (2020)
Chuang, L.Y., Beroza, G.C.: Earthquake trans- former—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nature communications11(1), 3952 (2020)
work page 2020
-
[17]
Bulletin of the Seismological Society of America108(5A), 2894–2901 (2018)
Ross, Z.E., Meier, M.-A., Hauksson, E., Heaton, T.H.: Generalized seismic phase detection with deep learning. Bulletin of the Seismological Society of America108(5A), 2894–2901 (2018)
work page 2018
-
[18]
Seismological Research Letters93(3), 1673–1682 (2022)
Retailleau, L., Saurel, J.-M., Zhu, W., Satri- ano, C., Beroza, G.C., Issartel, S., Boissier, P., Team, O., Team, O.,et al.: A wrapper to use a machine-learning-based algorithm for earthquake monitoring. Seismological Research Letters93(3), 1673–1682 (2022)
work page 2022
-
[19]
Journal of Geophysical Research: Machine Learning and Computation1(4), 2024–000220 (2024)
Selvadurai, P.A., Lanza, F., Yuan, S., Ober- mann, A., Mesimeri, M., M¨ unchmeyer, J.,et al.: From labquakes to megathrusts: Scaling deep learning based pickers over 15 orders of magni- tude. Journal of Geophysical Research: Machine Learning and Computation1(4), 2024–000220 (2024)
work page 2024
-
[20]
Geophysical Research Letters50(24), 2023–106434 (2023)
Sun, H., Ross, Z.E., Zhu, W., Azizzadenesheli, K.: Phase neural operator for multi-station pick- ing of seismic arrivals. Geophysical Research Letters50(24), 2023–106434 (2023)
work page 2023
-
[21]
In: 2021 IEEE 3rd International Conference on BioInspired Processing (BIP), pp
Laat, L., Baldares, R.J., Chaves, E.J., Mene- ses, E.: Oksp: a novel deep learning automatic event detection pipeline for seismic monitoring in costa rica. In: 2021 IEEE 3rd International Conference on BioInspired Processing (BIP), pp. 1–6 (2021). IEEE
work page 2021
-
[22]
Journal of Geophysical Research: Solid Earth 124(7), 6612–6624 (2019)
Wang, J., Xiao, Z., Liu, C., Zhao, D., Yao, Z.: Deep learning for picking seismic arrival times. Journal of Geophysical Research: Solid Earth 124(7), 6612–6624 (2019)
work page 2019
-
[23]
Geophysical Journal International 216(1), 261–273 (2019)
Zhu, W., Beroza, G.C.: Phasenet: a deep- neural-network-based seismic arrival-time pick- ing method. Geophysical Journal International 216(1), 261–273 (2019)
work page 2019
-
[24]
Journal of Geophysical Research: Solid Earth 130(4), 2025–031294 (2025) 12
Zhou, Y., Ding, H., Ghosh, A., Ge, Z.: Ai-pal: Self-supervised ai phase picking via rule-based algorithm for generalized earthquake detection. Journal of Geophysical Research: Solid Earth 130(4), 2025–031294 (2025) 12
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.