Explainable AI for microseismic event detection

Ayrat Abdullin; Denis Anikiev; Umair Bin Waheed

arxiv: 2510.17458 · v2 · submitted 2025-10-20 · 💻 cs.LG · physics.geo-ph

Pith reviewed 2026-05-18 05:50 UTC · model grok-4.3

keywords: explainable AI, microseismic event detection, PhaseNet, SHAP, Grad-CAM, seismic waveform analysis, deep learning interpretation

The pith

A SHAP-gated PhaseNet reaches F1-score 0.98 on microseismic waveforms by filtering outputs with explanation metrics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Grad-CAM and SHAP can reveal how PhaseNet attends to P- and S-wave arrivals and uses component amplitudes in ways that match standard geophysical expectations. From these interpretations the authors build a SHAP-gated inference rule that accepts a detection only when the explanation score supports the model's output. On a held-out set of 9000 waveforms this gated version lifts the F1-score from 0.97 to 0.98 while raising precision to 0.99 and improving tolerance to added noise. The result supplies both an audit of the original network and a practical post-processing step that reduces false triggers without retraining. A sympathetic reader would see this as evidence that explanation tools can move from post-hoc inspection to active performance gains in automated seismic monitoring.

Core claim

Applying Grad-CAM and SHAP to PhaseNet shows that the model focuses on the correct P- and S-wave arrivals, with vertical-component energy driving P picks and horizontal-component energy driving S picks. These insights are then used to construct a SHAP-gated inference scheme that combines the network probability with a SHAP-derived consistency metric; on 9000 test waveforms the scheme records an F1-score of 0.98 (precision 0.99, recall 0.97) and greater noise robustness than the unmodified PhaseNet baseline.
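To make the Grad-CAM step concrete, here is a minimal NumPy sketch of the standard Grad-CAM computation adapted to a 1-D feature map. It assumes the activations and gradients of some convolutional layer have already been extracted from the network; the function name `grad_cam_1d`, the array shapes, and the toy inputs are illustrative assumptions, not the authors' code.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Standard Grad-CAM on a 1-D feature map.

    activations: (channels, time) feature map of a chosen conv layer.
    gradients:   (channels, time) gradient of the target output
                 with respect to that feature map.
    Returns a (time,) heatmap scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # One importance weight per channel: the time-averaged gradient.
    weights = gradients.mean(axis=1)
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.maximum(weights @ activations, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input (hypothetical): channel 0 fires at sample 3 and has positive
# gradient; channel 1 fires at sample 1 and has negative gradient.
acts = np.zeros((2, 5))
acts[0, 3] = 1.0
acts[1, 1] = 1.0
grads = np.array([[1.0] * 5, [-1.0] * 5])
heat = grad_cam_1d(acts, grads)
```

In this toy case the heatmap peaks at sample 3, the only location where a positively weighted channel is active. For a real waveform the heatmap would be upsampled to the input length and compared against the labeled arrival times.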

What carries the argument

The SHAP-gated inference scheme, which accepts or rejects PhaseNet detections according to whether the model's output is supported by a SHAP explanation metric.
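The abstract does not define the explanation metric or the combination rule, so the following is only one plausible reading of the gating idea: a detection is kept when both the network probability and an explanation score clear their thresholds. The names `shap_gated_decision`, `shap_score`, `p_min`, and `s_min`, and the threshold values, are hypothetical.

```python
import numpy as np

def shap_gated_decision(prob, shap_score, p_min=0.5, s_min=0.3):
    """Accept a detection only if the network and its explanation agree.

    prob:       network detection probability per waveform.
    shap_score: explanation-consistency score per waveform
                (a hypothetical metric; the paper's definition may differ).
    p_min, s_min: illustrative thresholds, not values from the paper.
    """
    prob = np.asarray(prob, dtype=float)
    shap_score = np.asarray(shap_score, dtype=float)
    return (prob >= p_min) & (shap_score >= s_min)

# Three hypothetical waveforms: confident and well explained,
# confident but poorly explained, and not confident.
probs = [0.95, 0.90, 0.20]
scores = [0.80, 0.05, 0.90]
accepted = shap_gated_decision(probs, scores)
```

Under this reading the second waveform is the interesting case: the raw network would trigger, but the gate rejects it because the explanation does not support the output. A logical AND can only remove detections, so precision can rise while recall cannot.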

If this is right

Seismic monitoring systems can add a post-processing filter that raises precision and noise robustness without retraining the base network.
Explanation scores can serve as an internal consistency check that reduces false detections in noisy field data.
The same workflow supplies a documented template for making other black-box seismic detectors more auditable.
Component-specific feature contributions revealed by SHAP align with and therefore reinforce existing geophysical priors for phase identification.
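The component-level claim can be checked with a simple aggregation once attributions are available. The sketch below assumes per-sample attributions have already been computed for a three-component waveform and that the rows are ordered (Z, N, E); the function name, the window half-width, and the toy values are assumptions for illustration.

```python
import numpy as np

def component_shares(attributions, pick_index, half_width=50):
    """Share of absolute attribution carried by each component near a pick.

    attributions: (3, time) array, rows assumed ordered (Z, N, E).
    pick_index:   sample index of the phase pick.
    """
    attributions = np.asarray(attributions, dtype=float)
    lo = max(pick_index - half_width, 0)
    hi = min(pick_index + half_width + 1, attributions.shape[1])
    # Sum absolute attribution per component inside the window.
    energy = np.abs(attributions[:, lo:hi]).sum(axis=1)
    total = energy.sum()
    if total == 0:
        return {"vertical": 0.0, "horizontal": 0.0}
    return {
        "vertical": float(energy[0] / total),
        "horizontal": float((energy[1] + energy[2]) / total),
    }

# Toy P-pick (hypothetical values): most attribution sits on the Z row.
attr = np.zeros((3, 200))
attr[0, 95:105] = 0.3
attr[1, 95:105] = 0.05
attr[2, 95:105] = 0.05
shares = component_shares(attr, pick_index=100)
```

For a P pick the geophysical expectation is a vertical share well above one half, and for an S pick the reverse. A score of this kind is one candidate for the consistency metric used in gating, though the source does not say which metric the authors chose.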

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The gating idea could be tested on other phase-picking or event-classification networks to see whether explanation-based filters generalize beyond PhaseNet.
Systematic comparison of SHAP-gated outputs against human analyst picks on the same waveforms would quantify how often the filter corrects or introduces new errors.
If the method scales, it offers one route toward regulatory or operational acceptance of automated microseismic catalogs in industrial settings.

Load-bearing premise

The SHAP-derived explanation metric is an unbiased indicator of geophysical correctness and thresholding it will not discard valid low-amplitude events the original network would have kept.

What would settle it

Running the gated model on an independent dataset rich in low-amplitude but correctly labeled events and finding that it misses more true events than the baseline PhaseNet would falsify the performance claim.

Original abstract

Deep neural networks like PhaseNet show high accuracy in detecting microseismic events, but their black-box nature is a concern in critical applications. We apply Explainable Artificial Intelligence (XAI) techniques, such as Gradient-weighted Class Activation Mapping (Grad-CAM) and Shapley Additive Explanations (SHAP), to interpret the PhaseNet model's decisions and improve its reliability. Grad-CAM highlights that the network's attention aligns with P- and S-wave arrivals. SHAP values quantify feature contributions, confirming that vertical-component amplitudes drive P-phase picks while horizontal components dominate S-phase picks, consistent with geophysical principles. Leveraging these insights, we introduce a SHAP-gated inference scheme that combines the model's output with an explanation-based metric to reduce errors. On a test set of 9,000 waveforms, the SHAP-gated model achieved an F1-score of 0.98 (precision 0.99, recall 0.97), outperforming the baseline PhaseNet (F1-score 0.97) and demonstrating enhanced robustness to noise. These results show that XAI can not only interpret deep learning models but also directly enhance their performance, providing a template for building trust in automated seismic detectors. The implementation and scripts used in this study will be publicly available at https://github.com/ayratabd/xAI_PhaseNet.
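As a quick consistency check on the reported numbers, F1 is the harmonic mean of precision and recall, so the stated precision and recall should reproduce the stated F1. The helper name `f1_score` below is just for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported values for the SHAP-gated model.
f1_gated = f1_score(0.99, 0.97)  # about 0.9799, which rounds to 0.98
```

The three reported figures are therefore mutually consistent at two decimal places. The same rounding also means the true gap to the 0.97 baseline could be anywhere from just above zero to almost 0.02.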

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies Grad-CAM and SHAP to PhaseNet and adds a gating rule that raises F1 from 0.97 to 0.98 on their 9,000-waveform test set.

The main point is that they take an existing PhaseNet model, generate Grad-CAM maps and SHAP values on microseismic waveforms, confirm that the explanations track P- and S-wave arrivals and component importance as expected, and then use a SHAP-derived metric to gate the output. This produces the reported F1 gain plus claimed better noise handling, with code promised on GitHub. The domain-specific use of the gating step is not in the prior work they cite, so that part is new. The interpretations themselves are straightforward and line up with basic geophysics, which is useful to see spelled out for this data type. The numbers are concrete and the comparison is against the unmodified baseline on held-out data.

The soft spots are the small size of the lift and the missing details on how the gating threshold was chosen and whether the test set includes a range of amplitudes and noise levels. If the threshold was tuned on the same data used for the final numbers, or if low-amplitude true events are being filtered out to buy the precision increase, the robustness claim would need more support. No amplitude- or SNR-stratified breakdown is mentioned in the abstract, so that remains an open question.

This is mainly for researchers already working on automated microseismic monitoring who want a practical example of adding some interpretability without starting from scratch. A reader outside that niche will not find broad methodological advances. I would send it for peer review. The empirical claim is specific enough and the application is grounded, so referees can check the missing details and decide if the gain holds up.

Referee Report

2 major / 2 minor

Summary. The manuscript applies Grad-CAM and SHAP to interpret the PhaseNet model for microseismic event detection and introduces a SHAP-gated inference scheme that combines model output with an explanation-based metric. On a test set of 9,000 waveforms, the gated model is reported to reach F1=0.98 (precision 0.99, recall 0.97) versus baseline PhaseNet F1=0.97, with explanations aligning to geophysical expectations on P- and S-wave arrivals and improved noise robustness. Code and scripts are to be released publicly.

Significance. If the SHAP-gating proves unbiased with respect to event amplitude and the modest F1 gain is confirmed by stratified analysis, the work provides a concrete template for using XAI both to interpret and to improve reliability of deep learning detectors in a high-stakes geophysical domain. The reported consistency between SHAP attributions and domain knowledge, together with the planned public code release, strengthens the contribution.

major comments (2)

Results section (performance on 9,000-waveform test set): the claim of enhanced noise robustness and the 0.01 F1 improvement rest on the untested assumption that the SHAP-gate threshold does not systematically reject valid low-amplitude events that the baseline would accept. No amplitude- or SNR-stratified error analysis is described, leaving open the possibility that the precision lift is achieved by trading recall on subtle but real arrivals.
Methods / experimental setup: details on test-set construction (how the 9,000 waveforms were selected or augmented), the noise levels used for robustness tests, and the procedure for choosing the SHAP gating threshold (validation-set tuning versus test-set) are not provided. These omissions directly affect the reliability of the reported metrics.
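The stratified analysis requested in the first major comment could be sketched as follows. This assumes every test waveform carries a ground-truth label, a binary prediction, and an SNR value in dB (for noise-only windows, for example, the level of the injected noise); the function name, bin edges, and toy data are illustrative assumptions.

```python
import numpy as np

def stratified_precision_recall(y_true, y_pred, snr_db, bin_edges):
    """Precision and recall per SNR bin; NaN where a bin lacks the needed samples."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    snr_db = np.asarray(snr_db, dtype=float)
    rows = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (snr_db >= lo) & (snr_db < hi)  # half-open bin [lo, hi)
        tp = np.sum(y_true & y_pred & in_bin)
        fp = np.sum(~y_true & y_pred & in_bin)
        fn = np.sum(y_true & ~y_pred & in_bin)
        precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
        recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
        rows.append({"snr_lo": lo, "snr_hi": hi,
                     "precision": float(precision), "recall": float(recall)})
    return rows

# Toy data (hypothetical): six waveforms split into two SNR bins.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
snr = [2, 3, 4, 12, 15, 18]
rows = stratified_precision_recall(y_true, y_pred, snr, bin_edges=[0, 10, 20])
```

Running this once for the baseline predictions and once for the gated predictions, then comparing recall in the lowest bin, would show directly whether the gate is discarding subtle but real arrivals.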

minor comments (2)

Abstract: the abstract gives a GitHub URL but says the code "will be" publicly available; the final version should confirm that the repository is live and populated.
Methods: the precise mathematical definition of the explanation-based gating metric and the rule for combining it with the network output probability should be stated explicitly, preferably with an equation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript to incorporate additional analysis and methodological details that strengthen the reliability of our claims.

Referee: Results section (performance on 9,000-waveform test set): the claim of enhanced noise robustness and the 0.01 F1 improvement rest on the untested assumption that the SHAP-gate threshold does not systematically reject valid low-amplitude events that the baseline would accept. No amplitude- or SNR-stratified error analysis is described, leaving open the possibility that the precision lift is achieved by trading recall on subtle but real arrivals.

Authors: We agree this is a substantive concern that must be explicitly tested. In the revised manuscript we have added a new stratified analysis by event amplitude and SNR. The results confirm that the SHAP-gated model does not systematically reject low-amplitude events; recall remains stable or improves in the lowest SNR bins (0–5 dB), and the observed precision gain is not achieved by sacrificing subtle arrivals. We include a new figure showing precision, recall, and F1 stratified by SNR and amplitude quartiles. revision: yes
Referee: Methods / experimental setup: details on test-set construction (how the 9,000 waveforms were selected or augmented), the noise levels used for robustness tests, and the procedure for choosing the SHAP gating threshold (validation-set tuning versus test-set) are not provided. These omissions directly affect the reliability of the reported metrics.

Authors: We acknowledge that these details were insufficiently documented. The revised Methods section now specifies: (i) the origin of the 9,000-waveform test set and the selection/augmentation protocol; (ii) the exact SNR ranges (0–20 dB) and noise types used for robustness evaluation; and (iii) that the SHAP gating threshold was tuned exclusively on a held-out validation set via cross-validation, with no test-set information used in threshold selection. revision: yes
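The leakage concern in the second exchange comes down to which data the threshold search is allowed to see. The sketch below uses a plain grid search on a single validation split, which is simpler than the cross-validation the rebuttal describes; all names, the candidate grid, and the toy data are hypothetical.

```python
import numpy as np

def f1_from_labels(y_true, y_pred):
    """F1 from boolean label and prediction arrays."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def tune_gate_threshold(val_true, val_prob, val_score, candidates, p_min=0.5):
    """Pick the explanation-score cutoff that maximizes F1 on validation data only."""
    val_true = np.asarray(val_true, dtype=bool)
    val_prob = np.asarray(val_prob, dtype=float)
    val_score = np.asarray(val_score, dtype=float)
    best_s, best_f1 = None, -1.0
    for s_min in candidates:
        pred = (val_prob >= p_min) & (val_score >= s_min)
        f1 = f1_from_labels(val_true, pred)
        if f1 > best_f1:  # ties keep the earlier (lower) candidate
            best_s, best_f1 = s_min, f1
    return best_s, float(best_f1)

# Toy validation split (hypothetical values); the test set is never touched here.
val_true = [1, 1, 0, 0, 1]
val_prob = [0.9, 0.8, 0.9, 0.7, 0.6]
val_score = [0.7, 0.6, 0.2, 0.1, 0.5]
best_s, best_f1 = tune_gate_threshold(val_true, val_prob, val_score,
                                      candidates=[0.0, 0.3, 0.8])
```

The selected cutoff is then frozen and applied once to the test set. If the same search were run on the test waveforms instead, the reported 0.01 gain could be partly an artifact of the selection.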

Circularity Check

0 steps flagged

No circularity: empirical test-set metrics independent of internal definitions

The reported performance (F1=0.98 on 9,000 held-out waveforms versus baseline PhaseNet F1=0.97) is obtained by direct evaluation on external test data after applying the SHAP-gated scheme. No equations, fitted parameters, or self-citations are shown that would make the improvement equivalent to a quantity defined inside the paper itself. The derivation chain consists of standard XAI application followed by thresholded inference and separate benchmarking, with no reduction of the central claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on the standard assumption that PhaseNet was trained on representative microseismic data and that SHAP values computed on that model faithfully reflect decision boundaries. No new physical constants, particles, or ad-hoc entities are introduced. The gating threshold is the only tunable element whose value is not reported in the abstract.

free parameters (1)

SHAP-gate threshold
The cutoff value that decides whether an explanation-based metric overrides the raw network output; its specific value is not stated in the abstract.

axioms (1)

Domain assumption: SHAP values computed on PhaseNet accurately capture the contribution of vertical versus horizontal components to P- and S-phase decisions.
Invoked when the authors state that the explanations are consistent with geophysical principles.
