Shapley Value-Guided Adaptive Ensemble Learning for Explainable Financial Fraud Detection with U.S. Regulatory Compliance Validation
Pith reviewed 2026-05-10 16:23 UTC · model grok-4.3
The pith
SHAP attribution agreement allows dynamic weighting of ensemble models to achieve superior fraud detection performance with built-in regulatory compliance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the SHAP-Guided Adaptive Ensemble dynamically adjusts per-transaction ensemble weights based on SHAP attribution agreement, achieving the highest AUC-ROC among all tested models on the IEEE-CIS fraud dataset while ensuring explanations meet regulatory requirements for financial institutions.
What carries the argument
The SHAP-Guided Adaptive Ensemble (SGAE), a method that uses agreement between SHAP explanations from different base models to set their combination weights on a per-transaction basis.
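One way to picture the weighting step is the sketch below. It is an illustration only: the abstract does not say how attribution agreement is measured or how it is turned into weights, so the cosine-similarity score, the softmax, the `temperature` parameter, and every function name here are assumptions rather than the paper's method.

```python
import math

def cosine(u, v):
    """Cosine similarity of two attribution vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def agreement_weights(attributions, temperature=1.0):
    """Hypothetical per-transaction weights from SHAP agreement.

    attributions: one SHAP vector per base model for a SINGLE transaction
    (at least two models). Each model is scored by its mean cosine similarity
    to the other models; a softmax maps scores to weights that sum to 1.
    """
    n = len(attributions)
    scores = []
    for i in range(n):
        sims = [cosine(attributions[i], attributions[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_score(probabilities, attributions, temperature=1.0):
    """Weighted average of the base models' fraud probabilities."""
    weights = agreement_weights(attributions, temperature)
    return sum(w * p for w, p in zip(weights, probabilities))

# Toy transaction: models 0 and 1 attribute similarly, model 2 disagrees.
shap_vectors = [[0.4, 0.1, -0.2], [0.5, 0.1, -0.1], [-0.3, 0.2, 0.4]]
fraud_probs = [0.9, 0.8, 0.2]
weights = agreement_weights(shap_vectors)
score = adaptive_score(fraud_probs, shap_vectors)
```

In this toy case the dissenting model receives the smallest weight, so the combined score sits closer to the two agreeing models than a plain average would.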
If this is right
- Consistent SHAP attributions lead to higher weights for those base models in the ensemble for each transaction.
- The SGAE method records an AUC-ROC of 0.8837 on held-out data and 0.9245 under cross-validation.
- Explanations from the ensemble satisfy the transparency demands of OCC Bulletin 2011-12, Federal Reserve SR 11-7, and BSA-AML.
- Standalone GNN-GraphSAGE reaches an AUC-ROC of 0.9248 and F1 score of 0.6013 on the full dataset.
Where Pith is reading between the lines
- The per-transaction adaptation might extend naturally to other time-sensitive detection tasks such as anomaly detection in network security.
- Using SHAP agreement as a weighting signal could reduce reliance on validation sets for ensemble tuning.
- Future work might examine whether this agreement metric correlates with actual transaction outcomes beyond the current dataset.
Load-bearing premise
The method assumes that agreement among SHAP attributions across base models provides an unbiased, non-overfitting signal for setting ensemble weights dynamically, with no circular dependence on the explanation method itself.
What would settle it
Compare the AUC-ROC of SGAE with that of a non-adaptive ensemble average on the same IEEE-CIS held-out set. If the gap disappears, the adaptive weighting based on SHAP agreement is not what drives the gains.
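A minimal sketch of that comparison follows. The function names and the toy data are hypothetical, and the pairwise AUC computation is quadratic in the number of transactions, so it illustrates the test rather than a practical way to run it on the full dataset.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def uniform_ensemble(per_model_scores):
    """Non-adaptive baseline: plain mean of the base-model scores per transaction."""
    return [sum(row) / len(row) for row in per_model_scores]

def auc_gap(labels, adaptive_scores, per_model_scores):
    """AUC of the adaptive ensemble minus AUC of the uniform average."""
    baseline = uniform_ensemble(per_model_scores)
    return auc_roc(labels, adaptive_scores) - auc_roc(labels, baseline)

# Toy held-out set: four transactions, two base models (made-up numbers).
labels = [0, 0, 1, 1]
per_model = [[0.1, 0.1], [0.4, 0.6], [0.3, 0.4], [0.8, 0.8]]
adaptive = [0.1, 0.3, 0.35, 0.8]
gap = auc_gap(labels, adaptive, per_model)
```

A gap near zero on the real held-out set, ideally with a confidence interval from resampling, would indicate that the adaptive weights add little over simple averaging.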
Original abstract
Financial crime costs U.S. institutions over $32 billion each year. Although AI tools for fraud detection have become more advanced, their use in real-world systems still faces a major obstacle: many of these models operate as black boxes that cannot provide the transparent, auditable explanations required by regulations such as OCC Bulletin 2011-12 and Federal Reserve SR 11-7. This study makes three main contributions. First, it offers a thorough evaluation of explanation quality across faithfulness (sufficiency and comprehensiveness at k=5, 10, and 15) and stability (Kendall's W across 30 bootstrap samples). XGBoost paired with TreeExplainer achieves near-perfect stability (W=0.9912), while LSTM with DeepExplainer shows weak results (W=0.4962). Second, the paper introduces the SHAP-Guided Adaptive Ensemble (SGAE), which dynamically adjusts per-transaction ensemble weights based on SHAP attribution agreement, achieving the highest AUC-ROC among all tested models (0.8837 held-out; 0.9245 cross-validation). Third, a complete three-architecture evaluation of LSTM, Transformer, and GNN-GraphSAGE on the full 590,540-transaction IEEE-CIS dataset is provided, with GNN-GraphSAGE achieving AUC-ROC 0.9248 and F1=0.6013. All results are mapped directly to OCC, SR 11-7, and BSA-AML regulatory compliance requirements.
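The faithfulness measures named in the abstract can be sketched as follows. The abstract does not define them, so this uses the common formulation, which is an assumption here: comprehensiveness is the drop in the model's score when the top-k attributed features are replaced by a baseline, and sufficiency is the drop when only the top-k features are kept. The `predict` callable, the `baseline` vector, and the toy linear model are all hypothetical.

```python
def top_k_indices(attribution, k):
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attribution)),
                   key=lambda i: abs(attribution[i]), reverse=True)
    return order[:k]

def faithfulness_at_k(predict, x, attribution, baseline, k):
    """Return (comprehensiveness, sufficiency) for one transaction.

    predict:  callable mapping a feature list to a fraud score
    baseline: replacement values used to "remove" a feature (assumed choice)
    """
    top = set(top_k_indices(attribution, k))
    without_top = [baseline[i] if i in top else x[i] for i in range(len(x))]
    only_top = [x[i] if i in top else baseline[i] for i in range(len(x))]
    full = predict(x)
    comprehensiveness = full - predict(without_top)  # larger = more faithful
    sufficiency = full - predict(only_top)           # smaller = more faithful
    return comprehensiveness, sufficiency

# Toy linear scorer whose attributions equal its coefficients.
def toy_predict(v):
    return 0.5 * v[0] + 0.3 * v[1] + 0.1 * v[2]

comp, suff = faithfulness_at_k(
    toy_predict, x=[1.0, 1.0, 1.0], attribution=[0.5, 0.3, 0.1],
    baseline=[0.0, 0.0, 0.0], k=2,
)
```

With k=2 the toy model loses most of its score when the top features are removed and little when only they are kept, which is the pattern a faithful explanation should show.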
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates explanation quality (faithfulness and stability) for fraud detection models including LSTM, Transformer, GNN-GraphSAGE, and XGBoost on the IEEE-CIS 590,540-transaction dataset. It introduces the SHAP-Guided Adaptive Ensemble (SGAE) that sets per-transaction weights via agreement among base-model SHAP attributions, reports SGAE AUC-ROC of 0.8837 (held-out) and 0.9245 (CV) as the best result, and maps all findings to OCC, SR 11-7, and BSA-AML compliance requirements.
Significance. If the SGAE weighting mechanism can be shown to operate without circular dependence on the same SHAP attributions used for both weighting and explanation, the work would provide a concrete, regulation-aligned path to high-performing yet auditable ensembles for financial fraud detection. The stability comparison (e.g., XGBoost TreeExplainer W=0.9912 vs. LSTM DeepExplainer W=0.4962) supplies useful empirical guidance on explanation reliability.
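For readers unfamiliar with the stability statistic, the sketch below computes Kendall's W in its standard form without tie correction: W = 12 S / (m^2 (n^3 - n)), where m is the number of rankings (here, bootstrap samples), n is the number of features, and S is the sum of squared deviations of each feature's rank sum from the mean rank sum. How the paper derives feature rankings from SHAP values is not stated, so ranking by importance score is an assumption, as are the function names.

```python
def ranks_desc(values):
    """Rank 1 = largest value. Ties are broken by position (no tie correction)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def kendalls_w(importance_runs):
    """Kendall's coefficient of concordance across repeated importance vectors.

    importance_runs: m lists (one per bootstrap sample), each holding
    n feature-importance scores in the same feature order (n >= 2).
    """
    m = len(importance_runs)
    n = len(importance_runs[0])
    rank_rows = [ranks_desc(run) for run in importance_runs]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three bootstrap runs that order the features identically,
# and two runs that order them in exactly opposite ways.
stable = kendalls_w([[0.9, 0.5, 0.1], [0.8, 0.6, 0.2], [0.7, 0.4, 0.3]])
unstable = kendalls_w([[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]])
```

W ranges from 0 (no agreement between rankings) to 1 (identical rankings), which is why 0.9912 reads as near-perfect stability and 0.4962 as weak.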
major comments (1)
- [Abstract / SGAE description] Abstract (SGAE paragraph): the headline claim that SGAE achieves the highest held-out AUC-ROC (0.8837) rests on per-transaction weights derived from SHAP attribution agreement across LSTM, Transformer, and GNN-GraphSAGE. The text provides no explicit statement that this agreement signal is computed on a validation fold strictly separate from the held-out test set used for the AUC metric. Because SHAP values are extracted directly from each base model and the reported stability for LSTM+DeepExplainer is low (W=0.4962), any bias or instability in the explanations can directly propagate into the ensemble weights and therefore into the performance number, creating a potential circularity that must be ruled out before the central performance claim can be accepted.
minor comments (2)
- [Abstract] Abstract: the reported metrics (AUC 0.8837 held-out, 0.9245 CV, F1=0.6013 for GNN) are given without train/test split ratios, hyperparameter search protocol, or any statistical significance test, making it impossible to assess whether the gains are robust.
- [Regulatory compliance discussion] The regulatory mapping section would benefit from a concise table that explicitly links each reported metric (faithfulness at k=5/10/15, stability W, AUC) to the specific requirements in OCC 2011-12 and SR 11-7.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on manuscript arXiv:2604.14231. The identification of an ambiguity in the SGAE description is helpful and will improve the clarity of the work. We respond point-by-point to the major comment below.
Point-by-point responses
- Referee: Abstract (SGAE paragraph): the headline claim that SGAE achieves the highest held-out AUC-ROC (0.8837) rests on per-transaction weights derived from SHAP attribution agreement across LSTM, Transformer, and GNN-GraphSAGE. The text provides no explicit statement that this agreement signal is computed on a validation fold strictly separate from the held-out test set used for the AUC metric. Because SHAP values are extracted directly from each base model and the reported stability for LSTM+DeepExplainer is low (W=0.4962), any bias or instability in the explanations can directly propagate into the ensemble weights and therefore into the performance number, creating a potential circularity that must be ruled out before the central performance claim can be accepted.
Authors: We agree that the manuscript does not explicitly state the data partitioning used to compute the SHAP agreement signal for SGAE weights. This omission creates the ambiguity noted. In the revised version we will add a clear statement in the abstract and a new paragraph in Section 3 (Methods) specifying the protocol: base models are trained exclusively on the training fold; SHAP attributions for weight computation are generated on a distinct validation fold; and ensemble predictions plus the reported AUC-ROC (0.8837) are obtained on a strictly held-out test set never seen during weight determination. This separation eliminates test-set leakage into the weighting step. Regarding propagation of instability, the per-transaction weights are derived from cross-model agreement rather than any single model’s SHAP values; the high stability of XGBoost (W=0.9912) therefore anchors the ensemble. We will also insert a short sensitivity analysis showing that down-weighting the less-stable LSTM does not materially alter the final AUC. These changes directly address the circularity concern while preserving the regulatory mapping to SR 11-7 and OCC requirements.
revision: yes
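The partitioning described in the rebuttal can be sketched as a three-way index split. The 15% validation and test fractions, the random shuffle, and the function name are assumptions: the source gives no split ratios, and a time-ordered split may suit transaction data better than a random one.

```python
import random

def three_way_split(n_transactions, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Partition transaction indices into disjoint train / validation / test folds.

    train: fit the base models
    val:   compute anything used to tune the weighting step
    test:  touched only once, for the reported metric
    """
    indices = list(range(n_transactions))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n_transactions * test_fraction)
    n_val = int(n_transactions * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(1000)

# Leakage check: no transaction may appear in more than one fold.
no_overlap = (
    not set(train_idx) & set(val_idx)
    and not set(train_idx) & set(test_idx)
    and not set(val_idx) & set(test_idx)
)
```

An explicit disjointness check of this kind is the sort of evidence that would let a reader confirm the held-out AUC-ROC is free of test-set leakage.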
Circularity Check
No significant circularity in SGAE derivation chain
Full rationale
The paper's central claim rests on training base models (LSTM, Transformer, GNN-GraphSAGE), computing SHAP attributions on them, deriving per-transaction weights from attribution agreement, and then evaluating the resulting ensemble on held-out AUC-ROC (0.8837) and cross-validation (0.9245). No equations or procedural descriptions in the provided abstract reduce the performance metric to the SHAP agreement signal by construction; the held-out evaluation remains an independent benchmark. No self-citations, uniqueness theorems, or ansatzes are invoked to force the result. The method is self-contained against external data splits and regulatory mapping, with no load-bearing step that collapses to a fitted input renamed as prediction.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 1 Pith paper
- SCAFDS: Edge-Feature Graph Attention for Interbank Fraud Detection with Attribution-Grounded SAR Generation
  SCAFDS applies edge-feature graph attention on fraud co-occurrence metrics to detect interbank fraud and generate attribution-grounded SAR reports, reporting AUPRC 0.515 and AUROC 0.802 on IEEE-CIS data with gains ove...