Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models
Pith reviewed 2026-05-25 01:15 UTC · model grok-4.3
The pith
Similarity of local post-hoc explanations enables case-based visualizations that help fraud analysts assess black-box model alerts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A case-based reasoning approach that measures similarity on local post-hoc explanations of predictions can generate visualizations that supply useful evidence on trustworthiness for fraud analysts processing machine-learning alerts.
What carries the argument
Case-based reasoning retrieval that ranks and displays prior cases by similarity of their local post-hoc explanations to supply trustworthiness evidence.
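The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes each prediction's local explanation is available as a fixed-length vector of feature attributions, that Euclidean distance is an acceptable similarity measure, and that past cases carry a flag saying whether the model turned out to be right. The names `retrieve_similar_cases`, `neighbour_evidence`, and the toy vectors are all hypothetical.

```python
import math


def retrieve_similar_cases(query_expl, case_expls, k=3):
    """Return (index, distance) pairs for the k prior cases whose explanation
    vectors lie closest to the query explanation (Euclidean distance, an
    assumption made for this sketch)."""
    dists = [math.dist(query_expl, expl) for expl in case_expls]
    order = sorted(range(len(case_expls)), key=lambda i: dists[i])[:k]
    return [(i, dists[i]) for i in order]


def neighbour_evidence(neighbours, model_was_correct):
    """Fraction of retrieved cases on which the model's earlier prediction
    was correct; a crude stand-in for the evidence shown to an analyst."""
    if not neighbours:
        raise ValueError("no neighbours retrieved")
    hits = [model_was_correct[i] for i, _ in neighbours]
    return sum(hits) / len(hits)


# Toy attribution vectors, invented purely for illustration.
case_expls = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
]
model_was_correct = [True, True, False, True]
query = [0.9, 0.1, 0.05]

neighbours = retrieve_similar_cases(query, case_expls, k=2)
evidence = neighbour_evidence(neighbours, model_was_correct)
print(neighbours, evidence)
```

Ranking in explanation space rather than raw feature space means two alerts count as similar when the model flagged them for similar reasons, even if their raw feature values differ.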
If this is right
- The visualization can be useful for processing alerts.
- Fraud analysts at a major Dutch bank perceive the approach as useful and easy to use.
- Similarity between local post-hoc explanations provides evidence that domain experts can act on.
Where Pith is reading between the lines
- The same retrieval logic could transfer to other regulated domains that rely on black-box scoring.
- If the explanation similarity metric correlates with expert judgment, the system may shorten review time per alert.
- The method offers an alternative to building inherently interpretable models when post-hoc tools already exist.
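The second inference in the list above, that the explanation-similarity signal might track expert judgment, could be probed with a rank correlation. The sketch below computes Spearman's rho in plain Python. The two input lists are invented placeholders (a per-alert evidence score and a per-alert expert rating), not data from the paper, and the helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values, invented for illustration only.
evidence_scores = [0.9, 0.2, 0.6, 0.4, 0.8]
expert_ratings = [5, 1, 3, 2, 4]
rho = spearman(evidence_scores, expert_ratings)
print(rho)
```

A rho near 1 on real data would be consistent with the inference; a value near 0 would suggest the similarity signal and expert judgment are unrelated.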
Load-bearing premise
That similarity of local post-hoc explanations between predictions indicates cases that are meaningfully informative about trustworthiness for domain experts.
What would settle it
A controlled comparison in which fraud analysts process the same set of alerts with and without the visualization and show no measurable difference in decision accuracy, speed, or reported .
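One way such a within-subject comparison might be analysed is an exact sign-flip permutation test on per-analyst differences. The sketch below is an assumption about a possible analysis, not something the paper reports. The counts are invented placeholders (correct decisions out of a hypothetical fixed alert set per analyst), and `paired_sign_flip_pvalue` is a hypothetical helper.

```python
from itertools import product


def paired_sign_flip_pvalue(with_viz, without_viz):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that the visualization makes no difference,
    each analyst's difference is equally likely to have either sign, so we
    enumerate all sign assignments and count those at least as extreme as
    the observed sum. Feasible only for small samples (2**n assignments)."""
    diffs = [a - b for a, b in zip(with_viz, without_viz)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        n_total += 1
        if stat >= observed:
            n_extreme += 1
    return n_extreme / n_total


# Placeholder counts of correct decisions per analyst; NOT study data.
with_viz = [16, 15, 18, 17, 14, 19]
without_viz = [14, 14, 16, 16, 13, 17]
p_value = paired_sign_flip_pvalue(with_viz, without_viz)
print(p_value)
```

A large p-value on real data would point toward the "no measurable difference" outcome described above, although with only a handful of analysts the test has little power either way.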
Original abstract
In many contexts, it can be useful for domain experts to understand to what extent predictions made by a machine learning model can be trusted. In particular, estimates of trustworthiness can be useful for fraud analysts who process machine learning-generated alerts of fraudulent transactions. In this work, we present a case-based reasoning (CBR) approach that provides evidence on the trustworthiness of a prediction in the form of a visualization of similar previous instances. Different from previous works, we consider similarity of local post-hoc explanations of predictions and show empirically that our visualization can be useful for processing alerts. Furthermore, our approach is perceived useful and easy to use by fraud analysts at a major Dutch bank.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a case-based reasoning (CBR) approach that visualizes similar previous instances based on the similarity of their local post-hoc explanations, to help fraud analysts assess the trustworthiness of black-box ML model alerts for fraudulent transactions. It reports an empirical demonstration that the visualization aids alert processing and that the approach is perceived as useful and easy to use by fraud analysts at a major Dutch bank.
Significance. If the user study is methodologically sound, the work could provide practical evidence for combining CBR with post-hoc explanations in a high-stakes domain, addressing a real need for domain experts to efficiently process ML-generated alerts.
major comments (1)
- [Evaluation] Evaluation section: the abstract and manuscript report an empirical demonstration and positive user feedback but supply no information on study design, sample size, statistical tests, controls, or objective measures of processing improvement; this absence prevents evaluation of support for the central claim that the visualization is useful for processing alerts.
Simulated Author's Rebuttal
We thank the referee for their review and constructive comment on the evaluation. We address the point below and will revise the manuscript accordingly.
- Referee: [Evaluation] Evaluation section: the abstract and manuscript report an empirical demonstration and positive user feedback but supply no information on study design, sample size, statistical tests, controls, or objective measures of processing improvement; this absence prevents evaluation of support for the central claim that the visualization is useful for processing alerts.
Authors: We agree that the manuscript as submitted provides insufficient detail on the user study methodology. The study was a qualitative evaluation with fraud analysts at the Dutch bank, using questionnaires for perceived usefulness and ease of use (based on TAM) along with think-aloud sessions, but the current text does not report participant count, exact protocol, or any quantitative metrics. In the revision we will expand the Evaluation section with a full description of the study design, sample size, procedure, and any available objective or statistical results to allow proper assessment of the claims.
Revision: yes
Circularity Check
No significant circularity detected
The paper describes an empirical CBR visualization method based on similarity of local post-hoc explanations, with utility demonstrated via a user study and perception evaluation by fraud analysts. No equations, fitted parameters presented as predictions, self-citation load-bearing steps, or derivation chains exist that reduce any claim to its inputs by construction. The similarity-to-trustworthiness link is introduced as a design premise whose practical value is then tested externally through domain-expert feedback rather than asserted via internal reduction.
Reference graph
Works this paper leans on
- [1] José Miguel Benedí Ruiz, Francisco Casacuberta Nolla, Enrique Vidal Ruiz, Inmaculada Benlloch, Antonio Castellanos López, María José Castro Bleda, Jon Ander Gómez Adrián, Alfons Juan Císcar, and Juan Antonio Puchol García. 1991. Proyecto ROARS: Robust Analytical Speech Recognition System. (1991)
- [2] Fred D. Davis. 1989. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 3 (1989), 319–340
- [3] João Gama, Indre Zliobaite, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. ACM Comput. Surv. 46, 4 (2014), 44:1–44:37. https://doi.org/10.1145/2523813
- [4] Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta. 2018. To Trust Or Not To Trust A Classifier. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 5541–5552
- [5] Ron Kohavi. 1997. Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. KDD (09 1997)
- [6] Janet L. Kolodner. 1992. An introduction to case-based reasoning. Artificial Intelligence Review 6, 1 (1992), 3–34. https://doi.org/10.1007/bf00155578
- [7] J. B. Kruskal. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1 (01 Mar 1964), 1–27. https://doi.org/10.1007/BF02289565
- [8] Volodymyr Kuleshov and Percy S Liang. 2015. Calibrated Structured Prediction. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). Curran Associates, Inc., 3474–3482. http://papers.nips.cc/paper/5658-calibrated-structured-prediction.pdf
- [9] Scott M. Lundberg, Gabriel G. Erion, and Su-In Lee. 2018. Consistent Individualized Feature Attribution for Tree Ensembles. (2018). arXiv:1802.03888
- [10] Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30, I Guyon, U V Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, and R Garnett (Eds.). Curran Associates, Inc., 4765–4774. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predi...
- [11] G.P. McArdle and D.C. Wilson. 2003. Visualising Case-Base Usage. In Workshop Proceedings ICCBR, L. McGinty (Ed.). Trondheim, 105–114
- [12] Conor Nugent and Pádraig Cunningham. 2005. A Case-Based Explanation System for Black-Box Systems. Artificial Intelligence Review 24, 2 (oct 2005), 163–178. https://doi.org/10.1007/s10462-005-4609-5
- [13] Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, 1 (11 Dec 2017), 36. https://doi.org/10.1186/s13040-017-0154-4
- [14] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830
- [15] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIG International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, New York, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778
- [16] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In AAAI
- [17] Frode Sørmo, Jörg Cassens, and Agnar Aamodt. 2005. Explanation in Case-Based Reasoning–Perspectives and Goals. Artificial Intelligence Review 24, 2 (oct 2005), 109–143. https://doi.org/10.1007/s10462-005-4607-7
- [18] Erik Štrumbelj and Igor Kononenko. 2014. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 3 (2014), 647–665. https://doi.org/10.1007/s10115-013-0679-x
KDD-ADF ’19, August 2019, Anchorage, Alaska, USA. Weerts, et al. Figure 6: CBR dashboard when applied to predictions of a random fores...
- [19] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. 2013. OpenML: Networked Science in Machine Learning. SIGKDD Explorations 15, 2 (2013), 49–60. https://doi.org/10.1145/2641190.2641198
- [20] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2018. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Harvard journal of law & technology 31 (04 2018), 841–887
- [21] Hilde J.P. Weerts, Werner van Ipenburg, and Mykola Pechenizkiy. 2019. A Human-Grounded Evaluation of SHAP for Alert Processing. In Proceedings of KDD Workshop on Explainable AI (KDD-XAI ’19)
- [22] Dietrich Wettschereck, David W. Aha, and Takao Mohri. 1997. A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms. Artificial Intelligence Review 11, 1/5 (1997), 273–314. https://doi.org/10.1023/a:1006593614256
- [23] Indre Zliobaite, Mykola Pechenizkiy, and Joao Gama. 2016. An overview of concept drift applications. In Big Data Analysis: New Algorithms for a New Society. Springer, 91–114