arxiv: 1907.03334 · v1 · pith:3ZHSVQWTnew · submitted 2019-07-07 · 💻 cs.LG · cs.HC · stat.ML

Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models

Pith reviewed 2026-05-25 01:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.HC · stat.ML
keywords case-based reasoning · fraud detection · post-hoc explanations · black-box models · trustworthiness · visualization · machine learning

The pith

Similarity of local post-hoc explanations enables case-based visualizations that help fraud analysts assess black-box model alerts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a case-based reasoning system that retrieves and visualizes prior instances similar to a new alert according to the similarity of their local post-hoc explanations. The goal is to supply domain experts with concrete evidence bearing on whether a black-box prediction is trustworthy enough to act on. Empirical evaluation indicates that the resulting visualizations support alert processing, and a user study at a major Dutch bank shows the approach is rated useful and easy to use. A sympathetic reader would care because the method offers a practical route to handling opaque high-stakes predictions without demanding full model transparency.

Core claim

A case-based reasoning approach that measures similarity on local post-hoc explanations of predictions can generate visualizations that supply useful evidence on trustworthiness for fraud analysts processing machine-learning alerts.

What carries the argument

Case-based reasoning retrieval that ranks and displays prior cases by similarity of their local post-hoc explanations to supply trustworthiness evidence.
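
As a rough illustration of the retrieval step, the sketch below ranks stored cases by the distance between their explanation vectors and that of a new alert. The `Case` type, the `retrieve_similar` helper, the attribution values, and the choice of Euclidean distance are all assumptions made for illustration; the summary above does not specify the paper's distance measure or data structures.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Case:
    """A previously processed alert (hypothetical structure)."""
    case_id: str
    explanation: Sequence[float]  # per-feature attributions from a post-hoc explainer
    was_fraud: bool               # outcome recorded after review


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two explanation vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("explanation vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_similar(
    alert_explanation: Sequence[float],
    case_base: List[Case],
    k: int = 3,
) -> List[Tuple[float, Case]]:
    """Return the k stored cases whose explanations are closest to the alert's."""
    scored = [(euclidean(alert_explanation, c.explanation), c) for c in case_base]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:k]


# Made-up attribution vectors, for illustration only.
case_base = [
    Case("c1", [0.40, -0.10, 0.05], True),
    Case("c2", [0.38, -0.12, 0.07], True),
    Case("c3", [-0.30, 0.25, 0.00], False),
    Case("c4", [0.10, 0.10, 0.10], False),
]
new_alert = [0.41, -0.09, 0.06]
top = retrieve_similar(new_alert, case_base, k=2)
for distance, case in top:
    print(case.case_id, round(distance, 4), case.was_fraud)
```

Ranking on explanation vectors rather than raw features means two transactions count as similar when the model flagged them for similar reasons, which is the distinction the paper draws from earlier case-based work.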

If this is right

  • The visualization can be useful for processing alerts.
  • The approach is perceived as useful and easy to use by fraud analysts at a major Dutch bank.
  • Similarity between local post-hoc explanations provides evidence that domain experts can act on.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same retrieval logic could transfer to other regulated domains that rely on black-box scoring.
  • If the explanation similarity metric correlates with expert judgment, the system may shorten review time per alert.
  • The method offers an alternative to building inherently interpretable models when post-hoc tools already exist.

Load-bearing premise

That similarity of local post-hoc explanations between predictions indicates cases that are meaningfully informative about trustworthiness for domain experts.
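
One way to probe this premise is to check whether the outcomes of the nearest cases in explanation space agree with each other. The sketch below computes the share of the k nearest past cases that were confirmed as fraud. The `neighbour_support` helper, the data, and the use of this share as trust evidence are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def neighbour_support(
    alert_explanation: Sequence[float],
    past_explanations: Sequence[Sequence[float]],
    past_was_fraud: Sequence[bool],
    k: int = 3,
) -> float:
    """Fraction of the k nearest past cases (by explanation) confirmed as fraud."""
    if not past_explanations:
        raise ValueError("case base is empty")
    if len(past_explanations) != len(past_was_fraud):
        raise ValueError("each past case needs exactly one outcome")
    # math.dist (Python 3.8+) is the Euclidean distance between two points.
    order = sorted(
        range(len(past_explanations)),
        key=lambda i: math.dist(alert_explanation, past_explanations[i]),
    )
    nearest = order[:k]
    return sum(past_was_fraud[i] for i in nearest) / len(nearest)


# Made-up attribution vectors and outcomes, for illustration only.
past_explanations = [[0.40, -0.10], [0.38, -0.12], [0.35, -0.05], [-0.30, 0.25]]
past_was_fraud = [True, True, False, False]
support = neighbour_support([0.39, -0.10], past_explanations, past_was_fraud, k=3)
print(f"share of nearest cases confirmed as fraud: {support:.2f}")
```

If the premise holds, alerts with high support should be confirmed as fraud more often than alerts with low support; a flat relationship would suggest that explanation similarity carries little information about trustworthiness.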

What would settle it

A controlled comparison in which fraud analysts process the same set of alerts with and without the visualization and show no measurable difference in decision accuracy, speed, or reported .
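
A comparison of this kind could be analysed with a permutation test on a per-analyst metric. The sketch below assumes two independent groups of analysts, one using the visualization and one not, and uses invented accuracy values; the `permutation_test` helper and its parameters are illustrative, and none of the numbers come from the paper.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test(
    with_viz: Sequence[float],
    without_viz: Sequence[float],
    n_resamples: int = 5000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = mean(with_viz) - mean(without_viz)
    pooled = list(with_viz) + list(without_viz)
    n_with = len(with_viz)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign analysts to the two conditions
        diff = mean(pooled[:n_with]) - mean(pooled[n_with:])
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical per-analyst decision accuracy in each condition.
accuracy_with = [0.82, 0.79, 0.85, 0.80, 0.83]
accuracy_without = [0.81, 0.78, 0.84, 0.80, 0.82]
observed, p_value = permutation_test(accuracy_with, accuracy_without)
print(f"difference in mean accuracy: {observed:.3f}, p = {p_value:.3f}")
```

A large p-value on accuracy, speed, and self-reported measures alike would be the kind of null result described above. If every analyst instead works under both conditions, a paired design would need a paired test.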

Figures

Figures reproduced from arXiv: 1907.03334 by Hilde J.P. Weerts, Mykola Pechenizkiy, Werner van Ipenburg.

Figure 1: The two stages of the CBR approach for estimating...
Figure 2: t-SNE visualization that groups transactions with similar SHAP explanations. The SHAP explanations explain pre...
Figure 3: In the simulated user experiment, the dataset is...
Figure 4: Improvement or decrease in average MAP of the estimated user confidence score compared to the MAP of the model’s...
Figure 5: The performance of different neighborhood visual...
Figure 6: CBR dashboard when applied to predictions of a random forest model trained on the...
Original abstract

In many contexts, it can be useful for domain experts to understand to what extent predictions made by a machine learning model can be trusted. In particular, estimates of trustworthiness can be useful for fraud analysts who process machine learning-generated alerts of fraudulent transactions. In this work, we present a case-based reasoning (CBR) approach that provides evidence on the trustworthiness of a prediction in the form of a visualization of similar previous instances. Different from previous works, we consider similarity of local post-hoc explanations of predictions and show empirically that our visualization can be useful for processing alerts. Furthermore, our approach is perceived useful and easy to use by fraud analysts at a major Dutch bank.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces a case-based reasoning (CBR) approach that visualizes similar previous instances based on the similarity of their local post-hoc explanations, to help fraud analysts assess the trustworthiness of black-box ML model alerts for fraudulent transactions. It reports an empirical demonstration that the visualization aids alert processing and that the approach is perceived as useful and easy to use by fraud analysts at a major Dutch bank.

Significance. If the user study is methodologically sound, the work could provide practical evidence for combining CBR with post-hoc explanations in a high-stakes domain, addressing a real need for domain experts to efficiently process ML-generated alerts.

major comments (1)
  1. [Evaluation] Evaluation section: the abstract and manuscript report an empirical demonstration and positive user feedback but supply no information on study design, sample size, statistical tests, controls, or objective measures of processing improvement; this absence prevents evaluation of support for the central claim that the visualization is useful for processing alerts.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and constructive comment on the evaluation. We address the point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the abstract and manuscript report an empirical demonstration and positive user feedback but supply no information on study design, sample size, statistical tests, controls, or objective measures of processing improvement; this absence prevents evaluation of support for the central claim that the visualization is useful for processing alerts.

    Authors: We agree that the manuscript as submitted provides insufficient detail on the user study methodology. The study was a qualitative evaluation with fraud analysts at the Dutch bank, using questionnaires for perceived usefulness and ease of use (based on TAM) along with think-aloud sessions, but the current text does not report participant count, exact protocol, or any quantitative metrics. In the revision we will expand the Evaluation section with a full description of the study design, sample size, procedure, and any available objective or statistical results to allow proper assessment of the claims.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes an empirical CBR visualization method based on similarity of local post-hoc explanations, with utility demonstrated via a user study and perception evaluation by fraud analysts. No equations, fitted parameters presented as predictions, self-citation load-bearing steps, or derivation chains exist that reduce any claim to its inputs by construction. The similarity-to-trustworthiness link is introduced as a design premise whose practical value is then tested externally through domain-expert feedback rather than asserted via internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or invented entities; ledger is empty.

pith-pipeline@v0.9.0 · 5656 in / 941 out tokens · 24903 ms · 2026-05-25T01:15:45.141694+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

  1. [1]

    José Miguel Benedí Ruiz, Francisco Casacuberta Nolla, Enrique Vidal Ruiz, Inmaculada Benlloch, Antonio Castellanos López, María José Castro Bleda, Jon Ander Gómez Adrián, Alfons Juan Císcar, and Juan Antonio Puchol García. 1991. Proyecto ROARS: Robust Analytical Speech Recognition System. (1991)

  2. [2]

    Fred D. Davis. 1989. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 3 (1989), 319–340

  3. [3]

    João Gama, Indre Zliobaite, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. ACM Comput. Surv. 46, 4 (2014), 44:1–44:37. https://doi.org/10.1145/2523813

  4. [4]

    Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta. 2018. To Trust Or Not To Trust A Classifier. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 5541–5552

  5. [5]

    Ron Kohavi. 1997. Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. KDD (09 1997)

  6. [6]

    Janet L. Kolodner. 1992. An introduction to case-based reasoning. Artificial Intelligence Review 6, 1 (1992), 3–34. https://doi.org/10.1007/bf00155578

  7. [7]

    J. B. Kruskal. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1 (01 Mar 1964), 1–27. https://doi.org/10.1007/BF02289565

  8. [8]

    Volodymyr Kuleshov and Percy S Liang. 2015. Calibrated Structured Prediction. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). Curran Associates, Inc., 3474–3482. http://papers.nips.cc/paper/5658-calibrated-structured-prediction.pdf

  9. [9]

    Scott M. Lundberg, Gabriel G. Erion, and Su-In Lee. 2018. Consistent Individualized Feature Attribution for Tree Ensembles. (2018). arXiv:1802.03888

  10. [10]

    Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30, I Guyon, U V Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, and R Garnett (Eds.). Curran Associates, Inc., 4765–4774. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predi...

  11. [11]

    G.P. McArdle and D.C. Wilson. 2003. Visualising Case-Base Usage. In Workshop Proceedings ICCBR, L. McGinty (Ed.). Trondheim, 105–114

  12. [12]

    Conor Nugent and Pádraig Cunningham. 2005. A Case-Based Explanation System for Black-Box Systems. Artificial Intelligence Review 24, 2 (oct 2005), 163–178. https://doi.org/10.1007/s10462-005-4609-5

  13. [13]

    Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, 1 (11 Dec 2017), 36. https://doi.org/10.1186/s13040-017-0154-4

  14. [14]

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

  15. [15]

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, New York, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778

  16. [16]

    Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In AAAI

  17. [17]

    Frode Sørmo, Jörg Cassens, and Agnar Aamodt. 2005. Explanation in Case-Based Reasoning–Perspectives and Goals. Artificial Intelligence Review 24, 2 (oct 2005), 109–143. https://doi.org/10.1007/s10462-005-4607-7

  18. [18]

    Erik Štrumbelj and Igor Kononenko. 2014. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 3 (2014), 647–665. https://doi.org/10.1007/s10115-013-0679-x

    KDD-ADF ’19, August 2019, Anchorage, Alaska, USA. Weerts, et al.

  19. [19]

    Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. 2013. OpenML: Networked Science in Machine Learning. SIGKDD Explorations 15, 2 (2013), 49–60. https://doi.org/10.1145/2641190.2641198

  20. [20]

    Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2018. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Harvard Journal of Law & Technology 31 (04 2018), 841–887

  21. [21]

    Hilde J.P. Weerts, Werner van Ipenburg, and Mykola Pechenizkiy. 2019. A Human-Grounded Evaluation of SHAP for Alert Processing. In Proceedings of KDD Workshop on Explainable AI (KDD-XAI ’19)

  22. [22]

    Dietrich Wettschereck, David W. Aha, and Takao Mohri. 1997. A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms. Artificial Intelligence Review 11, 1/5 (1997), 273–314. https://doi.org/10.1023/a:1006593614256

  23. [23]

    Indre Zliobaite, Mykola Pechenizkiy, and Joao Gama. 2016. An overview of concept drift applications. In Big Data Analysis: New Algorithms for a New Society. Springer, 91–114