Uncertainty Quantification as a Principled Foundation for Explainable Artificial Intelligence: A Case Study of Counterfactual Explanations

Eyke Hüllermeier; Kacper Sokol; Santo M.A.R. Thies

arXiv: 2502.17007 · v2 · pith:EZZW7QAMnew · submitted 2025-02-24 · 💻 cs.LG · cs.AI · stat.ML

Kacper Sokol, Santo M.A.R. Thies, Eyke Hüllermeier

Pith reviewed 2026-05-23 01:43 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML

keywords uncertainty quantification · counterfactual explanations · explainable artificial intelligence · transparency · machine learning · validity · proximity

The pith

Uncertainty quantification supplies a unifying framework for counterfactual explainability

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that uncertainty estimates can express the essential properties of counterfactual explanations, such as validity and proximity, in a direct way. This framing supports two simple explainer variants, one using only uncertainty and another combining it with feature-space distance. Experiments show these variants perform competitively with far more elaborate state-of-the-art methods. A reader would care because the approach replaces ad-hoc rules with a single, already-available quantity from the underlying model. The broader argument is that folding core AI concepts like uncertainty into explainability research produces more reliable predictive systems.

Core claim

Uncertainty can provide a principled unifying framework for counterfactual explainability by expressing the core counterfactual properties in terms of uncertainty, allowing us to build two variants of an explainer upon them -- one based solely on uncertainty estimates and another pairing them with distance measured in the feature space. Comprehensive experiments illustrate highly competitive performance of this framework when compared to many state-of-the-art methods despite its radically simple design.

What carries the argument

Expressing core counterfactual properties such as validity and proximity directly in terms of uncertainty estimates

If this is right

Two explainer variants become available: one relying solely on uncertainty estimates and one that also uses feature-space distance.
The resulting explainers reach performance levels comparable to many state-of-the-art methods despite a far simpler design.
Transparency research that incorporates uncertainty quantification produces more reliable, robust, and understandable predictive models.
Making explainability uncertainty-aware constitutes the first step toward integrating artificial-intelligence fundamentals into transparency methods.
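
The two variants listed above can be sketched as a candidate-selection problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the toy logistic model, binary entropy as the uncertainty estimate, random candidate sampling, and the trade-off weight `lam` are all hypothetical choices made for this example.

```python
import math
import random

def predict_proba(x):
    """Toy binary classifier (assumption): logistic model on two features."""
    z = 2.0 * x[0] + 1.0 * x[1] - 1.0
    return 1.0 / (1.0 + math.exp(-z))  # probability of class 1

def entropy(p):
    """Binary predictive entropy in nats; 0 when the model is fully certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def l2(a, b):
    """Euclidean distance in feature space."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def find_counterfactual(x, target, candidates, lam=0.0):
    """Return the candidate classified as `target` with the lowest score.

    lam = 0 -> uncertainty-only variant (score = entropy)
    lam > 0 -> uncertainty plus feature-space distance (hypothetical weighting)
    """
    best, best_score = None, float("inf")
    for c in candidates:
        p1 = predict_proba(c)
        predicted = 1 if p1 >= 0.5 else 0
        if predicted != target:
            continue  # class did not flip, so this is not a valid counterfactual
        score = entropy(p1) + lam * l2(x, c)
        if score < best_score:
            best, best_score = c, score
    return best

rng = random.Random(0)
x = (-1.0, 0.0)  # original instance; the toy model predicts class 0 here
candidates = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(2000)]

cf_unc = find_counterfactual(x, target=1, candidates=candidates, lam=0.0)
cf_mix = find_counterfactual(x, target=1, candidates=candidates, lam=0.5)
print("uncertainty only:", cf_unc, "distance", round(l2(x, cf_unc), 3))
print("with distance:   ", cf_mix, "distance", round(l2(x, cf_mix), 3))
```

In this sketch the distance-augmented choice can never be farther from `x` than the uncertainty-only choice, because any candidate that is both farther and less certain would score worse on both terms.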

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same uncertainty-based reduction could be applied to other explanation types such as feature attributions or decision rules.
High-uncertainty regions identified by the framework might flag inputs where explanations are inherently less trustworthy.
The approach invites direct comparison of uncertainty calibration quality across different counterfactual generators.

Load-bearing premise

Core counterfactual properties can be expressed using uncertainty estimates without losing essential aspects of the explanation task.

What would settle it

An experiment in which the uncertainty-only or uncertainty-plus-distance explainers produce fewer valid counterfactuals or systematically larger feature distances than established methods would falsify the unifying-framework claim.
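
The comparison described above reduces to two summary numbers per method: the share of valid counterfactuals and the average feature-space distance. The sketch below shows one way to compute them. The threshold classifier, the hand-written points, and the method names `method_a` and `method_b` are hypothetical stand-ins, not data or baselines from the paper.

```python
import math

def validity_rate(model, counterfactuals, targets):
    """Fraction of counterfactuals that the model assigns to the desired class."""
    hits = sum(1 for c, t in zip(counterfactuals, targets) if model(c) == t)
    return hits / len(counterfactuals)

def mean_distance(originals, counterfactuals):
    """Average Euclidean distance between each instance and its counterfactual."""
    dists = [math.dist(x, c) for x, c in zip(originals, counterfactuals)]
    return sum(dists) / len(dists)

# Hypothetical stand-ins: a threshold classifier and hand-written points.
def model(x):
    return 1 if x[0] + x[1] > 1.0 else 0

originals = [(0.0, 0.0), (0.2, 0.3), (0.5, 0.1)]
targets = [1, 1, 1]
method_a = [(1.0, 0.5), (0.9, 0.4), (0.5, 0.4)]  # last point fails to flip the class
method_b = [(2.0, 2.0), (2.0, 2.0), (2.0, 2.0)]  # always valid, but far away

for name, cfs in [("A", method_a), ("B", method_b)]:
    print(name, validity_rate(model, cfs, targets),
          round(mean_distance(originals, cfs), 3))
```

The two toy methods pull in opposite directions (one closer but less often valid, one always valid but distant), which is why both numbers need to be reported side by side.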

Original abstract

In this paper we argue that, to its detriment, transparency research overlooks many foundational concepts of artificial intelligence. As an illustrating example we focus on uncertainty quantification in the context of counterfactual explainability, demonstrating that its broader adoption could address key challenges in the field. To this end, we show how uncertainty can provide a principled unifying framework for counterfactual explainability by expressing the core counterfactual properties in terms of uncertainty, allowing us to build two variants of an explainer upon them -- one based solely on uncertainty estimates and another pairing them with distance measured in the feature space. Our comprehensive experiments illustrate highly competitive performance of our framework when compared to many state-of-the-art methods despite its radically simple design. More broadly, the paper demonstrates that integrating artificial intelligence fundamentals into transparency research promises to yield more reliable, robust and understandable predictive models. We posit that making artificial intelligence explainability truly uncertainty-aware is the first step towards this goal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper recasts counterfactual properties as uncertainty terms and builds two simple explainers from them, but the uncertainty-only version cannot enforce proximity without an extra distance component.

The main contribution is showing that validity and proximity in counterfactuals can be expressed through uncertainty estimates, then using that to derive two explainer variants. One relies only on uncertainty; the other combines it with feature-space distance. The experiments report competitive results against standard baselines on common datasets while keeping the methods lightweight. That simplicity is the strongest part of the work and makes the framing easy to test or extend. The stress-test concern holds up: validity translates reasonably to low uncertainty on the target class, but proximity is a distance property in input space that uncertainty measures do not capture. Two points can share the same uncertainty score yet sit arbitrarily far apart, so the pure uncertainty variant either leaves proximity uncontrolled or relies on implicit selection that the framework does not state. The second variant adds distance explicitly, which confirms the gap. The paper is therefore a useful conceptual link between uncertainty quantification and counterfactual XAI rather than a complete unification. Readers already working on uncertainty-aware transparency methods will get the most from it. The idea is clear enough and the empirical claims are concrete enough that it deserves peer review, even if the theoretical mapping needs tightening on the proximity side.
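
The observation that two points can share an uncertainty score while sitting far apart is easy to reproduce. The sketch below assumes a linear logistic model with made-up weights `W` and bias `B`, and uses binary entropy as the uncertainty measure. Neither choice comes from the paper. Moving a point parallel to the decision boundary leaves the logit, and therefore the entropy, unchanged while the distance grows without limit.

```python
import math

W = (2.0, 1.0)  # hypothetical weights of a linear logistic model
B = -1.0        # hypothetical bias

def proba(x):
    """Probability of class 1 under the toy model."""
    z = W[0] * x[0] + W[1] * x[1] + B
    return 1.0 / (1.0 + math.exp(-z))

def entropy(p):
    """Binary predictive entropy in nats (assumes 0 < p < 1)."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def shift_along_boundary(x, t):
    """Move x by t units in a direction orthogonal to W, keeping the logit fixed."""
    norm = math.hypot(*W)
    direction = (-W[1] / norm, W[0] / norm)
    return (x[0] + t * direction[0], x[1] + t * direction[1])

near = (1.5, 0.0)
for t in (1.0, 100.0, 10000.0):
    far = shift_along_boundary(near, t)
    print(t,
          round(entropy(proba(near)), 6),
          round(entropy(proba(far)), 6),
          round(math.dist(near, far), 3))
```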

Referee Report

2 major / 2 minor

Summary. The manuscript argues that uncertainty quantification provides a principled unifying framework for counterfactual explainability. It claims that core properties (validity, proximity) can be expressed directly in terms of uncertainty estimates, enabling construction of two explainer variants—one relying solely on uncertainty and another combining uncertainty with feature-space distance—while achieving competitive empirical performance against state-of-the-art methods despite a simple design.

Significance. If the central mapping from uncertainty to counterfactual properties holds without hidden assumptions, the work would usefully connect XAI to foundational AI concepts such as predictive uncertainty, potentially improving robustness of explanations. The emphasis on a radically simple design that still matches SOTA is a positive feature worth highlighting if the experiments are reproducible and the uncertainty-only variant is shown to satisfy proximity without post-hoc selection.

major comments (2)

[Abstract and §3 (framework definition)] The claim that 'core counterfactual properties' including proximity 'can be expressed in terms of uncertainty' is load-bearing for both variants. Predictive uncertainty (entropy, variance, etc.) is invariant under input-space isometries, so two points with identical uncertainty can lie arbitrarily far apart; the manuscript must explicitly derive or bound the proximity term from uncertainty alone or acknowledge that the uncertainty-only variant requires an implicit distance penalty or post-selection step.
[§4 (experimental setup) and Tables 2/3] The abstract asserts 'highly competitive performance' for the uncertainty-only variant, yet no baseline definitions, dataset statistics, statistical significance tests, or ablation isolating the uncertainty component versus the distance-augmented variant are visible in the provided description. Without these, the empirical support for the unifying claim cannot be evaluated.

minor comments (2)

[§2] Notation for uncertainty measures (e.g., which specific estimator—MC dropout, ensemble variance, etc.) should be introduced with an equation in §2 before being used in the explainer definitions.
[Abstract] The abstract is information-dense; a short paragraph separating the conceptual contribution from the experimental claims would improve readability.
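
As an illustration of the notation requested in the first minor comment, predictive entropy over an ensemble could be introduced as follows. This is only an example of the form such a definition might take: the review does not say which estimator the paper uses, and the symbols $K$ (number of classes), $M$ (ensemble size), and $p_m$ (prediction of the $m$-th ensemble member) are assumptions.

```latex
% Illustrative notation only; the estimator and symbols are assumptions.
\bar{p}(y = k \mid x) = \frac{1}{M} \sum_{m=1}^{M} p_m(y = k \mid x),
\qquad
H\bigl[\bar{p}(y \mid x)\bigr]
  = -\sum_{k=1}^{K} \bar{p}(y = k \mid x) \,\log \bar{p}(y = k \mid x)
```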

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments identify areas where additional formalization and experimental rigor would strengthen the manuscript. We address each point below and will incorporate revisions as noted.

Referee: [Abstract and §3 (framework definition)] The claim that 'core counterfactual properties' including proximity 'can be expressed in terms of uncertainty' is load-bearing for both variants. Predictive uncertainty (entropy, variance, etc.) is invariant under input-space isometries, so two points with identical uncertainty can lie arbitrarily far apart; the manuscript must explicitly derive or bound the proximity term from uncertainty alone or acknowledge that the uncertainty-only variant requires an implicit distance penalty or post-selection step.

Authors: We agree that the invariance of standard predictive uncertainty measures under isometries is an important observation that requires explicit treatment. Our framework expresses validity directly via the uncertainty of the counterfactual prediction (low uncertainty implies the model assigns high probability to the desired class). For proximity, the uncertainty-only variant relies on the empirical observation that, in practice, low-uncertainty regions tend to lie near the original instance for locally smooth models; however, we acknowledge this is not formally bounded in the current text. We will revise §3 to include a discussion of this limitation, add any available derivation or assumption under which proximity follows from uncertainty, and clarify whether the uncertainty-only variant implicitly benefits from model properties or requires post-selection.
Revision: yes

Referee: [§4 (experimental setup) and Tables 2/3] The abstract asserts 'highly competitive performance' for the uncertainty-only variant, yet no baseline definitions, dataset statistics, statistical significance tests, or ablation isolating the uncertainty component versus the distance-augmented variant are visible in the provided description. Without these, the empirical support for the unifying claim cannot be evaluated.

Authors: The full manuscript contains baseline definitions (standard methods such as those from Wachter et al. and Mothilal et al.), dataset statistics (Table 1), and results (Tables 2/3). However, we accept that statistical significance testing and an explicit ablation isolating the uncertainty component are missing. We will add these elements in the revision: paired statistical tests with p-values across datasets, and an ablation comparing the uncertainty-only variant against a pure distance baseline and the combined variant.
Revision: yes
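
The rebuttal does not name a specific paired test. As one possible choice, the sketch below implements an exact paired sign-flip permutation test on per-dataset score differences. The function name `paired_sign_flip_pvalue` is hypothetical, and the score lists are placeholders invented purely to exercise the code; they are not results from the paper.

```python
from itertools import product

def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test on the summed difference.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every sign assignment is enumerated (feasible for a handful of datasets).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            count += 1
    return count / total

# Placeholder per-dataset scores, invented only to demonstrate the call.
method_a = [0.91, 0.88, 0.95, 0.90, 0.93, 0.89]
method_b = [0.85, 0.84, 0.90, 0.86, 0.88, 0.83]
print("p-value:", paired_sign_flip_pvalue(method_a, method_b))
```

With only a few datasets the smallest attainable p-value is 2 / 2^n, so the number of datasets bounds how strong a claim such a test can support.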

Circularity Check

0 steps flagged

No circularity: derivation from uncertainty mapping is independent and self-contained

full rationale

The paper's central step expresses counterfactual properties (validity, proximity) in terms of uncertainty estimates to construct two explainer variants. No equations, fitted parameters, or self-citations are shown reducing this mapping to the target result by construction. The uncertainty-only variant is presented as a direct consequence of the uncertainty formulation, while the second variant explicitly incorporates feature-space distance; neither reduces to renaming inputs or load-bearing self-citation. The derivation remains externally falsifiable against standard counterfactual benchmarks and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on the abstract alone, the central claim rests on the assumption that uncertainty estimates can express counterfactual properties; no explicit free parameters, invented entities, or additional axioms are described.

axioms (1)

Domain assumption: Uncertainty estimates from predictive models are reliable and sufficient to express core counterfactual properties such as validity and proximity.
This premise is required to build the two explainer variants from uncertainty alone.

pith-pipeline@v0.9.0 · 5705 in / 1150 out tokens · 41841 ms · 2026-05-23T01:43:19.585439+00:00 · methodology
