Counterfactual Maps: What They Are and How to Find Them

Awa Khouna; Julien Ferry; Thibaut Vidal

arxiv: 2602.09128 · v2 · submitted 2026-02-09 · 💻 cs.LG

Counterfactual Maps: What They Are and How to Find Them

Awa Khouna , Julien Ferry , Thibaut Vidal This is my paper

Pith reviewed 2026-05-16 05:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords counterfactual explanationstree ensemblesKD-treesnearest region searchinterpretable machine learningrecoursehyperrectangles

0 comments

The pith

Tree ensembles can be turned into hyperrectangle partitions whose nearest alternative-label region gives an exact counterfactual.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Any tree ensemble compresses without loss into a finite collection of axis-aligned hyperrectangles, each carrying one fixed prediction label. For a query point the optimal counterfactual is therefore simply the closest point that lands inside a rectangle of a different label. The paper treats this search as a nearest-region problem and builds a global data structure, called a counterfactual map, that answers it exactly. The map is realized with volumetric KD-trees that support branch-and-bound queries together with explicit optimality certificates. After a one-time build phase the method returns globally optimal recourse in milliseconds on real data, orders of magnitude faster than cold-start exact solvers.

Core claim

By compressing any tree ensemble into an equivalent partition of labeled hyperrectangles, counterfactual search is cast as the problem of identifying the generalized Voronoi cell associated with the nearest rectangle of an alternative label. This leads to an exact, amortized algorithm based on volumetric k-dimensional trees, which performs branch-and-bound nearest-region queries with explicit optimality certificates and sublinear average query time after a one-time preprocessing phase.

What carries the argument

volumetric KD-trees that answer nearest-region queries over a partition of labeled hyperrectangles

If this is right

Counterfactuals are globally optimal with explicit optimality certificates
Query latency drops to milliseconds after one preprocessing pass
The same map supports every future query on the same model without re-optimization
The approach scales to interactive use in high-stakes domains where prior exact solvers were too slow

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same partition view could be used to generate entire counterfactual maps for visualization or auditing of model behavior.
If the hyperrectangle representation can be approximated for non-tree models, similar amortized search might apply more broadly.
Batch queries over many points could be further accelerated by sharing the same KD-tree traversal.

Load-bearing premise

The KD-tree nearest-region search stays both exact and sublinear even when the number and dimensionality of the hyperrectangles produced by real tree ensembles become large.

What would settle it

Run the method on successively deeper trees or higher-dimensional data and check whether measured query times remain sublinear while every returned point is provably the closest alternative-label rectangle.

read the original abstract

Counterfactual explanations are a central tool in interpretable machine learning, yet computing them exactly for complex models remains challenging. For tree ensembles, predictions are piecewise constant over a large collection of axis-aligned hyperrectangles, implying that an optimal counterfactual for a point corresponds to its projection onto the nearest rectangle with an alternative label under a chosen metric. Existing methods largely overlook this geometric structure, relying either on heuristics with no optimality guarantees or on mixed-integer programming formulations that do not scale to interactive use. In this work, we revisit counterfactual generation through the lens of nearest-region search and introduce counterfactual maps, a global representation of recourse for tree ensembles. Leveraging the fact that any tree ensemble can be compressed into an equivalent partition of labeled hyperrectangles, we cast counterfactual search as the problem of identifying the generalized Voronoi cell associated with the nearest rectangle of an alternative label. This leads to an exact, amortized algorithm based on volumetric k-dimensional (KD) trees, which performs branch-and-bound nearest-region queries with explicit optimality certificates and sublinear average query time after a one-time preprocessing phase. Our experimental analyses on several real datasets drawn from high-stakes application domains show that this approach delivers globally optimal counterfactual explanations with millisecond-level latency, achieving query times that are orders of magnitude faster than existing exact, cold-start optimization methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces counterfactual maps as a global representation of recourse for tree ensembles. It compresses any tree ensemble into an equivalent partition of labeled axis-aligned hyperrectangles, then casts exact counterfactual search as identifying the nearest alternative-label rectangle under a chosen metric via generalized Voronoi cells. This yields an exact amortized algorithm using volumetric KD-trees that performs branch-and-bound nearest-region queries with optimality certificates and claimed sublinear average query time after one-time preprocessing, with experimental support for millisecond latencies on real high-stakes datasets.

Significance. If the exactness and sublinear scaling hold under realistic partition cardinalities and dimensions, the work would provide a meaningful advance for interpretable ML by enabling fast, globally optimal counterfactual explanations for tree models without relying on heuristics or unscalable MIP solvers. The geometric reduction to nearest-region search in hyperrectangular partitions is a clean and reusable insight that builds directly on standard properties of decision trees.

major comments (3)

[Abstract, §3] Abstract and §3: The central claim of sublinear average query time after preprocessing rests on effective volume-based pruning during branch-and-bound in the volumetric KD-tree. No analysis is supplied of how query cost scales with input dimension d or the number of hyperrectangles N (which can reach thousands in gradient-boosted trees); in d ≳ 20 the pruning may visit a linear fraction of nodes, undermining the amortized guarantee.
[§4] §4 (experiments): The abstract asserts millisecond-level queries and orders-of-magnitude speedups over exact cold-start methods on real datasets, yet reports neither the input dimensions, the cardinality of the compressed partitions, nor scaling plots versus N or d. Without these, it is impossible to confirm that the sublinear behavior observed is not an artifact of the specific tested regimes.
[§3] §3: The description of the volumetric KD-tree nearest-region search and the explicit optimality certificates is conceptual only. No pseudocode, proof sketch, or bound on nodes visited (e.g., in terms of d and N) is provided, leaving the exactness and efficiency claims only partially substantiated.

minor comments (2)

[Abstract, §2] The term 'generalized Voronoi cell' is used without a formal definition or pointer to prior literature on Voronoi diagrams over hyperrectangular partitions.
[§2] Notation for the hyperrectangle partition, the distance metric, and the label function should be introduced with explicit symbols early in the main text to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. The geometric reduction to nearest-region search is indeed central to the contribution, and we agree that strengthening the analysis, experimental reporting, and algorithmic description will improve the manuscript. We address each point below.

read point-by-point responses

Referee: [Abstract, §3] Abstract and §3: The central claim of sublinear average query time after preprocessing rests on effective volume-based pruning during branch-and-bound in the volumetric KD-tree. No analysis is supplied of how query cost scales with input dimension d or the number of hyperrectangles N (which can reach thousands in gradient-boosted trees); in d ≳ 20 the pruning may visit a linear fraction of nodes, undermining the amortized guarantee.

Authors: We acknowledge that the manuscript does not include a formal complexity analysis or explicit bounds on nodes visited as a function of d and N. The sublinear claim is supported by the volume-based pruning mechanism in the volumetric KD-tree, which exploits the axis-aligned hyperrectangular structure, but we agree this requires further substantiation. In the revision we will add a dedicated paragraph in §3 discussing expected scaling behavior, the role of region volumes in pruning, and the practical regimes (including moderate d typical of tabular data) where sublinear performance holds, along with a note on potential degradation in very high dimensions. revision: yes
Referee: [§4] §4 (experiments): The abstract asserts millisecond-level queries and orders-of-magnitude speedups over exact cold-start methods on real datasets, yet reports neither the input dimensions, the cardinality of the compressed partitions, nor scaling plots versus N or d. Without these, it is impossible to confirm that the sublinear behavior observed is not an artifact of the specific tested regimes.

Authors: We agree that these details are necessary for proper interpretation. The revised §4 will explicitly report the input dimensions and the number of hyperrectangles after compression for each dataset. We will also add scaling plots of query time versus N (by varying ensemble size) and versus d (by subsampling features) to demonstrate the observed scaling behavior. revision: yes
Referee: [§3] §3: The description of the volumetric KD-tree nearest-region search and the explicit optimality certificates is conceptual only. No pseudocode, proof sketch, or bound on nodes visited (e.g., in terms of d and N) is provided, leaving the exactness and efficiency claims only partially substantiated.

Authors: The optimality certificate is obtained by maintaining a global upper bound on the distance to the nearest alternative-label region and pruning any KD-tree node whose minimum possible distance exceeds this bound. We will add pseudocode for the branch-and-bound nearest-region query in the revised §3, together with a concise proof sketch establishing that termination yields the exact nearest alternative-label hyperrectangle. A brief discussion of node-visit bounds in terms of volume ratios will also be included. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation uses standard geometric data structures

full rationale

The paper compresses tree ensembles into labeled hyperrectangles (a direct, standard equivalence) and reduces counterfactual search to nearest-region queries on the induced partition via volumetric KD-trees and branch-and-bound. This rests on established properties of axis-aligned partitions and KD-tree nearest-neighbor search rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation. No equation or step reduces the claimed exact optimality or sublinear amortized time to a quantity defined by the method itself; the approach is self-contained against external algorithmic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the domain assumption that tree ensembles admit an exact lossless compression to a finite collection of labeled axis-aligned hyperrectangles and on the standard mathematical properties of KD-trees for nearest-neighbor search in Euclidean space.

axioms (2)

domain assumption Any tree ensemble can be represented exactly as a finite partition of the input space into axis-aligned hyperrectangles each carrying a constant label.
Invoked in the abstract as the starting point for casting counterfactual search as nearest-region search.
standard math Volumetric KD-trees support branch-and-bound nearest-region queries that return exact nearest neighbors with optimality certificates.
Standard property of KD-tree data structures used to obtain sublinear amortized query time.

invented entities (1)

counterfactual map no independent evidence
purpose: Global data structure that encodes all possible optimal counterfactuals for a given tree ensemble via generalized Voronoi cells over labeled hyperrectangles.
New conceptual object introduced to enable the amortized exact search; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5534 in / 1557 out tokens · 41058 ms · 2026-05-16T05:17:19.420402+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, Cost/FunctionalEquation.lean, AlexanderDuality.lean reality_from_one_distinction, washburn_uniqueness_aczel, alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Leveraging the fact that any tree ensemble can be compressed into an equivalent partition of labeled hyperrectangles, we cast counterfactual search as the problem of identifying the generalized Voronoi cell associated with the nearest rectangle of an alternative label. This leads to an exact, amortized algorithm based on volumetric k-dimensional (KD) trees...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.