pith. machine review for the scientific record.

arxiv: 2603.05093 · v2 · submitted 2026-03-05 · 💻 cs.LG · cs.AI · cs.CV

Recognition: no theorem link

From Baselines to Transport Geodesics: Axiomatic Attribution via Optimal Generative Flows

Pith reviewed 2026-05-15 15:59 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords feature attribution, optimal transport, generative flows, Aumann-Shapley, rectified flow, path selection, kinetic action

The pith

Aumann-Shapley line integrals along transport geodesics give unique and stable attributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Feature attributions depend on an implicit choice of path from a reference state to the input, and different paths produce different explanations. The paper proves that for any fixed path the Aumann-Shapley line integral is the unique rule satisfying standard fixed-path axioms plus coordinate-trace regularity. It then treats path selection itself as an optimization problem: minimize the kinetic action of flows that transport a reference distribution to the observed data distribution. The resulting transport-geodesic paths are approximated by Rectified Flow and Reflow, with derived bounds that link vector-field error to attribution error. Experiments indicate that these lower-action paths yield attributions with greater stability and structure than hand-designed interpolations while retaining deletion faithfulness.

Core claim

For a fixed path, the Aumann-Shapley line integral is the unique attribution rule under standard fixed-path axioms and explicit coordinate-trace regularity. For path selection, minimizing kinetic action over flows that transport a reference distribution to the data distribution yields a transport-geodesic attribution principle. This principle is approximated with Rectified Flow and Reflow, together with stability bounds that connect vector-field error to attribution error.
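
As a minimal sketch of the fixed-path rule, the snippet below estimates a line integral of the form Ψ_i = ∫ ∂_i f(γ(t)) · γ̇_i(t) dt over t ∈ [0, 1] with midpoint quadrature. The names `path_attribution`, `f`, and `grad_f`, the toy model, and the straight-line path are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def path_attribution(grad_f, path, path_velocity, num_steps=200):
    """Midpoint-rule estimate of a path-integrated gradient attribution."""
    # Midpoints of num_steps equal sub-intervals of [0, 1].
    ts = (np.arange(num_steps) + 0.5) / num_steps
    total = np.zeros_like(path(0.0), dtype=float)
    for t in ts:
        # Coordinate-wise product: gradient of f times path velocity.
        total += grad_f(path(t)) * path_velocity(t)
    return total / num_steps

# Hypothetical toy model: f(x) = x_0 * x_1 + sin(x_2).
def f(x):
    return x[0] * x[1] + np.sin(x[2])

def grad_f(x):
    return np.array([x[1], x[0], np.cos(x[2])])

reference = np.zeros(3)
x_input = np.array([1.0, 2.0, 0.5])

# Straight-line path, used here only as the simplest possible example.
path = lambda t: reference + t * (x_input - reference)
path_velocity = lambda t: x_input - reference

attributions = path_attribution(grad_f, path, path_velocity)
# Completeness: attributions should sum to f(input) - f(reference).
completeness_gap = abs(attributions.sum() - (f(x_input) - f(reference)))
print(attributions, completeness_gap)
```

Any other differentiable path with the same endpoints can be passed in through `path` and `path_velocity`; the completeness gap should shrink as `num_steps` grows.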

What carries the argument

Kinetic action minimization over generative flows that transport a reference distribution to the data distribution, which selects the transport geodesic used for attribution.
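
For orientation, kinetic-action minimization of this kind is commonly written in the dynamic (Benamou-Brenier) form of optimal transport. The symbols below ($p_t$, $v_t$, $\mathcal{A}$) are assumed for illustration and may differ from the paper's own notation.

```latex
\mathcal{A}[p, v] \;=\; \int_0^1 \mathbb{E}_{x \sim p_t}\!\left[\tfrac{1}{2}\,\lVert v_t(x) \rVert^2\right] dt,
\qquad
\partial_t p_t + \nabla \cdot \left(p_t\, v_t\right) = 0,
\qquad
p_{t=0} = p_0, \quad p_{t=1} = p_1 .
```

Under this standard formulation, minimizers move each particle along a straight segment at constant speed, which is the sense in which straighter learned flows sit closer to the transport geodesic.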

If this is right

  • Aumann-Shapley attributions become independent of arbitrary baseline or interpolation choices once the path is fixed by the transport principle.
  • Lower kinetic action paths produce attributions with measurably greater stability and structure across repeated runs.
  • Approximation error in the learned vector field is bounded in its effect on the final attribution values.
  • Deletion faithfulness remains competitive with standard baseline methods.
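
To make "lower kinetic action" concrete, here is a small sketch that computes a discretized action for two paths sharing the same endpoints. The function `kinetic_action` and both toy paths are assumptions chosen for illustration; the paper's action is defined over flows between distributions, not single curves.

```python
import numpy as np

def kinetic_action(points):
    """Discrete kinetic action 0.5 * sum(||dx/dt||^2) * dt for a path sampled uniformly on [0, 1]."""
    points = np.asarray(points, dtype=float)
    dt = 1.0 / (len(points) - 1)
    velocities = np.diff(points, axis=0) / dt
    return 0.5 * np.sum(np.sum(velocities**2, axis=1)) * dt

ts = np.linspace(0.0, 1.0, 201)[:, None]
start, end = np.array([0.0, 0.0]), np.array([1.0, 1.0])

straight = start + ts * (end - start)
# Same endpoints, plus a sideways bump that vanishes at t = 0 and t = 1.
curved = straight + 0.3 * np.sin(np.pi * ts) * np.array([1.0, -1.0])

action_straight = kinetic_action(straight)
action_curved = kinetic_action(curved)
print(action_straight, action_curved)
```

The straight path attains the smaller value, and any detour between the same endpoints adds to the action.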

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transport principle could be substituted into other path-dependent methods such as integrated gradients to reduce baseline sensitivity.
  • In settings where the reference distribution must be chosen by the user, the stability gains may still depend on that modeling decision.
  • The stability bounds suggest a practical test: measure attribution variance as a function of flow training epochs on a fixed dataset.
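
The variance test suggested in the last bullet could be sketched as follows. The helper `attribution_stability` and the synthetic arrays are assumptions for illustration; in practice each row would hold the attribution for one input produced by a flow trained with a different seed or epoch budget.

```python
import numpy as np

def attribution_stability(attribution_runs):
    """Summarize run-to-run stability for attributions of a single input.

    attribution_runs: array of shape (num_runs, num_features).
    Returns (mean per-feature variance, mean pairwise Pearson correlation).
    """
    runs = np.asarray(attribution_runs, dtype=float)
    mean_variance = runs.var(axis=0, ddof=1).mean()
    corr = np.corrcoef(runs)  # shape (num_runs, num_runs)
    off_diagonal = corr[~np.eye(len(runs), dtype=bool)]
    return mean_variance, off_diagonal.mean()

# Synthetic stand-ins only; these are not results from the paper.
rng = np.random.default_rng(0)
signal = rng.normal(size=64)
stable_runs = signal + 0.05 * rng.normal(size=(5, 64))
noisy_runs = signal + 1.00 * rng.normal(size=(5, 64))

stable_var, stable_corr = attribution_stability(stable_runs)
noisy_var, noisy_corr = attribution_stability(noisy_runs)
print(stable_var, stable_corr)
print(noisy_var, noisy_corr)
```

Plotting the two summaries against training epochs would give the curve the bullet describes.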

Load-bearing premise

The data-generating process can be adequately modeled by flows whose kinetic action minimization produces attribution paths that are meaningfully better than hand-designed interpolations.

What would settle it

A controlled comparison in which attributions along the approximated transport-geodesic paths exhibit equal or greater variance and less structure than those along linear baselines on identical models and datasets.

Figures

Figures reproduced from arXiv: 2603.05093 by Cenwei Zhang, Lei You, Lin Zhu, Manxi Lin.

Figure 1: Overview of Canonical On-Manifold Shapley via Optimal Flows. Our framework computes the unique axiomatic attribution Ψ (Def 4.1) by integrating the model gradient ∇_x f_c along the optimal transport path γ* (red curve). As shown by the top samples, γ* remains on the data manifold p_1 throughout the transition from the reference distribution p_0 to the data. Unlike heuristic methods, this path is geometrically…
Figure 2: Geometric Straightness Implies Explanation Stability. (a) Qualitative Consistency: We visualize attribution maps for the same input across distinct seeds. The Reflowed Shapley (2-RF) yields robust, structure-aligned explanations, whereas the One-Step Baseline (1-RF) exhibits minor fluctuations due to residual trajectory curvature. (b) Quantitative Correlation: A scatter plot of Transport Cost (Kinetic Ener…
Figure 4: Qualitative Visualization Results. Top (CIFAR-10): Traditional methods (IG, GradientSHAP) produce scattered noise, whereas Geodesic Flow method yields coherent object masks. Bottom (CelebA-HQ): In high dimensions, our method captures fine-grained details (e.g., beard, nose, jaw, eyes) without the over-smoothing artifacts observed in DDIM. As can be seen from the comparison, our method is more in line with…
Figure 5: Validation on a Synthetic Additive Model. (Top-left) Attribution identity between the analytical Shapley values (ground truth) and the straight-line path-integral estimator with midpoint quadrature (K=200), showing near-perfect alignment (y=x). (Top-right) Relative ℓ2 error versus integration steps K (log-log), matching the expected O(K^{-2}) convergence of the midpoint rule. (Bottom-left) Residual histogram…
Figure 6: 2D Gaussian-to-Gaussian toy: trajectories and density evolution. (a) Trajectory overlays for OT-oracle, 1-RF, 2-RF, and 3-RF (reflow iterations). (b) Density slices at t ∈ {0, 0.25, 0.5, 0.75, 1} for the same methods. The plot illustrates that 1-RF can yield visibly more curved paths than the OT-oracle, while reflow (2-RF/3-RF) significantly straightens trajectories and improves agreement with the oracle i…
Figure 7: High-dimensional (d=10) toy: stability vs. macro-structure (combined view). Left: Attribution discrepancy ∆Ψ versus relative field mismatch ϵ_rel (both axes shown in symlog for readability across regimes). Points correspond to OT-oracle, 1-RF, strong 1-RF, and reflow variants (2-RF/3-RF). Right: Neighborhood preservation measured by kNN overlap (here k=10) at t ∈ {0, 0.5}, with OT-oracle references included…
Figure 8: Randomly selected attribution results on CelebA-HQ (256 × 256) - Part I.
Figure 9: Randomly selected attribution results on CelebA-HQ (256 × 256) - Part II.
Figure 10: Randomly selected attribution results on CIFAR-10 (32 × 32) - Part I.
Figure 11: Randomly selected attribution results on CIFAR-10 (32 × 32) - Part II.
Original abstract

Feature attributions often hide a critical modeling choice: they explain a prediction along a counterfactual path from a reference state to an input. Different baselines, interpolations, and generative trajectories define different paths and can therefor produce different explanations. We study this path ambiguity as a modeling problem. Our central question is whether the path can be chosen by the data-generating transport process, rather than by a hand-designed interpolation or by the sensitivity geometry of the model being explained. We separate attribution into fixed-path credit allocation and path selection. For a fixed path, we prove that the Aumann-Shapley line integral is the unique attribution rule under standard fixed-path axioms and explicit coordinate-trace regularity. For path selection, we minimize kinetic action over flows that transport a reference distribution to the data distribution, yielding a transport-geodesic attribution principle. We approximate this ideal with Rectified Flow and Reflow and derive stability bounds linking vector-field error to attribution error. Experiments show that lower-action, transport-consistent paths produce more stable and structured explanations, preserving competitive deletion faithfulness, without claiming data-manifold membership. Our code is available at https://github.com/cenweizhang/OTFlowSHAP.
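
A minimal sketch of the Rectified Flow regression objective mentioned in the abstract, assuming the usual linear interpolation x_t = (1 − t)·x0 + t·x1 with target velocity x1 − x0. The helper names, the pure-translation toy coupling, and the hand-written velocity fields are illustrative assumptions; the authors' models are learned networks, and Reflow would refit the field on pairs (x0, transported x0) produced by a previously fitted flow.

```python
import numpy as np

def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Monte Carlo estimate of E ||(x1 - x0) - v(x_t, t)||^2 with x_t = (1 - t) x0 + t x1."""
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    residual = (x1 - x0) - velocity_fn(x_t, t)
    return np.mean(np.sum(residual**2, axis=1))

def euler_transport(velocity_fn, x0, num_steps=50):
    """Push samples from t = 0 to t = 1 with explicit Euler steps."""
    x, dt = x0.copy(), 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((len(x), 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
shift = np.array([3.0, -1.0])
x0 = rng.normal(size=(1000, 2))  # reference samples
x1 = x0 + shift                  # toy coupling: a pure translation
t = rng.uniform(size=1000)

constant_field = lambda x, t: np.broadcast_to(shift, x.shape)  # exact field for this coupling
zero_field = lambda x, t: np.zeros_like(x)

loss_exact = rectified_flow_loss(constant_field, x0, x1, t)
loss_zero = rectified_flow_loss(zero_field, x0, x1, t)
transport_error = np.abs(euler_transport(constant_field, x0) - x1).max()
print(loss_exact, loss_zero, transport_error)
```

For this translation coupling the constant field is already straight, so the regression loss vanishes and Euler integration lands on the paired targets; curved or crossing couplings are where Reflow is expected to help.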

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper separates feature attribution into fixed-path credit allocation and path selection. For a fixed path, it proves that the Aumann-Shapley line integral is the unique attribution satisfying standard axioms plus explicit coordinate-trace regularity. For path selection, it proposes minimizing kinetic action over generative flows that transport a reference distribution to the data distribution, yielding transport-geodesic paths; these are approximated via Rectified Flow/Reflow with derived stability bounds relating vector-field error to attribution error. Experiments report that lower-action paths yield more stable and structured explanations while preserving deletion faithfulness, without claiming membership on the data manifold.

Significance. If the derivations hold, the work provides a clean axiomatic treatment of fixed-path attribution together with a data-driven path-selection principle grounded in optimal transport. The uniqueness result, the stability bounds, the public code release, and the empirical demonstration of improved stability are concrete strengths. The approach could reduce reliance on arbitrary baselines or interpolations, though its advantage ultimately depends on whether kinetic-action minimization produces paths that are meaningfully closer to the underlying data-generating process than hand-designed alternatives.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (path-selection development): the justification for preferring kinetic-action-minimizing flows rests on the modeling assumption that such paths better reflect the data-generating transport process, yet the manuscript explicitly disclaims data-manifold membership and supplies no additional axioms that would render the kinetic-action minimizer canonical for attribution. This assumption is load-bearing for the central claim that transport-geodesic paths are superior to hand-designed interpolations.
  2. [§3.2] §3.2 (uniqueness proof): the coordinate-trace regularity condition is introduced to obtain uniqueness of the Aumann-Shapley integral, but its necessity and practical restrictiveness are not quantified; if this regularity fails for common model architectures or input distributions, the uniqueness result would not apply to those cases.
  3. [§5] §5 (stability bounds): the derived bounds link Rectified-Flow vector-field approximation error to attribution error, but the manuscript does not report the magnitude of the approximation error observed in the experiments or verify that the bounds remain non-vacuous under the reported Reflow iterations.
minor comments (3)
  1. [Abstract] Abstract: 'therefor' should be 'therefore'.
  2. [§4] Notation for the kinetic-action functional and the transport-geodesic paths should be introduced with a single consistent symbol set rather than being redefined across sections.
  3. [Figures] Figure captions should explicitly state the number of Reflow iterations and the reference distribution used for each panel.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications and proposed revisions.

Point-by-point responses
  1. Referee: [Abstract and §4] the justification for preferring kinetic-action-minimizing flows rests on the modeling assumption that such paths better reflect the data-generating transport process, yet the manuscript explicitly disclaims data-manifold membership and supplies no additional axioms that would render the kinetic-action minimizer canonical for attribution.

    Authors: The kinetic-action minimizer is motivated by optimal transport as a canonical, distribution-level principle for selecting paths that minimize integrated squared velocity, independent of any manifold assumption. We explicitly disclaim manifold membership to avoid overclaiming, and present transport geodesics as one principled data-driven alternative rather than the unique canonical choice. In revision we will expand §4 with a dedicated motivation paragraph that contrasts this OT-grounded selection against hand-designed baselines and notes the modeling assumptions without introducing new axioms. revision: partial

  2. Referee: [§3.2] the coordinate-trace regularity condition is introduced to obtain uniqueness of the Aumann-Shapley integral, but its necessity and practical restrictiveness are not quantified; if this regularity fails for common model architectures or input distributions, the uniqueness result would not apply to those cases.

    Authors: The coordinate-trace regularity is the minimal technical condition that closes the axiomatic characterization. We will add a new paragraph to §3.2 that discusses its implications for standard architectures (ReLU MLPs, CNNs) and input distributions, provides simple verification examples, and explicitly states the cases in which uniqueness may fail while the Aumann-Shapley integral remains a valid attribution satisfying the remaining axioms. revision: yes

  3. Referee: [§5] the derived bounds link Rectified-Flow vector-field approximation error to attribution error, but the manuscript does not report the magnitude of the approximation error observed in the experiments or verify that the bounds remain non-vacuous under the reported Reflow iterations.

    Authors: We agree that empirical validation of the bounds is needed. In the revision we will add to §5 and the experimental section the observed vector-field approximation errors (L2 norms) for the Rectified Flow and Reflow models, together with the numerical values of the resulting attribution-error bounds, confirming they are non-vacuous under the reported iteration counts. revision: yes
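
One way the promised vector-field error could be reported is a relative L2 mismatch on shared sample points, sketched below. This definition, the function `relative_field_mismatch`, and the synthetic fields are assumptions; the summary does not state how the paper defines its field error.

```python
import numpy as np

def relative_field_mismatch(v_learned, v_reference, points, times):
    """Relative L2 error between two vector fields evaluated on the same (x, t) samples."""
    ref = v_reference(points, times)
    diff = v_learned(points, times) - ref
    return np.linalg.norm(diff) / np.linalg.norm(ref)

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2))
times = rng.uniform(size=(500, 1))

shift = np.array([3.0, -1.0])
v_reference = lambda x, t: np.broadcast_to(shift, x.shape)
# Synthetic learned field that overshoots the reference by 10 percent.
v_learned = lambda x, t: 1.1 * np.broadcast_to(shift, x.shape)

eps_rel = relative_field_mismatch(v_learned, v_reference, points, times)
print(eps_rel)
```

Reporting this quantity next to the measured attribution discrepancy for each Reflow iteration would show directly whether a bound of the stated kind is informative.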

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained under stated axioms

Full rationale

The paper separates fixed-path attribution (proving Aumann-Shapley uniqueness via explicit axioms plus coordinate-trace regularity) from path selection (defining paths via kinetic-action minimization over transport flows). Neither step reduces by construction to fitted inputs, self-citations, or renamed assumptions; the uniqueness claim is a direct proof under listed axioms rather than an imported theorem, and the transport objective is introduced as a modeling principle without parameter fitting to the attribution target. No load-bearing self-citation chains or ansatzes appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard fixed-path axioms for the uniqueness proof and on the modeling assumption that data can be transported by flows whose kinetic action defines a meaningful attribution path. No explicit free parameters or invented entities are stated in the abstract.

axioms (1)
  • domain assumption standard fixed-path axioms plus explicit coordinate-trace regularity
    Invoked to establish uniqueness of the Aumann-Shapley line integral for fixed-path attribution.

pith-pipeline@v0.9.0 · 5517 in / 1168 out tokens · 37946 ms · 2026-05-15T15:59:44.986282+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages · 6 internal anchors

  1. [1]

    Shapley explainability on the data manifold

    Frye, C., de Mijolla, D., Begley, T., Cowton, L., Stanley, M., and Feige, I. Shapley explainability on the data manifold. arXiv preprint arXiv:2006.01272, 2020a. Frye, C., Rowat, C., and Feige, I. Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability. Advances in neural information processing systems, 33:1229–1239, ...

  2. [3]

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    URL http://arxiv.org/abs/1710.10196. Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report,

  3. [4]

    Flow Matching for Generative Modeling

    Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., and Le, M. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747,

  4. [5]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Liu, X., Gong, C., and Liu, Q. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003,

  5. [6]

    URL https://proceedings.neurips

    Curran Associates, Inc., 2017a. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf. Lundberg, S. M. and Lee, S.-I. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017b. Montavon, G., Lapuschkin, S., Binder, A., Samek, W., and Müller,...

  6. [7]

    RISE: Randomized Input Sampling for Explanation of Black-box Models

    Petsiuk, V., Das, A., and Saenko, K. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421,

  7. [9]

    SmoothGrad: removing noise by adding noise

    URL http://arxiv.org/abs/1706.03825. Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. arXiv:2010.02502, October

  8. [10]

    Denoising Diffusion Implicit Models

    URL https://arxiv.org/abs/2010.02502. Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806,

  9. [11]

    Caltech-ucsd birds-200-2011

    Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. Caltech-ucsd birds-200-2011. Technical Report CNS-TR-2011-001, California Institute of Technology,

  10. [12]

    Our work generalizes this by relaxing the straight-line requirement to allow paths induced by generative flows fitted to the data distribution

    dt. (13) This formulation corresponds to the unique attribution method satisfying axioms such as sensitivity and implementation invariance under the straight-line constraint. Our work generalizes this by relaxing the straight-line requirement to allow paths induced by generative flows fitted to the data distribution. A.3. Flow Matching Details. In our frame...

  11. [13]

    (Top-right) Relative ℓ2 error versus integration steps K (log-log), matching the expected O(K^{-2}) convergence of the midpoint rule

    [Figure residue. Panel "Residual Distribution (K=200)": x-axis Residual (Flow - Classical), y-axis Density, Mean: -1.33e-07, Std: 2.41e-06. Panel "Comparison Across Different K": x-axis Classical Shapley, y-axis Path-Integrated Shapley, series K=10, K=20, K=50, ...]