From Baselines to Transport Geodesics: Axiomatic Attribution via Optimal Generative Flows
Pith reviewed 2026-05-15 15:59 UTC · model grok-4.3
The pith
Aumann-Shapley line integrals along transport geodesics give unique and stable attributions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For a fixed path, the Aumann-Shapley line integral is the unique attribution rule under standard fixed-path axioms and explicit coordinate-trace regularity. For path selection, minimizing kinetic action over flows that transport a reference distribution to the data distribution yields a transport-geodesic attribution principle. This principle is approximated with Rectified Flow and Reflow, together with stability bounds that connect vector-field error to attribution error.
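To make the fixed-path half of the claim concrete, here is a minimal sketch of a path line integral of the Aumann-Shapley type, phi_i = ∫ ∂_i f(γ(t)) dγ_i(t), evaluated on a toy model. The function names, the toy model `f`, and the two example paths are illustrative assumptions and are not taken from the paper or its code release.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def path_attributions(
    grad_f: Callable[[Sequence[float]], Vector],
    path: Callable[[float], Vector],
    steps: int = 1000,
) -> Vector:
    """Estimate phi_i = integral of d_i f(gamma(t)) d gamma_i(t) on a polygonal
    approximation of the path, evaluating the gradient at each chord midpoint."""
    prev = path(0.0)
    phi = [0.0] * len(prev)
    for k in range(1, steps + 1):
        cur = path(k / steps)
        mid = [(a + b) / 2.0 for a, b in zip(prev, cur)]
        g = grad_f(mid)
        for i in range(len(phi)):
            phi[i] += g[i] * (cur[i] - prev[i])
        prev = cur
    return phi


# Hypothetical toy model f(x, y) = x*y + x^2 with a hand-written gradient.
def f(x: Sequence[float]) -> float:
    return x[0] * x[1] + x[0] ** 2


def grad_f(x: Sequence[float]) -> Vector:
    return [x[1] + 2.0 * x[0], x[0]]


# Two paths sharing the endpoints (0, 0) and (1, 2).
def straight(t: float) -> Vector:
    return [1.0 * t, 2.0 * t]


def curved(t: float) -> Vector:
    return [1.0 * t * t, 2.0 * t]


phi_s = path_attributions(grad_f, straight)
phi_c = path_attributions(grad_f, curved)
print("straight:", phi_s, "sum =", sum(phi_s))
print("curved:  ", phi_c, "sum =", sum(phi_c))
```

Both attribution vectors sum to f(1, 2) - f(0, 0) = 3, while the per-feature split differs between the two paths. That path dependence is the ambiguity the path-selection step is meant to resolve.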
What carries the argument
Kinetic action minimization over generative flows that transport a reference distribution to the data distribution, which selects the transport geodesic used for attribution.
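As a rough illustration of the quantity being minimized, the sketch below approximates the kinetic action ∫ ||dγ/dt||² dt of a single discretized trajectory. It is a simplification: the paper's objective is taken over flows that transport one distribution to another, whereas this example only compares two hand-written curves between the same endpoints. All names are hypothetical.

```python
from typing import Callable, List


def kinetic_action(path: Callable[[float], List[float]], steps: int = 1000) -> float:
    """Finite-difference estimate of integral_0^1 ||d gamma/dt||^2 dt."""
    dt = 1.0 / steps
    action = 0.0
    prev = path(0.0)
    for k in range(1, steps + 1):
        cur = path(k * dt)
        sq_disp = sum((b - a) ** 2 for a, b in zip(prev, cur))
        action += sq_disp / dt  # (||dx|| / dt)^2 * dt
        prev = cur
    return action


# Two hypothetical paths from (0, 0) to (1, 2).
def straight(t: float) -> List[float]:
    return [1.0 * t, 2.0 * t]


def curved(t: float) -> List[float]:
    return [1.0 * t * t, 2.0 * t]


a_straight = kinetic_action(straight)
a_curved = kinetic_action(curved)
print(a_straight, a_curved)
```

For fixed endpoints in Euclidean space, the constant-speed straight line has the smallest action (5 here, against roughly 16/3 for the curved path). The distribution-level version of this idea is what makes lower-action flows behave like transport geodesics.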
If this is right
- Aumann-Shapley attributions become independent of arbitrary baseline or interpolation choices once the path is fixed by the transport principle.
- Paths with lower kinetic action produce attributions that are measurably more stable and more structured across repeated runs.
- The effect of approximation error in the learned vector field on the final attribution values is bounded.
- Deletion faithfulness remains competitive with standard baseline methods.
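The deletion-faithfulness claim in the last bullet can be read against a common form of the deletion metric, sketched below. The exact protocol used in the paper is not given on this page, so the removal order, the baseline replacement, and the toy linear model are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def deletion_curve(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    attributions: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Replace features with baseline values in order of decreasing attribution,
    recording the model output after each removal."""
    order = sorted(range(len(x)), key=lambda i: attributions[i], reverse=True)
    current = list(x)
    outputs = [model(current)]
    for i in order:
        current[i] = baseline[i]
        outputs.append(model(current))
    return outputs


def area_under_curve(values: Sequence[float]) -> float:
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    n = len(values) - 1
    return sum((values[k] + values[k + 1]) / 2.0 for k in range(n)) / n


# Hypothetical linear model and two candidate attribution vectors.
def model(x: Sequence[float]) -> float:
    return 3.0 * x[0] + 1.0 * x[1] + 0.5 * x[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
good = deletion_curve(model, x, [3.0, 1.0, 0.5], baseline)
poor = deletion_curve(model, x, [0.5, 1.0, 3.0], baseline)
auc_good = area_under_curve(good)
auc_poor = area_under_curve(poor)
print(good, auc_good)
print(poor, auc_poor)
```

Under this convention a lower area means the attribution ranked the influential features first, so the output collapses sooner.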
Where Pith is reading between the lines
- The same transport principle could be substituted into other path-dependent methods such as integrated gradients to reduce baseline sensitivity.
- In settings where the reference distribution must be chosen by the user, the stability gains may still depend on that modeling decision.
- The stability bounds suggest a practical test: measure attribution variance as a function of flow training epochs on a fixed dataset.
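The practical test in the last bullet could be set up along the following lines. The attribution numbers here are made-up placeholders and not results from the paper; in a real run each inner list would be the attribution vector from one independently trained flow at the given epoch count.

```python
from statistics import pvariance
from typing import Dict, List


def mean_feature_variance(runs: List[List[float]]) -> float:
    """Average, over features, of the across-run population variance."""
    n_features = len(runs[0])
    per_feature = [pvariance([run[i] for run in runs]) for i in range(n_features)]
    return sum(per_feature) / n_features


# Placeholder data: epoch count -> attribution vectors from repeated runs.
attributions_by_epoch: Dict[int, List[List[float]]] = {
    10: [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]],
    50: [[0.72, 0.28], [0.68, 0.32], [0.70, 0.30]],
}

variance_by_epoch = {
    epoch: mean_feature_variance(runs)
    for epoch, runs in sorted(attributions_by_epoch.items())
}
for epoch, var in variance_by_epoch.items():
    print(epoch, var)
```

If the stability bounds are informative, this variance should shrink as the learned vector field improves with training.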
Load-bearing premise
The data-generating process can be adequately modeled by flows whose kinetic action minimization produces attribution paths that are meaningfully better than hand-designed interpolations.
What would settle it
A controlled comparison in which attributions along the approximated transport-geodesic paths exhibit equal or greater variance and less structure than those along linear baselines on identical models and datasets.
Original abstract
Feature attributions often hide a critical modeling choice: they explain a prediction along a counterfactual path from a reference state to an input. Different baselines, interpolations, and generative trajectories define different paths and can therefor produce different explanations. We study this path ambiguity as a modeling problem. Our central question is whether the path can be chosen by the data-generating transport process, rather than by a hand-designed interpolation or by the sensitivity geometry of the model being explained. We separate attribution into fixed-path credit allocation and path selection. For a fixed path, we prove that the Aumann-Shapley line integral is the unique attribution rule under standard fixed-path axioms and explicit coordinate-trace regularity. For path selection, we minimize kinetic action over flows that transport a reference distribution to the data distribution, yielding a transport-geodesic attribution principle. We approximate this ideal with Rectified Flow and Reflow and derive stability bounds linking vector-field error to attribution error. Experiments show that lower-action, transport-consistent paths produce more stable and structured explanations, preserving competitive deletion faithfulness, without claiming data-manifold membership. Our code is available at https://github.com/cenweizhang/OTFlowSHAP.
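For readers unfamiliar with the ingredients named in the abstract, the block below writes out the standard Rectified Flow objective and one natural way to combine it with a path line integral. The notation (\pi_0 for the reference distribution, \pi_1 for the data distribution, f for the explained model) is an assumption chosen here and may differ from the paper's own symbols.

```latex
\begin{align}
  X_t &= (1 - t)\,X_0 + t\,X_1,
      \qquad X_0 \sim \pi_0,\; X_1 \sim \pi_1, \\
  \hat{v} &\in \arg\min_{v}\; \int_0^1
      \mathbb{E}\!\left[\big\lVert (X_1 - X_0) - v(X_t, t) \big\rVert^2\right] dt, \\
  \phi_i(x) &= \int_0^1 \partial_i f(x_t)\, \hat{v}_i(x_t, t)\, dt,
      \qquad \dot{x}_t = \hat{v}(x_t, t),\; x_1 = x .
\end{align}
```

Reflow repeats the regression in the second line on pairs (X_0, X_1) generated by simulating the current flow, which tends to straighten the trajectories. By the fundamental theorem of calculus the attributions in the third line sum to f(x_1) - f(x_0) along the trajectory.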
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper separates feature attribution into fixed-path credit allocation and path selection. For a fixed path, it proves that the Aumann-Shapley line integral is the unique attribution satisfying standard axioms plus explicit coordinate-trace regularity. For path selection, it proposes minimizing kinetic action over generative flows that transport a reference distribution to the data distribution, yielding transport-geodesic paths; these are approximated via Rectified Flow/Reflow with derived stability bounds relating vector-field error to attribution error. Experiments report that lower-action paths yield more stable and structured explanations while preserving deletion faithfulness, without claiming membership on the data manifold.
Significance. If the derivations hold, the work provides a clean axiomatic treatment of fixed-path attribution together with a data-driven path-selection principle grounded in optimal transport. The uniqueness result, the stability bounds, the public code release, and the empirical demonstration of improved stability are concrete strengths. The approach could reduce reliance on arbitrary baselines or interpolations, though its advantage ultimately depends on whether kinetic-action minimization produces paths that are meaningfully closer to the underlying data-generating process than hand-designed alternatives.
major comments (3)
- [Abstract and §4] Abstract and §4 (path-selection development): the justification for preferring kinetic-action-minimizing flows rests on the modeling assumption that such paths better reflect the data-generating transport process, yet the manuscript explicitly disclaims data-manifold membership and supplies no additional axioms that would render the kinetic-action minimizer canonical for attribution. This assumption is load-bearing for the central claim that transport-geodesic paths are superior to hand-designed interpolations.
- [§3.2] §3.2 (uniqueness proof): the coordinate-trace regularity condition is introduced to obtain uniqueness of the Aumann-Shapley integral, but its necessity and practical restrictiveness are not quantified; if this regularity fails for common model architectures or input distributions, the uniqueness result would not apply to those cases.
- [§5] §5 (stability bounds): the derived bounds link Rectified-Flow vector-field approximation error to attribution error, but the manuscript does not report the magnitude of the approximation error observed in the experiments or verify that the bounds remain non-vacuous under the reported Reflow iterations.
minor comments (3)
- [Abstract] Abstract: 'therefor' should be 'therefore'.
- [§4] Notation for the kinetic-action functional and the transport-geodesic paths should be introduced with a single consistent symbol set rather than being redefined across sections.
- [Figures] Figure captions should explicitly state the number of Reflow iterations and the reference distribution used for each panel.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications and proposed revisions.
- Referee: [Abstract and §4] the justification for preferring kinetic-action-minimizing flows rests on the modeling assumption that such paths better reflect the data-generating transport process, yet the manuscript explicitly disclaims data-manifold membership and supplies no additional axioms that would render the kinetic-action minimizer canonical for attribution.
  Authors: The kinetic-action minimizer is motivated by optimal transport as a canonical, distribution-level principle for selecting paths that minimize integrated squared velocity, independent of any manifold assumption. We explicitly disclaim manifold membership to avoid overclaiming, and present transport geodesics as one principled data-driven alternative rather than the unique canonical choice. In revision we will expand §4 with a dedicated motivation paragraph that contrasts this OT-grounded selection against hand-designed baselines and notes the modeling assumptions without introducing new axioms.
  Revision: partial
- Referee: [§3.2] the coordinate-trace regularity condition is introduced to obtain uniqueness of the Aumann-Shapley integral, but its necessity and practical restrictiveness are not quantified; if this regularity fails for common model architectures or input distributions, the uniqueness result would not apply to those cases.
  Authors: The coordinate-trace regularity is the minimal technical condition that closes the axiomatic characterization. We will add a new paragraph to §3.2 that discusses its implications for standard architectures (ReLU MLPs, CNNs) and input distributions, provides simple verification examples, and explicitly states the cases in which uniqueness may fail while the Aumann-Shapley integral remains a valid attribution satisfying the remaining axioms.
  Revision: yes
- Referee: [§5] the derived bounds link Rectified-Flow vector-field approximation error to attribution error, but the manuscript does not report the magnitude of the approximation error observed in the experiments or verify that the bounds remain non-vacuous under the reported Reflow iterations.
  Authors: We agree that empirical validation of the bounds is needed. In the revision we will add to §5 and the experimental section the observed vector-field approximation errors (L2 norms) for the Rectified Flow and Reflow models, together with the numerical values of the resulting attribution-error bounds, confirming they are non-vacuous under the reported iteration counts.
  Revision: yes
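The two quantities discussed in the last exchange, vector-field error and the resulting attribution error, can be measured side by side on a toy problem as sketched below. This does not reproduce the paper's bound, whose exact form is not given on this page. The fields, the perturbation, the toy gradient, and the Euler integrator are all assumptions chosen for illustration.

```python
import math
from typing import Callable, Dict, List

Field = Callable[[List[float], float], List[float]]


def grad_f(x: List[float]) -> List[float]:
    """Gradient of the hypothetical toy model f(x, y) = x*y + x^2."""
    return [x[1] + 2.0 * x[0], x[0]]


def flow_attributions(field: Field, x0: List[float], steps: int = 2000) -> List[float]:
    """Left-endpoint Euler integration of dx/dt = field(x, t), accumulating
    phi_i += d_i f(x) * v_i(x, t) * dt along the trajectory."""
    dt = 1.0 / steps
    x = list(x0)
    phi = [0.0] * len(x0)
    for k in range(steps):
        v = field(x, k * dt)
        g = grad_f(x)
        for i in range(len(x)):
            phi[i] += g[i] * v[i] * dt
            x[i] += v[i] * dt
    return phi


def reference_field(x: List[float], t: float) -> List[float]:
    return [1.0, 2.0]  # constant velocity: a straight-line flow


def perturbed_field(eps: float) -> Field:
    """Reference field plus a perturbation whose pointwise L2 norm equals eps."""
    def field(x: List[float], t: float) -> List[float]:
        return [1.0 + eps * math.sin(2.0 * math.pi * t),
                2.0 + eps * math.cos(2.0 * math.pi * t)]
    return field


def l2(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


phi_ref = flow_attributions(reference_field, [0.0, 0.0])
gaps: Dict[float, float] = {
    eps: l2(flow_attributions(perturbed_field(eps), [0.0, 0.0]), phi_ref)
    for eps in (0.0, 0.1, 0.5)
}
for eps, gap in gaps.items():
    print(f"field error {eps:.1f} -> attribution gap {gap:.4f}")
```

Reporting pairs like these for the trained Rectified Flow and Reflow models, next to the value of the theoretical bound, is one way to check whether the bound is non-vacuous.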
Circularity Check
No significant circularity; derivation self-contained under stated axioms
The paper separates fixed-path attribution (proving Aumann-Shapley uniqueness via explicit axioms plus coordinate-trace regularity) from path selection (defining paths via kinetic-action minimization over transport flows). Neither step reduces by construction to fitted inputs, self-citations, or renamed assumptions; the uniqueness claim is a direct proof under listed axioms rather than an imported theorem, and the transport objective is introduced as a modeling principle without parameter fitting to the attribution target. No load-bearing self-citation chains or ansatzes appear in the provided text.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard fixed-path axioms plus explicit coordinate-trace regularity.