
arXiv: 2512.20914 · v2 · submitted 2025-12-24 · math.ST · stat.AP · stat.ML · stat.TH

Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

Pith reviewed 2026-05-16 20:11 UTC · model grok-4.3

keywords: invariant feature extraction · optimal transport barycenter · Monge problem · Gaussian case · conditional independence · eigenvector extractor · surrogate variables · confounder adjustment

The pith

Invariant features predicting Y are recovered as the leading eigenvectors of a matrix built from the optimal transport barycenter of confounders Z given Y

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to extract d invariant features W from X that predict a response Y without being confounded by variables Z. It replaces the penalization of conditional dependence between W and Z given Y with the simpler requirement of plain independence between W and a transformed variable Z_Y, where Z_Y solves the Monge optimal transport barycenter problem for the conditional law of Z given Y. In the Gaussian case this substitution is exactly equivalent, so the optimal linear map is recovered directly from the first d eigenvectors of an explicitly known matrix. When the true confounders are unobserved, measurable surrogate variables S may be used instead provided the covariance matrix between Z and S has full rank, again without relaxation under the Gaussian assumption. The same construction extends with little change to non-Gaussian and nonlinear settings.

Core claim

In the Gaussian case the penalization of statistical dependence between W and Z conditioned on Y is equivalent to plain independence between W and the random variable Z_Y that solves the Monge optimal transport barycenter problem for Z given Y. This equivalence produces a linear feature extractor given by the first d eigenvectors of a known matrix, and the construction remains valid when only surrogate contextual variables are available under a full-rank covariance condition.

What carries the argument

The Monge optimal transport barycenter Z_Y of Z conditioned on Y, which converts the conditional-independence requirement into an unconditional independence constraint that admits a closed-form eigenvector solution under joint Gaussianity.
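For concreteness, here is a sketch of the barycenter in the Gaussian case. It is our own derivation from standard Gaussian optimal-transport facts, assuming squared-Euclidean cost, a jointly Gaussian pair (Z, Y) and an invertible Σ_YY; the paper's exact formulation and notation may differ.

```latex
% Monge barycenter problem for Z | Y (squared-Euclidean cost):
\min_{T}\; \mathbb{E}\,\lVert Z - T(Z,Y)\rVert^{2}
  \quad\text{subject to}\quad T(Z,Y) \perp Y .

% Jointly Gaussian (Z, Y): the conditional laws differ only in their means,
Z \mid Y=y \;\sim\; \mathcal{N}\big(\mu_Z + \Sigma_{ZY}\Sigma_{YY}^{-1}(y-\mu_Y),\; \Sigma_{Z\mid Y}\big),
\qquad \Sigma_{Z\mid Y} = \Sigma_{ZZ} - \Sigma_{ZY}\Sigma_{YY}^{-1}\Sigma_{YZ} \ \text{(free of } y\text{)},

% so the barycenter is N(mu_Z, Sigma_{Z|Y}), every optimal map is a translation, and
Z_Y = T(Z,Y) = Z - \Sigma_{ZY}\Sigma_{YY}^{-1}(Y - \mu_Y).
```

Since Cov(Z_Y, Y) = Σ_ZY − Σ_ZY = 0 and the pair is jointly Gaussian, Z_Y is indeed independent of Y, as the constraint requires.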

If this is right

  • The feature extractor is given explicitly by the leading eigenvectors of a matrix assembled from the problem covariances.
  • Observable surrogate variables S can replace the true confounders Z without any loss of guarantee whenever the covariance between Z and S has full rank.
  • The same linear construction extends with only small modifications to non-Gaussian and nonlinear feature maps.
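The page does not reproduce the paper's explicit matrix, so the following is only a hypothetical stand-in showing how an eigenvector closed form can arise once the invariance requirement is linear. It assumes joint Gaussianity and uses our own criterion: keep directions uncorrelated with the barycenter variable Z_Y (the cross-covariance C below), then, among those, take the leading generalized eigenvectors of a Y-prediction matrix. The function name `invariant_extractor` and this objective are illustrative, not the authors' method, and the λ-indexed family shown in the paper's figures is not reproduced.

```python
import numpy as np
from scipy.linalg import eigh, null_space


def invariant_extractor(S_xx, S_xy, S_yy, S_xz, S_yz, d):
    """Hypothetical closed-form linear extractor A (p x d), features W = X @ A.

    Sketch only, not the paper's matrix. Under joint Gaussianity:
      * C = S_xz - S_xy S_yy^{-1} S_yz is Cov(X, Z_Y) for the barycenter variable
        Z_Y = Z - S_zy S_yy^{-1} (Y - mu_Y), and A.T @ C = 0 makes W independent of Z_Y;
      * among such A with A.T S_xx A = I, maximize tr(A.T S_xy S_yy^{-1} S_yx A),
        the summed R^2 of each whitened feature regressed on Y.
    """
    S_yy_inv = np.linalg.inv(S_yy)
    C = S_xz - S_xy @ S_yy_inv @ S_yz          # p x r cross-covariance Cov(X, Z_Y)
    M = S_xy @ S_yy_inv @ S_xy.T                # p x p signal explained by Y
    N = null_space(C.T)                         # p x k basis of {a : C.T a = 0}
    if N.shape[1] < d:
        raise ValueError("fewer than d directions are uncorrelated with Z_Y")
    # Generalized symmetric eigenproblem restricted to the invariant subspace;
    # scipy returns ascending eigenvalues, normalized so B.T (N.T S_xx N) B = I.
    _, B = eigh(N.T @ M @ N, N.T @ S_xx @ N)
    return N @ B[:, ::-1][:, :d]                # leading d eigenvectors, mapped back to R^p


# Usage on a random positive-definite covariance of (X, Y, Z) with p=6, q=1, r=2.
rng = np.random.default_rng(0)
p, q, r, d = 6, 1, 2, 2
L = rng.normal(size=(p + q + r, p + q + r))
S = L @ L.T + (p + q + r) * np.eye(p + q + r)
ix, iy, iz = slice(0, p), slice(p, p + q), slice(p + q, p + q + r)
A = invariant_extractor(S[ix, ix], S[ix, iy], S[iy, iy], S[ix, iz], S[iy, iz], d)
C = S[ix, iz] - S[ix, iy] @ np.linalg.inv(S[iy, iy]) @ S[iy, iz]
print(np.abs(A.T @ C).max())   # ~0 up to floating-point error: W uncorrelated with Z_Y
```

Restricting to the null space first turns the invariance requirement into a hard linear constraint, after which the remaining problem is an ordinary generalized eigenproblem; a penalized variant would instead mix the two matrices with a weight.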

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Optimal transport barycenters may supply a canonical reduction of conditional independence tasks to unconditional ones in other statistical settings beyond feature extraction.
  • The required matrix can be estimated directly from sample covariances, making the procedure immediately computable on moderate-dimensional data.
  • Empirical checks on non-Gaussian data would show how far the eigenvector solution continues to deliver approximate invariance outside the Gaussian regime.
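Estimating the barycenter variable from sample covariances, as the bullets above suggest, can be sketched as follows. This assumes (Z, Y) jointly Gaussian, where the barycenter map reduces to subtracting the linear regression of Z on Y; the helper name `barycenter_variable` is hypothetical.

```python
import numpy as np


def barycenter_variable(Z, Y):
    """Plug-in estimate of Z_Y = Z - Sigma_ZY Sigma_YY^{-1} (Y - mean(Y)).

    Assumes (Z, Y) jointly Gaussian, where the OT barycenter map is a translation.
    Z has shape (n, r), Y has shape (n, q); returns an (n, r) array.
    """
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    # lstsq gives beta = (Yc^T Yc)^{-1} Yc^T Zc = S_YY^{-1} S_YZ from sample covariances.
    beta, *_ = np.linalg.lstsq(Yc, Zc, rcond=None)
    return Z - Yc @ beta


rng = np.random.default_rng(1)
n = 500
Y = rng.normal(size=(n, 1))
Z = 2.0 * Y + rng.normal(size=(n, 2))   # confounders shifted by Y
Z_Y = barycenter_variable(Z, Y)
# Sample covariance between Z_Y and Y vanishes by construction (normal equations).
print(np.cov(np.hstack([Z_Y, Y]).T)[:2, 2])
```

Outside the Gaussian case this is only linear residualization on Y, not an optimal transport barycenter map.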

Load-bearing premise

Penalizing conditional dependence between the features and the confounders given Y is exactly equivalent to requiring independence from the barycenter variable Z_Y; the paper asserts this equivalence only for jointly Gaussian variables.
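A short check of the equivalence for linear features W = AᵀX, assuming (X, Y, Z) jointly Gaussian, Σ_YY invertible, and the translation form of the Gaussian barycenter map Z_Y = Z − Σ_ZY Σ_YY⁻¹(Y − μ_Y); this is an illustrative sketch, not the paper's proof:

```latex
% Cross-covariance of the features with the barycenter variable
\operatorname{Cov}(W, Z_Y)
  = A^{\top}\operatorname{Cov}\big(X,\; Z - \Sigma_{ZY}\Sigma_{YY}^{-1}Y\big)
  = A^{\top}\big(\Sigma_{XZ} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YZ}\big)
  = \operatorname{Cov}(W, Z \mid Y).

% All quantities are jointly Gaussian, so zero (partial) covariance means (conditional) independence:
W \perp Z_Y \iff \operatorname{Cov}(W, Z_Y) = 0
            \iff \operatorname{Cov}(W, Z \mid Y) = 0
            \iff W \perp Z \mid Y .
```

The second step is where joint Gaussianity is used: for non-Gaussian variables, zero covariance implies neither independence statement.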

What would settle it

Generate samples from a known multivariate Gaussian joint distribution of X, Y and Z; compute the proposed eigenvector-based features W and test whether W remains independent of Z given Y; observed conditional dependence would falsify the claimed equivalence.
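A minimal simulation of this check, under assumptions of our own: a toy linear-Gaussian model, and features W taken from directions with zero population covariance with Z_Y rather than from the paper's eigenvector matrix (which is not given on this page). The helpers `structural` and `partial_corr_fro` are hypothetical; the second one estimates ‖Corr(W, Z | Y)‖_F by correlating residuals after regressing on Y.

```python
import numpy as np
from scipy.linalg import null_space


def structural(U):
    """Toy linear-Gaussian model (our assumption). Output columns: X1..X5, Y, Z1, Z2."""
    y = U[:, 0]
    z1 = y + U[:, 1]
    z2 = -y + U[:, 2]
    x = np.column_stack([
        z1 + 0.5 * U[:, 3],          # X1 is driven by the confounder Z1
        z2 + y + 0.5 * U[:, 4],      # X2 is driven by Z2 and Y
        y + U[:, 5],                 # X3 depends on Y only
        z1 - z2 + U[:, 6],
        y + U[:, 7],
    ])
    return np.column_stack([x, y, z1, z2])


def partial_corr_fro(W, Z, Y):
    """Frobenius norm of the sample partial correlation matrix Corr(W, Z | Y)."""
    Y1 = np.column_stack([np.ones(len(Y)), Y])          # regress on Y with intercept
    rW = W - Y1 @ np.linalg.lstsq(Y1, W, rcond=None)[0]
    rZ = Z - Y1 @ np.linalg.lstsq(Y1, Z, rcond=None)[0]
    rW = rW / rW.std(axis=0)
    rZ = rZ / rZ.std(axis=0)
    return np.linalg.norm(rW.T @ rZ / len(Y), "fro")


# Population covariance: the model is linear in U ~ N(0, I_8), so data = U @ M.
M = structural(np.eye(8))
S = M.T @ M
ix, iy, iz = slice(0, 5), slice(5, 6), slice(6, 8)
C = S[ix, iz] - S[ix, iy] @ np.linalg.inv(S[iy, iy]) @ S[iy, iz]   # Cov(X, Z_Y)
A_inv = null_space(C.T)[:, :2]      # two directions with Cov(W, Z_Y) = 0

# Finite-sample check of W independent of Z given Y.
rng = np.random.default_rng(2)
D = structural(rng.standard_normal((20_000, 8)))
X, Y, Z = D[:, ix], D[:, iy], D[:, iz]
inv_score = partial_corr_fro(X @ A_inv, Z, Y)
naive_score = partial_corr_fro(X[:, :2], Z, Y)     # raw X1, X2 for contrast
print(f"invariant W: {inv_score:.3f}   raw X1, X2: {naive_score:.3f}")
```

A partial-correlation norm for the invariant features that stays well above sampling noise as the sample size grows would count against the claimed equivalence; the raw coordinates serve as a positive control.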

Figures

Figures reproduced from arXiv: 2512.20914 by Esteban Tabak, Ian Bounos, Mariela Sued, Pablo Groisman.

Figure 1. Scatterplot comparing the best target MSE achieved by the barycentric method …
Figure 2. Frobenius distance between source and target covariance matrices, grouped by …
Figure 3. Conditional correlation curves λ ↦ ‖Corr(W_λ, Z | Y)‖_F over many independently generated environments. Despite random variation, all curves exhibit the same structural behaviour: a relatively flat region for small λ followed by a systematic decay and eventual approach to invariance as λ → 1. Given each covariance, finite samples are drawn and the observed variables are generated according to the structur…
Figure 5. Distribution of the optimal λ in the second experiment. As illustrated in Figures 4 and 5, the empirical distribution of λ* exhibits a clear and consistent bimodality across both experiments. The peaks are concentrated at the boundaries of the interval, specifically at λ = 0 and λ = 1. The mode at λ = 0 corresponds to cases where the structural shift is negligible, meaning the source and target distributi…
Original abstract

A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript develops a method to extract d invariant linear features W from X that predict Y while remaining unconfounded by Z. It replaces penalization of conditional dependence between W and Z given Y with plain independence between W and the Monge optimal-transport barycenter variable Z_Y = T(Z,Y). The central claim is that these two independence statements are equivalent in the Gaussian case, which permits the feature extractor to be recovered in closed form as the leading d eigenvectors of an explicitly known covariance-derived matrix. The construction extends to surrogate variables S (when Σ_ZS has full column rank) with no relaxation and is asserted to extend with little change to non-Gaussian settings.

Significance. If the asserted Gaussian equivalence is rigorously established, the work would supply a computationally attractive, closed-form linear solution for invariant feature extraction that directly links optimal-transport barycenters to conditional-independence penalties. The surrogate-variable extension without relaxation would be a further practical strength. At present the manuscript provides only the statement of the equivalence and the eigenvector claim, with no derivations, proofs, or numerical checks, so the significance remains prospective rather than demonstrated.

major comments (3)
  1. [Abstract] The claim that 'in the Gaussian case ... the two statements are equivalent' is load-bearing for the entire closed-form eigenvector construction, yet it is asserted without derivation, proof, or even a sketch of the argument. Explicit verification is required for arbitrary joint Gaussians (including cases where the conditional covariance of Z given Y is non-constant), because the equivalence may fail outside special sub-families and would then invalidate the claimed invariance property.
  2. [Main construction, Gaussian case] The manuscript states that the linear extractor is given by the first d eigenvectors of a known matrix built from covariances, but neither the explicit matrix nor the algebraic steps connecting the OT barycenter map to this matrix are supplied. Without this derivation the closed-form claim cannot be assessed.
  3. [Surrogate extension] The assertion that replacing Z by S incurs 'no relaxation' when Σ_ZS has full column rank is presented as immediate from the Gaussian equivalence, but no argument is given showing that the barycenter map for the surrogate preserves the required independence properties for arbitrary full-rank covariances.
minor comments (2)
  1. [Abstract] The abstract refers to 'the Gaussian case considered in this article' without indicating the precise assumptions (e.g., joint normality, conditional covariance structure) under which the equivalence is claimed; a short clarifying sentence would improve readability.
  2. Notation for the barycenter map T(Z,Y) and the resulting random variable Z_Y is introduced without an explicit definition or reference to the Monge problem formulation used; a brief equation or citation in the main text would clarify the construction.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable suggestions. We will address all the points raised by providing the missing derivations and proofs in a revised version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'in the Gaussian case ... the two statements are equivalent' is load-bearing for the entire closed-form eigenvector construction, yet it is asserted without derivation, proof, or even a sketch of the argument. Explicit verification is required for arbitrary joint Gaussians (including cases where the conditional covariance of Z given Y is non-constant), because the equivalence may fail outside special sub-families and would then invalidate the claimed invariance property.

    Authors: We thank the referee for highlighting this. The equivalence holds for general joint Gaussians, including non-constant conditional covariances, because the OT barycenter map T(Z,Y) is an affine transformation that preserves the necessary independence relations under Gaussianity. In the revision, we will provide a full derivation and proof of this equivalence, including verification for the general case. We will also add a brief sketch to the abstract. revision: yes

  2. Referee: [Main construction, Gaussian case] The manuscript states that the linear extractor is given by the first d eigenvectors of a known matrix built from covariances, but neither the explicit matrix nor the algebraic steps connecting the OT barycenter map to this matrix are supplied. Without this derivation the closed-form claim cannot be assessed.

    Authors: We agree that the algebraic steps and explicit form of the matrix need to be supplied. The matrix is constructed from the covariance of X, the cross-covariances involving the barycenter, and the conditional covariances. In the revised manuscript, we will include the complete derivation showing how the independence condition translates to the eigenvector problem for this specific matrix. revision: yes

  3. Referee: [Surrogate extension] The assertion that replacing Z by S incurs 'no relaxation' when Σ_ZS has full column rank is presented as immediate from the Gaussian equivalence, but no argument is given showing that the barycenter map for the surrogate preserves the required independence properties for arbitrary full-rank covariances.

    Authors: We will expand this section with a detailed argument. Since in the Gaussian case the relationship between Z and S is linear (via the full-rank covariance), the barycenter for S can be shown to induce the same independence constraints on W as the original Z, without relaxation. The proof will be added to the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: Gaussian equivalence and eigenvector closed form are independently derived from OT barycenter construction

full rationale

The paper's central step replaces the conditional-independence penalty with independence from the Monge OT barycenter variable Z_Y and asserts equivalence under joint Gaussianity; this equivalence is not definitional but is stated to follow from the properties of Gaussian optimal transport maps, after which the linear extractor is recovered as the leading eigenvectors of an explicitly constructed covariance matrix. No parameter is fitted on a data subset and then relabeled as a prediction, no self-citation chain is invoked to justify the equivalence itself, and the matrix is built directly from the joint second-moment structure without reference to the target W. The derivation therefore remains self-contained once the Gaussian assumption and the OT barycenter definition are granted; the claimed closed form is a consequence rather than a restatement of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the stated equivalence holding exactly for jointly Gaussian variables and, when surrogates are used, on the full-rank covariance condition.

axioms (2)
  • Domain assumption: all relevant random variables are jointly Gaussian.
    The equivalence between conditional independence and independence from the barycenter transform is asserted only for the Gaussian case.
  • Domain assumption: the covariance matrix between the true confounders Z and the surrogates S has full column rank.
    Required for the surrogate replacement to involve no relaxation.
