Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case
Pith reviewed 2026-05-16 20:11 UTC · model grok-4.3
The pith
Invariant features predicting Y are recovered as the leading eigenvectors of a matrix built from the optimal transport barycenter of confounders Z given Y
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the Gaussian case, penalizing statistical dependence between W and Z conditioned on Y is equivalent to requiring plain independence between W and the random variable Z_Y that solves the Monge optimal transport barycenter problem for Z given Y. This equivalence produces a linear feature extractor given by the first d eigenvectors of a known matrix. The construction remains valid when only surrogate contextual variables S are available, provided the covariance Σ_ZS between Z and S has full rank.
What carries the argument
The Monge optimal transport barycenter Z_Y of Z conditioned on Y, which converts the conditional-independence requirement into an unconditional independence constraint that admits a closed-form eigenvector solution under joint Gaussianity.
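A short derivation can make this step concrete. This page does not spell out the setting, so the derivation assumes that (X, Y, Z) are jointly Gaussian with continuous Y, that the transport cost is squared Euclidean distance, and that features are linear, W = AᵀX. The Monge formulation in the first line is one standard statement of the barycenter problem. It is used here as an assumption, not quoted from the paper.

```latex
\begin{aligned}
&\text{Monge barycenter problem (assumed, squared cost):}\quad
  \min_{T}\ \mathbb{E}\,\lVert T(Z,Y)-Z\rVert^2
  \quad\text{s.t.}\quad T(Z,Y)\perp\!\!\!\perp Y, \\[4pt]
&\text{Gaussian conditionals:}\quad
  Z\mid Y=y \sim \mathcal{N}\big(\mu_Z + B(y-\mu_Y),\ \Sigma_{Z\mid Y}\big),\quad
  B=\Sigma_{ZY}\Sigma_{YY}^{-1},\quad
  \Sigma_{Z\mid Y}=\Sigma_{ZZ}-\Sigma_{ZY}\Sigma_{YY}^{-1}\Sigma_{YZ}, \\[4pt]
&\text{equal covariances} \;\Rightarrow\; \text{barycenter } \mathcal{N}(\mu_Z,\Sigma_{Z\mid Y}),\quad
  Z_Y = T(Z,Y) = Z - B\,(Y-\mu_Y), \\[4pt]
&\operatorname{Cov}(W,Z_Y)=\Sigma_{WZ}-\Sigma_{WY}B^\top
  =\Sigma_{WZ}-\Sigma_{WY}\Sigma_{YY}^{-1}\Sigma_{YZ}
  =\operatorname{Cov}(W,Z\mid Y).
\end{aligned}
```

Because Cov(Z_Y, Y) = Σ_ZY - BΣ_YY = 0 and (Z_Y, Y) is Gaussian, Z_Y is independent of Y. Since (W, Y, Z) is also jointly Gaussian, W ⊥⊥ Z | Y and W ⊥⊥ Z_Y each hold exactly when the matrix in the last line vanishes. Under joint Gaussianity the conditional covariance Σ_{Z|Y} does not depend on the observed value of Y.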
If this is right
- The feature extractor is given explicitly by the leading eigenvectors of a matrix assembled from the problem covariances.
- Observable surrogate variables S can replace the true confounders Z without any loss of guarantee whenever the covariance between Z and S has full rank.
- The same linear construction extends with only small modifications to non-Gaussian and nonlinear feature maps.
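The eigenvector step can be illustrated with a small sketch. The paper passage quoted in the Lean-theorems section of this page gives H = (1-λ)CCᵀ - λDDᵀ, but this page does not define C and D. The sketch therefore makes several hypothetical assumptions:

- C = Cov(X, Y) carries the predictive signal.
- D = Cov(X, Z_Y) carries the confounding signal.
- Feature directions have unit norm.
- The d features are the top-d eigenvectors of H.

The function name `top_d_directions` is illustrative.

```python
import numpy as np


def top_d_directions(C, D, lam, d):
    """Return the d leading eigenvectors (and eigenvalues) of
    H = (1 - lam) C C^T - lam D D^T.

    Hypothetical reading: C = Cov(X, Y) rewards predictive directions and
    D = Cov(X, Z_Y) penalizes dependence on the barycenter variable Z_Y.
    Returned columns are orthonormal, i.e. each direction a has a^T a = 1.
    """
    C = np.asarray(C, dtype=float)
    D = np.asarray(D, dtype=float)
    C = C.reshape(C.shape[0], -1)   # accept a single covariance vector
    D = D.reshape(D.shape[0], -1)
    H = (1.0 - lam) * C @ C.T - lam * D @ D.T
    evals, evecs = np.linalg.eigh(H)          # symmetric H, ascending eigenvalues
    idx = np.argsort(evals)[::-1][:d]         # largest eigenvalues first
    return evecs[:, idx], evals[idx]


# Toy covariances in R^3: x1 predicts Y, x2 is tied to Z_Y, x3 is unrelated to both.
A, vals = top_d_directions(C=[1.0, 0.0, 0.0], D=[0.0, 1.0, 0.0], lam=0.5, d=1)
```

At λ = 0 the directions are the leading left singular vectors of C. Larger λ pushes them toward the null space of Dᵀ, where Cov(W, Z_Y) = aᵀD vanishes exactly under this hypothetical reading.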
Where Pith is reading between the lines
- Optimal transport barycenters may supply a canonical reduction of conditional independence tasks to unconditional ones in other statistical settings beyond feature extraction.
- The required matrix can be estimated directly from sample covariances, making the procedure immediately computable on moderate-dimensional data.
- Empirical checks on non-Gaussian data would show how far the eigenvector solution continues to deliver approximate invariance outside the Gaussian regime.
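A plug-in estimate from samples might look as follows. It assumes jointly Gaussian data with continuous Y, so that the barycenter map is the affine map T(z, y) = z - Σ_ZY Σ_YY⁻¹ (y - μ_Y). The roles assigned to C and D and the helper name `estimate_C_D` are hypothetical, not the paper's notation.

```python
import numpy as np


def estimate_C_D(X, Y, Z):
    """Plug-in estimates of Z_Y, C = Cov(X, Y) and D = Cov(X, Z_Y) (hypothetical roles).

    Z_Y uses the Gaussian barycenter map T(z, y) = z - B (y - mean(Y)) with
    B = Cov(Z, Y) Cov(Y, Y)^{-1}, estimated from sample covariances.
    """
    X, Y, Z = (np.asarray(M, dtype=float).reshape(len(M), -1) for M in (X, Y, Z))
    n = X.shape[0]
    Xc, Yc, Zc = (M - M.mean(axis=0) for M in (X, Y, Z))
    S_YY = Yc.T @ Yc / n
    S_YZ = Yc.T @ Zc / n
    B_t = np.linalg.solve(S_YY, S_YZ)        # B^T = Cov(Y, Y)^{-1} Cov(Y, Z)
    Z_Y = Z - Yc @ B_t                       # barycenter variable; keeps mean(Z)
    C = Xc.T @ Yc / n
    D = Xc.T @ (Z_Y - Z_Y.mean(axis=0)) / n
    return Z_Y, C, D


# Synthetic jointly Gaussian data: Z is correlated with Y, X2 carries a Z path.
rng = np.random.default_rng(1)
n = 10_000
Y = rng.normal(size=n)
Z = 0.8 * Y + rng.normal(size=n)
X = np.column_stack([Y + 0.5 * rng.normal(size=n), Y + Z + 0.5 * rng.normal(size=n)])
Z_Y, C, D = estimate_C_D(X, Y, Z)
```

In-sample, the estimated Z_Y has exactly zero sample covariance with Y, because the map subtracts the least-squares projection of Z on Y.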
Load-bearing premise
Exact equivalence between penalizing the conditional dependence of the features on the confounders and requiring independence from the optimal transport barycenter is established only for jointly Gaussian variables. Outside that case the two conditions need not coincide.
What would settle it
Generate samples from a known multivariate Gaussian joint distribution of X, Y and Z; compute the proposed eigenvector-based features W and test whether W remains independent of Z given Y; observed conditional dependence would falsify the claimed equivalence.
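A minimal simulation in the spirit of this test follows. Its assumptions are illustrative, not the paper's own:

- (X, Y, Z) are jointly Gaussian with a hand-picked linear structure.
- Z_Y comes from the affine Gaussian barycenter map.
- C = Cov(X, Y) and D = Cov(X, Z_Y) are a hypothetical reading of the matrices in H = (1-λ)CCᵀ - λDDᵀ.

Because (W, Y, Z) are Gaussian, the sketch measures conditional dependence through the partial covariance Cov(W, Z | Y) rather than through a general-purpose independence test.

```python
import numpy as np


def cov(A, B):
    """Sample cross-covariance matrix of the columns of A and B."""
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    return Ac.T @ Bc / len(A)


def barycenter_variable(Z, Y):
    """Gaussian barycenter map Z_Y = Z - B (Y - mean Y), B = Cov(Z, Y) Cov(Y, Y)^{-1}."""
    Yc = Y - Y.mean(axis=0)
    B_t = np.linalg.solve(cov(Y, Y), cov(Y, Z))
    return Z - Yc @ B_t


def leading_direction(X, Y, Z_Y, lam):
    """Top eigenvector of H = (1 - lam) C C^T - lam D D^T (C, D hypothetical)."""
    C, D = cov(X, Y), cov(X, Z_Y)
    H = (1 - lam) * C @ C.T - lam * D @ D.T
    _, evecs = np.linalg.eigh(H)             # eigenvalues in ascending order
    return evecs[:, -1]


def partial_cov(W, Z, Y):
    """Cov(W, Z | Y) for jointly Gaussian variables (Schur complement)."""
    return cov(W, Z) - cov(W, Y) @ np.linalg.solve(cov(Y, Y), cov(Y, Z))


rng = np.random.default_rng(0)
n = 20_000
Y = rng.normal(size=(n, 1))
Z = 0.8 * Y + rng.normal(size=(n, 1))        # confounder correlated with Y
X = np.hstack([
    Y + 0.5 * rng.normal(size=(n, 1)),       # X1: depends on Y only
    Y + Z + 0.5 * rng.normal(size=(n, 1)),   # X2: also carries a Z path
])
Z_Y = barycenter_variable(Z, Y)

results = {}
for lam in (0.0, 0.99):
    W = X @ leading_direction(X, Y, Z_Y, lam)[:, None]
    results[lam] = (partial_cov(W, Z, Y).item(), cov(W, Z_Y).item())
    print(lam, results[lam])
```

In this sketch the partial covariance and Cov(W, Z_Y) coincide to floating-point precision, since both reduce to the same Schur-complement expression. With λ = 0 the feature is strongly confounded. A finite λ only shrinks the residual dependence rather than removing it; exact invariance corresponds to directions with Dᵀa = 0.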
Original abstract
A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a method to extract d invariant linear features W from X that predict Y while remaining unconfounded by Z. It replaces penalization of conditional dependence between W and Z given Y with plain independence between W and the Monge optimal-transport barycenter variable Z_Y = T(Z,Y). The central claim is that these two independence statements are equivalent in the Gaussian case, which permits the feature extractor to be recovered in closed form as the leading d eigenvectors of an explicitly known covariance-derived matrix. The construction is claimed to carry over without relaxation to surrogate variables S (when Σ_ZS has full column rank) and to extend with little change to non-Gaussian settings.
Significance. If the asserted Gaussian equivalence is rigorously established, the work would supply a computationally attractive, closed-form linear solution for invariant feature extraction that directly links optimal-transport barycenters to conditional-independence penalties. The surrogate-variable extension without relaxation would be a further practical strength. At present the manuscript provides only the statement of the equivalence and the eigenvector claim, with no derivations, proofs, or numerical checks, so the significance remains prospective rather than demonstrated.
major comments (3)
- [Abstract] The claim that 'in the Gaussian case ... the two statements are equivalent' is load-bearing for the entire closed-form eigenvector construction, yet it is asserted without derivation, proof, or even a sketch of the argument. Explicit verification is required for arbitrary joint Gaussians (including cases where the conditional covariance of Z given Y is non-constant), because the equivalence may fail outside special sub-families and would then invalidate the claimed invariance property.
- [Main construction] In the Gaussian case, the manuscript states that the linear extractor is given by the first d eigenvectors of a known matrix built from covariances, but neither the explicit matrix nor the algebraic steps connecting the OT barycenter map to this matrix are supplied. Without this derivation the closed-form claim cannot be assessed.
- [Surrogate extension] The assertion that replacing Z by S incurs 'no relaxation' when Σ_ZS has full column rank is presented as immediate from the Gaussian equivalence, but no argument is given showing that the barycenter map for the surrogate preserves the required independence properties for arbitrary full-rank covariances.
minor comments (2)
- [Abstract] The abstract refers to 'the Gaussian case considered in this article' without indicating the precise assumptions (e.g., joint normality, conditional covariance structure) under which the equivalence is claimed; a short clarifying sentence would improve readability.
- Notation for the barycenter map T(Z,Y) and the resulting random variable Z_Y is introduced without an explicit definition or reference to the Monge problem formulation used; a brief equation or citation in the main text would clarify the construction.
Simulated Author's Rebuttal
We are grateful to the referee for the thorough review and valuable suggestions. We will address all the points raised by providing the missing derivations and proofs in a revised version of the manuscript.
- Referee: [Abstract] The claim that 'in the Gaussian case ... the two statements are equivalent' is load-bearing for the entire closed-form eigenvector construction, yet it is asserted without derivation, proof, or even a sketch of the argument. Explicit verification is required for arbitrary joint Gaussians (including cases where the conditional covariance of Z given Y is non-constant), because the equivalence may fail outside special sub-families and would then invalidate the claimed invariance property.
Authors: We thank the referee for highlighting this. The equivalence holds for general joint Gaussians, including non-constant conditional covariances, because the OT barycenter map T(Z,Y) is an affine transformation that preserves the necessary independence relations under Gaussianity. In the revision, we will provide a full derivation and proof of this equivalence, including verification for the general case. We will also add a brief sketch to the abstract. Revision: yes.
- Referee: [Main construction] In the Gaussian case, the manuscript states that the linear extractor is given by the first d eigenvectors of a known matrix built from covariances, but neither the explicit matrix nor the algebraic steps connecting the OT barycenter map to this matrix are supplied. Without this derivation the closed-form claim cannot be assessed.
Authors: We agree that the algebraic steps and the explicit form of the matrix need to be supplied. The matrix is constructed from the covariance of X, the cross-covariances involving the barycenter, and the conditional covariances. In the revised manuscript, we will include the complete derivation showing how the independence condition translates into the eigenvector problem for this specific matrix. Revision: yes.
- Referee: [Surrogate extension] The assertion that replacing Z by S incurs 'no relaxation' when Σ_ZS has full column rank is presented as immediate from the Gaussian equivalence, but no argument is given showing that the barycenter map for the surrogate preserves the required independence properties for arbitrary full-rank covariances.
Authors: We will expand this section with a detailed argument. Since in the Gaussian case the relationship between Z and S is linear (via the full-rank covariance), the barycenter for S can be shown to induce the same independence constraints on W as the original Z, without relaxation. The proof will be added to the revision. Revision: yes.
Circularity Check
No circularity: Gaussian equivalence and eigenvector closed form are independently derived from OT barycenter construction
The paper's central step replaces the conditional-independence penalty with independence from the Monge OT barycenter variable Z_Y and asserts equivalence under joint Gaussianity; this equivalence is not definitional but is stated to follow from the properties of Gaussian optimal transport maps, after which the linear extractor is recovered as the leading eigenvectors of an explicitly constructed covariance matrix. No parameter is fitted on a data subset and then relabeled as a prediction, no self-citation chain is invoked to justify the equivalence itself, and the matrix is built directly from the joint second-moment structure without reference to the target W. The derivation therefore remains self-contained once the Gaussian assumption and the OT barycenter definition are granted; the claimed closed form is a consequence rather than a restatement of the inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: all relevant random variables are jointly Gaussian.
- Domain assumption: the covariance matrix between the true confounders Z and the surrogates S has full column rank.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "In the Gaussian case ... W ⊥⊥ Z | Y ⇔ W ⊥⊥ Z_Y ... a∗ is the normalized eigenvector corresponding to the largest eigenvalue of H = (1-λ)CCᵀ - λDDᵀ"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2), 2011.
- [2] Pedro C. Álvarez-Esteban, E. del Barrio, J. A. Cuesta-Albertos, and C. Matrán. A fixed-point approach to barycenters in Wasserstein space. Journal of Mathematical Analysis and Applications, 441(2):744–762, 2016.
- [3] Martin Arjovsky. Out of Distribution Generalization in Machine Learning. PhD thesis, New York University, 2020.
- [4] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
- [5] Manli Cheng, Subha Maity, Qinglong Tian, and Pengfei Li. Transfer learning under group-label shift: A semiparametric exponential tilting approach. arXiv preprint arXiv:2509.22268, 2025.
- [6] A. Philip Dawid. Conditional independence in statistical theory. Journal of the Royal Statistical Society Series B: Statistical Methodology, 41(1):1–15, 1979.
- [7] Jianqing Fan, Cong Fang, Yihong Gu, and Tong Zhang. Environment invariant linear least squares. The Annals of Statistics, 52(5), 2024.
- [8] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59), 2016.
- [9] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 2016.
- [10] Jimmy Hickey, Jonathan P. Williams, and Emily C. Hector. Transfer learning with uncertainty quantification: Random effect calibration of source to target (recast). Journal of Machine Learning Research, 25(338):1–40, 2024.
- [11] Agostina J. Larrazabal, Nicolás Nieto, Victoria Peterson, Diego H. Milone, and Enzo Ferrante. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences, 117(23):12592–12594, 2020.
- [12] Andrew D. Lipnick, Esteban G. Tabak, Giulio Trigila, Yating Wang, Xuancheng Ye, and Wenjun Zhao. The Monge optimal transport barycenter problem, 2025.
- [13] Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. In International Conference on Machine Learning, pages 3122–3130. PMLR, 2018.
- [14] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning, 2022.
- [15] Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5), 2016.
- [16] Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
- [17] Aahlad Manas Puli, Lily H. Zhang, Eric Karl Oermann, and Rajesh Ranganath. Out-of-distribution generalization in the presence of nuisance-induced spurious correlations. ICLR 2022, 2021.
- [18] Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Daniel Marbach. Anchor regression: Heterogeneity-aware regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(2), 2021.
- [19] Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris M. Mooij. On causal and anticausal learning. In Proceedings of the 29th International Conference on Machine Learning (ICML) – Workshop on Causality, pages 1–40, 2012.
- [20] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
- [21] Esteban G. Tabak, Giulio Trigila, and Wenjun Zhao. Data driven conditional optimal transport. Machine Learning, 2021.
- [22] Victor Veitch, Alexander D'Amour, Steve Yadlowsky, and Jacob Eisenstein. Counterfactual invariance to spurious correlations in text classification. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 16196–16208. Curran Associates, Inc., 2021.
- [23] Hongkang Yang and Esteban G. Tabak. Conditional density estimation, latent variable discovery, and optimal transport. Communications on Pure and Applied Mathematics, 2020.
- [24] Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. Generalized out-of-distribution detection: A survey. International Journal of Computer Vision, 132(12):5635–5662, 2024.
- [25] Fan Zhou, Zhuqing Jiang, Changjian Shui, Boyu Wang, and Brahim Chaib-Draa. Domain generalization via optimal transport with metric similarity learning. Neurocomputing, 456, 2021.