Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

Esteban Tabak; Ian Bounos; Mariela Sued; Pablo Groisman

arxiv: 2512.20914 · v2 · submitted 2025-12-24 · 🧮 math.ST · stat.AP · stat.ML · stat.TH

Pith reviewed 2026-05-16 20:11 UTC · model grok-4.3

keywords: invariant feature extraction · optimal transport barycenter · Monge problem · Gaussian case · conditional independence · eigenvector extractor · surrogate variables · confounder adjustment

The pith

Invariant features predicting Y are recovered as the leading eigenvectors of a matrix built from the optimal transport barycenter of the confounders Z given Y.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to extract d invariant features W from X that predict a response Y without being confounded by variables Z. It replaces the penalization of conditional dependence between W and Z given Y with the simpler requirement of plain independence between W and a transformed variable Z_Y, where Z_Y solves the Monge optimal transport barycenter problem for the conditional law of Z given Y. In the Gaussian case this substitution is exactly equivalent, so the optimal linear map is recovered directly from the first d eigenvectors of an explicitly known matrix. When the true confounders are unobserved, measurable surrogate variables S may be used instead provided the covariance matrix between Z and S has full rank, again without relaxation under the Gaussian assumption. The same construction extends with little change to non-Gaussian and nonlinear settings.

Core claim

In the Gaussian case the penalization of statistical dependence between W and Z conditioned on Y is equivalent to plain independence between W and the random variable Z_Y that solves the Monge optimal transport barycenter problem for Z given Y. This equivalence produces a linear feature extractor given by the first d eigenvectors of a known matrix, and the construction remains valid when only surrogate contextual variables are available under a full-rank covariance condition.

What carries the argument

The Monge optimal transport barycenter Z_Y of Z conditioned on Y, which converts the conditional-independence requirement into an unconditional independence constraint that admits a closed-form eigenvector solution under joint Gaussianity.
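To make this mechanism concrete, here is a worked sketch of why the reduction is plausible. It is not taken from the paper. It assumes that $(X, Y, Z)$ are jointly Gaussian, that $\Sigma_{YY}$ is invertible, that the transport cost is quadratic, and that $W = A^\top X$ is linear; the symbols $\mu$, $\Sigma$ and $A$ are notation introduced here for illustration.

```latex
% Conditional law of Z given Y = y under joint Gaussianity:
\begin{align*}
Z \mid Y = y \;&\sim\; \mathcal{N}\!\big(\mu_Z + \Sigma_{ZY}\Sigma_{YY}^{-1}(y - \mu_Y),\;
                 \Sigma_{ZZ} - \Sigma_{ZY}\Sigma_{YY}^{-1}\Sigma_{YZ}\big).
\end{align*}
% The conditional covariance does not depend on y, so the conditionals are
% translates of one another. Their quadratic-cost barycenter is the translate
% centred at \mu_Z, and the Monge map is a shift:
\begin{align*}
Z_Y \;=\; T(Z, Y) \;&=\; Z - \Sigma_{ZY}\Sigma_{YY}^{-1}(Y - \mu_Y).
\end{align*}
% For W = A^T X, the covariance with Z_Y is the partial covariance given Y:
\begin{align*}
\operatorname{Cov}(W, Z_Y)
  \;&=\; \Sigma_{WZ} - \Sigma_{WY}\Sigma_{YY}^{-1}\Sigma_{YZ}
  \;=\; \operatorname{Cov}(W, Z \mid Y).
\end{align*}
% (W, Z_Y) is jointly Gaussian, so zero covariance is independence, and
% zero conditional covariance is conditional independence; hence
\begin{align*}
W \perp\!\!\!\perp Z_Y
  \;\Longleftrightarrow\; \operatorname{Cov}(W, Z \mid Y) = 0
  \;\Longleftrightarrow\; W \perp\!\!\!\perp Z \mid Y .
\end{align*}
```

Under these assumptions the barycenter variable is simply the residual of $Z$ after linear regression on $Y$, which is why the conditional constraint collapses to an unconditional, covariance-level one.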

If this is right

The feature extractor is given explicitly by the leading eigenvectors of a matrix assembled from the problem covariances.
Observable surrogate variables S can replace the true confounders Z without any loss of guarantee whenever the covariance between Z and S has full rank.
The same linear construction extends with only small modifications to non-Gaussian and nonlinear feature maps.
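The eigenvector recipe in the first point can be sketched numerically. The snippet below is a minimal illustration, not the authors' code. It uses the matrix quoted further down this page, H = (1-λ)CCᵀ - λDDᵀ, and assumes, as a hypothetical reading, that C is the cross-covariance of X with Y and D is the cross-covariance of X with Z_Y. The function names and the toy values of C and D are invented for the example.

```python
import numpy as np

def top_eigvecs(H, d):
    """Return the d eigenvectors of a symmetric matrix H with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(H)            # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:d]        # indices of the d largest eigenvalues
    return vecs[:, order], vals[order]

def extractor(C, D, lam, d):
    """Leading eigenvectors of H = (1 - lam) C C^T - lam D D^T.

    Assumption (not confirmed by the source): C = Cov(X, Y), D = Cov(X, Z_Y).
    """
    H = (1.0 - lam) * C @ C.T - lam * D @ D.T
    return top_eigvecs(H, d)

# Toy covariances for a 3-dimensional X (illustrative values only).
C = np.array([[1.0], [1.0], [0.0]])   # X1 and X2 co-vary with Y
D = np.array([[0.0], [1.0], [0.0]])   # only X2 co-varies with Z_Y

A0, _ = extractor(C, D, lam=0.0, d=1)  # pure prediction: weights X1 and X2 equally
A1, _ = extractor(C, D, lam=0.9, d=1)  # strong penalty: weight on X2 shrinks
print(np.round(np.abs(A0[:, 0]), 3), np.round(np.abs(A1[:, 0]), 3))
```

In this toy setting, raising λ moves the extracted direction away from the coordinate that carries the confounder signal, which is the trade-off the parameter λ is meant to control under the stated reading of C and D.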

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Optimal transport barycenters may supply a canonical reduction of conditional independence tasks to unconditional ones in other statistical settings beyond feature extraction.
The required matrix can be estimated directly from sample covariances, making the procedure immediately computable on moderate-dimensional data.
Empirical checks on non-Gaussian data would show how far the eigenvector solution continues to deliver approximate invariance outside the Gaussian regime.

Load-bearing premise

The exact equivalence between penalizing conditional dependence of the features on the confounders and requiring independence from the optimal transport barycenter variable is claimed only for the case in which all variables are jointly Gaussian.

What would settle it

Generate samples from a known multivariate Gaussian joint distribution of X, Y and Z; compute the proposed eigenvector-based features W and test whether W remains independent of Z given Y; observed conditional dependence would falsify the claimed equivalence.
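The check described above can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the paper's experiment: the data-generating coefficients are invented, Z_Y is taken to be the least-squares residual of Z on Y (its form in the jointly Gaussian case with constant conditional covariance), and C and D are assumed to be the sample cross-covariances of X with Y and with Z_Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 5000, 4, 2                        # samples, dim(X), dim(Z); Y is scalar

# Jointly Gaussian toy model (coefficients are hypothetical).
Z = rng.standard_normal((n, q))
Y = Z @ np.array([0.8, -0.5]) + rng.standard_normal(n)
B = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])        # Z drives the first two coordinates of X
g = np.ones(p)                              # Y drives every coordinate of X
X = Z @ B + np.outer(Y, g) + rng.standard_normal((n, p))

Xc = X - X.mean(axis=0)
Zc = Z - Z.mean(axis=0)
Yc = (Y - Y.mean())[:, None]

def resid(M):
    """Residual of the least-squares regression of the columns of M on Y."""
    beta = np.linalg.lstsq(Yc, M, rcond=None)[0]
    return M - Yc @ beta

Z_Y = resid(Zc)                             # assumed Gaussian-case barycenter variable
C = Xc.T @ Yc / n                           # sample Cov(X, Y)
D = Xc.T @ Z_Y / n                          # sample Cov(X, Z_Y)

def partial_cov_WZ_given_Y(lam):
    """Sample partial covariance of W = a^T X and Z given Y, for the top eigenvector a."""
    H = (1.0 - lam) * C @ C.T - lam * D @ D.T
    a = np.linalg.eigh(H)[1][:, -1]         # eigenvector of the largest eigenvalue
    W = Xc @ a
    return resid(W[:, None]).T @ Z_Y / n

pc_pred = partial_cov_WZ_given_Y(0.0)       # prediction only: confounded
pc_inv = partial_cov_WZ_given_Y(1.0)        # invariance only: partial covariance vanishes
print(np.abs(pc_pred).max(), np.abs(pc_inv).max())
```

At λ = 1 the extracted direction lies in the null space of Dᵀ, so the sample partial covariance with Z given Y is zero up to rounding, while at λ = 0 it is clearly nonzero. A genuine falsification test would go further and apply a conditional independence test at intermediate λ and on held-out samples.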

Figures

Figures reproduced from arXiv: 2512.20914 by Esteban Tabak, Ian Bounos, Mariela Sued, Pablo Groisman.

**Figure 2.** Frobenius distance between source and target covariance matrices, grouped by …

**Figure 3.** Conditional correlation curves $\lambda \mapsto \|\mathrm{Corr}(W_\lambda, Z \mid Y)\|_F$ over many independently generated environments. Despite random variation, all curves exhibit the same structural behaviour: a relatively flat region for small $\lambda$ followed by a systematic decay and eventual approach to invariance as $\lambda \to 1$. Given each covariance, finite samples are drawn and the observed variables are generated according to the structur…

**Figure 5.** Distribution of the optimal $\lambda$ in the second experiment. As illustrated in Figures 4 and 5, the empirical distribution of $\lambda^*$ exhibits a clear and consistent bimodality across both experiments. The peaks are concentrated at the boundaries of the interval, specifically at $\lambda = 0$ and $\lambda = 1$. The mode at $\lambda = 0$ corresponds to cases where the structural shift is negligible, meaning the source and target distributi…

Original abstract

A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper claims a closed-form eigenvector solution for linear invariant features in the Gaussian case by swapping conditional independence for independence from an OT barycenter transform, but that equivalence is asserted without a derivation.

The main thing to know is that this work replaces the usual penalty on dependence between W and Z given Y with plain independence between W and a transformed Z_Y that comes from the Monge optimal transport barycenter of Z conditional on Y. In the Gaussian setting the authors state these two conditions are equivalent, which then lets the linear extractor be recovered directly as the leading eigenvectors of a matrix assembled from the relevant covariances. When Z is unobserved they allow surrogate variables S provided the cross-covariance has full column rank, and they claim this substitution introduces no extra relaxation under Gaussianity. The non-Gaussian case is mentioned as a straightforward extension but is not developed here.

What the paper does well is turn an optimization problem with a conditional-independence term into an explicit eigenvector computation, which is attractive for anyone who wants a simple, non-iterative procedure under Gaussian assumptions. The surrogate handling is also a practical touch.

The soft spot is that the claimed equivalence between the conditional-independence condition and independence from the barycenter map is load-bearing for the closed form, yet the abstract and available description give no derivation or check that it holds for arbitrary joint Gaussians rather than special sub-cases such as constant conditional covariance. Without that step verified, the eigenvector formula may not deliver the invariance it promises. No simulations, counterexamples, or proof sketches appear in what is presented, so it is hard to gauge robustness or where the method actually breaks.

This is aimed at researchers in causal representation learning or invariant feature extraction who work with Gaussian models and are looking for closed-form tools. A reader who needs a quick linear method under those assumptions could extract something useful, but anyone requiring guarantees should wait for the details.

I would send it to peer review because the construction is clean enough that referees can test the equivalence directly and decide whether the closed form survives scrutiny.

Referee Report

3 major / 2 minor

Summary. The manuscript develops a method to extract d invariant linear features W from X that predict Y while remaining unconfounded by Z. It replaces penalization of conditional dependence between W and Z given Y with plain independence between W and the Monge optimal-transport barycenter variable Z_Y = T(Z,Y). The central claim is that these two independence statements are equivalent in the Gaussian case, which permits the feature extractor to be recovered in closed form as the leading d eigenvectors of an explicitly known covariance-derived matrix. The construction extends to surrogate variables S (when Σ_ZS has full column rank) with no relaxation and is asserted to extend with little change to non-Gaussian settings.

Significance. If the asserted Gaussian equivalence is rigorously established, the work would supply a computationally attractive, closed-form linear solution for invariant feature extraction that directly links optimal-transport barycenters to conditional-independence penalties. The surrogate-variable extension without relaxation would be a further practical strength. At present the manuscript provides only the statement of the equivalence and the eigenvector claim, with no derivations, proofs, or numerical checks, so the significance remains prospective rather than demonstrated.

major comments (3)

[Abstract] The claim that 'in the Gaussian case ... the two statements are equivalent' is load-bearing for the entire closed-form eigenvector construction yet is asserted without derivation, proof, or even a sketch of the argument. Explicit verification is required for arbitrary joint Gaussians (including cases where the conditional covariance of Z given Y is non-constant), because the equivalence may fail outside special sub-families and would then invalidate the claimed invariance property.
[Main construction, Gaussian case] The manuscript states that the linear extractor is given by the first d eigenvectors of a known matrix built from covariances, but neither the explicit matrix nor the algebraic steps connecting the OT barycenter map to this matrix are supplied. Without this derivation the closed-form claim cannot be assessed.
[Surrogate extension] The assertion that replacement of Z by S incurs 'no relaxation' when Σ_ZS has full column rank is presented as immediate from the Gaussian equivalence, but no argument is given showing that the barycenter map for the surrogate preserves the required independence properties for arbitrary full-rank covariances.

minor comments (2)

[Abstract] The abstract refers to 'the Gaussian case considered in this article' without indicating the precise assumptions (e.g., joint normality, conditional covariance structure) under which the equivalence is claimed; a short clarifying sentence would improve readability.
Notation for the barycenter map T(Z,Y) and the resulting random variable Z_Y is introduced without an explicit definition or reference to the Monge problem formulation used; a brief equation or citation in the main text would clarify the construction.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable suggestions. We will address all the points raised by providing the missing derivations and proofs in a revised version of the manuscript.

Referee: [Abstract] The claim that 'in the Gaussian case ... the two statements are equivalent' is load-bearing for the entire closed-form eigenvector construction yet is asserted without derivation, proof, or even a sketch of the argument. Explicit verification is required for arbitrary joint Gaussians (including cases where the conditional covariance of Z given Y is non-constant), because the equivalence may fail outside special sub-families and would then invalidate the claimed invariance property.

Authors: We thank the referee for highlighting this. The equivalence holds for general joint Gaussians, including non-constant conditional covariances, because the OT barycenter map T(Z,Y) is an affine transformation that preserves the necessary independence relations under Gaussianity. In the revision, we will provide a full derivation and proof of this equivalence, including verification for the general case. We will also add a brief sketch to the abstract. Revision: yes.

Referee: [Main construction, Gaussian case] The manuscript states that the linear extractor is given by the first d eigenvectors of a known matrix built from covariances, but neither the explicit matrix nor the algebraic steps connecting the OT barycenter map to this matrix are supplied. Without this derivation the closed-form claim cannot be assessed.

Authors: We agree that the algebraic steps and explicit form of the matrix need to be supplied. The matrix is constructed from the covariance of X, the cross-covariances involving the barycenter, and the conditional covariances. In the revised manuscript, we will include the complete derivation showing how the independence condition translates to the eigenvector problem for this specific matrix. Revision: yes.

Referee: [Surrogate extension] The assertion that replacement of Z by S incurs 'no relaxation' when Σ_ZS has full column rank is presented as immediate from the Gaussian equivalence, but no argument is given showing that the barycenter map for the surrogate preserves the required independence properties for arbitrary full-rank covariances.

Authors: We will expand this section with a detailed argument. Since in the Gaussian case the relationship between Z and S is linear (via the full-rank covariance), the barycenter for S can be shown to induce the same independence constraints on W as the original Z, without relaxation. The proof will be added to the revision. Revision: yes.

Circularity Check

0 steps flagged

No circularity: Gaussian equivalence and eigenvector closed form are independently derived from OT barycenter construction

The paper's central step replaces the conditional-independence penalty with independence from the Monge OT barycenter variable Z_Y and asserts equivalence under joint Gaussianity; this equivalence is not definitional but is stated to follow from the properties of Gaussian optimal transport maps, after which the linear extractor is recovered as the leading eigenvectors of an explicitly constructed covariance matrix. No parameter is fitted on a data subset and then relabeled as a prediction, no self-citation chain is invoked to justify the equivalence itself, and the matrix is built directly from the joint second-moment structure without reference to the target W. The derivation therefore remains self-contained once the Gaussian assumption and the OT barycenter definition are granted; the claimed closed form is a consequence rather than a restatement of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the stated equivalence holding exactly when variables are jointly Gaussian and on the full-rank covariance condition when surrogates are used.

axioms (2)

Domain assumption: all relevant random variables are jointly Gaussian. The equivalence between conditional independence and independence from the barycenter transform is asserted only for the Gaussian case.
Domain assumption: the covariance matrix between the true confounders Z and the surrogates S has full column rank. This is required for the surrogate replacement to involve no relaxation.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: In the Gaussian case ... $W \perp\!\!\!\perp Z \mid Y \Leftrightarrow W \perp\!\!\!\perp Z_Y$ ... $a^*$ is the normalized eigenvector corresponding to the largest eigenvalue of $H = (1-\lambda)\,CC^\top - \lambda\,DD^\top$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
