Counterfactually Fair Regression via Optimal Transport
Pith reviewed 2026-06-29 09:59 UTC · model grok-4.3
The pith
Counterfactual fairness equals demographic parity conditional on the latent variable, which gives a closed-form optimal fair regressor via a barycentric quantile map.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This equivalence yields a closed-form expression of the optimal fair regressor via a barycentric quantile map. The discretized post-processing method provides finite-sample fairness guarantees at rate Õ(n^{-1/3}) with matching risk bounds and a matching lower bound on excess risk for almost fair predictions.
What carries the argument
The barycentric quantile map obtained from optimal transport, which enforces conditional demographic parity while minimizing regression risk.
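As a hedged sketch of what such a map looks like in one dimension (the notation here is assumed for illustration and is not taken from the paper): write $f$ for the unconstrained regressor, $F_{s \mid v}$ for the CDF of its predictions in group $s$ given latent value $v$, $Q_{s \mid v}$ for the matching quantile function, and $w_s$ for the group weights. In one dimension the Wasserstein-2 barycenter has a quantile function equal to the weighted average of the group quantile functions, so a barycentric quantile map takes the form

```latex
g(x, s, v) \;=\; \sum_{s'} w_{s'}\, Q_{s' \mid v}\!\bigl( F_{s \mid v}\bigl( f(x, s) \bigr) \bigr)
```

The inner CDF converts a prediction into its within-group rank, and the outer average reads off the barycenter quantile at that rank. When the conditional CDFs are continuous, the distribution of $g$ given $v$ is then the same for every group. The paper's exact statement may differ in its conditioning and weights.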
If this is right
- The optimal fair regressor admits an explicit closed-form expression via the barycentric quantile map.
- The estimator satisfies high-probability finite-sample fairness at rate Õ(n^{-1/3}).
- A matching lower bound holds for the excess risk of predictors that are almost counterfactually fair.
- The same guarantees extend to relaxed counterfactual fairness.
Where Pith is reading between the lines
- The discretization step may be reusable for other post-processing fairness methods that involve continuous conditioning variables.
- The n^{-1/3} rate implies that the required sample size grows cubically as the fairness tolerance shrinks, which suggests practical limits on how small the tolerance can be made without very large datasets.
- Similar optimal-transport reductions could apply to other causal fairness definitions that involve latent noise resampling.
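To make the sample-size point concrete, here is a back-of-the-envelope calculation. It assumes the unfairness bound has the form $C\,n^{-1/3}$ with an unspecified constant $C$ (hypothetical; the paper's constant and its logarithmic factors are ignored), and writes $n(\varepsilon) = (C/\varepsilon)^3$ for the smallest sample size that meets tolerance $\varepsilon$:

```latex
C\,n^{-1/3} \le \varepsilon
\quad\Longleftrightarrow\quad
n \ge \left(\frac{C}{\varepsilon}\right)^{3},
\qquad
\frac{n(\varepsilon/2)}{n(\varepsilon)} = 2^{3} = 8,
\qquad
\frac{n(\varepsilon/10)}{n(\varepsilon)} = 10^{3} = 1000 .
```

Under this simplification, halving the tolerance costs eight times as much data, and a tenfold tighter tolerance costs a thousand times as much.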
Load-bearing premise
The underlying distributions satisfy mild regularity conditions that justify the convergence of the empirical maps and the discretization approximation.
What would settle it
On data drawn from distributions satisfying the regularity conditions, observe whether the post-processed estimator's conditional demographic parity violation decays more slowly than n^{-1/3} (up to logarithmic factors) with high probability. A consistently slower decay would contradict the stated guarantee.
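One hedged way to run this check empirically is to measure the violation at several sample sizes and fit the slope of log(violation) against log(n). The sketch below is an illustration only: `fit_decay_exponent` is a hypothetical helper, and the synthetic violations stand in for measurements that are not reported on this page.

```python
import math


def fit_decay_exponent(sample_sizes, violations):
    """Ordinary least-squares slope of log(violation) against log(n)."""
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(v) for v in violations]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic stand-in: violations that decay exactly like 2 * n^(-1/3).
sizes = [1_000, 8_000, 64_000, 512_000]
synthetic_violations = [2.0 * n ** (-1.0 / 3.0) for n in sizes]
exponent = fit_decay_exponent(sizes, synthetic_violations)
```

A fitted exponent clearly closer to zero than -1/3 across repeated runs would count against the claimed rate. Logarithmic factors make finite-sample slopes somewhat shallower, so small deviations are not conclusive.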
Original abstract
We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new post-processing estimator. We begin by showing that counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This allows us to provide a closed-form expression of the optimal fair regressor via a barycentric quantile map. In order to handle continuous latent variables, we propose a discretized post-processing method. Then, under mild regularity assumptions, we prove high-probability finite-sample fairness guarantees for our estimator, providing an unfairness decay at rate $\tilde O(n^{-1/3})$, and establishing a matching risk bound of order $\tilde O(n^{-1/3})$. We provide a matching lower bound on the excess risk of almost fair predictions. Finally, we extend our results to the setting of relaxed counterfactual fairness. We validate our approach on real-world and synthetic data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers learning a counterfactually fair regressor under a resampled-noise definition of counterfactual fairness. It establishes equivalence to demographic parity conditional on the latent variable, yielding a closed-form optimal fair regressor via a barycentric quantile map. A discretized post-processing estimator is proposed for continuous latent variables. Under mild regularity assumptions, high-probability finite-sample fairness guarantees are proved with unfairness decaying at rate Õ(n^{-1/3}), along with a matching excess-risk bound of the same order, a matching lower bound on excess risk for almost-fair predictors, and an extension to relaxed counterfactual fairness. The approach is validated on synthetic and real-world data.
Significance. If the derivations hold, the work makes a solid contribution by linking counterfactual fairness to conditional demographic parity and optimal transport, delivering an explicit closed-form solution and finite-sample rates with a matching lower bound. These elements provide concrete, falsifiable guarantees that go beyond typical heuristic post-processing in fair ML, and the use of standard one-dimensional OT regularity conditions keeps the assumptions mild and interpretable.
minor comments (2)
- [Abstract] The abstract and main text use \tilde O without an explicit definition on first appearance; adding a short clarification (e.g., that it hides polylogarithmic factors) would improve readability.
- The description of the discretization step for continuous latents would benefit from a brief remark on how the grid size is chosen in practice and its effect on the stated rates.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our contributions, the assessment of significance, and the recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity identified
The derivation begins by establishing an equivalence between counterfactual fairness (under the resampled-noise definition) and conditional demographic parity given the latent variable; this equivalence is derived rather than assumed by definition. The optimal fair regressor is then expressed via the barycentric quantile map from one-dimensional optimal transport, which is an external property applied to the equivalence. Finite-sample high-probability fairness guarantees at rate Õ(n^{-1/3}) and the matching excess-risk bound are obtained from concentration arguments and discretization under explicitly stated mild regularity conditions on the distributions. A separate lower bound on excess risk is provided for grounding. No steps reduce by construction to fitted inputs, self-citations, or renamed ansatzes; the central claims remain independent of the paper's own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Mild regularity assumptions on the underlying distributions and latent variable
Reference graph
Works this paper leans on
[1] Alekh Agarwal, Miroslav Dudik, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 120–129. PMLR, 09–15 Jun 2019. URL https:/...
[2] Martial Agueh and Guillaume Carlier. Barycenters in the wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011. doi: 10.1137/100805741. URL https://doi.org/10.1137/100805741
[3] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023
[4] Emily Black, Talia Gillis, and Zara Yasmine Hall. D-hacking. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, page 602–615, New York, NY, USA, 2024. Association for Computing Machinery. ISBN 9798400704505. doi: 10.1145/3630106.3658928. URL https://doi.org/10.1145/3630106.3658928
[5] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Sorelle A. Friedler and Christo Wilson, editors, Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, pages 77–91. PMLR, 23–24 Feb 2018. URL https://pro...
[6] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18, 2009. doi: 10.1109/ICDMW.2009.83
[8] Silvia Chiappa. Path-specific counterfactual fairness. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, AAAI’19/IAAI’19/EAAI’19. AAAI Press, 2019. ISBN 978-1-57735-809-1. d...
[9] Silvia Chiappa and Aldo Pacchiano. Fairness with continuous optimal transport. arXiv preprint arXiv:2101.02084, 2021. URL https://arxiv.org/abs/2101.02084
[10] Silvia Chiappa, Ray Jiang, Tom Stepleton, Aldo Pacchiano, Heinrich Jiang, and John P. Cunningham. A general approach to fairness with optimal transport. 34(04):3633–3640, Apr. 2020. doi: 10.1609/aaai.v34i04.5771. URL https://ojs.aaai.org/index.php/AAAI/article/view/5771
[11] Evgenii Chzhen and Nicolas Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statistics, 50(4):2416–2442, August 2022. doi: 10.1214/22-AOS2198. URL https://arxiv.org/abs/2007.14265.
[13] Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, and Massimiliano Pontil. Fair regression with wasserstein barycenters. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 7321–7331. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/pape...
[14] Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, and Massimiliano Pontil. Fair regression via plug-in estimator and recalibration with statistical guarantees. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 19137–19148. Curran Associates, Inc., ...
[15] Amanda Coston, Alan Mishler, Edward H. Kennedy, and Alexandra Chouldechova. Counterfactual risk assessments, evaluation, and fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, page 582–593, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450369367. doi: 10.1145/3351095.3372851...
[16] Elliot Creager, David Madras, Toniann Pitassi, and Richard Zemel. Causal modeling for fairness in dynamical systems. In Proceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020
[17] Kimberlé Crenshaw. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. In Feminist legal theories, pages 23–51. Routledge, 2013
[18] Tri Duong, Qian Li, and Guandong Xu. Achieving counterfactual fairness with imperfect structural causal model. Expert Systems with Applications, 240:122411, 11 2023. doi: 10.1016/j.eswa.2023.122411
[19] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Rich Zemel. Fairness through awareness, 2011
[20] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL https://proceedings.neurips.cc/paper_files/paper/2016/file/6a9659feb1216f14f7384ba499518b38-Paper.pdf
[21] Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 3020–3029, New York, New York, USA, 20–22 Jun 2016. PMLR. URL http...
[22] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2564–2572. PMLR, 2018. URL https://proce...
[23] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/a486cd...
[24] Matt J Kusner, Chris Russell, Joshua Loftus, and Ricardo Silva. When worlds collide: Integrating different counterfactual assumptions in fairness. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL htt...
[25] Thibaut Le Gouic, Jean-Michel Loubes, and Philippe Rigollet. Projection to fairness in statistical learning, 2020. URL https://arxiv.org/abs/2005.11720
[26] Michael Lohaus, Michael Perrot, and Ulrike Von Luxburg. Too relaxed to be fair. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6360–6369. PMLR, 13–18 Jul 2020. URL https://proceedings.mlr.press/v119/lohaus20a.html
[27] Jing Ma, Ruocheng Guo, Aidong Zhang, and Jundong Li. Learning for counterfactual fairness from observational data. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, page 1620–1630. ACM, August 2023. doi: 10.1145/3580305.3599408. URL http://dx.doi.org/10.1145/3580305.3599408
[28] Karima Makhlouf, Sami Zhioua, and Catuscia Palamidessi. Survey on causal-based machine learning fairness notions, 2022. URL https://arxiv.org/abs/2010.09553
[29] P. Martinot, B. Colnet, T. Breda, J. Sultan, L. Touitou, P. Huguet, E. Spelke, G. Dehaene-Lambertz, P. Bressoux, and S. Dehaene. Rapid emergence of a maths gender gap in first grade. Nature, 643(8073):1020–1029, 2025. doi: 10.1038/s41586-025-09126-4. URL https://doi.org/10.1038/s41586-025-09126-4
[30] Pascal Massart. The tight constant in the dvoretzky–kiefer–wolfowitz inequality. The Annals of Probability, 18(3):1269–1283, July 1990. doi: 10.1214/aop/1176990746. URL https://doi.org/10.1214/aop/1176990746
[31] J. Pearl and D. Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018. ISBN 9780465097616. URL https://books.google.fr/books?id=BzM0DwAAQBAJ
[32] Judea Pearl. Causality. Cambridge University Press, 2 edition, 2009
[33] Drago Plecko and Elias Bareinboim. Causal fairness analysis, 2022. URL https://arxiv.org/abs/2207.11385
[34] Jake Robertson, Noah Hollmann, Samuel Müller, Noor Awad, and Frank Hutter. FairPFN: A tabular foundation model for causal fairness. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=I8DVh2jnEA.
[35] Lucas Rosenblatt and R Teal Witter. Counterfactual fairness is basically demographic parity. Proceedings of AAAI, 2023
[36] Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser Cham, 1 edition, 2015. ISBN 978-3-319-20827-5. doi: 10.1007/978-3-319-20828-2. Published 27 October 2015
[37] Ricardo Silva. Counterfactual fairness is not demographic parity, and other observations,
[39] Bowei Tian, Ziyao Wang, Shwai He, Wanghao Ye, Guoheng Sun, Yucong Dai, Yongkai Wu, and Ang Li. Towards counterfactual fairness through auxiliary variables. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=GpUv1FvZi1
[40] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, NY, 2009. ISBN 978-0-387-79051-0. doi: 10.1007/b13794
[41] A. W. van der Vaart. Quantiles and Order Statistics, page 304–315. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998
[42] Sahil Verma, Varich Boonsanong, Minh Hoang, Keegan Hines, John Dickerson, and Chirag Shah. Counterfactual explanations and algorithmic recourses for machine learning: A review. ACM Comput. Surv., 56(12), October 2024. ISSN 0360-0300. doi: 10.1145/3677119. URL https://doi.org/10.1145/3677119
[43] Yongkai Wu, Lu Zhang, and Xintao Wu. Counterfactual fairness: Unidentification, bound and algorithm. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 1438–1444. International Joint Conferences on Artificial Intelligence Organization, 7 2019. doi: 10.24963/ijcai.2019/199. URL https://doi.org/10....
[44] Zeyu Zhou, Tianci Liu, Ruqi Bai, Jing Gao, Murat Kocaoglu, and David I. Inouye. Counterfactual fairness by combining factual and counterfactual predictions. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=J0Itri0UiN
[45] Zhiqun Zuo, Mohammad Mahdi Khalili, and Xueru Zhang. Counterfactually fair representation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=QZo1cge4Tc.
Appendix
Table of Contents
A Useful background
A.1 Additional notations ...
... act as non-deterministic causes of observable variables ...
1. Latent Knowledge: The unobserved ability is drawn from a standard normal distribution: $V \sim \mathcal{N}(0, 1)$.
2. UGPA: Modeled as a linear function of race, sex, and knowledge: $\mathrm{UGPA} \sim \mathcal{N}(\mu_{\mathrm{GPA}} + w_{\mathrm{GPA}}^{\top} S + \lambda_{\mathrm{GPA}} V,\ \sigma_{\mathrm{GPA}}^{2})$.
3. LSAT Score: Modeled as a count variable (approximated via Poisson) driven by the same factors: $\mathrm{LSAT} \sim \mathrm{Poisson}(\exp(\mu_{\mathrm{LSAT}} + w_{\mathrm{LSAT}}^{\top} S + \lambda_{\mathrm{LSAT}} V))$.
4. First-Year Average (ZFYA): The outcome variable is also a noisy linear function: $\mathrm{ZFYA} \sim \mathcal{N}(\mu_{\mathrm{ZFYA}} + w_{\mathrm{ZFYA}}^{\top} S + \lambda_{\mathrm{ZFYA}} V,\ 1)$.
Inference Procedure. We use the Stan implementation provided by Kusner et al. [23]. The model is fitted using Hamiltonian Monte Carlo (HMC) with 2000 iterations...
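The four-step generative model above can be sketched as a forward sampler. This is a minimal illustration using only the Python standard library, not the Stan model the text refers to: the dictionary `PARAMS`, every numeric value in it, and the two-entry encoding of $S$ are hypothetical placeholders, and inference (HMC) is not attempted.

```python
import math
import random

# Hypothetical parameter values, chosen only so the sampler runs.
PARAMS = {
    "mu_gpa": 3.0, "w_gpa": [-0.1, 0.05], "lam_gpa": 0.3, "sigma_gpa": 0.4,
    "mu_lsat": 3.5, "w_lsat": [-0.05, 0.02], "lam_lsat": 0.1,
    "mu_zfya": 0.0, "w_zfya": [-0.2, 0.1], "lam_zfya": 0.5,
}


def dot(w, s):
    return sum(wi * si for wi, si in zip(w, s))


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for moderate rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_record(s, p, rng):
    """Draw (V, UGPA, LSAT, ZFYA) for one individual with attribute vector s."""
    v = rng.gauss(0.0, 1.0)  # latent knowledge V ~ N(0, 1)
    # rng.gauss takes a standard deviation, so sigma_gpa is sigma, not sigma^2.
    ugpa = rng.gauss(p["mu_gpa"] + dot(p["w_gpa"], s) + p["lam_gpa"] * v, p["sigma_gpa"])
    lsat_rate = math.exp(p["mu_lsat"] + dot(p["w_lsat"], s) + p["lam_lsat"] * v)
    lsat = sample_poisson(lsat_rate, rng)
    zfya = rng.gauss(p["mu_zfya"] + dot(p["w_zfya"], s) + p["lam_zfya"] * v, 1.0)
    return {"V": v, "UGPA": ugpa, "LSAT": lsat, "ZFYA": zfya}


rng = random.Random(0)
record = sample_record([1, 0], PARAMS, rng)
```

All three observed variables share the same draw of $V$, which is what lets the latent be inferred from them and then used as the conditioning variable.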
1. We partition $V$ into $L$ equal-width intervals $\{I_\ell\}_{\ell=1}^{L}$.
2. In each interval $\ell$ and group $s$, we collect the scores $z_i = f_{bb}(x_i, s_i)$.
3. We compute empirical quantiles $q_{\ell,s}$ on a fixed grid and form the barycenter quantiles $q_{\ell,\star} = \sum_s w_s\, q_{\ell,s}$.
4. For a new test point with score $z_{\mathrm{new}}$ in group $s$ and interval $\ell$, we:
   • Compute its percentile $\tau$ within the group $s$ distribution via linear interpolation.
   • Map $\tau$ to the barycenter distribution: $\hat{y} = q_{\ell,\star}(\tau)$.
Plug-in Selection of $L^*$. We select the optimal discretization level $L^*$ using the formula derived in Theorem 1. We estimate the Lipschitz constant $L_{\mathrm{cdf}}$ using finite di...
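The four post-processing steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based model, and the toy data are all hypothetical, the number of intervals is passed in by hand rather than chosen by the plug-in rule (whose formula is not reproduced on this page), and intervals that lack data for some group are simply left without a barycenter.

```python
from bisect import bisect_right


def interval_index(v, v_min, v_max, n_bins):
    """Index of the equal-width interval containing v, clamped to the fitted range."""
    if v_max <= v_min:
        return 0
    idx = int((v - v_min) / (v_max - v_min) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def empirical_quantile(sorted_vals, tau):
    """Linearly interpolated quantile of a sorted sample at level tau in [0, 1]."""
    pos = tau * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def interp(x, xs, ys):
    """Piecewise-linear interpolation through (xs, ys), clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # guarantees xs[j - 1] <= x < xs[j]
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def fit(scores, groups, latents, weights, n_bins, grid):
    """Steps 1-3: bin the latent, collect scores per cell, average the quantiles."""
    v_min, v_max = min(latents), max(latents)
    cells = {}
    for z, s, v in zip(scores, groups, latents):
        key = (interval_index(v, v_min, v_max, n_bins), s)
        cells.setdefault(key, []).append(z)
    group_q = {}
    for key, vals in cells.items():
        vals.sort()
        group_q[key] = [empirical_quantile(vals, tau) for tau in grid]
    bary_q = {}
    for ell in range(n_bins):
        if all((ell, s) in group_q for s in weights):
            bary_q[ell] = [
                sum(weights[s] * group_q[(ell, s)][k] for s in weights)
                for k in range(len(grid))
            ]
    return {"v_min": v_min, "v_max": v_max, "n_bins": n_bins,
            "grid": grid, "group_q": group_q, "bary_q": bary_q}


def predict(model, z_new, s, v):
    """Step 4: rank z_new within its group and cell, then read the barycenter quantile."""
    ell = interval_index(v, model["v_min"], model["v_max"], model["n_bins"])
    tau = interp(z_new, model["group_q"][(ell, s)], model["grid"])
    return interp(tau, model["grid"], model["bary_q"][ell])


# Toy data: within each latent interval, group "b" scores sit 2 above group "a".
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
scores, groups, latents = [], [], []
for v, base in [(0.25, 0.0), (0.75, 10.0)]:
    for s, shift in [("a", 0.0), ("b", 2.0)]:
        for k in range(5):
            scores.append(base + shift + k)
            groups.append(s)
            latents.append(v)

model = fit(scores, groups, latents, {"a": 0.5, "b": 0.5}, n_bins=2, grid=grid)
y_a = predict(model, 2.0, "a", 0.25)  # median of group "a" in the first interval
y_b = predict(model, 4.0, "b", 0.25)  # median of group "b" in the first interval
```

In this toy example the two group medians in the first interval are mapped to the same value, 3.0, which is the midpoint of the two group medians under equal weights. Points in the second interval are mapped using that interval's own barycenter, so the adjustment is conditional on the latent.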