Fairness criteria through the lens of directed acyclic graphical models
Pith reviewed 2026-05-25 14:47 UTC · model grok-4.3
The pith
Directed acyclic graphical models show that Equalized Odds and similar fairness criteria are misleading.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Representing fairness scenarios with directed acyclic graphical models reveals that Equalized Odds and related criteria are ultimately misleading because they do not properly reflect the information processed by the algorithm; therefore fairness criteria should be case-specific and sensitive to the nature of that information.
What carries the argument
Directed acyclic graphical models that encode causal structures and information flows between variables in algorithmic decision systems.
If this is right
- Equalized Odds will misclassify fairness in scenarios where the graph shows protected attributes influencing predictions through allowable paths.
- Calibration by Group shares similar limitations once information flows are modeled explicitly.
- The incompatibility of common criteria becomes visible as conflicting constraints on the same graph structure.
- Fairness evaluations require first specifying the exact variables and edges that represent the algorithm's inputs and processing steps.
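The first and third bullets can be illustrated with a toy graph. The sketch below is not taken from the paper: the chain A -> X -> Y, the predictor R = X, and every probability are hypothetical values chosen for illustration. It enumerates the joint distribution exactly and compares error rates and predictive values across groups.

```python
from itertools import product

# Hypothetical toy DAG (an assumption, not the paper's): A -> X -> Y.
# A: protected attribute, X: observed feature, Y: true outcome.
# The predictor is R = X, so A reaches R only through X.
P_A = {0: 0.5, 1: 0.5}           # P(A = a)
P_X1_GIVEN_A = {0: 0.3, 1: 0.7}  # P(X = 1 | A = a)
P_Y1_GIVEN_X = {0: 0.2, 1: 0.8}  # P(Y = 1 | X = x)


def joint():
    """Return {(a, x, y): probability} by exact enumeration of the DAG."""
    table = {}
    for a, x, y in product((0, 1), repeat=3):
        px = P_X1_GIVEN_A[a] if x == 1 else 1 - P_X1_GIVEN_A[a]
        py = P_Y1_GIVEN_X[x] if y == 1 else 1 - P_Y1_GIVEN_X[x]
        table[(a, x, y)] = P_A[a] * px * py
    return table


def cond_prob(table, event, given):
    """P(event | given); both arguments are predicates over (a, x, y)."""
    num = sum(p for k, p in table.items() if event(*k) and given(*k))
    den = sum(p for k, p in table.items() if given(*k))
    return num / den


def true_positive_rate(table, group):
    """P(R = 1 | Y = 1, A = group) with R = X."""
    return cond_prob(table, lambda a, x, y: x == 1,
                     lambda a, x, y: y == 1 and a == group)


def false_positive_rate(table, group):
    """P(R = 1 | Y = 0, A = group) with R = X."""
    return cond_prob(table, lambda a, x, y: x == 1,
                     lambda a, x, y: y == 0 and a == group)


def positive_predictive_value(table, group):
    """P(Y = 1 | R = 1, A = group) with R = X."""
    return cond_prob(table, lambda a, x, y: y == 1,
                     lambda a, x, y: x == 1 and a == group)


if __name__ == "__main__":
    t = joint()
    for g in (0, 1):
        print(f"A={g}: TPR={true_positive_rate(t, g):.3f} "
              f"FPR={false_positive_rate(t, g):.3f} "
              f"PPV={positive_predictive_value(t, g):.3f}")
```

In this toy graph the true positive rates differ across groups (about 0.63 versus 0.90), so Equalized Odds is violated, while the positive predictive value is 0.8 in both groups, so the group-wise calibration condition holds. Whether the path through X counts as allowable is a modeling judgment that the numbers alone cannot settle.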
Where Pith is reading between the lines
- Auditors may need to construct a custom graph for each application before selecting any fairness metric.
- Regulatory guidelines could shift from mandating fixed metrics toward requiring documented graphical models of the decision process.
- The same graphical approach might be applied to other fairness definitions such as individual fairness or counterfactual fairness to check consistency.
Load-bearing premise
The directed acyclic graphical models examined accurately capture the relevant causal structures and information flows in real algorithmic decision systems.
What would settle it
An empirical case study of a deployed decision system that compares the verdict of Equalized Odds with the fairness assessment implied by the system's graphical model. Agreement where the paper predicts divergence would count against the claim; divergence where the paper predicts it would support the claim.
Original abstract
A substantial portion of the literature on fairness in algorithms proposes, analyzes, and operationalizes simple formulaic criteria for assessing fairness. Two of these criteria, Equalized Odds and Calibration by Group, have gained significant attention for their simplicity and intuitive appeal, but also for their incompatibility. This chapter provides a perspective on the meaning and consequences of these and other fairness criteria using graphical models which reveals Equalized Odds and related criteria to be ultimately misleading. An assessment of various graphical models suggests that fairness criteria should ultimately be case-specific and sensitive to the nature of the information the algorithm processes.
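For reference, the two criteria named in the abstract are usually stated as conditional independence conditions. The notation below is an assumption of this note rather than the paper's own: $A$ is the protected attribute, $Y$ the true binary outcome, $\hat{Y}$ a binary prediction, and $R$ a risk score.

```latex
% Equalized Odds: the prediction is independent of A given the true outcome
\hat{Y} \perp A \mid Y
\quad\Longleftrightarrow\quad
P(\hat{Y}=1 \mid Y=y, A=a) = P(\hat{Y}=1 \mid Y=y, A=a')
\quad \text{for } y \in \{0,1\} \text{ and all } a, a'.

% Calibration by Group: the outcome is independent of A given the score
Y \perp A \mid R
\quad\Longleftrightarrow\quad
P(Y=1 \mid R=r, A=a) = P(Y=1 \mid R=r, A=a')
\quad \text{for all } r, a, a'.
```

Some authors use a stronger form of Calibration by Group that additionally requires $P(Y=1 \mid R=r, A=a) = r$; the version above only requires that a given score mean the same thing in every group.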
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes fairness criteria in algorithmic decision-making, focusing on Equalized Odds and Calibration by Group, through the lens of directed acyclic graphical models (DAGs). It illustrates incompatibilities and information-flow consequences under varying causal structures, arguing that these criteria are ultimately misleading and that appropriate fairness assessments must instead be case-specific, depending on the nature of the information processed by the algorithm.
Significance. If the illustrative DAGs capture the relevant causal structures, the work supplies a coherent conceptual framework for understanding why simple formulaic criteria can lead to unintended consequences. It strengthens the case for context-dependent fairness by deriving logical implications directly from the graphs rather than from empirical fitting, which suits a primarily theoretical contribution.
minor comments (3)
- The abstract refers to 'this chapter,' which suggests the work may be excerpted from a larger volume; clarify the standalone status and any cross-references in the introduction.
- Ensure that each DAG figure includes an explicit legend or caption defining all nodes (e.g., protected attribute, outcome, prediction) and the interpretation of directed edges.
- The section discussing Equalized Odds could benefit from a short table summarizing the conditions under which the criterion holds or fails across the examined graphs.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The referee's summary correctly identifies the core argument that DAGs reveal limitations in formulaic criteria such as Equalized Odds and Calibration by Group, supporting the need for case-specific fairness evaluations based on information flow.
Circularity Check
No significant circularity identified
The paper advances a conceptual argument by constructing illustrative DAGs to derive logical incompatibilities and information-flow consequences among fairness criteria such as Equalized Odds and Calibration. No equations, fitted parameters, or quantitative predictions appear; the derivations consist of direct implications from the chosen graph structures rather than any reduction to self-defined inputs or self-citation chains. The central claim that criteria must be case-specific follows from the explicit modeling choices without circular redefinition or renaming of known results. The analysis remains self-contained against external benchmarks because the graphs are presented as illustrative tools whose consequences are logically entailed by the stated causal assumptions.