arxiv: 1906.11333 · v1 · pith:D5SGH4QQnew · submitted 2019-06-26 · 💻 cs.CY · cs.AI

Fairness criteria through the lens of directed acyclic graphical models

Pith reviewed 2026-05-25 14:47 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords algorithmic fairness · equalized odds · calibration · directed acyclic graphs · causal models · fairness criteria · information flows

The pith

Directed acyclic graphical models show that Equalized Odds and similar fairness criteria are misleading.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines common fairness criteria such as Equalized Odds and Calibration by Group by representing algorithmic decisions as directed acyclic graphs. These models expose how the criteria fail to align with the actual causal structures and information flows in decision systems, and they demonstrate the known incompatibility between the two criteria. A sympathetic reader would care because the analysis indicates that formulaic fairness rules applied uniformly can produce misleading assessments. The paper concludes that fairness criteria must instead be selected case by case according to the specific information an algorithm processes.

Core claim

Representing fairness scenarios with directed acyclic graphical models reveals that Equalized Odds and related criteria are ultimately misleading because they do not properly reflect the information processed by the algorithm; therefore fairness criteria should be case-specific and sensitive to the nature of that information.

What carries the argument

Directed acyclic graphical models that encode causal structures and information flows between variables in algorithmic decision systems.
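
As a rough illustration of how a graph encodes information flow, the Python sketch below checks d-separation on three hypothetical three-node graphs: a confounder, a mediator, and a collider, echoing the roles named in the Figure 1 caption. The function names, the `parents` dictionary encoding, and the node labels are assumptions made for this example and are not taken from the paper. The check uses the standard moralized ancestral graph criterion.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG.

    `parents` maps each node to an iterable of its parents.
    Assumes x and y are distinct and not in `given`.
    """
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Build the undirected moral graph on the ancestral set:
    # link each child to its parents, and marry co-parents.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Search from x, never passing through a conditioned node.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in given:
            continue
        seen.add(v)
        stack.extend(nbrs[v] - seen)
    return True


# Hypothetical graphs; V2 plays the role named in each key.
confounder = {"V1": ["V2"], "V3": ["V2"]}   # V1 <- V2 -> V3
mediator = {"V2": ["V1"], "V3": ["V2"]}     # V1 -> V2 -> V3
collider = {"V2": ["V1", "V3"]}             # V1 -> V2 <- V3

for name, g in [("confounder", confounder), ("mediator", mediator), ("collider", collider)]:
    print(name, d_separated(g, "V1", "V3"), d_separated(g, "V1", "V3", given=["V2"]))
# confounder False True
# mediator False True
# collider True False
```

Conditioning on V2 blocks the flow in the confounder and mediator graphs but opens it in the collider graph, which is the asymmetry the graphical analysis relies on.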

If this is right

  • Equalized Odds will misclassify fairness in scenarios where the graph shows protected attributes influencing predictions through allowable paths.
  • Calibration by Group shares similar limitations once information flows are modeled explicitly.
  • The incompatibility of common criteria becomes visible as conflicting constraints on the same graph structure.
  • Fairness evaluations require first specifying the exact variables and edges that represent the algorithm's inputs and processing steps.
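
To make the incompatibility concrete, here is a minimal numerical sketch. It assumes a binary prediction, for which Calibration by Group is commonly read as equal positive and negative predictive values across groups. The group names, base rates, and error rates are invented for illustration and do not come from the paper.

```python
from fractions import Fraction as F


def predictive_values(base_rate, tpr, fpr):
    """Return (PPV, NPV) for a binary predictor in one group, via Bayes' rule."""
    pos = tpr * base_rate + fpr * (1 - base_rate)    # P(R = 1)
    ppv = tpr * base_rate / pos                      # P(Y = 1 | R = 1)
    npv = (1 - fpr) * (1 - base_rate) / (1 - pos)    # P(Y = 0 | R = 0)
    return ppv, npv


# Equalized Odds holds by construction: both groups share TPR and FPR.
TPR, FPR = F(4, 5), F(1, 5)
# Hypothetical groups that differ only in base rate P(Y = 1).
groups = {"a": F(1, 2), "b": F(1, 5)}

for name, base in groups.items():
    ppv, npv = predictive_values(base, TPR, FPR)
    print(name, ppv, npv)
# a 4/5 4/5
# b 1/2 16/17
```

With equal error rates but unequal base rates, the predictive values differ between groups, so this toy predictor satisfies Equalized Odds and fails the calibration-style condition.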

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Auditors may need to construct a custom graph for each application before selecting any fairness metric.
  • Regulatory guidelines could shift from mandating fixed metrics toward requiring documented graphical models of the decision process.
  • The same graphical approach might be applied to other fairness definitions such as individual fairness or counterfactual fairness to check consistency.

Load-bearing premise

The directed acyclic graphical models examined accurately capture the relevant causal structures and information flows in real algorithmic decision systems.

What would settle it

An empirical case study of a deployed decision system showing either that Equalized Odds agrees with the fairness assessment implied by the corresponding graphical model, or that it disagrees in a case where the model predicts agreement.

Figures

Figures reproduced from arXiv: 1906.11333 by Benjamin R. Baer, Daniel E. Gilbert, Martin T. Wells.

Figure 1. In the leftmost graph, V2 is a confounder, in the center graph, V2 is a mediator, and in the rightmost graph, V2 is a collider.
Figure 2. Because there are multiple paths from V1 to V4, this PDAG model may be unfaithful if the effect of V1 on V4 along one path perfectly counteracts the effect along the other path.
Figure 3. In this PDAG model, nodes V2 and V3 are d-separated a priori, that is, conditional on the empty set S = ∅. However, conditional on the collider S = {V5}, V2 and V3 are d-connected. V2 and V4 are d-separated given any of the following sets: {V1}, {V1, V5, V3} or {V1, V6, V3}.
Figure 4. In this PDAG model, V2 and V3 are confounded by the unobserved variable V1; this will render an expression such as P(V3 | do(V2)) unidentifiable.
Figure 5. The random variables in Scenario 1: various features and their relationship to race
Figure 6. This prediction R is d-separated from A and therefore satisfies Independence.
Figure 7. The prediction R depends on only the part of X2 independent of race A.
Figure 8. A modification of Scenario 1 in which the only part of…
Figure 9. Three examples of predictions, each where the prediction rule…
Figure 10. The prediction R takes as input both A and X2, whose effects conspire to violate faithfulness and make R ⊥⊥ A | Y
Figure 11. Here, the mean and variance of the distribution of…
Figure 12. The random variables in Scenario 2: various features and their relationship to gender
Figure 13. Two examples of predictions, each where the prediction rule…
Figure 14. A perhaps more realistic DAG that underlies the click-predicting task in Scenario 2.
Figure 15. An example of a DAG where all features are mediators between the sensitive charac…
Figure 16. A depiction of the prohibited paths between the random variables
Figure 17. The leftmost graph is the same example graph, and the rightmost graph shows the…
Figure 18. The random variables in Scenario 3: various features and their relationship to race
Figure 19. The random variables in Scenario 4: an unobserved variable, and various features
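
The faithfulness caveat in the Figure 2 caption can be sketched numerically. The example below assumes a linear model on a diamond-shaped graph (V1 → V2 → V4 and V1 → V3 → V4); the graph shape, the edge weights, and the function names are all hypothetical choices for illustration and are not taken from the paper. The weights are picked so that the two paths cancel.

```python
import random

# Hypothetical linear model on an assumed diamond graph:
#   V1 -> V2 -> V4  and  V1 -> V3 -> V4
# Edge weights are illustrative, chosen so the two paths cancel.
W12, W24 = 1.0, 2.0     # path through V2
W13, W34 = 1.0, -2.0    # path through V3


def total_effect():
    """Sum of path products: the net linear effect of V1 on V4."""
    return W12 * W24 + W13 * W34


def simulate(n=50_000, seed=0):
    """Draw n samples and return the sample covariance of V1 and V4."""
    rng = random.Random(seed)
    v1s, v4s = [], []
    for _ in range(n):
        v1 = rng.gauss(0, 1)
        v2 = W12 * v1 + rng.gauss(0, 1)
        v3 = W13 * v1 + rng.gauss(0, 1)
        v4 = W24 * v2 + W34 * v3 + rng.gauss(0, 1)
        v1s.append(v1)
        v4s.append(v4)
    m1, m4 = sum(v1s) / n, sum(v4s) / n
    return sum((a - m1) * (b - m4) for a, b in zip(v1s, v4s)) / (n - 1)


print(total_effect())   # 0.0: the two path effects cancel exactly
print(simulate())       # sample covariance, expected to be near zero
```

Although V1 is connected to V4 along two directed paths, the data would show no association between them, so an independence read off the data would not reflect the graph. This is the sense in which such a model is unfaithful.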

Original abstract

A substantial portion of the literature on fairness in algorithms proposes, analyzes, and operationalizes simple formulaic criteria for assessing fairness. Two of these criteria, Equalized Odds and Calibration by Group, have gained significant attention for their simplicity and intuitive appeal, but also for their incompatibility. This chapter provides a perspective on the meaning and consequences of these and other fairness criteria using graphical models which reveals Equalized Odds and related criteria to be ultimately misleading. An assessment of various graphical models suggests that fairness criteria should ultimately be case-specific and sensitive to the nature of the information the algorithm processes.
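
For orientation, the criteria named in the abstract and the figure captions can be written as conditional-independence statements. The block below uses the conventional symbols (A for the sensitive attribute, Y for the outcome, R for the prediction), which match the labels in the figure captions. It follows the form common in the fairness literature; the paper's own definitions may differ in detail.

```latex
\begin{align*}
\text{Independence:} \quad & R \perp\!\!\!\perp A \\
\text{Equalized Odds:} \quad & R \perp\!\!\!\perp A \mid Y \\
\text{Calibration by Group:} \quad & Y \perp\!\!\!\perp A \mid R
\end{align*}
```

Read against a graph, each statement asks whether the relevant pair of nodes is d-separated by the conditioning set, which is why the choice of graph structure determines which criteria can hold.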

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript analyzes fairness criteria in algorithmic decision-making, focusing on Equalized Odds and Calibration by Group, through the lens of directed acyclic graphical models (DAGs). It illustrates incompatibilities and information-flow consequences under varying causal structures, arguing that these criteria are ultimately misleading and that appropriate fairness assessments must instead be case-specific, depending on the nature of the information processed by the algorithm.

Significance. If the illustrative DAGs capture the relevant causal structures, the work supplies a coherent conceptual framework for understanding why simple formulaic criteria can lead to unintended consequences. It strengthens the case for context-dependent fairness by deriving logical implications directly from the graphs rather than empirical fitting, which is a strength for a primarily theoretical contribution.

minor comments (3)
  1. The abstract refers to 'this chapter,' which suggests the work may be excerpted from a larger volume; clarify the standalone status and any cross-references in the introduction.
  2. Ensure that each DAG figure includes an explicit legend or caption defining all nodes (e.g., protected attribute, outcome, prediction) and the interpretation of directed edges.
  3. The section discussing Equalized Odds could benefit from a short table summarizing the conditions under which the criterion holds or fails across the examined graphs.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The referee's summary correctly identifies the core argument that DAGs reveal limitations in formulaic criteria such as Equalized Odds and Calibration by Group, supporting the need for case-specific fairness evaluations based on information flow.

Circularity Check

0 steps flagged

No significant circularity identified

The paper advances a conceptual argument by constructing illustrative DAGs to derive logical incompatibilities and information-flow consequences among fairness criteria such as Equalized Odds and Calibration. No equations, fitted parameters, or quantitative predictions appear; the derivations consist of direct implications from the chosen graph structures rather than any reduction to self-defined inputs or self-citation chains. The central claim that criteria must be case-specific follows from the explicit modeling choices without circular redefinition or renaming of known results. The analysis remains self-contained against external benchmarks because the graphs are presented as illustrative tools whose consequences are logically entailed by the stated causal assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5619 in / 874 out tokens · 16223 ms · 2026-05-25T14:47:58.053345+00:00 · methodology
