CAFP: A Post-Processing Framework for Group Fairness via Counterfactual Model Averaging
Pith reviewed 2026-05-10 17:23 UTC · model grok-4.3
The pith
Averaging a model's predictions on factual and counterfactual inputs eliminates direct dependence on the protected attribute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CAFP operates by generating counterfactual versions of each input in which the sensitive attribute is flipped, and then averaging the model's predictions across factual and counterfactual instances. This eliminates direct dependence on the protected attribute, reduces mutual information between predictions and sensitive attributes, and provably bounds the distortion introduced relative to the original model. Under mild assumptions, CAFP achieves perfect demographic parity and reduces the equalized odds gap by at least half the average counterfactual bias.
What carries the argument
Counterfactual model averaging: averaging the original model's output on the factual input with its output on the input after flipping the value of the protected attribute.
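The averaging step can be sketched in a few lines. This is a minimal illustration, assuming a binary protected attribute and a black-box scorer that accepts the attribute as a separate argument; the names `cafp_predict`, `predict_proba`, and `toy_model` are hypothetical and do not come from the paper.

```python
import numpy as np

def cafp_predict(predict_proba, X, a):
    """Average a black-box score on the factual and attribute-flipped inputs.

    predict_proba: hypothetical callable (X, a) -> scores in [0, 1]
    X: non-protected features, shape (n, d)
    a: binary protected attribute, shape (n,)
    """
    a = np.asarray(a)
    factual = predict_proba(X, a)
    counterfactual = predict_proba(X, 1 - a)  # flip only the protected attribute
    return 0.5 * (factual + counterfactual)

# Toy base model (an assumption for illustration): a logistic score
# that depends directly on the protected attribute.
def toy_model(X, a):
    logits = X[:, 0] + 2.0 * a
    return 1.0 / (1.0 + np.exp(-logits))

X = np.array([[0.0], [0.0], [1.0], [1.0]])
a = np.array([0, 1, 0, 1])
scores = cafp_predict(toy_model, X, a)
```

In this toy setup, rows that share the same features but differ in the protected attribute receive the same averaged score, which is the sense in which direct dependence on the attribute is removed.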
Load-bearing premise
Realistic counterfactual inputs can be created by simply flipping the protected attribute value and the original model can be queried on these at inference time.
What would settle it
Test the averaged model on a dataset where flipping the protected attribute produces inputs that are out of distribution or implausible, and check whether the demographic parity or equalized odds guarantees still hold.
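A toy version of such a check might look like the sketch below. It is not the paper's experiment: the base scorer, the threshold of 0.5, and the two hand-built feature sets are all assumptions chosen for illustration. It compares the demographic parity gap of the averaged predictor when the remaining feature is distributed identically across groups against a case where that feature is correlated with the protected attribute.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def toy_model(X, a):
    # Hypothetical base scorer that uses the protected attribute directly.
    return sigmoid(X[:, 0] + a - 0.5)

def averaged_scores(model, X):
    # Average over both values of a binary attribute; the result
    # no longer takes the attribute as an input.
    zeros = np.zeros(len(X), dtype=int)
    return 0.5 * (model(X, zeros) + model(X, zeros + 1))

def demographic_parity_gap(scores, a, threshold=0.5):
    # Absolute difference in positive-decision rates between the two groups.
    decisions = scores >= threshold
    return abs(decisions[a == 1].mean() - decisions[a == 0].mean())

a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Same feature values in both groups.
X_independent = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float).reshape(-1, 1)
# Feature values skewed by group (a proxy for the protected attribute).
X_correlated = np.array([-1, -1, -1, 1, 1, 1, 1, -1], dtype=float).reshape(-1, 1)

gap_independent = demographic_parity_gap(averaged_scores(toy_model, X_independent), a)
gap_correlated = demographic_parity_gap(averaged_scores(toy_model, X_correlated), a)
```

In this constructed example the gap is zero when the feature is identically distributed across groups and nonzero when it is skewed by group, so any real test of the guarantees would need to report which regime the dataset falls in.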
Original abstract
Ensuring fairness in machine learning predictions is a critical challenge, especially when models are deployed in sensitive domains such as credit scoring, healthcare, and criminal justice. While many fairness interventions rely on data preprocessing or algorithmic constraints during training, these approaches often require full control over the model architecture and access to protected attribute information, which may not be feasible in real-world systems. In this paper, we propose Counterfactual Averaging for Fair Predictions (CAFP), a model-agnostic post-processing method that mitigates unfair influence from protected attributes without retraining or modifying the original classifier. CAFP operates by generating counterfactual versions of each input in which the sensitive attribute is flipped, and then averaging the model's predictions across factual and counterfactual instances. We provide a theoretical analysis of CAFP, showing that it eliminates direct dependence on the protected attribute, reduces mutual information between predictions and sensitive attributes, and provably bounds the distortion introduced relative to the original model. Under mild assumptions, we further show that CAFP achieves perfect demographic parity and reduces the equalized odds gap by at least half the average counterfactual bias.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CAFP, a model-agnostic post-processing framework for group fairness. For each test input, it generates a counterfactual by flipping the protected attribute A and averages the original model's predictions on the factual and counterfactual versions. The paper claims this eliminates direct dependence on A, reduces mutual information between predictions and A, bounds distortion relative to the original model, and—under mild assumptions—achieves perfect demographic parity while reducing the equalized odds gap by at least half the average counterfactual bias.
Significance. If the theoretical claims are rigorously established and the assumptions hold in practice, CAFP would supply a simple, training-free post-processing technique applicable to any black-box classifier. This could be useful in deployment settings where retraining is infeasible. The post-processing design and the explicit bounds on distortion and information leakage are potentially valuable, but their impact is limited by the requirement for protected-attribute access at inference.
major comments (3)
- [Abstract / Method] Abstract and Method section: The perfect demographic parity guarantee is obtained only by averaging f(x, A) and f(x, 1-A) at inference time. This construction mathematically forces identical output distributions across groups solely when A is observed and the flipped input is a valid query to the original model. No alternative procedure is supplied for the common case in which A is withheld at deployment; the fairness claims therefore do not hold under that realistic constraint.
- [Theoretical Analysis] Theoretical Analysis section: The stated bounds on mutual-information reduction and distortion, as well as the 'at least half' reduction in the equalized-odds gap, appear to follow directly from the averaging definition itself rather than from an independent derivation. Explicit equations, proof sketches, and the precise 'mild assumptions' must be provided to demonstrate that the results are not tautological with the method's construction.
- [Method] Method section: The assumption that simply flipping the value of A produces realistic counterfactuals is load-bearing for all fairness guarantees. When features are correlated with A, the counterfactual (x, 1-A) may lie far outside the data distribution, rendering the averaged prediction meaningless and invalidating the claimed bounds.
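The first and third major comments can be made concrete with a short worked equation. The symbol g for the averaged predictor is introduced here for illustration, and the sufficient condition in the last line is an assumption of this sketch, not a statement taken from the paper, which does not enumerate its assumptions.

```latex
% Averaged predictor for a binary protected attribute A \in \{0, 1\}
g(x) = \tfrac{1}{2}\bigl[f(x, 0) + f(x, 1)\bigr]

% g takes no attribute argument, but its group-conditional mean still
% depends on how the features are distributed within each group:
\mathbb{E}\bigl[g(X) \mid A = a\bigr] = \int g(x)\, p(x \mid A = a)\, dx

% A sufficient condition for equal group means (assumed here):
p(x \mid A = 0) = p(x \mid A = 1)
\;\Longrightarrow\;
\mathbb{E}\bigl[g(X) \mid A = 0\bigr] = \mathbb{E}\bigl[g(X) \mid A = 1\bigr]
```

Read this way, evaluating g requires both f(x, 0) and f(x, 1) to be valid queries, and when the features carry information about A the group means can differ even though g never reads A.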
minor comments (2)
- [Abstract] Abstract: The phrase 'mild assumptions' is used without enumeration; these assumptions should be stated explicitly so readers can assess their realism.
- [Experiments] Throughout: No empirical results, tables, or figures are referenced in the provided abstract or summary; if experiments exist, they should be summarized to illustrate that the theoretical reductions materialize on real data.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating planned revisions to the manuscript where appropriate.
Referee: [Abstract / Method] Abstract and Method section: The perfect demographic parity guarantee is obtained only by averaging f(x, A) and f(x, 1-A) at inference time. This construction mathematically forces identical output distributions across groups solely when A is observed and the flipped input is a valid query to the original model. No alternative procedure is supplied for the common case in which A is withheld at deployment; the fairness claims therefore do not hold under that realistic constraint.
Authors: We agree that the exact demographic parity guarantee requires access to the protected attribute A at inference time to construct and query the counterfactual input. This is a core aspect of the post-processing design presented in the Method section. We will revise the Abstract to explicitly state the inference-time access requirement and add a dedicated paragraph in the Method section discussing deployment scenarios where A is unavailable. In those cases, CAFP cannot be applied directly, and we will note that alternative approaches (such as those relying on proxies or training-time interventions) would be needed instead. revision: yes
Referee: [Theoretical Analysis] Theoretical Analysis section: The stated bounds on mutual-information reduction and distortion, as well as the 'at least half' reduction in the equalized-odds gap, appear to follow directly from the averaging definition itself rather than from an independent derivation. Explicit equations, proof sketches, and the precise 'mild assumptions' must be provided to demonstrate that the results are not tautological with the method's construction.
Authors: The properties do derive from the averaging construction, but the Theoretical Analysis section presents them as formal theorems obtained via probabilistic arguments applied to the averaged predictor. We will expand this section substantially by inserting the explicit equations (e.g., the mutual-information bound I(Ŷ;A) ≤ ½ I(f(X,A);A), the distortion bound in terms of total variation, and the equalized-odds gap reduction), step-by-step proof sketches, and a precise list of the mild assumptions (including that the base model is defined on the augmented feature space and that averaging is performed exactly). These additions will clarify the derivations and show they are not merely restatements of the method. revision: yes
Referee: [Method] Method section: The assumption that simply flipping the value of A produces realistic counterfactuals is load-bearing for all fairness guarantees. When features are correlated with A, the counterfactual (x, 1-A) may lie far outside the data distribution, rendering the averaged prediction meaningless and invalidating the claimed bounds.
Authors: We acknowledge that the realism of the counterfactual inputs is a key assumption underlying the guarantees. The paper treats this as one of the 'mild assumptions' under which the bounds hold, without requiring the counterfactual to lie in the training distribution—only that the original model can evaluate it. When features are strongly correlated with A, the averaged output may indeed be less interpretable. We will revise the Method section to state the assumption more explicitly and add a new Limitations paragraph that discusses this issue, references related work on counterfactual generation, and notes that more advanced causal models could be used to improve counterfactual quality in practice. revision: partial
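One way to see why the referee asks whether the bounds follow from the construction is to write out the pointwise distortion. The identity below holds for any averaged predictor of this form; the symbol g and the assumption that f takes values in [0, 1] are introduced here for illustration, and it is not confirmed that this is the form of the distortion bound stated in the paper.

```latex
% Averaged predictor evaluated at a factual pair (x, a), with a \in \{0, 1\}
g(x) = \tfrac{1}{2}\bigl[f(x, a) + f(x, 1 - a)\bigr]

% Pointwise distortion relative to the original model
g(x) - f(x, a) = \tfrac{1}{2}\bigl[f(x, 1 - a) - f(x, a)\bigr]

% Hence, assuming f(x, \cdot) \in [0, 1]:
\bigl|g(x) - f(x, a)\bigr|
  = \tfrac{1}{2}\bigl|f(x, a) - f(x, 1 - a)\bigr|
  \le \tfrac{1}{2}
```

The distortion at each point is exactly half the counterfactual gap of the base model at that point, which is a direct algebraic consequence of averaging two terms.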
Circularity Check
No significant circularity in CAFP derivation chain
The paper defines CAFP via counterfactual averaging of model outputs and separately provides a theoretical analysis deriving fairness properties (elimination of direct dependence, MI reduction, distortion bounds, perfect DP and EO gap reduction under mild assumptions). No equations or steps are exhibited where a claimed result reduces exactly to the input definition by construction, no self-citations load-bear the central claims, and no fitted parameters are relabeled as predictions. The analysis is presented as independent derivation from the averaging operator plus stated assumptions, making the chain self-contained rather than tautological.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Mild assumptions that enable perfect demographic parity after averaging
- domain assumption: Counterfactual inputs can be generated by flipping the protected attribute value