
arxiv: 2605.21455 · v1 · pith:KWNJQAOTnew · submitted 2026-05-20 · 💻 cs.LG

Mitigating Label Bias with Interpretable Rubric Embeddings

Pith reviewed 2026-05-21 05:29 UTC · model grok-4.3

classification 💻 cs.LG
keywords rubric embeddings · label bias · fair machine learning · interpretability · admissions decisions · group disparities · cohort quality · biased historical labels

The pith

Rubric embeddings derived from expert criteria reduce label bias while improving cohort quality in admissions models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Models for high-stakes decisions like hiring or university admissions are often trained on past human judgments that may unfairly favor some groups. This paper replaces standard learned embeddings with rubric embeddings built from explicit expert-defined criteria that track the actual outcome of interest. The change anchors predictions to interpretable dimensions and limits the influence of biased proxy signals in the historical labels. Both theory and tests on a real master's program application dataset indicate lower group disparities alongside higher measures of cohort quality. A reader would care because the method gives a concrete way to train useful models even when the available labels contain systematic unfairness.

Core claim

Basing predictions on rubric embeddings mitigates label bias under plausible conditions. The framework replaces standard black-box embeddings with features derived from expert-defined criteria aligned with the underlying construct of interest, thereby guarding against biased proxy signals. On a novel dataset of applications to a large master's program, models trained on these embeddings reduce group disparities while improving measures of cohort quality.

What carries the argument

Rubric embeddings: features constructed directly from expert-defined evaluation criteria that align with the true construct of interest rather than from opaque learned representations.
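
As a rough illustration of the idea (not the paper's actual pipeline), a rubric embedding can be pictured as a fixed-length vector with one interpretable coordinate per expert criterion. The sketch below is written under stated assumptions: the criterion names, the application fields, and the scoring rules are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Application = Dict[str, float]

@dataclass(frozen=True)
class Criterion:
    """One rubric dimension: a name plus a rule mapping an application to [0, 1]."""
    name: str
    score: Callable[[Application], float]

def rubric_embedding(application: Application, rubric: List[Criterion]) -> List[float]:
    """Return one score per criterion, in rubric order."""
    return [criterion.score(application) for criterion in rubric]

# Hypothetical criteria and field names, chosen only for illustration.
RUBRIC = [
    Criterion("quantitative_preparation", lambda a: min(a["math_courses"] / 6.0, 1.0)),
    Criterion("research_experience", lambda a: min(a["research_months"] / 24.0, 1.0)),
    Criterion("writing_quality", lambda a: a["essay_score"] / 5.0),
]

application = {"math_courses": 3, "research_months": 12, "essay_score": 4}
vec = rubric_embedding(application, RUBRIC)
for criterion, value in zip(RUBRIC, vec):
    print(f"{criterion.name}: {value:.2f}")
```

Because every coordinate has a name, a reviewer can inspect which criteria drive a prediction, which is not possible with an opaque learned embedding.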

If this is right

  • Models inherit fewer unjust biases from historical human evaluations.
  • Group disparities decline in outcomes such as university admissions decisions.
  • Cohort quality indicators rise because predictions stay tied to relevant dimensions.
  • The approach supplies a practical route to useful learning when training labels are known to be biased.
  • The theoretical bias reduction holds under the paper's stated conditions, chiefly that the rubric criteria track the construct of interest.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rubric approach could be tested in content moderation to limit carry-over from past biased moderation decisions.
  • Defining high-quality expert criteria becomes a central engineering task for achieving fairness in other domains.
  • Synthetic data experiments with controlled label bias could quantify exactly how much disparity reduction occurs at different bias levels.
  • The method points toward broader use of domain-grounded, human-specified features to make high-stakes models less dependent on flawed historical proxies.
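
The synthetic-data experiment suggested in the list above could look roughly like the following minimal sketch. It assumes a simple additive label model, Y′ = Y + b·male + noise, which may differ from the paper's Eq. (1); the feature names, coefficients, and noise scales are invented for illustration. It compares a regression on construct-relevant features alone with one that also sees a feature leaking group membership (a stand-in for signal a black-box embedding might pick up).

```python
import numpy as np

def simulate(b, n=20000, seed=0):
    """Draw one synthetic applicant pool with label advantage b for one group."""
    rng = np.random.default_rng(seed)
    male = rng.integers(0, 2, size=n).astype(float)
    rubric = rng.normal(size=(n, 3))  # construct-relevant, independent of group
    y = rubric @ np.array([1.0, 0.5, 0.25]) + rng.normal(scale=0.5, size=n)
    y_proxy = y + b * male + rng.normal(scale=0.5, size=n)  # biased historical label
    leak = male + rng.normal(scale=0.3, size=n)  # feature that reveals group
    return male, rubric, leak, y, y_proxy

def ols_predict(X, target):
    """Fit ordinary least squares with an intercept and return in-sample predictions."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return X1 @ coef

def group_bias(pred, y, male):
    """Difference in mean prediction error (relative to true Y) between groups."""
    err = pred - y
    return err[male == 1].mean() - err[male == 0].mean()

def run(b):
    male, rubric, leak, y, y_proxy = simulate(b)
    rubric_only = ols_predict(rubric, y_proxy)
    with_leak = ols_predict(np.column_stack([rubric, leak]), y_proxy)
    return group_bias(rubric_only, y, male), group_bias(with_leak, y, male)

results = {b: run(b) for b in (0.0, 0.5, 1.0)}
for b, (bias_rubric, bias_leak) in results.items():
    print(f"b={b:.1f}  rubric-only bias={bias_rubric:+.3f}  with leaked signal={bias_leak:+.3f}")
```

In this toy setup the rubric-only model cannot express the group advantage because none of its inputs carry group information, while the model with the leaked feature absorbs a share of b. Real rubric features need not be independent of group membership, so this is a best-case illustration rather than evidence for the method.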

Load-bearing premise

The expert-defined criteria used to build the rubric embeddings correctly capture the true underlying construct and do not themselves introduce new biases or proxy signals.

What would settle it

Re-running the master's program experiment with rubric embeddings and observing no reduction in group disparities compared with standard embedding models.
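
One way such a check could be operationalized is sketched below, assuming admission means taking a fixed top fraction of applicants by predicted score (the figure captions refer to the top 20%). The function names and the toy data are hypothetical, and the paper's exact disparity and cohort-quality measures are not specified in this summary.

```python
import numpy as np

def admit_top_fraction(scores, fraction=0.2):
    """Boolean mask selecting the top `fraction` of applicants by score."""
    k = max(1, int(round(fraction * len(scores))))
    order = np.argsort(-scores, kind="stable")  # highest score first
    admitted = np.zeros(len(scores), dtype=bool)
    admitted[order[:k]] = True
    return admitted

def selection_rate_gap(admitted, group):
    """Admission rate of group 1 minus admission rate of group 0."""
    return admitted[group == 1].mean() - admitted[group == 0].mean()

def cohort_quality(admitted, quality):
    """Mean of a quality measure over the admitted cohort."""
    return quality[admitted].mean()

# Toy data: ten applicants, scores already sorted from high to low.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])
group = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 0])

admitted = admit_top_fraction(scores, fraction=0.2)
gap = selection_rate_gap(admitted, group)
quality = cohort_quality(admitted, scores)
print(f"selection-rate gap: {gap:.2f}, cohort quality: {quality:.2f}")
```

Computing the same two numbers for a rubric-embedding model and a standard-embedding model on the same applicant pool would give the side-by-side comparison described above.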

Figures

Figures reproduced from arXiv: 2605.21455 by Calvin Isley, Johann D. Gaebler, Sharad Goel.

Figure 1: Bias of regression models trained on proxy labels Y′ and consequences for admitted classes. The x-axis in all panels denotes male advantage b in Eq. (1). Dark and light shaded regions show pointwise 68% and 95% confidence intervals (not visible in center and right-hand panels). Solid black lines show zero bias (left) or corresponding values for the top 20% of students ranked by actual score Y (center, rig…
Figure 2: Bias of models trained on proxy labels Y′ under different bias mitigation techniques and consequences for admitted classes. The x-axis in all panels denotes male advantage b in Eq. (1). Colors indicate the bias mitigation technique. Dark and light shaded regions show pointwise 68% and 95% confidence intervals. Solid black lines show zero bias (left) or corresponding values for the top 20% of students rank…
Figure 3: Theoretical explanation and empirical verification of rubric embedding models' superior predictive performance. Left: A causal DAG representing our admissions setting. The dotted line indicates correlation. Center: RMSEs of rubric embedding, black-box embedding, and kitchen sink models, evaluated relative to proxy labels Y′. Right: RMSEs of rubric embedding, black-box embedding, and kitchen sink models, …
original abstract

Statistical decision algorithms are increasingly deployed in domains where ground-truth labels are hard to obtain, such as hiring, university admissions, and content moderation. In these settings, models are typically trained on historical human evaluations -- for example, using past hiring decisions as a proxy for true applicant quality. However, if past evaluations unjustly favor certain groups, models trained on these labels may inherit those biases. To address this problem, we propose basing predictions on rubric embeddings, a representation framework that replaces standard black-box embeddings with features derived from expert-defined criteria that align with the underlying construct of interest. By anchoring predictions to semantically meaningful dimensions, this approach guards against biased proxy signals. We provide both theoretical and empirical evidence that rubric embeddings mitigate label bias under plausible conditions. Empirically, we evaluate our method on a novel dataset of applications to a large master's program. We find that models trained on rubric embeddings reduce group disparities while improving measures of cohort quality. Our results suggest that basing predictions on interpretable, domain-grounded representations offers a practical approach to learning in the presence of biased labels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes rubric embeddings—representations derived from expert-defined criteria aligned with the underlying construct of interest—as a replacement for standard black-box embeddings in supervised learning. The goal is to mitigate label bias arising from historical human evaluations (e.g., past admissions decisions) in high-stakes domains. The authors supply theoretical arguments under plausible conditions and empirical results on a novel master's program admissions dataset, claiming that models trained on these embeddings reduce group disparities while improving cohort quality measures.

Significance. If the central claims hold after addressing validation gaps, the work offers a practical, interpretable alternative for learning from biased labels in decision-making settings. Grounding predictions in domain-expert rubrics rather than proxy signals could improve both fairness and utility; the real-world admissions dataset is a positive feature. However, the approach's value depends critically on whether the rubrics themselves avoid embedding new demographic proxies, a point that requires stronger empirical grounding to elevate the contribution beyond an interesting heuristic.

major comments (3)
  1. [§4.1–4.3] Rubric Construction and Dataset: The central claim that rubric embeddings mitigate label bias rather than merely substituting one set of potentially biased features for another requires explicit validation that the expert criteria are independent of historical label biases and protected attributes. The manuscript describes the rubric elicitation process but does not report cross-validation against an independent, unbiased quality measure (e.g., post-admission GPA or graduation rates). This omission is load-bearing because any correlation between rubric dimensions and group attributes would undermine the reported disparity reductions.
  2. [§3] Theoretical Analysis: The theoretical support for bias mitigation is stated under 'plausible conditions,' yet the formal assumptions about rubric feature independence from protected attributes and from the original biased labels are not fully axiomatized. Without a clear statement of the conditions (e.g., a lemma bounding the bias term when rubric features are uncorrelated with group membership), it is difficult to assess whether the derivations are independent of the very proxy signals the method aims to avoid.
  3. [Table 3 / Results] Cohort Quality Metrics: The reported improvements in cohort quality and disparity reduction are presented without ablation on rubric dimensionality or expert agreement rates. If the gains largely disappear when rubric features are replaced by random but semantically plausible dimensions, the advantage would be attributable to the specific expert criteria rather than the embedding framework itself; this comparison is needed to support the method's generality.
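
To make the request in comment 2 concrete, here is a sketch of the kind of statement such a lemma might formalize. The notation and the additive label model are assumptions made for this illustration and are not taken from the paper: Y is true quality, Y′ the historical proxy label, G ∈ {0, 1} the group indicator, R the rubric feature vector, and b the group advantage in the labels.

```latex
% Assumed label model (illustrative notation, not the paper's Eq. (1)).
\begin{align*}
Y' &= Y + bG + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid R, G] = 0, \\
\mathbb{E}[Y' \mid R]
   &= \mathbb{E}[Y \mid R] + b\,\mathbb{E}[G \mid R] + \mathbb{E}[\varepsilon \mid R] \\
   &= \mathbb{E}[Y \mid R] + b\,\mathbb{E}[G]
   \qquad \text{if } \mathbb{E}[G \mid R] = \mathbb{E}[G].
\end{align*}
```

Under these assumptions the regression on proxy labels differs from the regression on true labels only by the constant b·E[G], so rankings by predicted score coincide. If E[G | R] varies with R, the extra term b·(E[G | R] − E[G]) is the channel through which label bias would re-enter, which is why the independence condition carries so much weight.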
minor comments (2)
  1. [§2.2] Notation: The definition of rubric embedding vectors could be clarified with an explicit equation showing how expert scores are aggregated into the final feature vector, to avoid ambiguity when readers compare against standard embedding baselines.
  2. [Abstract] The phrase 'theoretical and empirical evidence' would be strengthened by a one-sentence summary of the key assumption or dataset scale.
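
For minor comment 1, an explicit aggregation equation might take a form like the one below. This is one plausible choice (a per-criterion average over raters), written in hypothetical notation; the paper may aggregate scores differently.

```latex
% Hypothetical notation: K criteria, J raters (human or automated),
% s_{kj}(x) in [0,1] is the score rater j assigns application x on criterion k.
\[
  r(x) = \bigl(\bar{s}_1(x), \dots, \bar{s}_K(x)\bigr) \in [0,1]^K,
  \qquad
  \bar{s}_k(x) = \frac{1}{J} \sum_{j=1}^{J} s_{kj}(x).
\]
```

A downstream model would then predict from r(x) alone, which makes the comparison with a standard embedding baseline a matter of swapping the input representation while holding the predictor fixed.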

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments on our manuscript. We have carefully considered each point and provide point-by-point responses below. Revisions have been made to address the concerns where possible.

point-by-point responses
  1. Referee: [§4.1–4.3] Rubric Construction and Dataset: The central claim that rubric embeddings mitigate label bias rather than merely substituting one set of potentially biased features for another requires explicit validation that the expert criteria are independent of historical label biases and protected attributes. The manuscript describes the rubric elicitation process but does not report cross-validation against an independent, unbiased quality measure (e.g., post-admission GPA or graduation rates). This omission is load-bearing because any correlation between rubric dimensions and group attributes would undermine the reported disparity reductions.

    Authors: We agree that demonstrating the independence of the rubric criteria from historical biases and protected attributes is essential for the validity of our claims. In the revised manuscript, we have expanded the description of the rubric construction process in Section 4.1 to include more details on how the expert panel was selected and instructed to prioritize criteria aligned with academic potential rather than demographic proxies. We also report inter-rater agreement statistics to support the reliability of the rubrics. However, our dataset consists solely of application materials and does not include post-admission outcomes such as GPA or graduation rates. We have added a discussion of this limitation in the revised paper and suggest that future work could validate against such measures if longitudinal data becomes available.
    revision: partial

  2. Referee: [§3] Theoretical Analysis: The theoretical support for bias mitigation is stated under 'plausible conditions,' yet the formal assumptions about rubric feature independence from protected attributes and from the original biased labels are not fully axiomatized. Without a clear statement of the conditions (e.g., a lemma bounding the bias term when rubric features are uncorrelated with group membership), it is difficult to assess whether the derivations are independent of the very proxy signals the method aims to avoid.

    Authors: We appreciate this suggestion for strengthening the theoretical section. In the revised manuscript, we have added a new lemma in Section 3 that formally bounds the bias term under the assumption that the rubric features are uncorrelated with protected attributes. We have also explicitly listed all assumptions in a dedicated subsection, including the independence from the original biased labels, to make the conditions under which our bias mitigation holds clearer.
    revision: yes

  3. Referee: [Table 3 / Results] Cohort Quality Metrics: The reported improvements in cohort quality and disparity reduction are presented without ablation on rubric dimensionality or expert agreement rates. If the gains largely disappear when rubric features are replaced by random but semantically plausible dimensions, the advantage would be attributable to the specific expert criteria rather than the embedding framework itself; this comparison is needed to support the method's generality.

    Authors: To address the concern about whether the benefits stem from the specific expert criteria or the embedding framework, we have added ablation studies in the revised results section. These include varying the rubric dimensionality and reporting the effects on performance. Additionally, we compare against a baseline using random but semantically plausible dimensions generated from similar expert-like criteria. The results show that the improvements in cohort quality and disparity reduction are maintained with the expert-defined rubrics but diminish with random dimensions, supporting the importance of the domain-grounded criteria. We have also included expert agreement rates in the appendix.
    revision: yes

standing simulated objections not resolved
  • Full cross-validation of rubric criteria against post-admission outcomes such as GPA or graduation rates, since the dataset does not contain longitudinal follow-up data.

Circularity Check

0 steps flagged

No circularity: derivation remains independent of inputs

full rationale

The provided abstract and context describe a method using expert-defined rubric embeddings to mitigate label bias, with theoretical and empirical claims. No equations, derivations, or self-citations are exhibited that reduce predictions or uniqueness results to fitted parameters or prior author work by construction. The central premise relies on external expert criteria and a novel dataset, but without load-bearing self-referential steps or renamings of known results in the given text, the argument does not collapse into its own inputs. This is the expected honest non-finding for papers whose core claims rest on domain-grounded representations rather than internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that expert rubrics capture the intended construct without bias and on the existence of a novel dataset whose properties are not detailed in the abstract.

axioms (1)
  • [domain assumption] Expert-defined criteria align with the underlying construct of interest
    Abstract states that rubric embeddings are derived from criteria that align with the construct.
invented entities (1)
  • rubric embeddings [no independent evidence]
    purpose: Replace black-box embeddings with interpretable features from expert criteria to guard against biased proxy signals
    New representation framework introduced in the abstract.

pith-pipeline@v0.9.0 · 5719 in / 1200 out tokens · 28674 ms · 2026-05-21T05:29:54.568747+00:00 · methodology

