AI Alignment Amplifies the Role of Race, Gender, and Disability in Hiring Decisions
Pith reviewed 2026-05-15 06:09 UTC · model grok-4.3
The pith
Post-training alignment amplifies hiring advantages for female and Black candidates by over 300 percent while increasing disadvantages for disabled candidates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 27 models and 177 occupations, language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates while giving disabled candidates disadvantages. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Alignment increases returns to skills and work experience overall, but relatively more so for female and Black candidates, while the absence of qualification signals harms marginalised groups more, particularly disabled candidates.
What carries the argument
The direct comparison of matched pre-trained and post-training-aligned models in identical simulated hiring prompts, which isolates how alignment changes the weight placed on demographic signals versus qualification signals.
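The matched comparison can be read as a paired difference. Score the same resume twice, changing only the demographic signal, and average the within-pair gaps separately for the pre-trained and the aligned model. The sketch below assumes each model returns a numeric hiring score, for example a probability of answering "hire". The helper `paired_gap` and every score are hypothetical illustrations, not the paper's code or data.

```python
from statistics import mean


def paired_gap(treated_scores, control_scores):
    """Average within-pair score difference between matched profiles.

    Element i of each list scores the same resume, differing only in the
    demographic signal (treated group vs. comparison group).
    """
    if len(treated_scores) != len(control_scores):
        raise ValueError("profiles must be matched one-to-one")
    return mean(t - c for t, c in zip(treated_scores, control_scores))


# Hypothetical hiring scores for four matched resumes (female vs. male signal).
pre_trained = {"female": [0.52, 0.61, 0.48, 0.70], "male": [0.50, 0.60, 0.47, 0.68]}
aligned = {"female": [0.60, 0.70, 0.55, 0.79], "male": [0.54, 0.64, 0.50, 0.72]}

pre_gap = paired_gap(pre_trained["female"], pre_trained["male"])
aligned_gap = paired_gap(aligned["female"], aligned["male"])

# Change in the demographic gap from the pre-trained to the aligned model,
# with prompts and resumes held fixed across the two models.
alignment_shift = aligned_gap - pre_gap
print(f"pre={pre_gap:.3f} aligned={aligned_gap:.3f} shift={alignment_shift:.3f}")
```

Because resume content is identical within each pair, qualification differences cancel in the subtraction. Under this sketch's assumptions, whatever gap remains is attributable to the demographic signal.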
Load-bearing premise
Simulated hiring decisions by the language models accurately reflect real-world hiring biases without being strongly shaped by the specific prompt designs or training data distributions.
What would settle it
A field study in which employers actually use the aligned models to screen real resumes and then measure whether the same demographic advantages and disadvantages appear in callback or interview rates.
Original abstract
Humans increasingly delegate decisions to language models, yet whether these systems reproduce or reshape human patterns of discrimination remains unclear. Here we run a large-scale study to analyse whether language models use demographic information in hiring decisions. We show, across 27 models and 177 occupations, that language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates, while giving disabled candidates disadvantages. The differences are meaningful in magnitude: the role of race, gender, and disability status is comparable to six months to one year of additional education. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Compared with previous human correspondence studies, language models reverse the direction of racial discrimination, attenuate the disability penalty, and amplify the female advantage by 190%. Alignment changes how models use qualification signals: alignment increases returns to skills and work experience overall, but relatively more so for female and Black candidates. Meanwhile, the absence of qualification signals harms marginalised groups more, particularly for disabled candidates, differences that may explain the asymmetry of alignment effects across groups we observe.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a large-scale empirical study across 27 language models and 177 occupations in which models are prompted to make simulated hiring decisions. It finds that language models assign hiring advantages to female and Black candidates relative to matched male and white candidates, and disadvantages to disabled candidates. These effects are substantial in magnitude, comparable to six months to one year of additional education. Post-training alignment is identified as the primary driver, amplifying the female and Black advantages by 325% and 330% and the disability disadvantage by 171% relative to matched pre-trained models. The study further compares these patterns to prior human correspondence studies, noting reversals in racial discrimination direction and amplification of the female advantage, and examines how alignment alters returns to qualification signals such as skills and experience.
Significance. If the central empirical comparisons hold, the results indicate that alignment procedures can systematically reshape demographic biases in LLM decision-making, with implications for deploying these models in hiring and other high-stakes domains. The scale of the study (27 models, 177 occupations) and the direct pre- versus post-alignment contrasts provide a broad empirical basis. The finding that alignment increases returns to qualifications overall but differentially across groups, and that missing qualification signals harm marginalized groups more, offers testable mechanisms that could guide future alignment research.
major comments (3)
- §3 (Experimental Setup): The central claim that alignment is the primary driver of the reported amplification effects (325% for female candidates, 330% for Black candidates, 171% for disabled candidates) rests on comparisons between matched pre- and post-trained models. However, the manuscript provides no robustness checks across alternative prompt phrasings or framings of demographic attributes. This is load-bearing because the measured differences could arise from interactions between alignment and the specific elicitation method rather than a general property of aligned models.
- §4.1 (Magnitude Comparisons): The claim that demographic effects are 'comparable to six months to one year of additional education' requires the underlying regression specification and equivalence calculation to be fully specified. Without details on controls for occupation fixed effects, model-specific baselines, or how the education-equivalent metric is derived, it is difficult to evaluate whether the reported magnitudes are robust or sensitive to modeling choices.
- §5 (Human Comparison): The reversal of racial discrimination direction relative to human correspondence studies is a key interpretive claim. The manuscript should clarify whether the human benchmarks are drawn from studies with comparable occupation coverage and decision criteria; any mismatch in task framing could undermine the direct comparison of amplification factors.
minor comments (2)
- Table 1: The caption should explicitly state the number of observations per model-occupation cell and whether standard errors are clustered at the model or occupation level.
- Notation: The term 'alignment' is used throughout to refer to post-training procedures; a brief footnote distinguishing RLHF, instruction tuning, and other techniques would improve clarity for readers outside the alignment literature.
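The prompt-robustness check requested in the §3 comment amounts to crossing each hiring scenario with several ways of signalling the same attribute. A minimal sketch, assuming two hypothetical scenario phrasings and two framings (an explicit group label versus a name cue). The templates, the `build_prompts` helper, and the example names are illustrative assumptions, not the paper's prompts.

```python
from itertools import product

# Hypothetical phrasings of the same hiring scenario.
SCENARIOS = [
    "You are screening applicants for a {occupation} role. {candidate} "
    "Should this candidate be hired? Answer yes or no.",
    "A company is hiring a {occupation}. {candidate} "
    "Would you recommend hiring this applicant? Answer yes or no.",
]

# Hypothetical framings of the demographic signal: explicit label vs. name cue.
FRAMINGS = {
    "label": "The applicant is a {group} candidate with {years} years of experience.",
    "name": "The applicant, {name}, has {years} years of experience.",
}

# Hypothetical name cues for each group (illustrative only).
NAMES = {"female": "Emily", "male": "Greg"}


def build_prompts(occupation, years):
    """Cross every scenario with every framing and every group."""
    prompts = []
    for (idx, scenario), (framing, template), group in product(
        enumerate(SCENARIOS), FRAMINGS.items(), NAMES
    ):
        # str.format ignores unused keyword arguments, so one call fits both framings.
        candidate = template.format(group=group, name=NAMES[group], years=years)
        prompts.append(
            {
                "scenario": idx,
                "framing": framing,
                "group": group,
                "text": scenario.format(occupation=occupation, candidate=candidate),
            }
        )
    return prompts


for p in build_prompts("registered nurse", 5)[:2]:
    print(p["framing"], p["group"], "|", p["text"])
```

Estimating the alignment gap separately within each (scenario, framing) cell, and checking that its sign and size stay stable across cells, is one way to probe whether the effect depends on the elicitation method.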
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and robustness of our analysis. We address each major comment below and have incorporated revisions to strengthen the manuscript.
Point-by-point responses
- Referee: §3 (Experimental Setup): The central claim that alignment is the primary driver of the reported amplification effects (325% for female candidates, 330% for Black candidates, 171% for disabled candidates) rests on comparisons between matched pre- and post-trained models. However, the manuscript provides no robustness checks across alternative prompt phrasings or framings of demographic attributes. This is load-bearing because the measured differences could arise from interactions between alignment and the specific elicitation method rather than a general property of aligned models.
  Authors: We acknowledge the importance of testing robustness to prompt variations to ensure our findings are not artifacts of the specific elicitation method. In the revised manuscript, we have added robustness analyses using alternative prompt phrasings, including rephrased hiring scenarios and varied framings of demographic attributes (e.g., using names vs. explicit labels). These additional checks show that the amplification effects due to alignment remain consistent in direction and magnitude, indicating that the results reflect general properties of post-training alignment rather than prompt-specific interactions. The full set of prompts used is now detailed in the appendix. Revision: yes.
- Referee: §4.1 (Magnitude Comparisons): The claim that demographic effects are 'comparable to six months to one year of additional education' requires the underlying regression specification and equivalence calculation to be fully specified. Without details on controls for occupation fixed effects, model-specific baselines, or how the education-equivalent metric is derived, it is difficult to evaluate whether the reported magnitudes are robust or sensitive to modeling choices.
  Authors: We agree that full specification is necessary for evaluating the magnitude claims. We have revised §4.1 to include the complete regression equation, which incorporates occupation fixed effects, model fixed effects, and controls for other candidate attributes. The education-equivalent metric is derived by dividing the coefficient on the demographic indicator by the coefficient on years of education from the same regression, then converting years to months. We now report this calculation explicitly, along with robustness to alternative controls and specifications, confirming the reported range holds. Revision: yes.
- Referee: §5 (Human Comparison): The reversal of racial discrimination direction relative to human correspondence studies is a key interpretive claim. The manuscript should clarify whether the human benchmarks are drawn from studies with comparable occupation coverage and decision criteria; any mismatch in task framing could undermine the direct comparison of amplification factors.
  Authors: We have expanded the discussion in §5 to address comparability. The human benchmarks are primarily from large-scale correspondence studies (e.g., Bertrand and Mullainathan 2004 and subsequent replications) that cover a broad range of occupations similar to our 177 occupations, including both professional and non-professional roles. Decision criteria are resume-based callback rates, which align with our simulated hiring decisions. We now include a supplementary table detailing occupation overlap and note that while task framing differs (real vs. simulated), the directional reversal in racial effects is robust. We acknowledge that exact equivalence is limited but argue the comparison highlights meaningful shifts. Revision: yes.
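The education-equivalent conversion described in the §4.1 response is a ratio of two coefficients from the same regression, rescaled from years to months: months = 12 × β_group / β_education. A sketch of that arithmetic, with a hypothetical helper name and hypothetical coefficient values (not the paper's estimates):

```python
def education_equivalent_months(beta_group, beta_education_per_year):
    """Convert a demographic coefficient into months of additional education.

    Both coefficients must come from the same regression, with education
    measured in years, so their ratio is in years; multiply by 12 for months.
    """
    if beta_education_per_year == 0:
        raise ValueError("education coefficient must be non-zero")
    return 12 * beta_group / beta_education_per_year


# Hypothetical coefficients on the probability of a "hire" decision.
beta_education = 0.030  # effect of one extra year of education
for group, beta in {"female": 0.020, "Black": 0.025, "disabled": -0.015}.items():
    months = education_equivalent_months(beta, beta_education)
    print(f"{group}: {months:+.1f} months of education")
```

A negative result reads as a penalty equivalent to that many fewer months of education. The conversion is only meaningful if both coefficients are on the same outcome scale and come from one specification.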
Circularity Check
No circularity: results from direct empirical model comparisons
Full rationale
The paper derives its central claims through explicit large-scale simulations of hiring decisions across 27 models and 177 occupations, directly measuring output differences between matched pre-trained and post-alignment versions. No parameters are fitted to data subsets and then re-predicted as results, no self-definitional loops exist in any equations or definitions, and no load-bearing steps reduce to self-citations or imported uniqueness theorems. The amplification percentages (325%, 330%, 171%) are computed from observed output ratios rather than constructed by re-labeling inputs. The derivation remains self-contained against external benchmarks of model behavior.
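The page does not spell out how the amplification percentages are defined. One plausible reading, stated here as an assumption, is the percent change in the magnitude of a group's effect from the pre-trained to the aligned model. Under that reading, 325% would mean the aligned effect is about 4.25 times the pre-trained one. If the paper instead reports a plain ratio, the same figures would read differently, so the definition should be checked against the Methods. A sketch with a hypothetical helper and hypothetical effect sizes:

```python
def amplification_pct(pre_effect, aligned_effect):
    """Percent change in effect magnitude from pre-trained to aligned model.

    Uses magnitudes, so a larger penalty (more negative effect) also counts
    as amplification. Assumes both effects share the same sign.
    """
    if pre_effect == 0:
        raise ValueError("pre-trained effect must be non-zero")
    if pre_effect * aligned_effect < 0:
        raise ValueError("effects differ in sign; percent amplification is undefined")
    return (abs(aligned_effect) - abs(pre_effect)) / abs(pre_effect) * 100


# Hypothetical effects on the hiring score (not the paper's estimates).
print(amplification_pct(0.010, 0.030))    # advantage triples: about +200%
print(amplification_pct(-0.020, -0.050))  # penalty grows: about +150%
```

The sign check matters because a reversal (for example, a penalty becoming an advantage) is not an amplification and would need to be reported separately.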
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Language model responses to hiring prompts can be used to measure demographic biases in decision-making.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
- [2] Lambrecht, A. & Tucker, C. Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads. Manage. Sci. 65, 2966–2981 (2019).
- [3] Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
- [4] Lippens, L., Vermeiren, S. & Baert, S. The state of hiring discrimination: A meta-analysis of (almost) all recent correspondence experiments. Eur. Econ. Rev. 151, 104315 (2023).
- [5] Wang, Z. et al. Jobfair: A framework for benchmarking gender hiring bias in large language models. Findings of the Association for Computational Linguistics: EMNLP 2024, 3227–3246 (2024).
- [6]
- [7] Gaebler, J. D., Goel, S., Huq, A. & Tambe, P. Auditing large language models for race and gender disparities: Implications for artificial intelligence-based hiring. Behav. Sci. Policy 10, 46–55 (2024).
- [8] Glazko, K., Mohammed, Y., Kosa, B., Potluri, V. & Mankoff, J. Identifying and improving disability bias in GPT-based resume screening. Proc. ACM FAccT, 687–700 (2024).
- [9] Hofmann, V., Kalluri, P. R., Jurafsky, D. & King, S. AI generates covertly racist decisions about people based on their dialect. Nature 633, 147–154 (2024).
- [10] National Center for O*NET Development. O*NET 29.2 database. https://www.onetcenter.org/db_releases.html (2025).
- [11] U.S. Bureau of Labor Statistics. Employed persons by detailed occupation, sex, race, and Hispanic or Latino ethnicity, 2024 annual averages. https://www.bls.gov/cps/cps_aa2024.htm (2024).
- [12] U.S. Bureau of Labor Statistics. Occupational employment and wage statistics, May 2024, all data. https://www.bls.gov/oes/tables.htm (2024).
- [13] U.S. Bureau of Labor Statistics. Persons with a disability: Labor force characteristics – 2024. https://www.bls.gov/news.release/disabl.t03.htm (2025).
- [14] Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
- [15] Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J. & Mullainathan, S. Human decisions and machine predictions. Q. J. Econ. 133, 237–293 (2018).
- [16] Lippens, L. Computer says ‘no’: Exploring systemic bias in ChatGPT using an audit approach. Information Economics and Policy 68, 101145 (2024).
- [17] Cheng, K. H. R., Castelo, N., Joshi, P. D. & Bhatia, S. Sycophantic AI decreases prosocial intentions and promotes dependence. Science (2026).
- [18] Becker, G. S. The Economics of Discrimination (University of Chicago Press, 1957).
- [19] Phelps, E. S. The statistical theory of racism and sexism. Am. Econ. Rev. 62, 659–661 (1972).
- [20] Arrow, K. The theory of discrimination. Working Paper 30A, Princeton University, Industrial Relations Section (1971).
- [21] Aigner, D. J. & Cain, G. G. Statistical theories of discrimination in labor markets. ILR Rev. 30, 175–187 (1977).
- [22] Bertrand, M. & Mullainathan, S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Am. Econ. Rev. 94, 991–1013 (2004).
- [23] Kline, P., Rose, E. K. & Walters, C. R. Systemic discrimination among large U.S. employers. Q. J. Econ. 137, 1963–2036 (2022).
- [24] Quillian, L., Pager, D., Hexel, O. & Midtboen, A. H. Meta-analysis of field experiments shows no change in racial discrimination in hiring over time. Proc. Natl Acad. Sci. USA 114, 10870–10875 (2017).
- [25] Ameri, M. et al. The disability employment puzzle: A field experiment on employer hiring behavior. ILR Rev. 71, 329–364 (2018).
- [26] Bohren, J. A., Imas, A. & Rosenberg, M. The dynamics of discrimination: Theory and evidence. Am. Econ. Rev. 109, 3395–3436 (2019).
- [27] Altonji, J. G. & Pierret, C. R. Employer learning and statistical discrimination. Q. J. Econ. 116, 313–350 (2001).
- [28] Bohren, J. A., Haggag, K., Imas, A. & Pope, D. G. Inaccurate statistical discrimination: An identification problem. Rev. Econ. Stat. 107, 605–620 (2025).
- [29] European Parliament. EU Artificial Intelligence Act (2024).
- [30] New York City Department of Consumer and Worker Protection. Automated employment decision tools (AEDT), Local Law 144 (2023).
- [31] College Board. SAT suite of assessments annual report. https://reports.collegeboard.org/sat-suite-program-results (2025).
- [32] ACT. The ACT technical manual. https://www.act.org/content/act/en/research/reports/technical-manuals-and-fairness-reports.html (2025).
- [33] Nord, C., Roey, S., Perkins, R., Lyons, M. & Lemanski, N. America’s high school graduates: Results of the 2009 NAEP high school transcript study. Tech. Rep. NCES 2011-462, National Center for Education Statistics (2011).
- [34] Rojstaczer, S. & Healy, C. Grade inflation at American colleges and universities. https://www.gradeinflation.com/ (2024).
- [35] Grattafiori, A. et al. The Llama 3 herd of models. Preprint at https://arxiv.org/abs/2407.21783 (2024).
- [36] Yang, A. et al. Qwen2.5 technical report. Preprint at https://arxiv.org/abs/2412.15115 (2025).
- [37] Yang, A. et al. Qwen3 technical report. Preprint at https://arxiv.org/abs/2505.09388 (2025).
- [38] Singh, A. et al. OpenAI GPT-5 system card. Preprint at https://arxiv.org/abs/2601.03267 (2025).
- [39] Google DeepMind. Gemini 2.5: Our most intelligent AI model. Google DeepMind Technical Report (2025).
- [40] Cameron, A. C. & Miller, D. L. A practitioner’s guide to cluster-robust inference. J. Hum. Resour. 50, 317–372 (2015).
- [41] Cameron, A. C., Gelbach, J. B. & Miller, D. L. Robust inference with multiway clustering. J. Bus. Econ. Stat. 29, 238–249 (2011).
- [42] DerSimonian, R. & Laird, N. Meta-analysis in clinical trials. Control. Clin. Trials 7, 177–188 (1986).
- [43] Hartung, J. & Knapp, G. On tests of the overall treatment effect in meta-analysis with normally distributed responses. Stat. Med. 20, 1771–1782 (2001).
- [44] Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (2016).
- [45]
- [46] Cameron, A. C., Gelbach, J. B. & Miller, D. L. Bootstrap-based improvements for inference with clustered errors. Rev. Econ. Stat. 90, 414–427 (2008).

Methods
I. Audit design
Occupation sample. We draw occupation-level data from four public U.S. sources: the O*NET database [10] (required education, work experience, general skills, and technology-related ski...