
arxiv: 2605.13866 · v1 · pith:7L2K5X24new · submitted 2026-05-02 · cs.CY · econ.GN · q-fin.EC

AI Alignment Amplifies the Role of Race, Gender, and Disability in Hiring Decisions

Pith reviewed 2026-05-15 06:09 UTC · model grok-4.3

keywords: language models, AI alignment, hiring bias, demographic discrimination, race gender disability, post-training alignment, employment decisions, simulated hiring

The pith

Post-training alignment amplifies hiring advantages for female and Black candidates by over 300 percent while increasing disadvantages for disabled candidates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Language models are now used to screen candidates, and the study tests whether they reproduce or change human patterns of favoring or disfavoring people based on race, gender, and disability. Across 27 models and 177 occupations, the models give clear advantages to female and Black applicants relative to comparable male and white ones, while giving clear disadvantages to disabled applicants. The main cause is post-training alignment, the step that turns raw models into helpful, safe assistants. Relative to the same models before alignment, alignment amplifies the female and Black advantages by 325% and 330% and the disabled disadvantage by 171%. This matters because companies and platforms are already using these aligned models for real employment decisions.

Core claim

Across 27 models and 177 occupations, language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates while giving disabled candidates disadvantages. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Alignment increases returns to skills and work experience overall, but relatively more so for female and Black candidates, while the absence of qualification signals harms marginalised groups more, particularly disabled candidates.

What carries the argument

The direct comparison of matched pre-trained and post-training-aligned models in identical simulated hiring prompts, which isolates how alignment changes the weight placed on demographic signals versus qualification signals.
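The matched comparison can be illustrated with a small sketch. This is not the paper's code: the `Profile` fields, the `score_pretrained` and `score_aligned` stand-ins, and every number are hypothetical, chosen only to show how a demographic gap is measured within each model and then compared across the matched pair. The amplification convention used here (percentage change in the gap) is also an assumption; the paper may define it differently.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Profile:
    """Hypothetical candidate profile; field names are illustrative only."""
    years_experience: int
    years_education: int
    gender: str


def score_pretrained(p: Profile) -> float:
    # Stand-in for a pre-trained model's hiring score (made-up weights).
    bonus = 0.01 if p.gender == "female" else 0.0
    return 0.05 * p.years_experience + 0.04 * p.years_education + bonus


def score_aligned(p: Profile) -> float:
    # Stand-in for the matched aligned model (made-up weights).
    bonus = 0.04 if p.gender == "female" else 0.0
    return 0.06 * p.years_experience + 0.05 * p.years_education + bonus


def demographic_gap(score: Callable[[Profile], float], base: Profile,
                    attribute: str, treated: str, control: str) -> float:
    """Score difference between two profiles that differ in one attribute only."""
    treated_profile = replace(base, **{attribute: treated})
    control_profile = replace(base, **{attribute: control})
    return score(treated_profile) - score(control_profile)


def amplification_pct(gap_pre: float, gap_post: float) -> float:
    """Percentage change in the gap from the pre-trained to the aligned model."""
    if gap_pre == 0:
        raise ValueError("pre-trained gap is zero; percentage change is undefined")
    return 100.0 * (gap_post - gap_pre) / abs(gap_pre)


base = Profile(years_experience=5, years_education=16, gender="male")
gap_pre = demographic_gap(score_pretrained, base, "gender", "female", "male")
gap_post = demographic_gap(score_aligned, base, "gender", "female", "male")
print(f"amplification: {amplification_pct(gap_pre, gap_post):.0f}%")
```

Because both profiles share every qualification field, the qualification weights cancel inside each gap, which is the sense in which the design isolates the demographic signal.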

Load-bearing premise

Simulated hiring decisions by the language models accurately reflect real-world hiring biases without being strongly shaped by the specific prompt designs or training data distributions.

What would settle it

A field study in which employers actually use the aligned models to screen real resumes and then measure whether the same demographic advantages and disadvantages appear in callback or interview rates.
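One way such a field study could summarize its outcome is a difference in callback rates between two groups with a pooled two-proportion z statistic. The sketch below is only an illustration of that arithmetic: the function name and the counts in the usage line are hypothetical, and a real study would likely need a design-aware analysis (for example, clustering by employer).

```python
import math


def callback_gap_z(callbacks_a: int, n_a: int, callbacks_b: int, n_b: int):
    """Return (rate difference, pooled two-proportion z statistic)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    rate_a = callbacks_a / n_a
    rate_b = callbacks_b / n_b
    # Pooled rate under the null hypothesis of equal callback rates.
    pooled = (callbacks_a + callbacks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in callbacks; z statistic is undefined")
    return rate_a - rate_b, (rate_a - rate_b) / se


# Hypothetical counts: 50 of 500 callbacks for group A, 40 of 500 for group B.
diff, z = callback_gap_z(50, 500, 40, 500)
print(f"gap = {diff:.3f}, z = {z:.2f}")
```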

Original abstract

Humans increasingly delegate decisions to language models, yet whether these systems reproduce or reshape human patterns of discrimination remains unclear. Here we run a large-scale study to analyse whether language models use demographic information in hiring decisions. We show, across 27 models and 177 occupations, that language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates, while giving disabled candidates disadvantages. The differences are meaningful in magnitude: the role of race, gender, and disability status is comparable to six months to one year of additional education. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Compared with previous human correspondence studies, language models reverse the direction of racial discrimination, attenuate the disability penalty, and amplify the female advantage by 190%. Alignment changes how models use qualification signals: alignment increases returns to skills and work experience overall, but relatively more so for female and Black candidates. Meanwhile, the absence of qualification signals harms marginalised groups more, particularly for disabled candidates, differences that may explain the asymmetry of alignment effects across groups we observe.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports a large-scale empirical study across 27 language models and 177 occupations in which models are prompted to make simulated hiring decisions. It finds that language models assign hiring advantages to female and Black candidates relative to matched male and white candidates, and disadvantages to disabled candidates. These effects are substantial in magnitude, comparable to six months to one year of additional education. Post-training alignment is identified as the primary driver, amplifying the female and Black advantages by 325% and 330% and the disability disadvantage by 171% relative to matched pre-trained models. The study further compares these patterns to prior human correspondence studies, noting reversals in racial discrimination direction and amplification of the female advantage, and examines how alignment alters returns to qualification signals such as skills and experience.

Significance. If the central empirical comparisons hold, the results indicate that alignment procedures can systematically reshape demographic biases in LLM decision-making, with implications for deploying these models in hiring and other high-stakes domains. The scale of the study (27 models, 177 occupations) and the direct pre- versus post-alignment contrasts provide a broad empirical basis. The finding that alignment increases returns to qualifications overall but differentially across groups, and that missing qualification signals harm marginalized groups more, offers testable mechanisms that could guide future alignment research.

major comments (3)
  1. §3 (Experimental Setup): The central claim that alignment is the primary driver of the reported amplification effects (325% for female candidates, 330% for Black candidates, 171% for disabled candidates) rests on comparisons between matched pre- and post-trained models. However, the manuscript provides no robustness checks across alternative prompt phrasings or framings of demographic attributes. This is load-bearing because the measured differences could arise from interactions between alignment and the specific elicitation method rather than a general property of aligned models.
  2. §4.1 (Magnitude Comparisons): The claim that demographic effects are 'comparable to six months to one year of additional education' requires the underlying regression specification and equivalence calculation to be fully specified. Without details on controls for occupation fixed effects, model-specific baselines, or how the education-equivalent metric is derived, it is difficult to evaluate whether the reported magnitudes are robust or sensitive to modeling choices.
  3. §5 (Human Comparison): The reversal of racial discrimination direction relative to human correspondence studies is a key interpretive claim. The manuscript should clarify whether the human benchmarks are drawn from studies with comparable occupation coverage and decision criteria; any mismatch in task framing could undermine the direct comparison of amplification factors.
minor comments (2)
  1. Table 1: The caption should explicitly state the number of observations per model-occupation cell and whether standard errors are clustered at the model or occupation level.
  2. Notation: The term 'alignment' is used throughout to refer to post-training procedures; a brief footnote distinguishing RLHF, instruction tuning, and other techniques would improve clarity for readers outside the alignment literature.
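The clustering question in minor comment 1 matters because the same data can give different standard errors depending on the grouping. A minimal sketch, assuming the simplest possible case (the standard error of a mean gap, treated as an intercept-only regression) and the uncorrected CR0 sandwich form; the function name and the data are hypothetical, and statistical packages typically add small-sample corrections that are omitted here.

```python
import math
from collections import defaultdict


def cluster_robust_se_of_mean(values, cluster_ids):
    """CR0 cluster-robust standard error of a sample mean.

    Residuals are summed within each cluster; the squared cluster sums
    form the 'meat' of the sandwich estimator. No small-sample
    correction is applied.
    """
    n = len(values)
    if n == 0 or n != len(cluster_ids):
        raise ValueError("values and cluster_ids must be non-empty and equal in length")
    mean = sum(values) / n
    cluster_sums = defaultdict(float)
    for value, cluster in zip(values, cluster_ids):
        cluster_sums[cluster] += value - mean
    meat = sum(s * s for s in cluster_sums.values())
    return math.sqrt(meat) / n


# Hypothetical per-cell demographic gaps, tagged by model and by occupation.
gaps = [0.03, 0.05, 0.01, 0.02]
models = ["m1", "m1", "m2", "m2"]
occupations = ["nurse", "welder", "nurse", "welder"]

se_by_model = cluster_robust_se_of_mean(gaps, models)
se_by_occupation = cluster_robust_se_of_mean(gaps, occupations)
print(f"clustered by model: {se_by_model:.4f}, by occupation: {se_by_occupation:.4f}")
```

In this toy data the gaps vary more between models than between occupations, so clustering by model gives the larger standard error.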

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us improve the clarity and robustness of our analysis. We address each major comment below and have incorporated revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: §3 (Experimental Setup): The central claim that alignment is the primary driver of the reported amplification effects (325% for female candidates, 330% for Black candidates, 171% for disabled candidates) rests on comparisons between matched pre- and post-trained models. However, the manuscript provides no robustness checks across alternative prompt phrasings or framings of demographic attributes. This is load-bearing because the measured differences could arise from interactions between alignment and the specific elicitation method rather than a general property of aligned models.

    Authors: We acknowledge the importance of testing robustness to prompt variations to ensure our findings are not artifacts of the specific elicitation method. In the revised manuscript, we have added robustness analyses using alternative prompt phrasings, including rephrased hiring scenarios and varied framings of demographic attributes (e.g., using names vs. explicit labels). These additional checks show that the amplification effects due to alignment remain consistent in direction and magnitude, indicating that the results reflect general properties of post-training alignment rather than prompt-specific interactions. The full set of prompts used is now detailed in the appendix.

    revision: yes

  2. Referee: §4.1 (Magnitude Comparisons): The claim that demographic effects are 'comparable to six months to one year of additional education' requires the underlying regression specification and equivalence calculation to be fully specified. Without details on controls for occupation fixed effects, model-specific baselines, or how the education-equivalent metric is derived, it is difficult to evaluate whether the reported magnitudes are robust or sensitive to modeling choices.

    Authors: We agree that full specification is necessary for evaluating the magnitude claims. We have revised §4.1 to include the complete regression equation, which incorporates occupation fixed effects, model fixed effects, and controls for other candidate attributes. The education-equivalent metric is derived by dividing the coefficient on the demographic indicator by the coefficient on years of education from the same model, then converting to months. We now report this calculation explicitly, along with robustness to alternative controls and specifications, confirming the reported range holds.

    revision: yes

  3. Referee: §5 (Human Comparison): The reversal of racial discrimination direction relative to human correspondence studies is a key interpretive claim. The manuscript should clarify whether the human benchmarks are drawn from studies with comparable occupation coverage and decision criteria; any mismatch in task framing could undermine the direct comparison of amplification factors.

    Authors: We have expanded the discussion in §5 to address comparability. The human benchmarks are primarily from large-scale correspondence studies (e.g., Bertrand and Mullainathan 2004 and subsequent replications) that cover a broad range of occupations similar to our 177 occupations, including both professional and non-professional roles. Decision criteria are resume-based callback rates, which align with our simulated hiring decisions. We now include a supplementary table detailing occupation overlap and note that while task framing differs (real vs. simulated), the directional reversal in racial effects is robust. We acknowledge that exact equivalence is limited but argue the comparison highlights meaningful shifts.

    revision: yes
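The education-equivalent conversion described in response 2 reduces to a ratio of two coefficients scaled to months. A minimal sketch, assuming both coefficients come from the same regression and that education is measured in years; the function name and the example coefficients are hypothetical and are not values from the paper.

```python
def education_equivalent_months(beta_demographic: float,
                                beta_education_years: float) -> float:
    """Express a demographic coefficient in months of additional education.

    Divides the demographic coefficient by the per-year education
    coefficient, then converts years to months.
    """
    if beta_education_years == 0:
        raise ValueError("education coefficient is zero; the ratio is undefined")
    return 12.0 * beta_demographic / beta_education_years


# Hypothetical coefficients: a 0.015 demographic effect against a
# 0.02 effect per year of education corresponds to about 9 months.
months = education_equivalent_months(0.015, 0.02)
print(f"{months:.1f} months")
```

The ratio is only as stable as its denominator: when the education coefficient is small or imprecisely estimated, the education-equivalent figure can swing widely, which is one reason the referee asks for the specification to be spelled out.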

Circularity Check

0 steps flagged

No circularity: results from direct empirical model comparisons

Full rationale

The paper derives its central claims through explicit large-scale simulations of hiring decisions across 27 models and 177 occupations, directly measuring output differences between matched pre-trained and post-alignment versions. No parameters are fitted to data subsets and then re-predicted as results, no self-definitional loops exist in any equations or definitions, and no load-bearing steps reduce to self-citations or imported uniqueness theorems. The amplification percentages (325%, 330%, 171%) are computed from observed output ratios rather than constructed by re-labeling inputs. The derivation remains self-contained against external benchmarks of model behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical measurement study relying on standard assumptions about LLM behavior and statistical inference rather than new theoretical constructs or fitted parameters.

axioms (1)
  • Domain assumption: Language model responses to hiring prompts can be used to measure demographic biases in decision-making.
    Central to the experimental design described in the abstract.




Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 4 internal anchors

  1. [1]

    Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019)

  2. [2]

    Lambrecht, A. & Tucker, C. Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads. Manage. Sci. 65, 2966–2981 (2019)

  3. [3]

    Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017)

  4. [4]

    Lippens, L., Vermeiren, S. & Baert, S. The state of hiring discrimination: A meta-analysis of (almost) all recent correspondence experiments. Eur. Econ. Rev. 151, 104315 (2023)

  5. [5]

    Wang, Z. et al. Jobfair: A framework for benchmarking gender hiring bias in large language models. Findings of the Association for Computational Linguistics: EMNLP 2024, 3227–3246 (2024)

  6. [6]

    An, J., Huang, D., Lin, C. & Tai, M. Measuring gender and racial biases in large language models: Intersectional evidence from automated resume evaluation. PNAS Nexus 4, pgaf089 (2025)

  7. [7]

    Gaebler, J. D., Goel, S., Huq, A. & Tambe, P. Auditing large language models for race and gender disparities: Implications for artificial intelligence-based hiring. Behav. Sci. Policy 10, 46–55 (2024)

  8. [8]

    Glazko, K., Mohammed, Y., Kosa, B., Potluri, V. & Mankoff, J. Identifying and improving disability bias in GPT-based resume screening. Proc. ACM FAccT 687–700 (2024)

  9. [9]

    Hofmann, V., Kalluri, P. R., Jurafsky, D. & King, S. AI generates covertly racist decisions about people based on their dialect. Nature 633, 147–154 (2024)

  10. [10]

    National Center for O*NET Development. O*NET 29.2 database. https://www.onetcenter.org/db_releases.html (2025)

  11. [11]

    U.S. Bureau of Labor Statistics. Employed persons by detailed occupation, sex, race, and Hispanic or Latino ethnicity, 2024 annual averages. https://www.bls.gov/cps/cps_aa2024.htm (2024)

  12. [12]

    U.S. Bureau of Labor Statistics. Occupational employment and wage statistics, May 2024, all data. https://www.bls.gov/oes/tables.htm (2024)

  13. [13]

    U.S. Bureau of Labor Statistics. Persons with a disability: Labor force characteristics – 2024. https://www.bls.gov/news.release/disabl.t03.htm (2025)

  14. [14]

    Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023)

  15. [15]

    Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J. & Mullainathan, S. Human decisions and machine predictions. The Quarterly Journal of Economics 133, 237–293 (2018)

  16. [16]

    Lippens, L. Computer says ‘no’: Exploring systemic bias in ChatGPT using an audit approach. Information Economics and Policy 68, 101145 (2024)

  17. [17]

    Cheng, K. H. R., Castelo, N., Joshi, P. D. & Bhatia, S. Sycophantic AI decreases prosocial intentions and promotes dependence. Science (2026)

  18. [18]

    Becker, G. S. The Economics of Discrimination (University of Chicago Press, 1957)

  19. [19]

    Phelps, E. S. The statistical theory of racism and sexism. Am. Econ. Rev. 62, 659–661 (1972)

  20. [20]

    Arrow, K. The theory of discrimination. Working Paper 30A, Princeton University, Industrial Relations Section (1971)

  21. [21]

    Aigner, D. J. & Cain, G. G. Statistical theories of discrimination in labor markets. ILR Rev. 30, 175–187 (1977)

  22. [22]

    Bertrand, M. & Mullainathan, S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Am. Econ. Rev. 94, 991–1013 (2004)

  23. [23]

    Kline, P., Rose, E. K. & Walters, C. R. Systemic discrimination among large U.S. employers. Q. J. Econ. 137, 1963–2036 (2022)

  24. [24]

    Quillian, L., Pager, D., Hexel, O. & Midtboen, A. H. Meta-analysis of field experiments shows no change in racial discrimination in hiring over time. Proc. Natl Acad. Sci. USA 114, 10870–10875 (2017)

  25. [25]

    Ameri, M. et al. The disability employment puzzle: A field experiment on employer hiring behavior. ILR Rev. 71, 329–364 (2018)

  26. [26]

    Bohren, J. A., Imas, A. & Rosenberg, M. The dynamics of discrimination: Theory and evidence. Am. Econ. Rev. 109, 3395–3436 (2019)

  27. [27]

    Altonji, J. G. & Pierret, C. R. Employer learning and statistical discrimination. Q. J. Econ. 116, 313–350 (2001)

  28. [28]

    Bohren, J. A., Haggag, K., Imas, A. & Pope, D. G. Inaccurate statistical discrimination: An identification problem. Rev. Econ. Stat. 107, 605–620 (2025)

  29. [29]

    European Parliament. EU Artificial Intelligence Act (2024)

  30. [30]

    New York City Department of Consumer and Worker Protection. Automated employment decision tools (AEDT), Local Law 144 (2023)

  31. [31]

    College Board. SAT suite of assessments annual report. https://reports.collegeboard.org/sat-suite-program-results (2025)

  32. [32]

    ACT. The ACT technical manual. https://www.act.org/content/act/en/research/reports/technical-manuals-and-fairness-reports.html (2025)

  33. [33]

    Nord, C., Roey, S., Perkins, R., Lyons, M. & Lemanski, N. America’s high school graduates: Results of the 2009 NAEP high school transcript study. Tech. Rep. NCES 2011-462, National Center for Education Statistics (2011)

  34. [34]

    Rojstaczer, S. & Healy, C. Grade inflation at American colleges and universities. https://www.gradeinflation.com/ (2024)

  35. [35]

    Grattafiori, A. et al. The Llama 3 herd of models. Preprint at https://arxiv.org/abs/2407.21783 (2024)

  36. [36]

    Yang, A. et al. Qwen2.5 technical report. Preprint at https://arxiv.org/abs/2412.15115 (2025)

  37. [37]

    Yang, A. et al. Qwen3 technical report. Preprint at https://arxiv.org/abs/2505.09388 (2025)

  38. [38]

    Singh, A. et al. OpenAI GPT-5 system card. Preprint at https://arxiv.org/abs/2601.03267 (2025)

  39. [39]

    Google DeepMind. Gemini 2.5: Our most intelligent AI model. Google DeepMind Technical Report (2025)

  40. [40]

    Cameron, A. C. & Miller, D. L. A practitioner’s guide to cluster-robust inference. J. Hum. Resour. 50, 317–372 (2015)

  41. [41]

    Cameron, A. C., Gelbach, J. B. & Miller, D. L. Robust inference with multiway clustering. J. Bus. Econ. Stat. 29, 238–249 (2011)

  42. [42]

    DerSimonian, R. & Laird, N. Meta-analysis in clinical trials. Control. Clin. Trials 7, 177–188 (1986)

  43. [43]

    Hartung, J. & Knapp, G. On tests of the overall treatment effect in meta-analysis with normally distributed responses. Stat. Med. 20, 1771–1782 (2001)

  44. [44]

    Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016)

  45. [45]

    Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001)

  46. [46]

    Cameron, A. C., Gelbach, J. B. & Miller, D. L. Bootstrap-based improvements for inference with clustered errors. Rev. Econ. Stat. 90, 414–427 (2008)

Methods

I. Audit design

Occupation sample. We draw occupation-level data from four public U.S. sources: the O*NET database [10] (required education, work experience, general skills, and technology-related ski...