Implicit Bias-Like Patterns in Reasoning Models

Calvin K. Lai; Messi H.J. Lee

arxiv: 2503.11572 · v4 · submitted 2025-03-14 · 💻 cs.CY · cs.AI

Pith reviewed 2026-05-23 00:14 UTC · model grok-4.3

keywords: implicit bias, reasoning models, large language models, association test, stereotypes, token usage, computational effort

The pith

Reasoning models use more tokens on association-incompatible tasks than compatible ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Reasoning Model Implicit Association Test to examine processing in step-by-step reasoning LLMs. It reports that models including o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B generate more reasoning tokens for tasks pairing concepts in counter-stereotypical ways than for stereotypical pairings. This difference is interpreted as evidence of greater computational effort on incompatible associations. Claude 3.7 Sonnet showed the opposite pattern, which the authors link to its internal reasoning about stereotypes. The results indicate that bias-like patterns in these models are shaped by the content of their reasoning traces.

Core claim

Reasoning models exhibit implicit bias-like patterns in which they expend more reasoning tokens on association-incompatible tasks than on association-compatible tasks, indicating greater computational effort when processing counter-stereotypical information, with the patterns varying according to each model's internal reasoning content.

What carries the argument

The Reasoning Model Implicit Association Test (RM-IAT), which measures differences in reasoning token counts between tasks that align with or conflict with common associations.
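
As a rough illustration of the comparison the RM-IAT relies on, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) between reasoning-token counts in the two conditions. The function name `cohens_d` and the token counts are invented for illustration; the paper's own scoring and statistical model may differ.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(incompatible, compatible):
    """Standardized mean difference in reasoning-token counts.

    Positive values mean more tokens were spent in the incompatible condition.
    """
    n1, n2 = len(incompatible), len(compatible)
    s1, s2 = stdev(incompatible), stdev(compatible)
    # Pooled standard deviation across the two conditions.
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(incompatible) - mean(compatible)) / pooled


# Hypothetical token counts, not data from the paper.
compatible_tokens = [100, 110, 120, 130, 140]
incompatible_tokens = [120, 130, 140, 150, 160]

d = cohens_d(incompatible_tokens, compatible_tokens)
print(f"effect size d = {d:.2f}")
```

A positive `d` here would correspond to the pattern reported for most models, and a negative `d` to the reversed pattern reported for Claude 3.7 Sonnet.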

If this is right

Most reasoning models tested show greater token expenditure on counter-stereotypical pairings.
One model displayed reversed patterns tied to its explicit reasoning about bias and stereotypes.
Token-count differences depend on the specific internal reasoning content produced by each model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If token counts track processing effort, then interventions that change reasoning content might alter these patterns.
The method could be applied to compare bias-like processing across additional model families or task domains.
Variation in patterns across models suggests that training data or fine-tuning choices influence the direction of the effect.

Load-bearing premise

Differences in the number of reasoning tokens reflect implicit bias-like computational effort rather than unrelated factors such as task length or difficulty.

What would settle it

Re-running the RM-IAT on the same models and finding no consistent difference in token counts between compatible and incompatible association tasks.
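
One way such a re-run could be evaluated is a two-sided permutation test on the difference in mean token counts. This is a minimal sketch under the assumption that each trial yields one token count and that trials are exchangeable under the null hypothesis; `permutation_p_value` and the sample data are hypothetical and are not the paper's analysis.

```python
import random
from statistics import mean


def permutation_p_value(incompatible, compatible, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the difference in mean token counts."""
    rng = random.Random(seed)
    observed = mean(incompatible) - mean(compatible)
    pooled = list(incompatible) + list(compatible)
    n_inc = len(incompatible)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign condition labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_inc]) - mean(pooled[n_inc:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical token counts, not data from the paper.
incompatible_tokens = [152, 160, 171, 149, 166, 158]
compatible_tokens = [140, 151, 139, 147, 150, 143]
p = permutation_p_value(incompatible_tokens, compatible_tokens, n_permutations=2000)
print(f"p = {p:.4f}")
```

A consistently large p-value across models and tasks would be the kind of null result described above. A small p-value would indicate only that the token counts differ between conditions, not why they differ.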

Figures

Figures reproduced from arXiv: 2503.11572 by Calvin K. Lai, Messi H.J. Lee.

**Figure 1.** In the Reasoning Model IAT (RM-IAT), the reasoning model is first presented with word stimuli representing …

**Figure 2.** Effect sizes of all 10 RM-IATs across five reasoning models. Error bars represent 95% CIs.

Original abstract

Implicit biases refer to automatic mental processes that shape perceptions, judgments, and behaviors. Previous research on "implicit bias" in LLMs focused primarily on outputs rather than the processes underlying the outputs. We present the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models, LLMs that use step-by-step reasoning to solve complex tasks. Using RM-IAT, we find that reasoning models like o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B consistently expend more reasoning tokens on association-incompatible tasks than association-compatible tasks, suggesting greater computational effort when processing counter-stereotypical information. Conversely, Claude 3.7 Sonnet exhibited reversed patterns, which thematic analysis associated with its unique internal focus on reasoning about bias and stereotypes. These findings demonstrate that reasoning models exhibit distinct implicit bias-like patterns and that these patterns vary significantly depending on the models' internal reasoning content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models. It finds that models such as o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B expend more reasoning tokens on association-incompatible tasks than on compatible ones, suggesting greater effort on counter-stereotypical information. Claude 3.7 Sonnet shows reversed patterns linked to its focus on bias reasoning. The findings indicate distinct implicit bias-like patterns varying by model.

Significance. If the token-count differences can be attributed specifically to bias-like processing rather than to task difficulty, this would be a significant contribution to understanding the internal mechanisms of bias in LLMs, extending prior work on output biases to reasoning processes. The model-specific variations are noteworthy. The paper provides empirical observations, but their strength depends on the validity of the token-count proxy and the controls used.

major comments (2)

Abstract: The abstract states observational findings on token counts but supplies no methodological details, task descriptions, statistical tests, controls, or error analysis, so it is not possible to verify whether the data support the claim as stated.
Abstract / central claim: The interpretation of higher reasoning token counts as evidence of greater computational effort on counter-stereotypical information requires that RM-IAT tasks are matched on all other dimensions affecting reasoning length (e.g., logical complexity, number of steps needed). No indication is given that such matching or content analysis of the extra tokens was performed.

minor comments (1)

The weakest assumption (token count as proxy for bias-specific effort) should be explicitly discussed and tested in a dedicated limitations or methods subsection.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and note planned revisions to strengthen the manuscript.

Referee: Abstract: The abstract states observational findings on token counts but supplies no methodological details, task descriptions, statistical tests, controls, or error analysis, so it is not possible to verify whether the data support the claim as stated.

Authors: We agree that the submitted abstract is concise and omits key details. In the revised version, we will expand the abstract to include a brief description of the RM-IAT task structure, the models tested, the token-count comparison approach, and the statistical tests used. revision: yes

Referee: Abstract / central claim: The interpretation of higher reasoning token counts as evidence of greater computational effort on counter-stereotypical information requires that RM-IAT tasks are matched on all other dimensions affecting reasoning length (e.g., logical complexity, number of steps needed). No indication is given that such matching or content analysis of the extra tokens was performed.

Authors: The RM-IAT follows the standard IAT structure in which compatible and incompatible conditions employ the same stimuli, categories, and logical framing, differing only in association direction. This design matches the tasks on content and complexity by construction. The manuscript does not report a post-hoc content analysis of reasoning tokens; we will add an explicit description of the matching procedure and a limitations discussion acknowledging the value of future token-level analysis. revision: partial
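
The matching-by-construction argument in the rebuttal can be sketched as follows. The prompt wording, the `build_prompts` helper, the insect category, and the insect words are assumptions made for illustration and are not taken from the paper's materials. The sketch shows only that swapping the association direction leaves the stimuli and the prompt length (in words) unchanged.

```python
def build_prompts(words_by_group, pairing):
    """Build one prompt per stimulus word for a given group-to-attribute pairing."""
    prompts = []
    for group, words in words_by_group.items():
        for word in words:
            prompts.append(
                f"Treat {group} as {pairing[group]}. "
                f"Which attribute goes with the word '{word}'?"
            )
    return prompts


# Hypothetical stimuli; both conditions share the same words and categories.
words_by_group = {"flowers": ["aster", "clover"], "insects": ["ant", "wasp"]}

# Only the association direction differs between the two conditions.
compatible = {"flowers": "pleasant", "insects": "unpleasant"}
incompatible = {"flowers": "unpleasant", "insects": "pleasant"}

compatible_prompts = build_prompts(words_by_group, compatible)
incompatible_prompts = build_prompts(words_by_group, incompatible)

# Check that paired prompts have the same number of words.
same_length = [
    len(c.split()) == len(i.split())
    for c, i in zip(compatible_prompts, incompatible_prompts)
]
print(all(same_length))
```

Matching prompts in this way controls surface features of the input. It does not by itself address the referee's concern about what the additional reasoning tokens contain.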

Circularity Check

0 steps flagged

No circularity: purely empirical measurement of token usage with no derivations or self-referential definitions

The paper defines RM-IAT as a new test procedure and reports direct empirical observations of reasoning token counts across models on compatible vs. incompatible tasks. No equations, fitted parameters, predictions derived from prior fits, or load-bearing self-citations appear in the provided text. The central finding (higher token counts on incompatible tasks) is a raw measurement, not a quantity shown to equal its own inputs by construction. The interpretation of token count as a bias proxy is an assumption about validity, not a circular derivation step. This is a standard observational study whose claims stand or fall on experimental controls rather than logical reduction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on one domain assumption about token counts as a bias proxy; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: The number of reasoning tokens expended serves as a valid proxy for implicit bias-like processing or computational effort when handling counter-stereotypical information.
Invoked to interpret higher token usage on incompatible tasks as evidence of greater effort on counter-stereotypical information.

pith-pipeline@v0.9.0 · 5692 in / 1389 out tokens · 50364 ms · 2026-05-23T00:14:58.781263+00:00 · methodology

