Implicit Bias-Like Patterns in Reasoning Models
Pith reviewed 2026-05-23 00:14 UTC · model grok-4.3
The pith
Most reasoning models tested use more reasoning tokens on association-incompatible tasks than on association-compatible ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reasoning models exhibit implicit bias-like patterns in which they expend more reasoning tokens on association-incompatible tasks than on association-compatible tasks, indicating greater computational effort when processing counter-stereotypical information, with the patterns varying according to each model's internal reasoning content.
What carries the argument
The Reasoning Model Implicit Association Test (RM-IAT), which measures differences in reasoning token counts between tasks that align with or conflict with common associations.
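The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's scoring procedure: the function name `token_gap`, the choice to standardize by the standard deviation of all trials pooled, and the token counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def token_gap(compatible, incompatible):
    """Return (mean difference, standardized difference) in reasoning tokens.

    A positive value means more tokens were spent on incompatible tasks.
    """
    diff = mean(incompatible) - mean(compatible)
    # Assumption: standardize by the SD of all trials from both conditions.
    pooled_sd = stdev(list(compatible) + list(incompatible))
    return diff, (diff / pooled_sd if pooled_sd > 0 else 0.0)

# Made-up token counts for illustration only; not data from the paper.
compatible = [410, 395, 430, 402]
incompatible = [520, 498, 545, 510]

gap, standardized = token_gap(compatible, incompatible)
print(f"mean gap = {gap:.1f} tokens, standardized gap = {standardized:.2f}")
```

A reversed pattern, such as the one reported for Claude 3.7 Sonnet, would show up here as a negative gap.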
If this is right
- Most reasoning models tested show greater token expenditure on counter-stereotypical pairings.
- One model (Claude 3.7 Sonnet) displayed reversed patterns tied to its explicit reasoning about bias and stereotypes.
- Token-count differences depend on the specific internal reasoning content produced by each model.
Where Pith is reading between the lines
- If token counts track processing effort, then interventions that change reasoning content might alter these patterns.
- The method could be applied to compare bias-like processing across additional model families or task domains.
- Variation in patterns across models suggests that training data or fine-tuning choices influence the direction of the effect.
Load-bearing premise
Differences in the number of reasoning tokens reflect implicit bias-like computational effort rather than unrelated factors such as task length or difficulty.
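One rough way to probe this premise is to check whether the token gap survives after adjusting for an obvious confound such as prompt length. The sketch below is an assumption-laden illustration rather than anything the paper reports: it fits reasoning tokens against prompt length by least squares and then compares mean residuals between conditions. The function name, the tuple layout, and the numbers are hypothetical.

```python
from statistics import mean

def length_adjusted_gap(rows):
    """rows: list of (condition, prompt_length, reasoning_tokens) tuples.

    Fits tokens ~ prompt_length by simple least squares, then returns the
    mean residual for 'incompatible' minus the mean residual for 'compatible'.
    """
    xs = [r[1] for r in rows]
    ys = [r[2] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # If every prompt has the same length there is nothing to adjust for.
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    resid = {"compatible": [], "incompatible": []}
    for cond, x, y in rows:
        resid[cond].append(y - (intercept + slope * x))
    return mean(resid["incompatible"]) - mean(resid["compatible"])

# Made-up rows for illustration only; not data from the paper.
rows = [
    ("compatible", 52, 410), ("incompatible", 52, 520),
    ("compatible", 60, 450), ("incompatible", 60, 548),
]
print(f"length-adjusted gap = {length_adjusted_gap(rows):.1f} tokens")
```

If the adjusted gap shrinks toward zero while the raw gap is large, the difference is more plausibly explained by length than by association direction. Residualizing on a single covariate is a coarse check and does not rule out other sources of difficulty.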
What would settle it
Re-running the RM-IAT on the same models and finding no consistent difference in token counts between compatible and incompatible association tasks.
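A replication check of this kind needs some criterion for "no consistent difference". The sketch below uses a sign-flip permutation test on paired differences, assuming each stimulus set is run once in each condition so that the differences are paired. That pairing, the function name, and the example numbers are assumptions for illustration; the paper's own statistical procedure may differ.

```python
import random
from statistics import mean

def sign_flip_p_value(diffs, n_perm=10_000, seed=0):
    """Two-sided p-value for the mean paired difference being nonzero.

    diffs: incompatible minus compatible token counts, one per matched pair.
    Under the null hypothesis the sign of each difference is arbitrary,
    so signs are flipped at random to build the reference distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Made-up paired differences for illustration only; not data from the paper.
diffs = [110, 96, 120, 88, 105, 99, 130, 92, 101, 115, 97, 108]
print(f"p = {sign_flip_p_value(diffs):.4f}")
```

A large p-value on a rerun with adequate sample size would count against the claim; a small one would only show that a difference exists, not that it reflects bias-like processing.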
Original abstract
Implicit biases refer to automatic mental processes that shape perceptions, judgments, and behaviors. Previous research on "implicit bias" in LLMs focused primarily on outputs rather than the processes underlying the outputs. We present the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models, LLMs that use step-by-step reasoning to solve complex tasks. Using RM-IAT, we find that reasoning models like o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B consistently expend more reasoning tokens on association-incompatible tasks than association-compatible tasks, suggesting greater computational effort when processing counter-stereotypical information. Conversely, Claude 3.7 Sonnet exhibited reversed patterns, which thematic analysis associated with its unique internal focus on reasoning about bias and stereotypes. These findings demonstrate that reasoning models exhibit distinct implicit bias-like patterns and that these patterns vary significantly depending on the models' internal reasoning content.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models. It finds that models such as o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B expend more reasoning tokens on association-incompatible tasks than on compatible ones, suggesting greater effort on counter-stereotypical information. Claude 3.7 Sonnet shows reversed patterns linked to its focus on bias reasoning. The findings indicate distinct implicit bias-like patterns varying by model.
Significance. If the token-count differences can be attributed specifically to bias-like processing rather than task difficulty, this would be a significant contribution to understanding the internal mechanisms of bias in LLMs, extending prior work on output biases to reasoning processes. The model-specific variations are noteworthy. The paper provides empirical observations, but their strength depends on the validity of the proxy and the controls used.
Major comments (2)
- Abstract: The abstract states observational findings on token counts but supplies no methodological details, task descriptions, statistical tests, controls, or error analysis, so it is not possible to verify whether the data support the claim as stated.
- Abstract / central claim: The interpretation of higher reasoning token counts as evidence of greater computational effort on counter-stereotypical information requires that RM-IAT tasks are matched on all other dimensions affecting reasoning length (e.g., logical complexity, number of steps needed). No indication is given that such matching or content analysis of the extra tokens was performed.
Minor comments (1)
- The weakest assumption (token count as proxy for bias-specific effort) should be explicitly discussed and tested in a dedicated limitations or methods subsection.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and note planned revisions to strengthen the manuscript.
Point-by-point responses
- Referee: Abstract: The abstract states observational findings on token counts but supplies no methodological details, task descriptions, statistical tests, controls, or error analysis, so it is not possible to verify whether the data support the claim as stated.
  Authors: We agree that the submitted abstract is concise and omits key details. In the revised version, we will expand the abstract to include a brief description of the RM-IAT task structure, the models tested, the token-count comparison approach, and the statistical tests used.
  Revision: yes
- Referee: Abstract / central claim: The interpretation of higher reasoning token counts as evidence of greater computational effort on counter-stereotypical information requires that RM-IAT tasks are matched on all other dimensions affecting reasoning length (e.g., logical complexity, number of steps needed). No indication is given that such matching or content analysis of the extra tokens was performed.
  Authors: The RM-IAT follows the standard IAT structure in which compatible and incompatible conditions employ the same stimuli, categories, and logical framing, differing only in association direction. This design matches the tasks on content and complexity by construction. The manuscript does not report a post-hoc content analysis of reasoning tokens; we will add an explicit description of the matching procedure and a limitations discussion acknowledging the value of future token-level analysis.
  Revision: partial
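The "matched by construction" argument in the rebuttal can be made concrete with a small sketch. Everything here is hypothetical: the prompt wording, the helper names, and the "insects" category with its two words are placeholders chosen for illustration, and only the flower words are taken from the paper's Table S1. The point is that both conditions are generated from one template and one word list, so they differ only in which attribute each category is paired with.

```python
import re

def build_prompt(words, pairing):
    """Build one task prompt; pairing maps each group category to an attribute."""
    rules = "; ".join(f"{group} with {attr}" for group, attr in sorted(pairing.items()))
    return f"Sort each word into a category pair ({rules}). Words: {', '.join(words)}"

def is_matched(prompt_a, prompt_b):
    """True when both prompts contain exactly the same multiset of words."""
    tokenize = lambda text: sorted(re.findall(r"[a-z]+", text.lower()))
    return tokenize(prompt_a) == tokenize(prompt_b)

# 'aster' and 'clover' appear in Table S1; 'ant' and 'wasp' are placeholders.
WORDS = ["aster", "clover", "ant", "wasp"]

compatible = build_prompt(WORDS, {"flowers": "pleasant", "insects": "unpleasant"})
incompatible = build_prompt(WORDS, {"flowers": "unpleasant", "insects": "pleasant"})

print(compatible)
print(incompatible)
print("matched on vocabulary and length:",
      is_matched(compatible, incompatible) and len(compatible) == len(incompatible))
```

Matching of this kind controls surface properties of the prompt. It does not by itself show that the two conditions demand the same number of reasoning steps, which is the referee's remaining concern.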
Circularity Check
No circularity: purely empirical measurement of token usage with no derivations or self-referential definitions
The paper defines RM-IAT as a new test procedure and reports direct empirical observations of reasoning token counts across models on compatible vs. incompatible tasks. No equations, fitted parameters, predictions derived from prior fits, or load-bearing self-citations appear in the provided text. The central finding (higher token counts on incompatible tasks) is a raw measurement, not a quantity shown to equal its own inputs by construction. The interpretation of token count as a bias proxy is an assumption about validity, not a circular derivation step. This is a standard observational study whose claims stand or fall on experimental controls rather than logical reduction to the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The number of reasoning tokens expended serves as a valid proxy for implicit bias-like processing or computational effort when handling counter-stereotypical information.
Supplementary Materials
Table S1: Word stimuli used to represent group categories and semantic attributes. Note that the same words were used to represent pleasant and unpleasant in the first four RM-IATs.
RM-IAT 1, category Flowers: aster, clover, hyacinth, marigold, popp...