pith. machine review for the scientific record.

arxiv: 2605.08716 · v1 · submitted 2026-05-09 · 💻 cs.AI · cs.CL · cs.LG

Recognition: no theorem link

Bias by Necessity: Impossibility Theorems for Sequential Processing with Convergent AI and Human Validation

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 01:07 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.LG
keywords cognitive biases · impossibility theorems · causal masking · autoregressive models · primacy effect · anchoring · sequential processing · order dependence

The pith

Primacy effects and anchoring are mathematically unavoidable in any sequential processor using causal masking, such as current language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to prove that certain cognitive biases are not flaws or optional features but direct, necessary outcomes of processing information one token or step at a time under causal constraints. Three impossibility theorems show that primacy arises from asymmetric accumulation of attention, anchoring follows from bounded information in sequential conditioning, and complete removal of order effects by averaging over all permutations demands factorial time. These mathematical necessities are then checked against twelve large language models and two human studies, where the predicted patterns of bias appear in both. If the claim holds, efforts to eliminate such biases in AI or in human judgment by simple reordering or averaging are fundamentally limited by computation.

Core claim

Primacy bias arises from asymmetric attention accumulation, anchoring emerges from sequential conditioning with provable information bounds, and exact debiasing by permutation marginalization requires factorial-time computation, with Monte Carlo approximation feasible at constant per-tolerance overhead. These results follow directly from the causal masking that defines autoregressive generation.

What carries the argument

Three impossibility theorems grounded in causal masking constraints that force asymmetric attention and sequential conditioning in autoregressive models.
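The asymmetry can be illustrated outside the paper's formalism. The toy sketch below is not the authors' model; uniform attention over visible positions is an assumption made purely for illustration. Under a causal mask, position 0 is visible at every decoding step while the last position is visible only once, so total received attention accumulates asymmetrically toward early positions:

```python
# Toy illustration (not the paper's model): under a causal mask, the
# query at step t may attend only to positions 0..t. With uniform
# attention, position i receives 1/(t+1) at each step t >= i, so early
# positions accumulate more total attention (a harmonic-sum advantage).
import numpy as np

def total_received_attention(n):
    """Sum the attention each position receives across all n decoding steps."""
    received = np.zeros(n)
    for t in range(n):
        received[:t + 1] += 1.0 / (t + 1)  # causal mask: only 0..t visible
    return received

recv = total_received_attention(5)
assert recv[0] > recv[-1]  # primacy: position 0 outweighs the last position
print(np.round(recv, 3))
```

Any attention distribution that spreads nonzero weight over the visible positions preserves this asymmetry in expectation; the uniform weighting only makes the harmonic accumulation explicit.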

If this is right

  • Exact removal of order-dependent bias in language models requires factorial-time computation in the number of items.
  • Anchor position and working memory load can be used to predict and modulate the size of observed biases in both models and people.
  • Monte Carlo sampling offers a practical, constant-cost way to approximate unbiased outputs from biased sequential processors.
  • The same information bounds that produce anchoring in models also appear in human data when working memory is taxed.
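The factorial-versus-constant contrast above can be sketched concretely. The snippet below is a toy stand-in, not the paper's procedure: `biased_score` is a hypothetical order-sensitive scorer with built-in primacy, and we compare exact debiasing by averaging over all n! orderings against a Monte Carlo estimate whose cost depends only on the sample budget k:

```python
# Toy sketch of Theorem 3's contrast: exact permutation marginalization
# enumerates all n! orderings; Monte Carlo averages over k sampled
# orderings at a cost independent of n!.
import itertools
import random
import statistics

def biased_score(items):
    # Hypothetical sequential processor with primacy: earlier items weigh more.
    return sum(x / (i + 1) for i, x in enumerate(items))

def exact_marginal(items):
    # Exact debiasing: average over every ordering (n! evaluations).
    return statistics.mean(biased_score(p) for p in itertools.permutations(items))

def mc_marginal(items, k, seed=0):
    # Monte Carlo debiasing: average over k random orderings.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k):
        p = list(items)
        rng.shuffle(p)
        total += biased_score(p)
    return total / k

items = [3.0, 1.0, 4.0, 1.0, 5.0]
exact = exact_marginal(items)        # 120 evaluations for n = 5
approx = mc_marginal(items, k=2000)  # fixed sample budget, any n
print(round(exact, 3), round(approx, 3))
```

For n = 5 the exact marginal already needs 120 evaluations; at n = 20 it would need roughly 2.4 × 10^18, while the Monte Carlo estimate keeps a fixed budget at the price of sampling error that shrinks as 1/√k.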

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Architectures that relax strict causal masking might sidestep these biases but would need to preserve generation coherence.
  • Ordering of training examples could be treated as a controllable variable for managing bias magnitude in deployed systems.
  • Similar necessity arguments may apply to other causal sequential systems such as online decision processes or streaming data pipelines.
  • Direct tests in non-autoregressive sequential models would clarify whether the impossibility is specific to causal masking or broader.

Load-bearing premise

That strict causal masking in autoregressive transformers is the only relevant form of sequential processing and that the quantitative bias predictions transfer directly to human behavior.

What would settle it

A sequential causal model or human experiment in which primacy and anchoring effects disappear or reverse when causal order is strictly enforced.

Figures

Figures reproduced from arXiv: 2605.08716 by Dongxin Guo, Jikun Wu, Siu-Ming Yiu.

Figure 1. Visualization of causal masking and positional priv…
Figure 2. Anchoring bias (mean shift in estimate from neu…)
Figure 3. Human behavioral results. Left: Study 1 shows…
read the original abstract

Are certain cognitive biases mathematically inevitable consequences of sequential information processing? We prove that primacy effects, anchoring, and order-dependence are architecturally necessary in autoregressive language models due to causal masking constraints. Our three impossibility theorems establish: (1) primacy bias arises from asymmetric attention accumulation; (2) anchoring emerges from sequential conditioning with provable information bounds; and (3) exact debiasing by permutation marginalization requires factorial-time computation, with Monte Carlo approximation feasible at constant per-tolerance overhead. We validate these bounds across 12 frontier LLMs ($R^2 = 0.89$; $\Delta$BIC $= 16.6$ vs. next-best alternative). We then derive quantitative predictions from the framework and test them in two pre-registered human experiments ($N = 464$ analyzed). Study 1 confirms anchor position modulates anchoring magnitude ($d = 0.52$, BF$_{10} = 847$). Study 2 shows working memory load amplifies primacy bias ($d = 0.41$, BF$_{10} = 156$), with WM capacity predicting bias reduction ($r = -.38$). These convergent findings reframe cognitive biases as resource-rational responses to sequential processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proves three impossibility theorems establishing that primacy effects, anchoring, and order-dependence are architecturally necessary in autoregressive language models due to causal masking constraints on attention accumulation and sequential conditioning. It reports quantitative validation on 12 frontier LLMs (R²=0.89, ΔBIC=16.6) and tests derived predictions in two pre-registered human experiments (N=464) showing position-modulated anchoring (d=0.52, BF10=847), WM-load amplification of primacy (d=0.41, BF10=156), and WM-capacity correlation with bias reduction (r=-.38), reframing these biases as resource-rational consequences of sequential processing.

Significance. If the derivations and mappings hold, the work supplies a formal information-theoretic basis for specific biases in both LLMs and humans, with notable strengths in the pre-registered human studies, large Bayes factors, and explicit Monte Carlo feasibility result for the third theorem.

major comments (2)
  1. [human experiments section (Studies 1-2)] The section deriving quantitative predictions for humans from the LLM theorems: the framework assumes human working memory implements the same asymmetric attention accumulation and factorial marginalization costs as causal-masked autoregressive models, yet no explicit mechanistic derivation or exclusion of alternative resource-rational accounts is provided; the reported correlations are compatible with multiple non-architectural explanations.
  2. [Theorem 3] Theorem 3 on permutation marginalization: while the factorial-time lower bound follows from causal masking, the claim that Monte Carlo approximation incurs only constant per-tolerance overhead is not shown to hold uniformly across the 12-LLM validation set or tied back to the reported R² fit.
minor comments (2)
  1. [Abstract] Abstract: the next-best model for the ΔBIC=16.6 comparison is not named, which would clarify the relative strength of the reported fit.
  2. [LLM validation section] LLM validation section: additional detail on LLM selection criteria and any pre-specification of the exact R² regression would aid assessment of the quantitative bounds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below, clarifying the scope of our claims and indicating where the manuscript has been revised for greater precision.

read point-by-point responses
  1. Referee: [human experiments section (Studies 1-2)] The section deriving quantitative predictions for humans from the LLM theorems: the framework assumes human working memory implements the same asymmetric attention accumulation and factorial marginalization costs as causal-masked autoregressive models, yet no explicit mechanistic derivation or exclusion of alternative resource-rational accounts is provided; the reported correlations are compatible with multiple non-architectural explanations.

    Authors: We agree that the manuscript does not provide a mechanistic derivation equating human working memory to causal-masked autoregressive attention. Our framework instead treats the information-theoretic constraints of sequential conditioning as domain-general, applying to any resource-limited sequential processor. The human experiments were designed to test specific, pre-registered quantitative predictions derived from the theorems (position-dependent anchoring magnitude and WM-load amplification of primacy), which received strong evidential support. We have added a dedicated subsection in the revised Discussion that explicitly acknowledges alternative resource-rational explanations (e.g., capacity-based decay models without architectural asymmetry) and notes that the observed correlations are consistent with, but not uniquely diagnostic of, our account. revision: partial

  2. Referee: [Theorem 3] Theorem 3 on permutation marginalization: while the factorial-time lower bound follows from causal masking, the claim that Monte Carlo approximation incurs only constant per-tolerance overhead is not shown to hold uniformly across the 12-LLM validation set or tied back to the reported R² fit.

    Authors: The constant per-tolerance overhead of Monte Carlo marginalization is a general result from sampling theory (variance reduction scales with sample size independently of the underlying distribution) and does not require per-model empirical verification to hold. The 12-LLM validation set was used exclusively to test the bias predictions of Theorems 1 and 2 (R² = 0.89), not the approximation overhead of Theorem 3. We have revised the text to separate these elements clearly, added a brief simulation appendix confirming the constant overhead across representative model scales, and removed any implication that the R² statistic directly validates the Monte Carlo claim. revision: yes

Circularity Check

0 steps flagged

No circularity: impossibility theorems derive directly from causal masking without reduction to fitted inputs or self-citations

full rationale

The paper derives its three impossibility theorems from the explicit properties of strict causal masking and autoregressive conditioning in transformer architectures, as described in the abstract. These bounds on asymmetric attention accumulation and sequential information follow mathematically from the given model constraints without any self-referential definition of the target biases in terms of the theorems themselves. Validation via R² on LLM outputs checks consistency with the derived bounds rather than using a fitted parameter as the prediction, and the human experiments apply pre-registered quantitative predictions from the framework. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes are present in the provided text. The derivation chain remains self-contained as a proof from architectural first principles, with empirical tests serving as external checks rather than circular inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the standard assumption that current frontier LLMs are strictly autoregressive with causal attention masks; no new entities are introduced and the only free parameter mentioned is the constant overhead of the Monte Carlo approximation.

free parameters (1)
  • Monte Carlo per-tolerance overhead
    Described as constant but not numerically specified or derived from first principles in the abstract.
axioms (1)
  • domain assumption: Causal masking is strictly enforced and cannot be relaxed in the autoregressive generation process.
    Invoked as the source of asymmetric attention accumulation and sequential conditioning in the three theorems.

pith-pipeline@v0.9.0 · 5526 in / 1356 out tokens · 68720 ms · 2026-05-12T01:07:15.465476+00:00 · methodology

