Do Large Language Models Plan Answer Positions? Position Bias in Multiple-Choice Question Generation
Pith reviewed 2026-05-09 17:20 UTC · model grok-4.3
The pith
LLMs encode predictive signals of the correct answer position in the hidden states of question stems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Hidden representations in the question stem encode predictive signals of the correct answer position, suggesting that answer position may be implicitly planned during generation. Activation steering applied to these representations can partially control positional preferences and substantially shift answer position distributions.
What carries the argument
Hidden representations in the question stem that encode predictive signals of answer position, which can be read out by probes and altered via activation steering.
Load-bearing premise
Signals detected in the hidden states of the question stem reflect the model's planning of answer position rather than mere correlations from training data or surface features.
What would settle it
An experiment in which the predictive accuracy of stem probes falls to chance after controlling for question length, vocabulary, or other surface statistics, or in which activation steering produces no reliable change in final answer position distributions.
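The surface-statistics control can be sketched as a probe comparison: train one classifier on stem hidden states and another on surface features alone, and check whether the latter approaches chance. Everything below is synthetic and illustrative; the arrays and the planted signal are assumptions, not the paper's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 400, 64

# Synthetic stand-ins: correct-answer positions (0-3) and stem hidden states
# with a planted position signal in the first four dimensions.
positions = rng.integers(0, 4, size=n)
hidden = rng.normal(size=(n, d))
hidden[np.arange(n), positions] += 2.0

# Surface-feature baseline (e.g., stem length, type-token ratio); here
# independent of position by construction.
surface = rng.normal(size=(n, 3))

probe_acc = cross_val_score(
    LogisticRegression(max_iter=1000), hidden, positions, cv=5).mean()
surface_acc = cross_val_score(
    LogisticRegression(max_iter=1000), surface, positions, cv=5).mean()

# A probe that beats the surface baseline by a wide margin indicates the
# positional signal is not reducible to these surface statistics.
print(f"probe={probe_acc:.2f}  surface={surface_acc:.2f}  chance=0.25")
```

If probe accuracy collapses to the surface baseline once real surface features are controlled, the planning interpretation loses its support.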
Original abstract
Large language models (LLMs) are increasingly used to generate multiple-choice questions (MCQs), where correct answers should ideally be uniformly distributed across options. However, we observe that LLMs exhibit systematic position biases during generation. Through extensive experiments with 10 LLMs and 5 vision-language models (VLMs) on three MCQ generation tasks, we show that these biases are structured, with similar patterns emerging within model families. To investigate the underlying mechanisms, we conduct probing experiments and find that hidden representations in the question stem encode predictive signals of the correct answer position, suggesting that answer position may be implicitly planned during generation. Building on this insight, we apply activation steering to manipulate internal representations and influence answer position. Our results show that steering can partially control positional preferences and substantially shift answer position distributions. Our findings provide a practical framework for studying implicit positional planning in LLMs and highlight the importance of controllable generation for reliable MCQ construction and evaluation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs exhibit systematic position biases when generating MCQs across 10 LLMs and 5 VLMs on three tasks, that hidden representations in the question stem encode predictive signals of answer position (suggesting implicit planning during generation), and that activation steering on these representations can partially control and shift positional preferences. The work combines observational bias analysis, linear probing, and causal-style interventions via steering.
Significance. If the results hold, the work is significant for providing an empirical framework to study and mitigate positional biases in LLM-generated MCQs, which has direct implications for reliable evaluation and data construction. Strengths include the scale of model coverage (10 LLMs + 5 VLMs), the use of activation steering as an intervention (a positive for moving beyond pure correlation), and the observation of family-level patterns in biases. These elements make the findings potentially actionable for controllable generation.
major comments (3)
- [Probing experiments] The central claim that stem hidden states encode signals of 'implicit planning' is load-bearing but rests on correlational linear probes. The observed signal is equally consistent with learned training-data associations between stem patterns and position rather than a dedicated planning mechanism; the manuscript should add controls (e.g., ablation of surface features or content-matched baselines) or stronger causal tests to distinguish these alternatives.
- [Activation steering] While steering shifts answer-position distributions, the intervention does not isolate a planning-specific mechanism from co-varying content or surface features. Additional ablations (e.g., steering along unrelated dimensions or random vectors) are needed to support the interpretation that the effect targets implicit positional planning.
- [Methods] The abstract and results describe extensive experiments but omit sample sizes per task/model, the statistical tests used to establish 'systematic' biases, and explicit controls for confounding factors. These omissions make it difficult to evaluate the robustness of the structured-bias and predictive-signal claims.
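The random-vector ablation requested above can be sketched in a few lines: perturb a hidden state along a probe-derived direction and along a norm-matched random direction, then compare the displacement along the probe axis. This is a generic activation-addition sketch with made-up vectors, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # hypothetical hidden-state width

# Hypothetical probe-derived direction for "correct answer at position A",
# normalized to unit length.
steer = rng.normal(size=d)
steer /= np.linalg.norm(steer)

def apply_steering(hidden_state, direction, alpha):
    """Add a scaled steering vector to a layer activation
    (generic activation addition; not the paper's exact procedure)."""
    return hidden_state + alpha * direction

h = rng.normal(size=d)
h_steered = apply_steering(h, steer, alpha=4.0)

# Matched-norm random control: same perturbation magnitude, unrelated direction.
rand = rng.normal(size=d)
rand /= np.linalg.norm(rand)
h_control = apply_steering(h, rand, alpha=4.0)

# Displacement along the probe direction: exactly alpha for the steered state,
# near zero in expectation for the random control.
shift_probe = float(steer @ (h_steered - h))
shift_ctrl = float(abs(steer @ (h_control - h)))
print(f"probe-direction shift: {shift_probe:.2f}, random control: {shift_ctrl:.2f}")
```

If the matched-norm random control moved answer positions as much as the probe-derived direction, the effect could not be attributed to a position-specific representation.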
minor comments (2)
- [Abstract] The abstract and introduction could more precisely separate the observational bias findings from the stronger 'implicit planning' interpretation to avoid overstatement.
- [Figures/Tables] Figure and table captions should include exact definitions of the position-bias metric and probe accuracy computation for reproducibility.
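One plausible instantiation of such a position-bias metric is the total-variation distance between the empirical answer-position distribution and the uniform distribution; the paper's exact definition may differ.

```python
import numpy as np

def position_bias(counts):
    """Total-variation distance between the empirical answer-position
    distribution and uniform: 0 means unbiased; it approaches (k-1)/k
    as one of the k options dominates."""
    p = np.asarray(counts, dtype=float)
    p /= p.sum()
    return 0.5 * np.abs(p - 1.0 / len(p)).sum()

print(position_bias([25, 25, 25, 25]))  # 0.0 (perfectly uniform)
print(position_bias([70, 10, 10, 10]))  # ~0.45 (strong preference for one option)
```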
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below, providing our responses and indicating planned revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: Probing experiments: the central claim that stem hidden states encode signals of 'implicit planning' is load-bearing but rests on correlational linear probes. This is consistent with learned training-data associations between stem patterns and position rather than a dedicated planning mechanism; the manuscript should add controls (e.g., ablation of surface features or content-matched baselines) or stronger causal tests to distinguish these alternatives.
Authors: We agree that linear probes yield correlational evidence and cannot alone establish a dedicated planning mechanism over learned associations. The probes operate on stem representations before option generation, and predictive signals are consistent across model families and tasks, which reduces the likelihood of purely superficial confounds. Nevertheless, we will add the suggested controls in revision: content-matched baselines (pairing identical stems with varied positions) and surface-feature ablations (e.g., lexical shuffling or masking). We will also report probe performance on these controls to better isolate positional signals. revision: yes
-
Referee: Activation steering section: while steering shifts answer-position distributions, the intervention does not isolate a planning-specific mechanism from co-varying content or surface features. Additional ablations (e.g., steering on unrelated dimensions or random vectors) are needed to support the interpretation that the effect targets implicit positional planning.
Authors: We acknowledge that the current steering results demonstrate a causal influence on position distributions but do not fully rule out effects from co-varying features. The steered directions were derived from position-predictive probe weights, providing some specificity. To address this, we will include additional ablations: steering with random vectors of matched norm and with directions from unrelated dimensions (e.g., unrelated task probes). These will be reported to quantify the degree of specificity to positional planning. revision: yes
-
Referee: Methods and experimental details: the abstract and results describe extensive experiments but omit sample sizes per task/model, the statistical tests used to establish 'systematic' biases, and explicit controls for confounding factors. These omissions undermine evaluation of the robustness of the structured-bias and predictive-signal claims.
Authors: We appreciate this observation. The revised manuscript will explicitly report sample sizes for every task-model combination, detail the statistical tests (chi-squared tests for uniformity of position distributions and permutation tests for probe significance), and describe controls already used (prompt randomization, content balancing across positions, and multiple prompt templates). These additions will improve transparency and allow readers to assess robustness directly. revision: yes
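The chi-squared uniformity test the authors name can be sketched as follows, with made-up counts standing in for one model's answer-position tallies.

```python
import numpy as np
from scipy.stats import chisquare

# Made-up tallies of correct-answer positions (A-D) across 400 generated MCQs.
observed = np.array([160, 90, 80, 70])

# Goodness-of-fit against the uniform expectation (100 per option).
stat, p = chisquare(observed)
print(f"chi2={stat:.1f}, p={p:.2g}")  # chi2=50.0; p is far below 0.05

# A small p-value rejects uniformity, i.e. evidences a systematic bias.
assert p < 0.05
```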
Circularity Check
No circularity: empirical probing and steering rest on independent measurements
Full rationale
The paper reports observational position-bias statistics across models, trains linear probes on stem hidden states to detect position-predictive information (standard supervised probe evaluation on held-out data), and performs activation steering interventions that causally alter output distributions. None of these steps reduces a claimed result to a fitted parameter or a self-referential definition. The interpretive phrase 'implicitly planned' is presented as a hypothesis suggested by the probe and steering results, not as a derived equality. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. Each claim is therefore checked against measurements independent of the claim itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Hidden states in transformer layers encode semantically meaningful information about future generation choices.
Reference graph
Works this paper leans on
- [1] Loula, João; LeBrun, Benjamin; Du, Li; Lipkin, Ben; Pasti, Clemente; Grand, Gabriel; Liu, Tianyu; Emara, Yahya; Freedman, Marjorie; Eisner, Jason; Cotterell, Ryan; Mansinghka, Vikash; Lew, Alexander K.; Vieira, Tim; O'Donnell, Timothy J. Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo.
- [2] Sun, Hao; Peng, Huailiang; Dai, Qiong; Bai, Xu; Cao, Yanan. LayerNavigator: Finding Promising Intervention Layers for Efficient Activation Steering in Large Language Models.
- [3] Wei, Sheng-Lun; Wu, Cheng-Kuang; Huang, Hen-Hsen; Chen, Hsin-Hsi. Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models. arXiv:2406.03009.
- [4] Guda, Blessed; Francis, Lawrence; Ashungafac, Gabrial Zencha; Joe-Wong, Carlee; Busogi, Moise. Quantifying and Mitigating Selection Bias in LLMs: A Transferable LoRA Fine-Tuning and Efficient Majority Voting Approach. arXiv:2511.21709.
- [5] Choi, Hyeong Kyu; Xu, Weijie; Xue, Chi; Eckman, Stephanie; Reddy, Chandan K. Mitigating Selection Bias with Node Pruning and Auxiliary Options. arXiv:2409.18857.
- [6] Li, Kenneth; Patel, Oam; Viégas, Fernanda; Pfister, Hanspeter; Wattenberg, Martin. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. arXiv:2306.03341.
- [7] Rimsky, Nina; Gabrieli, Nick; Schulz, Julian; Tong, Meg; Hubinger, Evan; Turner, Alexander. Steering Llama 2 via Contrastive Activation Addition. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024.
- [8] Han, Pengrui; Xu, Xueqiang; Xuan, Keyang; Song, Peiyang; Ouyang, Siru; Tian, Runchu; Jiang, Yuqing; Qian, Cheng; Jiang, Pengcheng; Sun, Jiashuo; Cui, Junxia; Zhong, Ming; Liu, Ge; Han, Jiawei; You, Jiaxuan. Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs.
- [9] Stolfo, Alessandro; Balachandran, Vidhisha; Yousefi, Safoora; Horvitz, Eric; Nushi, Besmira. Improving Instruction-Following in Language Models through Activation Steering. arXiv:2410.12877.
- [10] Subramani, Nishant; Suresh, Nivedita; Peters, Matthew. Extracting Latent Steering Vectors from Pretrained Language Models. Findings of the Association for Computational Linguistics: ACL 2022.
- [11] Steering Language Models With Activation Engineering. arXiv, August 2023.
- [12] Men, Tianyi; Cao, Pengfei; Jin, Zhuoran; Chen, Yubo; Liu, Kang; Zhao, Jun. Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.
- [13] Pochinkov, Nicholas; Benoit, Angelo; Agarwal, Lovkush; Majid, Zainab Ali; Ter-Minassian, Lucile. Extracting Paragraphs from LLM Token Activations. arXiv:2409.06328.
- [14] Li, Wangyue; Li, Liangzhi; Xiang, Tong; Liu, Xiao; Deng, Wei; Garcia, Noa. Can Multiple-Choice Questions Really Be Useful in Detecting the Abilities of LLMs? arXiv:2403.17752.
- [15] Dominguez-Olmedo, Ricardo; Hardt, Moritz; Mendler-Dünner, Celestine. Questioning the Survey Responses of Large Language Models. arXiv:2306.07951.
- [16] Zheng, Lianmin; Chiang, Wei-Lin; Sheng, Ying; Zhuang, Siyuan; Wu, Zhanghao; Zhuang, Yonghao; Lin, Zi; Li, Zhuohan; Li, Dacheng; Xing, Eric P.; Zhang, Hao; Gonzalez, Joseph E.; Stoica, Ion. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023.
- [17] Wu, Wilson; Morris, John X.; Levine, Lionel. Do Language Models Plan Ahead for Future Tokens? arXiv:2404.00859.
- [18] Dong, Zhichen; Zhou, Zhanhui; Liu, Zhixuan; Yang, Chao; Lu, Chaochao. Emergent Response Planning in LLMs. arXiv:2502.06258.
- [19] Urailertprasert, Norawit; Limkonchotiwat, Peerat; Suwajanakorn, Supasorn; Nutanong, Sarana. SEA-VQA: Southeast Asian Cultural Context Dataset for Visual Question Answering. Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), 2024.
- [20] Pezeshkpour, Pouya; Hruschka, Estevam. Large Language Models Sensitivity to the Order of Options in Multiple-Choice Questions. Findings of the Association for Computational Linguistics: NAACL 2024.
- [21] Mucciaccia, Sérgio Silva; Meireles Paixão, Thiago; Wall Mutz, Filipe; Santos Badue, Claudine; Ferreira de Souza, Alberto; Oliveira-Santos, Thiago. Automatic Multiple-Choice Question Generation and Evaluation Systems Based on LLM: A Study Case With University Resolutions. Proceedings of the 31st International Conference on Computational Linguistics, 2025.
- [22] Kalpakchi, Dmytro; Boye, Johan. Quasi: A Synthetic Question-Answering Dataset in Swedish Using GPT-3 and Zero-Shot Learning. Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023.
- [23] Loginova, Olga; Bezrukov, Oleksandr; Shekhar, Ravi; Kravets, Alexey. Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.
- [24] Maar, Jim; Paperno, Denis; McDougall, Callum Stuart; Nanda, Neel. What's the Plan? Metrics for Implicit Planning in LLMs and Their Application to Rhyme Generation and Question Answering. arXiv:2601.20164.
- [25] Zheng, Chujie; Zhou, Hao; Meng, Fandong; Zhou, Jie; Huang, Minlie. Large Language Models Are Not Robust Multiple Choice Selectors. arXiv:2309.03882.
- [26] Reif, Yuval; Schwartz, Roy. Beyond Performance: Quantifying and Mitigating Label Bias in LLMs. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024.