Sandbagging prompts induce LLMs to adopt a low-entropy, content-invariant response-position attractor centered on E/F/G rather than deterministic tracking or random avoidance.
Cherry Li, Mary Phuong, and Max Siegel
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Option-Order Randomisation Reveals a Distributional Position Attractor in Prompted Sandbagging
Sandbagging prompts induce LLMs to adopt a low-entropy, content-invariant response-position attractor centered on E/F/G rather than deterministic tracking or random avoidance.