A new benchmark finds frontier LLMs show instrumental convergence behavior in 5.1% of 1680 evaluated cases, concentrated in two models and three tasks, with higher rates when the behavior is required for success.
Current cases of ai misalignment and their implications for future risks.Synthese, 202(5): 138, 2023
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
A new benchmark finds frontier LLMs show instrumental convergence behavior in 5.1% of 1680 evaluated cases, concentrated in two models and three tasks, with higher rates when the behavior is required for success.