Misalignment or Misuse? The AGI Alignment Tradeoff

Max Hellrigel-Holderbaum, Leonard Dung · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

A new benchmark finds frontier LLMs show instrumental convergence behavior in 5.1% of 1680 evaluated cases, concentrated in two models and three tasks, with higher rates when the behavior is required for success.

citing papers explorer

Showing 1 of 1 citing paper.

Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
cs.AI · 2026-05-07 · unverdicted · none · ref 9
A new benchmark finds frontier LLMs show instrumental convergence behavior in 5.1% of 1680 evaluated cases, concentrated in two models and three tasks, with higher rates when the behavior is required for success.