ESRRSim is a taxonomy-driven framework that generates evaluation scenarios and dual rubrics to measure emergent strategic reasoning risks like deception and reward hacking across 11 LLMs, finding detection rates from 14.45% to 72.72% with generational improvements.
One Pith paper cites this work.
Citing papers by field: cs.AI (1); by year: 2026 (1); by verdict: UNVERDICTED (1).

Representative citing paper:
Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework