The HERO'S JOURNEY benchmark evaluates LLMs on attribute and procedural rule induction across four structural forms, finding limited and uneven performance, with execution as the main bottleneck and steering helping only on attribute tasks.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
HERO'S JOURNEY: Testing Complex Rule Induction with Text Games