arXiv preprint arXiv:2508.10142
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Can In-Context Learning Support Intrinsic Curiosity?
ICL-derived intrinsic rewards are biased in general MDPs but asymptotically match true learning progress in non-temporal settings, with supporting experiments.
- Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games
The paper introduces a multi-turn interactive benchmark using 474 executable games to evaluate LLMs on evidence acquisition, belief updating, contextual robustness, and metacognitive adaptation, revealing large performance gaps and sensitivity to perturbations.