arXiv preprint arXiv:2508.10142

Multi-turn puzzles: Evaluating interactive reasoning, strategic dialogue in LLMs · arXiv 2508.10142

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Can In-Context Learning Support Intrinsic Curiosity?

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

ICL-derived intrinsic rewards are biased in general MDPs but asymptotically match true learning progress in non-temporal settings, with supporting experiments.

Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

The paper introduces a multi-turn interactive benchmark using 474 executable games to evaluate LLMs on evidence acquisition, belief updating, contextual robustness, and metacognitive adaptation, revealing large performance gaps and sensitivity to perturbations.

citing papers explorer

Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games

cs.AI · 2026-05-26 · unverdicted · none · ref 16
