Measuring and narrowing the compositionality gap in language models
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.AI 3
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
representative citing papers
LEAD lets LLMs solve checkers jumping puzzles up to size 13 by using lookahead to recover from irreversible errors on hard steps that break extreme decomposition.
Derives an information-theoretic accuracy upper bound for single-pass LLM multi-hop QA and introduces the InfoQA multi-call framework that improves performance by keeping per-step information load within model capacity.
citing papers explorer
- CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
  CuSearch reallocates rollout budget in RLVR toward deeper-search trajectories as a proxy for retrieval supervision density, yielding up to 11.8 exact-match gains over uniform GRPO sampling on ZeroSearch.
- LEAD: Breaking the No-Recovery Bottleneck in Long-Horizon Reasoning
  LEAD lets LLMs solve checkers jumping puzzles up to size 13 by using lookahead to recover from irreversible errors on hard steps that break extreme decomposition.
- A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
  Derives an information-theoretic accuracy upper bound for single-pass LLM multi-hop QA and introduces the InfoQA multi-call framework that improves performance by keeping per-step information load within model capacity.