arXiv preprint arXiv:2505.17621

Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration · 2025 · arXiv 2505.17621

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Epistemic Uncertainty for Test-Time Discovery

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

UG-TTT adds epistemic uncertainty measured by adapter disagreement as an exploration bonus in RL for LLMs, raising maximum reward and diversity on scientific discovery benchmarks.

MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

cs.IR · 2026-04-19 · unverdicted · novelty 6.0

MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.

How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

IMAX trains soft prefixes with an InfoMax reward to drive diverse exploration in RLVR, yielding up to 11.60% gains in Pass@4 over standard RLVR across model scales.

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0 · 3 refs

Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.

Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

cs.AI · 2026-04-26 · conditional · novelty 5.0

Tandem lets a large model supply compact strategic guidance to a small model for reasoning tasks, achieving similar or better performance at about 40 percent lower cost through adaptive early stopping.

citing papers explorer

Epistemic Uncertainty for Test-Time Discovery cs.LG · 2026-05-11 · unverdicted · none · ref 9
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search cs.IR · 2026-04-19 · unverdicted · none · ref 56
How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors cs.AI · 2026-05-09 · unverdicted · none · ref 13
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning cs.AI · 2026-05-07 · unverdicted · none · ref 88 · 3 links
Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning cs.AI · 2026-04-26 · conditional · none · ref 4
