Adhint: Adaptive hints with difficulty priors for reinforcement learning

Feng Zhang, Zezhong Tan, Xinhong Ma, Ziqiang Dong, Xi Leng, Jianfei Zhao, Xin Sun, and Yang Yang · 2025 · arXiv 2512.13095

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

AnE combines Truth Anchor Expansion and Scaffold-Stripping to deliver 10.3% gains on eight multimodal reasoning benchmarks for MLLMs.

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

ConSPO is a new contrastive sequence-level policy optimization method that addresses GRPO limitations via length-normalized log-probability scores and InfoNCE-style objectives, outperforming baselines on reasoning benchmarks.

PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.

Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 3 refs

Prefix Sampling replays self-generated trajectory prefixes to control rollout pass rates near 50% in binary-reward RL, delivering wall-clock speedups and modest performance gains on SWE-bench Verified and AIME tasks.
