arXiv preprint arXiv:2503.24377

Harnessing the reasoning economy: A survey of efficient reasoning for large language models · 2025 · arXiv 2503.24377

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

Dynamic Rollout Editing reduces overthinking in RL-trained LLMs by editing post-answer continuations in successful rollouts and preferring the edited versions within GRPO groups.

BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards

cs.AI · 2026-06-27 · unverdicted · novelty 4.0

BV-Blend blends prompt-local and semantic-cluster historical reward statistics via SEM-derived weights to stabilize critic-free RL advantage estimation.

citing papers explorer

BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards

cs.AI · 2026-06-27 · unverdicted · none · ref 103

