Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL. arXiv, abs/2505.10832

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL · 2025 · arXiv 2505.10832

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

An Efficient Streaming Video Understanding Framework with Agentic Control

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

R3-Streaming uses cascaded control, age-aware memory forgetting, and TB-GRPO reinforcement learning to reach SOTA scores on streaming video benchmarks while cutting visual token usage by 95-96%.

Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

cs.AI · 2025-09-30 · unverdicted · novelty 6.0

Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific

citing papers explorer

Showing 4 of 4 citing papers.

An Efficient Streaming Video Understanding Framework with Agentic Control cs.CV · 2026-05-18 · unverdicted · none · ref 25
Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning cs.AI · 2026-05-12 · unverdicted · none · ref 40
HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation cs.AI · 2026-04-09 · unverdicted · none · ref 31
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training cs.AI · 2025-09-30 · unverdicted · none · ref 40
