Scaling reasoning, losing control: Evaluating instruction following in large reasoning models.arXiv preprint arXiv:2505.14810,

Tingchen Fu, Jiawei Gu, Yafu Li, Xiaoye Qu, Yu Cheng · 2025 · arXiv 2505.14810

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

representative citing papers

ImpRIF: Stronger Implicit Reasoning Leads to Better Complex Instruction Following

cs.CL · 2026-02-04 · unverdicted · novelty 6.0

ImpRIF improves LLM complex instruction following by synthesizing data from reasoning graphs and training models to reason explicitly along those graphs.

Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts

cs.AI · 2025-09-26 · unverdicted · novelty 6.0

Retrieval-of-Thought organizes prior reasoning into a thought graph for retrieval and reward-guided recombination, reducing output tokens by up to 40% and latency by 82% while preserving accuracy on reasoning benchmarks.

Learning to Reason under Off-Policy Guidance

cs.LG · 2025-04-21 · unverdicted · novelty 6.0

LUFFY mixes off-policy reasoning traces into RLVR training via Mixed-Policy GRPO and regularized importance sampling, delivering over 6-point gains on math benchmarks and enabling training of weak models where on-policy RLVR fails.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards

cs.LG · 2025-09-26

citing papers explorer

Showing 5 of 5 citing papers.

ImpRIF: Stronger Implicit Reasoning Leads to Better Complex Instruction Following cs.CL · 2026-02-04 · unverdicted · none · ref 1
ImpRIF improves LLM complex instruction following by synthesizing data from reasoning graphs and training models to reason explicitly along those graphs.
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts cs.AI · 2025-09-26 · unverdicted · none · ref 7
Retrieval-of-Thought organizes prior reasoning into a thought graph for retrieval and reward-guided recombination, reducing output tokens by up to 40% and latency by 82% while preserving accuracy on reasoning benchmarks.
Learning to Reason under Off-Policy Guidance cs.LG · 2025-04-21 · unverdicted · none · ref 57
LUFFY mixes off-policy reasoning traces into RLVR training via Mixed-Policy GRPO and regularized importance sampling, delivering over 6-point gains on math benchmarks and enabling training of weak models where on-policy RLVR fails.
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 146
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards cs.LG · 2025-09-26 · unreviewed · ref 7

Scaling reasoning, losing control: Evaluating instruction following in large reasoning models.arXiv preprint arXiv:2505.14810,

fields

years

verdicts

representative citing papers

citing papers explorer