
Scaling reasoning, losing control: Evaluating instruction following in large reasoning models. arXiv preprint arXiv:2505.14810.

5 Pith papers cite this work. Polarity classification is still indexing.


Citing papers by year: 2025 (4), 2026 (1)

representative citing papers

Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts

cs.AI · 2025-09-26 · unverdicted · novelty 6.0

Retrieval-of-Thought organizes prior reasoning into a thought graph for retrieval and reward-guided recombination, reducing output tokens by up to 40% and latency by 82% while preserving accuracy on reasoning benchmarks.

Learning to Reason under Off-Policy Guidance

cs.LG · 2025-04-21 · unverdicted · novelty 6.0

LUFFY mixes off-policy reasoning traces into RLVR training via Mixed-Policy GRPO and regularized importance sampling, delivering over 6-point gains on math benchmarks and enabling training of weak models where on-policy RLVR fails.
