OPSD functions primarily as a compression stage after RLVR in mathematical reasoning, preserving accuracy on correct rollouts but reducing it on incorrect ones, supporting an SFT-then-RLVR-then-OPSD pipeline.
Mitigating overthinking in large reasoning models via difficulty-aware reinforcement learning. CoRR, abs/2601.21418
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
OPSD Compresses What RLVR Teaches: A Post-RL Compaction Stage for Reasoning Models