OPSD functions primarily as a compression stage after RLVR in mathematical reasoning, preserving accuracy on correct rollouts but reducing it on incorrect ones, supporting an SFT-then-RLVR-then-OPSD pipeline.
**Patterns to Recognize:**
- **Pair 2s and 5s:** Whenever a product involves $2^n \cdot 5^m$, pairing $2^k \cdot 5^k = 10^k$ simplifies the problem.
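The pairing pattern can be illustrated with a small worked example. This is a minimal sketch, not taken from the paper: the function name `pair_twos_and_fives` and the choice $k = \min(n, m)$ (pairing as many 2s and 5s as possible) are assumptions made for illustration.

```python
def pair_twos_and_fives(n: int, m: int) -> tuple[int, int]:
    """Split 2**n * 5**m into (leftover, k) so that 2**n * 5**m == leftover * 10**k.

    Hypothetical helper for illustration: k is taken as min(n, m),
    i.e. every available 2 is paired with a 5.
    """
    if n < 0 or m < 0:
        raise ValueError("exponents must be non-negative")
    k = min(n, m)  # each paired 2 and 5 contributes one factor of 10
    leftover = 2 ** (n - k) * 5 ** (m - k)  # unpaired 2s or 5s that remain
    return leftover, k


leftover, k = pair_twos_and_fives(5, 3)
print(f"2^5 * 5^3 = {leftover} * 10^{k} = {leftover * 10 ** k}")
# prints: 2^5 * 5^3 = 4 * 10^3 = 4000
```

Here $2^5 \cdot 5^3 = 2^2 \cdot (2^3 \cdot 5^3) = 4 \cdot 10^3$, so the product can be read off without multiplying 32 by 125 directly.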
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1). Years: 2026 (1). Verdicts: unverdicted (1).
Representative citing papers:
OPSD Compresses What RLVR Teaches: A Post-RL Compaction Stage for Reasoning Models