Mismatched wrong drafts from a 1.5B math model injected into GRPO training of a 7B model yield higher pass rates on MATH-500 and AIME than on-policy baselines or matched variants.
C.3 AIME 2026 Problem 22 (inverse case: base > ours). Problem. A standard fair six-sided die is rolled repeatedly
1 Pith paper cites this work (field: cs.CL; year: 2026; verdict: CONDITIONAL). Representative citing paper:
Weak-to-Strong Elicitation via Mismatched Wrong Drafts