A 4B model post-trained with SFT, RL, and a reasoning cache surpasses larger open models and approaches proprietary ones on Olympiad proof generation.
3. $h(2x) = 2h(x) - c$ for all $x \in \mathbb{Z}$

From these, we derive:
- If $c \neq 0$, then $h(x) = c$ for all $x$, so $f(x) = 2x + c$
- If $c = 0$, then $h(x)$ satisfies $h(2x) = 2h(x)$ and $h(f(x)) = 0$
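As a quick sanity check on condition 3 (a sketch that uses only the relation stated above, not the source's own derivation), setting x = 0 pins down h(0), the constant solution from the first case satisfies the relation, and setting c = 0 recovers the doubling rule in the second case:

```latex
\begin{align*}
x = 0:&\quad h(0) = 2h(0) - c \;\Longrightarrow\; h(0) = c,\\
h \equiv c:&\quad h(2x) = c = 2c - c = 2h(x) - c,\\
c = 0:&\quad h(2x) = 2h(x) - 0 = 2h(x).
\end{align*}
```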
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
QED-Nano: Teaching a Tiny Model to Prove Hard Theorems