A model trained only by proposing and solving its own verifiable code tasks achieves state-of-the-art results on math and coding benchmarks without external data.
1·(4−1) = 3, 2·(4−2) = 4, 3·(4−3) = 3, 4·(4−4) = 0. Sum: 3 + 4 + 3 + 0 = 10.
- For the input [4,3,2,1], the sorted form is [1,2,3,4], and the output is 20 (same as above).
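
The visible arithmetic can be checked with a minimal sketch. It assumes each term is v·(n−v), taken over the sorted input of length n. The function name `partial_sum` is hypothetical, and the excerpt does not show the full task function, so the sketch reproduces only the displayed sum of 10, not the reported final output of 20.

```python
def partial_sum(values):
    """Sum v * (n - v) over the sorted input, as in the displayed terms.

    Assumption: this covers only the arithmetic visible in the excerpt;
    the complete function behind the reported output is not shown.
    """
    ordered = sorted(values)  # [4, 3, 2, 1] and [1, 2, 3, 4] sort to the same list
    n = len(ordered)
    return sum(v * (n - v) for v in ordered)


# Both orderings share a sorted form, so they give the same partial sum.
print(partial_sum([1, 2, 3, 4]))
print(partial_sum([4, 3, 2, 1]))
```

Because the input is sorted first, any permutation of the same values yields the same result, which matches the excerpt's remark that the second input behaves the same as the first.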
Absolute Zero: Reinforced Self-play Reasoning with Zero Data