A model trained only by proposing and solving its own verifiable code tasks achieves state-of-the-art results on math and coding benchmarks without external data.
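A "verifiable" code task can be illustrated with a toy sketch. It assumes that a task is a program paired with an input, and that the reference answer is whatever running the program produces, so no external labels are needed. The helper names `propose_task` and `verify` are hypothetical, and this is not the Absolute Zero implementation.

```python
from typing import Callable


def propose_task(program: Callable[[int], int], x: int) -> tuple[Callable[[int], int], int, int]:
    # Hypothetical proposer step: the reference answer is obtained by
    # executing the program, not from an external dataset.
    return program, x, program(x)


def verify(predicted: int, reference: int) -> float:
    # Hypothetical binary reward: 1.0 only if the solver's answer matches execution.
    return 1.0 if predicted == reference else 0.0


def square_plus_one(n: int) -> int:
    # Example program a proposer might emit (illustrative only).
    return n * n + 1


prog, x, ref = propose_task(square_plus_one, 3)
print(ref)              # 10
print(verify(10, ref))  # 1.0
print(verify(9, ref))   # 0.0
```

Because the executor supplies the ground truth, any solver answer can be checked automatically. That property is what lets a propose-and-solve loop run without external data.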
The outer loop iterates over each number, and the inner loop iterates over the numbers that come after the current number in the outer loop, so a pair such as (2, 3) is not also counted as (3, 2).
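The nested-loop pattern described above can be sketched as follows. The surrounding task is not given in the excerpt, so the function `unique_pairs` is hypothetical and only shows how starting the inner loop just after the outer loop's index visits each pair once.

```python
def unique_pairs(nums: list[int]) -> list[tuple[int, int]]:
    """Hypothetical helper: collect each pair of positions (i, j) with i < j once."""
    pairs = []
    for i in range(len(nums)):  # outer loop: each number in turn
        for j in range(i + 1, len(nums)):  # inner loop: only the numbers after it
            pairs.append((nums[i], nums[j]))
    return pairs


print(unique_pairs([1, 2, 3]))  # [(1, 2), (1, 3), (2, 3)]
```

Since `j` always starts at `i + 1`, the reversed pair is never generated. A list of `n` numbers therefore yields `n * (n - 1) / 2` pairs rather than `n * (n - 1)`.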
1 Pith paper cites this work. Polarity classification is still indexing.

Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)

Representative citing papers:
Absolute Zero: Reinforced Self-play Reasoning with Zero Data