The 10K initial responses are split roughly evenly into 5K correct and 5K incorrect responses.

Verification: Verify the binary reward $r \in \{0, 1\}$ for each $y_{\text{initial}}$, then build the self-revision prompt $P_r$.
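The verification step can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Sample` record, the exact-match verifier in `binary_reward`, and the prompt wording in `build_revision_prompt` are hypothetical and are not taken from the cited work, which may verify responses and phrase its prompts quite differently.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record: a question, the model's initial response, and a reference answer."""
    question: str
    y_initial: str
    reference: str


def binary_reward(y_initial: str, reference: str) -> int:
    """Assumed verifier: reward 1 on exact match after trimming whitespace, else 0."""
    return int(y_initial.strip() == reference.strip())


def build_revision_prompt(sample: Sample, r: int) -> str:
    """Assumed template for the self-revision prompt P_r, conditioned on the reward r."""
    verdict = "correct" if r == 1 else "incorrect"
    return (
        f"Question: {sample.question}\n"
        f"Initial response: {sample.y_initial}\n"
        f"The initial response was verified as {verdict}.\n"
        "Revise the response."
    )


# Toy data, invented for this example only.
samples = [
    Sample("What is 2 + 3?", "5", "5"),
    Sample("What is 7 * 6?", "41", "42"),
]

rewards = [binary_reward(s.y_initial, s.reference) for s in samples]
prompts = [build_revision_prompt(s, r) for s, r in zip(samples, rewards)]
```

In this sketch the reward only selects which verdict is stated in the prompt; a real pipeline would likely replace the exact-match check with a task-specific verifier such as a math answer checker or unit tests for code.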

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

SD-Zero converts binary rewards into dense self-supervision by having a model revise its own outputs and distill the improvements back into generation, yielding at least 10% gains on math and code benchmarks.
