We also provide an ablation study for τ_eval in Section 4.3.3

By using τ_eval = 1 as our default, we aim to provide a direct assessment of the model’s original capabilities as learned during training, without post-hoc optimization of decoding parameters · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

cs.LG · 2025-10-21 · unverdicted · novelty 5.0 · 2 refs

SePT alternates self-generation of responses at controlled temperatures with training on the latest model outputs, yielding gains over a strong no-training baseline on six math reasoning benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

cs.LG · 2025-10-21 · unverdicted · none · ref 29 · 2 links