arXiv preprint arXiv:2502.17543

Training a generally curious agent · arXiv 2502.17543

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity

cs.LG · 2026-06-24 · unverdicted · novelty 6.0

On-policy self-distillation with sampled demonstrations reduces rollout diversity by amplifying existing probability gaps in the base model, unlike ideal RL, which preserves ratios among correct outputs.

citing papers explorer

Showing 1 of 1 citing paper after filters.

On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity cs.LG · 2026-06-24 · unverdicted · none · ref 34
