
Moreover, KD is not monotonic in teacher strength: for autoregressive LMs, stronger teachers can sometimes degrade student performance (Zhong et al., 2024; Busbridge et al., 2025).

1 Pith paper cites this work. Polarity classification is still indexing.


fields: cs.LG (1)

years: 2026 (1)

verdicts: UNVERDICTED (1)

representative citing papers

Strong Teacher Not Needed? On Distillation in LLM Pretraining

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse gains and distillation aids generalization more than in-domain fit.

citing papers explorer

Showing 1 of 1 citing paper.

  • Strong Teacher Not Needed? On Distillation in LLM Pretraining cs.LG · 2026-05-22 · unverdicted · none · ref 17
