Sequence-level knowledge distillation

Yoon Kim, Alexander M Rush · 2016

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

cs.LG · 2026-02-12 · conditional · novelty 6.0

Generalized on-policy distillation with reward scaling above one (ExOPD) lets student models surpass teacher performance when merging domain experts on math and code tasks.

Multi-Token Prediction via Self-Distillation

cs.CL · 2026-02-05 · unverdicted · novelty 6.0

Self-distillation turns pretrained autoregressive LMs into multi-token predictors that decode over 3x faster with under 5% accuracy drop on GSM8K.

citing papers explorer

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

cs.LG · 2026-02-12 · conditional · none · ref 12

Multi-Token Prediction via Self-Distillation

cs.CL · 2026-02-05 · unverdicted · none · ref 11
