Sequence-level knowledge distillation
2 Pith papers cite this work; polarity classification is still indexing. Citing years: 2026. Verdicts: 2, currently unverdicted. 2 representative citing papers are listed below.
Citing papers explorer
- Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective
  A learned transformation matrix minimizes the conditional mutual information (CMI) carried by the teacher's logits, degrading a student's distillation performance while preserving the teacher's task accuracy (see the first sketch after this list).
- Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation
  RESD turns failure trajectories into token-level supervision via retrospective reflections and a persistent global playbook, enabling faster improvement than standard self-distillation or GRPO with only one rollout per prompt (see the second sketch after this list).
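To make the first summary concrete, here is a minimal PyTorch sketch of an accuracy-preserving logit transform. It is not the paper's implementation: the CMI objective is replaced by a simple proxy that pushes the released distribution toward uniform while a cross-entropy term pins the teacher's argmax, and `LogitTransform`, `protection_loss`, and `alpha` are hypothetical names introduced for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitTransform(nn.Module):
    """Learned square matrix applied to teacher logits before they are released."""
    def __init__(self, vocab_size: int):
        super().__init__()
        # Start near the identity so the transform is accuracy-preserving at init.
        self.weight = nn.Parameter(
            torch.eye(vocab_size) + 0.01 * torch.randn(vocab_size, vocab_size))

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return logits @ self.weight.T

def protection_loss(transformed_logits: torch.Tensor,
                    raw_logits: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    # Preserve task accuracy: the released logits must keep the teacher's argmax.
    hard_labels = raw_logits.argmax(dim=-1)
    acc_term = F.cross_entropy(transformed_logits, hard_labels)
    # Proxy for the information term (NOT the paper's CMI estimator): push the
    # released distribution toward uniform, removing the "dark knowledge" a
    # student would otherwise distill from.
    log_probs = F.log_softmax(transformed_logits, dim=-1)
    uniform = torch.full_like(transformed_logits, 1.0 / transformed_logits.size(-1))
    info_term = F.kl_div(log_probs, uniform, reduction="batchmean")
    return acc_term + alpha * info_term

# Toy usage with a small vocabulary.
vocab = 1000
transform = LogitTransform(vocab)
optimizer = torch.optim.Adam(transform.parameters(), lr=1e-3)
teacher_logits = torch.randn(8, vocab)  # stand-in for a batch of teacher logits
loss = protection_loss(transform(teacher_logits), teacher_logits)
loss.backward()
optimizer.step()
```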
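And a rough sketch of the RESD-style loop as read from the second summary: one rollout per prompt, a retrospective reflection when the rollout fails, token-level supervised targets built from the reflection, and a persistent playbook of lessons prepended to later prompts. Everything here (`PolicyModel`, `generate`, `sft_update`, `verifier`, the `Playbook` class) is a hypothetical placeholder interface, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol

class PolicyModel(Protocol):
    """Hypothetical interface for the policy being self-distilled."""
    def generate(self, prompt: str) -> str: ...
    def sft_update(self, prompt: str, target: str) -> None: ...  # token-level SFT step

@dataclass
class Playbook:
    """Persistent global playbook: distilled lessons shared across prompts."""
    lessons: list[str] = field(default_factory=list)

    def as_prefix(self) -> str:
        return "\n".join(f"Lesson: {lesson}" for lesson in self.lessons)

def resd_step(model: PolicyModel, prompt: str, playbook: Playbook,
              verifier: Callable[[str, str], bool]) -> None:
    # One rollout per prompt (vs. the many rollouts GRPO needs for a group baseline).
    rollout = model.generate(playbook.as_prefix() + "\n" + prompt)
    if verifier(prompt, rollout):
        target = rollout  # rare success: use it directly as the distillation target
    else:
        # Rich feedback: a retrospective reflection on why the trajectory failed...
        reflection = model.generate(
            f"{prompt}\n{rollout}\nReflect: why did this fail, and how should it be fixed?")
        # ...is used to produce a revised trajectory that supplies token-level targets.
        target = model.generate(playbook.as_prefix() + "\n" + prompt + "\n" + reflection)
        playbook.lessons.append((reflection.splitlines() or [""])[0])  # persist a one-line lesson
    model.sft_update(prompt, target)  # token-level supervision from the chosen target
```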