MTA improves LLM knowledge distillation by aligning representations along layer-wise trajectories with adaptive granularity from words to phrases using dynamic structural and hidden representation alignment losses.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
SRA reframes cross-tokenizer LLM distillation as alignment of attention-weighted span centers of mass in a multi-particle dynamical system and reports consistent gains over prior CTKD baselines.
citing papers explorer
-
MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation
MTA improves LLM knowledge distillation by aligning representations along layer-wise trajectories with adaptive granularity from words to phrases using dynamic structural and hidden representation alignment losses.
-
SRA: Span Representation Alignment for Large Language Model Distillation
SRA reframes cross-tokenizer LLM distillation as alignment of attention-weighted span centers of mass in a multi-particle dynamical system and reports consistent gains over prior CTKD baselines.