Grokking of Hierarchical Structure in Vanilla Transformers

Murty, Shikhar, Sharma, Pratyusha, Andreas, Jacob, Manning, Christopher · 2023 · DOI 10.18653/v1/2023.acl-short.39

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

MTA improves LLM knowledge distillation by aligning representations along layer-wise trajectories with adaptive granularity from words to phrases using dynamic structural and hidden representation alignment losses.

citing papers explorer

Showing 1 of 1 citing paper.

MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation cs.CL · 2026-05-02 · unverdicted · none · ref 92
