Findings of the Association for Computational Linguistics
Cited by 1 paper indexed in Pith (cs.CL, 2026).
A Study on Hidden Layer Distillation for Large Language Model Pre-Training
Hidden layer distillation yields systematic perplexity gains over logit KD in LLM pre-training but does not consistently improve downstream performance.
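For readers unfamiliar with the two objectives being compared, below is a minimal PyTorch-style sketch, not the paper's code: logit KD matches the teacher's output distribution, while hidden layer distillation adds a loss term on intermediate hidden states. The tensor shapes, temperature, and the learned projection are illustrative assumptions.

import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, T=2.0):
    # Logit KD: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradients comparable.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

def hidden_distill_loss(student_h, teacher_h, proj):
    # Hidden layer distillation: MSE between teacher hidden states and
    # student hidden states mapped into the teacher's width by a
    # learned linear projection (needed when widths differ).
    return F.mse_loss(proj(student_h), teacher_h)

# Toy usage with made-up sizes: batch 4, sequence 16, vocab 100,
# student width 256, teacher width 512.
student_logits = torch.randn(4, 16, 100)
teacher_logits = torch.randn(4, 16, 100)
student_h = torch.randn(4, 16, 256)
teacher_h = torch.randn(4, 16, 512)
proj = torch.nn.Linear(256, 512)

loss = logit_kd_loss(student_logits, teacher_logits) \
     + hidden_distill_loss(student_h, teacher_h, proj)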