5th International Conference on Learning Representations

Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher · 2017

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Hidden layer distillation yields systematic perplexity gains over logit KD in LLM pre-training but does not consistently improve downstream performance.

K-Quantization and its Impact on Output Performance

cs.CL · 2026-05-19 · unverdicted · novelty 3.0

Empirical evaluation of quantization effects on eight LLMs across bit widths, showing that performance generally declines at lower precision, but with model-size-dependent resilience and acceptable accuracy at 2 bits in many cases.

citing papers explorer

Showing 2 of 2 citing papers.

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

cs.CL · 2026-05-12 · unverdicted · none · ref 28

Hidden layer distillation yields systematic perplexity gains over logit KD in LLM pre-training but does not consistently improve downstream performance.

K-Quantization and its Impact on Output Performance

cs.CL · 2026-05-19 · unverdicted · none · ref 26

Empirical evaluation of quantization effects on eight LLMs across bit widths, showing that performance generally declines at lower precision, but with model-size-dependent resilience and acceptable accuracy at 2 bits in many cases.
