OpenELM: An efficient language model family with open training and inference framework. arXiv:2404.14619

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary: background 1

citation-polarity summary

fields: cs.LG 4, cs.DC 1

roles: background 1

polarities: background 1

representative citing papers

Strong Teacher Not Needed? On Distillation in LLM Pretraining

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse the gains; distillation also aids generalization more than in-domain fit.

FlashNorm: Fast Normalization for Transformers

cs.LG · 2024-07-12 · accept · novelty 6.0

FlashNorm is an exact algebraic reformulation of RMSNorm followed by a linear projection: it folds the normalization weights into the projection and defers normalization so the two can execute in parallel. It also adds scale-invariance simplifications that remove redundant norms in certain architectures.
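The weight-folding idea in the FlashNorm summary above can be illustrated with a minimal sketch. This is not the paper's code: the function names (`rmsnorm_then_linear`, `fold_weights`, `folded_linear_then_scale`) and the toy values are hypothetical, the epsilon term usually added inside RMSNorm is omitted for brevity, and only the Python standard library is used. It shows that scaling by the norm weights `g` can be merged into the projection matrix `W` ahead of time, so the division by the RMS of `x` can be applied after the matrix product.

```python
import math

def rms(x):
    # Root mean square of the input vector (epsilon omitted in this sketch).
    return math.sqrt(sum(v * v for v in x) / len(x))

def matvec(W, x):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rmsnorm_then_linear(x, g, W):
    # Baseline: normalize, scale by the norm weights g, then project with W.
    r = rms(x)
    normed = [gi * xi / r for gi, xi in zip(g, x)]
    return matvec(W, normed)

def fold_weights(g, W):
    # Precompute W * diag(g): column j of W is scaled by g[j].
    return [[w * gi for w, gi in zip(row, g)] for row in W]

def folded_linear_then_scale(x, W_folded):
    # Project the raw input first, then divide by the scalar RMS.
    # The projection and the RMS do not depend on each other.
    r = rms(x)
    return [v / r for v in matvec(W_folded, x)]

# Hypothetical toy values, chosen only for illustration.
x = [1.0, -2.0, 3.0]
g = [0.5, 2.0, 1.5]
W = [[1.0, 0.0, 2.0], [-1.0, 3.0, 0.5]]

baseline = rmsnorm_then_linear(x, g, W)
folded = folded_linear_then_scale(x, fold_weights(g, W))
```

Under these assumptions, `baseline` and `folded` agree up to floating-point rounding, because dividing by a scalar commutes with the matrix product.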
