Training compute-optimal large language models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al · 2022

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 2 other 1

citation-polarity summary

background 2 unclear 1

representative citing papers

PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

PRISM supplies a geometric upper bound on LLM variant risk that splits drift into scale, shape, and head axes and doubles as a differentiable regularizer against forgetting.

Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

All rank-monotone pruning scorers converge to identical accuracy at fixed sparsity, but non-monotone features with sparsity-dependent complexity can escape this plateau, as shown by the SICS hypothesis on ViT-Small/CIFAR-10.

ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

ForgeVLA enables federated VLA model training from unlabeled vision-action pairs by recovering language via embodied classifiers and using contrastive planning plus adaptive aggregation to avoid feature collapse.

citing papers explorer

Showing 3 of 3 citing papers.

PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head cs.CL · 2026-05-12 · unverdicted · none · ref 20
PRISM supplies a geometric upper bound on LLM variant risk that splits drift into scale, shape, and head axes and doubles as a differentiable regularizer against forgetting.
Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features cs.LG · 2026-05-10 · unverdicted · none · ref 9
All rank-monotone pruning scorers converge to identical accuracy at fixed sparsity, but non-monotone features with sparsity-dependent complexity can escape this plateau, as shown by the SICS hypothesis on ViT-Small/CIFAR-10.
ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations cs.CV · 2026-05-08 · unverdicted · none · ref 20
ForgeVLA enables federated VLA model training from unlabeled vision-action pairs by recovering language via embodied classifiers and using contrastive planning plus adaptive aggregation to avoid feature collapse.

Training compute-optimal large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer