Scaling laws for neural language models. arXiv, 2020

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei · 2020

2 Pith papers cite this work. Polarity classification of these citations is still being indexed.

representative citing papers

Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

Cluster-aware Upcycling breaks MoE expert symmetry by initializing experts from semantic activation clusters using SVD subspaces and cluster centroids for the router, plus self-distillation, yielding better zero- and few-shot performance on CLIP ViT models than standard upcycling.

When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators

cs.CV · 2026-02-23 · unverdicted · novelty 6.0

Newer text-to-image models produce less diverse synthetic data, causing classifiers trained solely on them to show declining accuracy on real test sets.

citing papers explorer


Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling cs.CV · 2026-04-15 · unverdicted · none · ref 19
When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators cs.CV · 2026-02-23 · unverdicted · none · ref 25
