Explaining neural scaling laws

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 3 · method: 1

citation-polarity summary

years

2026: 11 · 2024: 1

representative citing papers

Internal Data Repetition Destroys Language Models

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Repetition of training data produces a systematic eval loss peak at intermediate repeat counts whose location scales with model size, quantifiable as large compute-equivalent loss even at modest repetition fractions.

Explaining Data Mixing Scaling Laws

cs.LG · 2026-06-06 · unverdicted · novelty 6.0

A framework using capacity competition and noise reduction under an overlapping-skills assumption explains multi-domain loss behaviors and extrapolates optimal mixtures to large scales from small-scale fits with fewer parameters.

Scaling and renormalization in high-dimensional regression

stat.ML · 2024-05-01 · unverdicted · novelty 6.0

Ridge regression in high dimensions exhibits power-law scalings because covariance fluctuations renormalize the ridge parameter, allowing closed-form error expressions and bias-variance decompositions for random feature models via free probability.

Two AI Metrics Diverged: Will it Make All the Difference?

cs.AI · 2026-07-01 · unverdicted · novelty 5.0

Bounded performance metrics always favor convergence of AI capabilities to meek models while unbounded metrics allow frontier models to maintain leads indefinitely, with policy implications for capability concentration.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 4 of 4 citing papers after filters.

  • Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization · stat.ML · 2026-05-07 · unverdicted · none · ref 10

    Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model.

  • Scaling and renormalization in high-dimensional regression · stat.ML · 2024-05-01 · unverdicted · none · ref 4

    Ridge regression in high dimensions exhibits power-law scalings because covariance fluctuations renormalize the ridge parameter, allowing closed-form error expressions and bias-variance decompositions for random feature models via free probability.

  • There Will Be a Scientific Theory of Deep Learning · stat.ML · 2026-04-23 · unverdicted · none · ref 107

    A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

  • Statistical Properties of Training & Generalization · stat.ML · 2026-06-18 · unverdicted · none · ref 200 · 2 links

    Review of neural scaling laws and their relation to constraints and inductive biases when applying machine learning to physics problems.