A Resource Model For Neural Scaling Law

Jeff Gore; Jinyeop Song; Max Tegmark; Ziming Liu

arxiv: 2402.05164 · v2 · pith:KN53N7E3new · submitted 2024-02-07 · 💻 cs.LG · cs.AI · cs.NE

Jinyeop Song, Ziming Liu, Max Tegmark, Jeff Gore

keywords: neural, model, scaling, composite, resource, resources, subtasks, acquired

Neural scaling laws characterize how model performance improves as model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and can therefore be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to each subtask). On toy problems, we empirically find that: (1) the loss of a subtask is inversely proportional to the number of neurons allocated to it; (2) when multiple subtasks are present in a composite task, the resources acquired by each subtask grow uniformly as models get larger, keeping the ratios of acquired resources constant. We hypothesize that these findings hold generally and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
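
The two empirical findings in the abstract can be combined in a small numerical sketch. This is an illustration under stated assumptions, not the authors' code: it assumes the composite loss is the plain sum of the subtask losses, and the names `composite_loss`, `fractions`, and `coeffs`, along with all numeric values, are hypothetical. Under those assumptions, fixed allocation ratios and a per-subtask loss inversely proportional to allocated neurons make the total loss inversely proportional to the total neuron count.

```python
def composite_loss(total_neurons, fractions, coeffs):
    """Toy composite loss (assumption: subtask losses simply add up).

    fractions: share of neurons each subtask receives; held constant as the
               model grows, mirroring finding (2) in the abstract.
    coeffs:    hypothetical proportionality constants, one per subtask.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Finding (2): each subtask's resources grow in proportion to model size.
    allocated = [f * total_neurons for f in fractions]
    # Finding (1): each subtask's loss is inversely proportional to its neurons.
    return sum(c / n for c, n in zip(coeffs, allocated))


# Illustrative values only; none of these numbers come from the paper.
fractions = [0.5, 0.3, 0.2]
coeffs = [1.0, 2.0, 0.5]

for n in (100, 200, 400, 800):
    # Under these assumptions, doubling n halves the printed loss.
    print(n, composite_loss(n, fractions, coeffs))
```

In this toy setting the product of the neuron count and the loss stays constant, which is the 1/N behavior the two findings imply when taken together.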

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

KAN: Kolmogorov-Arnold Networks
cs.LG 2024-04 conditional novelty 8.0

KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.

Superposition Yields Robust Neural Scaling
cs.LG 2025-05 conditional novelty 6.0

Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.

Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization
cs.LG 2026-05 unverdicted novelty 5.0

Tuning the depth-width ratio positions models in an efficient neural interaction interval that correlates with better generalization under fixed budgets and remains stable with scale.