Continued AI Scaling Requires Repeated Efficiency Doublings
Pith reviewed 2026-05-14 21:27 UTC · model grok-4.3
The pith
AI scaling continues only with repeated efficiency doublings in hardware, algorithms, and systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that if AI scaling is to remain active, repeated efficiency doublings are not optional. They are required. Classical AI scaling laws remain useful because they make progress predictable despite diminishing returns, but the compute variable in those laws is best read as logical compute, not as a record of one fixed physical implementation. Practical burden therefore depends on the efficiency with which physical resources realize that compute. Under that interpretation, diminishing returns mean rising operational burden, not merely a flatter curve. Sustained progress then requires recurrent gains in hardware, algorithms, and systems that keep additional logical compute feasible at acceptable cost.
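One way to make the arithmetic concrete (a sketch under assumed forms: the power-law shape and the symbols below are standard scaling-law conventions, not equations quoted from the paper):

```latex
% Illustrative Kaplan/Hoffmann-style form (assumed, not quoted from the paper):
% loss falls as a power law in logical compute C, above an irreducible floor.
L(C) = L_{\infty} + a\,C^{-\alpha}
% Physical burden of realizing C at efficiency E (logical ops per unit cost):
\mathrm{Cost}(C, E) = C / E
% Reducing the reducible loss term by a factor r > 1 requires C -> r^{1/\alpha} C;
% holding Cost fixed then requires E -> r^{1/\alpha} E, i.e. roughly
% \log_2 (r^{1/\alpha}) efficiency doublings per improvement step.
```

Because published compute exponents are small (well below 1), the factor r^{1/α} is large even for modest loss reductions, which is why the requirement reads as repeated doublings rather than a one-off gain.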
What carries the argument
The reinterpretation of the scaling-law compute variable as logical compute, which shifts the burden of progress onto efficiency gains in hardware, algorithms, and systems.
If this is right
- Diminishing returns in scaling laws translate directly into higher operational costs unless efficiency improves.
- Progress remains predictable only if hardware, algorithm, and system gains occur at least as fast as Moore's Law.
- AI lacks one agreed cadence for these gains, yet recent trends show rates that are Moore-like or faster.
- Scaling stays active only while additional logical compute can be delivered at acceptable cost (see the numeric sketch after this list).
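A minimal numeric sketch of the cost arithmetic behind these bullets (the exponents and the 10% improvement target are illustrative assumptions, not figures from the paper):

```python
# Sketch: how many efficiency doublings are needed to keep physical cost
# flat while loss keeps improving on a power law in logical compute.
import math

def required_doublings(alpha: float, loss_ratio: float) -> float:
    """Efficiency doublings needed to hold cost constant while the
    reducible loss shrinks by `loss_ratio` (e.g. 0.9 = a 10% cut).

    From L(C) = L_inf + a * C**(-alpha), cutting the reducible term by
    `loss_ratio` needs C to grow by loss_ratio**(-1/alpha); at fixed
    Cost = C / E, efficiency E must grow by the same factor.
    """
    compute_multiplier = loss_ratio ** (-1.0 / alpha)
    return math.log2(compute_multiplier)

for alpha in (0.05, 0.15, 0.30):  # illustrative span of compute exponents
    d = required_doublings(alpha, loss_ratio=0.90)
    print(f"alpha={alpha:.2f}: {d:5.1f} efficiency doublings per 10% loss cut")
```

The smaller the exponent, the more doublings a fixed loss cut demands, which is the sense in which flat scaling curves translate into a steep efficiency treadmill.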
Where Pith is reading between the lines
- Investment priorities may shift toward efficiency research once the dependence on repeated doublings is accepted.
- If efficiency gains cannot be sustained, development may turn toward non-scaling approaches sooner than raw compute forecasts suggest.
- Economic models of AI deployment should treat efficiency cadence as a first-order variable rather than an afterthought.
Load-bearing premise
Efficiency doublings of the needed size and frequency can be achieved repeatedly through hardware, algorithms, and systems improvements without hitting hard physical or economic limits.
What would settle it
A multi-year series of frontier models in which measured efficiency gains fall below one doubling per generation, causing training and inference costs to rise faster than performance improvements.
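A minimal sketch of how that test could be run in practice, assuming per-generation records of delivered logical compute and total cost (the generation names and numbers below are invented placeholders, not measured data):

```python
# Track efficiency (logical compute delivered per unit cost) across model
# generations and flag a shortfall below one doubling per generation.
import math

generations = [
    # (name, logical_compute_flop, total_cost_usd) -- hypothetical values
    ("gen-1", 1e24, 30e6),
    ("gen-2", 6e24, 90e6),
    ("gen-3", 3e25, 400e6),
]

prev_eff = None
for name, flop, cost in generations:
    eff = flop / cost  # logical FLOP per dollar
    if prev_eff is not None:
        doublings = math.log2(eff / prev_eff)
        status = "OK" if doublings >= 1.0 else "SHORTFALL (<1 doubling)"
        print(f"{name}: {doublings:+.2f} doublings vs previous -> {status}")
    prev_eff = eff
```

A sustained run of shortfalls across generations, with costs rising faster than measured performance, is the falsifying pattern the review describes.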
Original abstract
This paper argues that continued AI scaling requires repeated efficiency doublings. Classical AI scaling laws remain useful because they make progress predictable despite diminishing returns, but the compute variable in those laws is best read as logical compute, not as a record of one fixed physical implementation. Practical burden therefore depends on the efficiency with which physical resources realize that compute. Under that interpretation, diminishing returns mean rising operational burden, not merely a flatter curve. Sustained progress then requires recurrent gains in hardware, algorithms, and systems that keep additional logical compute feasible at acceptable cost. The relevant analogy is Moore's Law, understood less as a theorem than as an organizing expectation of repeated efficiency improvement. AI does not yet have a single agreed cadence for such gains, but recent evidence suggests trends that are at least Moore-like and sometimes faster. The paper's claim is therefore simple: if AI scaling is to remain active, repeated efficiency doublings are not optional. They are required.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that continued AI scaling, as governed by classical scaling laws, requires repeated doublings in efficiency. It reinterprets the compute variable in those laws as logical compute (rather than a fixed physical implementation), so that diminishing returns imply rising operational burden; sustained progress therefore demands recurrent gains in hardware, algorithms, and systems, analogous to the organizing expectation of Moore's Law. The central claim is that such efficiency doublings are not optional but required if scaling is to remain active.
Significance. If the logical reinterpretation is accepted, the paper supplies a clear framing that positions efficiency improvement as a necessary condition for ongoing scaling rather than an incidental benefit. Its value is primarily conceptual and organizational, highlighting the distinction between logical and physical compute without supplying new data, derivations, or quantitative projections that could be directly tested.
major comments (1)
- Abstract: the claim that repeated efficiency doublings are required follows directly from redefining the scaling-law compute variable as logical rather than physical; under this definition the rising burden and consequent need for doublings become tautological consequences rather than independently derived results that could be confronted with empirical data on physical limits or cost curves.
minor comments (1)
- The manuscript would be strengthened by explicit citations to the classical scaling-law references (e.g., Kaplan et al. 2020 or Hoffmann et al. 2022) when invoking the compute variable and diminishing returns.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying the central interpretive move in the manuscript. We respond to the single major comment below.
Point-by-point responses
Referee: Abstract: the claim that repeated efficiency doublings are required follows directly from redefining the scaling-law compute variable as logical rather than physical; under this definition the rising burden and consequent need for doublings become tautological consequences rather than independently derived results that could be confronted with empirical data on physical limits or cost curves.
Authors: We agree that, once compute is read as logical rather than physical, the necessity of recurrent efficiency doublings follows directly. That logical step is intentional: the manuscript's purpose is to make the implication explicit and to treat efficiency improvement as an organizing requirement rather than an incidental benefit. The contribution is therefore conceptual and organizational, not a new empirical derivation or set of physical bounds. We reference existing hardware, algorithmic, and systems trends that are at least Moore-like, but we do not claim to supply independent cost-curve projections. If the referee would like the abstract and discussion section revised to state this framing more explicitly and to expand the cited efficiency trends, we will do so.
revision: partial
Circularity Check
Central claim reduces to tautology via logical-compute redefinition
specific steps
- self-definitional [Abstract]: "the compute variable in those laws is best read as logical compute, not as a record of one fixed physical implementation. Practical burden therefore depends on the efficiency with which physical resources realize that compute. Under that interpretation, diminishing returns mean rising operational burden, not merely a flatter curve. Sustained progress then requires recurrent gains in hardware, algorithms, and systems that keep additional logical compute feasible at acceptable cost."
  By stipulating that scaling-law compute equals logical compute, the paper makes rising physical burden and the consequent need for efficiency doublings a direct logical consequence of the definition itself. The claim that such doublings are 'required' for continued scaling is therefore true by the initial interpretive choice rather than derived from separate evidence.
full rationale
The paper's derivation begins by reinterpreting the compute variable in classical scaling laws as logical compute rather than physical. This definitional move directly entails that any flattening of returns must manifest as rising operational burden, which in turn requires repeated efficiency doublings to sustain scaling. No independent equations, datasets, or external benchmarks are introduced to derive the necessity; the requirement follows by construction from the chosen interpretation. The Moore's Law analogy is invoked as an organizing expectation rather than a proven mechanism, leaving the load-bearing step self-definitional.
Reference graph
Works this paper leans on
- [1] DeepSeek-AI. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024. doi: 10.48550/arXiv.2412.19437. URL https://arxiv.org/abs/2412.19437.
- [2] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022. doi: 10.48550/arXiv.2203.15556. URL https://arxiv.org/abs/2203.15556.
- [4] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. doi: 10.48550/arXiv.2001.08361. URL https://arxiv.org/abs/2001.08361.
- [6] Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Re, and Aditi Raghunathan. Scaling laws for precision. arXiv preprint arXiv:2411.04330, 2024. doi: 10.48550/arXiv.2411.04330. URL https://arxiv.org/abs/2411.04330.
- [7] Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024. doi: 10.48550/arXiv.2408.03314. URL https://arxiv.org/abs/2408.03314.
- [9] Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, and Jingren Zhou. Provable scaling laws for the test-time compute of large language models. arXiv preprint arXiv:2411.19477, 2024. doi: 10.48550/arXiv.2411.19477. URL https://arxiv.org/abs/2411.19477.
- [10] Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, and Sebastian Jaszczur. Scaling laws for fine-grained mixture of experts. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research. PMLR, 2024.
- [11] Jan Ludziejewski, Maciej Pióro, Jakub Krajewski, Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Marek Cygan, Piotr Sankowski, Kamil Adamczewski, Piotr Miłoś, and Sebastian Jaszczur. Joint MoE scaling laws: Mixture of experts can be memory efficient. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research. PMLR, 2025.
- [12] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8):114–117, 1965. URL https://www.cs.utexas.edu/~fussell/courses/cs352h/papers/moore.pdf.
- [13] Danny Hernandez and Tom B. Brown. Measuring the algorithmic efficiency of neural networks. arXiv preprint arXiv:2005.04305, 2020. doi: 10.48550/arXiv.2005.04305. URL https://arxiv.org/abs/2005.04305.
- [14] Stanford Institute for Human-Centered Artificial Intelligence. Artificial intelligence index report 2025. arXiv preprint arXiv:2504.07139, 2025. doi: 10.48550/arXiv.2504.07139. URL https://arxiv.org/abs/2504.07139.
- [15] Zelin Tan et al. Scaling behaviors of LLM reinforcement learning post-training: An empirical study in mathematical reasoning. arXiv preprint arXiv:2509.25300, 2025. doi: 10.48550/arXiv.2509.25300. URL https://arxiv.org/abs/2509.25300.
- [16] Devvrit Khatri, Lovish Madaan, Rishabh Tiwari, Rachit Bansal, Sai Surya Duvvuri, Manzil Zaheer, Inderjit S. Dhillon, David Brandfonbrener, and Rishabh Agarwal. The art of scaling reinforcement learning compute for LLMs. arXiv preprint arXiv:2510.13786, 2025. doi: 10.48550/arXiv.2510.13786. URL https://arxiv.org/abs/2510.13786.
- [17] Nikhil Sardana, Jacob Portes, Alexandre Doubov, and Jonathan Frankle. Beyond Chinchilla-optimal: Accounting for inference in language model scaling laws. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 45173–45226. PMLR, 2024. URL https://proceedings.mlr.press/v235/sa...
- [18] Eugene P. Wigner. The unreasonable effectiveness of mathematics in the natural sciences. Communications on Pure and Applied Mathematics, 13(1):1–14, 1960. doi: 10.1002/cpa.3160130102.