Continued AI Scaling Requires Repeated Efficiency Doublings
Pith reviewed 2026-05-14 21:27 UTC · model grok-4.3
The pith
AI scaling continues only with repeated efficiency doublings in hardware, algorithms, and systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that if AI scaling is to remain active, repeated efficiency doublings are not optional. They are required. Classical AI scaling laws remain useful because they make progress predictable despite diminishing returns, but the compute variable in those laws is best read as logical compute, not as a record of one fixed physical implementation. Practical burden therefore depends on the efficiency with which physical resources realize that compute. Under that interpretation, diminishing returns mean rising operational burden, not merely a flatter curve. Sustained progress then requires recurrent gains in hardware, algorithms, and systems that keep additional logical compute feasible at acceptable cost.
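One way to make the arithmetic concrete (a sketch under assumed forms: the power-law shape and the symbols below are standard scaling-law conventions, not equations quoted from the paper):

```latex
% Illustrative Kaplan/Hoffmann-style form (assumed, not quoted from the paper):
% loss falls as a power law in logical compute C, above an irreducible floor.
L(C) = L_{\infty} + a\,C^{-\alpha}
% Physical burden of realizing C at efficiency E (logical ops per unit cost):
\mathrm{Cost}(C, E) = C / E
% Reducing the reducible loss term by a factor r > 1 requires C -> r^{1/\alpha} C;
% holding Cost fixed then requires E -> r^{1/\alpha} E, i.e. roughly
% \log_2 (r^{1/\alpha}) efficiency doublings per improvement step.
```

Because published compute exponents are small (well below 1), the factor r^{1/α} is large even for modest loss reductions, which is why the requirement reads as repeated doublings rather than a one-off gain.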
What carries the argument
The reinterpretation of the scaling-law compute variable as logical compute, which shifts the burden of progress onto efficiency gains in hardware, algorithms, and systems.
If this is right
- Diminishing returns in scaling laws translate directly into higher operational costs unless efficiency improves.
- Progress remains predictable only if hardware, algorithm, and system gains occur at least as fast as Moore's Law.
- AI lacks one agreed cadence for these gains, yet recent trends show rates that are Moore-like or faster.
- Scaling stays active only while additional logical compute can be delivered at acceptable cost (see the numeric sketch after this list).
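A minimal numeric sketch of the cost arithmetic behind these bullets (the exponents and the 10% improvement target are illustrative assumptions, not figures from the paper):

```python
# Sketch: how many efficiency doublings are needed to keep physical cost
# flat while loss keeps improving on a power law in logical compute.
import math

def required_doublings(alpha: float, loss_ratio: float) -> float:
    """Efficiency doublings needed to hold cost constant while the
    reducible loss shrinks by `loss_ratio` (e.g. 0.9 = a 10% cut).

    From L(C) = L_inf + a * C**(-alpha), cutting the reducible term by
    `loss_ratio` needs C to grow by loss_ratio**(-1/alpha); at fixed
    Cost = C / E, efficiency E must grow by the same factor.
    """
    compute_multiplier = loss_ratio ** (-1.0 / alpha)
    return math.log2(compute_multiplier)

for alpha in (0.05, 0.15, 0.30):  # illustrative span of compute exponents
    d = required_doublings(alpha, loss_ratio=0.90)
    print(f"alpha={alpha:.2f}: {d:5.1f} efficiency doublings per 10% loss cut")
```

The smaller the exponent, the more doublings a fixed loss cut demands, which is the sense in which flat scaling curves translate into a steep efficiency treadmill.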
Where Pith is reading between the lines
- Investment priorities may shift toward efficiency research once the dependence on repeated doublings is accepted.
- If efficiency gains cannot be sustained, development may turn toward non-scaling approaches sooner than raw compute forecasts suggest.
- Economic models of AI deployment should treat efficiency cadence as a first-order variable rather than an afterthought.
Load-bearing premise
Efficiency doublings of the needed size and frequency can be achieved repeatedly through hardware, algorithms, and systems improvements without hitting hard physical or economic limits.
What would settle it
A multi-year series of frontier models in which measured efficiency gains fall below one doubling per generation, causing training and inference costs to rise faster than performance improvements.
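A minimal sketch of how that test could be run in practice, assuming per-generation records of delivered logical compute and total cost (the generation names and numbers below are invented placeholders, not measured data):

```python
# Track efficiency (logical compute delivered per unit cost) across model
# generations and flag a shortfall below one doubling per generation.
import math

generations = [
    # (name, logical_compute_flop, total_cost_usd) -- hypothetical values
    ("gen-1", 1e24, 30e6),
    ("gen-2", 6e24, 90e6),
    ("gen-3", 3e25, 400e6),
]

prev_eff = None
for name, flop, cost in generations:
    eff = flop / cost  # logical FLOP per dollar
    if prev_eff is not None:
        doublings = math.log2(eff / prev_eff)
        status = "OK" if doublings >= 1.0 else "SHORTFALL (<1 doubling)"
        print(f"{name}: {doublings:+.2f} doublings vs previous -> {status}")
    prev_eff = eff
```

A sustained run of shortfalls across generations, with costs rising faster than measured performance, is the falsifying pattern the review describes.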
Original abstract
This paper argues that continued AI scaling requires repeated efficiency doublings. Classical AI scaling laws remain useful because they make progress predictable despite diminishing returns, but the compute variable in those laws is best read as logical compute, not as a record of one fixed physical implementation. Practical burden therefore depends on the efficiency with which physical resources realize that compute. Under that interpretation, diminishing returns mean rising operational burden, not merely a flatter curve. Sustained progress then requires recurrent gains in hardware, algorithms, and systems that keep additional logical compute feasible at acceptable cost. The relevant analogy is Moore's Law, understood less as a theorem than as an organizing expectation of repeated efficiency improvement. AI does not yet have a single agreed cadence for such gains, but recent evidence suggests trends that are at least Moore-like and sometimes faster. The paper's claim is therefore simple: if AI scaling is to remain active, repeated efficiency doublings are not optional. They are required.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that continued AI scaling, as governed by classical scaling laws, requires repeated doublings in efficiency. It reinterprets the compute variable in those laws as logical compute (rather than a fixed physical implementation), so that diminishing returns imply rising operational burden; sustained progress therefore demands recurrent gains in hardware, algorithms, and systems, analogous to the organizing expectation of Moore's Law. The central claim is that such efficiency doublings are not optional but required if scaling is to remain active.
Significance. If the logical reinterpretation is accepted, the paper supplies a clear framing that positions efficiency improvement as a necessary condition for ongoing scaling rather than an incidental benefit. Its value is primarily conceptual and organizational, highlighting the distinction between logical and physical compute without supplying new data, derivations, or quantitative projections that could be directly tested.
major comments (1)
- Abstract: the claim that repeated efficiency doublings are required follows directly from redefining the scaling-law compute variable as logical rather than physical; under this definition the rising burden and consequent need for doublings become tautological consequences rather than independently derived results that could be confronted with empirical data on physical limits or cost curves.
minor comments (1)
- The manuscript would be strengthened by explicit citations to the classical scaling-law references (e.g., Kaplan et al. 2020 or Hoffmann et al. 2022) when invoking the compute variable and diminishing returns.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying the central interpretive move in the manuscript. We respond to the single major comment below.
Point-by-point responses
Referee: Abstract: the claim that repeated efficiency doublings are required follows directly from redefining the scaling-law compute variable as logical rather than physical; under this definition the rising burden and consequent need for doublings become tautological consequences rather than independently derived results that could be confronted with empirical data on physical limits or cost curves.
Authors: We agree that, once compute is read as logical rather than physical, the necessity of recurrent efficiency doublings follows directly. That logical step is intentional: the manuscript's purpose is to make the implication explicit and to treat efficiency improvement as an organizing requirement rather than an incidental benefit. The contribution is therefore conceptual and organizational, not a new empirical derivation or set of physical bounds. We reference existing hardware, algorithmic, and systems trends that are at least Moore-like, but we do not claim to supply independent cost-curve projections. If the referee would like the abstract and discussion section revised to state this framing more explicitly and to expand the cited efficiency trends, we will do so.
revision: partial
Circularity Check
Central claim reduces to tautology via logical-compute redefinition
specific steps
- self-definitional [Abstract]: "the compute variable in those laws is best read as logical compute, not as a record of one fixed physical implementation. Practical burden therefore depends on the efficiency with which physical resources realize that compute. Under that interpretation, diminishing returns mean rising operational burden, not merely a flatter curve. Sustained progress then requires recurrent gains in hardware, algorithms, and systems that keep additional logical compute feasible at acceptable cost."
  By stipulating that scaling-law compute equals logical compute, the paper makes rising physical burden and the consequent need for efficiency doublings a direct logical consequence of the definition itself. The claim that such doublings are 'required' for continued scaling is therefore true by the initial interpretive choice rather than derived from separate evidence.
full rationale
The paper's derivation begins by reinterpreting the compute variable in classical scaling laws as logical compute rather than physical. This definitional move directly entails that any flattening of returns must manifest as rising operational burden, which in turn requires repeated efficiency doublings to sustain scaling. No independent equations, datasets, or external benchmarks are introduced to derive the necessity; the requirement follows by construction from the chosen interpretation. The Moore's Law analogy is invoked as an organizing expectation rather than a proven mechanism, leaving the load-bearing step self-definitional.
Reference graph
Works this paper leans on
- [1] DeepSeek-AI. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024. doi: 10.48550/arXiv.2412.19437. URL https://arxiv.org/abs/2412.19437.
- [2] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022. doi: 10.48550/arXiv.2203.15556. URL https://arxiv.org/abs/2203.15556.
- [4] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. doi: 10.48550/arXiv.2001.08361. URL https://arxiv.org/abs/2001.08361.
- [6] Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Re, and Aditi Raghunathan. Scaling laws for precision. arXiv preprint arXiv:2411.04330, 2024. doi: 10.48550/arXiv.2411.04330. URL https://arxiv.org/abs/2411.04330.
- [7] Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024. doi: 10.48550/arXiv.2408.03314. URL https://arxiv.org/abs/2408.03314.
- [9] Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, and Jingren Zhou. Provable scaling laws for the test-time compute of large language models. arXiv preprint arXiv:2411.19477, 2024. doi: 10.48550/arXiv.2411.19477. URL https://arxiv.org/abs/2411.19477.
- [10] Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, and Sebastian Jaszczur. Scaling laws for fine-grained mixture of experts. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research. PMLR, 2024.
- [11] Jan Ludziejewski, Maciej Pióro, Jakub Krajewski, Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Marek Cygan, Piotr Sankowski, Kamil Adamczewski, Piotr Miłoś, and Sebastian Jaszczur. Joint MoE scaling laws: Mixture of experts can be memory efficient. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research. PMLR, 2025.
- [12] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8):114–117, 1965. URL https://www.cs.utexas.edu/~fussell/courses/cs352h/papers/moore.pdf.
- [13] Danny Hernandez and Tom B. Brown. Measuring the algorithmic efficiency of neural networks. arXiv preprint arXiv:2005.04305, 2020. doi: 10.48550/arXiv.2005.04305. URL https://arxiv.org/abs/2005.04305.
- [14] Stanford Institute for Human-Centered Artificial Intelligence. Artificial intelligence index report 2025. arXiv preprint arXiv:2504.07139, 2025. doi: 10.48550/arXiv.2504.07139. URL https://arxiv.org/abs/2504.07139.
- [15] Zelin Tan et al. Scaling behaviors of LLM reinforcement learning post-training: An empirical study in mathematical reasoning. arXiv preprint arXiv:2509.25300, 2025. doi: 10.48550/arXiv.2509.25300. URL https://arxiv.org/abs/2509.25300.
- [16] Devvrit Khatri, Lovish Madaan, Rishabh Tiwari, Rachit Bansal, Sai Surya Duvvuri, Manzil Zaheer, Inderjit S. Dhillon, David Brandfonbrener, and Rishabh Agarwal. The art of scaling reinforcement learning compute for LLMs. arXiv preprint arXiv:2510.13786, 2025. doi: 10.48550/arXiv.2510.13786. URL https://arxiv.org/abs/2510.13786.
- [17] Nikhil Sardana, Jacob Portes, Alexandre Doubov, and Jonathan Frankle. Beyond Chinchilla-optimal: Accounting for inference in language model scaling laws. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 45173–45226. PMLR, 2024. URL https://proceedings.mlr.press/v235/sa...
- [18] Eugene P. Wigner. The unreasonable effectiveness of mathematics in the natural sciences. Communications on Pure and Applied Mathematics, 13(1):1–14, 1960. doi: 10.1002/cpa.3160130102.