Scaling Laws and Symmetry, Evidence from Neural Force Fields

Khang Ngo; Siamak Ravanbakhsh

arXiv: 2510.09768 · v2 · submitted 2025-10-10 · cs.LG · cs.AI · physics.comp-ph

Pith reviewed 2026-05-18 08:04 UTC · model grok-4.3

keywords: scaling laws · equivariant neural networks · interatomic potentials · neural force fields · symmetry · power-law scaling · geometric deep learning

The pith

Equivariant architectures for interatomic potentials follow better power-law scaling than non-equivariant models, with higher-order representations improving the exponents further.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper conducts an empirical study on learning interatomic potentials, a geometric task of predicting forces between atoms. It demonstrates that neural network architectures incorporating equivariance to leverage the underlying symmetry of the problem exhibit superior scaling behavior compared to those without it. The scaling follows power laws in data, parameters, and compute, but the exponents vary by architecture, favoring equivariant designs and especially higher-order ones. The results imply that embedding task symmetries explicitly alters the fundamental difficulty and scaling properties of the learning problem instead of requiring the model to discover them. Optimal training appears to require scaling data and model size in tandem across architectures.
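To make the symmetry concrete: a force field is rotation-equivariant when rotating the atomic positions rotates the predicted forces by the same rotation. The following is a minimal sketch of that check, assuming a toy pairwise potential U = sum over pairs of 1/r_ij. The potential and the helper names `pair_forces` and `rotate_z` are illustrative assumptions and are not taken from the paper.

```python
import math

def pair_forces(pos):
    """Forces from a toy pairwise potential U = sum_{i<j} 1 / r_ij (illustrative only)."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            # For U = 1/r the force on atom i is -dU/dx_i = d / r^3 (repulsive).
            for k in range(3):
                forces[i][k] += d[k] / r ** 3
    return forces

def rotate_z(v, theta):
    """Rotate a 3-vector about the z axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

# Hypothetical three-atom configuration.
pos = [[0.0, 0.0, 0.0], [1.0, 0.2, -0.3], [-0.5, 0.9, 0.4]]
theta = 0.7

# Equivariance: computing forces then rotating equals rotating then computing forces.
f_then_rotate = [rotate_z(f, theta) for f in pair_forces(pos)]
rotate_then_f = pair_forces([rotate_z(p, theta) for p in pos])
max_gap = max(
    abs(a - b)
    for fa, fb in zip(f_then_rotate, rotate_then_f)
    for a, b in zip(fa, fb)
)
print("largest component mismatch:", max_gap)
```

An equivariant architecture satisfies this identity by construction for every input, whereas a non-equivariant model has to approximate it from training data.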

Core claim

Equivariant architectures that leverage task symmetry scale better than non-equivariant models in learning interatomic potentials, with higher-order representations translating to better scaling exponents. The study observes clear power-law scaling with respect to data, parameters, and compute, where the exponents are architecture-dependent. Analysis also suggests that for compute-optimal training, data and model sizes should scale in tandem regardless of the architecture.
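A power law L = a * X^(-alpha) is a straight line in log-log coordinates, so an exponent can be estimated with an ordinary least-squares fit on the logarithms. The sketch below assumes that simple form with no irreducible-loss offset; the paper's exact fitting procedure may differ. The data are synthetic and `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a - alpha * log x; returns (a, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope

# Synthetic, noise-free curves for two hypothetical architectures (not paper data).
sizes = [1e3, 1e4, 1e5, 1e6]
loss_steep = [2.0 * n ** -0.30 for n in sizes]
loss_shallow = [2.0 * n ** -0.15 for n in sizes]

a_steep, alpha_steep = fit_power_law(sizes, loss_steep)
a_shallow, alpha_shallow = fit_power_law(sizes, loss_shallow)
print(f"steep:   a={a_steep:.3f}, alpha={alpha_steep:.3f}")
print(f"shallow: a={a_shallow:.3f}, alpha={alpha_shallow:.3f}")
```

An "architecture-dependent exponent" in this picture means the two fitted values of alpha differ, not only the prefactors.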

What carries the argument

Architecture-dependent power-law scaling exponents arising from equivariant versus non-equivariant neural network designs on geometric force prediction tasks.

If this is right

Equivariant models reach target accuracy with less data or compute at large scales.
Higher-order representations within equivariant models yield additional improvements in scaling efficiency.
Data volume and model capacity must increase together for optimal performance independent of architecture choice.
Explicit incorporation of symmetry reduces the effective learning difficulty compared to discovering it from data alone.
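The first consequence can be made quantitative by inverting the power law: if L = a * D^(-alpha), the data needed to reach a target loss is D = (a / L)^(1 / alpha). The sketch below uses made-up prefactors and exponents purely for illustration; `data_needed` is a hypothetical helper and none of the numbers come from the paper.

```python
def data_needed(prefactor, alpha, target_loss):
    """Invert L = prefactor * D**(-alpha) for the dataset size D that reaches target_loss."""
    return (prefactor / target_loss) ** (1.0 / alpha)

# Illustrative values only; none of these numbers come from the paper.
prefactor = 2.0
for target in (0.2, 0.1, 0.05):
    d_steep = data_needed(prefactor, 0.30, target)
    d_shallow = data_needed(prefactor, 0.15, target)
    print(
        f"target={target}: steep exponent needs {d_steep:.3g} samples, "
        f"shallow exponent needs {d_shallow:.3g}, ratio {d_shallow / d_steep:.3g}"
    )
```

Under these assumptions the ratio grows as the target loss shrinks, which is why a gap in exponents matters more at larger scales than a gap in prefactors.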

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The advantage may generalize to other geometric or physics-based prediction tasks where symmetry is known a priori.
Extending experiments to much larger model regimes could test whether the exponent gap persists or narrows.
Designers of future large-scale models for symmetric domains may benefit from prioritizing built-in equivariance over purely data-driven approaches.

Load-bearing premise

The observed power-law scaling behaviors and differences in exponents between architectures will hold outside the specific datasets, model sizes, and training regimes tested.

What would settle it

Training a large non-equivariant model on substantially more data and compute until its effective scaling exponent matches or exceeds that of an equivariant counterpart would challenge the central claim.

Original abstract

We present an empirical study in the geometric task of learning interatomic potentials, which shows equivariance matters even more at larger scales; we show a clear power-law scaling behaviour with respect to data, parameters and compute with "architecture-dependent exponents". In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Equivariant and higher-order models show better scaling exponents than non-equivariant ones on interatomic potentials, but the differences could still be range-dependent rather than intrinsic.

The main point is that this paper measures architecture-dependent scaling exponents in neural force fields and finds that equivariant models, especially higher-order ones, improve faster with data, parameters, and compute than non-equivariant baselines. The work also notes that compute-optimal scaling keeps data and model size growing together across architectures. That is the concrete empirical contribution.

It is useful because it moves the symmetry discussion from performance at fixed size to how the scaling curve itself changes when symmetry is built in. The direct comparisons on the same geometric task give a clearer picture than most prior scaling-law papers that stay within one architecture family. The claim that we should not leave symmetry discovery to the model at scale follows from those exponent differences.

The soft spot is exactly the one in the stress test: the observed exponents might reflect the particular scale window rather than a permanent change in task difficulty. Non-equivariant models could still be spending early capacity on learning the symmetry, so their power-law regime might start later or with a different slope. The paper would be stronger with either larger-scale runs or some diagnostic showing that all models have entered their asymptotic regime. Dataset details and error analysis would also help judge how stable the exponent estimates are.

This is for people who design or scale models for chemistry and materials simulation. Readers who already care about equivariance and scaling laws will find the comparative numbers worth looking at. It is worth sending to referees because the empirical pattern is new and directly relevant to design choices, even if the generality of the exponents needs more checking.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study of scaling laws for neural force fields on the task of learning interatomic potentials. It reports power-law scaling of performance with respect to data volume, parameter count, and compute, with exponents that vary by architecture. Equivariant models are shown to exhibit better scaling than non-equivariant baselines, and higher-order equivariant representations further improve the exponents. The authors conclude that symmetries should be explicitly encoded rather than discovered by the model, because they alter task difficulty and scaling behavior, and that data and model size should be scaled together for compute-optimal training.

Significance. If the central empirical findings hold, the work would be significant for geometric deep learning and physics-informed ML. It supplies quantitative evidence that equivariance improves scaling exponents in addition to absolute accuracy, supporting the design choice to hard-code task symmetries at large scales. The comparative evaluation across multiple architectures on the same geometric task is a concrete contribution to the literature on inductive biases and scaling laws.

major comments (2)

[Scaling experiments and discussion of architecture-dependent exponents] The central claim that equivariant (and higher-order) architectures inherently improve scaling exponents by reducing task difficulty via symmetry (abstract and final paragraph) is load-bearing. The reported exponent differences could instead reflect pre-asymptotic behavior in which non-equivariant models are still expending capacity to discover symmetries within the tested data/parameter/compute windows. The manuscript should include an analysis or additional runs that test whether the observed exponent gaps persist when the scale range is extended (e.g., larger models or datasets) or when the fitting window is shifted.
[Compute-optimal training analysis] The assertion that 'for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture' requires explicit support from the compute-optimal frontier analysis. If this conclusion rests on a single set of isoFLOPs curves, the paper should clarify how the optimal data-to-parameter ratio was determined and whether it is robust to the choice of loss metric or validation set.
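The pre-asymptotic concern in the first major comment can be illustrated with a shifted-window fit. The sketch below assumes a synthetic loss L(D) = (D + D0)^(-alpha), which bends in log-log space at small D and approaches a pure power law at large D. This functional form, the constants, and the helper `loglog_slope` are assumptions chosen for illustration, not the paper's model or data.

```python
import math

def loglog_slope(xs, ys):
    """Return the exponent estimate: minus the least-squares slope of log y vs log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return -sxy / sxx

# Synthetic curve with a pre-asymptotic bend (assumed form, not paper data).
alpha_true, d0 = 0.3, 1e4
sizes = [10 ** (2 + 0.5 * i) for i in range(11)]  # 1e2 ... 1e7
loss = [(d + d0) ** -alpha_true for d in sizes]

# Fit the exponent on a sliding window of five consecutive sizes.
window = 5
estimates = [
    loglog_slope(sizes[s:s + window], loss[s:s + window])
    for s in range(len(sizes) - window + 1)
]
for s, est in enumerate(estimates):
    print(f"window {sizes[s]:.1e} to {sizes[s + window - 1]:.1e}: alpha ~ {est:.3f}")
```

In this toy setting the fitted exponent rises toward the true value as the window moves to larger sizes, so an exponent that is stable across shifted windows is evidence that a model has entered its asymptotic regime.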

minor comments (2)

[Methods / Experimental setup] The manuscript would benefit from a clearer description of the datasets (size, diversity, train/validation/test splits) and the precise procedure used to fit the power-law exponents, including any regularization or range selection criteria.
[Figures and results] Scaling plots should report the fitted exponents with uncertainty estimates (e.g., bootstrap or fit residuals) and indicate the exact data range over which each power law was fitted.
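One way to attach the uncertainty estimates requested above is a nonparametric bootstrap over the (size, loss) points. The sketch below is a minimal version under stated assumptions: the data are synthetic with bounded multiplicative noise, the interval is a plain percentile interval, and `loglog_slope` is a hypothetical helper. It is not the procedure used in the paper.

```python
import math
import random

def loglog_slope(xs, ys):
    """Return the exponent estimate: minus the least-squares slope of log y vs log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return -sxy / sxx

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sizes = [10 ** (3 + 0.5 * i) for i in range(9)]
# Synthetic losses: alpha = 0.3 with bounded multiplicative noise (not paper data).
loss = [2.0 * d ** -0.3 * math.exp(rng.uniform(-0.05, 0.05)) for d in sizes]

point_estimate = loglog_slope(sizes, loss)

boot = []
for _ in range(2000):
    idx = [rng.randrange(len(sizes)) for _ in sizes]  # resample with replacement
    if len(set(idx)) < 2:
        continue  # a resample with one distinct size has no defined slope
    boot.append(loglog_slope([sizes[i] for i in idx], [loss[i] for i in idx]))

boot.sort()
lo = boot[int(0.025 * len(boot))]
hi = boot[int(0.975 * len(boot)) - 1]
print(f"alpha = {point_estimate:.3f}, 95% percentile interval [{lo:.3f}, {hi:.3f}]")
```

Reporting such an interval next to each fitted exponent, together with the fitted range, lets a reader judge whether two architectures' exponents are distinguishable.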

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their constructive comments, which have helped us improve the clarity and robustness of our analysis. Below, we address each major comment in detail.

Referee: [Scaling experiments and discussion of architecture-dependent exponents] The central claim that equivariant (and higher-order) architectures inherently improve scaling exponents by reducing task difficulty via symmetry (abstract and final paragraph) is load-bearing. The reported exponent differences could instead reflect pre-asymptotic behavior in which non-equivariant models are still expending capacity to discover symmetries within the tested data/parameter/compute windows. The manuscript should include an analysis or additional runs that test whether the observed exponent gaps persist when the scale range is extended (e.g., larger models or datasets) or when the fitting window is shifted.

Authors: We appreciate this concern regarding the possibility of pre-asymptotic effects. Our experiments span a wide range of scales, covering several orders of magnitude in both data volume and model parameters, which is typical for scaling law studies in this domain. The exponent differences between architectures are consistent across the fitted ranges. In the revised manuscript, we have added an analysis examining the scaling behavior in different sub-ranges of the data and parameter space to assess stability of the exponents. We agree that extending to substantially larger scales would provide further confirmation, but such experiments are computationally intensive and beyond the scope of the current work given available resources. We have updated the discussion to acknowledge this limitation explicitly.
Revision: partial

Referee: [Compute-optimal training analysis] The assertion that 'for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture' requires explicit support from the compute-optimal frontier analysis. If this conclusion rests on a single set of isoFLOPs curves, the paper should clarify how the optimal data-to-parameter ratio was determined and whether it is robust to the choice of loss metric or validation set.

Authors: We thank the referee for pointing out the need for more detail on this analysis. In the revised manuscript, we have expanded the relevant section to describe the procedure for determining the compute-optimal frontier: we generated isoFLOPs curves by varying data and model sizes while keeping compute fixed, then identified the optimal data-to-parameter ratio as the one that achieves the lowest validation error for a given compute budget. We performed this across multiple architectures and confirmed the tandem scaling. To address robustness, we repeated the analysis using different validation sets (e.g., held-out molecules) and observed consistent results. Regarding the loss metric, our primary metric is the mean squared error on forces, which is the standard for interatomic potential learning; we briefly note that using energy error yields qualitatively similar trends.
Revision: yes
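The isoFLOPs procedure described in this response can be sketched as follows. Everything specific here is an assumption for illustration: the additive loss surface L = N^(-a) + D^(-b), the cost model C = 6 * N * D borrowed from the language-model scaling literature, and the helper names. The exponents a and b are set equal on purpose, so the tandem result is built into this toy surface; the point is the procedure, not the outcome.

```python
import math

def loss_surface(n_params, n_data, a=0.3, b=0.3):
    """Assumed additive form L = N^-a + D^-b; a stand-in, not the paper's fit."""
    return n_params ** -a + n_data ** -b

def slope(xs, ys):
    """Least-squares slope of log y against log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return sxy / sxx

FLOPS_PER_PARAM_PER_SAMPLE = 6.0  # assumed cost model C = 6 * N * D

budgets = [10.0 ** e for e in range(12, 17)]          # compute budgets C
grid = [10.0 ** (3 + 0.01 * i) for i in range(601)]   # candidate model sizes N

n_opt, d_opt = [], []
for c in budgets:
    # Along one isoFLOP curve, D is fixed by N; pick the N with the lowest loss.
    best_n = min(
        grid,
        key=lambda n: loss_surface(n, c / (FLOPS_PER_PARAM_PER_SAMPLE * n)),
    )
    n_opt.append(best_n)
    d_opt.append(c / (FLOPS_PER_PARAM_PER_SAMPLE * best_n))

# Fit N_opt ~ C^p and D_opt ~ C^q; p close to q means the two scale in tandem.
p = slope(budgets, n_opt)
q = slope(budgets, d_opt)
print(f"N_opt ~ C^{p:.3f}, D_opt ~ C^{q:.3f}")
```

With unequal exponents in the assumed surface, the same procedure would return p and q that differ, which is how an architecture-dependent compute-optimal allocation would show up.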

Circularity Check

0 steps flagged

No circularity: empirical scaling comparisons are self-contained

The paper reports direct experimental measurements of power-law scaling exponents for data, parameters, and compute across equivariant and non-equivariant neural force field architectures on interatomic potential tasks. These exponents are obtained by fitting observed training curves rather than being derived from any first-principles equations or self-referential definitions within the work. No load-bearing step reduces to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled via prior work; the central claim that symmetry alters scaling behavior rests on the comparative empirical results themselves, which remain falsifiable against external benchmarks and do not presuppose the target conclusion.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central empirical claims rest on the assumption that power-law scaling applies to this task and that the tested architectures and datasets are representative; no new physical entities are introduced.

free parameters (1)

architecture-dependent scaling exponents
Observed exponents are fitted from training runs and vary by model type.

axioms (1)

Domain assumption: power-law relationships govern performance improvement with scale in this domain.
Invoked to interpret the observed trends in data, parameters, and compute.

pith-pipeline@v0.9.0 · 5668 in / 1284 out tokens · 56508 ms · 2026-05-18T08:04:04.990958+00:00 · methodology
