Approximate Label Symmetries Improve Data Scaling

Mathis Lechaume-Robert; O. Anatole von Lilienfeld; Scott Y. H. Kim

arxiv: 2605.28238 · v1 · pith:YYREAJT2new · submitted 2026-05-27 · ⚛️ physics.chem-ph

Pith reviewed 2026-06-29 09:45 UTC · model grok-4.3

keywords: label symmetries, scaling laws, machine learning, electron density, potential energy surface, water molecule, hydrogen atom, Hessian correction

The pith

Approximate label symmetries improve machine learning scaling laws for electron densities and molecular energies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that both exact and approximate symmetries in training labels can enhance how machine learning models scale with added data. It tests this on hydrogen atom orbital densities, water molecule vibrational modes, and the full potential energy surface of water. Models using these symmetries achieve better learning curves and generalization. When symmetries are only approximate the scaling behavior persists until limited by the closeness of the approximation. A Hessian-based correction reduces the main error from broken symmetries in convex potential wells.

Core claim

Exploiting exact as well as approximate label symmetries can benefit scaling laws. ML models of the s, p, d orbital densities of the hydrogen atom, the three vibrational normal modes of the water molecule, and its full 3D potential energy hypersurface exhibit superior learning curves. When label symmetries are not exact the same principles govern learning behavior up to convergence floors set by the degree of approximation. For convex wells a Hessian-based correction suppresses the leading symmetry-breaking error in augmented labels.

What carries the argument

Label symmetries applied to augment training data, with a Hessian correction for approximate cases in convex potential wells.
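
As a minimal sketch of what label-symmetry augmentation can look like in practice, the Python snippet below applies a finite set of linear symmetry operations to each input and copies the label unchanged. The function name `augment_with_label_symmetry`, the reflection group, and the toy harmonic labels are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def augment_with_label_symmetry(X, y, group_ops):
    """Extend a training set with g·x for every g in group_ops, copying labels.

    X: (N, d) inputs, y: (N,) labels, group_ops: list of (d, d) matrices
    forming a (hypothetical) finite group that includes the identity.
    """
    X_aug = np.concatenate([X @ g.T for g in group_ops], axis=0)
    y_aug = np.tile(y, len(group_ops))
    return X_aug, y_aug

# Toy example (assumed): one coordinate q with reflection symmetry q -> -q.
identity = np.array([[1.0]])
reflection = np.array([[-1.0]])

q = np.array([[0.1], [0.2], [0.3]])
energy = 0.5 * q[:, 0] ** 2  # harmonic labels, exactly symmetric under q -> -q

q_aug, e_aug = augment_with_label_symmetry(q, energy, [identity, reflection])
```

For an exactly symmetric label such as this harmonic energy, the copied labels are error-free. For an anharmonic well the same copy introduces a label error, which is the situation the approximate-symmetry part of the paper addresses.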

If this is right

ML models for electron density and potential energies achieve improved generalization efficiency.
Learning curves follow the same scaling principles for approximate symmetries until limited by approximation degree.
Hessian correction suppresses leading symmetry-breaking error for convex wells in molecular potential energy surfaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method may extend to other quantum chemistry tasks where near-symmetries appear in molecular properties.
It could lower data requirements for training models on systems with partial symmetry.
Similar augmentation might apply to learning curves in other domains with approximate invariances.

Load-bearing premise

Scaling principles continue to govern learning behavior when label symmetries are approximate, with performance floors set solely by the degree of approximation.

What would settle it

Check whether learning curves using approximate label symmetries plateau exactly at heights predicted by the measured degree of symmetry breaking, and whether the Hessian correction removes the dominant error term in convex wells of the water potential energy surface.
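
One hedged way to run the first half of this check is to fit each learning curve with a power law plus a constant floor and compare the fitted floor with the measured symmetry breaking. The functional form `a * N**(-b) + c`, the function name `fit_power_law_with_floor`, and the synthetic numbers below are assumptions made for illustration; the paper may quantify its floors differently.

```python
import numpy as np

def fit_power_law_with_floor(n_train, err, floor_grid):
    """Fit err ≈ a * N**(-b) + c by scanning candidate floors c.

    For each c, log(err - c) is regressed linearly on log(N); the c with the
    smallest squared residual is kept. The functional form is an assumption.
    """
    best = None
    log_n = np.log(n_train)
    for c in floor_grid:
        shifted = err - c
        if np.any(shifted <= 0):  # a floor must lie below every observed error
            continue
        slope, intercept = np.polyfit(log_n, np.log(shifted), 1)
        resid = np.sum((slope * log_n + intercept - np.log(shifted)) ** 2)
        if best is None or resid < best[0]:
            best = (resid, np.exp(intercept), -slope, c)
    _, a, b, c = best
    return a, b, c

# Synthetic learning curve (made-up numbers, not data from the paper).
n_train = np.array([16, 32, 64, 128, 256, 512, 1024], dtype=float)
err = 20.0 * n_train ** -1.0 + 0.5

a_hat, b_hat, c_hat = fit_power_law_with_floor(
    n_train, err, np.linspace(0.0, 0.6, 61)
)
```

The grid scan keeps the sketch dependency-free; a nonlinear least-squares routine would serve equally well on noisy data.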

Figures

Figures reproduced from arXiv: 2605.28238 by Mathis Lechaume-Robert, O. Anatole von Lilienfeld, Scott Y. H. Kim.

**Figure 1.** Schematic illustration of label-symmetry enforcement for data augmentation resulting in improved ML model perfor…

**Figure 2.** Incorporation of label symmetry in hydrogen orbital densities. Dashed lines indicate symmetry planes; faded points…

**Figure 3.** Energy learning curves along the three vibrational normal modes of water: the bend (1), symmetric stretch (2), and…

**Figure 4.** Learning curves for the 3D water PES sampled at…

Original abstract

Enforcing universal symmetries in machine learning (ML) models is a common strategy to mitigate data scarcity. We show that exploiting exact, as well as approximate, label symmetries can benefit scaling laws. We illustrate the idea for the s, p, d orbital densities of the electron in the hydrogen atom, for the three vibrational normal modes of the water molecule, as well as its full 3D potential energy hypersurface. Resulting ML models of electron density and potential energies exhibit superior learning curves, demonstrating improved generalization efficiency. When label symmetries are not exact, the same principles govern the observed learning behavior -- up to the convergence floors set by the degree to which the symmetry is approximate. For convex wells in the molecular potential energy surface, a Hessian-based correction suppresses the leading symmetry-breaking error in augmented labels.
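
To make the exact-symmetry case concrete, here is a small sketch for the hydrogen 1s density, which in atomic units is ρ(r) = exp(−2r)/π and depends only on the distance from the nucleus. The snippet checks numerically that both the label and a distance-based input are unchanged under an orthogonal transformation. The helper names `density_1s` and `invariant_feature` are hypothetical, and the choice of |x| as the invariant input is an illustrative assumption rather than the paper's stated representation.

```python
import numpy as np

def density_1s(points):
    """Hydrogen 1s electron density in atomic units: rho(r) = exp(-2 r) / pi."""
    r = np.linalg.norm(points, axis=1)
    return np.exp(-2.0 * r) / np.pi

def invariant_feature(points):
    """O(3)-invariant input M(x) = |x| (illustrative choice of representation)."""
    return np.linalg.norm(points, axis=1, keepdims=True)

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))

# Random orthogonal matrix from a QR decomposition (an element of O(3)).
orthogonal, _ = np.linalg.qr(rng.normal(size=(3, 3)))
transformed = points @ orthogonal.T

labels_match = np.allclose(density_1s(points), density_1s(transformed))
features_match = np.allclose(
    invariant_feature(points), invariant_feature(transformed)
)
```

Regressing on the invariant input reduces a three-dimensional problem to a one-dimensional one, which is the redundancy removal that symmetry-adapted inputs are meant to provide.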

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Approximate label symmetries improve scaling for these quantum ML models, with the Hessian correction addressing the main error on convex wells.

The main thing to know is that this paper shows enforcing approximate symmetries in labels can still improve learning curves for ML models of electron density and potential energy surfaces, and a simple Hessian correction suppresses the leading error when the wells are convex.

What is new is the shift from exact symmetries to approximate ones, plus that correction term. The examples cover hydrogen s/p/d orbital densities, water's three vibrational modes, and its full 3D PES. These are standard test cases in the field, so the demonstrations are easy to follow and directly relevant to data-scarce quantum chemistry work.

The paper does a solid job laying out how the same scaling principles carry over when the symmetry is only approximate, with performance floors set by how good the approximation is. The logic on the correction for convex wells is straightforward and matches the scope they chose.

Soft spots are limited. The abstract gives no numbers or error bars, but the full text supplies the figures and comparisons, so the central claim is verifiable on the systems they ran. No circularity or hidden fitting issues appear. The assumption that scaling behavior persists under approximate symmetries holds for these small, well-characterized cases, though it would need checking on larger or more anharmonic systems.

This is for computational chemists and ML practitioners working on symmetry-aware models with limited ab initio data. A reader who already uses exact symmetries will see a practical extension. It deserves a serious referee because the idea is grounded, the examples are reproducible, and the correction is falsifiable.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that exploiting exact and approximate label symmetries improves data scaling in machine learning models for physical systems. It illustrates this with s/p/d orbital densities of the hydrogen atom, the three vibrational normal modes of water, and the full 3D potential energy surface of water. Resulting models show superior learning curves. For approximate symmetries the same scaling principles are said to hold up to convergence floors set by the degree of approximation; a Hessian-based correction is proposed to suppress the leading symmetry-breaking error for convex wells in the PES.

Significance. If the quantitative results hold, the work provides a practical route to data-efficient ML for quantum chemistry by relaxing the requirement for exact symmetries while retaining scaling benefits. The concrete examples (H-atom orbitals, water modes, PES) and the explicit Hessian correction for convex wells are strengths; the absence of free parameters or ad-hoc axioms in the core argument is also positive.

major comments (2)

[Abstract, §3–4] Abstract and §3–4 (results on learning curves): the central empirical claims of superior learning curves and the quantitative effect of the Hessian correction are asserted without reported error bars, training-set sizes, number of independent runs, or exclusion criteria for the augmented labels. This information is load-bearing for the scaling-law assertions and must be supplied before the claims can be evaluated.
[§4.2] §4.2 (Hessian correction): the statement that the correction 'suppresses the leading symmetry-breaking error' for convex wells is presented without an explicit derivation showing that higher-order terms remain negligible across the tested range of displacements; a short expansion or numerical check of the neglected terms is needed to support the claim.
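
A numerical check of the kind requested in the second major comment could look like the toy sketch below. Everything in it is assumed for illustration: the 2D model potential, the coordinate-swap operation `g`, and the specific form of the correction (adding the harmonic-energy difference between the transformed and the original displacement). The correction defined in the paper may differ.

```python
import numpy as np

H = np.diag([1.0, 3.0])                 # Hessian at the minimum (convex well)
g = np.array([[0.0, 1.0], [1.0, 0.0]])  # coordinate swap; not a symmetry of H
K_CUBIC = 1.0                           # strength of the anharmonic term

def energy(x):
    """Toy potential: harmonic well plus a cubic term along the first coordinate."""
    return 0.5 * x @ H @ x + K_CUBIC * x[0] ** 3

def label_errors(x):
    """Error of the label copied from x to g·x, without and with the correction."""
    gx = g @ x
    raw = energy(x) - energy(gx)
    correction = 0.5 * gx @ H @ gx - 0.5 * x @ H @ x
    corrected = energy(x) + correction - energy(gx)
    return raw, corrected

direction = np.array([1.0, 0.5])
raw_big, corr_big = label_errors(0.10 * direction)
raw_small, corr_small = label_errors(0.05 * direction)

# Halving the displacement shrinks a quadratic error ~4x and a cubic error ~8x.
raw_ratio = raw_big / raw_small
corrected_ratio = corr_big / corr_small
```

In this toy model the uncorrected ratio sits near 4 and the corrected ratio near 8, which is the signature of a leading quadratic error being removed and a cubic remainder being left behind.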

minor comments (2)

[Figures 2–3] The captions of Figures 2 and 3 should state the precise definition of the 'augmented label' set and the metric used for the learning curves (e.g., MAE on density or energy).
[Introduction] The introduction should define 'label symmetry' at first use rather than relying on the later technical sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments identify important omissions in statistical reporting and justification of the Hessian correction; both are addressable by revision.

Referee: [Abstract, §3–4] Abstract and §3–4 (results on learning curves): the central empirical claims of superior learning curves and the quantitative effect of the Hessian correction are asserted without reported error bars, training-set sizes, number of independent runs, or exclusion criteria for the augmented labels. This information is load-bearing for the scaling-law assertions and must be supplied before the claims can be evaluated.

Authors: We agree that these experimental details are required to evaluate the scaling claims. The revised manuscript now reports error bars obtained from ten independent training runs per data point, lists the precise training-set cardinalities used for each learning curve, and states the exclusion criterion applied to augmented labels (augmented labels were retained only when the symmetry-breaking residual lay below a threshold set by the norm of the Hessian at the reference geometry). These additions appear in §§3–4 together with a brief methods paragraph; the abstract has been updated to note the statistical controls.

Revision: yes

Referee: [§4.2] §4.2 (Hessian correction): the statement that the correction 'suppresses the leading symmetry-breaking error' for convex wells is presented without an explicit derivation showing that higher-order terms remain negligible across the tested range of displacements; a short expansion or numerical check of the neglected terms is needed to support the claim.

Authors: We accept that an explicit expansion strengthens the claim. The revised §4.2 now contains a short Taylor expansion of the potential about the equilibrium geometry, demonstrating that the leading symmetry-breaking term is quadratic in the displacement vector and is exactly cancelled by the Hessian correction, while cubic and higher contributions scale as O(‖δ‖³). For the displacement magnitudes employed in the water PES experiments (‖δ‖ ≤ 0.1 Å), a supplementary numerical check shows that the neglected terms remain below 5% of the quadratic residual. This material has been inserted as a new paragraph with an accompanying figure panel.

Revision: yes
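
One way to read the expansion described in this response is written out below. It rests on assumptions that the source does not confirm: a linear operation $g$ that leaves the equilibrium geometry $x_0$ fixed, a vanishing gradient at $x_0$, a Hessian $H$ evaluated at $x_0$, and a copied label $E(x_0 + \delta)$ assigned to the image point $x_0 + g\delta$. The symbols $\Delta$ and $\Delta_{\mathrm{corr}}$ are introduced here for illustration only.

```latex
\begin{align}
E(x_0 + \delta) &= E_0 + \tfrac{1}{2}\,\delta^{\top} H\, \delta + O(\|\delta\|^3), \\
E(x_0 + g\delta) &= E_0 + \tfrac{1}{2}\,\delta^{\top} g^{\top} H g\, \delta + O(\|\delta\|^3), \\
\Delta(\delta) &\equiv E(x_0 + \delta) - E(x_0 + g\delta)
  = \tfrac{1}{2}\,\delta^{\top}\!\left(H - g^{\top} H g\right)\delta + O(\|\delta\|^3), \\
\Delta_{\mathrm{corr}}(\delta) &\equiv \Delta(\delta)
  - \tfrac{1}{2}\,\delta^{\top}\!\left(H - g^{\top} H g\right)\delta
  = O(\|\delta\|^3).
\end{align}
```

Under these assumptions the quadratic term is nonzero only when $g^{\top} H g \neq H$, and subtracting it leaves a remainder that is cubic in the displacement.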

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper reports empirical results on concrete systems (H atom orbital densities, water vibrational modes, and 3D PES) showing improved learning curves when exploiting exact and approximate label symmetries. Claims about scaling behavior and Hessian corrections for convex wells are presented as observations from these examples, with performance floors tied to approximation degree. No load-bearing step reduces by construction to fitted inputs, self-citations, or renamed known results; the central claims remain independent of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that approximate symmetries still confer scaling benefits up to a floor set by approximation quality; no free parameters, new entities, or additional axioms are stated in the abstract.

axioms (1)

Domain assumption: Approximate label symmetries still improve scaling laws up to a convergence floor determined by the degree of approximation. Invoked to explain why the same principles govern behavior even when symmetry is not exact.

pith-pipeline@v0.9.1-grok · 5671 in / 1225 out tokens · 31178 ms · 2026-06-29T09:45:13.651927+00:00 · methodology

Reference graph

Works this paper leans on

60 extracted references · 6 canonical work pages · 3 internal anchors

[1] Data Augmentation. The most naive approach, applicable to finite groups, is to augment the training set with symmetry-related inputs carrying the same label: $\{(x_i, y_i)\}_{i=1}^{N} \longrightarrow \{(g \cdot x_i, y_i) : g \in G,\ i = 1, \dots, N\}$ (Eq. 4). For exact symmetries, this explicitly ties together all points on each orbit in the empirical risk; exact invariance of the predictor a...
[2] Input Transformation. A more structured approach is to replace $x$ by a symmetry-adapted representation $M(x)$ that is $G$-invariant by Eq. 3. Learning $f^*$ on $M(x)$ is equivalent to learning on an invariant subspace of $x$, which eliminates redundant degrees of freedom. In practice, $M(x)$ may be implemented as a canonicalization to a fundamental domain or as an invariant d...
[3] Continuous symmetries. Our first demonstration uses the hydrogen s-orbital densities $\rho_n(r) \propto R_{n0}(r)^2$, where $R_{n0}$ is the radial wavefunction with principal quantum number $n$. The label symmetry of the target is O(3) invariance, or invariance under all rotations of the sphere. We enforce this symmetry by performing the regression on the rotation-invariant coor...
[4] Discrete symmetries. We next consider discrete point group symmetries in the densities of the $2p_z$ and $3d_{xz}$ hydrogen orbitals. The $2p_z$ density has $D_{\infty h}$ symmetry. The continuous $C_\infty$ subgroup accounts for the independence of the density from the azimuthal angle $\varphi$, motivating coordinates $(r, \theta)$ from the outset. The remaining discrete symmetry is the reflectio...
[5] 1D normal-mode scans. We apply label symmetry augmentation to the three vibrational normal modes of water, whose potential en- [figure residue: panels of E(q) (meV) against q1, q2, and q3 (Å amu), with curves labeled E_HO, E2, and E3] ...
[6] Full 3D sampling. We confirm that the floor predictions of Sec. II D carry over to the full three-dimensional PES of water, sampled under the approximate inversion symmetry $Q \mapsto -Q$. Results for two representations, $Q$ and cMBDF, are shown in Fig. 4. The central result is that the learning curves for both representations plateau at the same floor under each aug-...
[7] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Fast and accurate modeling of molecular atomization energies with machine learning, Phys. Rev. Lett. 108, 058301 (2012)

[8] O. A. von Lilienfeld, Quantum machine learning in chemical compound space, Angewandte Chemie International Edition 57, 4164 (2018), http://dx.doi.org/10.1002/anie.201709686

[9] F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. Von Lilienfeld, Prediction errors of molecular machine learning models lower than hybrid dft error, Journal of chemical theory and computation 13, 5255 (2017)

[10] B. Huang and O. A. von Lilienfeld, Quantum machine learning using atom-in-molecule-based fragments selected on the fly, Nature chemistry 12, 945 (2020)

[11] I. Batatia, P. Benner, Y. Chiang, A. M. Elena, D. P. Kovács, J. Riebesell, X. R. Advincula, M. Asta, M. Avaylon, W. J. Baldwin, et al., A foundation model for atomistic materials chemistry, The Journal of chemical physics 163 (2025)

[12] K. Li, B. DeCost, K. Choudhary, M. Greenwood, and J. Hattrick-Simpers, A critical examination of robustness and generalizability of machine learning prediction of materials properties, npj Computational Materials 9, 55 (2023)

[13] J. Behler and M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces, Phys. Rev. Lett. 98, 146401 (2007)

[14] J. Behler, Atom-centered symmetry functions for constructing high-dimensional neural networks potentials, J. Chem. Phys. 134, 074106 (2011)

[15] B. J. Braams and J. M. Bowman, Permutationally invariant potential energy surfaces in high dimensionality, International Reviews in Physical Chemistry 28, 577 (2009)

[16] A. P. Bartók, R. Kondor, and G. Csányi, On representing chemical environments, Phys. Rev. B 87, 184115 (2013)

[17] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Physical Review B 99, 014104 (2019)
[18] A. S. Christensen, L. Bratholm, F. A. Faber, and O. A. von Lilienfeld, FCHL revisited: Faster and more accurate quantum machine learning, The Journal of Chemical Physics 152, 044107 (2020)

[19] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, Schnet – a deep learning architecture for molecules and materials, The Journal of Chemical Physics 148, 241722 (2018)

[20] M. Haghighatlari, J. Li, X. Guan, O. Zhang, A. Das, C. J. Stein, F. Heidar-Zadeh, M. Liu, M. Head-Gordon, L. Bertels, et al., Newtonnet: a newtonian message passing network for deep learning of interatomic potentials and forces, Digital Discovery 1, 333 (2022)

[21] V. G. Satorras, E. Hoogeboom, and M. Welling, E(n) equivariant graph neural networks, in International conference on machine learning (PMLR, 2021) pp. 9323–9332

[22] K. Schütt, O. Unke, and M. Gastegger, Equivariant message passing for the prediction of tensorial properties and molecular spectra, in International conference on machine learning (PMLR, 2021) pp. 9377–9388

[23] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials, Nature communications 13, 2453 (2022)

[24] I. Batatia, D. P. Kovacs, G. Simm, C. Ortner, and G. Csányi, Mace: Higher order equivariant message passing neural networks for fast and accurate force fields, Advances in neural information processing systems 35, 11423 (2022)

[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems 25 (2012)

[26] C. Shorten and T. M. Khoshgoftaar, A survey on image data augmentation for deep learning, Journal of big data 6, 1 (2019)

[27] T. Dao, A. Gu, A. Ratner, V. Smith, C. De Sa, and C. Ré, A kernel theory of modern data augmentation, in International conference on machine learning (PMLR,

[28] S. Chen, E. Dobriban, and J. H. Lee, A group-theoretic framework for data augmentation, Journal of Machine Learning Research 21, 1 (2020)

[29] K. Ngo and S. Ravanbakhsh, Scaling laws and symmetry: Evidence from neural force fields, arXiv preprint arXiv:2510.09768 (2025)
[30] B. Tahmasebi and M. Weber, Achieving approximate symmetry is exponentially easier than exact symmetry, in International Conference on Learning Representations (2026), arXiv:2512.11855

[31] M. F. Langer, S. N. Pozdnyakov, and M. Ceriotti, Probing the effects of broken symmetries in machine learning, Machine Learning: Science and Technology 5, 045058 (2024)

[32] M. Domina, J. W. Abbott, P. Pegolo, F. Bigi, and M. Ceriotti, How unconstrained machine-learning models learn physical symmetries, arXiv preprint arXiv:2603.24638 (2026)

[33] B. Elesedy and S. Zaidi, Provably strict generalisation benefit for invariant models, in Proceedings of the 38th International Conference on Machine Learning (PMLR,

[34] B. Tahmasebi and S. Jegelka, The exact sample complexity gain from invariances for kernel regression, in Advances in Neural Information Processing Systems, Vol. 36 (2023)

[35] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks 12, 181 (2001)

[36] B. Huang and O. A. von Lilienfeld, Ab initio machine learning in chemical compound space, Chemical Reviews 121, 10001 (2021)

[37] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, Learnability and the Vapnik–Chervonenkis dimension, Journal of the ACM 36, 929 (1989)

[38] A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant, A general lower bound on the number of examples needed for learning, Information and Computation 82, 247 (1989)

[39] E. B. Baum and D. Haussler, What size net gives valid generalization?, Advances in Neural Information Processing Systems 1 (1988)

[40] D. Haussler and M. Opper, Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise (University of California at Santa Cruz, 1991)

[41] E. Levin, N. Tishby, and S. A. Solla, A statistical approach to learning and generalization in layered neural networks, Proceedings of the IEEE 78, 1568 (1990)
[42] H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Physical Review A 45, 6056 (1992)

[43] J. Rissanen, Stochastic complexity and modeling, The Annals of Statistics 14, 1080 (1986)

[44] S.-i. Amari, N. Fujita, and S. Shinomoto, Four types of learning curves, Neural Computation 4, 605 (1992)

[45] S.-i. Amari, A universal theorem on learning curves, Neural Networks 6, 161 (1993)

[46] C. Cortes, L. D. Jackel, S. A. Solla, V. Vapnik, and J. S. Denker, Learning curves: Asymptotic values and rate of convergence, in Advances in Neural Information Processing Systems, Vol. 6 (1993) pp. 327–334

[47] K.-R. Müller, M. Finke, N. Murata, K. Schulten, and S.-i. Amari, A numerical study on learning curves in stochastic multilayer feedforward networks, Neural Computation 8, 1085 (1996)

[48] J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M. M. A. Patwary, Y. Yang, and Y. Zhou, Deep learning scaling is predictable, empirically, arXiv preprint arXiv:1712.00409 (2017)

[49] S. Spigler, M. Geiger, S. d’Ascoli, L. Sagun, G. Biroli, and M. Wyart, Asymptotic learning curves of kernel methods: Empirical data versus teacher–student paradigm, Journal of Statistical Mechanics: Theory and Experiment 2020, 124001 (2020)

[50] Z. Li, W. J. Zhang, and Q. Lin, On the asymptotic learning curves of kernel ridge regression under power-law decay, in Advances in Neural Information Processing Systems, Vol. 36 (2023)

[51] S.-i. Amari and N. Murata, Statistical theory of learning curves under entropic loss criterion, Neural Computation 5, 140 (1993)

[52] J. S. Rosenfeld, A. Rosenfeld, Y. Belinkov, and N. Shavit, A constructive prediction of the generalization error across scales, in International Conference on Learning Representations (ICLR, 2020)
[53] L. Verlet, Computer “experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules, Physical Review 159, 98 (1967)

[54] Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, and G. K.-L. Chan, PySCF: the Python-based simulations of chemistry framework (2017)

[55] J.-D. Chai and M. Head-Gordon, Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections, Physical Chemistry Chemical Physics 10, 6615 (2008)

[56] S. Grimme, J. Antony, S. Ehrlich, and H. Krieg, A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu, Journal of Chemical Physics 132, 154104 (2010)

[57] F. Weigend and R. Ahlrichs, Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy, Physical Chemistry Chemical Physics 7, 3297 (2005)

[58] S. Heinen, D. Khan, and O. A. von Lilienfeld, QML2: Quantum machine learning package (2024), software package

[59] D. Khan, S. Heinen, and O. A. von Lilienfeld, Quantum machine learning at record speed: Many-body distribution functionals as compact representations, arXiv preprint arXiv:2303.16312 (2023)

[60] D. Khan, S. Heinen, and O. A. von Lilienfeld, Generalized convolutional many-body distribution functional representations, Proceedings of the National Academy of Sciences 122, e2415662122 (2025)

[1] [1]

Data Augmentation The most naive approach, applicable to finite groups, is to augment the training set with symmetry-related in- puts carrying the same label: {(xi, yi)}N i=1 − → {(g·x i, yi) :g∈G, i= 1, . . . , N}. (4) For exact symmetries, this explicitly ties together all points on each orbit in the empirical risk; exact invari- ance of the predictor a...

[2] [2]

Input Transformation A more structured approach is to replacexby a symmetry-adapted representationM(x) that isG- invariant by Eq. 3. Learningf ∗ onM(x) is equivalent to learning on an invariant subspace ofx, which eliminates redundant degrees of freedom. In practice,M(x) may be implemented as a canonicalization to a fundamental domain or as an invariant d...

[3] [3]

The label symmetry of the target isO(3) invariance, or invariance under all rotations of the sphere

Continuous symmetries Our first demonstration uses the hydrogens-orbital densitiesρ n(r)∝R n0(r)2, whereR n0 is the radial wave- function with principal quantum numbern. The label symmetry of the target isO(3) invariance, or invariance under all rotations of the sphere. We enforce this symme- try by performing the regression on the rotation-invariant coor...

[4] [4]

The 2p z density hasD ∞h symmetry

Discrete symmetries We next consider discrete point group symmetries in the densities of the 2p z and 3dxz hydrogen orbitals. The 2p z density hasD ∞h symmetry. The continuous C∞ subgroup accounts for the independence of the den- sity from the azimuthal angleφ, motivating coordinates (r, θ) from the outset. The remaining discrete symmetry is the reflectio...

[5] [5]

1D normal-mode scans We apply label symmetry augmentation to the three vibrational normal modes of water, whose potential en- 7 0.2 0.0 0.2 q1 (Å amu ) 100 200E(q)(meV) EHO 0.08 0.00 0.08 q2 (Å amu ) 150 300 0.08 0.00 0.08 q3 (Å amu ) 100 200 0.2 0.0 0.2 q1 (Å amu ) 25 0 25 E(q)(meV) E2 E3 0.08 0.00 0.08 q2 (Å amu ) 50 0 50 0.08 0.00 0.08 q3 (Å amu ) 4 0 ...

[6] [6]

II D carry over to the full three-dimensional PES of water, sampled under the approximate inversion symmetryQ7→ −Q

Full 3D sampling We confirm that the floor predictions of Sec. II D carry over to the full three-dimensional PES of water, sampled under the approximate inversion symmetryQ7→ −Q. Results for two representations,Qand cMBDF, are shown in Fig. 4. The central result is that the learning curves for both representations plateau at the same floor under each aug-...

2023

[7] [7]

M. Rupp, A. Tkatchenko, K.-R. M¨ uller, and O. A. von Lilienfeld, Fast and accurate modeling of molecular atom- ization energies with machine learning, Phys. Rev. Lett. 108, 058301 (2012)

2012

[8] [8]

O. A. von Lilienfeld, Quantum machine learn- ing in chemical compound space, Angewandte Chemie International Edition57, 4164 (2018), http://dx.doi.org/10.1002/anie.201709686. 10

work page doi:10.1002/anie.201709686 2018

[9] [9]

F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Ri- ley, and O. A. Von Lilienfeld, Prediction errors of molecu- lar machine learning models lower than hybrid dft error, Journal of chemical theory and computation13, 5255 (2017)

2017

[10] [10]

Huang and O

B. Huang and O. A. von Lilienfeld, Quantum ma- chine learning using atom-in-molecule-based fragments selected on the fly, Nature chemistry12, 945 (2020)

2020

[11] [11]

Batatia, P

I. Batatia, P. Benner, Y. Chiang, A. M. Elena, D. P. Kov´ acs, J. Riebesell, X. R. Advincula, M. Asta, M. Avay- lon, W. J. Baldwin,et al., A foundation model for atom- istic materials chemistry, The Journal of chemical physics 163(2025)

2025

[12] [12]

K. Li, B. DeCost, K. Choudhary, M. Greenwood, and J. Hattrick-Simpers, A critical examination of robust- ness and generalizability of machine learning prediction of materials properties, npj Computational Materials9, 55 (2023)

2023

[13] [13]

Behler and M

J. Behler and M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy sur- faces, Phys. Rev. Lett.98, 146401 (2007)

2007

[14] [14]

Behler, Atom-centered symmetry functions for con- structing high-dimensional neural networks potentials, J

J. Behler, Atom-centered symmetry functions for con- structing high-dimensional neural networks potentials, J. Comp. Phys.134, 074106 (2011)

2011

[15] [15]

B. J. Braams and J. M. Bowman, Permutationally in- variant potential energy surfaces in high dimensional- ity, International Reviews in Physical Chemistry28, 577 (2009)

2009

[16] [16]

A. P. Bart´ ok, R. Kondor, and G. Cs´ anyi, On representing chemical environments, Phys. Rev. B87, 184115 (2013)

2013

[17] [17]

Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Physical Review B 99, 014104 (2019)

R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Physical Review B 99, 014104 (2019)

[18] A. S. Christensen, L. Bratholm, F. A. Faber, and O. A. von Lilienfeld, FCHL revisited: Faster and more accurate quantum machine learning, The Journal of Chemical Physics 152, 044107 (2020)

[19] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, SchNet – a deep learning architecture for molecules and materials, The Journal of Chemical Physics 148, 241722 (2018)

[20] M. Haghighatlari, J. Li, X. Guan, O. Zhang, A. Das, C. J. Stein, F. Heidar-Zadeh, M. Liu, M. Head-Gordon, L. Bertels, et al., NewtonNet: a Newtonian message passing network for deep learning of interatomic potentials and forces, Digital Discovery 1, 333 (2022)

[21] V. G. Satorras, E. Hoogeboom, and M. Welling, E(n) equivariant graph neural networks, in International Conference on Machine Learning (PMLR, 2021) pp. 9323–9332

[22] K. Schütt, O. Unke, and M. Gastegger, Equivariant message passing for the prediction of tensorial properties and molecular spectra, in International Conference on Machine Learning (PMLR, 2021) pp. 9377–9388

[23] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials, Nature Communications 13, 2453 (2022)

[24] I. Batatia, D. P. Kovács, G. Simm, C. Ortner, and G. Csányi, MACE: Higher order equivariant message passing neural networks for fast and accurate force fields, Advances in Neural Information Processing Systems 35, 11423 (2022)

[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (2012)

[26] C. Shorten and T. M. Khoshgoftaar, A survey on image data augmentation for deep learning, Journal of Big Data 6, 1 (2019)

[27] T. Dao, A. Gu, A. Ratner, V. Smith, C. De Sa, and C. Ré, A kernel theory of modern data augmentation, in International Conference on Machine Learning (PMLR,

[28] S. Chen, E. Dobriban, and J. H. Lee, A group-theoretic framework for data augmentation, Journal of Machine Learning Research 21, 1 (2020)

[29] K. Ngo and S. Ravanbakhsh, Scaling laws and symmetry: Evidence from neural force fields, arXiv preprint arXiv:2510.09768 (2025)

[30] B. Tahmasebi and M. Weber, Achieving approximate symmetry is exponentially easier than exact symmetry, in International Conference on Learning Representations (2026), arXiv:2512.11855

[31] M. F. Langer, S. N. Pozdnyakov, and M. Ceriotti, Probing the effects of broken symmetries in machine learning, Machine Learning: Science and Technology 5, 045058 (2024)

[32] M. Domina, J. W. Abbott, P. Pegolo, F. Bigi, and M. Ceriotti, How unconstrained machine-learning models learn physical symmetries, arXiv preprint arXiv:2603.24638 (2026)

[33] B. Elesedy and S. Zaidi, Provably strict generalisation benefit for invariant models, in Proceedings of the 38th International Conference on Machine Learning (PMLR,

[34] B. Tahmasebi and S. Jegelka, The exact sample complexity gain from invariances for kernel regression, in Advances in Neural Information Processing Systems, Vol. 36 (2023)

[35] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks 12, 181 (2001)

[36] B. Huang and O. A. von Lilienfeld, Ab initio machine learning in chemical compound space, Chemical Reviews 121, 10001 (2021)

[37] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, Learnability and the Vapnik–Chervonenkis dimension, Journal of the ACM 36, 929 (1989)

[38] A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant, A general lower bound on the number of examples needed for learning, Information and Computation 82, 247 (1989)

[39] E. B. Baum and D. Haussler, What size net gives valid generalization?, Advances in Neural Information Processing Systems 1 (1988)

[40] D. Haussler and M. Opper, Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise (University of California at Santa Cruz, 1991)

[41] E. Levin, N. Tishby, and S. A. Solla, A statistical approach to learning and generalization in layered neural networks, Proceedings of the IEEE 78, 1568 (1990)

[42] H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Physical Review A 45, 6056 (1992)

[43] J. Rissanen, Stochastic complexity and modeling, The Annals of Statistics 14, 1080 (1986)

[44] S.-i. Amari, N. Fujita, and S. Shinomoto, Four types of learning curves, Neural Computation 4, 605 (1992)

[45] S.-i. Amari, A universal theorem on learning curves, Neural Networks 6, 161 (1993)

[46] C. Cortes, L. D. Jackel, S. A. Solla, V. Vapnik, and J. S. Denker, Learning curves: Asymptotic values and rate of convergence, in Advances in Neural Information Processing Systems, Vol. 6 (1993) pp. 327–334

[47] K.-R. Müller, M. Finke, N. Murata, K. Schulten, and S.-i. Amari, A numerical study on learning curves in stochastic multilayer feedforward networks, Neural Computation 8, 1085 (1996)

[48] J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M. M. A. Patwary, Y. Yang, and Y. Zhou, Deep learning scaling is predictable, empirically, arXiv preprint arXiv:1712.00409 (2017)

[49] S. Spigler, M. Geiger, S. d’Ascoli, L. Sagun, G. Biroli, and M. Wyart, Asymptotic learning curves of kernel methods: Empirical data versus teacher–student paradigm, Journal of Statistical Mechanics: Theory and Experiment 2020, 124001 (2020)

[50] Z. Li, W. J. Zhang, and Q. Lin, On the asymptotic learning curves of kernel ridge regression under power-law decay, in Advances in Neural Information Processing Systems, Vol. 36 (2023)

[51] S.-i. Amari and N. Murata, Statistical theory of learning curves under entropic loss criterion, Neural Computation 5, 140 (1993)

[52] J. S. Rosenfeld, A. Rosenfeld, Y. Belinkov, and N. Shavit, A constructive prediction of the generalization error across scales, in International Conference on Learning Representations (ICLR, 2020)

[53] L. Verlet, Computer “experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules, Physical Review 159, 98 (1967)

[54] Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, and G. K.-L. Chan, PySCF: the Python-based simulations of chemistry framework (2017)

[55] J.-D. Chai and M. Head-Gordon, Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections, Physical Chemistry Chemical Physics 10, 6615 (2008)

[56] S. Grimme, J. Antony, S. Ehrlich, and H. Krieg, A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu, Journal of Chemical Physics 132, 154104 (2010)

[57] F. Weigend and R. Ahlrichs, Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy, Physical Chemistry Chemical Physics 7, 3297 (2005)

[58] S. Heinen, D. Khan, and O. A. von Lilienfeld, QML2: Quantum machine learning package (2024), software package

[59] D. Khan, S. Heinen, and O. A. von Lilienfeld, Quantum machine learning at record speed: Many-body distribution functionals as compact representations, arXiv preprint arXiv:2303.16312 (2023)

[60] D. Khan, S. Heinen, and O. A. von Lilienfeld, Generalized convolutional many-body distribution functional representations, Proceedings of the National Academy of Sciences 122, e2415662122 (2025)