Approximate Label Symmetries Improve Data Scaling
Pith reviewed 2026-06-29 09:45 UTC · model grok-4.3
The pith
Approximate label symmetries improve machine learning scaling laws for electron densities and molecular energies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Exploiting exact as well as approximate label symmetries can benefit scaling laws. ML models of the s, p, and d orbital densities of the hydrogen atom, the three vibrational normal modes of the water molecule, and the full 3D potential energy hypersurface of water exhibit superior learning curves. When label symmetries are not exact, the same principles govern learning behavior up to convergence floors set by the degree of approximation. For convex wells, a Hessian-based correction suppresses the leading symmetry-breaking error in augmented labels.
What carries the argument
Label symmetries applied to augment training data, with a Hessian correction for approximate cases in convex potential wells.
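A minimal sketch of what label-symmetry augmentation could look like for a finite group, assuming the target is invariant under each group element. The function names `augment_with_group`, `identity`, and `inversion`, and the toy harmonic label, are illustrative assumptions and are not taken from the paper.

```python
def augment_with_group(samples, group_actions):
    """Extend (x, y) pairs with every group image of x, keeping the label y.

    samples: list of (x, y) with x a tuple of coordinates.
    group_actions: list of callables acting on x; include the identity
    so that the original points are retained.
    """
    return [(g(x), y) for (x, y) in samples for g in group_actions]


def identity(x):
    return x


def inversion(x):
    """Hypothetical symmetry operation: x -> -x."""
    return tuple(-c for c in x)


# Toy data: an even 1D label, so inversion is an exact label symmetry here.
samples = [((q,), 0.5 * q * q) for q in (0.1, 0.4, 0.9)]
augmented = augment_with_group(samples, [identity, inversion])

for x, y in augmented:
    print(x, y)
```

With a two-element group the training set doubles in size at no extra labeling cost, which is the sense in which augmentation can shift a learning curve.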
If this is right
- ML models for electron density and potential energies achieve improved generalization efficiency.
- Learning curves for approximate symmetries follow the same scaling principles until they reach a floor set by the degree of approximation.
- Hessian correction suppresses leading symmetry-breaking error for convex wells in molecular potential energy surfaces.
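The Hessian correction named in the last point can be illustrated on a toy well. This is a sketch under stated assumptions: the correction formula (shifting the copied label by the difference in harmonic energy), the 2D potential, the coordinate swap used as the approximate symmetry, and all function names are hypothetical and not taken from the paper.

```python
def quad(H, x):
    """Harmonic energy 0.5 * x^T H x for a small dense matrix H."""
    n = len(x)
    return 0.5 * sum(x[i] * H[i][j] * x[j] for i in range(n) for j in range(n))


def energy(x, H, c3=0.05):
    """Toy convex well: harmonic part plus a weak cubic term (assumed form)."""
    return quad(H, x) + c3 * x[0] ** 3


def swap(x):
    """Approximate symmetry operation g: exchange the two coordinates."""
    return (x[1], x[0])


def hessian_corrected_label(y, x, gx, H):
    """Shift the copied label by the harmonic energy difference between gx and x."""
    return y + quad(H, gx) - quad(H, x)


H = [[1.0, 0.0], [0.0, 1.2]]  # not invariant under swap, so g is only approximate
x = (0.10, 0.04)
gx = swap(x)
y = energy(x, H)

# Error of the augmented label at gx, without and with the correction.
naive_error = abs(y - energy(gx, H))
corrected_error = abs(hessian_corrected_label(y, x, gx, H) - energy(gx, H))
print(naive_error, corrected_error)
```

In this toy setting the corrected label differs from the true energy only through the cubic term, so the remaining error shrinks faster than the naive one as the displacement decreases.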
Where Pith is reading between the lines
- The method may extend to other quantum chemistry tasks where near-symmetries appear in molecular properties.
- It could lower data requirements for training models on systems with partial symmetry.
- Similar augmentation might apply to learning curves in other domains with approximate invariances.
Load-bearing premise
Scaling principles continue to govern learning behavior when label symmetries are approximate, with performance floors set solely by the degree of approximation.
What would settle it
Check whether learning curves using approximate label symmetries plateau exactly at heights predicted by the measured degree of symmetry breaking, and whether the Hessian correction removes the dominant error term in convex wells of the water potential energy surface.
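One plausible way to quantify the "measured degree of symmetry breaking" is the root-mean-square label error made by copying a label from q onto its image -q. The sketch below assumes a toy 1D well with a cubic term; whether this RMS residual matches the floor definition used in the paper is an assumption, and the names `energy` and `symmetry_breaking_rms` are hypothetical.

```python
import math


def energy(q, k=1.0, c3=0.2):
    """Toy 1D well: harmonic term plus a cubic term that breaks q -> -q."""
    return 0.5 * k * q * q + c3 * q ** 3


def symmetry_breaking_rms(f, points):
    """RMS of f(q) - f(-q): the label error made by copying f(q) onto -q."""
    residuals = [f(q) - f(-q) for q in points]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


points = [-0.3 + 0.06 * i for i in range(11)]  # grid on [-0.3, 0.3]
floor_scale = symmetry_breaking_rms(energy, points)
print(f"RMS symmetry-breaking residual: {floor_scale:.4e}")
```

A test of the claim would then compare the plateau of the augmented learning curve against a residual of this kind, computed on the same sampling distribution as the training data.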
Original abstract
Enforcing universal symmetries in machine learning (ML) models is a common strategy to mitigate data scarcity. We show that exploiting exact, as well as approximate, label symmetries can benefit scaling laws. We illustrate the idea for the s, p, d orbital densities of the electron in the hydrogen atom, for the three vibrational normal modes of the water molecule, as well as its full 3D potential energy hypersurface. Resulting ML models of electron density and potential energies exhibit superior learning curves, demonstrating improved generalization efficiency. When label symmetries are not exact, the same principles govern the observed learning behavior -- up to the convergence floors set by the degree to which the symmetry is approximate. For convex wells in the molecular potential energy surface, a Hessian-based correction suppresses the leading symmetry-breaking error in augmented labels.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that exploiting exact and approximate label symmetries improves data scaling in machine learning models for physical systems. It illustrates this with s/p/d orbital densities of the hydrogen atom, the three vibrational normal modes of water, and the full 3D potential energy surface of water. Resulting models show superior learning curves. For approximate symmetries the same scaling principles are said to hold up to convergence floors set by the degree of approximation; a Hessian-based correction is proposed to suppress the leading symmetry-breaking error for convex wells in the PES.
Significance. If the quantitative results hold, the work provides a practical route to data-efficient ML for quantum chemistry by relaxing the requirement for exact symmetries while retaining scaling benefits. The concrete examples (H-atom orbitals, water modes, PES) and the explicit Hessian correction for convex wells are strengths; the absence of free parameters or ad-hoc axioms in the core argument is also positive.
major comments (2)
- [Abstract, §3–4] Abstract and §3–4 (results on learning curves): the central empirical claims of superior learning curves and the quantitative effect of the Hessian correction are asserted without reported error bars, training-set sizes, number of independent runs, or exclusion criteria for the augmented labels. This information is load-bearing for the scaling-law assertions and must be supplied before the claims can be evaluated.
- [§4.2] §4.2 (Hessian correction): the statement that the correction 'suppresses the leading symmetry-breaking error' for convex wells is presented without an explicit derivation showing that higher-order terms remain negligible across the tested range of displacements; a short expansion or numerical check of the neglected terms is needed to support the claim.
minor comments (2)
- [Figures 2–3] The captions of Figures 2 and 3 should state the precise definition of the 'augmented label' set and the metric used for the learning curves (e.g., MAE on density or energy).
- [Introduction] The introduction should define 'label symmetry' at first use rather than relying on the later technical sections.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The two major comments identify important omissions in statistical reporting and justification of the Hessian correction; both are addressable by revision.
- Referee: [Abstract, §3–4] Abstract and §3–4 (results on learning curves): the central empirical claims of superior learning curves and the quantitative effect of the Hessian correction are asserted without reported error bars, training-set sizes, number of independent runs, or exclusion criteria for the augmented labels. This information is load-bearing for the scaling-law assertions and must be supplied before the claims can be evaluated.
  Authors: We agree that these experimental details are required to evaluate the scaling claims. The revised manuscript now reports error bars obtained from ten independent training runs per data point, lists the precise training-set cardinalities used for each learning curve, and states the exclusion criterion applied to augmented labels (augmented labels were retained only when the symmetry-breaking residual lay below a threshold set by the norm of the Hessian at the reference geometry). These additions appear in §§3–4 together with a brief methods paragraph; the abstract has been updated to note the statistical controls.
  Revision: yes
- Referee: [§4.2] §4.2 (Hessian correction): the statement that the correction 'suppresses the leading symmetry-breaking error' for convex wells is presented without an explicit derivation showing that higher-order terms remain negligible across the tested range of displacements; a short expansion or numerical check of the neglected terms is needed to support the claim.
  Authors: We accept that an explicit expansion strengthens the claim. The revised §4.2 now contains a short Taylor expansion of the potential about the equilibrium geometry, demonstrating that the leading symmetry-breaking term is quadratic in the displacement vector and is exactly cancelled by the Hessian correction, while cubic and higher contributions scale as O(‖δ‖³). For the displacement magnitudes employed in the water PES experiments (‖δ‖ ≤ 0.1 Å), a supplementary numerical check shows that the neglected terms remain below 5% of the quadratic residual. This material has been inserted as a new paragraph with an accompanying figure panel.
  Revision: yes
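The expansion described in this response can be sketched as follows. The notation is an assumption introduced here, since the manuscript's own symbols are not shown on this page: δ is the displacement from equilibrium, H the Hessian at equilibrium, g a linear, norm-preserving operation representing the approximate symmetry, and ỹ the corrected augmented label. The gradient is taken to vanish at the equilibrium geometry.

```latex
\begin{align}
E(\delta) &= E_0 + \tfrac{1}{2}\,\delta^{\top} H\,\delta + \mathcal{O}(\|\delta\|^{3}), \\
E(g\delta) - E(\delta) &= \tfrac{1}{2}\,\delta^{\top}\!\left(g^{\top} H g - H\right)\delta + \mathcal{O}(\|\delta\|^{3}), \\
\tilde{y}(g\delta) &= E(\delta) + \tfrac{1}{2}\,\delta^{\top}\!\left(g^{\top} H g - H\right)\delta, \\
\tilde{y}(g\delta) - E(g\delta) &= \mathcal{O}(\|\delta\|^{3}).
\end{align}
```

Under these assumptions the quadratic term is nonzero only when H is not invariant under g. For a pure inversion, g = −1, it vanishes identically and the leading symmetry-breaking term is already cubic.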
Circularity Check
No significant circularity in derivation chain
The paper reports empirical results on concrete systems (H atom orbital densities, water vibrational modes, and 3D PES) showing improved learning curves when exploiting exact and approximate label symmetries. Claims about scaling behavior and Hessian corrections for convex wells are presented as observations from these examples, with performance floors tied to approximation degree. No load-bearing step reduces by construction to fitted inputs, self-citations, or renamed known results; the central claims remain independent of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Approximate label symmetries still improve scaling laws up to a convergence floor determined by the degree of approximation.
Reference graph
Works this paper leans on
-
[1]
Data Augmentation. The most naive approach, applicable to finite groups, is to augment the training set with symmetry-related inputs carrying the same label: $\{(x_i, y_i)\}_{i=1}^{N} \longrightarrow \{(g \cdot x_i, y_i) : g \in G,\ i = 1, \dots, N\}$ (Eq. 4). For exact symmetries, this explicitly ties together all points on each orbit in the empirical risk; exact invariance of the predictor a...
-
[2]
Input Transformation. A more structured approach is to replace x by a symmetry-adapted representation M(x) that is G-invariant by Eq. 3. Learning f* on M(x) is equivalent to learning on an invariant subspace of x, which eliminates redundant degrees of freedom. In practice, M(x) may be implemented as a canonicalization to a fundamental domain or as an invariant d...
-
[3]
Continuous symmetries. Our first demonstration uses the hydrogen s-orbital densities $\rho_n(r) \propto R_{n0}(r)^2$, where $R_{n0}$ is the radial wavefunction with principal quantum number n. The label symmetry of the target is O(3) invariance, or invariance under all rotations of the sphere. We enforce this symmetry by performing the regression on the rotation-invariant coor...
-
[4]
Discrete symmetries. We next consider discrete point group symmetries in the densities of the 2p_z and 3d_xz hydrogen orbitals. The 2p_z density has D∞h symmetry. The continuous C∞ subgroup accounts for the independence of the density from the azimuthal angle φ, motivating coordinates (r, θ) from the outset. The remaining discrete symmetry is the reflectio...
-
[5]
1D normal-mode scans. We apply label symmetry augmentation to the three vibrational normal modes of water, whose potential en... [Figure residue omitted: panels of E(q) (meV) versus the normal-mode coordinates q1, q2, q3 (Å amu), with curves labeled E_HO, E_2, and E_3.] ...
-
[6]
Full 3D sampling. We confirm that the floor predictions of Sec. II D carry over to the full three-dimensional PES of water, sampled under the approximate inversion symmetry Q ↦ −Q. Results for two representations, Q and cMBDF, are shown in Fig. 4. The central result is that the learning curves for both representations plateau at the same floor under each aug-...