Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Boris Kozinsky; Chuin Wei Tan; Gil Harari; Laura Zichi; Marc L. Descoteaux; Ola Tangen Kulseng; Yoel Zimmermann

arxiv: 2607.02499 · v1 · pith:FSL3OQYWnew · submitted 2026-07-02 · 💻 cs.LG · cs.AI· physics.chem-ph· physics.comp-ph

Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Gil Harari , Yoel Zimmermann , Ola Tangen Kulseng , Laura Zichi , Chuin Wei Tan , Marc L. Descoteaux , Boris Kozinsky This is my paper

Pith reviewed 2026-07-03 16:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AIphysics.chem-phphysics.comp-ph

keywords machine learning interatomic potentialsoptimizersSOAPMuonNequIPAllegropartial supervisiontraining efficiency

0 comments

The pith

Matrix-structured optimizers like SOAP and SOAP-Muon train ML interatomic potentials faster and more accurately than Adam.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether replacing the default Adam optimizer with recently proposed matrix-structured methods improves training of NequIP and Allegro models for machine learning interatomic potentials. It reports that SOAP and the hybrid SOAP-Muon reach better final accuracy in fewer steps, with the largest gains appearing when only partial force labels are available during training. A sympathetic reader would care because optimizer choice has received little attention in the MLIP literature, yet the results suggest it can deliver measurable efficiency improvements without changes to architecture or data collection.

Core claim

SOAP and SOAP-Muon substantially outperform Adam in both convergence speed and final accuracy on NequIP and Allegro models, while Muon alone yields only partial gains; the advantage of the stronger methods is especially clear under partial force supervision.

What carries the argument

Matrix-structured optimizers (Muon, SOAP, SOAP-Muon) that maintain and update structured matrices of parameters rather than independent per-element adaptive rates.

If this is right

SOAP and SOAP-Muon deliver consistent gains across the tested architectures and datasets.
Partial force supervision amplifies the benefit of these optimizers relative to Adam.
Muon by itself improves training only modestly compared with the SOAP variants.
Optimizer selection becomes a practical lever for reducing training cost in MLIP development.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the gains hold for other architectures, routine adoption of SOAP could lower the data volume required to reach target accuracy.
The interaction between optimizer structure and partial supervision may point to broader principles for label-efficient scientific machine learning.
Replicating the comparison on larger or more diverse datasets would test whether the ranking of methods remains stable.

Load-bearing premise

Observed performance gaps are caused mainly by the choice of optimizer rather than by interactions with the specific model architectures, data distributions, or hyper-parameter tuning effort.

What would settle it

An experiment in which Adam is given exhaustive per-model hyper-parameter search and still fails to match the final accuracy or step count of SOAP on the same NequIP and Allegro training runs.

Figures

Figures reproduced from arXiv: 2607.02499 by Boris Kozinsky, Chuin Wei Tan, Gil Harari, Laura Zichi, Marc L. Descoteaux, Ola Tangen Kulseng, Yoel Zimmermann.

**Figure 1.** Figure 1: Relative test MAE vs AdamW at full (100%) force supervision. Each cell corresponds to MAEoptimizer/MAEAdamW@100%F. Green cells are below AdamW@100%F (better), red cells above. Boldface values indicate the best (lowest) MAE within each force-% column. MLIPs and comparing resulting observables against experimental measurements and ab initio reference data. 4. Results 4.1. Full Supervision Accuracy Gains Und… view at source ↗

**Figure 2.** Figure 2: Wall-clock convergence under full force supervision. Validation force MAE versus wall-clock time on CDP (a) and water (b). Lines are per-epoch medians and shaded bands are interquartile ranges across three seeds, all executed on NVIDIA A100 GPUs. The dashed line marks AdamW’s minimum median force MAE and circles indicate the first wall-clock time at which each optimizer’s median curve crosses that level. S… view at source ↗

**Figure 3.** Figure 3: Radial distribution functions (RDFs) of O–H pairs in CDP obtained from MD simulations using AdamW (blue) and SOAP-Muon (pink) models trained on energies and 5% of forces, compared against AIMD ground truth (black). ing almost immediately and producing nonphysical results under the same conditions, as shown in [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Force and per-atom energy validation training curves for a tuned SOAP-Muon (ρ = 0.5, (β1, β2) = (0.95, 0.95) for both parameter groups) and an untuned SOAP-Muon (ρ = 0.0, (β1, β2) = (0.95, 0.98) for the Muon parameter group and (β1, β2) = (0.9, 0.95) for the Adam parameter group). Solid lines are per-epoch means, while the faint background lines show each individual seed run. Both tuned and untuned variant… view at source ↗

**Figure 5.** Figure 5: Physical observables obtained from MD simulations of CDP using energy and force trained MLIPs with AdamW (blue), Muon (orange), SOAP (green), and SOAP-Muon (purple), compared against ab initio reference curves. (a–c) Radial distribution functions for the O–H, O–O, and P–O pairs, respectively. (d) Mean squared displacement of H calculated from the corresponding MD trajectories. 18 [PITH_FULL_IMAGE:figures/… view at source ↗

**Figure 6.** Figure 6: Physical observables obtained from MD simulations of water using energy-and-force-trained MLIPs with AdamW (blue), Muon (orange), SOAP (green), and SOAP-Muon (purple). (a–c) Radial distribution functions (RDFs) for the O–O, H–H, and O–H pairs, respectively. O-O RDF is compared against experimental data from Skinner et al. (2014) (d) Mean squared displacement calculated from the corresponding MD trajectorie… view at source ↗

**Figure 7.** Figure 7: Physical observables obtained from MD simulations of CDP using MLIPs trained with AdamW (blue) and SOAP-Muon (purple), on energy labels and varying fractions of force labels: 5%, 10%, 50%, and 100% force supervision (lighter to darker shades indicate increasing fractions of force labels). (a-c) RDFs for the O–H, O–O, and P–O pairs with AdamW. (d-f) RDFs for the O–H, O–O, and P–O pairs with SOAP-Muon. (g) M… view at source ↗

**Figure 8.** Figure 8: Physical observables obtained from MD simulations of water using MLIPs trained with AdamW (blue) and SOAP (green), on energy labels and varying fractions of force labels: 5% and 100% force supervision (lighter to darker shades indicate increasing fractions of force labels). (a-c) RDFs for the O–O, O–H, and H–H pairs with AdamW. (d-f) RDFs for the O–O, O–H, and H–H pairs with SOAP-Muon. (g) MSD calculated f… view at source ↗

read the original abstract

Machine learning interatomic potentials (MLIPs) have become a hallmark of AI for scientific simulation. While efforts on new architectures and datasets have led to increasingly accurate and general models, the choice of optimizer for training has largely remained unexplored, defaulting to Adam and its variants in the community. Here, we implement and systematically compare a class of recently proposed matrix-structured optimizers, including Muon, SOAP, and the hybrid SOAP-Muon, for training NequIP and Allegro MLIP models. We find that these optimizers can substantially outperform Adam in both convergence speed and final accuracy. SOAP and SOAP-Muon emerge as robust and consistently strong methods, while Muon only provides partial gains relative to Adam. The improvements are particularly pronounced under partial force supervision. Our results indicate that optimizer choice is an overlooked yet impactful design axis for MLIPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SOAP and SOAP-Muon beat Adam on MLIP training speed and accuracy under partial labels, but the comparison needs equal tuning checks to hold up.

read the letter

Hi colleague,

The main thing to know is that SOAP and the SOAP-Muon hybrid train NequIP and Allegro models faster and reach higher accuracy than Adam, with the clearest edge when force labels are only partial. Muon alone helps less consistently. That is the practical result if the numbers survive scrutiny.

The work applies matrix-structured optimizers to ML interatomic potential training and stresses the label-efficiency angle. This specific head-to-head on two standard architectures has not been reported in the prior literature the abstract cites, so the comparison itself is new. The paper does well by holding the models fixed and varying only the optimizer, which gives a clean test of the claim. It also identifies SOAP as the most robust option, which is a usable takeaway for people already running these models.

The soft spot is the thin experimental detail. The abstract supplies no counts of independent runs, error bars, statistical tests, or the hyperparameter search protocol for Adam versus the new methods. The stress-test concern lands here: if Adam used narrower search or defaults while SOAP received broader tuning, the reported gaps could shrink. That risk is highest in the partial-supervision regime the paper highlights. The central claim is empirical, so it needs those controls to be convincing.

This paper is for researchers who train ML interatomic potentials on limited data and want to test a different optimizer without changing architecture or collecting more labels. A practitioner in computational materials would see immediate value in trying the suggested methods.

It deserves peer review because the question is practical and the domain application is fresh. A referee can request the missing protocol details and confirm the tuning was fair. I would bring it to a reading group as maybe, to examine the methods section. I would not cite it in my own work in the next year unless the gains hold under equal effort. The thinking is direct and engages the relevant optimizer and MLIP literature without overclaiming.

Referee Report

2 major / 2 minor

Summary. The paper implements matrix-structured optimizers (Muon, SOAP, SOAP-Muon) and benchmarks them against Adam for training NequIP and Allegro MLIP models on standard datasets. It claims that SOAP and SOAP-Muon deliver substantially faster convergence and higher final accuracy than Adam, with the largest gains under partial force supervision; Muon yields only partial improvements. The work positions optimizer choice as an overlooked but high-impact axis for MLIP training efficiency.

Significance. If the empirical gaps survive equivalent hyperparameter tuning and statistical controls, the result would establish optimizer architecture as a practical lever for label-efficient MLIP training, complementing architecture and dataset advances. The absence of any parameter-free derivations or machine-checked claims means significance rests entirely on the reproducibility and fairness of the reported benchmarks.

major comments (2)

[Methods/Experiments] The experimental protocol (Methods/Experiments section) supplies no quantitative details on run counts, error bars, statistical tests, hyper-parameter search budget, learning-rate schedules, or data splits. Without these, it is impossible to determine whether the reported superiority of SOAP/SOAP-Muon is statistically reliable or an artifact of unequal tuning effort between Adam and the new optimizers.
[Results and Abstract] The central claim that performance differences are driven by optimizer structure (rather than tuning disparity) is load-bearing for the abstract and results. The manuscript describes systematic comparison for the new methods but provides no corresponding protocol or search budget for Adam baselines, leaving open the possibility that observed gaps reflect optimization disparity rather than matrix-structured advantages.

minor comments (2)

[Methods] Clarify the precise implementation details of SOAP and Muon within the NequIP and Allegro codebases (e.g., how matrix operations interface with the force/energy loss terms).
[Appendix or Experiments] Add a table or figure summarizing the exact hyperparameter ranges searched for each optimizer to allow direct comparison of tuning effort.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments emphasizing reproducibility and fair comparisons. We will revise the manuscript to supply the requested experimental details and to make the hyperparameter protocol explicit and uniform across all optimizers.

read point-by-point responses

Referee: [Methods/Experiments] The experimental protocol (Methods/Experiments section) supplies no quantitative details on run counts, error bars, statistical tests, hyper-parameter search budget, learning-rate schedules, or data splits. Without these, it is impossible to determine whether the reported superiority of SOAP/SOAP-Muon is statistically reliable or an artifact of unequal tuning effort between Adam and the new optimizers.

Authors: We agree these quantitative details are necessary. The revised Methods section will report: 5 independent random seeds per configuration with mean and standard deviation; the hyperparameter search budget (identical grid over learning rate, weight decay, and schedule parameters for every optimizer); the learning-rate schedules employed; and the precise train/validation/test splits. These additions will allow direct assessment of statistical reliability and confirm equal tuning effort. revision: yes
Referee: [Results and Abstract] The central claim that performance differences are driven by optimizer structure (rather than tuning disparity) is load-bearing for the abstract and results. The manuscript describes systematic comparison for the new methods but provides no corresponding protocol or search budget for Adam baselines, leaving open the possibility that observed gaps reflect optimization disparity rather than matrix-structured advantages.

Authors: The experiments applied an identical systematic hyperparameter search procedure and budget to Adam as to Muon, SOAP, and SOAP-Muon. The revision will state this explicitly in both Methods and Results, including the search ranges and number of trials per optimizer. This documentation will substantiate that the reported gains arise from optimizer structure rather than unequal tuning. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark with no derivation chain or self-referential reductions

full rationale

The paper is a direct empirical comparison of optimizers (SOAP, Muon, SOAP-Muon vs. Adam) on NequIP and Allegro models under varying supervision regimes. No equations, predictions, or first-principles derivations are presented that could reduce by construction to fitted inputs, self-citations, or ansatzes. All claims rest on reported training runs and metrics; the study is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmarking study. No new free parameters, mathematical axioms, or postulated entities are introduced; the central claim rests on direct performance measurements of existing optimizers.

pith-pipeline@v0.9.1-grok · 5712 in / 1063 out tokens · 25927 ms · 2026-07-03T16:21:40.743200+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

69 extracted references · 25 canonical work pages · 1 internal anchor

[1]

2025 , eprint=

Training a Foundation Model for Materials on a Budget , author=. 2025 , eprint=

2025
[2]

2017 , eprint=

Adam: A Method for Stochastic Optimization , author=. 2017 , eprint=

2017
[3]

2024 , url =

Keller Jordan and Yuchen Jin and Vlado Boza and Jiacheng You and Franz Cesista and Laker Newhouse and Jeremy Bernstein , title =. 2024 , url =

2024
[4]

Proceedings of the 35th International Conference on Machine Learning , pages =

Shampoo: Preconditioned Stochastic Tensor Optimization , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , editor =

2018
[5]

2025 , eprint=

SOAP: Improving and Stabilizing Shampoo using Adam , author=. 2025 , eprint=

2025
[6]

2025 , url=

Improving SOAP using iterative whitening and Muon , author=. 2025 , url=

2025
[7]

2024 , eprint=

Old Optimizer, New Norm: An Anthology , author=. 2024 , eprint=

2024
[8]

2019 , eprint=

Decoupled Weight Decay Regularization , author=. 2019 , eprint=

2019
[9]

2025 , eprint=

Revealing the proton slingshot mechanism in solid acid electrolytes through machine learning molecular dynamics , author=. 2025 , eprint=

2025
[10]

2025 , eprint=

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective , author=. 2025 , eprint=

2025
[11]

Liu, Xiaoqing and Wang, Yangshuai and Zhao, Teng , journal =. Beyond. 2026 , month =. doi:10.1088/3050-287x/ae5078 , issn =

work page doi:10.1088/3050-287x/ae5078 2026
[12]

2013 , publisher=

Molecular electronic-structure theory , author=. 2013 , publisher=

2013
[13]

and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E

Batzner, Simon and Musaelian, Albert and Sun, Lixin and Geiger, Mario and Mailoa, Jonathan P. and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E. and Kozinsky, Boris , journal =. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials , url =. 2022 , month =. doi:10.1038/s41467-022-29939-5 , issn =

work page doi:10.1038/s41467-022-29939-5 2022
[14]

and Kornbluth, Mordechai and Kozinsky, Boris , journal =

Musaelian, Albert and Batzner, Simon and Johansson, Anders and Sun, Lixin and Owen, Cameron J. and Kornbluth, Mordechai and Kozinsky, Boris , journal =. Learning local equivariant representations for large-scale atomistic dynamics , url =. 2023 , month =. doi:10.1038/s41467-023-36329-y , issn =

work page doi:10.1038/s41467-023-36329-y 2023
[15]

Physical Review X , volume=

Graph atomic cluster expansion for semilocal interactions beyond equivariant message passing , author=. Physical Review X , volume=. 2024 , publisher=

2024
[16]

and Galvelis, Raimondas and Herr, John E

Eastman, Peter and Behara, Pavan Kumar and Dotson, David L. and Galvelis, Raimondas and Herr, John E. and Horton, Josh T. and Mao, Yuezhi and Chodera, John D. and Pritchard, Benjamin P. and Wang, Yuanqing and De Fabritiis, Gianni and Markland, Thomas E. , journal =. SPICE,. 2023 , month =. doi:10.1038/s41597-022-01882-6 , issn =

work page doi:10.1038/s41597-022-01882-6 2023
[17]

and Chodera, John D

Eastman, Peter and Pritchard, Benjamin P. and Chodera, John D. and Markland, Thomas E. , journal =. Nutmeg and. 2024 , month =. doi:10.1021/acs.jctc.4c00794 , issn =

work page doi:10.1021/acs.jctc.4c00794 2024
[18]

Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need

Messerly, Mitchell and Matin, Sakib and Allen, Alice E A and Nebgen, Benjamin and Barros, Kipton and Smith, Justin S and Lubbers, Nicholas and Messerly, Richard , journal =. Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need. 2025 , month =. doi:10.1088/2632-2153/ae040b , issn =

work page doi:10.1088/2632-2153/ae040b 2025
[19]

2026 , eprint=

A New Paradigm for Computational Chemistry , author=. 2026 , eprint=

2026
[20]

and Brown, Steven D

Blank, Thomas B. and Brown, Steven D. and Calhoun, August W. and Doren, Douglas J. , journal =. Neural network models of potential energy surfaces , url =. 1995 , month =. doi:10.1063/1.469597 , issn =

work page doi:10.1063/1.469597 1995
[21]

Generalized

Behler, J. Generalized. Physical Review Letters , doi =. 2007 , month =

2007
[22]

Gaussian

Bart. Gaussian. Physical Review Letters , doi =. 2010 , month =

2010
[23]

Physical Review B—Condensed Matter and Materials Physics , volume=

On representing chemical environments , author=. Physical Review B—Condensed Matter and Materials Physics , volume=. 2013 , publisher=

2013
[24]

Atomic cluster expansion for accurate and transferable interatomic potentials , author =. Phys. Rev. B , volume =. 2019 , month =. doi:10.1103/PhysRevB.99.014104 , url =

work page doi:10.1103/physrevb.99.014104 2019
[25]

Shapeev, Alexander V. , year=. Moment Tensor Potentials: A Class of Systematically Improvable Interatomic Potentials , volume=. Multiscale Modeling & Simulation , publisher=. doi:10.1137/15m1054183 , number=

work page doi:10.1137/15m1054183
[26]

Deep Potential: A General Representation of a Many-Body Potential Energy Surface , volume=

Han, Jiequn and Zhang, Linfeng and Car, Roberto and E, Weinan , year=. Deep Potential: A General Representation of a Many-Body Potential Energy Surface , volume=. Communications in Computational Physics , publisher=. doi:10.4208/cicp.oa-2017-0213 , number=

work page doi:10.4208/cicp.oa-2017-0213 2017
[27]

SchNet: a continuous-filter convolutional neural network for modeling quantum interactions , year =

Sch\". SchNet: a continuous-filter convolutional neural network for modeling quantum interactions , year =. Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =
[28]

MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields , url =

Batatia, Ilyes and Kovacs, David P and Simm, Gregor and Ortner, Christoph and Csanyi, Gabor , booktitle =. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields , url =
[29]

The Eleventh International Conference on Learning Representations , year=

Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs , author=. The Eleventh International Conference on Learning Representations , year=
[30]

and Isayev, Olexandr and Roitberg, Adrian E

Smith, Justin S. and Isayev, Olexandr and Roitberg, Adrian E. , journal =. ANI-1,. 2017 , month =. doi:10.1038/sdata.2017.193 , issn =

work page doi:10.1038/sdata.2017.193 2017
[31]

Lawrence and Ulissi, Zachary , journal =

Chanussot, Lowik and Das, Abhishek and Goyal, Siddharth and Lavril, Thibaut and Shuaibi, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary , journal =. Open. 2021 , mo...

work page doi:10.1021/acscatal.0c04525 2021
[32]

Nature Machine Intelligence , author=

CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling , DOI=. Nature Machine Intelligence , author=. 2023 , pages=

2023
[33]

and Dzamba, Misko and Gao, Meng and Rizvi, Ammar and Zitnick, C

Barroso-Luque, Luis and Shuaibi, Muhammed and Fu, Xiang and Wood, Brandon M. and Dzamba, Misko and Gao, Meng and Rizvi, Ammar and Zitnick, C. Lawrence and Ulissi, Zachary W. , doi =. Open. 2024 , publisher =

2024
[34]

and Shuaibi, Muhammed and Spotte-Smith, Evan Walter Clark and Taylor, Michael G

Levine, Daniel S. and Shuaibi, Muhammed and Spotte-Smith, Evan Walter Clark and Taylor, Michael G. and Hasyim, Muhammad R. and Michel, Kyle and Batatia, Ilyes and Cs. The. doi:10.48550/ARXIV.2505.08762 , year =

work page doi:10.48550/arxiv.2505.08762
[35]

, journal =

Huang, Cancan and Rubenstein, Brenda M. , journal =. Machine. 2022 , month =. doi:10.1021/acs.jpca.2c05904 , issn =

work page doi:10.1021/acs.jpca.2c05904 2022
[36]

The Journal of Chemical Physics , doi =

L. The Journal of Chemical Physics , doi =. 2023 , month =

2023
[37]

Engel and Jörg Behler and Christoph Dellago and Michele Ceriotti , title =

Bingqing Cheng and Edgar A. Engel and Jörg Behler and Christoph Dellago and Michele Ceriotti , title =. Proceedings of the National Academy of Sciences , volume =. 2019 , doi =. https://www.pnas.org/doi/pdf/10.1073/pnas.1815117116 , abstract =

work page doi:10.1073/pnas.1815117116 2019
[38]

and Behler, J

Cheng, Bingqing and Engel, Edgar A. and Behler, J. Ab initio thermodynamics of liquid and solid water , url =. Proceedings of the National Academy of Sciences , doi =. 2019 , month =

2019
[39]

and Bernstein, Noam and Cs

Deringer, Volker L. and Bernstein, Noam and Cs. Origins of structural and electronic transitions in disordered silicon , url =. Nature , doi =. 2021 , month =

2021
[40]

and Aykol, Muratahan and Cheon, Gowoon and Cubuk, Ekin Dogus , journal =

Merchant, Amil and Batzner, Simon and Schoenholz, Samuel S. and Aykol, Muratahan and Cheon, Gowoon and Cubuk, Ekin Dogus , journal =. Scaling deep learning for materials discovery , url =. 2023 , month =. doi:10.1038/s41586-023-06735-9 , issn =

work page doi:10.1038/s41586-023-06735-9 2023
[41]

Scaling the

Kozinsky, Boris and Musaelian, Albert and Johansson, Anders and Batzner, Simon , booktitle =. Scaling the. doi:10.1145/3581784.3627041 , year =

work page doi:10.1145/3581784.3627041
[42]

An efficient sparse kernel generator for O(3)-equivariant deep networks

Bharadwaj, Vivek and Glover, Austin and Bulu c , Ayd n and Demmel, James. An efficient sparse kernel generator for O(3)-equivariant deep networks. 2025 Proceedings of the Conference on Applied and Computational Discrete Algorithms ( ACDA )

2025
[43]

High-performance training and inference for deep equivariant interatomic potentials

Tan, Chuin Wei and Descoteaux, Marc L and Kotak, Mit and de Miranda Nascimento, Gabriel and Kavanagh, Se \'a n R and Zichi, Laura and Wang, Menghang and Saluja, Aadit and Hu, Yizhong R and Smidt, Tess and Johansson, Anders and Witt, William C and Kozinsky, Boris and Musaelian, Albert. High-performance training and inference for deep equivariant interatomi...
[44]

2024 , eprint=

OpenQDC: Open Quantum Data Commons , author=. 2024 , eprint=

2024
[45]

Amari, Shun-ichi , journal =. Natural. 1998 , month =. doi:10.1162/089976698300017746 , issn =

work page doi:10.1162/089976698300017746 1998
[46]

and Chisholm, Calum R

Haile, Sossina M. and Chisholm, Calum R. I. and Sasaki, Kenji and Boysen, Dane A. and Uda, Tetsuya. Solid acid proton conductors: from laboratory curiosities to fuel cell electrolytes. Faraday Discuss. 2007. doi:10.1039/B604311A

work page doi:10.1039/b604311a 2007
[47]

and Tyagi, Madhusudan and Segalman, Rachel A

Hoarfrost, Megan L. and Tyagi, Madhusudan and Segalman, Rachel A. and Reimer, Jeffrey A. , title =. The Journal of Physical Chemistry B , volume =. 2012 , doi =

2012
[48]

2008 , issn =

Proton dynamics of CsH2PO4 studied by quasi-elastic neutron scattering and PFG-NMR , journal =. 2008 , issn =. doi:https://doi.org/10.1016/j.ssi.2008.10.002 , url =

work page doi:10.1016/j.ssi.2008.10.002 2008
[49]

Skinner, L. B. and Benmore, C. J. and Neuefeind, J. C. and Parise, J. B. , title =. The Journal of Chemical Physics , volume =. 2014 , month =. doi:10.1063/1.4902412 , url =

work page doi:10.1063/1.4902412 2014
[50]

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Noam Shazeer and Mitchell Stern , title =. CoRR , volume =. 2018 , url =. 1804.04235 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[51]

Muon with

Lu, Binghang and Zhang, Jiahao and Lin, Guang , doi =. Muon with. 2026 , publisher =

2026
[52]

Advances in Neural Information Processing Systems , volume=

Smooth, exact rotational symmetrization for deep learning on point clouds , author=. Advances in Neural Information Processing Systems , volume=
[53]

Advances in Neural Information Processing Systems , volume=

The importance of being scalable: Improving the speed and accuracy of neural network interatomic potentials across chemical domains , author=. Advances in Neural Information Processing Systems , volume=
[54]

arXiv preprint arXiv:2603.06567 , year=

A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention , author=. arXiv preprint arXiv:2603.06567 , year=

work page arXiv
[55]

arXiv preprint arXiv:2506.23971 , year=

Uma: A family of universal models for atoms , author=. arXiv preprint arXiv:2506.23971 , year=

work page arXiv
[56]

Nature Communications , year=

Optimizing cross-domain transfer for universal machine learning interatomic potentials , author=. Nature Communications , year=
[57]

arXiv preprint arXiv:2210.07237 , year=

Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations , author=. arXiv preprint arXiv:2210.07237 , year=

work page arXiv
[58]

Nature communications , volume=

Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning , author=. Nature communications , volume=. 2019 , publisher=

2019
[59]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
[60]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

A Stable Whitening Optimizer for Efficient Neural Network Training , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
[61]

2025 , eprint=

DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning , author=. 2025 , eprint=

2025
[62]

Training Deep Learning Models with Norm-Constrained

Thomas Pethick and Wanyun Xie and Kimon Antonakopoulos and Zhenyu Zhu and Antonio Silveti-Falls and Volkan Cevher , booktitle=. Training Deep Learning Models with Norm-Constrained. 2025 , url=

2025
[63]

From Muon to Gluon: Bridging Theory and Practice of

Artem Riabinin and Egor Shulgin and Kaja Gruntkowska and Peter Richt. From Muon to Gluon: Bridging Theory and Practice of. 2026 , url=

2026
[64]

Lau, Tim Tsz-Kit and Qi Long and Weijie Su , year=
[65]

2026 , url=

Liming Liu and Zhenghao Xu and Zixuan Zhang and Hao Kang and Zichong Li and Chen Liang and Weizhu Chen and Tuo Zhao , booktitle=. 2026 , url=

2026
[66]

CoRR , volume=

Chongjie Si and Debing Zhang and Wei Shen , title=. CoRR , volume=. 2025 , month=

2025
[67]

2026 , eprint=

Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning , author=. 2026 , eprint=

2026
[68]

2026 , eprint=

The Newton-Muon Optimizer , author=. 2026 , eprint=

2026
[69]

Liu, Jingyuan and Su, Jianlin and Yao, Xingcheng and Jiang, Zhejun and Lai, Guokun and Du, Yulun and Qin, Yidao and Xu, Weixin and Lu, Enzhe and Yan, Junjie and Chen, Yanru and Zheng, Huabin and Liu, Yibo and Liu, Shaowei and Yin, Bohong and He, Weiran and Zhu, Han and Wang, Yuzhi and Wang, Jianzhou and Dong, Mengnan and Zhang, Zheng and Kang, Yongsheng a...

2025

[1] [1]

2025 , eprint=

Training a Foundation Model for Materials on a Budget , author=. 2025 , eprint=

2025

[2] [2]

2017 , eprint=

Adam: A Method for Stochastic Optimization , author=. 2017 , eprint=

2017

[3] [3]

2024 , url =

Keller Jordan and Yuchen Jin and Vlado Boza and Jiacheng You and Franz Cesista and Laker Newhouse and Jeremy Bernstein , title =. 2024 , url =

2024

[4] [4]

Proceedings of the 35th International Conference on Machine Learning , pages =

Shampoo: Preconditioned Stochastic Tensor Optimization , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , editor =

2018

[5] [5]

2025 , eprint=

SOAP: Improving and Stabilizing Shampoo using Adam , author=. 2025 , eprint=

2025

[6] [6]

2025 , url=

Improving SOAP using iterative whitening and Muon , author=. 2025 , url=

2025

[7] [7]

2024 , eprint=

Old Optimizer, New Norm: An Anthology , author=. 2024 , eprint=

2024

[8] [8]

2019 , eprint=

Decoupled Weight Decay Regularization , author=. 2019 , eprint=

2019

[9] [9]

2025 , eprint=

Revealing the proton slingshot mechanism in solid acid electrolytes through machine learning molecular dynamics , author=. 2025 , eprint=

2025

[10] [10]

2025 , eprint=

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective , author=. 2025 , eprint=

2025

[11] [11]

Liu, Xiaoqing and Wang, Yangshuai and Zhao, Teng , journal =. Beyond. 2026 , month =. doi:10.1088/3050-287x/ae5078 , issn =

work page doi:10.1088/3050-287x/ae5078 2026

[12] [12]

2013 , publisher=

Molecular electronic-structure theory , author=. 2013 , publisher=

2013

[13] [13]

and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E

Batzner, Simon and Musaelian, Albert and Sun, Lixin and Geiger, Mario and Mailoa, Jonathan P. and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E. and Kozinsky, Boris , journal =. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials , url =. 2022 , month =. doi:10.1038/s41467-022-29939-5 , issn =

work page doi:10.1038/s41467-022-29939-5 2022

[14] [14]

and Kornbluth, Mordechai and Kozinsky, Boris , journal =

Musaelian, Albert and Batzner, Simon and Johansson, Anders and Sun, Lixin and Owen, Cameron J. and Kornbluth, Mordechai and Kozinsky, Boris , journal =. Learning local equivariant representations for large-scale atomistic dynamics , url =. 2023 , month =. doi:10.1038/s41467-023-36329-y , issn =

work page doi:10.1038/s41467-023-36329-y 2023

[15] [15]

Physical Review X , volume=

Graph atomic cluster expansion for semilocal interactions beyond equivariant message passing , author=. Physical Review X , volume=. 2024 , publisher=

2024

[16] [16]

and Galvelis, Raimondas and Herr, John E

Eastman, Peter and Behara, Pavan Kumar and Dotson, David L. and Galvelis, Raimondas and Herr, John E. and Horton, Josh T. and Mao, Yuezhi and Chodera, John D. and Pritchard, Benjamin P. and Wang, Yuanqing and De Fabritiis, Gianni and Markland, Thomas E. , journal =. SPICE,. 2023 , month =. doi:10.1038/s41597-022-01882-6 , issn =

work page doi:10.1038/s41597-022-01882-6 2023

[17] [17]

and Chodera, John D

Eastman, Peter and Pritchard, Benjamin P. and Chodera, John D. and Markland, Thomas E. , journal =. Nutmeg and. 2024 , month =. doi:10.1021/acs.jctc.4c00794 , issn =

work page doi:10.1021/acs.jctc.4c00794 2024

[18] [18]

Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need

Messerly, Mitchell and Matin, Sakib and Allen, Alice E A and Nebgen, Benjamin and Barros, Kipton and Smith, Justin S and Lubbers, Nicholas and Messerly, Richard , journal =. Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need. 2025 , month =. doi:10.1088/2632-2153/ae040b , issn =

work page doi:10.1088/2632-2153/ae040b 2025

[19] [19]

2026 , eprint=

A New Paradigm for Computational Chemistry , author=. 2026 , eprint=

2026

[20] [20]

and Brown, Steven D

Blank, Thomas B. and Brown, Steven D. and Calhoun, August W. and Doren, Douglas J. , journal =. Neural network models of potential energy surfaces , url =. 1995 , month =. doi:10.1063/1.469597 , issn =

work page doi:10.1063/1.469597 1995

[21] [21]

Generalized

Behler, J. Generalized. Physical Review Letters , doi =. 2007 , month =

2007

[22] [22]

Gaussian

Bart. Gaussian. Physical Review Letters , doi =. 2010 , month =

2010

[23] [23]

Physical Review B—Condensed Matter and Materials Physics , volume=

On representing chemical environments , author=. Physical Review B—Condensed Matter and Materials Physics , volume=. 2013 , publisher=

2013

[24] [24]

Atomic cluster expansion for accurate and transferable interatomic potentials , author =. Phys. Rev. B , volume =. 2019 , month =. doi:10.1103/PhysRevB.99.014104 , url =

work page doi:10.1103/physrevb.99.014104 2019

[25] [25]

Shapeev, Alexander V. , year=. Moment Tensor Potentials: A Class of Systematically Improvable Interatomic Potentials , volume=. Multiscale Modeling & Simulation , publisher=. doi:10.1137/15m1054183 , number=

work page doi:10.1137/15m1054183

[26] [26]

Deep Potential: A General Representation of a Many-Body Potential Energy Surface , volume=

Han, Jiequn and Zhang, Linfeng and Car, Roberto and E, Weinan , year=. Deep Potential: A General Representation of a Many-Body Potential Energy Surface , volume=. Communications in Computational Physics , publisher=. doi:10.4208/cicp.oa-2017-0213 , number=

work page doi:10.4208/cicp.oa-2017-0213 2017

[27] [27]

SchNet: a continuous-filter convolutional neural network for modeling quantum interactions , year =

Sch\". SchNet: a continuous-filter convolutional neural network for modeling quantum interactions , year =. Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =

[28] [28]

MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields , url =

Batatia, Ilyes and Kovacs, David P and Simm, Gregor and Ortner, Christoph and Csanyi, Gabor , booktitle =. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields , url =

[29] [29]

The Eleventh International Conference on Learning Representations , year=

Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs , author=. The Eleventh International Conference on Learning Representations , year=

[30] [30]

and Isayev, Olexandr and Roitberg, Adrian E

Smith, Justin S. and Isayev, Olexandr and Roitberg, Adrian E. , journal =. ANI-1,. 2017 , month =. doi:10.1038/sdata.2017.193 , issn =

work page doi:10.1038/sdata.2017.193 2017

[31] [31]

Lawrence and Ulissi, Zachary , journal =

Chanussot, Lowik and Das, Abhishek and Goyal, Siddharth and Lavril, Thibaut and Shuaibi, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary , journal =. Open. 2021 , mo...

work page doi:10.1021/acscatal.0c04525 2021

[32] [32]

Nature Machine Intelligence , author=

CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling , DOI=. Nature Machine Intelligence , author=. 2023 , pages=

2023

[33] [33]

and Dzamba, Misko and Gao, Meng and Rizvi, Ammar and Zitnick, C

Barroso-Luque, Luis and Shuaibi, Muhammed and Fu, Xiang and Wood, Brandon M. and Dzamba, Misko and Gao, Meng and Rizvi, Ammar and Zitnick, C. Lawrence and Ulissi, Zachary W. , doi =. Open. 2024 , publisher =

2024

[34] [34]

and Shuaibi, Muhammed and Spotte-Smith, Evan Walter Clark and Taylor, Michael G

Levine, Daniel S. and Shuaibi, Muhammed and Spotte-Smith, Evan Walter Clark and Taylor, Michael G. and Hasyim, Muhammad R. and Michel, Kyle and Batatia, Ilyes and Cs. The. doi:10.48550/ARXIV.2505.08762 , year =

work page doi:10.48550/arxiv.2505.08762

[35] [35]

, journal =

Huang, Cancan and Rubenstein, Brenda M. , journal =. Machine. 2022 , month =. doi:10.1021/acs.jpca.2c05904 , issn =

work page doi:10.1021/acs.jpca.2c05904 2022

[36] [36]

The Journal of Chemical Physics , doi =

L. The Journal of Chemical Physics , doi =. 2023 , month =

2023

[37] [37]

Engel and Jörg Behler and Christoph Dellago and Michele Ceriotti , title =

Bingqing Cheng and Edgar A. Engel and Jörg Behler and Christoph Dellago and Michele Ceriotti , title =. Proceedings of the National Academy of Sciences , volume =. 2019 , doi =. https://www.pnas.org/doi/pdf/10.1073/pnas.1815117116 , abstract =

work page doi:10.1073/pnas.1815117116 2019

[38] [38]

and Behler, J

Cheng, Bingqing and Engel, Edgar A. and Behler, J. Ab initio thermodynamics of liquid and solid water , url =. Proceedings of the National Academy of Sciences , doi =. 2019 , month =

2019

[39] [39]

and Bernstein, Noam and Cs

Deringer, Volker L. and Bernstein, Noam and Cs. Origins of structural and electronic transitions in disordered silicon , url =. Nature , doi =. 2021 , month =

2021

[40] [40]

and Aykol, Muratahan and Cheon, Gowoon and Cubuk, Ekin Dogus , journal =

Merchant, Amil and Batzner, Simon and Schoenholz, Samuel S. and Aykol, Muratahan and Cheon, Gowoon and Cubuk, Ekin Dogus , journal =. Scaling deep learning for materials discovery , url =. 2023 , month =. doi:10.1038/s41586-023-06735-9 , issn =

work page doi:10.1038/s41586-023-06735-9 2023

[41] [41]

Scaling the

Kozinsky, Boris and Musaelian, Albert and Johansson, Anders and Batzner, Simon , booktitle =. Scaling the. doi:10.1145/3581784.3627041 , year =

work page doi:10.1145/3581784.3627041

[42] [42]

An efficient sparse kernel generator for O(3)-equivariant deep networks

Bharadwaj, Vivek and Glover, Austin and Bulu c , Ayd n and Demmel, James. An efficient sparse kernel generator for O(3)-equivariant deep networks. 2025 Proceedings of the Conference on Applied and Computational Discrete Algorithms ( ACDA )

2025

[43] [43]

High-performance training and inference for deep equivariant interatomic potentials

Tan, Chuin Wei and Descoteaux, Marc L and Kotak, Mit and de Miranda Nascimento, Gabriel and Kavanagh, Se \'a n R and Zichi, Laura and Wang, Menghang and Saluja, Aadit and Hu, Yizhong R and Smidt, Tess and Johansson, Anders and Witt, William C and Kozinsky, Boris and Musaelian, Albert. High-performance training and inference for deep equivariant interatomi...

[44] [44]

2024 , eprint=

OpenQDC: Open Quantum Data Commons , author=. 2024 , eprint=

2024

[45] [45]

Amari, Shun-ichi , journal =. Natural. 1998 , month =. doi:10.1162/089976698300017746 , issn =

work page doi:10.1162/089976698300017746 1998

[46] [46]

and Chisholm, Calum R

Haile, Sossina M. and Chisholm, Calum R. I. and Sasaki, Kenji and Boysen, Dane A. and Uda, Tetsuya. Solid acid proton conductors: from laboratory curiosities to fuel cell electrolytes. Faraday Discuss. 2007. doi:10.1039/B604311A

work page doi:10.1039/b604311a 2007

[47] [47]

and Tyagi, Madhusudan and Segalman, Rachel A

Hoarfrost, Megan L. and Tyagi, Madhusudan and Segalman, Rachel A. and Reimer, Jeffrey A. , title =. The Journal of Physical Chemistry B , volume =. 2012 , doi =

2012

[48] [48]

2008 , issn =

Proton dynamics of CsH2PO4 studied by quasi-elastic neutron scattering and PFG-NMR , journal =. 2008 , issn =. doi:https://doi.org/10.1016/j.ssi.2008.10.002 , url =

work page doi:10.1016/j.ssi.2008.10.002 2008

[49] [49]

Skinner, L. B. and Benmore, C. J. and Neuefeind, J. C. and Parise, J. B. , title =. The Journal of Chemical Physics , volume =. 2014 , month =. doi:10.1063/1.4902412 , url =

work page doi:10.1063/1.4902412 2014

[50] [50]

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Noam Shazeer and Mitchell Stern , title =. CoRR , volume =. 2018 , url =. 1804.04235 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018

[51] [51]

Muon with

Lu, Binghang and Zhang, Jiahao and Lin, Guang , doi =. Muon with. 2026 , publisher =

2026

[52] [52]

Advances in Neural Information Processing Systems , volume=

Smooth, exact rotational symmetrization for deep learning on point clouds , author=. Advances in Neural Information Processing Systems , volume=

[53] [53]

Advances in Neural Information Processing Systems , volume=

The importance of being scalable: Improving the speed and accuracy of neural network interatomic potentials across chemical domains , author=. Advances in Neural Information Processing Systems , volume=

[54] [54]

arXiv preprint arXiv:2603.06567 , year=

A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention , author=. arXiv preprint arXiv:2603.06567 , year=

work page arXiv

[55] [55]

arXiv preprint arXiv:2506.23971 , year=

Uma: A family of universal models for atoms , author=. arXiv preprint arXiv:2506.23971 , year=

work page arXiv

[56] [56]

Nature Communications , year=

Optimizing cross-domain transfer for universal machine learning interatomic potentials , author=. Nature Communications , year=

[57] [57]

arXiv preprint arXiv:2210.07237 , year=

Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations , author=. arXiv preprint arXiv:2210.07237 , year=

work page arXiv

[58] [58]

Nature communications , volume=

Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning , author=. Nature communications , volume=. 2019 , publisher=

2019

[59] [59]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

[60] [60]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

A Stable Whitening Optimizer for Efficient Neural Network Training , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

[61] [61]

2025 , eprint=

DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning , author=. 2025 , eprint=

2025

[62] [62]

Training Deep Learning Models with Norm-Constrained

Thomas Pethick and Wanyun Xie and Kimon Antonakopoulos and Zhenyu Zhu and Antonio Silveti-Falls and Volkan Cevher , booktitle=. Training Deep Learning Models with Norm-Constrained. 2025 , url=

2025

[63] [63]

From Muon to Gluon: Bridging Theory and Practice of

Artem Riabinin and Egor Shulgin and Kaja Gruntkowska and Peter Richt. From Muon to Gluon: Bridging Theory and Practice of. 2026 , url=

2026

[64] [64]

Lau, Tim Tsz-Kit and Qi Long and Weijie Su , year=

[65] [65]

2026 , url=

Liming Liu and Zhenghao Xu and Zixuan Zhang and Hao Kang and Zichong Li and Chen Liang and Weizhu Chen and Tuo Zhao , booktitle=. 2026 , url=

2026

[66] [66]

CoRR , volume=

Chongjie Si and Debing Zhang and Wei Shen , title=. CoRR , volume=. 2025 , month=

2025

[67] [67]

2026 , eprint=

Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning , author=. 2026 , eprint=

2026

[68] [68]

2026 , eprint=

The Newton-Muon Optimizer , author=. 2026 , eprint=

2026

[69] [69]

Liu, Jingyuan and Su, Jianlin and Yao, Xingcheng and Jiang, Zhejun and Lai, Guokun and Du, Yulun and Qin, Yidao and Xu, Weixin and Lu, Enzhe and Yan, Junjie and Chen, Yanru and Zheng, Huabin and Liu, Yibo and Liu, Shaowei and Yin, Bohong and He, Weiran and Zhu, Han and Wang, Yuzhi and Wang, Jianzhou and Dong, Mengnan and Zhang, Zheng and Kang, Yongsheng a...

2025