
arxiv: 2606.12704 · v1 · pith:NIWATT5Qnew · submitted 2026-06-10 · ⚛️ physics.chem-ph · cond-mat.mtrl-sci

Fine-tuning MLIP foundation models: strategies for accuracy and transferability

Pith reviewed 2026-06-27 07:36 UTC · model grok-4.3

classification ⚛️ physics.chem-ph cond-mat.mtrl-sci
keywords fine-tuning · machine-learned interatomic potentials · foundation models · transferability · robustness · LoRA · replay methods · equivariant models

The pith

Foundation model quality and hyperparameters matter more than fine-tuning method, but only multihead replay preserves out-of-distribution robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates seven fine-tuning strategies for machine-learned interatomic potential foundation models on five chemically diverse benchmarks. It shows that the quality of the starting model, the atomic reference energy (E0) setup, and hyperparameter choices affect results more than the adaptation method chosen. With those basics in place, most strategies reach strong accuracy on the target task and beat models trained from scratch. The key difference appears in deployment scope: naive full-parameter updates converge best for one narrow system, while multihead replay, with either original or pseudolabelled data, is the only tested method that keeps accuracy on the pretraining distribution and maintains short-range repulsion.

Core claim

Foundation model quality, correct E0 initialisation, and well-chosen hyperparameters are prerequisites whose impact routinely exceeds that of the fine-tuning strategy itself. Once these prerequisites are met, most strategies achieve strong target-task accuracy, consistently surpassing models trained from scratch. The practical distinction depends on deployment scope: naive fine-tuning offers the best convergence for single-system applications, while multihead replay -- with either original or pseudolabelled data -- is the only approach tested that consistently preserves out-of-distribution robustness, maintaining both pretraining-distribution accuracy for broader deployment and many-body short-range repulsion.

What carries the argument

Comparison of seven fine-tuning strategies (naive full-parameter updates, two layer-freezing variants, LoRA for equivariant layers, multihead replay, pseudolabelled replay, and replay plus LoRA) implemented in MACE and tested across benchmarks spanning aqueous NaCl, ice polymorphs, SN2 reactions, SPICE biomolecules, and lithium electrolytes.

If this is right

  • Naive fine-tuning gives fastest convergence when the goal is accuracy on one target system only.
  • Multihead replay maintains accuracy on the original pretraining distribution for models intended for wider use.
  • Pseudolabelled replay achieves similar robustness without needing the original pretraining corpus.
  • Proper model-aware E0 reestimation improves all fine-tuning workflows.
  • Equivariant LoRA enables parameter-efficient adaptation without breaking message-passing structure.
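As a rough illustration of the parameter-efficient adaptation mentioned in the last bullet, the sketch below applies the standard LoRA update, W + (alpha / r) · B A, to a single dense layer using NumPy. It covers only the scalar (invariant) case; how the paper's MACE implementation handles equivariant linear layers is not described in this summary, and all sizes and names here (`d_in`, `d_out`, `rank`, `alpha`, `lora_forward`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes and LoRA settings, chosen only for illustration.
d_in, d_out, rank, alpha = 8, 6, 2, 4.0

# Frozen pretrained weight (stand-in for one scalar linear layer).
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially reproduces the pretrained layer exactly.
A = rng.normal(scale=0.01, size=(rank, d_in))
B = np.zeros((d_out, rank))


def lora_forward(x, W, A, B, alpha, rank):
    """Apply the frozen weight plus the scaled low-rank update."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.normal(size=(5, d_in))
y_pretrained = x @ W.T
y_init = lora_forward(x, W, A, B, alpha, rank)

# Stand-in for a trained B: the effective weight change has rank <= `rank`.
B_trained = rng.normal(size=(d_out, rank))
delta_W = (alpha / rank) * B_trained @ A

n_full = W.size            # parameters updated by naive fine-tuning
n_lora = A.size + B.size   # parameters updated by LoRA
```

Only `A` and `B` would receive gradients during fine-tuning, which is where the parameter saving comes from (28 versus 48 values in this toy layer).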

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prerequisite emphasis may apply when fine-tuning other equivariant architectures.
  • Resources spent improving foundation models could yield larger gains than refining adaptation methods.
  • Extending the replay approach to longer-range interactions or periodic systems would test its generality.
  • The decoupling of replay data via pseudolabelling opens the possibility of using synthetic data for continual learning.
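The pseudolabelling idea in the last bullet can be sketched with a toy model: a frozen copy of the foundation model labels the replay inputs, and fine-tuning then minimises a target-head loss plus a replay-head loss against those labels. This is a minimal sketch under stated assumptions, not the MACE implementation. The two-stage toy model, the sharing of all weights except the readout head, and the names `predict`, `multihead_loss`, and `replay_weight` are all hypothetical, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)


def predict(w_shared, w_head, x):
    """Toy model: shared features followed by a per-head linear readout."""
    return np.tanh(x @ w_shared) @ w_head


# Frozen foundation model (hypothetical toy weights).
w_shared0 = rng.normal(size=(4, 3))
w_head_pt0 = rng.normal(size=3)

# Replay inputs need no original labels: the frozen model provides them.
x_replay = rng.normal(size=(20, 4))
y_pseudo = predict(w_shared0, w_head_pt0, x_replay)

# Target-task data with its own reference labels (synthetic here).
x_target = rng.normal(size=(10, 4))
y_target = rng.normal(size=10)


def multihead_loss(w_shared, w_head_target, w_head_pt, replay_weight=1.0):
    """Return (total loss, replay loss): target MSE plus weighted replay MSE."""
    loss_target = np.mean((predict(w_shared, w_head_target, x_target) - y_target) ** 2)
    loss_replay = np.mean((predict(w_shared, w_head_pt, x_replay) - y_pseudo) ** 2)
    return loss_target + replay_weight * loss_replay, loss_replay


# At initialisation the replay term is zero by construction.
_, replay_at_init = multihead_loss(w_shared0, w_head_pt0.copy(), w_head_pt0)

# If the shared weights drift, the replay term becomes positive and
# penalises the drift during fine-tuning.
_, replay_after_drift = multihead_loss(w_shared0 + 0.5, w_head_pt0.copy(), w_head_pt0)
```

Because the labels come from the model itself, the replay inputs can be drawn from any structure source, which is the decoupling from the pretraining corpus that the paper describes.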

Load-bearing premise

The five chemically diverse benchmarks sufficiently represent the range of chemical environments where fine-tuning robustness matters.

What would settle it

A new benchmark or deployment setting in which multihead replay loses out-of-distribution accuracy or short-range repulsion while naive fine-tuning succeeds would falsify the reported distinction between strategies.

Figures

Figures reproduced from arXiv: 2606.12704 by Alin M. Elena, Eszter Varga-Umbrich, Gábor Csányi, Ilyes Batatia, Noam Bernstein, Tamás Lajos Tompa.

Figure 1
Figure 1: Overview of fine-tuning strategies evaluated. Schematic representation of from-scratch model training and the fine-tuning methods benchmarked in this work: naive fine-tuning (full parameter updates), layer freezing, low-rank adaptation (LoRA), multihead replay, pseudolabelled replay, and replay combined with LoRA. Here we hypothesise that early fine-tuning failures were primarily the result of weaker found…
Figure 2
Figure 2: Overview of the five main benchmark systems used in this work. Representative structures and key dataset attributes for each benchmark, ordered by increasing departure from OMat24’s core domain of periodic inorganic bulk materials: the argyrodite Li6PS5Cl is an inorganic periodic solid most aligned with the pretraining data, while aqueous NaCl, ice polymorphs, SN2 reactions, and SPICE biomolecules progress…
Figure 3
Figure 3: Foundation model quality strongly shapes fine-tuning outcomes. (a) Lithium electrolytes: Multihead-pseudolabel fine-tuning of several MACE foundation models on the lithium solid electrolyte system. Violin plots show force and energy errors on two held-out evaluation sets, both unseen during fine-tuning: other argyrodite compositions (top row) and non-argyrodite structures (bottom row). (b) SN2 reaction: …
Figure 4
Figure 4: E0 initialisation can exceed method-level differences. (a) Lithium electrolyte (LPSC): reestimated versus averaged E0s for Naive and LoRA fine-tuning with the MACE-OMat-0-medium foundation. Models trained with averaged E0s exhibit substantially higher errors and wider distributions on both held-out test sets (other argyrodites and non-argyrodites, both unseen during fine-tuning). (b) SN2 reaction: NEB an…
Figure 5
Figure 5: Impact of E0 initialisation on ice cross-phase learning with MACE-OMat-0-medium. Force RMSE…
Figure 6
Figure 6: Narrow-task fine-tuning results. a, Cl–O RDFs at 10% training data, averaged over three random seeds. b, Cl–O RDFs at 100% training data. c, SN2 nudged elastic band energy profiles for models trained only on reactant/product configurations, excluding the transition state. Relative energy (eV) along the reaction coordinate is compared to the MP2 reference (dashed red). Naive and LoRA closely reproduce the r…
Figure 7
Figure 7: Near-transfer within related chemical families.
Figure 8
Figure 8: SPICE test set errors with 1 configuration per molecule training (19 687 configurations total). En…
Figure 9
Figure 9: SPICE torsion accuracy across training methods.
Figure 10
Figure 10: Foundation-model quality preservation after fine-tuning.
Figure 11
Figure 11: Random structure search summary for fine-tuned NaCl models. Heat maps report the fraction of…
Figure 12
Figure 12: Loss-weight schedule comparison on ice Ih. Comparison of constant target-task loss weights (λE = 10, λF = 10 throughout training, “baseline”) against two-stage schedules in which λF is dominant in stage one and λE is increased in stage two (starting at epoch 500), with and without a reduced learning rate in stage two. a,b,c,d, Naive fine-tuning of MACE-OMat-0-medium: final validation energy and force RMSE…
Figure 13
Figure 13: Multihead replay is sensitive to the learning rate (LPSC). Multihead replay fine-tuning of MACE-OMat-0-medium on the lithium electrolyte (LPSC) system with pseudolabelled MPTraj replay, swept over learning rates {10⁻⁴, 10⁻³, 10⁻²}. Training and validation loss curves for the target head (“Default”) and the replay head (“pt head”) as a function of epoch. At the higher learning rates the loss on the repla…
Figure 14
Figure 14: Multihead replay learning-rate sensitivity on the SN2 system. Multihead replay fine-tuning of MACE-OMat-0-medium on the SN2 reaction dataset, swept over learning rates {10⁻⁴, 10⁻³, 10⁻²} for two pseudolabelled replay-set choices: an OMat24 element-matched subsample and a combined OMat24 + 10 000-structure MPTraj subsample. Training and validation losses for the target head (“Default”) and replay head (“…
Figure 15
Figure 15: Effect of replay-data composition on multihead fine-tuning. Comparison of three replay-set constructions on the lithium electrolyte system: element-matched subsampling of the OMat24 pretraining corpus, a random 10 000-structure subsample drawn from MPTraj, and a combination of the two. In all three cases the replay labels are obtained by pseudolabelling the structures with the OMat24 foundation model. Tar…
Figure 16
Figure 16: Replay-strategy comparison on NaCl PES holes. Fraction of flagged RSS structures (with absolute change relative to the foundation baseline before fine-tuning shown in parentheses) at 0.1 GPa and 50 GPa for Scratch, Naive, and three multihead-pseudolabel replay-set choices (element-matched OMat24 subsample, random MPTraj subsample, combination of the two), all on 100% of the NaCl training data. All three r…
Figure 17
Figure 17: Foundation models versus fine-tuning methods on the lithium electrolyte system. Violin plots of energy MAE (left) and force MAE (right) on the two out-of-distribution evaluation sets used in the main text -- other argyrodite compositions (top row) and non-argyrodite structures (bottom row) -- for the MP0 and OMat foundation families. Within each foundation, the panels show the foundation baseline before fine-tu…
Figure 18
Figure 18: Effect of foundation-model capacity on ice cross-phase learning with naive fine-tuning on 100 con…
Figure 19
Figure 19: Single-phase versus combined ice fine-tuning across methods. Energy RMSE (left, eV/atom) and force RMSE (right, eV/Å) for ice Ih, II, VI, and VIII, comparing single-phase fine-tuning with a reference model trained on all four phases jointly. Results are shown for Naive, LoRA, and Pseudolabel fine-tuning of MACE-OMat-0-medium. Dashed vertical lines separate the single-phase evaluation blocks from the comb…
Figure 20
Figure 20: Full aqueous-NaCl RDF and EMD panel. a–c, Cl–O RDFs at 10%, 10%-10× epochs, and 100% training data. d–f, corresponding Na–O RDFs. g–i, EMD to the BPNN reference for each method at each training-data condition. Fine-tuning methods saturate near the reference already at 10% data; from-scratch training remains substantially worse even at 100%.
read the original abstract

Adapting machine-learned interatomic potential (MLIP) foundation models to specialised tasks through fine-tuning is an increasingly important practice, yet systematic guidance on when and how to fine-tune is currently limited. We evaluate seven fine-tuning strategies -- naive full-parameter updates, two layer-freezing variants, Low-Rank Adaptation (LoRA), multihead replay, pseudolabelled replay, and replay combined with LoRA -- across five chemically diverse benchmarks (aqueous NaCl, ice polymorphs, S$_\mathrm{N}$2 reactions, SPICE biomolecules, and lithium electrolytes), three generations of foundation models, and training sets spanning five orders of magnitude. To support this evaluation we implement three capabilities in the MACE codebase: LoRA adapted for equivariant message-passing architectures, including both scalar and equivariant linear layers; pseudolabelled replay, which decouples the replay data source from the original pretraining corpus; and model-aware atomic reference energy (E0) reestimation for fine-tuning workflows. We find that foundation model quality, correct E0 initialisation, and well-chosen hyperparameters are prerequisites whose impact routinely exceeds that of the fine-tuning strategy itself. Once these prerequisites are met, most strategies achieve strong target-task accuracy, consistently surpassing models trained from scratch. The practical distinction depends on deployment scope: naive fine-tuning offers the best convergence for single-system applications, while multihead replay -- with either original or pseudolabelled data -- is the only approach tested that consistently preserves out-of-distribution robustness, maintaining both pretraining-distribution accuracy for broader deployment and many-body short-range repulsion.
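To make the E0 reestimation step mentioned in the abstract more concrete, the sketch below fits per-element reference energies by linear least squares. The exact procedure used in the paper is not given in this summary. Reading "model-aware" as "subtract the foundation model's predicted interaction energy before fitting" is an assumption, as are the function name `reestimate_e0`, the two placeholder elements, and all numerical values, which are synthetic.

```python
import numpy as np

# Element counts per configuration (columns: hypothetical elements X and Y).
counts = np.array([
    [2, 1],
    [4, 2],
    [1, 3],
    [3, 3],
    [5, 0],
], dtype=float)

# Synthetic values used only to build a self-consistent toy dataset.
true_e0 = np.array([-1.5, -3.0])                                   # per-element E0
interaction_pred = np.array([-0.20, -0.45, -0.30, -0.50, -0.35])   # model output
e_reference = counts @ true_e0 + interaction_pred                  # target labels


def reestimate_e0(counts, e_reference, interaction_pred):
    """Least-squares E0s such that counts @ E0 + interaction matches the labels."""
    residual = e_reference - interaction_pred
    e0, *_ = np.linalg.lstsq(counts, residual, rcond=None)
    return e0


e0_fit = reestimate_e0(counts, e_reference, interaction_pred)
```

In this toy setting the fit recovers the E0s used to build the data. With real reference data the fit is only approximate, and it is underdetermined when the count matrix is rank-deficient, for example when every configuration has the same stoichiometry.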

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates seven fine-tuning strategies (naive full-parameter, layer-freezing variants, LoRA, multihead replay, pseudolabelled replay, and replay+LoRA) for MLIP foundation models across five chemically diverse benchmarks (aqueous NaCl, ice polymorphs, SN2 reactions, SPICE biomolecules, lithium electrolytes), three model generations, and training-set sizes spanning five orders of magnitude. It reports that foundation-model quality, correct E0 initialization, and hyperparameter choice are prerequisites whose impact exceeds that of strategy choice; once met, most strategies achieve strong target-task accuracy (surpassing from-scratch training), while only multihead replay (original or pseudolabelled) consistently preserves out-of-distribution robustness and many-body short-range repulsion.

Significance. If the central empirical findings hold after addressing the noted concerns, the work supplies practical, deployment-scope-dependent guidance for fine-tuning MLIPs and contributes three reusable capabilities (equivariant LoRA, pseudolabelled replay, model-aware E0 reestimation) to the open MACE codebase. The scale of the benchmarking (multiple models, benchmarks, and data regimes) strengthens the evidence base for when naive fine-tuning suffices versus when replay is required.

major comments (2)
  1. [Methods (LoRA adaptation)] Methods section describing the LoRA implementation for equivariant message-passing: the manuscript states that LoRA was adapted for both scalar and equivariant linear layers but provides no explicit verification (e.g., numerical tests of energy invariance or force covariance under SO(3) rotations, or weight-tying arguments) that the adaptation preserves the underlying equivariance. Because the central ranking of strategies includes LoRA variants, any symmetry-breaking artifact would confound the attribution of performance differences to the replay versus naive distinction.
  2. [Results (prerequisite impact)] Results section on prerequisite versus strategy impact: the claim that model quality, E0 initialization, and hyperparameters routinely exceed strategy effects is load-bearing for the practical recommendations, yet the manuscript does not report quantitative effect-size comparisons (e.g., variance partitioning or ablation tables) across the five-order-of-magnitude training-size range that would allow readers to assess the relative magnitudes directly.
minor comments (2)
  1. [Abstract] Abstract: the chemical formula notation S$_\mathrm{N}$2 may not render consistently; standard subscript formatting (S_N2) would improve clarity.
  2. [Benchmark description] The five benchmarks are chemically diverse, but the manuscript could briefly note any chemical environments (e.g., transition-metal catalysis or extended solids) that remain outside the tested scope to help readers gauge transferability limits.
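The numerical check requested in the first major comment (energy invariance and force covariance under SO(3) rotations) could take roughly the following shape. This is a sketch under stated assumptions: `energy_and_forces` is a toy pair potential standing in for a fine-tuned model's energy and force call, and the geometry is synthetic, so the code only demonstrates the structure of the test, not a result about any LoRA implementation.

```python
import numpy as np


def energy_and_forces(pos):
    """Toy pair potential E = sum_{i<j} (r^-12 - r^-6) with analytic forces."""
    diff = pos[:, None, :] - pos[None, :, :]        # r_i - r_j, shape (n, n, 3)
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)                     # exclude self-interaction
    energy = 0.5 * np.sum(r ** -12 - r ** -6)
    de_dr = -12.0 * r ** -13 + 6.0 * r ** -7        # pairwise dE/dr
    forces = -np.sum((de_dr / r)[:, :, None] * diff, axis=1)
    return energy, forces


def random_rotation(rng):
    """Draw a proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))                     # fix column signs
    if np.linalg.det(q) < 0:                        # reflect to get det = +1
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)

# Jittered 2x2x2 grid so that no two atoms are unphysically close.
grid = np.array([[i, j, k] for i in range(2) for j in range(2) for k in range(2)],
                dtype=float)
pos = 1.2 * grid + 0.05 * rng.normal(size=grid.shape)

R = random_rotation(rng)
e_ref, f_ref = energy_and_forces(pos)
e_rot, f_rot = energy_and_forces(pos @ R.T)         # rotate every position

energy_error = abs(e_rot - e_ref)                   # invariance: E(Rx) = E(x)
force_error = np.max(np.abs(f_rot - f_ref @ R.T))   # covariance: F(Rx) = R F(x)
```

For a real model, both errors should stay at the level of floating-point noise over many random rotations and configurations; a systematic deviation would indicate a symmetry-breaking artifact.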

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We address each major comment below, proposing revisions to strengthen the work where appropriate while maintaining the integrity of our empirical findings.

read point-by-point responses
  1. Referee: [Methods (LoRA adaptation)] Methods section describing the LoRA implementation for equivariant message-passing: the manuscript states that LoRA was adapted for both scalar and equivariant linear layers but provides no explicit verification (e.g., numerical tests of energy invariance or force covariance under SO(3) rotations, or weight-tying arguments) that the adaptation preserves the underlying equivariance. Because the central ranking of strategies includes LoRA variants, any symmetry-breaking artifact would confound the attribution of performance differences to the replay versus naive distinction.

    Authors: We agree that explicit verification of equivariance preservation would strengthen the manuscript and eliminate any potential concern about symmetry-breaking artifacts. Although the LoRA adaptation applies low-rank updates to the weights of both scalar and equivariant linear layers while preserving the tensorial structure of the message-passing operations (thereby maintaining equivariance by construction), we will add numerical tests in the revised Methods section. These will demonstrate that energies remain invariant and forces transform covariantly under random SO(3) rotations for models employing the equivariant LoRA implementation. This addition will directly address the referee's concern without altering our reported results. revision: yes

  2. Referee: [Results (prerequisite impact)] Results section on prerequisite versus strategy impact: the claim that model quality, E0 initialization, and hyperparameters routinely exceed strategy effects is load-bearing for the practical recommendations, yet the manuscript does not report quantitative effect-size comparisons (e.g., variance partitioning or ablation tables) across the five-order-of-magnitude training-size range that would allow readers to assess the relative magnitudes directly.

    Authors: We acknowledge that while our conclusions are based on consistent patterns observed across all five benchmarks, three model generations, and the full range of training-set sizes, we did not include formal quantitative effect-size comparisons such as variance partitioning. To enable readers to directly assess relative magnitudes, we will add an ablation table (or supplementary figure) in the revised Results section that reports mean absolute errors for variations in foundation-model quality, E0 initialization, and hyperparameters versus those arising from fine-tuning strategy choice, stratified by training-set size. This will provide the requested quantitative support for our claim that prerequisite factors routinely dominate strategy effects. revision: yes

Circularity Check

0 steps flagged

Empirical benchmarking study with no circular derivations or self-referential predictions

full rationale

The paper is a comparative empirical study evaluating seven fine-tuning strategies on five benchmarks using implemented extensions to the MACE codebase. No equations, predictions, or first-principles derivations are presented that reduce to fitted parameters or prior results by construction. Central claims rest on direct performance comparisons (target accuracy and OOD robustness) rather than any self-definitional, fitted-input, or self-citation load-bearing steps. Self-citations to prior MACE work are present but not invoked to justify uniqueness theorems or ansatzes that would create circularity; the evaluation is externally falsifiable via the reported benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review limits visibility into parameters; no explicit free parameters named, but hyperparameters are flagged as critical. No invented entities. Axioms are standard assumptions in MLIP literature such as equivariance of message passing.

free parameters (1)
  • hyperparameters for fine-tuning
    Paper states well-chosen hyperparameters are prerequisites whose impact exceeds strategy choice; specific values not listed in abstract.
axioms (1)
  • domain assumption: Equivariant message-passing architectures require adapted LoRA for both scalar and equivariant layers
    Invoked when describing the LoRA implementation for MACE models.

pith-pipeline@v0.9.1-grok · 5856 in / 1249 out tokens · 19750 ms · 2026-06-27T07:36:57.769981+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Universal Interatomic Potentials as Configuration-Space Generators for One-Shot and Iterative Fine-Tuning of Ab Initio-Accurate Material-Specific Models

    cond-mat.mtrl-sci 2026-06 unverdicted novelty 5.0

    Universal MLIPs serve as configuration generators whose DFT-relabeled subsamples enable one-shot or iterative training of material-specific MLIPs that recover accurate reactive energy profiles with 600-2000 DFT calculations.
