Fine-tuning MLIP foundation models: strategies for accuracy and transferability

Alin M. Elena; Eszter Varga-Umbrich; Gábor Csányi; Ilyes Batatia; Noam Bernstein; Tamás Lajos Tompa

arxiv: 2606.12704 · v1 · pith:NIWATT5Qnew · submitted 2026-06-10 · ⚛️ physics.chem-ph · cond-mat.mtrl-sci



keywords: fine-tuning, replay, foundation, models, accuracy, lora, pseudolabelled, strategies


Adapting machine-learned interatomic potential (MLIP) foundation models to specialised tasks through fine-tuning is an increasingly important practice, yet systematic guidance on when and how to fine-tune is currently limited. We evaluate seven fine-tuning strategies -- naive full-parameter updates, two layer-freezing variants, Low-Rank Adaptation (LoRA), multihead replay, pseudolabelled replay, and replay combined with LoRA -- across five chemically diverse benchmarks (aqueous NaCl, ice polymorphs, S$_\mathrm{N}$2 reactions, SPICE biomolecules, and lithium electrolytes), three generations of foundation models, and training sets spanning five orders of magnitude. To support this evaluation we implement three capabilities in the MACE codebase: LoRA adapted for equivariant message-passing architectures, including both scalar and equivariant linear layers; pseudolabelled replay, which decouples the replay data source from the original pretraining corpus; and model-aware atomic reference energy (E0) reestimation for fine-tuning workflows. We find that foundation model quality, correct E0 initialisation, and well-chosen hyperparameters are prerequisites whose impact routinely exceeds that of the fine-tuning strategy itself. Once these prerequisites are met, most strategies achieve strong target-task accuracy, consistently surpassing models trained from scratch. The practical distinction depends on deployment scope: naive fine-tuning offers the best convergence for single-system applications, while multihead replay -- with either original or pseudolabelled data -- is the only approach tested that consistently preserves out-of-distribution robustness, maintaining both pretraining-distribution accuracy for broader deployment and many-body short-range repulsion.
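To make the LoRA strategy named in the abstract concrete, the following is a minimal sketch of the standard low-rank update for a single scalar linear layer, assuming the usual formulation y = W x + (alpha / r) B A x with W frozen and B initialised to zero. It is not the MACE implementation and does not cover the equivariant linear layers the abstract mentions; the class name `LoRALinear`, the helper `matvec`, and the parameters `rank`, `alpha`, and `seed` are hypothetical names chosen for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank correction (alpha / r) * B @ A.

    Illustrative only: a plain scalar linear layer, not an equivariant one.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        n_out, n_in = len(weight), len(weight[0])
        self.weight = weight          # pretrained weights, kept frozen
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts as small random values, B starts at zero, so the adapted
        # layer reproduces the pretrained layer before any fine-tuning step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(n_out)]

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def n_trainable(self):
        """Number of parameters in A and B (the only ones that would be updated)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 4x4 layer adapted with rank 1 trains 8 parameters instead of 16.
W = [[float(i == j) for j in range(4)] for i in range(4)]
layer = LoRALinear(W, rank=1)
x = [0.5, -1.0, 2.0, 0.0]
print(layer(x) == matvec(W, x), layer.n_trainable())
```

Initialising B to zero is the design choice that matters here: fine-tuning starts exactly at the foundation model's predictions, and only the small factors A and B move away from it.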


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Universal Interatomic Potentials as Configuration-Space Generators for One-Shot and Iterative Fine-Tuning of Ab Initio-Accurate Material-Specific Models
cond-mat.mtrl-sci 2026-06 unverdicted novelty 5.0

Universal MLIPs serve as configuration generators whose DFT-relabeled subsamples enable one-shot or iterative training of material-specific MLIPs that recover accurate reactive energy profiles with 600-2000 DFT calculations.