Target-Distribution-Guided Cross-Functional Fine-Tuning of Machine-Learning Interatomic Potentials

Bo Thomsen; Motoyuki Shiga; Yuki Nagai

arxiv: 2605.25032 · v1 · pith:LSYIOFSTnew · submitted 2026-05-24 · ❄️ cond-mat.mtrl-sci · physics.comp-ph

Pith reviewed 2026-06-29 23:59 UTC · model grok-4.3

keywords: machine-learning interatomic potentials, cross-functional fine-tuning, hybrid Monte Carlo, density functional theory, distribution mismatch, TiO2, MACE potential, self-learning

The pith

Machine-learning interatomic potentials fine-tuned on samples drawn from the target functional's equilibrium distribution match structural and thermodynamic properties more accurately than those trained on relabeled off-target configurations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that cross-functional fine-tuning of MLIPs fails when training data come from the wrong equilibrium distribution, even if the labels are accurate. Standard relabeling draws configurations under one exchange-correlation functional and assigns energies from another, but the statistical weights shift. The authors instead run self-learning hybrid Monte Carlo in which an ML potential proposes moves and the target functional decides acceptance, thereby generating data from the correct target distribution. On rutile TiO2 this produces adapted MACE-MP-0 models that better reproduce nearest-neighbor Ti-O distributions, radial distribution functions, and NPT cell metrics for PBE, r2SCAN, and HSE06 targets than either the base model or off-target relabeling controls. The method is especially useful for expensive functionals such as HSE06 whose direct molecular dynamics remains prohibitive.
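
The shift in statistical weights can be written down directly. The following is a textbook canonical-ensemble sketch, not notation taken from the paper: $E_A$ and $E_B$ denote the potential energies under the sampling and target functionals, $\beta = 1/k_B T$, and $Z_A$, $Z_B$ are the corresponding configurational partition functions. All of these symbols are assumptions introduced here for illustration.

```latex
\begin{align}
p_A(x) &= \frac{e^{-\beta E_A(x)}}{Z_A}, \qquad
p_B(x) = \frac{e^{-\beta E_B(x)}}{Z_B}, \\
\frac{p_B(x)}{p_A(x)} &= \frac{Z_A}{Z_B}\, e^{-\beta\,[E_B(x) - E_A(x)]}, \\
\langle O \rangle_B &= \frac{\big\langle O\, e^{-\beta\,(E_B - E_A)} \big\rangle_A}
                            {\big\langle e^{-\beta\,(E_B - E_A)} \big\rangle_A}.
\end{align}
```

Relabeling supplies $E_B(x_i)$ for each configuration but leaves $x_i \sim p_A$, so a loss averaged uniformly over the relabeled set weights configurations by $p_A$ rather than $p_B$. Reweighting by the exponential factor is formally exact, but the factor depends on an extensive energy difference and typically becomes dominated by a few configurations as the system grows, which is one reason to sample the target distribution directly.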

Core claim

Using self-learning hybrid Monte Carlo with machine-learning proposals and target-functional density-functional-theory acceptance, the workflow generates training configurations drawn from the equilibrium distribution of the target exchange-correlation functional. Fine-tuning the MACE-MP-0 foundation potential on these samples for PBE, r2SCAN, and HSE06 yields adapted potentials that reproduce the target functionals' nearest-neighbor Ti-O distributions, radial distribution functions, and NPT cell metrics more accurately than the original foundation model or potentials fine-tuned on relabeled data from a mismatched functional.

What carries the argument

Self-learning hybrid Monte Carlo (SLHMC), in which trial configurations are proposed by a machine-learning potential and accepted or rejected using target-functional density-functional-theory energies.
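
The propose-with-surrogate, accept-with-target idea can be illustrated on a one-dimensional toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: `target_energy` stands in for the expensive target-functional energy, `surrogate_force` stands in for the ML potential (with a deliberately wrong stiffness), and all function names and parameter values are hypothetical.

```python
import numpy as np

def target_energy(x):
    # Hypothetical stand-in for the target-functional energy: harmonic well, k = 1.0.
    return 0.5 * 1.0 * x**2

def surrogate_force(x):
    # Hypothetical stand-in for the ML potential's force, mismatched on purpose (k = 1.3).
    return -1.3 * x

def slhmc_step(x, rng, beta=1.0, dt=0.2, n_leapfrog=10, mass=1.0):
    """One hybrid Monte Carlo step: surrogate dynamics, target-energy acceptance."""
    p = rng.normal(scale=np.sqrt(mass / beta))           # fresh thermal momentum
    h_old = target_energy(x) + 0.5 * p**2 / mass         # target Hamiltonian before the move
    x_new, p_new = x, p
    p_new += 0.5 * dt * surrogate_force(x_new)           # leapfrog: initial half kick
    for i in range(n_leapfrog):
        x_new += dt * p_new / mass                       # drift
        kick = dt if i < n_leapfrog - 1 else 0.5 * dt    # final kick is a half step
        p_new += kick * surrogate_force(x_new)
    h_new = target_energy(x_new) + 0.5 * p_new**2 / mass # target Hamiltonian after the move
    # Metropolis test uses only target energies; the surrogate never enters here.
    if rng.random() < np.exp(min(0.0, -beta * (h_new - h_old))):
        return x_new, True
    return x, False

def run_chain(n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    x, n_accepted = 0.0, 0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x, accepted = slhmc_step(x, rng)
        samples[i] = x
        n_accepted += accepted
    return samples, n_accepted / n_steps

if __name__ == "__main__":
    samples, rate = run_chain()
    print(f"acceptance rate: {rate:.2f}, sample variance: {samples.var():.3f}")
```

In this toy setting the sampled variance should approach the target value 1/(beta * k) = 1.0 rather than the surrogate value 1/1.3, because the surrogate only shapes the proposals while the acceptance test is anchored to the target energy. A poorer surrogate would lower the acceptance rate without changing the distribution being sampled.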

If this is right

The adapted potentials reproduce target-anchored nearest-neighbor Ti-O distributions, radial distribution functions, and NPT cell metrics more accurately than foundation-model or off-target relabeling baselines.
HSE06-guided fine-tuning enables access to structural and thermodynamic properties that are computationally inaccessible by direct hybrid-functional molecular dynamics.
Target-distribution coverage is required for effective cross-functional MLIP transfer; accurate target-level labels alone are insufficient when the configurational distribution is mismatched.
The workflow applies to any pair of functionals where direct sampling with the higher-fidelity functional is expensive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same distribution-mismatch issue is likely to appear when transferring ML models between any two Hamiltonians whose equilibrium ensembles differ, not only between DFT functionals.
Because the method re-uses an inexpensive ML proposal distribution, it could be combined with active-learning loops to keep the training set compact while still covering the target ensemble.
Extension to properties beyond structure, such as vibrational spectra or defect formation energies, would test whether the distribution correction propagates to quantities not directly constrained during fine-tuning.

Load-bearing premise

Self-learning hybrid Monte Carlo with machine-learning proposals and target-functional acceptance produces unbiased samples from the target equilibrium distribution.

What would settle it

Running long, converged molecular dynamics directly with the target functional and finding that the structural distributions or cell metrics differ from those obtained with the SLHMC-adapted potential would falsify the claim that the adapted potential faithfully reproduces target-level behavior.

Original abstract

Cross-functional fine-tuning of machine-learning interatomic potentials (MLIPs) is often treated as a relabeling problem, where configurations generated at one density-functional level are relabeled using a higher-fidelity target functional. However, the resulting training data may be drawn from the wrong equilibrium distribution, because the statistical weights of configurations change across exchange-correlation functionals. Here we address this distribution mismatch using a target-distribution-guided workflow based on self-learning hybrid Monte Carlo (SLHMC), in which trial configurations are proposed by a machine-learning potential and accepted or rejected using target-functional density-functional-theory energies. Using rutile TiO2 as a test system, we fine-tune the MACE-MP-0 foundation potential toward PBE, r2SCAN, and HSE06 target functionals. The resulting adapted potentials reproduce target-anchored nearest-neighbor Ti-O distributions, radial distribution functions, and the NPT cell metrics examined here more accurately than the foundation-model and off-target relabeling controls considered in this work. In particular, HSE06-guided fine-tuning improves structural and thermodynamic properties that are difficult to access with direct hybrid-functional molecular dynamics because of the computational cost of exact exchange. These results indicate that target-distribution coverage is an essential component of cross-functional MLIP transfer, and that accurate target-level labels alone may be insufficient when the configurational distribution is mismatched.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The SLHMC workflow for matching target distributions in MLIP fine-tuning is a useful practical step beyond simple relabeling, but the sampling bias risk from an updating proposal needs explicit checks in the methods.

The core contribution is a workflow that runs self-learning hybrid Monte Carlo with an ML proposal but accepts/rejects using the target functional's DFT energies, then fine-tunes on the resulting configurations. This directly targets the distribution shift that occurs when you just relabel samples generated at a cheaper functional. On rutile TiO2 they show the adapted potentials recover target Ti-O nearest-neighbor distributions, RDFs, and NPT cell metrics better than the MACE-MP-0 baseline or off-target relabeling controls, and the HSE06 case is the clearest win because direct hybrid MD is too expensive for long runs.

That is genuinely new relative to standard relabeling papers and addresses a limitation people actually run into when moving between PBE, r2SCAN, and hybrids. The results on structural and thermodynamic properties line up with the claim that label accuracy alone is not enough if the configurational weights are wrong.

The main soft spot is the one flagged in the stress-test note. If the ML proposal keeps getting retrained inside the SLHMC loop, the chain is no longer time-homogeneous and the usual detailed-balance guarantee for sampling the target measure does not hold. The abstract gives no indication they freeze the proposal for long production segments after each update, so it is not clear the generated ensemble is unbiased. A single test system also limits how far the result generalizes, and the lack of reported error bars or full method controls in the abstract makes the quantitative improvement harder to judge without the full text.

This is for groups that already build or fine-tune MLIPs for materials and need thermodynamic consistency across functionals. It is worth sending to peer review because the problem is real, the proposed fix is distinct from prior work, and the TiO2 results are at least directionally supportive even if the sampling details require verification.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a target-distribution-guided workflow for cross-functional fine-tuning of MLIPs using self-learning hybrid Monte Carlo (SLHMC), where ML proposals are accepted/rejected via target DFT energies. On rutile TiO2, fine-tuning the MACE-MP-0 foundation model to PBE, r²SCAN, and HSE06 targets yields adapted potentials that better match target-anchored Ti-O nearest-neighbor distributions, RDFs, and NPT cell metrics than the foundation model or off-target relabeling controls. The approach is positioned as enabling accurate adaptation to expensive functionals like HSE06 where direct MD is prohibitive.

Significance. If the SLHMC trajectories are confirmed to be unbiased samples from the target equilibrium distribution, the work provides a practical route to distribution-matched training data for cross-functional MLIP transfer, which could improve structural and thermodynamic predictions for systems where hybrid-functional sampling is infeasible. The explicit comparison to relabeling controls highlights the importance of configurational distribution matching beyond label accuracy alone.

major comments (2)

[Abstract and SLHMC workflow description] The central claim that the adapted potentials reproduce target-anchored distributions more accurately rests on the assumption that SLHMC produces unbiased samples from the target functional's equilibrium measure. The workflow description does not specify whether the ML proposal kernel is held fixed during production sampling runs after each retraining step or is updated continuously; without this, the chain is non-stationary and the standard detailed-balance argument for hybrid Monte Carlo no longer guarantees invariance under the target distribution (see abstract and the SLHMC workflow paragraph).
[Results section on TiO2 distributions and cell metrics] No independent validation (e.g., comparison of SLHMC-generated statistics against direct long MD at the target level for PBE or r²SCAN, or convergence diagnostics for the HSE06 case) is reported to confirm that the generated ensembles are free of bias from the evolving proposal. This is load-bearing for the distributional claims in the results.
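
The invariance argument behind the first major comment can be stated compactly. The following is a sketch in notation introduced here, not taken from the manuscript: $E_T$ is the target-functional energy, $E_\theta$ the ML potential with parameters $\theta$, $\Phi_\theta$ the leapfrog map driven by forces from $E_\theta$, and $K_\theta$ the resulting transition kernel on configurations.

```latex
\begin{align}
H_T(x,p) &= \sum_i \frac{|p_i|^2}{2 m_i} + E_T(x), \\
(x',p') &= \Phi_\theta(x,p), \\
P_{\mathrm{acc}} &= \min\!\left\{ 1,\; e^{-\beta\,[H_T(x',p') - H_T(x,p)]} \right\}, \\
\int \pi_T(x)\, K_\theta(x \to y)\, dx &= \pi_T(y), \qquad \pi_T(x) \propto e^{-\beta E_T(x)} .
\end{align}
```

For any fixed $\theta$, the leapfrog map is time-reversible and volume-preserving regardless of which potential supplies the forces, so a Metropolis test on $H_T$ leaves $\pi_T$ invariant. If $\theta$ is instead updated from the chain's own history, the process is no longer a time-homogeneous Markov chain, and the kernel-by-kernel invariance above does not by itself guarantee that the long-run samples follow $\pi_T$.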

minor comments (2)

[Abstract] The abstract states improved reproduction of properties but provides no quantitative error bars, standard deviations, or statistical significance measures on the reported improvements in distributions or cell metrics.
[Methods] Notation for the target functionals (PBE, r²SCAN, HSE06) and the foundation model (MACE-MP-0) is clear, but the precise definition of 'off-target relabeling controls' and how the training sets differ in distribution should be stated explicitly in the methods.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. The two major comments both concern the statistical validity of the SLHMC-generated ensembles; we address them point-by-point below. Where the manuscript description was incomplete we have revised the text; where additional validation data can be supplied we have done so.

Referee: [Abstract and SLHMC workflow description] The central claim that the adapted potentials reproduce target-anchored distributions more accurately rests on the assumption that SLHMC produces unbiased samples from the target functional's equilibrium measure. The workflow description does not specify whether the ML proposal kernel is held fixed during production sampling runs after each retraining step or is updated continuously; without this, the chain is non-stationary and the standard detailed-balance argument for hybrid Monte Carlo no longer guarantees invariance under the target distribution (see abstract and the SLHMC workflow paragraph).

Authors: We agree that the original workflow paragraph was insufficiently precise on this point. In the revised manuscript we now state explicitly that, after each retraining iteration, the ML proposal kernel is frozen for the subsequent production SLHMC run used to generate the training configurations. Only the acceptance step uses the target-functional energy; the proposal distribution remains fixed during that run, restoring stationarity and the standard detailed-balance guarantee with respect to the target measure. We have added a short paragraph and a schematic in the Methods section to make this protocol unambiguous.

revision: yes

Referee: [Results section on TiO2 distributions and cell metrics] No independent validation (e.g., comparison of SLHMC-generated statistics against direct long MD at the target level for PBE or r²SCAN, or convergence diagnostics for the HSE06 case) is reported to confirm that the generated ensembles are free of bias from the evolving proposal. This is load-bearing for the distributional claims in the results.

Authors: We acknowledge that an explicit comparison of SLHMC statistics to direct long MD at the target level would strengthen the manuscript. For PBE and r²SCAN we have now performed such reference simulations (10 ns NPT trajectories) and added a new supplementary figure showing that the SLHMC-generated Ti–O nearest-neighbor distributions, RDFs, and cell-parameter histograms agree with the direct MD within statistical error. For HSE06, direct MD remains prohibitive; we therefore report only the internal convergence diagnostics (acceptance rate, integrated autocorrelation time of the energy, and stability of the running averages of the structural observables) that were already computed but not previously shown. These diagnostics are now included in the revised Results section and SI. We note that the improvement over the relabeling controls remains statistically significant even under these stricter checks.

revision: yes
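
The convergence diagnostics named in this response are standard Monte Carlo quantities. As a minimal sketch of how the integrated autocorrelation time might be estimated from an energy time series, the example below uses an FFT-based autocorrelation function with Sokal-style automatic windowing. The function names, the window constant `c`, and the synthetic AR(1) test series are assumptions made for illustration and say nothing about how the authors computed their diagnostics.

```python
import numpy as np

def integrated_autocorr_time(series, c=5.0):
    """Estimate tau_int = 1 + 2 * sum_{t>=1} rho(t) with an automatic window."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = x.size
    # Autocorrelation via FFT; zero-padding to 2n avoids circular wrap-around.
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conjugate(f), n=2 * n)[:n]
    acf /= acf[0]                                  # normalise so rho(0) = 1
    taus = 2.0 * np.cumsum(acf) - 1.0              # tau estimate for each window W
    # Windowing heuristic: smallest W such that W >= c * tau(W).
    ok = np.arange(n) >= c * taus
    w = int(np.argmax(ok)) if ok.any() else n - 1
    return float(taus[w])

def acceptance_rate(accepted_flags):
    """Fraction of accepted Monte Carlo moves."""
    return float(np.mean(accepted_flags))

def ar1_series(n, phi, seed=0):
    """Synthetic correlated series (AR(1)); exact tau_int = (1 + phi) / (1 - phi)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + noise[i]
    return x

if __name__ == "__main__":
    tau = integrated_autocorr_time(ar1_series(200_000, 0.9, seed=2))
    print(f"estimated tau_int: {tau:.1f} (exact value for phi = 0.9 is 19)")
```

The number of effectively independent samples in a run of length N is roughly N divided by the integrated autocorrelation time, so this quantity sets the statistical error on any running average reported alongside it.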

Circularity Check

0 steps flagged

No circularity: sampling uses independent target DFT acceptance; reproduction claims are externally validated

The paper's core workflow proposes configurations via MLIP but accepts/rejects using target-functional DFT energies, so generated ensembles are not forced to match the MLIP by construction. The claim that fine-tuned potentials reproduce target distributions and metrics more accurately than controls is a comparison against independent benchmarks (PBE/r2SCAN/HSE06-anchored RDFs, cell metrics), not a renaming or refit of the input data. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the abstract or described method. The derivation chain remains self-contained against the external target functionals.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that SLHMC correctly samples the target distribution; no free parameters or invented entities are identifiable from the abstract.

axioms (1)

Domain assumption: Self-learning hybrid Monte Carlo with MLIP proposals and target DFT acceptance produces samples distributed according to the target functional equilibrium.
This is the load-bearing premise that allows the generated data to be used for fine-tuning without distribution mismatch.

pith-pipeline@v0.9.1-grok · 5790 in / 1211 out tokens · 25206 ms · 2026-06-29T23:59:46.725839+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references

[1] Behler, J., Parrinello, M.: Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98(14), 146401 (2007)

[2] Batzner, S., Musaelian, A., Sun, L., Geiger, M., Mailoa, J.P., Kornbluth, M., Molinari, N., Smidt, T.E., Kozinsky, B.: E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13(1), 2453 (2022)

[3] Batatia, I., Kovacs, D.P., Simm, G., Ortner, C., Csanyi, G.: MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems, vol. 35, pp. 11423–11436. Curran Associates, Inc., Red Hook, NY, USA (2022)

[4] Batatia, I., Benner, P., Chiang, Y., Elena, A.M., Kovács, D.P., Riebesell, J., Advincula, X.R., Asta, M., Avaylon, M., Baldwin, W.J., Berger, F., Bernstein, N., Bhowmik, A., Bigi, F., Blau, S.M., Cărare, V., Ceriotti, M., Chong, S., Darby, J.P., De, S., Della Pia, F., Deringer, V.L., Elijošius, R., El-Machachi, Z., Fako, E., Falcioni, F., Ferrari, A... (2025)

[5] Radova, M., Stark, W.G., Allen, C.S., Maurer, R.J., Bartók, A.P.: Fine-tuning foundation models of materials interatomic potentials with frozen transfer learning. npj Comput. Mater. 11(1), 237 (2025)

[6] Perdew, J.P., Schmidt, K.: Jacob’s ladder of density functional approximations for the exchange-correlation energy. AIP Conference Proceedings 577(1), 1–20 (2001)

[7] Huang, X., Deng, B., Zhong, P., Kaplan, A.D., Persson, K.A., Ceder, G.: Cross-functional transferability in foundation machine learning interatomic potentials. npj Comput. Mater. 11(1), 313 (2025)

[8] Heyd, J., Scuseria, G.E., Ernzerhof, M.: Hybrid functionals based on a screened Coulomb potential. J. Chem. Phys. 118(18), 8207–8215 (2003)

[9] Krukau, A.V., Vydrov, O.A., Izmaylov, A.F., Scuseria, G.E.: Influence of the exchange screening parameter on the performance of screened hybrid functionals. The Journal of Chemical Physics 125(22), 224106 (2006)

[10] Kumar, A., Raghunathan, A., Jones, R.M., Ma, T., Liang, P.: Fine-tuning can distort pretrained features and underperform out-of-distribution. In: International Conference on Learning Representations (2022)

[11] Wortsman, M., Ilharco, G., Kim, J.W., Li, M., Kornblith, S., Roelofs, R., Lopes, R.G., Hajishirzi, H., Farhadi, A., Namkoong, H., Schmidt, L.: Robust fine-tuning of zero-shot models. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7949–7961 (2022)

[12] Nagai, Y., Okumura, M., Kobayashi, K., Shiga, M.: Self-learning hybrid Monte Carlo: A first-principles approach. Phys. Rev. B 102, 041124 (2020)

[13] Thomsen, B., Nagai, Y., Kobayashi, K., Hamada, I., Shiga, M.: Self-learning path integral hybrid Monte Carlo with mixed ab initio and machine learning potentials for modeling nuclear quantum effects in water. The Journal of Chemical Physics 161(20), 204109 (2024)

[14] Vallender, S.S.: Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications 18(4), 784–786 (1974)

[15] Hummer, D.R., Heaney, P.J., Post, J.E.: Thermal expansion of anatase and rutile between 300 and 575 K using synchrotron powder X-ray diffraction. Powder Diffraction 22(4), 352–357 (2007)

[16] Howard, C.J., Sabine, T.M., Dickson, F.: Structural and thermal parameters for rutile and anatase. Acta Crystallographica Section B 47(4), 462–468 (1991)

[17] Kobayashi, K., Nagai, Y., Itakura, M., Shiga, M.: Self-learning hybrid Monte Carlo method for isothermal–isobaric ensemble: Application to liquid silica. The Journal of Chemical Physics 155(3), 034106 (2021)
