
arxiv: 2605.19939 · v1 · pith:GDPWG2AOnew · submitted 2026-05-19 · 💻 cs.CE

Uncertainty-aware Machine Learning Interatomic Potentials via Learned Functional Perturbations

Pith reviewed 2026-05-20 04:22 UTC · model grok-4.3

classification 💻 cs.CE
keywords uncertainty quantification · machine learning interatomic potentials · equivariant graph neural networks · continuous ranked probability score · functional perturbations · out-of-distribution prediction · silica · N-body benchmark

The pith

Machine learning interatomic potentials gain reliable uncertainty estimates by adding learned functional perturbations and training with the continuous ranked probability score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to make existing machine learning interatomic potentials uncertainty-aware in a simple way that avoids ensembles or extra hyperparameters. It introduces learned functional perturbations to the deterministic output and finetunes the model end-to-end using the continuous ranked probability score as the training objective. This matters for simulations and active learning because it lets users know when predictions are likely to fail on new atomic arrangements. Tests on charged-particle systems and silica structures show the approach produces uncertainties that align better with actual errors than prior Bayesian methods.

Core claim

A deterministic MLIP becomes probabilistic when its predictions receive learned functional perturbations that are optimized jointly with the continuous ranked probability score, producing calibrated uncertainty estimates that improve correlation with true errors on out-of-distribution configurations.

What carries the argument

Learned functional perturbations, which modify the model's output function during end-to-end CRPS training to encode predictive uncertainty.
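The CRPS objective can be made concrete with a sample-based estimator. The sketch below uses the standard energy form, CRPS ≈ E|X − y| − ½·E|X − X′|, evaluated on a small set of stochastic predictions. It is an illustration only: the paper's exact loss and perturbation parameterization are not given on this page, so `perturbed_predict`, `log_scale`, and the additive Gaussian noise are hypothetical stand-ins rather than the authors' method.

```python
import numpy as np

def crps_ensemble(samples: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Sample-based CRPS estimate for each target.

    samples: shape (M, N), M stochastic predictions for each of N targets.
    target:  shape (N,), reference values.
    Returns shape (N,): E|X - y| - 0.5 * E|X - X'| using empirical means.
    """
    skill = np.abs(samples - target[None, :]).mean(axis=0)
    # Pairwise member differences, shape (M, M, N); the i == j terms are zero.
    spread = np.abs(samples[:, None, :] - samples[None, :, :]).mean(axis=(0, 1))
    return skill - 0.5 * spread

def perturbed_predict(x, weight, log_scale, rng, n_members=8):
    """Hypothetical toy model: a deterministic linear output plus scaled noise.

    ASSUMPTION: this stands in for a perturbed MLIP forward pass; the real
    perturbations may act inside the network rather than on its output.
    """
    mean = x @ weight
    noise = rng.standard_normal((n_members, x.shape[0]))
    return mean[None, :] + np.exp(log_scale) * noise

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 3))
weight = np.array([0.5, -1.0, 2.0])
y = x @ weight + 0.1 * rng.standard_normal(16)

members = perturbed_predict(x, weight, log_scale=np.log(0.1), rng=rng)
loss = crps_ensemble(members, y).mean()  # scalar to minimize during finetuning
```

With a single member the estimate reduces to the absolute error, which is one way to see that CRPS generalizes MAE to predictive distributions. In an actual training loop the same expression would be written in an autodiff framework so that gradients reach both the model weights and the perturbation parameters.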

If this is right

  • Active learning for MLIPs can select new training structures more efficiently by using the uncertainty signal.
  • Molecular dynamics simulations become safer because high-uncertainty regions can trigger fallback to more expensive calculations.
  • Foundation models for materials can be turned uncertainty-aware through the same finetuning procedure without redesign.
  • The method applies equally to models trained from scratch and to large pretrained potentials.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same perturbation idea may extend to other scientific machine-learning models that currently lack built-in uncertainty.
  • Combining these perturbations with selective ensemble averaging could further tighten calibration on rare events.
  • The approach might reduce the data needed for reliable potentials by guiding data collection toward uncertain regions.

Load-bearing premise

Learned functional perturbations, when optimized with CRPS, can represent the uncertainty of atomic configurations that lie outside the training distribution.

What would settle it

A new test set of atomic configurations with errors that do not increase in line with the model's reported uncertainty, or CRPS scores that fail to beat the Bayesian baseline.

Figures

Figures reproduced from arXiv: 2605.19939 by Dario Coscia, David R. Wessels, Erik J. Bekkers, Maksim Zhdanov, Olga Zaghen.

Figure 1. N-body test performance vs. training size (mean ± std over 4 seeds, shaded). Left: MSE; Center: CRPS; Right: spread-to-skill ratio (SSR). P-EGNN consistently achieves the best CRPS and the SSR closest to 1 at every training size, with the calibration gap widening as n grows. [Plot not reproduced. Axis labels recovered from the extracted plot text: Training set size n; F-MAE [meV/Å] ↓; F-CRPS [meV/Å] ↓; F-Spear ↑.]
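The spread-to-skill ratio in the caption can be computed from an ensemble of predictions. The sketch below uses one common definition, the root of the mean ensemble variance divided by the RMSE of the ensemble mean. It is an assumption that the paper uses this form, since the page does not state it, and some definitions add a finite-ensemble correction factor.

```python
import numpy as np

def spread_skill_ratio(samples: np.ndarray, target: np.ndarray) -> float:
    """Spread-to-skill ratio for an ensemble of predictions.

    samples: shape (M, N) with M >= 2 ensemble members for N targets.
    target:  shape (N,), reference values.
    A value near 1 indicates that the ensemble spread matches the typical
    error of the ensemble mean; below 1 suggests overconfidence.
    """
    ens_mean = samples.mean(axis=0)
    # Unbiased per-target variance, averaged over targets, then square-rooted.
    spread = np.sqrt(samples.var(axis=0, ddof=1).mean())
    skill = np.sqrt(((ens_mean - target) ** 2).mean())
    return float(spread / skill)

# Illustrative data only: members and targets drawn from the same distribution.
rng = np.random.default_rng(1)
truth_mean = rng.standard_normal(2000)
members = truth_mean[None, :] + 0.3 * rng.standard_normal((16, 2000))
target = truth_mean + 0.3 * rng.standard_normal(2000)
ssr = spread_skill_ratio(members, target)
```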
Original abstract

Machine Learning Interatomic Potentials (MLIPs) achieve near ab initio accuracy at a fraction of the cost of quantum-mechanical simulations, yet they remain prone to silent failures on out-of-distribution configurations, making principled uncertainty quantification (UQ) essential for error-aware simulations and active learning. Existing non-ensemble UQ methods for MLIPs rely either on variational inference or on parametric distributional assumptions, both of which add architectural complexity and hyper-parameters that must be tuned per task. Inspired by recent advances in probabilistic weather forecasting, we propose a simpler alternative: turn a deterministic MLIP into a probabilistic one through learned functional perturbations and finetune it end-to-end with the Continuous Ranked Probability Score (CRPS), a proper scoring rule. We validate the approach with an equivariant GNN (P-EGNN) trained from scratch and by finetuning the Orb-v3 foundation model (P-Orb) for silica. On the N-body charged particle benchmark, P-EGNN improves CRPS over the state-of-the-art Bayesian MLIP method BLIP by 19-32% across all training sizes; on silica, P-Orb raises the Spearman correlation between predicted uncertainty and actual error from 0.75 (BLIP-Orb) to 0.84.
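The Spearman figure quoted in the abstract measures how well predicted uncertainty ranks the actual errors. A minimal sketch of that metric follows. The arrays are made-up illustrative values, not data from the paper, and `spearman_rho` is a hypothetical helper that assumes no tied values.

```python
import numpy as np

def spearman_rho(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks.

    ASSUMPTION: no ties in either array. With ties, average ranks are
    needed, which scipy.stats.spearmanr handles.
    """
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Illustrative per-structure values: predicted force uncertainty and
# the absolute force error against a reference calculation.
pred_std = np.array([0.05, 0.20, 0.10, 0.40, 0.30])
abs_err = np.array([0.02, 0.15, 0.18, 0.50, 0.21])

rho = spearman_rho(pred_std, abs_err)
```

Because only ranks matter, this metric rewards uncertainty estimates that order structures correctly by error even when their absolute scale is off, which is why it is usually reported alongside a calibration measure.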

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes turning deterministic ML interatomic potentials into probabilistic models by introducing learned functional perturbations and end-to-end finetuning with the Continuous Ranked Probability Score (CRPS). It reports that P-EGNN improves CRPS by 19-32% over BLIP on the N-body charged-particle benchmark across training sizes, and that P-Orb raises the Spearman correlation between predicted uncertainty and error from 0.75 (BLIP-Orb) to 0.84 on silica.

Significance. If the central results hold under full verification, the approach supplies a lower-complexity alternative to variational or ensemble UQ for MLIPs, which could simplify reliable active learning and error-aware molecular dynamics. The concrete benchmark gains (CRPS and Spearman lifts) constitute a clear, falsifiable advance worth testing on additional OOD regimes.

major comments (1)
  1. The claim that learned functional perturbations, when finetuned with CRPS, adequately represent epistemic uncertainty on out-of-distribution atomic configurations rests on the N-body and silica results; however, the manuscript provides insufficient detail on data splits, OOD construction, and verification that the observed gains (19-32% CRPS, 0.75 to 0.84 Spearman) arise from epistemic rather than in-distribution calibration improvements.
minor comments (1)
  1. The abstract and methods description omit explicit statements of the perturbation parameterization, the precise form of the CRPS loss, and the training protocol for P-Orb finetuning; adding these would strengthen reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's potential impact. We address the major comment below and will incorporate clarifications and additional details in the revised manuscript.

read point-by-point responses
  1. Referee: The claim that learned functional perturbations, when finetuned with CRPS, adequately represent epistemic uncertainty on out-of-distribution atomic configurations rests on the N-body and silica results; however, the manuscript provides insufficient detail on data splits, OOD construction, and verification that the observed gains (19-32% CRPS, 0.75 to 0.84 Spearman) arise from epistemic rather than in-distribution calibration improvements.

    Authors: We agree that greater explicitness on data splits and OOD construction will strengthen the presentation. In the revised manuscript we will add a dedicated paragraph in the Experiments section that specifies: (i) for the N-body benchmark, training configurations are generated with 5–10 particles while test sets include systems with 15–20 particles to induce controlled distributional shift; (ii) for the silica benchmark, the OOD subset is constructed from trajectories at temperatures and defect densities outside the training distribution. On the epistemic-versus-calibration question, the functional perturbations are introduced precisely to allow the model to express epistemic variability in the learned potential; CRPS training then optimizes the entire predictive distribution under this variability. The reported CRPS gains are measured on the shifted test distributions, and the Spearman improvement quantifies better ranking of actual errors by the predicted uncertainty—precisely the behavior expected when epistemic uncertainty is better captured. We will include a short discussion of this distinction together with a supplementary plot of uncertainty–error correlation stratified by in-distribution versus OOD subsets. revision: yes
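The stratified comparison described in the response could be organized as below. This is a sketch under the split stated in the simulated rebuttal (5 to 10 particles in-distribution, 15 to 20 out-of-distribution). The function name and all numeric values are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def split_by_particle_count(n_particles, id_range=(5, 10), ood_range=(15, 20)):
    """Boolean masks for in-distribution and OOD systems by particle count."""
    n = np.asarray(n_particles)
    id_mask = (n >= id_range[0]) & (n <= id_range[1])
    ood_mask = (n >= ood_range[0]) & (n <= ood_range[1])
    return id_mask, ood_mask

# Made-up per-system values for illustration only.
n_particles = np.array([5, 8, 10, 12, 15, 18, 20])
abs_err = np.array([0.02, 0.03, 0.04, 0.08, 0.15, 0.22, 0.30])
pred_std = np.array([0.03, 0.03, 0.05, 0.07, 0.12, 0.20, 0.25])

id_mask, ood_mask = split_by_particle_count(n_particles)
summary = {
    name: {"mean_abs_err": float(abs_err[m].mean()),
           "mean_pred_std": float(pred_std[m].mean())}
    for name, m in (("in_distribution", id_mask), ("ood", ood_mask))
}
```

If the predicted uncertainty tracks epistemic effects, both the mean error and the mean predicted spread should rise on the OOD subset. A gain that appears only in-distribution would point to calibration improvements instead.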

Circularity Check

0 steps flagged

No significant circularity; new perturbations and CRPS objective are independent of inputs

full rationale

The paper's central derivation introduces learned functional perturbations applied to a deterministic MLIP, followed by end-to-end finetuning using the CRPS proper scoring rule. This construction does not reduce by definition or by the paper's equations to any previously fitted parameter or self-citation chain. The reported gains (CRPS improvements of 19-32% on N-body, Spearman lift from 0.75 to 0.84 on silica) are presented as empirical outcomes of the new training procedure rather than tautological renamings or fitted-input predictions. No self-definitional steps, uniqueness theorems imported from the same authors, or ansatz smuggling via prior work appear in the provided text. The approach is self-contained against external benchmarks and does not rely on load-bearing self-citations for its core claim.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 1 invented entity

Abstract-only access limits visibility into exact parameter counts or background assumptions; the primary addition appears to be the perturbation mechanism itself.

free parameters (1)
  • learned perturbation parameters
    Parameters introduced to create functional perturbations; their number and initialization are not specified in the abstract.
invented entities (1)
  • learned functional perturbations (no independent evidence)
    purpose: To convert a deterministic MLIP into a probabilistic model without architectural redesign
    Introduced as the core mechanism to enable uncertainty quantification via end-to-end training.

pith-pipeline@v0.9.0 · 5767 in / 1044 out tokens · 39942 ms · 2026-05-20T04:22:16.238849+00:00 · methodology


