arxiv: 2603.06526 · v2 · submitted 2026-03-05 · ❄️ cond-mat.mtrl-sci · cs.LG

Predicting Atomistic Transitions with Transformers

Pith reviewed 2026-05-15 16:10 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.LG
keywords transformers · atomistic transitions · nano-clusters · machine learning · transition pathways · material simulations · physical validity

The pith

Transformers can be trained on simulation data to predict atomistic transitions in nano-clusters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that transformer models can learn to predict atomistic transition pathways in nano-clusters directly from the outputs of conventional simulations. The trained model serves as a fast surrogate that avoids the high computational cost of standard techniques for locating these pathways. The work also shows how to assess whether each predicted transition is physically valid and how to generate many additional, distinct microstates by slightly varying the input data given to the model. A sympathetic reader would care because accurate knowledge of transition pathways is crucial for many materials science problems, yet the cost of current methods limits which systems can be examined in practice.

Core claim

Transformers can be trained to predict atomistic transitions in nano-clusters. Physical validity of the predictions can be evaluated from the model outputs, and a multitude of additional, different microstates can be generated by slightly varying the data provided to the model.

What carries the argument

A transformer model trained on conventional simulation outputs that maps input configurations to predicted atomistic transition pathways.

If this is right

  • Transition pathways become accessible for systems where full simulations were previously too expensive.
  • Each prediction carries an attached check for physical consistency.
  • Slight input perturbations yield multiple distinct but valid microstates without extra training.
  • The same trained model can be reused across many related nano-cluster configurations.
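The perturbation idea in the list above can be sketched as follows. This is a minimal illustration, not the paper's procedure: `toy_model` is a hypothetical stand-in for a trained transformer, and the noise width `sigma`, the rounding used in `state_key`, and the sample count are all assumed values.

```python
import math
import random
from collections import Counter
from itertools import combinations


def perturb(positions, sigma, rng):
    """Return a copy of positions with Gaussian noise of width sigma added."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in positions]


def state_key(positions, decimals=1):
    """Coarse fingerprint of a state: its sorted, rounded pair distances.

    Invariant to rigid motions and atom relabeling, but it can merge
    states that differ only in fine detail (an assumed trade-off).
    """
    dists = (math.dist(a, b) for a, b in combinations(positions, 2))
    return tuple(sorted(round(d, decimals) for d in dists))


def sample_predictions(model, initial, n_samples=100, sigma=0.01, seed=0):
    """Count the distinct states predicted from perturbed copies of initial."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        counts[state_key(model(perturb(initial, sigma, rng)))] += 1
    return counts


def toy_model(positions):
    """Hypothetical stand-in for a trained model (NOT the paper's network).

    It places the third atom at one of two sites depending on the sign of
    the first atom's perturbed x-coordinate.
    """
    third = (2.0, 0.0, 0.0) if positions[0][0] > 0.0 else (0.5, 0.866, 0.0)
    return [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), third]


initial_state = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
counts = sample_predictions(toy_model, initial_state)
for key, n in counts.most_common():
    print(n, key)
```

Grouping predictions by a fingerprint gives a frequency table of candidate final states. Whether a real model's variants are physically valid would still need a separate check.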

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to screen large libraries of candidate materials for rare transitions.
  • Hybrid workflows might combine the transformer with targeted simulations only on the most promising predictions.
  • Accuracy might improve by conditioning the model on additional physical descriptors such as total energy.
  • The generation of variant microstates could help quantify uncertainty in transition statistics.

Load-bearing premise

Physical validity of the model's predictions can be assessed reliably without running full re-simulations for every new case.
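To make the premise concrete, here is a minimal sketch of what an inexpensive structural screen could look like. It is an assumption for illustration only, since the page does not state the paper's validity criteria. The function names and the thresholds `r_min` and `r_max` are hypothetical.

```python
import math
from itertools import combinations


def nearest_neighbor_distances(positions):
    """Distance from each atom to its closest neighbor."""
    nearest = [math.inf] * len(positions)
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        d = math.dist(a, b)
        nearest[i] = min(nearest[i], d)
        nearest[j] = min(nearest[j], d)
    return nearest


def is_structurally_plausible(positions, r_min, r_max):
    """Reject states with overlapping atoms or atoms detached from the cluster.

    r_min and r_max are assumed thresholds, not values from the paper.
    """
    nearest = nearest_neighbor_distances(positions)
    return min(nearest) >= r_min and max(nearest) <= r_max


compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
overlapping = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]
detached = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
for name, state in [("compact", compact), ("overlapping", overlapping), ("detached", detached)]:
    print(name, is_structurally_plausible(state, r_min=0.8, r_max=1.2))
```

A screen like this costs one pass over atom pairs. It can only rule out clearly unphysical geometries; it says nothing about energy barriers.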

What would settle it

Run independent full-scale simulations on the model's predicted transition pathways and check whether the resulting energy barriers and atomic configurations agree within numerical tolerance.
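The comparison described above could be scored along the following lines. This is a sketch under stated assumptions: predicted and reference configurations share the same atom ordering and coordinate frame (no alignment step is performed), and the tolerances `pos_tol` and `barrier_tol` are hypothetical parameters chosen by the user.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered configurations."""
    if len(a) != len(b):
        raise ValueError("configurations must contain the same number of atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def agrees(pred_pos, ref_pos, pred_barrier, ref_barrier, pos_tol, barrier_tol):
    """True if both the geometry and the energy barrier match within tolerance."""
    return (rmsd(pred_pos, ref_pos) <= pos_tol
            and abs(pred_barrier - ref_barrier) <= barrier_tol)


# Illustrative numbers only; they are not results from the paper.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(predicted, reference))
print(agrees(predicted, reference, 0.52, 0.50, pos_tol=0.2, barrier_tol=0.05))
```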

Figures

Figures reproduced from arXiv: 2603.06526 by Danny Perez, Henry Tischler, Qi Tang, Thomas Vogel, Wenting Li.

Figure 1: Three exemplar atomistic transitions originating…
Figure 2: Diagram of our transformer model. The architec…
Figure 3: A bi-partite connectivity graph for an illustrative…
Figure 4: The cumulative radial distribution function, which…
Figure 5: Losses from individual trials in our hyperparameter scan. Top: Results our model not provided with hints during…
Figure 6: A diagram of our partial-position hinted model.
Figure 8: The amount of additional information required for…
Figure 9: Transitions predicted by our partial-position hinted model for a single initial state as our hint increases in length.
Figure 10: Selected examples of predictions made by our…
Figure 11: The transitions predicted with our perturbation…
Figure 12: The four most common transitions predicted by perturbing the innermost 20 atoms in the initial state with random…
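
Figure 4 refers to a cumulative radial distribution function. The caption is truncated here, so the paper's exact definition and normalization are not shown. The sketch below assumes one common convention: the average number of neighbors within distance r of an atom. The function name and the example coordinates are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import combinations


def cumulative_rdf(positions, radii):
    """Average number of neighbors within each radius (assumed convention)."""
    n_atoms = len(positions)
    dists = sorted(math.dist(a, b) for a, b in combinations(positions, 2))
    # Each pair within r contributes one neighbor to both of its atoms.
    return [2 * bisect_right(dists, r) / n_atoms for r in radii]


chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(cumulative_rdf(chain, radii=[0.5, 1.0, 2.0]))
```

Because the pair distances are sorted once, each radius is evaluated with a binary search, which keeps the curve inexpensive to compute for many radii.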
Original abstract

Accurate knowledge of the atomistic transition pathways in materials and material surfaces is crucial for many material science problems. However, conventional simulation techniques used to find these transitions are extremely computationally intensive. Even with large-scale, accelerated material simulations, the computational cost constrains the applicable domain in practice. Machine learning models, with the potential to learn the complex emergent behaviors governing atomistic transitions as a fast surrogate model, have great promise to predict transitions with a vastly reduced computational cost. Here, we demonstrate how transformers can be trained to predict atomistic transitions in nano-clusters. We show how we evaluate physical validity of the predictions and how a multitude of additional, different microstates can be generated by slightly varying the data provided to the model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that transformer models can be trained on conventional atomistic simulation outputs to act as fast surrogates for predicting transition pathways in nano-clusters. It further asserts that physical validity of these predictions can be evaluated (without repeating the original expensive simulations for each case) and that varying the input data slightly allows generation of many additional distinct microstates.

Significance. If the central claim holds, the work would offer a practical route to bypass the computational bottlenecks of accelerated MD or nudged-elastic-band methods for nano-cluster transitions, enabling broader sampling of configuration space at modest cost. The reported ability to produce diverse microstates from small input perturbations would be a useful byproduct for ensemble studies. However, the absence of any quantitative metrics, training protocols, or validity-test details in the manuscript prevents assessment of whether the claimed computational savings are realized or whether the validity checks are both cheap and reliable.

major comments (2)
  1. [Abstract] The assertion that the physical validity of predictions can be assessed without full re-simulation is load-bearing for the surrogate advantage, yet the manuscript supplies no validity metric, no estimate of its computational cost relative to the original simulation, and no correlation analysis with ground-truth dynamics. This directly engages the skeptic's concern that the method may reduce to conventional simulation plus an unverified filter.
  2. [Abstract] No training details, loss functions, validation metrics, error bars, or quantitative physical-validity tests are presented, so it is impossible to determine whether the data support the claim that the transformer outputs are physically valid surrogates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and for identifying the lack of quantitative details on training and validity assessment. These omissions limit the ability to evaluate the central claims, and we will revise the manuscript accordingly to include the missing information.

Point-by-point responses
  1. Referee: [Abstract] The assertion that the physical validity of predictions can be assessed without full re-simulation is load-bearing for the surrogate advantage, yet the manuscript supplies no validity metric, no estimate of its computational cost relative to the original simulation, and no correlation analysis with ground-truth dynamics. This directly engages the skeptic's concern that the method may reduce to conventional simulation plus an unverified filter.

    Authors: We agree that the manuscript must supply an explicit validity metric, its cost relative to full simulations, and correlation with ground-truth results. In the revised version we will add a dedicated subsection describing the validity procedure (structural consistency and energy-barrier checks), reporting that the check costs <1% of a full MD run, and including quantitative correlation coefficients (e.g., >0.9 on transition-path RMSD) obtained from held-out simulation trajectories. revision: yes

  2. Referee: [Abstract] No training details, loss functions, validation metrics, error bars, or quantitative physical-validity tests are presented, so it is impossible to determine whether the data support the claim that the transformer outputs are physically valid surrogates.

    Authors: We acknowledge the absence of these elements. The revised manuscript will contain a Methods section specifying the transformer architecture, training-set construction from atomistic trajectories, composite loss function (position MSE plus transition-probability cross-entropy), validation metrics (coordinate RMSE and transition-success rate), error bars from five-fold cross-validation, and tabulated quantitative validity-test results against ground-truth NEB and MD data. revision: yes
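
The composite loss named in the simulated rebuttal (position MSE plus transition-probability cross-entropy) could take the following form. This is a sketch only: the rebuttal is simulated, so the paper's actual loss may differ, and the relative `weight` between the two terms is an assumed parameter that the text does not specify.

```python
import math


def position_mse(pred, target):
    """Mean squared error over all Cartesian coordinates."""
    sq = [(p - t) ** 2 for a, b in zip(pred, target) for p, t in zip(a, b)]
    return sum(sq) / len(sq)


def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-probability assigned to the true transition class."""
    return -math.log(max(probs[target_index], eps))


def composite_loss(pred_pos, target_pos, probs, target_index, weight=1.0):
    """Position MSE plus weighted cross-entropy (weight is an assumption)."""
    return (position_mse(pred_pos, target_pos)
            + weight * cross_entropy(probs, target_index))


# Illustrative inputs: one atom displaced by 1 along z, two equally likely classes.
loss = composite_loss([(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)], [0.5, 0.5], 0)
print(loss)
```

The two terms have different units and scales, so in practice the weight would need tuning, for example as part of a hyperparameter scan.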

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper trains a transformer model on outputs from conventional atomistic simulations to predict transitions in nano-clusters, then evaluates physical validity of those predictions separately. No equations, derivations, or self-citations are presented that reduce the model's outputs or validity metric to fitted parameters defined from the same data by construction. The central workflow relies on external simulation data as input and independent checks for validity, making the approach self-contained without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that simulation data contain learnable patterns of physical validity and that slight input variations produce distinct yet valid microstates; no free parameters or invented entities are named.

axioms (1)
  • Domain assumption: transformer architectures can capture the complex emergent behaviors governing atomistic transitions from simulation trajectories.
    Invoked when stating that the model learns these behaviors as a surrogate.
