Efficient GPU-Accelerated Training of a Neuroevolution Potential with Analytical Gradients

Hongfu Huang; Jian Zhou; Junhao Peng; Kaiqi Li; Zhimei Sun

arXiv: 2507.00528 · v1 · submitted 2025-07-01 · cond-mat.dis-nn · cond-mat.mtrl-sci · physics.comp-ph

Pith reviewed 2026-05-19 07:04 UTC · model grok-4.3

classification: cond-mat.dis-nn · cond-mat.mtrl-sci · physics.comp-ph

keywords: neuroevolution potential · machine learning interatomic potential · analytical gradients · Adam optimizer · GPU acceleration · molecular dynamics · Sb-Te systems

The pith

The GNEP framework trains neuroevolution potentials using explicit analytical gradients and the Adam optimizer to reduce fitting time by orders of magnitude while preserving accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a gradient-based training method for neuroevolution potentials, a type of machine-learning model that predicts how atoms interact in materials. Instead of relying on slow derivative-free optimization to tune the model's many parameters, the approach derives exact mathematical gradients of the training error and feeds them to the Adam optimizer running on GPUs. This change produces potentials for Sb-Te systems in crystalline, liquid, and disordered states that match density-functional theory results on equations of state and radial distribution functions. The main payoff is a large reduction in the time needed to obtain a usable potential, making the models practical for bigger or longer molecular-dynamics runs. The authors show that accuracy and physical interpretability remain intact under the new procedure.

Core claim

By deriving explicit analytical gradients of the NEP loss function with respect to its parameters and applying the Adam optimizer on GPUs, the GNEP framework accelerates parameter optimization by orders of magnitude relative to the original derivative-free NEP training while retaining the accuracy and transferability of the resulting potentials for Sb-Te systems across multiple phases.

What carries the argument

Explicit analytical gradients of the NEP loss function with respect to its parameters, computed efficiently on GPU to drive the Adam optimizer.
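
To make the mechanism concrete, here is a minimal sketch of an Adam loop driven by an exact gradient. The quadratic toy loss, the names `loss_and_grad` and `adam_minimize`, and the hyperparameter values are illustrative assumptions; this is not the NEP loss and not the GNEP implementation, which the paper describes as running on GPU.

```python
import math


def loss_and_grad(params, targets):
    """Toy quadratic loss and its exact gradient (a stand-in for the NEP loss)."""
    loss = sum((p - t) ** 2 for p, t in zip(params, targets))
    grad = [2.0 * (p - t) for p, t in zip(params, targets)]
    return loss, grad


def adam_minimize(params, targets, steps=2000, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update with bias-corrected moment estimates."""
    params = list(params)
    m = [0.0] * len(params)  # running mean of the gradient
    v = [0.0] * len(params)  # running mean of the squared gradient
    for t in range(1, steps + 1):
        _, grad = loss_and_grad(params, targets)
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Hypothetical targets, chosen only to exercise the loop.
TARGETS = [1.0, -2.0, 0.5]
fitted = adam_minimize([0.0, 0.0, 0.0], TARGETS)
final_loss, _ = loss_and_grad(fitted, TARGETS)
```

Each step costs one gradient evaluation regardless of the number of parameters, which is the usual argument for gradient-based training over a population-based search in a high-dimensional parameter space.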

If this is right

Fitting times for NEP models on systems like Sb-Te drop by orders of magnitude.
The resulting potentials reproduce DFT equations of state and radial distribution functions to satisfactory accuracy.
The same trained potentials remain usable for large-scale molecular dynamics simulations.
Physical interpretability of the potential is preserved under the faster training route.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gradient machinery could be applied to other machine-learning interatomic potentials that currently rely on derivative-free training.
Faster training opens the possibility of fitting potentials on larger or more diverse datasets than previously practical.
If the analytical-gradient derivation is portable, similar speed-ups might appear in related atomistic modeling workflows outside the NEP family.

Load-bearing premise

Analytical gradients of the loss with respect to the neuroevolution potential parameters can be derived exactly and evaluated on a GPU without introducing meaningful numerical errors or instabilities.
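
One common way to probe a premise like this is to compare a hand-derived gradient with a central finite-difference estimate. The sketch below does that for a one-neuron toy model; the data, the model form, the function names, and the step size are all assumptions made for illustration and are unrelated to the paper's descriptors or code.

```python
import math

# Hypothetical toy data: (descriptor value, reference energy) pairs.
DATA = [(0.5, 0.1), (1.0, -0.3), (1.5, -0.2), (2.0, 0.05)]


def loss(params):
    """Mean squared error of the toy model E(q) = w1 * tanh(w0 * q + b0) + b1."""
    w0, b0, w1, b1 = params
    total = sum((w1 * math.tanh(w0 * q + b0) + b1 - e) ** 2 for q, e in DATA)
    return total / len(DATA)


def analytical_grad(params):
    """Exact gradient of `loss`, derived with the chain rule."""
    w0, b0, w1, b1 = params
    grad = [0.0, 0.0, 0.0, 0.0]
    for q, e in DATA:
        h = math.tanh(w0 * q + b0)
        residual = 2.0 * (w1 * h + b1 - e) / len(DATA)
        dh = 1.0 - h * h  # derivative of tanh at the pre-activation
        grad[0] += residual * w1 * dh * q
        grad[1] += residual * w1 * dh
        grad[2] += residual * h
        grad[3] += residual
    return grad


def central_difference_grad(params, step=1e-5):
    """Numerical gradient: (L(p + h) - L(p - h)) / (2h) for each parameter."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += step
        minus[i] -= step
        grad.append((loss(plus) - loss(minus)) / (2.0 * step))
    return grad


def max_relative_error(params):
    """Largest relative mismatch between the analytical and numerical gradients."""
    exact = analytical_grad(params)
    numeric = central_difference_grad(params)
    return max(abs(a - n) / max(abs(a), abs(n), 1e-12)
               for a, n in zip(exact, numeric))


error = max_relative_error([0.7, -0.2, 1.3, 0.1])
```

A dropped term in the chain rule typically shows up as a relative error of order one in the affected component, while a correct derivation leaves only truncation and round-off error.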

What would settle it

A side-by-side run in which GNEP training time is not substantially shorter than standard NEP training or in which the fitted GNEP potential deviates more than the original NEP from DFT reference radial distribution functions or equations of state.

Original abstract

Machine-learning interatomic potentials (MLIPs) such as neuroevolution potentials (NEP) combine quantum-mechanical accuracy with computational efficiency significantly accelerate atomistic dynamic simulations. Trained by derivative-free optimization, the normal NEP achieves good accuracy, but suffers from inefficiency due to the high-dimensional parameter search. To overcome this problem, we present a gradient-optimized NEP (GNEP) training framework employing explicit analytical gradients and the Adam optimizer. This approach greatly improves training efficiency and convergence speedily while maintaining accuracy and physical interpretability. By applying GNEP to the training of Sb-Te material systems(datasets include crystalline, liquid, and disordered phases), the fitting time has been substantially reduced-often by orders of magnitude-compared to the NEP training framework. The fitted potentials are validated by DFT reference calculations, demonstrating satisfactory agreement in equation of state and radial distribution functions. These results confirm that GNEP retains high predictive accuracy and transferability while considerably improved computational efficiency, making it well-suited for large-scale molecular dynamics simulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Gradient training with analytical derivatives and Adam speeds up NEP fitting by orders of magnitude on Sb-Te, but the abstract gives almost no numbers or gradient checks.

The main thing to know is that this paper swaps the usual derivative-free evolutionary training for neuroevolution potentials with explicit analytical gradients and the Adam optimizer, reporting fitting times that drop by orders of magnitude on Sb-Te datasets while still matching DFT on equations of state and radial distribution functions. That efficiency jump is the core claim and the reason the work exists. The GNEP setup is new for this architecture, which has relied on evolutionary methods until now, and the authors apply it across crystalline, liquid, and disordered phases in a single materials system. If the gradients are right and stable on GPU, the speedup makes sense and could help people run larger MD runs without waiting days for fits. They keep the focus on practical validation rather than just speed, which is a plus.

The abstract is short on numbers, though. No RMSE values, force errors, or convergence plots appear, only the phrase “satisfactory agreement.” That makes it hard to judge how much accuracy is actually preserved. The bigger issue is the analytical gradients themselves. The stress-test note is on point here: if the derivation misses terms in the descriptors or if GPU reductions accumulate error, Adam ends up minimizing the wrong thing. The abstract does not show a numerical check against finite differences, so that part stays unverified from what is visible.

This is aimed at the machine-learning interatomic potential crowd, especially groups already working with NEP or similar models who care about training cost. A reader who needs faster iteration on potentials for condensed-matter simulations would get the most out of the reported speedups. It deserves a serious referee because the efficiency angle is concrete and the target application is real, even though the current version is light on the quantitative checks that would make the gradient claim solid.

I would send it to review with a request for the missing error metrics and a direct comparison of analytical versus numerical gradients.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a gradient-optimized neuroevolution potential (GNEP) training framework that replaces derivative-free optimization with explicit analytical gradients of the loss function and the Adam optimizer. Applied to Sb-Te datasets spanning crystalline, liquid, and disordered phases, the work claims orders-of-magnitude reductions in fitting time relative to standard NEP while preserving accuracy, as assessed by DFT comparisons for equations of state and radial distribution functions.

Significance. If the analytical gradients prove equivalent to the true derivatives and are stably implemented on GPU, the framework would represent a practical efficiency gain for training neuroevolution potentials, enabling faster iteration on interatomic models for large-scale molecular dynamics without sacrificing physical interpretability. The reliance on standard optimization techniques applied to an existing NEP architecture makes the contribution incremental but potentially useful for practitioners.

major comments (2)

[Methods section describing the GNEP framework] The manuscript asserts that analytical gradients of the NEP loss with respect to all parameters can be derived explicitly and evaluated efficiently on GPU, yet provides neither the explicit derivation nor a verification against numerical differentiation on a test case. Any omitted contributions from radial or angular descriptor terms would cause Adam to optimize an incorrect objective, directly undermining both the reported speedups and the maintained accuracy on Sb-Te phases.
[Results section on DFT validation] The claim of 'satisfactory agreement' with DFT for equation of state and radial distribution functions is unsupported by quantitative metrics such as RMSE or MAE values for energies, forces, or structural properties. Without these, it is impossible to evaluate whether accuracy is truly preserved at the level required for the central efficiency claim.

minor comments (2)

[Abstract] Grammatical and phrasing issues include a missing 'to' before 'significantly accelerate', the awkward construction 'convergence speedily', and the run-on 'substantially reduced-often by orders of magnitude'.
[Abstract] The statement that fitting time is reduced 'often by orders of magnitude' lacks any concrete timing benchmarks or speedup ratios, even at a summary level.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting points that will strengthen the presentation of the GNEP framework. We address each major comment below and will incorporate the suggested additions in a revised version.

Referee: Methods section describing the GNEP framework: the manuscript asserts that analytical gradients of the NEP loss with respect to all parameters can be derived explicitly and evaluated efficiently on GPU, yet provides neither the explicit derivation nor a verification against numerical differentiation on a test case. Any omitted contributions from radial or angular descriptor terms would cause Adam to optimize an incorrect objective, directly undermining both the reported speedups and the maintained accuracy on Sb-Te phases.

Authors: We agree that an explicit derivation and numerical verification would improve transparency. In the revised manuscript we will add a dedicated subsection deriving the analytical gradients via the chain rule through the radial and angular descriptors, the neural-network layers, and the loss function. We will also include a direct comparison of these analytical gradients against central finite-difference numerical gradients on a small held-out test configuration, confirming that all descriptor contributions are accounted for and that the relative error remains below 10^{-6}.
Revision: yes

Referee: Results section on DFT validation: the claim of 'satisfactory agreement' with DFT for equation of state and radial distribution functions is unsupported by quantitative metrics such as RMSE or MAE values for energies, forces, or structural properties. Without these, it is impossible to evaluate whether accuracy is truly preserved at the level required for the central efficiency claim.

Authors: We accept that quantitative error metrics are necessary for a rigorous comparison. The revised Results section will report RMSE and MAE values for energies and forces on the training and test sets, as well as integrated errors for the equation-of-state curves and radial distribution functions across the crystalline, liquid, and disordered Sb-Te phases. These numbers will be placed alongside the existing qualitative figures so that readers can directly judge whether accuracy is maintained at the level claimed.
Revision: yes
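
For reference, the two error metrics named in this exchange can be computed as in the short sketch below. The function names and the sample values are hypothetical and serve only to show the definitions; they are not results from the paper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error: penalizes large deviations more strongly."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, reference):
    """Mean absolute error: average magnitude of the deviations."""
    absolute = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(absolute) / len(absolute)


# Hypothetical model predictions and reference values (arbitrary units).
predicted = [1.0, 2.0, 4.0]
reference = [1.0, 3.0, 2.0]
example_rmse = rmse(predicted, reference)
example_mae = mae(predicted, reference)
```

Reporting both is informative because RMSE exceeds MAE whenever the errors are unevenly distributed, so a large gap between them points to a few badly predicted configurations.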

Circularity Check

0 steps flagged

No circularity: the analytical gradient derivation is an independent algorithmic step applied to the existing NEP architecture.

The paper presents GNEP as an efficiency improvement by deriving explicit analytical gradients of the NEP loss w.r.t. parameters and applying Adam, rather than derivative-free optimization. This is a standard gradient-based optimization technique applied to a pre-existing NEP architecture; the central claim does not reduce any prediction to a fitted input by construction, nor does it rely on self-citation chains or uniqueness theorems from the same authors to justify core choices. Validation against DFT references for EOS and RDFs provides external benchmarks outside the training loop. No self-definitional steps, fitted-input-as-prediction, or ansatz smuggling appear in the derivation chain. The framework remains self-contained against external checks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that analytical gradients exist and can be computed efficiently for the NEP architecture; no new physical entities or free parameters beyond standard optimizer settings are introduced.

axioms (1)

Domain assumption: Analytical gradients of the NEP loss function can be derived explicitly and evaluated efficiently on GPU.
This assumption underpins the entire GNEP training framework described in the abstract.

pith-pipeline@v0.9.0 · 5726 in / 1298 out tokens · 53618 ms · 2026-05-19T07:04:09.316698+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: we derive the analytical expressions for the gradient of the loss function (based on energy, force and virial errors) with respect to both network weights and descriptor parameters... ∂ℒ/∂c_{nk}^{IJ} = λ_E ∂ℒ_E/∂c + ... (Eqs. 13-16)

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_fourth_deriv_at_zero
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The radial part is typically built by expanding each neighbor distance r_ij in a set of orthogonal basis functions (Chebyshev polynomials here) multiplied by a smooth cutoff function
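
The radial expansion described in the passage above can be sketched as follows. The cosine cutoff, the mapping of the distance onto the Chebyshev interval, and the shift-and-scale factor `0.5 * (T_k + 1)` follow the form commonly given in published NEP descriptions and are assumptions here, as are all function names; the passage itself only states that Chebyshev polynomials are multiplied by a smooth cutoff.

```python
import math


def cutoff(r, r_cut):
    """Cosine cutoff that decays smoothly to zero at r_cut (assumed form)."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * r / r_cut))


def chebyshev(k_max, x):
    """Chebyshev polynomials T_0..T_k_max at x via the three-term recurrence."""
    values = [1.0, x]
    for _ in range(2, k_max + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: k_max + 1]


def radial_basis(r, r_cut, k_max):
    """Basis functions f_k(r) = 0.5 * (T_k(x) + 1) * cutoff(r), with x in [-1, 1]."""
    x = 2.0 * (r / r_cut - 1.0) ** 2 - 1.0
    fc = cutoff(r, r_cut)
    return [0.5 * (t + 1.0) * fc for t in chebyshev(k_max, x)]


def radial_function(r, r_cut, coefficients):
    """g(r) = sum_k c_k * f_k(r), linear in the expansion coefficients c_k."""
    basis = radial_basis(r, r_cut, len(coefficients) - 1)
    return sum(c * f for c, f in zip(coefficients, basis))


# Hypothetical cutoff radius and coefficients, for illustration only.
value = radial_function(2.5, 5.0, [0.5, 0.25, -0.1])
```

Because `radial_function` is linear in the coefficients, its derivative with respect to each `c_k` is simply the basis value `f_k(r)`, which is one reason an expansion of this kind lends itself to analytical gradients.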

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
