Quantum-Accurate Conformational Stabilities and Vibrational Dynamics in Molecules and Proteins with Machine-Learned Force Fields

Alexandre Tkatchenko; Florian N. Brünig; Joshua T. Berryman; Kyunghoon Han; Miguel Gallegos; Sergio Suárez-Dou

arxiv: 2601.09845 · v2 · pith:R535VLH6new · submitted 2026-01-14 · ⚛️ physics.chem-ph · physics.bio-ph

Pith reviewed 2026-05-25 07:22 UTC · model grok-4.3

keywords machine-learned force fields · vibrational spectra · biomolecules · DFT · infrared spectroscopy · conformational energetics · molecular dynamics · proteins

The pith

Machine-learned force fields reproduce DFT-level forces, vibrational spectra, and conformational energies far better than molecular mechanics across molecules and proteins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates general-purpose machine-learned force fields on their ability to match quantum mechanical references for forces, vibrational frequencies, mode shapes, densities of states, and conformational stabilities. It introduces the QVib dataset of 293 molecules and 1365 conformers plus benchmarks on gas-phase peptides, p53 oligomers, and solvated proteins to test transferability to experimental infrared spectra. Conventional molecular mechanics force fields are shown to misrepresent infrared intensities and environment-dependent responses, while the learned models close much of the gap to DFT calculations. The work demonstrates that these models recover collective vibrational landscapes at near-DFT fidelity while retaining the computational speed of classical force fields. This matters because accurate vibrational dynamics and relative conformer energies underpin biomolecular thermodynamics and spectroscopy.

Core claim

Machine-learned force fields substantially improve over molecular mechanics in reproducing DFT-level forces, vibrational frequencies, densities of states, mode eigenvectors, conformational energetics, and experimental infrared spectra. These results show that machine-learned force-field dynamics can recover collective, environment-dependent vibrational landscapes at near-DFT fidelity, enabling spectroscopically validated biomolecular simulations at force-field-like cost.

What carries the argument

Machine-learned force fields trained on DFT reference data to predict atomic forces and energies, enabling molecular dynamics at classical cost with quantum-level accuracy on the potential energy surface.

If this is right

Machine-learned force fields recover DFT-level vibrational frequencies, densities of states, and mode eigenvectors across small molecules to solvated proteins.
Among models with explicit long-range electrostatics, SO3LR provides the most favourable accuracy-cost balance for the biomolecular systems considered.
Conformational energetics and environment-dependent vibrational response can be captured at higher fidelity than with conventional molecular mechanics.
Spectroscopically validated biomolecular simulations become feasible at force-field-like computational cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The QVib dataset offers a reusable benchmark that future force-field developers can use to test vibrational transferability.
Improved vibrational accuracy could affect calculated thermodynamic quantities such as free-energy differences between conformers in larger assemblies.
The approach suggests that dynamics trajectories from these models can be used directly for interpreting experimental spectra in complex biomolecular environments.
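
The second point above, that vibrational accuracy feeds into conformer free energies, can be made concrete with the textbook harmonic-oscillator free energy, F_vib = sum_i [ h c nu_i / 2 + k_B T ln(1 - exp(-h c nu_i / k_B T)) ]. The sketch below is an illustration under that harmonic assumption, not the paper's protocol; the function name and the example frequencies are hypothetical.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
C_CM = 2.99792458e10    # speed of light, cm/s
KB = 1.380649e-23       # Boltzmann constant, J/K
NA = 6.02214076e23      # Avogadro constant, 1/mol


def harmonic_free_energy(wavenumbers_cm, temperature=300.0):
    """Harmonic vibrational free energy in kJ/mol.

    wavenumbers_cm: strictly positive mode wavenumbers in cm^-1.
    """
    kt = KB * temperature
    total = 0.0
    for nu in wavenumbers_cm:
        quantum = H * C_CM * nu  # energy of one vibrational quantum, J
        # zero-point term plus thermal term of a quantum harmonic oscillator
        total += 0.5 * quantum + kt * math.log1p(-math.exp(-quantum / kt))
    return total * NA / 1000.0


# Hypothetical conformers that differ only in one soft mode (illustrative values).
conformer_a = [40.0, 1550.0, 1650.0]
conformer_b = [60.0, 1550.0, 1650.0]
delta_f = harmonic_free_energy(conformer_b) - harmonic_free_energy(conformer_a)
```

Because the thermal term is dominated by low-frequency modes, a force field that misplaces soft collective modes shifts `delta_f` even when the high-frequency bands agree.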

Load-bearing premise

That the chosen DFT reference level provides a sufficiently accurate and transferable proxy for both experimental vibrational spectra and conformational energetics across the tested molecules, peptides, and solvated protein systems.

What would settle it

A comparison of MLFF-computed infrared spectra against new experimental measurements on a solvated protein system that appears in neither the models' training data nor the paper's benchmarks would directly test whether the claimed fidelity carries over to experiment.

Original abstract

Biomolecular thermodynamics and spectroscopy depend on relative conformer energies, local curvatures, and collective dipole fluctuations on the potential-energy surface. Conventional molecular mechanics force fields enable large-scale simulations, but their fixed functional forms can misrepresent infrared intensities, mode character, and environment-dependent vibrational response. Here we assess general-purpose machine-learned force fields across small molecules, finite-temperature infrared spectra, gas-phase peptides, and monomeric, oligomeric, and solvated protein assemblies. To enable this analysis, we introduce QVib, a dataset of 293 molecules and 1365 conformers, together with peptide amide-band benchmarks and p53 oligomerization-domain models, to evaluate vibrational transferability from DFT references to experimental spectra. Across these systems, machine-learned force fields substantially improve over molecular mechanics in reproducing DFT-level forces, vibrational frequencies, densities of states, mode eigenvectors, conformational energetics, and experimental infrared spectra. Among models with explicit long-range electrostatics, SO3LR provides the most favourable accuracy-cost balance for the biomolecular systems considered. These results show that machine-learned force-field dynamics can recover collective, environment-dependent vibrational landscapes at near-DFT fidelity, enabling spectroscopically validated biomolecular simulations at force-field-like cost.
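
The abstract ties infrared spectra to collective dipole fluctuations. One common way to turn a molecular dynamics trajectory into an infrared line shape is to Fourier-transform the autocorrelation function of the total dipole moment. The sketch below shows that route in plain Python; it is an assumption about a typical workflow, not a description of the paper's procedure, and it omits quantum correction factors and absolute intensity units. The function names and the Hann window are illustrative choices.

```python
import math

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs


def dipole_acf(dipoles, max_lag):
    """Autocorrelation of dipole fluctuations for lags 0 .. max_lag - 1.

    dipoles: list of (mx, my, mz) tuples sampled at a fixed time step.
    """
    n = len(dipoles)
    mean = [sum(d[a] for d in dipoles) / n for a in range(3)]
    fluct = [[d[a] - mean[a] for a in range(3)] for d in dipoles]
    acf = []
    for lag in range(max_lag):
        pairs = n - lag
        total = 0.0
        for i in range(pairs):
            total += sum(fluct[i][a] * fluct[i + lag][a] for a in range(3))
        acf.append(total / pairs)
    return acf


def lineshape(acf, dt_fs, wavenumbers_cm):
    """Cosine transform of the windowed ACF at the requested wavenumbers."""
    m = len(acf)
    spectrum = []
    for nu in wavenumbers_cm:
        omega = 2.0 * math.pi * C_CM_PER_FS * nu  # angular frequency, rad/fs
        total = 0.0
        for k, value in enumerate(acf):
            window = 0.5 * (1.0 + math.cos(math.pi * k / m))  # half Hann window
            total += value * window * math.cos(omega * k * dt_fs)
        spectrum.append(total * dt_fs)
    return spectrum


# Synthetic test signal: a single dipole oscillation at 1650 cm^-1, sampled every 1 fs.
omega0 = 2.0 * math.pi * C_CM_PER_FS * 1650.0
trajectory = [(math.cos(omega0 * step), 0.0, 0.0) for step in range(1000)]
grid = list(range(1400, 1901, 50))
spectrum = lineshape(dipole_acf(trajectory, 300), 1.0, grid)
peak_position = grid[spectrum.index(max(spectrum))]
```

In a real application the dipole series would come from the force field's own charge or dipole model, which is where fixed-charge molecular mechanics and learned models can differ in predicted intensities.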

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the QVib dataset (293 molecules, 1365 conformers) along with peptide and p53 protein models to benchmark general-purpose machine-learned force fields (MLFFs) against molecular mechanics (MM) for reproducing DFT-level forces, vibrational frequencies, densities of states, eigenvectors, conformational energetics, and experimental IR spectra across gas-phase and solvated systems. It concludes that MLFFs achieve near-DFT fidelity at force-field cost, with SO3LR offering the best accuracy-cost trade-off among models with explicit long-range electrostatics.

Significance. If the central claims hold after addressing the noted gaps, the work would be significant for enabling spectroscopically validated biomolecular MD at scale. The introduction of QVib and the multi-scale evaluation (molecules to solvated oligomers) provide a concrete testbed for vibrational transferability that is currently lacking in the MLFF literature.

major comments (3)

[Abstract] Abstract: the claim that MLFFs reproduce experimental infrared spectra at near-DFT fidelity is load-bearing for the central conclusion, yet the text provides no quantitative DFT-vs-experiment error metrics (e.g., MAE on amide I/II frequencies or intensities) on the same systems used for MLFF evaluation. Without this benchmark, improvements over MM could be limited by systematic DFT errors in dispersion or anharmonicity rather than demonstrating experimental fidelity.
[Abstract] Abstract and methods (inferred from dataset description): no details are given on training/test splits, data exclusion criteria, or error bars for the reported improvements in forces, frequencies, and DOS. This absence prevents verification that the MLFF gains are not inflated by overfitting or cherry-picked conformers in QVib.
[Results (p53 and solvated sections)] Results on p53 oligomers and solvated systems: the assertion of environment-dependent vibrational landscapes at near-DFT fidelity rests on the assumption that the chosen DFT functional/basis is transferable; known DFT shortcomings in H-bonded charge transfer and dispersion could dominate the reported MLFF-MM differences, but no sensitivity analysis to functional choice is referenced.
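
The first major comment asks for error metrics such as the MAE on amide I/II frequencies, computed for each method against experiment on the same systems. A minimal sketch of that bookkeeping follows. All band positions are hypothetical placeholders, not values from the paper, and the function names are illustrative.

```python
def frequency_mae(reference_cm, predicted_cm):
    """Mean absolute error (cm^-1) between mode-matched frequency lists."""
    if len(reference_cm) != len(predicted_cm) or not reference_cm:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(p - r) for r, p in zip(reference_cm, predicted_cm)]
    return sum(diffs) / len(diffs)


def mean_signed_error(reference_cm, predicted_cm):
    """Mean signed error (cm^-1); a non-zero value indicates a systematic shift."""
    if len(reference_cm) != len(predicted_cm) or not reference_cm:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - r for r, p in zip(reference_cm, predicted_cm)]
    return sum(diffs) / len(diffs)


# Hypothetical amide I and amide II positions in cm^-1 (placeholders only).
experiment = [1650.0, 1550.0]
methods = {
    "dft": [1662.0, 1541.0],
    "mlff": [1665.0, 1538.0],
    "mm": [1690.0, 1510.0],
}
errors_vs_experiment = {
    name: frequency_mae(experiment, values) for name, values in methods.items()
}
```

Reporting the DFT row alongside the MLFF and MM rows is what separates "the model matches its reference" from "the reference matches experiment", which is the distinction the referee is asking for.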

minor comments (2)

[Notation] Notation for vibrational quantities (frequencies, DOS, eigenvectors) should be defined consistently in the main text rather than relying on supplementary material.
[Figures] Figure captions for IR spectra comparisons should explicitly state the DFT functional and basis set used for the reference data.
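
On the notation point, one quantity that benefits from an explicit definition is the agreement between mode eigenvectors. A common choice is the absolute overlap of normalized displacement vectors, which ignores the arbitrary sign of each eigenvector. The sketch below assumes that definition; the paper's own metric may differ, and the helper names are hypothetical.

```python
import math


def normalize(vector):
    """Return the vector scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def mode_overlap(mode_a, mode_b):
    """Absolute overlap in [0, 1] between two mode displacement vectors."""
    a, b = normalize(mode_a), normalize(mode_b)
    return abs(sum(x * y for x, y in zip(a, b)))


def best_matches(reference_modes, model_modes):
    """For each reference mode, the index and overlap of the closest model mode."""
    matches = []
    for ref in reference_modes:
        overlaps = [mode_overlap(ref, mode) for mode in model_modes]
        best = max(range(len(overlaps)), key=overlaps.__getitem__)
        matches.append((best, overlaps[best]))
    return matches
```

Matching modes by overlap before comparing frequencies avoids pairing two modes that happen to be close in frequency but describe different motions.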

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which helps clarify the scope and limitations of our claims. We respond to each major comment below and indicate planned revisions.

Point-by-point responses

Referee: [Abstract] Abstract: the claim that MLFFs reproduce experimental infrared spectra at near-DFT fidelity is load-bearing for the central conclusion, yet the text provides no quantitative DFT-vs-experiment error metrics (e.g., MAE on amide I/II frequencies or intensities) on the same systems used for MLFF evaluation. Without this benchmark, improvements over MM could be limited by systematic DFT errors in dispersion or anharmonicity rather than demonstrating experimental fidelity.

Authors: We agree the abstract phrasing could be tightened. The manuscript demonstrates that MLFFs achieve closer agreement with experimental IR spectra than MM does, mediated through improved fidelity to the DFT reference (frequencies, DOS, eigenvectors). However, we do not report direct quantitative DFT-vs-experiment MAEs on the identical QVib or peptide systems. We will revise the abstract and relevant results paragraphs to state explicitly that 'near-DFT fidelity' refers to agreement with the DFT calculations, while the experimental match is shown via superior alignment relative to MM. This addresses the concern without overstating experimental validation.
revision: partial

Referee: [Abstract] Abstract and methods (inferred from dataset description): no details are given on training/test splits, data exclusion criteria, or error bars for the reported improvements in forces, frequencies, and DOS. This absence prevents verification that the MLFF gains are not inflated by overfitting or cherry-picked conformers in QVib.

Authors: The evaluated MLFFs (including SO3LR) are general-purpose models pretrained on separate datasets and are not retrained or fine-tuned on QVib. QVib serves solely as an external benchmark for transferability. Consequently, no training/test splits or exclusion criteria apply to the MLFF evaluation itself. We will add explicit language in the Methods and Results sections stating this, together with error bars (standard deviations across conformers or bootstrap estimates) for the reported force, frequency, and DOS metrics to improve verifiability.
revision: yes

Referee: [Results (p53 and solvated sections)] Results on p53 oligomers and solvated systems: the assertion of environment-dependent vibrational landscapes at near-DFT fidelity rests on the assumption that the chosen DFT functional/basis is transferable; known DFT shortcomings in H-bonded charge transfer and dispersion could dominate the reported MLFF-MM differences, but no sensitivity analysis to functional choice is referenced.

Authors: We acknowledge that all MLFF–MM comparisons are performed against a single DFT reference (standard hybrid functional and basis set). Systematic DFT errors in dispersion or charge transfer could influence absolute values, though the relative MLFF vs. MM improvements remain internally consistent with that reference. A full sensitivity study across multiple functionals was not performed. We will add a paragraph in the Methods and a brief limitations discussion noting the functional choice and its known shortcomings, while emphasizing that the central result is the improved transferability of MLFFs to the chosen DFT level.
revision: partial
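
The second response promises error bars from standard deviations across conformers or bootstrap estimates. A minimal percentile-bootstrap sketch for the mean of per-conformer errors is shown below. The resample count, the confidence level, the seed, and the input values are illustrative assumptions, not settings from the paper.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Sample mean with a percentile-bootstrap confidence interval.

    Returns (mean, lower, upper) for a (1 - alpha) interval.
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2.0) * n_resamples)
    upper_index = min(n_resamples - 1, int((1.0 - alpha / 2.0) * n_resamples))
    return statistics.fmean(values), means[lower_index], means[upper_index]


# Hypothetical per-conformer frequency errors in cm^-1 (placeholders only).
per_conformer_error = [5.0, 7.0, 6.0, 8.0, 4.0, 6.5, 5.5, 7.5]
mean_error, ci_low, ci_high = bootstrap_mean_ci(per_conformer_error)
spread = statistics.stdev(per_conformer_error)  # standard deviation across conformers
```

The standard deviation describes the scatter between conformers, while the bootstrap interval describes the uncertainty of the reported mean; the two answer different questions and can reasonably be reported together.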

Circularity Check

0 steps flagged

No significant circularity; benchmarks are external

full rationale

The paper evaluates MLFF performance via direct comparison to independent DFT calculations and experimental IR spectra on the newly introduced QVib dataset, gas-phase peptides, and protein models. No claimed result reduces by the paper's equations to a quantity defined in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on a self-citation chain. All reported improvements (forces, frequencies, DOS, eigenvectors, conformer energies, spectra) are measured against external references, satisfying the self-contained criterion for a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the QVib dataset and the suitability of the chosen DFT level as reference; no new free parameters or invented entities are introduced by this benchmarking study.

axioms (1)

domain assumption: DFT calculations at the chosen level provide a reliable reference for molecular forces, energies, and vibrational properties.
The paper uses DFT results as the target for evaluating all force fields.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "machine-learned force fields substantially improve over molecular mechanics in reproducing DFT-level forces, vibrational frequencies, densities of states, mode eigenvectors, conformational energetics, and experimental infrared spectra"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "SO3LR model trained on PBE0+MBD calculations"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.