
arxiv: 2607.01362 · v1 · pith:OEACZTGCnew · submitted 2026-07-01 · ⚛️ physics.chem-ph · cs.LG · q-bio.BM

Enerzyme: A Framework for Efficient Training of Reactive Neural Network Potentials for Enzyme Catalysis with Application to Methyltransferases

Pith reviewed 2026-07-03 17:52 UTC · model grok-4.3

classification ⚛️ physics.chem-ph · cs.LG · q-bio.BM
keywords neural network potentials · enzyme catalysis · methyltransferases · QM cluster models · reaction energetics · atomic charges · transferability

The pith

Neural network potentials trained on under 1,000 system-specific points reproduce methyltransferase reaction energetics and transition-state structures with near-chemical accuracy in clusters up to 545 atoms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Enerzyme framework for training neural network potentials on quantum mechanical cluster models of S-adenosyl-L-methionine-dependent methyltransferases. It shows that these potentials, when trained on fewer than 1,000 system-specific reactive datapoints, match DFT-level reaction energetics and transition-state geometries with near-chemical accuracy despite large system sizes and charge-transfer effects. Direct supervision of atomic charges together with consistent dielectric screening improves simulation stability and accuracy, while multitask learning of charges yields chemically interpretable reactivity descriptors. The approach also shows transferability across chemically diverse catechol O-methyltransferase substrates as training data expand across multiple enzymes.

Core claim

NNPs trained on fewer than 1,000 system-specific datapoints reproduce reaction energetics and transition-state structures for MTase clusters containing up to 545 atoms with near-chemical accuracy. The ingredients are electrostatics-aware architectures, automated QM-cluster construction, and reactive dataset generation, with reaction paths explored through iterative flexible scans and nudged elastic band calculations.

What carries the argument

Modular electrostatics-aware NNP architectures combined with automated QM-cluster construction and reactive dataset generation that includes direct atomic-charge supervision and consistent dielectric screening.
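
To make "direct atomic-charge supervision" concrete, here is a minimal sketch of a multitask objective that penalizes errors in energy, forces, and atomic charges together. It is an illustration only: the function name `multitask_loss`, the dictionary keys, the weights, and the toy numbers are assumptions of this sketch, not the loss used in Enerzyme.

```python
import numpy as np

def multitask_loss(pred, ref, w_energy=1.0, w_force=1.0, w_charge=1.0):
    """Weighted sum of squared errors on energy, forces, and atomic charges.

    Hypothetical form for illustration; the paper's actual loss may differ.
    """
    e_term = (pred["energy"] - ref["energy"]) ** 2
    f_term = np.mean((pred["forces"] - ref["forces"]) ** 2)
    q_term = np.mean((pred["charges"] - ref["charges"]) ** 2)
    return w_energy * e_term + w_force * f_term + w_charge * q_term

# Toy three-atom system with made-up reference labels and predictions.
ref = {
    "energy": 0.0,
    "forces": np.zeros((3, 3)),
    "charges": np.array([0.5, -0.25, -0.25]),
}
pred = {
    "energy": 0.1,
    "forces": np.full((3, 3), 0.1),
    "charges": np.array([0.4, -0.2, -0.2]),
}

# A larger charge weight makes charge errors dominate the objective.
loss = multitask_loss(pred, ref, w_charge=10.0)
print(f"loss = {loss:.4f}")
```

Raising `w_charge` is one simple way to express that charges are supervised directly rather than left as latent quantities; how the authors balance the terms is not stated in the abstract.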

If this is right

  • Iterative flexible scans and nudged elastic band calculations impose stricter accuracy requirements on NNPs than conventional dataset error metrics.
  • Multitask-learned atomic charges capture charge-transfer and polarization trends and serve as chemically meaningful reactivity descriptors.
  • Transferability across chemically diverse catechol O-methyltransferase substrates improves as training data expand across multiple enzymes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training protocol could be applied to QM clusters of other enzyme families once comparable reactive datasets are generated.
  • The resulting NNPs could be coupled to larger-scale molecular dynamics to explore full-protein conformational effects on catalysis at reduced cost.
  • Performance on reactions that require explicit solvent molecules beyond the implicit dielectric model remains an open test.

Load-bearing premise

Automated QM-cluster construction and reactive dataset generation produce representative configurations that capture essential polarization, charge transfer, and solvent effects without missing critical reaction-path regions.

What would settle it

Large errors in NEB-computed barrier heights or transition-state bond lengths for an MTase reaction whose configurations were absent from the training set.
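
A check of this kind needs transition-state geometry descriptors that can be compared between DFT and the NNP. The Figure 4 caption names three: dSC, dCX, and ∠SCX. The sketch below computes them from Cartesian coordinates, assuming S is the sulfonium donor, C the transferred methyl carbon, X the nucleophilic acceptor, and the angle is taken at C. The function name and the coordinates are made up for illustration.

```python
import numpy as np

def ts_descriptors(s, c, x):
    """Return (d_SC, d_CX, angle_SCX in degrees) for three atomic positions.

    Assumes the angle S-C-X has its vertex at the methyl carbon C.
    """
    s, c, x = (np.asarray(v, dtype=float) for v in (s, c, x))
    v_cs, v_cx = s - c, x - c
    d_sc, d_cx = np.linalg.norm(v_cs), np.linalg.norm(v_cx)
    cos_angle = np.dot(v_cs, v_cx) / (d_sc * d_cx)
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return d_sc, d_cx, angle

# Made-up collinear geometry in angstrom, not taken from the paper.
d_sc, d_cx, angle = ts_descriptors([0.0, 0.0, 0.0], [2.3, 0.0, 0.0], [4.4, 0.0, 0.0])
print(f"d_SC = {d_sc:.2f} A, d_CX = {d_cx:.2f} A, angle_SCX = {angle:.1f} deg")
```

Evaluating the same three numbers on the DFT and NNP transition states gives a direct measure of the bond-length errors this test refers to.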

Figures

Figures reproduced from arXiv: 2607.01362 by Heather J. Kulik, Weiliang Luo.

Figure 1: An overview of data generation for SAM-MT enzyme cluster models. From a full SAM-MT structure in the PDB, we construct its cluster model by manual selection or an automated workflow with QuantumPDB. We use high-temperature steered MD to sample the configurations along reaction paths with efficient, reactive simulators, including semi-empirical QM methods or universal NNPs. The obtained structures are label…
Figure 2: An overview of studying reactivity in enzyme cluster models with electrostatics-aware NNPs. Left: We implemented NNPs in a modularized way in our Enerzyme package, where the key modules that equip an NNP with electrostatics are shown in blue. Right: We perform flexible scans or NEBs to study reactivity and evaluate NNPs using our Enerzymette workflow manager. A methyl-transfer reaction from an SAM sulfoniu…
Figure 4: NEB-estimated TS structures of the reaction center of methyl group transfer in HcgC and PfPMT cluster models by the reference DFT and Enerzyme-NNPs. The chemical structures of the methylated substrates in both enzymes are shown on the left with the transferred methyl group in green. dSC, dCX, and ∠SCX are the local geometry descriptors defined in…
Figure 5: Single-point energy evaluation of NNPs along DFT flexible scan trajectories in COMT, HcgC, and PfPMT cluster models. The COMT cluster comes from PDB ID 2ZVJ. The initial energies of different energy curves are aligned to be the same, and the root mean square errors (RMSE) are calculated between the energy curves of each NNP and the reference DFT after energy alignment. The reaction coordinate is the differ…
Figure 8: The correlation between the partial charge of the nucleophilic atom of COMT substrates and the reaction barrier and the reaction energy change. For all systems from 5 PDB IDs (Scheme 1), their reaction barriers and atomic charges obtained from reference DFT flexible scans and an Enerzyme NNP flexible scan with atomic charge prediction are shown with a legend corresponding to each system at the top left. Th…
Figure 9: Accuracy–cost trade-offs of Enerzyme-NNPs in iterative scan simulations on COMT cluster models. Left: The x-axis reports the total GPU hours of a complete Enerzyme-NNP workflow, while the y-axis reports the mean absolute error (MAE) of reaction energy barriers relative to the reference DFT calculations. Dashed and dotted vertical lines indicate the total GPU time cost of the corresponding iterative scans d…
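
The Figure 5 caption describes an evaluation in which each energy curve is shifted so that all curves share the same initial energy, and the RMSE against DFT is computed afterwards. A minimal sketch of that procedure follows. The function name `aligned_rmse` and the energy values are illustrative assumptions; the paper's scripts are not shown on this page.

```python
import numpy as np

def aligned_rmse(e_model, e_ref):
    """RMSE between two energy curves after shifting both to start at zero."""
    e_model = np.asarray(e_model, dtype=float)
    e_ref = np.asarray(e_ref, dtype=float)
    shifted_model = e_model - e_model[0]
    shifted_ref = e_ref - e_ref[0]
    return np.sqrt(np.mean((shifted_model - shifted_ref) ** 2))

# Made-up energies along one scan; the absolute offsets differ on purpose.
e_dft = [-100.0, -95.0, -85.0, -98.0]
e_nnp = [12.0, 18.0, 27.0, 14.0]

rmse = aligned_rmse(e_nnp, e_dft)
print(f"aligned RMSE = {rmse:.3f}")
```

Aligning on the first point removes any constant offset between methods, so the metric reflects only the shape of the profile, which is what determines barriers and reaction energies.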
Original abstract

Quantum mechanical (QM) cluster models provide an effective framework for mechanistic studies of enzymatic reactions but remain computationally demanding. Neural network potentials (NNPs) offer a promising route to reduce this cost, but enzymes present challenges beyond small molecules, including large system sizes, implicit-solvent environments, substantial polarization, and charge transfer. Here, we present an integrated software framework for efficient NNP training for mechanistic studies of enzymes, demonstrated on QM cluster models of S-adenosyl-L-methionine-dependent methyltransferases (MTases). Our Enerzyme code introduces modular electrostatics-aware NNP architectures and combines automated QM-cluster construction with reactive dataset generation. The Enerzymette subpackage automates reaction pathway exploration at both NNP and DFT levels. We show that iterative flexible scans and nudged elastic band calculations impose stricter requirements on NNPs than conventional dataset metrics. Nevertheless, NNPs trained on fewer than 1,000 system-specific datapoints reproduce reaction energetics and transition-state structures for MTase clusters containing up to 545 atoms with near-chemical accuracy. Direct supervision of atomic charges and consistent dielectric screening substantially improve simulation stability and accuracy, while multitask-learned atomic charges capture charge transfer and polarization trends and provide chemically meaningful descriptors of reactivity. Finally, transferability across chemically diverse catechol O-methyltransferase substrates indicates that NNPs learn generalizable reactivity patterns as training data expand across multiple enzymes. Together, these results establish a foundation for accelerating enzyme mechanistic studies and guide future NNP development for biomolecular reactivity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents Enerzyme, an integrated software framework for training electrostatics-aware neural network potentials (NNPs) on QM cluster models of enzymatic reactions, with application to S-adenosyl-L-methionine-dependent methyltransferases (MTases). Enerzyme combines automated QM-cluster construction with reactive dataset generation, and its Enerzymette subpackage automates reaction pathway exploration through iterative flexible scans and nudged elastic band (NEB) calculations at both NNP and DFT levels. The central claim is that NNPs trained on fewer than 1,000 system-specific datapoints achieve near-chemical accuracy in reproducing reaction energetics and transition-state structures for MTase clusters of up to 545 atoms. Direct charge supervision and consistent dielectric screening improve stability and accuracy, while multitask-learned atomic charges capture polarization and charge transfer. Transferability across chemically diverse substrates is also reported.

Significance. If the results hold, the work would be significant for computational enzymology by demonstrating a practical route to NNP-based modeling of large reactive biomolecular systems that incorporates polarization and charge transfer effects. The modular electrostatics-aware architectures, automated reactive dataset pipeline, and emphasis on stricter NEB/flexible-scan validation (rather than conventional metrics alone) address documented limitations of standard NNP training for enzymes. Explicit credit is due for the reproducible software framework and the demonstration that charge supervision yields chemically meaningful descriptors.

major comments (1)
  1. [Abstract] Abstract: the claim that NNPs trained on <1,000 system-specific datapoints reproduce energetics and TS structures 'with near-chemical accuracy' under NEB and iterative flexible-scan validation rests on the unquantified assumption that the automated QM-cluster construction and reactive dataset generation produce representative configurations. No coverage metric (e.g., fraction of reaction coordinate sampled or distance to nearest training point for held-out TS geometries) is supplied to confirm that critical polarization/charge-transfer regions are not omitted; this is load-bearing for the central claim in 545-atom clusters.
minor comments (1)
  1. [Abstract] Abstract: numerical error values, baseline comparisons, and validation statistics for the 'near-chemical accuracy' claim are not reported, making it difficult to assess the result against standard chemical accuracy thresholds (~1 kcal/mol).
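
The minor comment asks for numbers that can be set against the ~1 kcal/mol threshold. One way to produce them from scan profiles is sketched below. It assumes each profile starts at the reactant and ends at the product, and takes the barrier as the highest point relative to the first. The profiles, system names, and function name are made up for illustration and are not results from the paper.

```python
import numpy as np

CHEMICAL_ACCURACY = 1.0  # kcal/mol, the threshold quoted in the comment above

def barrier_and_reaction_energy(profile):
    """Barrier (max minus first point) and reaction energy (last minus first)."""
    profile = np.asarray(profile, dtype=float)
    return profile.max() - profile[0], profile[-1] - profile[0]

# Hypothetical energy profiles in kcal/mol for two systems.
dft_profiles = {
    "system_a": [0.0, 8.0, 17.5, 6.0, -12.0],
    "system_b": [0.0, 10.0, 21.0, 4.0, -9.0],
}
nnp_profiles = {
    "system_a": [0.0, 8.5, 18.0, 5.0, -11.0],
    "system_b": [0.0, 9.0, 20.0, 4.5, -10.0],
}

barrier_errors = []
for name, dft in dft_profiles.items():
    dft_barrier, _ = barrier_and_reaction_energy(dft)
    nnp_barrier, _ = barrier_and_reaction_energy(nnp_profiles[name])
    barrier_errors.append(abs(nnp_barrier - dft_barrier))

barrier_mae = float(np.mean(barrier_errors))
within_threshold = barrier_mae <= CHEMICAL_ACCURACY
print(f"barrier MAE = {barrier_mae:.2f} kcal/mol, within threshold: {within_threshold}")
```

Reporting the per-system errors alongside the mean would also show whether a single outlier is hidden by the average.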

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the single major comment below and will revise the manuscript accordingly to strengthen the presentation of our results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that NNPs trained on <1,000 system-specific datapoints reproduce energetics and TS structures 'with near-chemical accuracy' under NEB and iterative flexible-scan validation rests on the unquantified assumption that the automated QM-cluster construction and reactive dataset generation produce representative configurations. No coverage metric (e.g., fraction of reaction coordinate sampled or distance to nearest training point for held-out TS geometries) is supplied to confirm that critical polarization/charge-transfer regions are not omitted; this is load-bearing for the central claim in 545-atom clusters.

    Authors: We agree that explicit coverage metrics would make the central claim more robust. The Enerzymette pipeline is designed to sample the reaction coordinate via iterative flexible scans and NEB paths at both DFT and NNP levels, and the held-out TS validation already demonstrates reproduction of energetics and structures. Nevertheless, to directly address the concern, the revised manuscript will include quantitative coverage analysis: the fraction of the reaction coordinate spanned by training points and the minimum distance (in the NNP descriptor space) between held-out TS geometries and the nearest training configurations. These additions will confirm adequate sampling of polarization and charge-transfer regions. revision: yes
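
The two coverage metrics named in this exchange can be sketched as follows: the fraction of the reaction coordinate spanned by training points, and the minimum distance from a held-out geometry to the nearest training configuration. The binning scheme, the Euclidean metric, the function names, and the toy feature vectors are all assumptions of this sketch; the rebuttal does not specify how the authors would compute either quantity.

```python
import numpy as np

def coordinate_coverage(train_rc, rc_min, rc_max, n_bins=10):
    """Fraction of equal-width reaction-coordinate bins holding a training point."""
    counts, _ = np.histogram(train_rc, bins=n_bins, range=(rc_min, rc_max))
    return np.count_nonzero(counts) / n_bins

def nearest_training_distance(query, train_features):
    """Euclidean distance from one feature vector to its nearest training vector."""
    diffs = np.asarray(train_features, dtype=float) - np.asarray(query, dtype=float)
    return np.linalg.norm(diffs, axis=1).min()

# Made-up reaction-coordinate values with a gap between -1 and 0.
train_rc = [-1.8, -1.1, -1.5, 0.7, 1.2, 1.9]
coverage = coordinate_coverage(train_rc, -2.0, 2.0, n_bins=4)

# Made-up two-dimensional descriptors standing in for NNP features.
train_features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
ts_distance = nearest_training_distance([3.0, 0.0], train_features)

print(f"coverage = {coverage:.2f}, nearest-training distance = {ts_distance:.2f}")
```

A coverage below 1 flags empty stretches of the reaction coordinate, and a large nearest-training distance for a transition state would indicate that the model is extrapolating there.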

Circularity Check

0 steps flagged

No circularity; results rest on held-out validation of trained NNPs

Full rationale

The paper presents an empirical framework and training results for reactive NNPs on QM-cluster data for MTases. Reported accuracies on reaction energetics and TS structures are obtained via NEB and iterative flexible-scan validation on configurations generated separately from the training set. No equations, derivations, or self-citations reduce any claimed prediction to a fitted input by construction. The central claims are supported by external performance metrics on held-out paths rather than self-referential definitions or renamed fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all ledger entries are therefore empty.

pith-pipeline@v0.9.1-grok · 5816 in / 1111 out tokens · 22102 ms · 2026-07-03T17:52:02.058015+00:00 · methodology

