Training Hamiltonian neural networks without backpropagation
Pith reviewed 2026-05-23 16:30 UTC · model grok-4.3
The pith
Data-driven sampling of network parameters trains Hamiltonian neural networks without backpropagation. The paper reports training more than 100 times faster on CPUs and errors more than four orders of magnitude lower on chaotic systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Hamiltonian neural networks can be trained without backpropagation by sampling the network parameters with data-agnostic or data-driven algorithms. Data-driven sampling is reported to approximate dynamical systems more accurately than data-agnostic sampling or gradient-based training when the target functions have steep gradients or wide input domains, with measured speedups exceeding 100 times on CPUs and accuracy improvements of more than four orders of magnitude in chaotic regimes.
What carries the argument
Data-driven sampling of the network parameters, which selects weights and biases to match observed trajectories while preserving Hamiltonian structure without any gradient computation or iterative refinement.
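The idea can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a single hidden tanh layer, hidden weights built from uniformly chosen pairs of data points, a linear least-squares solve for the output weights, and one known value of the Hamiltonian to fix the additive constant. The function names `sample_hidden_layer` and `fit_hamiltonian` and the pendulum test problem are hypothetical choices for this example.

```python
import numpy as np

def sample_hidden_layer(x, width, rng):
    """Build each hidden neuron from a pair of data points (assumed scheme)."""
    n = x.shape[0]
    i = rng.integers(0, n, size=width)
    j = (i + rng.integers(1, n, size=width)) % n  # offset in [1, n-1], so j != i
    diff = x[j] - x[i]
    dist2 = np.sum(diff**2, axis=1, keepdims=True)
    s2 = 0.5 * np.log(3.0)  # tanh(s2) = 1/2
    s1 = 2.0 * s2
    w = s1 * diff / dist2                # neuron outputs -1/2 at x[i], +1/2 at x[j]
    b = -np.sum(w * x[i], axis=1) - s2
    return w, b

def fit_hamiltonian(x, xdot, width=200, seed=0):
    """Fit H(x) = tanh(x W^T + b) c + c0 from (q, p) states and their time derivatives."""
    rng = np.random.default_rng(seed)
    w, b = sample_hidden_layer(x, width, rng)
    act = np.tanh(x @ w.T + b)
    dact = 1.0 - act**2                  # analytic derivative of tanh, no autodiff
    dH_dq = dact * w[:, 0]               # columns are linear in the output weights c
    dH_dp = dact * w[:, 1]
    # Hamilton's equations: qdot = dH/dp, pdot = -dH/dq
    A = np.vstack([dH_dp, -dH_dq])
    y = np.concatenate([xdot[:, 0], xdot[:, 1]])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w, b, c

def true_H(x):
    """Single pendulum Hamiltonian, used here only to generate and check data."""
    return 0.5 * x[:, 1]**2 + (1.0 - np.cos(x[:, 0]))

data_rng = np.random.default_rng(1)
x_train = data_rng.uniform(-2.0, 2.0, size=(2000, 2))
xdot_train = np.column_stack([x_train[:, 1], -np.sin(x_train[:, 0])])

w, b, c = fit_hamiltonian(x_train, xdot_train)

# Fix the additive constant with one known value, H(0, 0) = 0 (an assumption).
x0 = np.zeros((1, 2))
c0 = true_H(x0)[0] - (np.tanh(x0 @ w.T + b) @ c)[0]

x_test = data_rng.uniform(-2.0, 2.0, size=(500, 2))
H_pred = np.tanh(x_test @ w.T + b) @ c + c0
max_err = np.max(np.abs(H_pred - true_H(x_test)))
print(f"max abs error in H on test points: {max_err:.2e}")
```

The only fitting step is one linear least-squares solve, which is what makes the procedure free of backpropagation in this sketch. A data-agnostic variant would replace `sample_hidden_layer` with draws from a fixed distribution that ignores `x`.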
If this is right
- Training time on standard CPUs drops by more than two orders of magnitude for the same Hamiltonian approximation task.
- Prediction error on chaotic orbits falls by more than four orders of magnitude relative to gradient-optimized networks.
- The method succeeds on input domains and gradient magnitudes where iterative optimization stalls or converges slowly.
- Hamiltonian structure is maintained by construction through the sampling process rather than through an explicit loss term.
- No backpropagation or automatic differentiation is required at any stage of parameter selection.
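One standard way to read "by construction" is the following short derivation, given here as a sketch. It assumes the learned vector field is defined as the symplectic gradient of a scalar network output $\hat{H}$; the symbols $\hat{H}$, $J$ and $x$ are notation introduced for this example.

```latex
\begin{align}
  \dot{x} &= J \, \nabla \hat{H}(x),
  \qquad
  J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix},
  \qquad
  x = (q, p), \\
  \frac{d}{dt} \hat{H}\bigl(x(t)\bigr)
  &= \nabla \hat{H}(x)^{\top} \dot{x}
   = \nabla \hat{H}(x)^{\top} J \, \nabla \hat{H}(x)
   = 0 ,
\end{align}
```

The last equality holds because $J^{\top} = -J$, so $v^{\top} J v = 0$ for every vector $v$. Note that this conserves the learned $\hat{H}$ along the exact flow of the learned field. How close $\hat{H}$ is to the true Hamiltonian still depends on the fit.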
Where Pith is reading between the lines
- The same sampling strategy might extend to other structure-preserving networks, such as those enforcing symplectic or energy-conserving constraints, without requiring new loss functions.
- If sampling distributions can be refined further from modest data, the approach could reduce the need for large training sets in physics-informed modeling.
- Performance on high-dimensional phase spaces remains untested and could reveal whether the sampling efficiency scales with dimension.
Load-bearing premise
That high-quality parameter values capable of preserving Hamiltonian structure and generalizing to new data can be found by direct sampling without any gradient information or iterative refinement, even when the loss landscape is difficult.
What would settle it
Running the data-driven sampler on the Hénon-Heiles system and measuring whether the resulting network's predicted trajectories deviate by more than 10^-4 from the true conserved quantities over long times, compared with a gradient-trained baseline.
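A minimal sketch of the measurement described above, assuming the standard Hénon-Heiles Hamiltonian with α = 1 and a leapfrog (Störmer-Verlet) integrator. The true gradient stands in for a learned one; the function names, the initial condition, and the step size are hypothetical choices for this example.

```python
import numpy as np

ALPHA = 1.0  # bifurcation parameter

def hamiltonian(q, p, alpha=ALPHA):
    """Henon-Heiles energy: kinetic + harmonic + cubic coupling."""
    return (0.5 * np.dot(p, p) + 0.5 * np.dot(q, q)
            + alpha * (q[0]**2 * q[1] - q[1]**3 / 3.0))

def grad_q(q, alpha=ALPHA):
    """Partial derivatives of H with respect to q1 and q2."""
    return np.array([q[0] + 2.0 * alpha * q[0] * q[1],
                     q[1] + alpha * (q[0]**2 - q[1]**2)])

def max_energy_drift(q0, p0, dt=0.01, steps=20000):
    """Integrate with leapfrog and return the largest |H(t) - H(0)| seen."""
    q = np.array(q0, dtype=float)
    p = np.array(p0, dtype=float)
    e0 = hamiltonian(q, p)
    worst = 0.0
    for _ in range(steps):
        p = p - 0.5 * dt * grad_q(q)   # half kick
        q = q + dt * p                 # drift (dH/dp = p)
        p = p - 0.5 * dt * grad_q(q)   # half kick
        worst = max(worst, abs(hamiltonian(q, p) - e0))
    return worst

# Initial energy is below the escape energy 1/6, so the orbit stays bounded.
drift = max_energy_drift([0.0, 0.1], [0.3, 0.0])
print(f"max energy drift over the run: {drift:.2e}")
```

To test a trained model, `grad_q` (and the `dH/dp = p` drift step) would be replaced by the gradients of the learned Hamiltonian, and the same drift would be computed for the gradient-trained baseline. This explicit leapfrog form assumes a separable Hamiltonian, which a generic learned network does not guarantee, so a different integrator may be needed in that case.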
Original abstract
Neural networks that synergistically integrate data and physical laws offer great promise in modeling dynamical systems. However, iterative gradient-based optimization of network parameters is often computationally expensive and suffers from slow convergence. In this work, we present a backpropagation-free algorithm to accelerate the training of neural networks for approximating Hamiltonian systems through data-agnostic and data-driven algorithms. We empirically show that data-driven sampling of the network parameters outperforms data-agnostic sampling or the traditional gradient-based iterative optimization of the network parameters when approximating functions with steep gradients or wide input domains. We demonstrate that our approach is more than 100 times faster with CPUs than the traditionally trained Hamiltonian Neural Networks using gradient-based iterative optimization and is more than four orders of magnitude more accurate in chaotic examples, including the Hénon-Heiles system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a backpropagation-free algorithm for training Hamiltonian neural networks via data-agnostic and data-driven sampling of network parameters. It claims empirical superiority of data-driven sampling over both data-agnostic sampling and traditional gradient-based optimization, with reported gains of >100x speed on CPUs and >4 orders of magnitude accuracy on functions with steep gradients, wide domains, and chaotic systems such as the Hénon-Heiles system.
Significance. If the central empirical claims hold under rigorous controls, the work could provide a practical route to faster training of structure-preserving neural networks for dynamical systems, particularly where gradient descent struggles with steep or chaotic landscapes. The absence of machine-checked proofs or parameter-free derivations means the significance rests entirely on the reproducibility and generality of the reported speed/accuracy improvements.
major comments (3)
- [Results / Experiments] The experimental comparisons (presumably in the results section) report >100x speed and 10^4 accuracy gains without error bars, multiple random seeds, or ablation on sampling hyperparameters; this leaves open whether the data-driven procedure reliably locates Hamiltonian-preserving solutions or simply benefits from favorable random draws in the tested low-dimensional cases.
- [Method] The description of the data-driven sampling algorithm does not specify its exact mechanism (e.g., how data is used to guide parameter selection without gradients or iterative refinement); without this, it is impossible to evaluate whether the method avoids the exponential improbability of hitting narrow high-quality basins in high-dimensional parameter space for steep-gradient or chaotic Hamiltonians.
- [Hénon-Heiles Experiments] For the Hénon-Heiles example, the paper should report long-term energy conservation errors and trajectory divergence metrics over integration times comparable to those used in gradient-trained HNN baselines; the current claims of four orders of magnitude accuracy improvement rest on short-term or single-trajectory comparisons whose controls are not visible.
minor comments (2)
- [Preliminaries] Notation for the sampling distributions and loss functions could be made more explicit to allow direct comparison with standard HNN formulations.
- [Abstract / Introduction] The abstract and introduction would benefit from a concise statement of the precise conditions under which data-driven sampling is expected to succeed versus fail.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment below and will revise the manuscript to incorporate the requested improvements.
- Referee: [Results / Experiments] The experimental comparisons (presumably in the results section) report >100x speed and 10^4 accuracy gains without error bars, multiple random seeds, or ablation on sampling hyperparameters; this leaves open whether the data-driven procedure reliably locates Hamiltonian-preserving solutions or simply benefits from favorable random draws in the tested low-dimensional cases.
Authors: We agree that the reported results would benefit from additional statistical controls. In the revised manuscript we will rerun all experiments over multiple random seeds, report means with error bars, and include ablations on sampling hyperparameters to demonstrate that the performance gains are reliable rather than due to favorable draws. revision: yes
- Referee: [Method] The description of the data-driven sampling algorithm does not specify its exact mechanism (e.g., how data is used to guide parameter selection without gradients or iterative refinement); without this, it is impossible to evaluate whether the method avoids the exponential improbability of hitting narrow high-quality basins in high-dimensional parameter space for steep-gradient or chaotic Hamiltonians.
Authors: We will expand the method section with a precise algorithmic description and pseudocode that details how the training data is used to evaluate and select parameter samples without gradients or iterative refinement. revision: yes
- Referee: [Hénon-Heiles Experiments] For the Hénon-Heiles example, the paper should report long-term energy conservation errors and trajectory divergence metrics over integration times comparable to those used in gradient-trained HNN baselines; the current claims of four orders of magnitude accuracy improvement rest on short-term or single-trajectory comparisons whose controls are not visible.
Authors: We will extend the Hénon-Heiles experiments in the revised manuscript to include long-term energy conservation errors and trajectory divergence metrics over integration times matching those in prior gradient-based HNN studies. revision: yes
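The statistical controls promised in the first response could be reported along the following lines. This is a sketch only: `summarize_over_seeds` is a hypothetical helper, and `toy_experiment` is a placeholder that returns synthetic errors, not results from the paper. Errors are aggregated in log10 space because they are expected to span several orders of magnitude.

```python
import numpy as np

def summarize_over_seeds(run_experiment, seeds):
    """Run one experiment per seed and summarize the errors in log10 space."""
    errors = np.array([run_experiment(seed) for seed in seeds], dtype=float)
    log_err = np.log10(errors)
    return {
        "n_seeds": len(errors),
        "mean_log10": float(log_err.mean()),
        "std_log10": float(log_err.std(ddof=1)),   # sample standard deviation
        "geometric_mean": float(10.0 ** log_err.mean()),
    }

def toy_experiment(seed):
    """Placeholder for 'train with this seed, return test error' (synthetic values)."""
    rng = np.random.default_rng(seed)
    return 10.0 ** rng.normal(-4.0, 0.5)

summary = summarize_over_seeds(toy_experiment, seeds=range(10))
print(summary)
```

In practice `toy_experiment` would be swapped for a function that trains one model with the given seed and returns its test error, and the same summary would be produced for each method and each sampling hyperparameter setting.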
Circularity Check
No circularity: empirical sampling comparison is independent of fitted inputs
The paper reports an empirical algorithm for training Hamiltonian neural networks via data-agnostic and data-driven parameter sampling, with performance claims (speed and accuracy gains) resting on direct experimental comparisons against gradient-based baselines on benchmark systems. No derivation chain, uniqueness theorem, ansatz, or prediction is presented that reduces by construction to quantities defined inside the paper; the central results are falsifiable measurements on held-out trajectories and do not invoke self-citations as load-bearing premises. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The target systems obey Hamiltonian dynamics and can be represented by a neural network whose architecture enforces energy conservation.
Appendix A Mathematical framework
Feed-forward neural networks: In this paper, we work with feed-forward neural networks configured for regression, i.e., no activation is used in the output layer to approximate a Hamiltonian. We define the ...
[...] $+\ \alpha\left(q_1^2 q_2 - \tfrac{1}{3} q_2^3\right)$, (B.9)
where we set the bifurcation parameter $\alpha = 1$ for the experiments in Table 2.
Single pendulum: We show that all the methods can reach very low approximation errors in Figure B.3 when approximating the single pendulum Hamiltonian.
[Figure B.3: horizontal axis "Network width", ticks from 100 to 2000; remaining plot content not recoverable.]