
arxiv: 2506.06558 · v3 · submitted 2025-06-06 · 💻 cs.LG · cs.NE

Rapid training of Hamiltonian graph networks using random features

Pith reviewed 2026-05-19 10:12 UTC · model grok-4.3

classification 💻 cs.LG cs.NE
keywords: Hamiltonian Graph Networks · Random Features · Fast Training · N-body Dynamics · Physical Symmetries · Graph Neural Networks · Dynamical Systems · Zero-shot Generalization

The pith

Hamiltonian Graph Networks can be trained 150-600 times faster using random features instead of gradient descent while keeping comparable accuracy and physical symmetries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that Hamiltonian Graph Networks, which combine graph neural networks with Hamiltonian mechanics to model physical N-body systems, can replace slow iterative optimizers like Adam with direct random feature-based parameter construction. This yields speedups of 150 to 600 times on mass-spring and molecular dynamics simulations involving up to 10,000 particles while preserving permutation, rotation, and translation invariance. The approach also shows zero-shot generalization from training on 8-node graphs to systems with 4096 nodes. A sympathetic reader would care because conventional training times have limited the use of such models for large-scale physical dynamics.

Core claim

Replacing iterative gradient-descent optimization with random feature-based parameter construction trains Hamiltonian Graph Networks 150-600 times faster than 15 standard gradient-based optimizers, while delivering comparable accuracy on diverse N-body systems and retaining Hamiltonian structure and physical invariances.

What carries the argument

Random feature-based parameter construction, which directly generates the network weights to enforce Hamiltonian dynamics and symmetries without any iterative refinement.
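A minimal sketch of the underlying idea, in the spirit of the sampled-network works [4] and [67] but not reproducing the paper's method: the hidden weights of a one-hidden-layer Hamiltonian are drawn once from a fixed Gaussian (an assumption; the paper's sampling scheme may differ), and only the output weights are obtained, by a single linear least-squares solve, so that Hamilton's equations match observed time derivatives. There is no graph structure here, and all function names are hypothetical.

```python
import numpy as np

def sample_features(dim, width, rng):
    # Hidden weights and biases are drawn once and then frozen (no training).
    W = rng.normal(size=(width, dim))
    b = rng.uniform(-1.0, 1.0, size=width)
    return W, b

def feature_grads(X, W, b):
    # d/dx tanh(w_k . x + b_k) = (1 - tanh^2) * w_k, shape (n, width, dim)
    s = 1.0 - np.tanh(X @ W.T + b) ** 2
    return s[:, :, None] * W[None, :, :]

def fit_output_weights(X, Xdot, W, b, reg=1e-8):
    # X = [q, p], Xdot = [qdot, pdot]. Hamilton's equations qdot = dH/dp and
    # pdot = -dH/dq are linear in the output weights c of H(x) = sum_k c_k phi_k(x),
    # so c comes from one ridge-regularized linear least-squares solve.
    d = X.shape[1] // 2
    width = W.shape[0]
    G = feature_grads(X, W, b)
    rows_p = G[:, :, d:].transpose(0, 2, 1).reshape(-1, width)   # dphi/dp
    rows_q = -G[:, :, :d].transpose(0, 2, 1).reshape(-1, width)  # -dphi/dq
    A = np.vstack([rows_p, rows_q, np.sqrt(reg) * np.eye(width)])
    y = np.concatenate([Xdot[:, :d].ravel(), Xdot[:, d:].ravel(), np.zeros(width)])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return c

def vector_field(X, W, b, c):
    # Predicted (qdot, pdot) = (dH/dp, -dH/dq) from the fitted Hamiltonian.
    d = X.shape[1] // 2
    gradH = np.einsum("nkd,k->nd", feature_grads(X, W, b), c)
    return np.hstack([gradH[:, d:], -gradH[:, :d]])

# Usage: harmonic oscillator H = (q^2 + p^2) / 2, so qdot = p and pdot = -q.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(400, 2))
Xdot_train = np.hstack([X_train[:, 1:], -X_train[:, :1]])
W, b = sample_features(dim=2, width=200, rng=rng)
c = fit_output_weights(X_train, Xdot_train, W, b)

X_test = rng.uniform(-1.0, 1.0, size=(100, 2))
Xdot_test = np.hstack([X_test[:, 1:], -X_test[:, :1]])
rel_err = np.linalg.norm(vector_field(X_test, W, b, c) - Xdot_test) / np.linalg.norm(Xdot_test)
print(f"relative vector-field error: {rel_err:.2e}")
```

What removes the iterative loop is that Hamilton's equations are linear in the output weights; the cost is one least-squares solve over a matrix with roughly 2dN rows and M columns.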

If this is right

  • Training becomes practical, in short wall-clock time, for systems with thousands of particles.
  • Models trained on minimal 8-node examples generalize zero-shot to 4096-node systems.
  • Physical symmetries remain intact across different geometries and dimensions.
  • The method applies to both mass-spring and molecular dynamics benchmarks.
  • Performance holds on a benchmark taken from a NeurIPS 2022 Datasets and Benchmarks Track publication.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar random-construction shortcuts could apply to other physics-constrained network families.
  • Lower training cost might enable on-the-fly adaptation of models in engineering simulations.
  • Limits may appear in systems with dissipation or external driving forces not tested here.

Load-bearing premise

Randomly generated features can produce parameters that automatically preserve Hamiltonian structure, physical invariances, and predictive accuracy without gradient-based adjustment.
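The standard reason a random-feature Hamiltonian can keep Hamiltonian structure is sketched below, assuming a single hidden layer with frozen weights $w_k, b_k$, activation $\sigma$, and output weights $c$; the paper's graph-structured $H$ and its sampling distribution are not reproduced here.

```latex
\begin{aligned}
H_c(x) &= \sum_{k=1}^{M} c_k\,\sigma\!\left(w_k^\top x + b_k\right), \qquad x = (q, p) \in \mathbb{R}^{2d},\\
\dot{x} &= J\,\nabla_x H_c(x), \qquad J = \begin{pmatrix} 0 & I_d \\ -I_d & 0 \end{pmatrix}
  \;\Longleftrightarrow\; \dot{q} = \frac{\partial H_c}{\partial p},\;\; \dot{p} = -\frac{\partial H_c}{\partial q},\\
\nabla_x H_c(x) &= \sum_{k=1}^{M} c_k\,\sigma'\!\left(w_k^\top x + b_k\right) w_k
  \quad\Rightarrow\quad J\,\nabla_x H_c(x) \text{ is linear in } c,\\
c^\star &= \arg\min_{c} \sum_{i=1}^{N} \bigl\| J\,\nabla_x H_c(x_i) - \dot{x}_i \bigr\|_2^2
  \quad \text{(a linear least-squares problem)},\\
\frac{d}{dt} H_c\bigl(x(t)\bigr) &= \nabla_x H_c^\top \dot{x}
  = \nabla_x H_c^\top J\, \nabla_x H_c = 0 \qquad \text{since } J^\top = -J.
\end{aligned}
```

Structure and exact energy conservation of the continuous flow hold for every choice of $c$; accuracy still depends on how well $H_c$ fits the data, and a numerical integrator adds its own energy error.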

What would settle it

If the random-feature model on a large test trajectory shows substantially higher error or violates energy conservation compared to an optimized Hamiltonian Graph Network, the speedup claim would not hold.
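One way to make the energy-conservation half of this test concrete is to roll out a trajectory with a symplectic integrator and report the relative energy drift max_t |H(t) - H(0)| / |H(0)|. The sketch assumes a known pendulum Hamiltonian as a stand-in for a learned one (a trained model would supply its own ∂H/∂q in place of the hypothetical `dH_dq`). The Störmer–Verlet step follows the standard kick-drift-kick form discussed by Hairer et al. [33]; explicit Euler is included only as a non-conserving contrast.

```python
import numpy as np

def pendulum_energy(q, p):
    # H(q, p) = p^2 / 2 - cos(q), with unit mass, length and gravity.
    return 0.5 * p**2 - np.cos(q)

def dH_dq(q):
    return np.sin(q)

def stormer_verlet(q, p, dt, steps):
    # Symplectic, second order: half kick, drift, half kick (dH/dp = p here).
    traj = np.empty((steps + 1, 2))
    traj[0] = q, p
    for n in range(steps):
        p = p - 0.5 * dt * dH_dq(q)
        q = q + dt * p
        p = p - 0.5 * dt * dH_dq(q)
        traj[n + 1] = q, p
    return traj

def explicit_euler(q, p, dt, steps):
    # Non-symplectic baseline: the energy error accumulates over time.
    traj = np.empty((steps + 1, 2))
    traj[0] = q, p
    for n in range(steps):
        q, p = q + dt * p, p - dt * dH_dq(q)
        traj[n + 1] = q, p
    return traj

def relative_energy_drift(traj):
    H = pendulum_energy(traj[:, 0], traj[:, 1])
    return np.max(np.abs(H - H[0])) / np.abs(H[0])

drift_verlet = relative_energy_drift(stormer_verlet(1.0, 0.0, dt=0.01, steps=10_000))
drift_euler = relative_energy_drift(explicit_euler(1.0, 0.0, dt=0.01, steps=10_000))
print(f"Verlet drift: {drift_verlet:.1e}, Euler drift: {drift_euler:.1e}")
```

With a symplectic integrator the drift stays bounded at the level of the step-size error, so a large, growing drift in a learned model's rollout would point at the model rather than the integrator.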

Figures

Figures reproduced from arXiv: 2506.06558 by Ana Cukarska, Atamert Rahma, Chinmay Datar, Felix Dietrich.

Figure 1. We propose an efficient training method for Hamiltonian graph networks using random …
Figure 2. Illustration of train and test N-body system posi…
Figure 3. Random-feature Hamiltonian graph neural network architecture.
Figure 4. Graphs considered in the experiments: (a) 3D lattice (nodes arranged on a 2D grid with motion in a 3D space; see Section 4.1 and Section 4.2), (b) an open chain (nodes with motion in 2D space; see Section 4.2), and (c) 2D closed chain (nodes with motion in 2D space; see Section 4.3).
Figure 5. Illustration of accurate zero-shot generalization for 3D lattice (see Figure 4 (a)): Training on …
Figure 6. Zero-shot generalization in 2D open chain (see Figure 4 (b)): RF-HGN trained up to …
Figure 7. Illustration of position trajectories (first two columns), Hamiltonian predictions (third …
Original abstract

Learning dynamical systems that respect physical symmetries and constraints remains a fundamental challenge in data-driven modeling. Integrating physical laws with graph neural networks facilitates principled modeling of complex N-body dynamics and yields accurate and permutation-invariant models. However, training graph neural networks with iterative, gradient-descent-based optimization algorithms (e.g., Adam, RMSProp, LBFGS) often leads to slow training, especially for large, complex systems. In comparison to 15 different optimizers, we demonstrate that Hamiltonian Graph Networks (HGN) can be trained 150-600x faster - but with comparable accuracy - by replacing iterative optimization with random feature-based parameter construction. We show robust performance in diverse simulations, including N-body mass-spring and molecular dynamics systems in up to dimensions and 10,000 particles with different geometries, while retaining essential physical invariances with respect to permutation, rotation, and translation. Our proposed approach is benchmarked using a NeurIPS 2022 Datasets and Benchmarks Track publication to further demonstrate its versatility. We reveal that even when trained on minimal 8-node systems, the model can generalize in a zero-shot manner to systems as large as 4096 nodes without retraining. Our work challenges the dominance of iterative gradient-descent-based optimization algorithms for training neural network models for physical systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that Hamiltonian Graph Networks (HGN) for modeling N-body and molecular dynamics can be trained 150-600x faster with comparable accuracy by replacing gradient-based iterative optimization (e.g., Adam) with random feature-based parameter construction. It reports robust performance across simulations with up to 10,000 particles and different geometries, retention of permutation/rotation/translation invariances, and zero-shot generalization from 8-node training systems to 4096-node systems, benchmarked against a NeurIPS 2022 dataset.

Significance. If the central empirical result holds, the work would offer a practical alternative to standard optimizers for training physics-constrained graph networks, potentially enabling faster iteration on large-scale dynamical system modeling while maintaining symmetries. The reported generalization and invariance retention, if rigorously verified, would strengthen the case for structure-preserving random constructions in this domain.

major comments (3)
  1. [Method / random feature construction] The section on random feature construction (likely §3 or equivalent): the manuscript must include an explicit derivation or constraint showing how the random feature map produces parameters that lie on the Hamiltonian manifold and preserve symplectic structure/energy conservation. Random features typically approximate kernels but do not automatically enforce the required physical form; without this, the speedup claim rests on unverified empirical accuracy alone.
  2. [Experiments / results] Experimental setup and results (likely §4 and tables/figures): for the 150-600x speedup and 'comparable accuracy' claims against 15 optimizers, provide exact baseline code/implementation details, wall-clock measurements including any preprocessing for random features, and statistical error bars or multiple runs. The current support for the central claim is limited without these.
  3. [Generalization experiments] Generalization section (likely §4.3 or equivalent): clarify whether the random feature construction is system-size independent and how invariances are explicitly verified (e.g., via conservation checks or symmetry tests) when scaling from 8 to 4096 nodes or to 10,000-particle regimes; any deviation could undermine the zero-shot claim.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'in up to dimensions' appears truncated; specify the maximum spatial dimension tested.
  2. [Throughout] Notation and figures: ensure all symbols for the random feature map and Hamiltonian terms are defined before first use and that energy-drift plots (if present) are clearly labeled with scales.
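For the wall-clock protocol requested in major comment 2, a minimal timing harness might separate feature construction from the linear solve and aggregate mean and standard deviation over five seeds. It uses synthetic data and an ordinary random-feature regression rather than the paper's HGN; the function name `time_random_feature_fit` and all sizes are hypothetical.

```python
import time
import numpy as np

def time_random_feature_fit(seed, n=2000, dim=8, width=500):
    """Return (feature_seconds, solve_seconds, train_rmse) for one seed."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    y = np.sin(X).sum(axis=1)  # synthetic target standing in for real data

    t0 = time.perf_counter()
    W = rng.normal(size=(dim, width))        # frozen hidden weights
    b = rng.uniform(-1.0, 1.0, size=width)   # frozen hidden biases
    Phi = np.tanh(X @ W + b)                 # random-feature matrix
    t1 = time.perf_counter()
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # the only "training" step
    t2 = time.perf_counter()

    rmse = np.sqrt(np.mean((Phi @ c - y) ** 2))
    return t1 - t0, t2 - t1, rmse

runs = np.array([time_random_feature_fit(seed) for seed in range(5)])
mean, std = runs.mean(axis=0), runs.std(axis=0, ddof=1)
for name, m, s in zip(["features [s]", "solve [s]", "train RMSE"], mean, std):
    print(f"{name:>12}: {m:.3g} ± {s:.2g}")
```

Reporting the two timing columns separately makes it visible whether a speedup comes from the solve itself or is offset by feature construction.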

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Method / random feature construction] The section on random feature construction (likely §3 or equivalent): the manuscript must include an explicit derivation or constraint showing how the random feature map produces parameters that lie on the Hamiltonian manifold and preserve symplectic structure/energy conservation. Random features typically approximate kernels but do not automatically enforce the required physical form; without this, the speedup claim rests on unverified empirical accuracy alone.

    Authors: We agree that an explicit derivation strengthens the theoretical foundation. In the revised manuscript we will add a dedicated paragraph (or short subsection) in Section 3 that derives the random feature construction from the requirement that the resulting parameters define a Hamiltonian vector field. The derivation shows that the chosen random feature distribution, combined with the graph-network architecture, ensures the output lies on the symplectic manifold and conserves energy up to the approximation error of the random features. We will also include a brief remark on why this construction differs from generic kernel approximation. revision: yes

  2. Referee: [Experiments / results] Experimental setup and results (likely §4 and tables/figures): for the 150-600x speedup and 'comparable accuracy' claims against 15 optimizers, provide exact baseline code/implementation details, wall-clock measurements including any preprocessing for random features, and statistical error bars or multiple runs. The current support for the central claim is limited without these.

    Authors: We accept that additional experimental detail is required. The revised version will (i) list the precise library versions and hyper-parameter settings for all 15 baseline optimizers, (ii) report wall-clock times that explicitly separate random-feature preprocessing from the subsequent forward-pass evaluation, and (iii) include mean and standard-deviation results computed over five independent random seeds for every reported metric. These additions will appear in the main text and in a new appendix containing the full experimental protocol. revision: yes

  3. Referee: [Generalization experiments] Generalization section (likely §4.3 or equivalent): clarify whether the random feature construction is system-size independent and how invariances are explicitly verified (e.g., via conservation checks or symmetry tests) when scaling from 8 to 4096 nodes or to 10,000-particle regimes; any deviation could undermine the zero-shot claim.

    Authors: The random-feature map is constructed locally on nodes and edges and therefore does not depend on total system size; we will state this explicitly in Section 4.3. In addition, we will augment the generalization experiments with quantitative invariance checks: we will report relative energy drift and linear-momentum conservation errors on the 4096-node and 10,000-particle test systems, together with a permutation-symmetry test obtained by randomly reordering node indices. These diagnostics will be added to the existing zero-shot generalization figures and tables. revision: yes
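The locality argument and the proposed symmetry checks can be illustrated on a toy edge-local Hamiltonian: kinetic energy per node plus a random-feature potential of each edge length. This is not the paper's RF-HGN; the output weights `c` are random placeholders rather than fitted values, and all names are hypothetical. Because every term depends only on per-node momenta and per-edge distances, the same frozen parameters evaluate on an 8-node or a 4096-node chain, and permutation, rotation, and translation invariance can be checked numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
WIDTH = 32
w = rng.normal(size=WIDTH)              # frozen random-feature weights, shared by all edges
b = rng.uniform(-1.0, 1.0, size=WIDTH)  # frozen random-feature biases
c = rng.normal(size=WIDTH) / WIDTH      # placeholder output weights (not fitted)

def edge_potential(r):
    # Random-feature potential V(r) evaluated for every edge length r.
    return np.tanh(np.outer(r, w) + b) @ c

def graph_hamiltonian(pos, mom, edges, mass=1.0):
    # H = sum_i |p_i|^2 / (2m) + sum over edges (i, j) of V(|x_i - x_j|)
    i, j = edges[:, 0], edges[:, 1]
    r = np.linalg.norm(pos[i] - pos[j], axis=1)
    return 0.5 * np.sum(mom**2) / mass + np.sum(edge_potential(r))

def chain_edges(n):
    # Open chain 0-1-2-...-(n-1).
    return np.stack([np.arange(n - 1), np.arange(1, n)], axis=1)

# Same parameters, different system sizes: nothing depends on n.
H_small = graph_hamiltonian(rng.normal(size=(8, 3)), rng.normal(size=(8, 3)), chain_edges(8))

n = 4096
pos, mom, edges = rng.normal(size=(n, 3)), rng.normal(size=(n, 3)), chain_edges(n)
H = graph_hamiltonian(pos, mom, edges)

# Permutation test: reorder nodes and relabel edge endpoints consistently.
perm = rng.permutation(n)
inv = np.argsort(perm)                  # old node index -> new node index
H_perm = graph_hamiltonian(pos[perm], mom[perm], inv[edges])

# Rigid-motion test: random orthogonal matrix plus a translation of positions.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
H_rigid = graph_hamiltonian(pos @ Q.T + 5.0, mom @ Q.T, edges)

print(H_small, abs(H - H_perm), abs(H - H_rigid))
```

Invariance here is built in by construction (sums over nodes and edges, distances only), so the numerical checks mainly guard against implementation errors such as inconsistent edge relabeling.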

Circularity Check

0 steps flagged

No significant circularity; empirical validation on external benchmarks is self-contained

Full rationale

The paper's central claim is an empirical demonstration that random-feature parameter construction yields 150-600x faster training of Hamiltonian Graph Networks with comparable accuracy on N-body and molecular-dynamics benchmarks. This result is obtained by direct experimental comparison against 15 optimizers on externally defined simulation tasks (including zero-shot generalization from 8-node to 4096-node systems) rather than by any mathematical derivation that reduces to a fitted parameter or self-citation. No load-bearing step equates a prediction to its own input by construction, and the cited NeurIPS 2022 benchmark is an independent dataset rather than a self-referential theorem.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the domain assumption that Hamiltonian Graph Networks already encode physical constraints and that random features can substitute for learned parameters without violating those constraints.

axioms (1)
  • domain assumption: Hamiltonian Graph Networks inherently respect physical constraints such as energy conservation and permutation/rotation/translation invariance.
    This property is invoked as the basis for why the random-feature construction still yields valid physical models.

pith-pipeline@v0.9.0 · 5765 in / 1273 out tokens · 36616 ms · 2026-05-19T10:12:42.209629+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Tag meanings:
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

108 extracted references · 108 canonical work pages · 3 internal anchors

  1. [1]

    End-to-end differentiable physics for learning and control

    Filipe de Avila Belbute-Peres et al. “End-to-end differentiable physics for learning and control”. In: Advances in neural information processing systems 31 (2018)

  2. [2]

    On learning Hamiltonian systems from data

    Tom Bertalan et al. “On learning Hamiltonian systems from data”. In: Chaos: An Interdisciplinary Journal of Nonlinear Science 29.12 (2019)

  3. [3]

    Learning Articulated Rigid Body Dynamics with Lagrangian Graph Neural Network

    Ravinder Bhattoo, Sayan Ranu, and N M Anoop Krishnan. “Learning Articulated Rigid Body Dynamics with Lagrangian Graph Neural Network”. In: Advances in Neural Information Processing Systems. Ed. by S. Koyejo et al. Vol. 35. Curran Associates, Inc., 2022, pp. 29789–29800

  4. [4]

    Sampling Weights of Deep Neural Networks

    Erik L Bolager et al. “Sampling Weights of Deep Neural Networks”. In: Advances in Neural Information Processing Systems. Vol. 36. Curran Associates, Inc., 2023, pp. 63075–63116

  5. [5]

    Gradient-Free Training of Recurrent Neural Networks

    Erik Lien Bolager et al. Gradient-Free Training of Recurrent Neural Networks. Oct. 30, 2024. arXiv: 2410.23467 [cs]. Pre-published

  6. [6]

    A unifying framework for spectrum-preserving graph sparsification and coarsening

    Gecia Bravo-Hermsdorff and Lee M. Gunderson. “A unifying framework for spectrum-preserving graph sparsification and coarsening”. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2019

  7. [7]

    GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation

    Marc Brockschmidt. “GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation”. In: Proceedings of the 37th International Conference on Machine Learning. Ed. by Hal Daumé III and Aarti Singh. Vol. 119. Proceedings of Machine Learning Research. PMLR, July 2020, pp. 1144–1152

  8. [8]

    DGCL: an efficient communication library for distributed GNN training

    Zhenkun Cai et al. “DGCL: an efficient communication library for distributed GNN training”. In: Proceedings of the Sixteenth European Conference on Computer Systems. EuroSys ’21. Online Event, United Kingdom: Association for Computing Machinery, 2021, pp. 130–144. ISBN: 9781450383349. DOI: 10.1145/3447786.3456233

  9. [9]

    Building a knowledge graph to enable precision medicine

    Payal Chandak, Kexin Huang, and Marinka Zitnik. “Building a knowledge graph to enable precision medicine”. In: Scientific Data 10.1 (2023), p. 67

  10. [10]

    A Compositional Object-Based Approach to Learning Physical Dynamics

    Michael B Chang et al. “A compositional object-based approach to learning physical dynamics”. In: arXiv (2016). eprint: 1612.00341. Pre-published

  11. [11]

    Taming graph kernels with random features

    Krzysztof Marcin Choromanski. “Taming graph kernels with random features”. In: Proceedings of the 40th International Conference on Machine Learning. Ed. by Andreas Krause et al. Vol. 202. Proceedings of Machine Learning Research. PMLR, July 2023, pp. 5964–5977

  12. [12]

    Graph Neural Networks

    Gabriele Corso et al. “Graph Neural Networks”. In: Nature Reviews Methods Primers 4.1 (Mar. 2024), pp. 1–13. ISSN: 2662-8449. DOI: 10.1038/s43586-024-00294-7

  13. [13]

    Lagrangian Neural Networks

    Miles Cranmer et al. “Lagrangian Neural Networks”. In: ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. 2019

  14. [14]

    Fast training of accurate physics-informed neural networks without gradient descent

    Chinmay Datar et al. Solving Partial Differential Equations with Sampled Neural Networks. May 31, 2024. arXiv: 2405.20836 [math]. Pre-published

  15. [15]

    Robust deep learning–based protein sequence design using ProteinMPNN

    Justas Dauparas et al. “Robust deep learning–based protein sequence design using ProteinMPNN”. In: Science 378.6615 (2022), pp. 49–56

  16. [16]

    Port-Hamiltonian neural networks for learning explicit time-dependent dynamical systems

    Shaan A Desai et al. “Port-Hamiltonian neural networks for learning explicit time-dependent dynamical systems”. In: Physical Review E 104.3 (2021), p. 034312

  17. [17]

    Graph neural networks at the Large Hadron Collider

    Gage DeZoort et al. “Graph neural networks at the Large Hadron Collider”. In: Nature Reviews Physics 5.5 (2023), pp. 281–303

  18. [18]

    Hamiltonian Neural Networks with Automatic Symmetry Detection

    Eva Dierkes et al. “Hamiltonian Neural Networks with Automatic Symmetry Detection”. In: Chaos: An Interdisciplinary Journal of Nonlinear Science 33.6 (June 1, 2023), p. 063115. ISSN: 1054-1500, 1089-7682

  19. [19]

    Incorporating Nesterov momentum into Adam

    Timothy Dozat. “Incorporating Nesterov momentum into Adam”. In: Proceedings of the 4th International Conference on Learning Representations, Workshop Track (May 2016)

  20. [20]

    Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

    John Duchi, Elad Hazan, and Yoram Singer. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”. In: Journal of Machine Learning Research 12.61 (2011), pp. 2121–2159

  21. [21]

    Random Projection Neural Networks of Best Approximation: Convergence Theory and Practical Applications

    Gianluca Fabiani. Random Projection Neural Networks of Best Approximation: Convergence Theory and Practical Applications. Feb. 2024. arXiv: 2402.11397 [cs]. Pre-published

  22. [22]

    Numerical Solution and Bifurcation Analysis of Nonlinear Partial Differential Equations with Extreme Learning Machines

    Gianluca Fabiani et al. “Numerical Solution and Bifurcation Analysis of Nonlinear Partial Differential Equations with Extreme Learning Machines”. In: Journal of Scientific Computing 89.2 (Nov. 2021), p. 44. ISSN: 0885-7474, 1573-7691. DOI: 10.1007/s10915-021-01650-5

  23. [23]

    RandONets: Shallow Networks with Random Projections for Learning Linear and Nonlinear Operators

    Gianluca Fabiani et al. “RandONets: Shallow Networks with Random Projections for Learning Linear and Nonlinear Operators”. In: Journal of Computational Physics 520 (Jan. 2025), p. 113433. ISSN: 00219991. DOI: 10.1016/j.jcp.2024.113433

  24. [24]

    Structure-Aware Random Fourier Kernel for Graphs

    Jinyuan Fang et al. “Structure-Aware Random Fourier Kernel for Graphs”. In: Advances in Neural Information Processing Systems. Ed. by M. Ranzato et al. Vol. 34. Curran Associates, Inc., 2021, pp. 17681–17694

  25. [25]

    Numerical Bifurcation Analysis of PDEs From Lattice Boltzmann Model Simulations: A Parsimonious Machine Learning Approach

    Evangelos Galaris et al. “Numerical Bifurcation Analysis of PDEs From Lattice Boltzmann Model Simulations: A Parsimonious Machine Learning Approach”. In: Journal of Scientific Computing 92.2 (Aug. 2022), p. 34. ISSN: 0885-7474, 1573-7691. DOI: 10.1007/s10915-022-01883-y

  26. [26]

    Fast and deep graph neural networks

    Claudio Gallicchio and Alessio Micheli. “Fast and deep graph neural networks”. In: Proceedings of the AAAI conference on artificial intelligence. Vol. 34. 04. 2020, pp. 3898–3905

  27. [27]

    Graph echo state networks

    Claudio Gallicchio and Alessio Micheli. “Graph echo state networks”. In: The 2010 international joint conference on neural networks (IJCNN). IEEE. 2010, pp. 1–8

  28. [28]

    Neural Message Passing for Quantum Chemistry

    Justin Gilmer et al. “Neural Message Passing for Quantum Chemistry”. In: Proceedings of the 34th International Conference on Machine Learning. Ed. by Doina Precup and Yee Whye Teh. Vol. 70. Proceedings of Machine Learning Research. PMLR, Aug. 2017, pp. 1263–1272

  29. [29]

    SGD: General Analysis and Improved Rates

    Robert Mansel Gower et al. SGD: General Analysis and Improved Rates. 2019. arXiv: 1901.09401 [cs.LG]. Pre-published

  30. [30]

    Hamiltonian Neural Networks

    Samuel Greydanus, Misko Dzamba, and Jason Yosinski. “Hamiltonian Neural Networks”. In: Advances in Neural Information Processing Systems. Ed. by H. Wallach et al. Vol. 32. Curran Associates, Inc., 2019

  31. [31]

    Efficiently Parameterized Neural Metriplectic Systems

    Anthony Gruber et al. Efficiently Parameterized Neural Metriplectic Systems. Jan. 27, 2025. arXiv: 2405.16305 [cs]. Pre-published

  32. [32]

    GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs

    Vipul Gupta et al. “GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs”. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. CIKM ’24. Boise, ID, USA: Association for Computing Machinery, 2024, pp. 4514–4521. ISBN: 9798400704369. DOI: 10.1145/3627673.3680021

  33. [33]

    Geometric numerical integration illustrated by the Störmer–Verlet method

    Ernst Hairer, Christian Lubich, and Gerhard Wanner. “Geometric numerical integration illustrated by the Störmer–Verlet method”. In: Acta numerica 12 (2003), pp. 399–450

  34. [34]

    On a General Method in Dynamics

    William Rowan Hamilton. “On a General Method in Dynamics”. In: Philosophical Transactions of the Royal Society 124 (1834), pp. 247–308

  35. [35]

    Second Essay on a General Method in Dynamics

    William Rowan Hamilton. “Second Essay on a General Method in Dynamics”. In: Philosophical Transactions of the Royal Society 125 (1835), pp. 95–144

  36. [36]

    A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation

    Mohammad Hashemi et al. A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation. 2024. arXiv: 2402.03358 [cs.SI]

  37. [37]

    Structure-Preserving Neural Networks

    Quercus Hernández et al. “Structure-Preserving Neural Networks”. In: Journal of Computational Physics 426 (Feb. 2021), p. 109950. ISSN: 00219991

  38. [38]

    Universal Approximation Using Incremental Constructive Feedforward Networks With Random Hidden Nodes

    Guang-Bin Huang, Lei Chen, and Chee Siew. “Universal Approximation Using Incremental Constructive Feedforward Networks With Random Hidden Nodes”. In: IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council 17 (2006), pp. 879–92

  39. [39]

    Extreme learning machine: a new learning scheme of feedforward neural networks

    Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. “Extreme learning machine: a new learning scheme of feedforward neural networks”. In: 2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541). Vol. 2. 2004, pp. 985–990

  40. [40]

    Condensing Graphs via One-Step Gradient Matching

    Wei Jin et al. “Condensing Graphs via One-Step Gradient Matching”. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD ’22. Washington DC, USA: Association for Computing Machinery, 2022, pp. 720–730. ISBN: 9781450393850. DOI: 10.1145/3534678.3539429

  41. [41]

    Graph Condensation for Graph Neural Networks

    Wei Jin et al. “Graph Condensation for Graph Neural Networks”. In: International Conference on Learning Representations. 2022. URL: https://openreview.net/forum?id=WLEx3Jo4QaB

  42. [42]

    Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining

    Tim Kaler et al. “Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining”. In: Proceedings of Machine Learning and Systems. Ed. by D. Marculescu, Y. Chi, and C. Wu. Vol. 4. 2022, pp. 172–189

  43. [43]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma and L. J. Ba. “Adam: A Method for Stochastic Optimization”. In: International Conference on Learning Representations ICLR 2015. 2015

  44. [44]

    Directional Message Passing for Molecular Graphs

    Johannes Klicpera, Janek Groß, Stephan Günnemann, et al. “Directional Message Passing for Molecular Graphs.” In: ICLR. 2020, pp. 1–13

  45. [45]

    Fast&Fair: Training Acceleration and Bias Mitigation for GNNs

    Oyku Deniz Kose and Yanning Shen. “Fast&Fair: Training Acceleration and Bias Mitigation for GNNs”. In: Transactions on Machine Learning Research (2023). ISSN: 2835-8856. URL: https://openreview.net/forum?id=nOk4XEB7Ke

  46. [46]

    Featured Graph Coarsening with Similarity Guarantees

    Manoj Kumar et al. “Featured Graph Coarsening with Similarity Guarantees”. In: Proceedings of the 40th International Conference on Machine Learning. Ed. by Andreas Krause et al. Vol. 202. Proceedings of Machine Learning Research. PMLR, July 2023, pp. 17953–17975

  47. [47]

    Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude

    “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude”. In: COURSERA: Neural networks for machine learning 4.2 (2012), p. 26

  48. [48]

    Machine learning structure preserving brackets for forecasting irreversible processes

    Kookjin Lee, Nathaniel Trask, and Panos Stinis. “Machine learning structure preserving brackets for forecasting irreversible processes”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 5696–5707

  49. [49]

    Fault and Noise Tolerance in the Incremental Extreme Learning Machine

    Ho Chun Leung, Chi Sing Leung, and Eric Wing Ming Wong. “Fault and Noise Tolerance in the Incremental Extreme Learning Machine”. In: IEEE Access 7 (2019), pp. 155171–155183

  50. [50]

    Physics-constrained and flow-field-message-informed graph neural network for solving unsteady compressible flows

    Siye Li et al. “Physics-constrained and flow-field-message-informed graph neural network for solving unsteady compressible flows”. In: Physics of Fluids 36.4 (2024)

  51. [51]

    PaGraph: Scaling GNN training on large graphs via computation-aware caching

    Zhiqi Lin et al. “PaGraph: Scaling GNN training on large graphs via computation-aware caching”. In: Proceedings of the 11th ACM Symposium on Cloud Computing. SoCC ’20. Virtual Event, USA: Association for Computing Machinery, 2020, pp. 401–415. ISBN: 9781450381376. DOI: 10.1145/3419111.3421281

  52. [52]

    On the limited memory BFGS method for large scale optimization

    Dong C Liu and Jorge Nocedal. “On the limited memory BFGS method for large scale optimization”. In: Mathematical programming 45.1 (1989), pp. 503–528

  53. [53]

    On the Variance of the Adaptive Learning Rate and Beyond

    Liyuan Liu et al. On the Variance of the Adaptive Learning Rate and Beyond. 2021. arXiv: 1908.03265 [cs.LG]. Pre-published

  54. [54]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. 2019. arXiv: 1711.05101 [cs.LG]

  55. [55]

    Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning

    Michael Lutter, Christian Ritter, and Jan Peters. “Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning”. In: International Conference on Learning Representations. 2019

  56. [56]

    A Gated Graph Neural Network Approach to Fast-Convergent Dynamic Average Estimation

    Antonio Marino, Claudio Pacchierotti, and Paolo Robuffo Giordano. “A Gated Graph Neural Network Approach to Fast-Convergent Dynamic Average Estimation”. In: ACM Trans. Intell. Syst. Technol. (Mar. 2025). Just Accepted. ISSN: 2157-6904. DOI: 10.1145/3725857

  57. [57]

    LSRN: A Parallel Iterative Solver for Strongly Over- or Underdetermined Systems

    Xiangrui Meng, Michael A. Saunders, and Michael W. Mahoney. “LSRN: A Parallel Iterative Solver for Strongly Over- or Underdetermined Systems”. In: SIAM Journal on Scientific Computing 36.2 (Jan. 2014), pp. C95–C118. ISSN: 1064-8275, 1095-7197. DOI: 10.1137/120866580

  58. [58]

    X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation

    Mohammad Amin Nabian et al. X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation. 2024. arXiv: 2411.17164 [cs.LG]. Pre-published

  59. [59]

    FASTRAIN-GNN: Fast and Accurate Self-Training for Graph Neural Networks

    Amrit Nagarajan and Anand Raghunathan. “FASTRAIN-GNN: Fast and Accurate Self-Training for Graph Neural Networks”. In: Transactions on Machine Learning Research (2023). ISSN: 2835-8856. URL: https://openreview.net/forum?id=1IYJfwJtjQ

  60. [60]

    Variational Learning of Euler–Lagrange Dynamics from Data

    Sina Ober-Bloebaum and Christian Offen. “Variational Learning of Euler–Lagrange Dynamics from Data”. In: Journal of Computational and Applied Mathematics 421 (2023), p. 114780

  61. [61]

    Symplectic Integration of Learned Hamiltonian Systems

    C. Offen and S. Ober-Bloebaum. “Symplectic Integration of Learned Hamiltonian Systems”. In: Chaos: An Interdisciplinary Journal of Nonlinear Science 32.1 (2022), p. 013122

  62. [62]

    Functional-link net computing: theory, system architecture, and functionalities

    Y-H Pao and Yoshiyasu Takefuji. “Functional-link net computing: theory, system architecture, and functionalities”. In: Computer 25.5 (1992), pp. 76–79

  63. [63]

    PyTorch: An Imperative Style, High-Performance Deep Learning Library

    Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 8024–8035

  64. [64]

    Physics-informed graph convolutional neural network for modeling fluid flow and heat convection

    Jiang-Zhou Peng et al. “Physics-informed graph convolutional neural network for modeling fluid flow and heat convection”. In: Physics of Fluids 35.8 (2023)

  65. [65]

    Learning Mesh-Based Simulation with Graph Networks

    Tobias Pfaff et al. “Learning Mesh-Based Simulation with Graph Networks”. In: International Conference on Learning Representations. 2021

  66. [66]

    Uniform approximation of functions with random bases

    Ali Rahimi and Benjamin Recht. “Uniform approximation of functions with random bases”. In: 2008 46th annual allerton conference on communication, control, and computing. IEEE. 2008, pp. 555–561

  67. [67]

    Training Hamiltonian Neural Networks without Backpropagation

    Atamert Rahma, Chinmay Datar, and Felix Dietrich. “Training Hamiltonian Neural Networks without Backpropagation”. In: NeurIPS 2024 Workshop on Machine Learning and the Physical Sciences. NeurIPS 2024, Nov. 26, 2024

  68. [68]

    Quasi-Monte Carlo Graph Random Features

    Isaac Reid, Krzysztof M Choromanski, and Adrian Weller. “Quasi-Monte Carlo Graph Random Features”. In: Advances in Neural Information Processing Systems. Ed. by A. Oh et al. Vol. 36. Curran Associates, Inc., 2023, pp. 14770–14796

  69. [69]

    General Graph Random Features

    Isaac Reid et al. General Graph Random Features. 2023. arXiv: 2310.04859 [stat.ML]. Pre-published

  70. [70]

    A direct adaptive method for faster backpropagation learning: The RPROP algorithm

    Martin Riedmiller and Heinrich Braun. “A direct adaptive method for faster backpropagation learning: The RPROP algorithm”. In: IEEE international conference on neural networks. IEEE. 1993, pp. 586–591

  71. [71]

    A Stochastic Approximation Method

    Herbert E. Robbins. “A Stochastic Approximation Method”. In: Annals of Mathematical Statistics 22 (1951), pp. 400–407

  72. [72]

    Stable Port-Hamiltonian Neural Networks

    Fabian J. Roth et al. Stable Port-Hamiltonian Neural Networks. Feb. 4, 2025. arXiv: 2502.02480 [cs]. Pre-published

  73. [73]

    Hamiltonian Graph Networks with ODE Integrators

    Alvaro Sanchez-Gonzalez et al. “Hamiltonian Graph Networks with ODE Integrators”. In: Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada. NeurIPS 2019, Sept. 27, 2019

  74. [74]

    Learning to simulate complex physics with graph networks

    Alvaro Sanchez-Gonzalez et al. “Learning to simulate complex physics with graph networks”. In: International conference on machine learning. PMLR. 2020, pp. 8459–8468

  75. [75]

    Modeling Relational Data with Graph Convolutional Networks

    Michael Schlichtkrull et al. “Modeling Relational Data with Graph Convolutional Networks”. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings. Heraklion, Greece: Springer-Verlag, 2018, pp. 593–607. ISBN: 978-3-319-93416-7

  76. [76]

    Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers

    Robin M Schmidt, Frank Schneider, and Philipp Hennig. “Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers”. In: Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 9367–9376

  77. [77]

    Feed forward neural networks with random weights

    Wouter F Schmidt, Martin A Kraaijveld, Robert PW Duin, et al. “Feed forward neural networks with random weights”. In: International conference on pattern recognition. IEEE Computer Society Press. 1992, pp. 1–1

  78. [78]

    SchNet: A continuous-filter convolutional neural network for modeling quantum interactions

    Kristof Schütt et al. “SchNet: A continuous-filter convolutional neural network for modeling quantum interactions”. In: Advances in Neural Information Processing Systems 30 (2017)

  79. [79]

    Distributed Graph Neural Network Training: A Survey

    Yingxia Shao et al. “Distributed Graph Neural Network Training: A Survey”. In: ACM Comput. Surv. 56.8 (Apr. 2024). ISSN: 0360-0300. DOI: 10.1145/3648358

  80. [80]

    Dynami-CAL GraphNet: A Physics-Informed Graph Neural Network Conserving Linear and Angular Momentum for Dynamical Systems

    Vinay Sharma and Olga Fink. Dynami-CAL GraphNet: A Physics-Informed Graph Neural Network Conserving Linear and Angular Momentum for Dynamical Systems. 2025. arXiv: 2501.07373 [cs.LG]. Pre-published
