Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators

Aditya Kothari; Andrew Bukowski; Ishir Rao; Simba Shi

arxiv: 2605.18883 · v1 · pith:VGFZXNLPnew · submitted 2026-05-16 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 15:56 UTC · model grok-4.3

keywords: conserved quantities · neural simulators · Hamiltonian systems · energy conservation · trajectory prediction · diffusion models · conservation discovery network · temporal consistency

The pith

Neural networks can predict physical trajectories accurately without learning the true conserved energy unless given explicit alignment to analytical values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether neural networks trained on trajectories from Hamiltonian systems can discover globally conserved quantities such as energy. It tests structured kinetic-plus-potential models, black-box Conservation Discovery Networks, polynomial variants, and diffusion baselines on projectile motion, pendulum, and spring-mass systems. The structured model recovers analytical energy with R squared at or above 0.9999 on clean data, while black-box networks reach R squared at or above 0.996 only when a temporal consistency loss is paired with a small alignment term to analytical energy at the initial time. Removing the alignment term causes correlation to drop below 0.001 on the pendulum and spring-mass cases, showing that good rollout accuracy alone does not produce physical conservation. Under modest noise the black-box approach sometimes proves more robust than the structured one, though polynomial results vary strongly with training length and data volume.

Core claim

A diffusion model achieves rollout MSE near 10 to the minus 3 on Hamiltonian trajectories yet produces energy standard deviation 7500 to 36000 times larger than ground truth. This gap leads to the question of whether networks can learn or select conserved quantities. The structured T of v plus V of q model matches analytical energy to R squared greater than or equal to 0.9999 on clean data. The black-box CDN reaches R squared greater than or equal to 0.996 only with temporal consistency plus an alignment loss of lambda equal to 0.2 to analytical energy at t equals 0; with lambda equal to 0 the Pearson R squared collapses below 10 to the minus 3 on pendulum and spring-mass. Under 1 percent 1D

What carries the argument

The Conservation Discovery Network (CDN), a neural model that learns a scalar conserved quantity from position-velocity trajectories by optimizing a temporal consistency loss, optionally augmented by alignment to analytical energy at the first timestep.
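
To make the objective concrete, here is a minimal sketch of a temporal consistency term with an optional alignment term at the first timestep. It is an assumption-laden illustration, not the paper's Equation 1: the squared-error alignment, the plain variance penalty, the omission of the variance-hinge regularizer, and all names (`cdn_loss`, `spring_trajectory`, unit mass and spring constant) are hypothetical choices made for this example.

```python
import math
from statistics import fmean, pvariance

def temporal_consistency(f, trajectory):
    """Variance of the scalar f(s) along one trajectory; zero if f is conserved."""
    return pvariance([f(s) for s in trajectory])

def cdn_loss(f, trajectories, energy_at_t0, lam_align=0.2):
    """Mean temporal-consistency term plus lam_align times a t=0 alignment term.

    ASSUMPTION: squared-error alignment and a plain variance penalty. The
    paper's actual loss (including its variance-hinge regularizer) is not
    reproduced here.
    """
    consistency = fmean([temporal_consistency(f, traj) for traj in trajectories])
    alignment = fmean(
        [(f(traj[0]) - e0) ** 2 for traj, e0 in zip(trajectories, energy_at_t0)]
    )
    return consistency + lam_align * alignment

def spring_trajectory(amplitude, steps=50, dt=0.1):
    # Analytical spring-mass motion with m = k = 1 (illustrative units).
    return [(amplitude * math.cos(i * dt), -amplitude * math.sin(i * dt))
            for i in range(steps)]

def energy(state):
    q, v = state
    return 0.5 * v * v + 0.5 * q * q

trajectories = [spring_trajectory(a) for a in (0.5, 1.0, 1.5)]
energy_at_t0 = [energy(traj[0]) for traj in trajectories]

# The analytical energy scores (numerically) zero; raw position does not.
loss_true = cdn_loss(energy, trajectories, energy_at_t0)
loss_position = cdn_loss(lambda s: s[0], trajectories, energy_at_t0)
```

Setting `lam_align=0` leaves only the consistency term, which corresponds to the ablation in which the reported correlation collapses on pendulum and spring-mass.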

If this is right

Structured kinetic-plus-potential models recover analytical energy to R squared above 0.9999 on clean Hamiltonian trajectories.
Black-box CDN performance collapses without the alignment term, indicating temporal consistency alone does not reliably identify the true conserved quantity.
Under 1 percent additive noise the black-box CDN can outperform the structured model on projectile and spring-mass systems.
Polynomial CDN variants reach R squared of 0.9998 given longer training and more data regardless of noise level.
Low rollout mean squared error in neural simulators does not imply preservation of physical invariants such as energy.
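
The last point can be illustrated with a toy calculation. The sketch below is not the paper's diffusion baseline: the "prediction" is simply a numerically integrated spring-mass trajectory plus small independent Gaussian errors, and the step size, error scale, and unit mass and spring constant are assumptions chosen for illustration. It computes a rollout MSE and the ratio of energy standard deviations, and the numbers it yields are not the paper's.

```python
import random
from statistics import fmean, pstdev

def energy(q, v):
    # Spring-mass energy with m = k = 1 (illustrative units).
    return 0.5 * v * v + 0.5 * q * q

def verlet_spring(q, v, dt, steps):
    """Velocity-Verlet integration of q'' = -q, returning (q, v) states."""
    states = []
    for _ in range(steps):
        states.append((q, v))
        v_half = v - 0.5 * dt * q
        q = q + dt * v_half
        v = v_half - 0.5 * dt * q
    return states

rng = random.Random(0)
truth = verlet_spring(1.0, 0.0, dt=0.01, steps=2000)

# Stand-in "prediction": ground truth plus small independent errors (sigma = 0.03).
pred = [(q + rng.gauss(0, 0.03), v + rng.gauss(0, 0.03)) for q, v in truth]

rollout_mse = fmean(
    [((qp - q) ** 2 + (vp - v) ** 2) / 2 for (qp, vp), (q, v) in zip(pred, truth)]
)
std_truth = pstdev([energy(q, v) for q, v in truth])
std_pred = pstdev([energy(q, v) for q, v in pred])
std_ratio = std_pred / std_truth

print(f"rollout MSE = {rollout_mse:.2e}, energy std ratio = {std_ratio:.0f}")
```

Per-step errors that look small in state space translate almost directly into energy fluctuations, while the reference trajectory's energy barely moves, so the ratio is large even though the MSE is small.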

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid training that combines rollout prediction with conservation discovery could reduce long-term drift in learned simulators even when full analytical expressions are unavailable.
The observed sensitivity to alignment loss suggests that discovery methods may need adaptation when conserved quantities are unknown or when data contain unknown external influences.
Applying similar consistency-plus-alignment objectives to systems with multiple invariants or mild dissipation could clarify the boundaries of what black-box networks can extract from trajectories.
If conserved-quantity learning improves stability, it might serve as an auxiliary objective for training predictive models on real sensor data from physical experiments.

Load-bearing premise

The three chosen systems are exactly Hamiltonian with no hidden dissipation or external forces, so the analytical energy is the unique globally conserved quantity.
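
For reference, the textbook energies of the three systems are written out below. The symbols (mass m, gravitational acceleration g, pendulum length ℓ, spring constant k) and the coordinate choices are assumptions; the paper's exact parameterization and units are not given on this page.

```latex
\begin{aligned}
E_{\text{projectile}} &= \tfrac{1}{2} m \left(v_x^2 + v_y^2\right) + m g y, \\
E_{\text{pendulum}}   &= \tfrac{1}{2} m \ell^2 \dot{\theta}^2 + m g \ell \left(1 - \cos\theta\right), \\
E_{\text{spring}}     &= \tfrac{1}{2} m v^2 + \tfrac{1}{2} k q^2 .
\end{aligned}
```

Each has the separable form T(v) + V(q) that the structured model assumes. For drag-free projectile motion the horizontal momentum p_x = m v_x is conserved as well, which is the case the referee report raises against the uniqueness premise.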

What would settle it

Retraining the black-box CDN with only the temporal consistency loss on larger datasets or new Hamiltonian systems and checking whether it recovers analytical energy with R squared above 0.99 would test whether alignment is required.
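
One reason such a test is informative: any function g(E) of a conserved energy is itself conserved, so a temporal consistency term alone cannot distinguish E from g(E). The sketch below constructs a deliberately non-monotonic example on spring-mass trajectories. It illustrates the ambiguity under assumed unit mass and spring constant, and makes no claim about what the paper's CDN actually learned; `reparam` is a hypothetical stand-in for a learned scalar.

```python
import math
from statistics import fmean, pvariance

def energy(state):
    q, v = state
    return 0.5 * v * v + 0.5 * q * q

def reparam(state):
    # Hypothetical learned scalar: a non-monotonic function of the energy.
    return (energy(state) - 1.0) ** 2

def trajectory(e0, steps=100, dt=0.1):
    # Analytical spring-mass motion (m = k = 1) with total energy e0.
    a = math.sqrt(2.0 * e0)
    return [(a * math.cos(i * dt), -a * math.sin(i * dt)) for i in range(steps)]

def pearson_r2(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Energies 0.1 ... 1.9, chosen symmetric about 1 to make the effect stark.
energies = [0.1 * k for k in range(1, 20)]
trajs = [trajectory(e0) for e0 in energies]

max_temporal_var = max(pvariance([reparam(s) for s in traj]) for traj in trajs)
f_vals = [reparam(s) for traj in trajs for s in traj]
e_vals = [energy(s) for traj in trajs for s in traj]
r2_with_energy = pearson_r2(f_vals, e_vals)
```

Here `max_temporal_var` is numerically zero while `r2_with_energy` is close to zero, so a low Pearson R² by itself does not show that the scalar fails to be conserved. Reporting the temporal variance alongside R² would separate the two cases.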

Figures

Figures reproduced from arXiv: 2605.18883 by Aditya Kothari, Andrew Bukowski, Ishir Rao, Simba Shi.

Figure 1: Model schematic. The black-box CDN is an MLP with four hidden Linear–SiLU blocks of hidden dimension 256, mapping s ∈ R^D to a scalar invariant f(s) ∈ R with no imposed physical structure. It is trained on min-max-normalized states using the temporal consistency loss and variance-hinge regularizer in Equation 1. We evaluate two variants: CDN-Conservation, which uses only the conservation objective, and CDN…

Original abstract

A diffusion model trained on Hamiltonian trajectories can achieve rollout MSE near $10^{-3}$, but the standard deviation of its energy over time is between 7500 and 36000 times larger than the ground-truth energy standard deviation, indicating a failure to preserve conservation laws. This gap motivates our central question of whether neural networks can learn or select globally conserved quantities from physical trajectories. We investigate this across three Hamiltonian systems: projectile motion, pendulum, and spring-mass. We use a structured $T(v)+V(q)$ energy model, a black-box Conservation Discovery Network (CDN), a polynomial CDN, and a conditional diffusion baseline. The structured network reaches $R^2 \geq 0.9999$ against analytical energy on clean data, while the black-box CDN reaches $R^2 \geq 0.996$ when trained with temporal consistency plus a small alignment loss to analytical energy at $t=0$ ($\lambda_{\mathrm{align}}=0.2$). With $\lambda_{\mathrm{align}}=0$, CDN Pearson $R^2$ collapses on pendulum and spring-mass ($< 10^{-3}$), showing that temporal consistency alone is not enough to reliably identify the true energy. Under $1\%$ additive Gaussian noise, the CDN outperforms the structured model on the projectile and spring-mass systems, suggesting that the CDN may be more robust to noisy inputs in this setting. However, the polynomial CDN is sensitive to training configuration: it achieves $R^2=0.78$ under a short training schedule on the pendulum system, but reaches $R^2=0.9998$ with more training time and data, regardless of whether noise is added.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Temporal consistency alone collapses for the black-box CDN on pendulum and spring-mass, so a small alignment term is needed to match analytical energy, but the projectile case leaves room for the network to have found momentum instead.

The central result is that a black-box Conservation Discovery Network recovers high correlation with the true energy only when a small alignment loss to the analytical value at t=0 is added; without that term, R² drops below 10^{-3} on the pendulum and spring-mass. The structured T+V model does better on clean data, and the diffusion baseline shows the usual drift in energy over long rollouts. The authors run the same three systems for all four approaches and ablate the alignment coefficient, which makes the comparison straightforward to follow.

The noise experiment is the other concrete piece: under 1% Gaussian noise the CDN beats the structured model on projectile and spring-mass. That is useful incremental evidence on a practical pain point for long-horizon simulators. The polynomial CDN result is also worth noting because it improves sharply with more training time and data, showing sensitivity to schedule rather than architecture alone.

The main soft spot is that the uniqueness assumption is not fully tested. Projectile motion conserves both energy and horizontal momentum, so a variance-minimizing network could converge to a momentum-like function and still produce low correlation with energy. The paper reports the collapse on pendulum and spring-mass but does not check whether the learned scalar tracks the energy functional or some other invariant. The noise-robustness claim would be stronger with training details, exact loss weights, and a statistical test rather than the summary numbers in the abstract.

This work is aimed at people who already build or evaluate neural simulators for robotics or scientific computing and want measurable conservation. It has enough controlled comparisons and a clear negative result on pure temporal consistency to justify sending it to referees, though the authors should add checks on alternative invariants and fuller reproducibility information before final acceptance.

Referee Report

3 major / 2 minor

Summary. The paper shows that diffusion models trained on Hamiltonian trajectories achieve rollout MSE near 10^{-3} but produce energy trajectories whose standard deviation is 7500–36000 times larger than the ground-truth energy standard deviation. It then compares four approaches—structured T(v)+V(q) energy model, black-box Conservation Discovery Network (CDN), polynomial CDN, and conditional diffusion baseline—on projectile motion, pendulum, and spring-mass systems. The structured model reaches R² ≥ 0.9999 against analytical energy; the black-box CDN reaches R² ≥ 0.996 only when a small alignment loss to analytical energy at t=0 (λ_align=0.2) is added to temporal consistency, collapsing to R² < 10^{-3} without it; under 1% additive Gaussian noise the CDN outperforms the structured model on two systems.

Significance. If the central empirical findings hold, the work supplies concrete evidence that low prediction error in neural simulators does not imply preservation of conserved quantities and that purely unsupervised temporal-consistency objectives are insufficient to recover the physically relevant invariant on these systems. The explicit comparison of structured, black-box, and polynomial architectures, together with the noise-robustness results, offers a useful benchmark for future work on physically consistent neural simulators.

major comments (3)

[Abstract and results on projectile motion] Abstract and §4 (results on projectile motion): the claim that temporal consistency alone fails to identify the true energy rests on the assumption that analytical energy is the unique globally conserved scalar recoverable from trajectories. Projectile motion also conserves horizontal momentum; a black-box CDN minimizing per-trajectory variance could recover a momentum-like function, producing low Pearson R² specifically with energy. The manuscript should report the variance of the learned CDN output and its correlation with both energy and momentum on this system.
[Abstract and methods] Abstract and methods (loss formulations): the positive CDN result (R² ≥ 0.996) is obtained only with λ_align=0.2; with λ_align=0 the method collapses on pendulum and spring-mass. Because the successful regime therefore depends on supervision from the analytical energy at t=0, the manuscript should clarify whether the central claim is that conserved quantities can be discovered from trajectories alone or that a modest amount of analytical supervision is required.
[Abstract] Abstract (noise-robustness paragraph): the statement that CDN outperforms the structured model under 1% additive Gaussian noise on projectile and spring-mass lacks the exact loss formulations, training schedules, and statistical significance tests used for that comparison. Given that the polynomial CDN is shown to be sensitive to training configuration, these details are load-bearing for the robustness claim.
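
The check requested in the first major comment can be sketched as follows. This is a hypothetical diagnostic, not the authors' analysis: `candidate` stands in for a learned CDN output that happens to track horizontal momentum, and the launch-velocity ranges, unit mass, and value of g are assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance

G = 9.81  # m/s^2; unit mass assumed

def projectile(vx, vy, steps=40, dt=0.05):
    # Analytical drag-free trajectory from the origin; states are (x, y, vx, vy).
    return [(vx * t, vy * t - 0.5 * G * t * t, vx, vy - G * t)
            for t in (i * dt for i in range(steps))]

def energy(s):
    x, y, vx, vy = s
    return 0.5 * (vx * vx + vy * vy) + G * y

def momentum_x(s):
    return s[2]

def candidate(s):
    # Hypothetical CDN output: an affine function of horizontal momentum.
    return 3.0 * s[2] + 1.0

def pearson_r2(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(0)
trajs = [projectile(rng.uniform(-5, 5), rng.uniform(0, 10)) for _ in range(200)]
states = [s for traj in trajs for s in traj]

max_temporal_var = max(pvariance([candidate(s) for s in traj]) for traj in trajs)
f_vals = [candidate(s) for s in states]
r2_energy = pearson_r2(f_vals, [energy(s) for s in states])
r2_momentum = pearson_r2(f_vals, [momentum_x(s) for s in states])

print(f"max temporal variance = {max_temporal_var:.1e}")
print(f"R^2 vs energy = {r2_energy:.3f}, R^2 vs momentum = {r2_momentum:.3f}")
```

With launch velocities sampled symmetrically about zero, a momentum-tracking scalar is perfectly conserved yet nearly uncorrelated with energy. Reporting all three numbers would distinguish a failure to find any invariant from finding a different one.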

minor comments (2)

[Abstract] The precise numerical ranges for energy standard-deviation ratios (7500–36000) should be broken down by system rather than reported as a single interval.
[Methods] Notation for the alignment loss term and the exact value chosen for λ_align should be defined in the main text before the results are presented.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope of our claims about unsupervised conservation discovery and the robustness of our empirical comparisons. We address each major point below, proposing targeted revisions to the manuscript where the suggestions strengthen the presentation without altering the core findings.

Referee: [Abstract and results on projectile motion] Abstract and §4 (results on projectile motion): the claim that temporal consistency alone fails to identify the true energy rests on the assumption that analytical energy is the unique globally conserved scalar recoverable from trajectories. Projectile motion also conserves horizontal momentum; a black-box CDN minimizing per-trajectory variance could recover a momentum-like function, producing low Pearson R² specifically with energy. The manuscript should report the variance of the learned CDN output and its correlation with both energy and momentum on this system.

Authors: We agree that projectile motion conserves horizontal momentum in addition to total energy, and that a variance-minimizing black-box CDN could in principle recover a momentum-like scalar rather than energy. This is a valid point that qualifies our interpretation of the low R² values on this system. We will add the requested analysis: for the projectile-motion experiments we will report the temporal variance of the learned CDN output together with its Pearson correlations against both the analytical energy and the horizontal momentum. These results will be included in a revised §4 (and referenced from the abstract if they materially affect the summary claims).
Revision: yes

Referee: [Abstract and methods] Abstract and methods (loss formulations): the positive CDN result (R² ≥ 0.996) is obtained only with λ_align=0.2; with λ_align=0 the method collapses on pendulum and spring-mass. Because the successful regime therefore depends on supervision from the analytical energy at t=0, the manuscript should clarify whether the central claim is that conserved quantities can be discovered from trajectories alone or that a modest amount of analytical supervision is required.

Authors: We accept the observation. The manuscript already states that R² collapses to < 10^{-3} when λ_align=0 on pendulum and spring-mass, but the abstract and methods sections do not sufficiently foreground that the reported high-R² regime uses a small supervised alignment term. In the revision we will (i) explicitly label the λ_align=0.2 setting as “temporal consistency plus modest alignment supervision” throughout the abstract and §3, and (ii) rephrase the central claim to emphasize that purely unsupervised temporal consistency is insufficient to recover the physically relevant invariant on these systems, while a modest amount of supervision at a single time point enables recovery. This does not change the empirical result but makes the scope of the claim precise.
Revision: yes

Referee: [Abstract] Abstract (noise-robustness paragraph): the statement that CDN outperforms the structured model under 1% additive Gaussian noise on projectile and spring-mass lacks the exact loss formulations, training schedules, and statistical significance tests used for that comparison. Given that the polynomial CDN is shown to be sensitive to training configuration, these details are load-bearing for the robustness claim.

Authors: We agree that the noise-robustness paragraph in the abstract is too terse. In the revised manuscript we will expand the corresponding paragraph in §4 (and the abstract summary) to include: the precise loss weights used for both models under noise, the training schedule and data volume, the optimizer settings, and the number of random seeds together with any statistical significance tests performed. We already note the sensitivity of the polynomial CDN to training configuration; the added details will make the CDN-versus-structured comparison reproducible and will qualify the robustness statement accordingly.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical evaluation chain

The paper reports direct experimental comparisons of learned functions against independently known analytical energies for three Hamiltonian systems, using explicit loss terms (temporal consistency and optional alignment) whose effects are ablated and quantified via R². No derivation reduces a claimed prediction to a fitted input by construction, no uniqueness theorem is imported from self-citation, and the central observation—that λ_align=0 yields collapse in correlation—is a measured outcome rather than a definitional equivalence. The evaluation benchmark (analytical energy) is external to the training procedure when λ_align=0, rendering the reported failure of pure temporal consistency a self-contained empirical result.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The paper relies on the assumption that the tested systems are purely Hamiltonian and that the analytical energy is the target conserved quantity; it introduces the CDN architecture and the alignment loss weight as a tunable parameter.

free parameters (1)

lambda_align = 0.2
Weight of the alignment loss to analytical energy at t=0; set to 0.2 for the successful CDN runs.

axioms (1)

Domain assumption: The projectile, pendulum, and spring-mass systems obey Hamiltonian dynamics with a single globally conserved energy function.
Invoked when treating analytical energy as ground truth for alignment and R² evaluation.

invented entities (1)

Conservation Discovery Network (CDN) (no independent evidence)
Purpose: Black-box network that outputs a scalar conserved quantity from state trajectories.
New model architecture introduced and compared to structured and polynomial variants.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "black-box CDN reaches R² ≥ 0.996 when trained with temporal consistency plus a small alignment loss... With λ_align=0, CDN Pearson R² collapses on pendulum and spring-mass (< 10^{-3})"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · tag: unclear
Paper passage: "structured T(v)+V(q) energy model... CDN... polynomial CDN"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 1 internal anchor

[1] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.

[2] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton. Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45):22445–22451, 2019.

[3] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.

[4] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho. Lagrangian neural networks. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020.

[5] M. Cranmer. Interpretable machine learning for science with PySR and SymbolicRegression.jl. arXiv preprint arXiv:2305.01582, 2023.

[6] Y. Du, C. Durkan, R. Strudel, J. B. Tenenbaum, S. Dieleman, R. Fergus, J. Sohl-Dickstein, A. Doucet, and W. S. Grathwohl. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and MCMC. International Conference on Machine Learning, 2023.

[7] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks. Advances in Neural Information Processing Systems, 32, 2019.

[8] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

[9] Z. Liu and M. Tegmark. Machine learning conservation laws from trajectories. Physical Review Letters, 126(18):180604, 2021.

[10] M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.

[11] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021.

[12] S.-M. Udrescu and M. Tegmark. AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16):eaay2631, 2020.
