Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators
Pith reviewed 2026-05-20 15:56 UTC · model grok-4.3
The pith
Neural networks can predict physical trajectories accurately without learning the true conserved energy unless given explicit alignment to analytical values.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A diffusion model achieves rollout MSE near 10^{-3} on Hamiltonian trajectories yet produces an energy standard deviation 7500 to 36000 times larger than ground truth. This gap raises the question of whether networks can learn or select conserved quantities. The structured T(v)+V(q) model matches analytical energy to R² ≥ 0.9999 on clean data. The black-box CDN reaches R² ≥ 0.996 only with temporal consistency plus an alignment loss to analytical energy at t=0 (λ_align=0.2); with λ_align=0 the Pearson R² collapses below 10^{-3} on pendulum and spring-mass. Under 1 percent 1D
What carries the argument
The Conservation Discovery Network (CDN), a neural model that learns a scalar conserved quantity from position-velocity trajectories by optimizing a temporal consistency loss, optionally augmented by alignment to analytical energy at the first timestep.
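The two loss terms named above can be sketched as follows. This is a minimal illustration, not the paper's formulation: the exact losses are not given on this page, so the sketch assumes a variance-over-time consistency term and a squared-error alignment term at the first timestep. The names `cdn_loss`, `make_trajectory`, and `constant_candidate`, and the unit-mass, unit-stiffness spring, are hypothetical choices for the example.

```python
import math
import random

def spring_energy(q, v, m=1.0, k=1.0):
    """Analytical spring-mass energy: kinetic plus elastic potential."""
    return 0.5 * m * v * v + 0.5 * k * q * q

def make_trajectory(q0, v0, steps=50, dt=0.1):
    """Exact states (q, v) of a unit-mass, unit-stiffness oscillator at times n*dt."""
    return [
        (q0 * math.cos(n * dt) + v0 * math.sin(n * dt),
         -q0 * math.sin(n * dt) + v0 * math.cos(n * dt))
        for n in range(steps)
    ]

def cdn_loss(candidate, trajectories, energy_fn, lam_align):
    """Assumed form: temporal-consistency term + lam_align * alignment term at t=0."""
    consistency, alignment = 0.0, 0.0
    for traj in trajectories:
        values = [candidate(q, v) for q, v in traj]
        mean = sum(values) / len(values)
        # Variance of the candidate over time; it vanishes for any conserved quantity.
        consistency += sum((c - mean) ** 2 for c in values) / len(values)
        # Squared error against the analytical energy at the first timestep only.
        q0, v0 = traj[0]
        alignment += (values[0] - energy_fn(q0, v0)) ** 2
    n = len(trajectories)
    return consistency / n + lam_align * alignment / n

def constant_candidate(q, v):
    """Trivially 'conserved' but physically meaningless output."""
    return 3.0

rng = random.Random(0)
trajs = [make_trajectory(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]

loss_const_unaligned = cdn_loss(constant_candidate, trajs, spring_energy, lam_align=0.0)
loss_const_aligned = cdn_loss(constant_candidate, trajs, spring_energy, lam_align=0.2)
loss_energy_aligned = cdn_loss(spring_energy, trajs, spring_energy, lam_align=0.2)
print(loss_const_unaligned, loss_const_aligned, loss_energy_aligned)
```

Under these assumptions a constant output already minimizes the consistency term, so that term alone cannot separate the energy from degenerate solutions; the alignment term penalizes the constant while leaving the true energy at essentially zero loss.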
If this is right
- Structured kinetic-plus-potential models recover analytical energy to R² ≥ 0.9999 on clean Hamiltonian trajectories.
- Black-box CDN performance collapses without the alignment term, indicating temporal consistency alone does not reliably identify the true conserved quantity.
- Under 1 percent additive noise the black-box CDN can outperform the structured model on projectile and spring-mass systems.
- The polynomial CDN reaches R² = 0.9998 on the pendulum system given more training time and data, with or without added noise, but only R² = 0.78 under a short training schedule.
- Low rollout mean squared error in neural simulators does not imply preservation of physical invariants such as energy.
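The last point can be made concrete with a toy comparison of rollout MSE against the energy standard-deviation ratio. The sketch below is not the paper's diffusion model: it assumes a velocity-Verlet rollout of a unit spring as the reference and stands in for a learned simulator by adding independent Gaussian perturbations of standard deviation 0.03 to each state. All function names and parameter values are hypothetical.

```python
import random
import statistics

def verlet_rollout(q0, v0, steps, dt):
    """Velocity-Verlet rollout of a unit-mass, unit-stiffness spring (force = -q)."""
    q, v = q0, v0
    states = [(q, v)]
    for _ in range(steps):
        v_half = v - 0.5 * dt * q      # half kick with the force at the old position
        q = q + dt * v_half            # drift
        v = v_half - 0.5 * dt * q      # half kick with the force at the new position
        states.append((q, v))
    return states

def energy(q, v):
    """Analytical energy of the unit spring."""
    return 0.5 * v * v + 0.5 * q * q

def rollout_mse(pred, ref):
    """Mean squared error over all position and velocity entries."""
    errs = [(pq - rq) ** 2 + (pv - rv) ** 2 for (pq, pv), (rq, rv) in zip(pred, ref)]
    return sum(errs) / (2 * len(errs))

def energy_std(states):
    """Standard deviation of the energy along one trajectory."""
    return statistics.pstdev([energy(q, v) for q, v in states])

rng = random.Random(0)
reference = verlet_rollout(1.0, 0.0, steps=1000, dt=0.01)
# Assumed stand-in for model error: small independent noise on every state.
predicted = [(q + rng.gauss(0.0, 0.03), v + rng.gauss(0.0, 0.03)) for q, v in reference]

mse = rollout_mse(predicted, reference)
ratio = energy_std(predicted) / energy_std(reference)
print(f"rollout MSE = {mse:.2e}, energy-std ratio = {ratio:.0f}")
```

Even in this simplified setting the state error stays small while the energy fluctuates far more than in the reference, which is the kind of gap the two metrics are meant to expose.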
Where Pith is reading between the lines
- Hybrid training that combines rollout prediction with conservation discovery could reduce long-term drift in learned simulators even when full analytical expressions are unavailable.
- The observed sensitivity to alignment loss suggests that discovery methods may need adaptation when conserved quantities are unknown or when data contain unknown external influences.
- Applying similar consistency-plus-alignment objectives to systems with multiple invariants or mild dissipation could clarify the boundaries of what black-box networks can extract from trajectories.
- If conserved-quantity learning improves stability, it might serve as an auxiliary objective for training predictive models on real sensor data from physical experiments.
Load-bearing premise
The three chosen systems are exactly Hamiltonian with no hidden dissipation or external forces, so the analytical energy is the unique globally conserved quantity.
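For reference, the textbook energies of the three systems are written out below. The symbols (mass $m$, gravitational acceleration $g$, pendulum length $\ell$, spring constant $k$) and the coordinate conventions are assumptions here, since the paper's exact parameterization is not given on this page.

```latex
\begin{aligned}
E_{\text{projectile}}  &= \tfrac{1}{2} m \left(v_x^2 + v_y^2\right) + m g y, \\
E_{\text{pendulum}}    &= \tfrac{1}{2} m \ell^2 \dot{\theta}^2 + m g \ell \left(1 - \cos\theta\right), \\
E_{\text{spring-mass}} &= \tfrac{1}{2} m v^2 + \tfrac{1}{2} k q^2 .
\end{aligned}
```

For drag-free projectile motion the horizontal momentum $p_x = m v_x$ is conserved as well, which bears directly on the uniqueness part of the premise.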
What would settle it
Retraining the black-box CDN with only the temporal consistency loss on larger datasets or new Hamiltonian systems and checking whether it recovers analytical energy with R squared above 0.99 would test whether alignment is required.
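A check of this kind reduces to a squared Pearson correlation between the learned scalar and the analytical energy. The sketch below is a hedged illustration with hypothetical names (`pearson_r2`, `pendulum_energy`) and assumed sampling ranges; it uses stand-in candidate functions rather than a trained network.

```python
import math
import random

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def pendulum_energy(theta, omega, m=1.0, length=1.0, g=9.81):
    """Textbook pendulum energy; parameter values are assumptions."""
    return 0.5 * m * length**2 * omega**2 + m * g * length * (1.0 - math.cos(theta))

rng = random.Random(0)
# Assumed sampling: angles and angular velocities drawn symmetrically around zero.
states = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(500)]
energies = [pendulum_energy(th, om) for th, om in states]

# Candidate 1: an affine transform of the energy (sign flipped, shifted).
rescaled = [-2.0 * e + 5.0 for e in energies]
# Candidate 2: the angle alone, which is not a conserved quantity.
angle_only = [th for th, _ in states]

r2_rescaled = pearson_r2(rescaled, energies)
r2_angle = pearson_r2(angle_only, energies)
print(f"affine transform of energy: R^2 = {r2_rescaled:.6f}")
print(f"angle only:                 R^2 = {r2_angle:.6f}")
```

Because Pearson R² is unchanged by affine rescaling, clearing the 0.99 threshold would identify the energy only up to scale, sign, and offset.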
Original abstract
A diffusion model trained on Hamiltonian trajectories can achieve rollout MSE near $10^{-3}$, but the standard deviation of its energy over time is between 7500 and 36000 times larger than the ground-truth energy standard deviation, indicating a failure to preserve conservation laws. This gap motivates our central question of whether neural networks can learn or select globally conserved quantities from physical trajectories. We investigate this across three Hamiltonian systems: projectile motion, pendulum, and spring-mass. We use a structured $T(v)+V(q)$ energy model, a black-box Conservation Discovery Network (CDN), a polynomial CDN, and a conditional diffusion baseline. The structured network reaches $R^2 \geq 0.9999$ against analytical energy on clean data, while the black-box CDN reaches $R^2 \geq 0.996$ when trained with temporal consistency plus a small alignment loss to analytical energy at $t=0$ ($\lambda_{\mathrm{align}}=0.2$). With $\lambda_{\mathrm{align}}=0$, CDN Pearson $R^2$ collapses on pendulum and spring-mass ($< 10^{-3}$), showing that temporal consistency alone is not enough to reliably identify the true energy. Under $1\%$ additive Gaussian noise, the CDN outperforms the structured model on the projectile and spring-mass systems, suggesting that the CDN may be more robust to noisy inputs in this setting. However, the polynomial CDN is sensitive to training configuration: it achieves $R^2=0.78$ under a short training schedule on the pendulum system, but reaches $R^2=0.9998$ with more training time and data, regardless of whether noise is added.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper shows that diffusion models trained on Hamiltonian trajectories achieve rollout MSE near 10^{-3} but produce energy trajectories whose standard deviation is 7500–36000 times larger than the ground-truth energy standard deviation. It then compares four approaches—structured T(v)+V(q) energy model, black-box Conservation Discovery Network (CDN), polynomial CDN, and conditional diffusion baseline—on projectile motion, pendulum, and spring-mass systems. The structured model reaches R² ≥ 0.9999 against analytical energy; the black-box CDN reaches R² ≥ 0.996 only when a small alignment loss to analytical energy at t=0 (λ_align=0.2) is added to temporal consistency, collapsing to R² < 10^{-3} without it; under 1% additive Gaussian noise the CDN outperforms the structured model on two systems.
Significance. If the central empirical findings hold, the work supplies concrete evidence that low prediction error in neural simulators does not imply preservation of conserved quantities and that purely unsupervised temporal-consistency objectives are insufficient to recover the physically relevant invariant on these systems. The explicit comparison of structured, black-box, and polynomial architectures, together with the noise-robustness results, offers a useful benchmark for future work on physically consistent neural simulators.
major comments (3)
- [Abstract and results on projectile motion] Abstract and §4 (results on projectile motion): the claim that temporal consistency alone fails to identify the true energy rests on the assumption that analytical energy is the unique globally conserved scalar recoverable from trajectories. Projectile motion also conserves horizontal momentum; a black-box CDN minimizing per-trajectory variance could recover a momentum-like function, producing low Pearson R² specifically with energy. The manuscript should report the variance of the learned CDN output and its correlation with both energy and momentum on this system.
- [Abstract and methods] Abstract and methods (loss formulations): the positive CDN result (R² ≥ 0.996) is obtained only with λ_align=0.2; with λ_align=0 the method collapses on pendulum and spring-mass. Because the successful regime therefore depends on supervision from the analytical energy at t=0, the manuscript should clarify whether the central claim is that conserved quantities can be discovered from trajectories alone or that a modest amount of analytical supervision is required.
- [Abstract] Abstract (noise-robustness paragraph): the statement that CDN outperforms the structured model under 1% additive Gaussian noise on projectile and spring-mass lacks the exact loss formulations, training schedules, and statistical significance tests used for that comparison. Given that the polynomial CDN is shown to be sensitive to training configuration, these details are load-bearing for the robustness claim.
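The analysis requested in the first major comment can be sketched on exact drag-free projectile trajectories. This is an illustration only: the launch-velocity ranges, the value of g, and all function names are assumptions, and the horizontal momentum is evaluated directly rather than taken from a trained CDN.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def projectile_states(vx, vy, steps=20, dt=0.02):
    """Exact drag-free states (y, vx, vy) for a launch from height 0."""
    return [(vy * n * dt - 0.5 * G * (n * dt) ** 2, vx, vy - G * n * dt)
            for n in range(steps)]

def energy(y, vx, vy, m=1.0):
    """Kinetic plus gravitational potential energy."""
    return 0.5 * m * (vx * vx + vy * vy) + m * G * y

def momentum_x(vx, m=1.0):
    """Horizontal momentum, conserved in the absence of drag."""
    return m * vx

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

rng = random.Random(0)
# Assumed sampling: horizontal launch speeds symmetric around zero.
launches = [(rng.uniform(-3.0, 3.0), rng.uniform(2.0, 6.0)) for _ in range(500)]
trajs = [projectile_states(vx, vy) for vx, vy in launches]

# Largest temporal variance of each quantity over all trajectories.
max_px_var = max(variance([momentum_x(vx) for _, vx, _ in t]) for t in trajs)
max_e_var = max(variance([energy(*s) for s in t]) for t in trajs)
# Correlation between the two invariants across trajectories.
r2_px_energy = pearson_r2([momentum_x(vx) for vx, _ in launches],
                          [energy(0.0, vx, vy) for vx, vy in launches])
print(max_px_var, max_e_var, r2_px_energy)
```

With launch directions sampled symmetrically, both quantities are constant along every trajectory yet nearly uncorrelated across trajectories, so a low R² against energy would not by itself show that nothing conserved was learned. With one-sided sampling of the horizontal speed the correlation would be much higher, which is why the sampling scheme matters for interpreting the reported numbers.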
minor comments (2)
- [Abstract] The precise numerical ranges for energy standard-deviation ratios (7500–36000) should be broken down by system rather than reported as a single interval.
- [Methods] Notation for the alignment loss term and the exact value chosen for λ_align should be defined in the main text before the results are presented.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope of our claims about unsupervised conservation discovery and the robustness of our empirical comparisons. We address each major point below, proposing targeted revisions to the manuscript where the suggestions strengthen the presentation without altering the core findings.
Point-by-point responses
- Referee: [Abstract and results on projectile motion] Abstract and §4 (results on projectile motion): the claim that temporal consistency alone fails to identify the true energy rests on the assumption that analytical energy is the unique globally conserved scalar recoverable from trajectories. Projectile motion also conserves horizontal momentum; a black-box CDN minimizing per-trajectory variance could recover a momentum-like function, producing low Pearson R² specifically with energy. The manuscript should report the variance of the learned CDN output and its correlation with both energy and momentum on this system.
  Authors: We agree that projectile motion conserves horizontal momentum in addition to total energy, and that a variance-minimizing black-box CDN could in principle recover a momentum-like scalar rather than energy. This is a valid point that qualifies our interpretation of the low R² values on this system. We will add the requested analysis: for the projectile-motion experiments we will report the temporal variance of the learned CDN output together with its Pearson correlations against both the analytical energy and the horizontal momentum. These results will be included in a revised §4 (and referenced from the abstract if they materially affect the summary claims).
  Revision: yes
- Referee: [Abstract and methods] Abstract and methods (loss formulations): the positive CDN result (R² ≥ 0.996) is obtained only with λ_align=0.2; with λ_align=0 the method collapses on pendulum and spring-mass. Because the successful regime therefore depends on supervision from the analytical energy at t=0, the manuscript should clarify whether the central claim is that conserved quantities can be discovered from trajectories alone or that a modest amount of analytical supervision is required.
  Authors: We accept the observation. The manuscript already states that R² collapses to < 10^{-3} when λ_align=0 on pendulum and spring-mass, but the abstract and methods sections do not sufficiently foreground that the reported high-R² regime uses a small supervised alignment term. In the revision we will (i) explicitly label the λ_align=0.2 setting as “temporal consistency plus modest alignment supervision” throughout the abstract and §3, and (ii) rephrase the central claim to emphasize that purely unsupervised temporal consistency is insufficient to recover the physically relevant invariant on these systems, while a modest amount of supervision at a single time point enables recovery. This does not change the empirical result but makes the scope of the claim precise.
  Revision: yes
- Referee: [Abstract] Abstract (noise-robustness paragraph): the statement that CDN outperforms the structured model under 1% additive Gaussian noise on projectile and spring-mass lacks the exact loss formulations, training schedules, and statistical significance tests used for that comparison. Given that the polynomial CDN is shown to be sensitive to training configuration, these details are load-bearing for the robustness claim.
  Authors: We agree that the noise-robustness paragraph in the abstract is too terse. In the revised manuscript we will expand the corresponding paragraph in §4 (and the abstract summary) to include: the precise loss weights used for both models under noise, the training schedule and data volume, the optimizer settings, and the number of random seeds together with any statistical significance tests performed. We already note the sensitivity of the polynomial CDN to training configuration; the added details will make the CDN-versus-structured comparison reproducible and will qualify the robustness statement accordingly.
  Revision: yes
Circularity Check
No significant circularity in empirical evaluation chain
The paper reports direct experimental comparisons of learned functions against independently known analytical energies for three Hamiltonian systems, using explicit loss terms (temporal consistency and optional alignment) whose effects are ablated and quantified via R². No derivation reduces a claimed prediction to a fitted input by construction, no uniqueness theorem is imported from self-citation, and the central observation—that λ_align=0 yields collapse in correlation—is a measured outcome rather than a definitional equivalence. The evaluation benchmark (analytical energy) is external to the training procedure when λ_align=0, rendering the reported failure of pure temporal consistency a self-contained empirical result.
Axiom & Free-Parameter Ledger
free parameters (1)
- lambda_align = 0.2
axioms (1)
- domain assumption: The projectile, pendulum, and spring-mass systems obey Hamiltonian dynamics with a single globally conserved energy function.
invented entities (1)
- Conservation Discovery Network (CDN): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: black-box CDN reaches R² ≥ 0.996 when trained with temporal consistency plus a small alignment loss... With λ_align=0, CDN Pearson R² collapses on pendulum and spring-mass (< 10^{-3})
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: structured T(v)+V(q) energy model... CDN... polynomial CDN
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.