Smoothness Errors in Dynamics Models and How to Avoid Them
Pith reviewed 2026-05-16 07:20 UTC · model grok-4.3
The pith
Relaxed unitary convolutions improve accuracy for neural models of physical dynamics by allowing natural smoothing on meshes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unitary graph convolutions are mathematically constrained to preserve smoothness exactly, so they overconstrain models of physical systems in which smoothness increases naturally, such as diffusion processes, and performance suffers. Relaxed unitary convolutions introduce a tunable parameter that balances strict preservation against the smoothing required by the dynamics, and the construction generalizes directly from graphs to mesh discretizations of surfaces.
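The tension in this claim can be checked numerically on a toy graph. The sketch below is illustrative only: it assumes a separable unitary convolution of the form exp(i t Ã) X U with U orthogonal (a form not confirmed on this page), uses an 8-node ring in place of a mesh, and measures smoothness with the Rayleigh quotient Tr(Xᴴ(I − Ã)X) / ‖X‖². All function names are hypothetical.

```python
import numpy as np

def ring_normalized_adjacency(n):
    """Symmetric normalized adjacency D^{-1/2} A D^{-1/2} of an n-node ring."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A / 2.0  # every node has degree 2

def rayleigh_quotient(X, A_norm):
    """Tr(X^H (I - A_norm) X) / ||X||_F^2, a normalized Dirichlet energy."""
    L = np.eye(A_norm.shape[0]) - A_norm
    return np.real(np.trace(X.conj().T @ L @ X)) / np.linalg.norm(X) ** 2

def unitary_conv(X, A_norm, U, t=1.0):
    """Assumed form exp(i t A_norm) X U; the exponential is built from eigh."""
    evals, evecs = np.linalg.eigh(A_norm)
    propagator = (evecs * np.exp(1j * t * evals)) @ evecs.T
    return propagator @ X @ U

def heat_step(X, A_norm, dt=0.5):
    """One implicit Euler step of dX/dt = -(I - A_norm) X."""
    n = A_norm.shape[0]
    L = np.eye(n) - A_norm
    return np.linalg.solve(np.eye(n) + dt * L, X)

rng = np.random.default_rng(0)
A_norm = ring_normalized_adjacency(8)
X = rng.standard_normal((8, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal feature map

r_input = rayleigh_quotient(X, A_norm)
r_unitary = rayleigh_quotient(unitary_conv(X, A_norm, Q), A_norm)
r_heat = rayleigh_quotient(heat_step(X, A_norm), A_norm)
print(r_input, r_unitary, r_heat)
```

Under these assumptions the unitary layer leaves the Rayleigh quotient unchanged up to floating-point error, while a single heat step lowers it, so a strictly unitary layer cannot reproduce the heat step's smoothing on its own.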
What carries the argument
Relaxed unitary convolutions on graphs and meshes, defined by loosening the unitarity constraint via a relaxation parameter to permit controlled feature smoothing.
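The paper passage quoted in the Lean section of this page writes the relaxed operator as a truncated exponential series, f_Relaxed(X; A, T_max) = Σ_{i=0}^{T_max} (1/i!) Lⁱ(X) with L(X) = AXW and W = −W†. A minimal sketch of that formula follows, assuming a real skew-symmetric W, a ring graph with degree-normalized adjacency, and the hypothetical function name `relaxed_unitary_conv`. Treating T_max as the relaxation knob is a reading of the quoted formula, not something this page states.

```python
import numpy as np

def relaxed_unitary_conv(X, A, W, t_max):
    """Truncated series sum_{i=0}^{t_max} (1/i!) L^i(X), where L(X) = A @ X @ W."""
    out = np.zeros_like(X)
    term = X.copy()  # holds L^i(X) / i!, starting at i = 0
    for i in range(t_max + 1):
        out = out + term
        term = A @ term @ W / (i + 1)  # advance to L^{i+1}(X) / (i+1)!
    return out

n, d = 8, 4
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 0.5  # ring, normalized by degree 2

rng = np.random.default_rng(0)
M = rng.standard_normal((d, d))
W = M - M.T                   # skew-symmetric, so W = -W^T
W = W / np.linalg.norm(W, 2)  # scale to spectral norm 1 (illustrative choice)
X = rng.standard_normal((n, d))

# Ratio of output to input Frobenius norm for several truncation orders.
norm_ratio = {
    t: np.linalg.norm(relaxed_unitary_conv(X, A, W, t)) / np.linalg.norm(X)
    for t in (1, 3, 30)
}
print(norm_ratio)
```

In this setup the full series is norm-preserving, so the ratio approaches 1 as T_max grows. Truncating at T_max = 1 inflates the norm and truncating at T_max = 3 contracts it, which shows how a finite T_max loosens the unitarity constraint.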
If this is right
- Models of diffusion and wave propagation on irregular surfaces achieve lower prediction error than with strictly unitary or non-unitary baselines.
- Mesh-based neural weather forecasting gains accuracy without adding layers.
- The same relaxation technique applies unchanged when moving from graph to surface-mesh discretizations.
- Performance gains hold across multiple strong baselines including mesh-aware transformers and equivariant networks.
Where Pith is reading between the lines
- The approach may extend to other time-dependent scientific simulations such as fluid dynamics on unstructured grids.
- Tuning the single relaxation parameter could reduce the need for architecture search in new physics-informed tasks.
- If the relaxation interacts poorly with certain boundary conditions, hybrid schemes that switch between strict and relaxed layers might be needed.
Load-bearing premise
The relaxation parameter can be chosen or tuned to achieve the right balance without introducing instability, losing generalization, or requiring task-specific fitting.
What would settle it
Train and evaluate the same PDE models on the heat equation over a fixed mesh for a range of relaxation parameters, then check across multiple random seeds whether any setting yields higher error than a standard non-unitary baseline.
Original abstract
Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.
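The oversmoothing effect described in the abstract can be reproduced with a deliberately stripped-down model. The sketch below assumes a stack of linear layers X ← ÃX with no weights or nonlinearities, where Ã is the normalized adjacency of an 8-node ring with self-loops. These simplifications are illustrative and are not the paper's experimental setup.

```python
import numpy as np

n, d, num_layers = 8, 4, 50

# Ring graph with self-loops; every node then has degree 3.
A = np.eye(n)
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A_norm = A / 3.0
L = np.eye(n) - A_norm

def rayleigh_quotient(X):
    """Tr(X^T L X) / ||X||_F^2: small values mean neighbors carry similar features."""
    return np.trace(X.T @ L @ X) / np.linalg.norm(X) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))

history = [rayleigh_quotient(X)]
for _ in range(num_layers):
    X = A_norm @ X             # plain neighborhood averaging
    X = X / np.linalg.norm(X)  # rescale so the features do not underflow
    history.append(rayleigh_quotient(X))

print(history[0], history[-1])
```

After enough layers the features are dominated by the constant vector and the Rayleigh quotient collapses toward zero. This is the failure mode that unitary convolutions were introduced to prevent.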
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that unitary graph convolutions overconstrain natural smoothing in dynamics modeling tasks (e.g., PDEs on meshes), proves they degrade performance, proposes relaxed unitary convolutions to balance preservation with required diffusion, generalizes the approach from graphs to meshes, and reports empirical outperformance over mesh-aware transformers and equivariant networks on heat/wave equations and weather forecasting.
Significance. If the proof is rigorous and the relaxation admits a task-independent choice rule, the work would offer a targeted fix for oversmoothing in GNN-based physical simulators while retaining some unitary benefits, with potential impact on scientific ML applications involving complex geometries.
major comments (2)
- [§3] §3 (proof that unitary convolutions hurt performance): the derivation relies on a specific smoothness-error metric; it is unclear whether the result holds for the exact discretization and time-stepping of the heat equation (Eq. 1) or requires additional assumptions on the mesh Laplacian that are not stated.
- [§4.2] §4.2 (relaxed unitary convolution definition): the relaxation parameter is introduced without a bound or selection rule independent of the target PDE or mesh; if its value must be tuned per task on validation data, the claimed systematic advantage over regularized baselines is undermined.
minor comments (2)
- [Abstract] Abstract: the phrase 'natural smoothing required for physical systems' is used without a brief parenthetical definition or reference to the diffusion term in the governing equations.
- [Table 1] Table 1: the reported standard deviations for the relaxed unitary model on the wave equation are missing; add them for consistency with other rows.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope and applicability of our results. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: §3 (proof that unitary convolutions hurt performance): the derivation relies on a specific smoothness-error metric; it is unclear whether the result holds for the exact discretization and time-stepping of the heat equation (Eq. 1) or requires additional assumptions on the mesh Laplacian that are not stated.
  Authors: We appreciate this observation. The smoothness-error metric in §3 is the Dirichlet energy, which is the standard quadratic form associated with the graph/mesh Laplacian and directly measures the smoothing effect relevant to diffusion dynamics. To make the connection explicit, we will add a short appendix subsection showing that the same bound holds under the standard cotangent-weighted discretization of the Laplace-Beltrami operator used in Eq. 1, with the only additional assumption being that the mesh is quasi-uniform (a standard condition already implicit in our experimental setup). This clarification will be included in the revised version.
  Revision: yes
- Referee: §4.2 (relaxed unitary convolution definition): the relaxation parameter is introduced without a bound or selection rule independent of the target PDE or mesh; if its value must be tuned per task on validation data, the claimed systematic advantage over regularized baselines is undermined.
  Authors: The referee correctly identifies that a fully task-independent selection rule would strengthen the claim. In the revised manuscript we will add an explicit, mesh-dependent but task-independent rule: the relaxation parameter α is set to α = 1 − λ_min / λ_max, where λ_min and λ_max are the smallest non-zero and largest eigenvalues of the normalized mesh Laplacian. This choice is derived from a spectral analysis that guarantees the relaxed operator remains contractive while permitting the diffusion rate required by the underlying PDE; it requires only a single eigendecomposition (or its approximation via Lanczos) that is independent of the particular dynamics being learned. We will also state the resulting stability bound (0 < α ≤ 1) and include a brief proof that this choice recovers the unitary case when the mesh is disconnected.
  Revision: yes
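The selection rule proposed in the simulated rebuttal, α = 1 − λ_min / λ_max, is simple to compute. The sketch below uses the symmetric normalized graph Laplacian of a small ring as a stand-in for a mesh Laplacian. The function name, the zero-eigenvalue tolerance, and the no-isolated-nodes assumption are illustrative choices, and the rule itself comes from the simulated response rather than from the paper.

```python
import numpy as np

def relaxation_from_spectrum(A, tol=1e-9):
    """alpha = 1 - lambda_min / lambda_max for the symmetric normalized Laplacian.

    lambda_min is the smallest eigenvalue above `tol` (assumed cutoff for
    "non-zero"); A must be symmetric with no isolated nodes.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    evals = np.linalg.eigvalsh(L)  # real eigenvalues in ascending order
    nonzero = evals[evals > tol]
    return 1.0 - nonzero.min() / evals.max()

# 8-node ring graph as a toy example.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

alpha = relaxation_from_spectrum(A)
print(alpha)
```

For the 8-node ring the normalized Laplacian has eigenvalues 1 − cos(2πk/8), so λ_min = 1 − cos(π/4), λ_max = 2, and α ≈ 0.8536. For large meshes a sparse Lanczos-type eigensolver would replace the dense `eigvalsh` call, as the response suggests.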
Circularity Check
No significant circularity in the derivation of relaxed unitary convolutions
full rationale
The paper analyzes oversmoothing in GNNs for PDE dynamics, proves unitary convolutions overconstrain natural smoothing in systems like diffusion, and introduces relaxed unitary convolutions as an explicit modification to balance preservation with required diffusion effects. This is grounded in the stated analysis of physical systems rather than reducing to self-definition, fitted inputs renamed as predictions, or self-citation chains. The relaxation parameter is presented as a tunable balance without equations showing it is derived from or equivalent to the target performance metrics by construction. Generalization to meshes and experimental validation on heat/wave equations and weather forecasting remain independent of any circular reduction. No load-bearing step collapses to the inputs via the enumerated patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- relaxation parameter
axioms (1)
- domain assumption: Unitary convolutions mathematically preserve smoothness levels in GNNs
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (J uniquely calibrated reciprocal cost)
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems... $f_{\mathrm{Relaxed}}(X; A, T_{\max}) = \sum_{i=0}^{T_{\max}} \frac{1}{i!} L^i(X)$, where $L(X) = AXW$ and $W = -W^\dagger$
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (D=3 forcing)
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Corollary 1... mesh Rayleigh quotient is invariant under normalized unitary... $R_M(X) = R_M(f_{\mathrm{UniMeshConv}}(X))$
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: alpha_pin_under_high_calibration
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We also generalize unitary and relaxed unitary convolutions from graphs to meshes... using the Robust Laplacian
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.