arxiv: 2602.05352 · v2 · submitted 2026-02-05 · 💻 cs.LG · math.SG

Smoothness Errors in Dynamics Models and How to Avoid Them

Pith reviewed 2026-05-16 07:20 UTC · model grok-4.3

keywords graph neural networks · unitary convolutions · partial differential equations · dynamics modeling · mesh discretization · oversmoothing · weather forecasting · relaxed constraints

The pith

Relaxed unitary convolutions improve accuracy for neural models of physical dynamics by allowing natural smoothing on meshes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural networks for PDEs on surfaces often discretize them as meshes and apply graph neural networks, but these suffer from oversmoothing where features become too uniform. Unitary convolutions were introduced to preserve smoothness strictly, yet this paper proves they hurt performance on dynamics tasks such as diffusion where smoothness must increase over time. The authors introduce relaxed unitary convolutions that loosen the constraint just enough to permit the required smoothing while retaining stability benefits. Experiments show these relaxed versions outperform mesh-aware transformers and equivariant networks on the heat and wave equations over complex surfaces and on weather forecasting.

Core claim

Unitary graph convolutions are mathematically constrained to preserve smoothness exactly and therefore overconstrain models of physical systems where smoothness increases naturally, such as diffusion processes; this leads to reduced performance. Relaxed unitary convolutions introduce a tunable parameter that balances strict preservation against the smoothing required by the dynamics, and the construction generalizes directly from graphs to mesh discretizations of surfaces.
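To make the claim concrete, here is a minimal NumPy sketch; it is not the paper's code. It assumes a small cycle graph, the combinatorial Laplacian L = D − A, and the Rayleigh quotient R(x) = x*Lx / x*x as the smoothness measure (the figure captions refer to a Rayleigh quotient, but the exact normalization used in the paper is an assumption here). The helper names `cycle_laplacian`, `rayleigh_quotient`, `spectral_apply`, `heat_step`, and `unitary_step` are hypothetical. One exact heat step exp(−tL) lowers R, whereas the unitary operator exp(itL) leaves it unchanged.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node cycle graph."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def rayleigh_quotient(L, x):
    """R(x) = x^* L x / x^* x; smaller values mean a smoother signal."""
    return float(np.real(np.vdot(x, L @ x)) / np.real(np.vdot(x, x)))

def spectral_apply(L, x, fn):
    """Apply fn(L) to x through the eigendecomposition of the symmetric L."""
    lam, V = np.linalg.eigh(L)
    return V @ (fn(lam) * (V.T @ x))

def heat_step(L, x, t=0.5):
    # Exact solution operator of dx/dt = -L x over time t (assumed toy dynamics).
    return spectral_apply(L, x, lambda lam: np.exp(-t * lam))

def unitary_step(L, x, t=0.5):
    # exp(i t L) is unitary and commutes with L.
    return spectral_apply(L, x, lambda lam: np.exp(1j * t * lam))

L = cycle_laplacian(12)
x0 = np.zeros(12)
x0[0] = 1.0  # a single heat spike

rq_initial = rayleigh_quotient(L, x0)
rq_heat = rayleigh_quotient(L, heat_step(L, x0))
rq_unitary = rayleigh_quotient(L, unitary_step(L, x0))
print(rq_initial, rq_heat, rq_unitary)
```

In this toy setting a layer built only from such unitary operators cannot reproduce the drop in R that the heat target exhibits, which is the mismatch the core claim describes.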

What carries the argument

Relaxed unitary convolutions on graphs and meshes, defined by loosening the unitarity constraint via a relaxation parameter to permit controlled feature smoothing.
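The Figure 2 caption and the Sec. 5.1 excerpt captured with it describe the relaxation as truncating a Taylor series. The sketch below illustrates that idea under stated assumptions and should not be read as the authors' construction: it takes the normalized adjacency of a cycle graph as the generator, forms the Taylor polynomial of exp(iA) up to a chosen order, and measures how far the result is from unitary. Eq. 3 of the paper is not reproduced on this page, so the operator, the omission of weight matrices, and the names `truncated_exp_iA` and `unitarity_defect` are all hypothetical.

```python
import numpy as np

def normalized_cycle_adjacency(n):
    """Symmetric normalized adjacency D^{-1/2} A D^{-1/2} of an n-node cycle."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def truncated_exp_iA(A, order):
    """Taylor polynomial sum_{k=0}^{order} (iA)^k / k! approximating exp(iA)."""
    n = A.shape[0]
    term = np.eye(n, dtype=complex)
    total = term.copy()
    for k in range(1, order + 1):
        term = term @ (1j * A) / k   # builds (iA)^k / k! incrementally
        total = total + term
    return total

def unitarity_defect(T):
    """Spectral norm of T^H T - I; zero exactly when T is unitary."""
    return float(np.linalg.norm(T.conj().T @ T - np.eye(T.shape[0]), 2))

A = normalized_cycle_adjacency(8)
defects = {K: unitarity_defect(truncated_exp_iA(A, K)) for K in (1, 2, 3, 4, 12)}
for K, d in defects.items():
    print(f"order {K:2d}: ||T^H T - I||_2 = {d:.3e}")
```

Low truncation orders give an operator that departs visibly from unitarity, and the departure shrinks as the order grows, so under these assumptions the truncation order acts as a knob between a relaxed and a nearly unitary layer.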

If this is right

  • Models of diffusion and wave propagation on irregular surfaces achieve lower prediction error than with strictly unitary or non-unitary baselines.
  • Mesh-based neural weather forecasting gains accuracy without adding layers or changing the network depth.
  • The same relaxation technique applies unchanged when moving from graph to surface-mesh discretizations.
  • Performance gains hold across multiple strong baselines including mesh-aware transformers and equivariant networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may extend to other time-dependent scientific simulations such as fluid dynamics on unstructured grids.
  • Tuning the single relaxation parameter could reduce the need for architecture search in new physics-informed tasks.
  • If the relaxation interacts poorly with certain boundary conditions, hybrid schemes that switch between strict and relaxed layers might be needed.

Load-bearing premise

The relaxation parameter can be chosen or tuned to achieve the right balance without introducing instability, losing generalization, or requiring task-specific fitting.

What would settle it

Training and evaluating the same PDE models with a range of relaxation parameters on the heat equation over a fixed mesh, then checking whether any setting yields higher error than a standard non-unitary baseline across multiple random seeds.

Figures

Figures reproduced from arXiv: 2602.05352 by Edward Berman, Jung Yeon Park, Luisa Li, Robin Walters.

Figure 1. Qualitative comparison of autoregressive model predictions for the heat equation on the armadillo mesh at timestep T = 190. Our R-UNIMESH model remains faithful to the ground truth during each step of the rollout, whereas the EMAN model over smooths and the Hermes model under smooths. A more complete comparison over several timesteps is in Sec. C.4, Tab. 5.
Figure 2. Top: After zero padding, individual unitary blocks are stacked and the output is fed into an unconstrained decoder. Bottom: Each block uses Taylor truncated unitary convolution.
    5.1. Relaxed Unitary Convolution via Taylor Truncation. We relax Lie unitary convolution by truncating the Taylor series approximations used in Eq. 3. We note that Kiani et al. (2024) do propose their own constraint relaxation for s…
Figure 3. Left: The dotted lines indicate the mean Rayleigh quotient for input heat graphs at T = 3 and target graphs at T = 4. We also show the mean Rayleigh quotient for the best performing GCN, R-UNIGRAPH, and Lie unitary models. R-UNIGRAPH is best at capturing the true smoothness. Right: Validation MSE of the same three models. R-UNIGRAPH has the best performance. Results for the full set of runs are provided in…
Figure 4. The Rayleigh quotient for each timestep on an unseen mesh for Hermes, EMAN, and R-UNIMESH models. The R-UNIMESH is the best at capturing the true smoothness for heat…
Figure 5. RMSE and ACC as a function of lead time for all models temperature prediction. R-UNIMESH has a competitive RMSE, especially at early lead time. R-UNIMESH also maintains viability for lead times of roughly 2 days according to the ECMWF baseline. Exact values recorded in Tab. 7 (Sec. D.4).
    Results. Our main result is that our R-UNIMESH model outperforms all baselines at solving highly diffusive PDEs and capt…
Figure 6. Illustration of an intrinsic Delaunay edge flip performed by the Robust Laplacian edge rewiring algorithm. This figure is a reproduced version of…
Figure 7. Sample heat diffusion process on a grid discretized as a graph. Node neighbors are the nodes that sit adjacent in the grid.
    B.2. Taylor Series Sensitivity Analysis. We conduct a sensitivity analysis of the Rayleigh quotient to different Taylor series truncations. For completeness, we also compare with standard GCNs and Separable unitary networks. We study these tendencies at initialization for the heat diff…
Figure 8. KL divergence between distribution of Rayleigh quotients before and after applying the model. Results are averaged over 10 runs. In the supplementary material we include a video that shows the evolving Rayleigh quotient distribution as we increase Tmax
Figure 9. Top: Validation MSE for an ensemble of 5 runs for a GCN (left), R-UNIGRAPH (middle), and a Lie unitary convolution network (right) at timestep t = 3. R-UNIGRAPH significantly outperforms the GCN and also outperforms the Lie unitary network. Bottom: The average Rayleigh quotient over all graphs for an ensemble of 5 runs for the same models at timestep t = 3. The GCN is under constrained and biased towards o…
Figure 10. Smoothness for a Hermes model as measured by the 2-point correlation function. The plot indicates undersmoothing in each radial bin.
    Smoothness error err_smooth (↓) per model, from the table captured with this figure:
    Model     Heat (α = 1)     Wave (c = 1)     Cahn–Hilliard
    GemCNN    –                –                1.89 × 10^−1
    EMAN      4.04 × 10^−3     1.78 × 10^−3     4.59 × 10^−1
    Hermes    9.71 × 10^−3     1.38 × 10^−2     9.61 × 10^−3
Figure 11. RMSE and ACC as a function of lead time for all models geopotential prediction. R-UNIMESH has a competitive RMSE, especially at early lead time. R-UNIMESH also maintains viability for lead times of roughly 2 days according to the ECMWF baseline.
    D.5. Smoothness Extended Results. We provide smoothness errors for all models on WB2 temperature and geopotential datasets. As seen in Tab. 8, R-UNIMESH is competi…
Original abstract

Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.
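The oversmoothing the abstract refers to can be seen in a toy linear setting. The sketch below is an illustration under stated assumptions, not the paper's experiment: it assumes the common GCN propagation matrix with self-loops, D̃^{-1/2}(A + I)D̃^{-1/2}, drops learned weights and nonlinearities, and uses a 16-node cycle as a stand-in graph. The helper names are hypothetical. Repeated propagation drives the Rayleigh quotient of the features toward zero, meaning each node's value approaches that of its neighbors.

```python
import numpy as np

def gcn_propagation_matrix(A):
    """D~^{-1/2} (A + I) D~^{-1/2}, with D~ the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def rayleigh_quotient(L, x):
    """R(x) = x^T L x / x^T x for a real signal x."""
    return float(x @ L @ x / (x @ x))

n = 16
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = 1.0
A[(idx + 1) % n, idx] = 1.0

P = gcn_propagation_matrix(A)
L = np.eye(n) - P          # Laplacian paired with this propagation matrix

x = np.zeros(n)
x[0] = 1.0                 # features concentrated on one node
history = [rayleigh_quotient(L, x)]
for _ in range(50):        # 50 linear propagation steps, no weights or nonlinearity
    x = P @ x
    history.append(rayleigh_quotient(L, x))

print(history[0], history[10], history[50])
```

A unitary layer would hold this quantity fixed instead, which is why it counters oversmoothing but, per the abstract, can overconstrain dynamics whose smoothness should grow.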

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that unitary graph convolutions overconstrain natural smoothing in dynamics modeling tasks (e.g., PDEs on meshes), proves they degrade performance, proposes relaxed unitary convolutions to balance preservation with required diffusion, generalizes the approach from graphs to meshes, and reports empirical outperformance over mesh-aware transformers and equivariant networks on heat/wave equations and weather forecasting.

Significance. If the proof is rigorous and the relaxation admits a task-independent choice rule, the work would offer a targeted fix for oversmoothing in GNN-based physical simulators while retaining some unitary benefits, with potential impact on scientific ML applications involving complex geometries.

major comments (2)
  1. §3 (proof that unitary convolutions hurt performance): the derivation relies on a specific smoothness-error metric; it is unclear whether the result holds for the exact discretization and time-stepping of the heat equation (Eq. 1) or requires additional assumptions on the mesh Laplacian that are not stated.
  2. §4.2 (relaxed unitary convolution definition): the relaxation parameter is introduced without a bound or selection rule independent of the target PDE or mesh; if its value must be tuned per task on validation data, the claimed systematic advantage over regularized baselines is undermined.
minor comments (2)
  1. Abstract: the phrase 'natural smoothing required for physical systems' is used without a brief parenthetical definition or reference to the diffusion term in the governing equations.
  2. Table 1: the reported standard deviations for the relaxed unitary model on the wave equation are missing; add them for consistency with other rows.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and applicability of our results. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and address the concerns raised.

Point-by-point responses
  1. Referee: §3 (proof that unitary convolutions hurt performance): the derivation relies on a specific smoothness-error metric; it is unclear whether the result holds for the exact discretization and time-stepping of the heat equation (Eq. 1) or requires additional assumptions on the mesh Laplacian that are not stated.

    Authors: We appreciate this observation. The smoothness-error metric in §3 is the Dirichlet energy, which is the standard quadratic form associated with the graph/mesh Laplacian and directly measures the smoothing effect relevant to diffusion dynamics. To make the connection explicit, we will add a short appendix subsection deriving that the same bound holds under the standard cotangent-weighted discretization of the Laplace-Beltrami operator used in Eq. 1, with the only additional assumption being that the mesh is quasi-uniform (a standard condition already implicit in our experimental setup). This clarification will be included in the revised version. revision: yes

  2. Referee: §4.2 (relaxed unitary convolution definition): the relaxation parameter is introduced without a bound or selection rule independent of the target PDE or mesh; if its value must be tuned per task on validation data, the claimed systematic advantage over regularized baselines is undermined.

    Authors: The referee correctly identifies that a fully task-independent selection rule would strengthen the claim. In the revised manuscript we will add an explicit, mesh-dependent but task-independent rule: the relaxation parameter α is set to α = 1 − λ_min / λ_max, where λ_min and λ_max are the smallest non-zero and largest eigenvalues of the normalized mesh Laplacian. This choice is derived from a spectral analysis that guarantees the relaxed operator remains contractive while permitting the diffusion rate required by the underlying PDE; it requires only a single eigendecomposition (or its approximation via Lanczos) that is independent of the particular dynamics being learned. We will also state the resulting stability bound (0 < α ≤ 1) and include a brief proof that this choice recovers the unitary case when the mesh is disconnected. revision: yes
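The selection rule in the second simulated response can be sketched numerically. This is an illustration of the formula as stated there, α = 1 − λ_min / λ_max, and not a confirmed part of the paper. It assumes a dense eigendecomposition, which is only practical for small graphs (the response mentions Lanczos for an approximation), and uses an 8-node cycle as a stand-in for a mesh graph. The function names and the tolerance `tol` used to discard zero eigenvalues are hypothetical. Since 0 < λ_min ≤ λ_max, the value this formula returns lies in [0, 1).

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def relaxation_from_spectrum(A, tol=1e-9):
    """alpha = 1 - lambda_min / lambda_max, with lambda_min the smallest eigenvalue above tol."""
    lam = np.linalg.eigvalsh(normalized_laplacian(A))
    nonzero = lam[lam > tol]   # drop the (numerically) zero eigenvalues
    return float(1.0 - nonzero.min() / lam.max())

# Stand-in for a mesh graph: an 8-node cycle.
n = 8
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = 1.0
A[(idx + 1) % n, idx] = 1.0

alpha = relaxation_from_spectrum(A)
print(alpha)
```

For the cycle, the normalized Laplacian eigenvalues are 1 − cos(2πk/n), so the result can be checked against the closed form 1 − (1 − cos(π/4)) / 2.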

Circularity Check

0 steps flagged

No significant circularity in the derivation of relaxed unitary convolutions

full rationale

The paper analyzes oversmoothing in GNNs for PDE dynamics, proves unitary convolutions overconstrain natural smoothing in systems like diffusion, and introduces relaxed unitary convolutions as an explicit modification to balance preservation with required diffusion effects. This is grounded in the stated analysis of physical systems rather than reducing to self-definition, fitted inputs renamed as predictions, or self-citation chains. The relaxation parameter is presented as a tunable balance without equations showing it is derived from or equivalent to the target performance metrics by construction. Generalization to meshes and experimental validation on heat/wave equations and weather forecasting remain independent of any circular reduction. No load-bearing step collapses to the inputs via the enumerated patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the mathematical definition of unitarity preserving smoothness (from prior work) and the introduction of a relaxation parameter to allow natural dynamics smoothing.

free parameters (1)
  • relaxation parameter
    Controls the degree to which unitarity is relaxed; must be set or tuned per task or dataset to balance constraints.
axioms (1)
  • domain assumption: Unitary convolutions mathematically preserve smoothness levels in GNNs
    Invoked as the baseline constraint that the paper relaxes for dynamics tasks.

pith-pipeline@v0.9.0 · 5481 in / 1196 out tokens · 45714 ms · 2026-05-16T07:20:08.265973+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

  1. [1]

    baron de Fourier, J

    PMLR, 2017. baron de Fourier, J. B. J. Théorie analytique de la chaleur. Firmin Didot, 1822. Basu, S., Gallego-Posada, J., Viganò, F., Rowbottom, J., and Cohen, T. Equivariant mesh attention networks. Transactions on Machine Learning Research, 2022. ISSN 2835-

  2. [2]

    A Note on Over-Smoothing for Graph Neural Networks, June 2020

    URL https://openreview.net/forum?id=3IqqJh2Ycy. Expert Certification. Bobenko, A. I. and Springborn, B. A. A discrete laplace–beltrami operator for simplicial surfaces. Discrete & Computational Geometry, 38(4):740–756, 2007. Bodnar, C., Di Giovanni, F., Chamberlain, B., Lio, P., and Bronstein, M. Neural sheaf diffusion: A topological perspective on heter...

  3. [3]

    Crane, K., Weischedel, C., and Wardetzky, M

    ACM. Crane, K., Weischedel, C., and Wardetzky, M. The heat method for distance computation. Communications of the ACM, 60(11):90–99, 2017. Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., and Ho, S. Lagrangian neural networks. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020. Cui, Q., Zhang, M., X...

  4. [4]

    Hersbach, B

    Presented at the ORESME Reading Group Meeting, September 30, 2017. Daniels, M. and Rigollet, P. Splat regression models. arXiv preprint arXiv:2511.14042, 2025. Daniels, M., Hodgkinson, L., and Mahoney, M. Uncertainty-aware diagnostics for physics-informed machine learning. arXiv preprint arXiv:2510.26121, 2025. de Haan, P., Weiler, M., Cohen, T., and Well...

  5. [5]

    Jarvis, M., Bernstein, G., and Jain, B

    URL https://openreview.net/forum?id=mfIX4QpsARJ. Jarvis, M., Bernstein, G., and Jain, B. The skewness of the aperture mass statistic. Monthly Notices of the Royal Astronomical Society, 352(1):338–352, 2004. Keriven, N. Not too little, not too much: a theoretical analysis of graph (over) smoothing. Advances in Neural Information Processing Systems, 35:2268–...

  6. [6]

    Graphcast: Learning skillful medium-range global weather forecasting

    URL https://openreview.net/forum?id=SJU4ayYgl. Kulick, C., Birnir, B., and Tang, S. Investigating zero-shot size transfer of graph neural differential equations for learning graph diffusion dynamics. In Topology, Algebra, and Geometry in Data Science, 2025. URL https://openreview.net/forum?id=qgbyLknKXy. Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsb...

  7. [7]

    Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljacic, M., Hou, T

    URL https://openreview.net/forum?id=c8P9NQVtmnO. Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljacic, M., Hou, T. Y., and Tegmark, M. Kan: Kolmogorov–arnold networks. In The Thirteenth International Conference on Learning Representations, 2024. Marisca, I., Bamberger, J., Alippi, C., and Bronstein, M. M. Over-squashing in spatiotempo...

  8. [8]

    Mitchel, T

    Springer, 2003. Mitchel, T. W., Kim, V. G., and Kazhdan, M. Field convolutions for surface cnns. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10001–10011, 2021. Mitchel, T. W., Aigerman, N., Kim, V. G., and Kazhdan, M. Möbius convolutions for spherical cnns. In ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–9, 202...

  9. [9]

    com/en-us-data-center-overview-mc/en-us-data-center-overview/hpc-datasheet-sc23-h200

    URL https://resources.nvidia.com/en-us-data-center-overview-mc/en-us-data-center-overview/hpc-datasheet-sc23-h200. Retrieved from NVIDIA website. Olver, P. J. Applications of Lie groups to differential equations, volume 107. Springer Science & Business Media, 1993. Pandya, S., Yang, Y., Van Alfen, N., Blazek, J., and Walters, R. Iaemu: Learning Gala...

  10. [10]

    Schneider, P

    URL https://openreview.net/forum?id=u8HmtBBSVJS. Schneider, P. Weak gravitational lensing. In Gravitational lensing: strong, weak and micro, pp. 269–451. Springer, 2006. Shao, Z., Shi, D., Han, A., Guo, Y., Zhao, Q., and Gao, J. Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyond, 2024. URL https://...

  11. [11]

    Overcoming catastrophic forgetting in neural networks

    URL https://openreview.net/forum?id=wta_8Hx2KD. Wang, R., Walters, R., and Yu, R. Approximately equivariant networks for imperfectly symmetric dynamics. In International Conference on Machine Learning, pp. 23078–23091. PMLR, 2022a. Wang, R., Walters, R., and Yu, R. Data augmentation vs. equivariant networks: A theory of generalization on dynamics f...

  12. [12]

    Proposition 3 (Proposition 7 in Kiani et al

    have a high probability to exhibit smoothing. Proposition 3 (Proposition 7 in Kiani et al. (2024)). Given a simple undirected graph $G$ on $n$ nodes with normalized adjacency matrix $\tilde{A} = D^{-1/2} A D^{-1/2}$ and node degree bounded by $D$, let $X \in \mathbb{R}^{n \times d}$ have rows drawn i.i.d. from the uniform distribution on the hypersphere in dimension $d$. Let $f_{\mathrm{conv}}(X) = \tilde{A} X W$ denote convoluti...