Zero-Shot Size Transfer for Neural ODEs on Sparse Random Graphs: Graphon Limits and Adjoint Convergence

Mingsong Yan; Sui Tang; Zhida Wang

arxiv: 2606.26662 · v1 · pith:VAPP6URXnew · submitted 2026-06-25 · 💻 cs.LG · cs.AI· cs.NA· math.DS· math.NA· math.OC

Zero-Shot Size Transfer for Neural ODEs on Sparse Random Graphs: Graphon Limits and Adjoint Convergence

Mingsong Yan , Zhida Wang , Sui Tang This is my paper

Pith reviewed 2026-06-26 05:23 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.NAmath.DSmath.NAmath.OC

keywords graph neural differential equationsgraphon neural differential equationszero-shot size transferconvergence ratesadjoint systemssparse random graphsdiscretize-then-optimize

0 comments

The pith

GNDE solutions on sparse random graphs converge to Graphon-NDE limits at rate O((α_n n)^{-1/2}) with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes convergence of Graph Neural Differential Equations on finite n-node random graphs to their Graphon Neural Differential Equation counterparts as n grows. For graphs sampled with sparsity α_n from a fixed graphon, the trajectory difference shrinks at rate O((α_n n)^{-1/2}) up to logs and holds with high probability. The same limit relation is shown for the adjoint systems that compute gradients, and discretize-then-optimize versus optimize-then-discretize training become asymptotically consistent under Euler steps. This supplies the quantitative basis for training a model on a small graph and applying it unchanged to much larger graphs drawn from the same underlying graphon.

Core claim

For an n-node random graph with sparsity parameter α_n sampled independently from a fixed graphon, GNDE solutions converge trajectory-wise to Graphon-NDE solutions at rate O((α_n n)^{-1/2}), up to logarithmic factors, with high probability. Uniform-in-time convergence bounds are obtained for the adjoint systems that govern hidden-state and parameter gradients. Under explicit Euler discretization with M steps, discretize-then-optimize and optimize-then-discretize training are asymptotically consistent, with hidden-state discrepancies of order O(1/M) and local parameter-gradient discrepancies of order O(1/M^2), up to sparsity and logarithmic factors.

What carries the argument

Graphon-NDEs and adjoint Graphon-NDEs as the infinite-node limits of GNDE forward and adjoint systems on sparse random graphs sampled from graphons.

If this is right

Trajectory-wise convergence of GNDE solutions to Graphon-NDE solutions holds at the stated rate with high probability.
Uniform-in-time convergence bounds apply to the adjoint systems used for gradient computation.
Discretize-then-optimize and optimize-then-discretize training become asymptotically consistent with hidden-state error O(1/M) and parameter-gradient error O(1/M^2).
Zero-shot deployment of a trained GNDE on larger independently sampled graphs from the same graphon is justified by the convergence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The rate suggests that transfer accuracy improves when the product α_n n grows, even if individual graphs remain sparse.
The same limit argument could be checked numerically by comparing transfer error across graph sizes while fixing the product α_n n.
Adjoint convergence implies that gradient estimates obtained on small graphs remain reliable when the model is evaluated on larger ones.

Load-bearing premise

The finite graphs are sampled independently from a fixed graphon and the GNN velocity fields use local size-independent filters that admit a well-defined graphon limit.

What would settle it

Run GNDE and Graphon-NDE trajectories on independent samples from the same graphon at increasing n while holding α_n n fixed; if the observed trajectory difference fails to contract at the predicted O((α_n n)^{-1/2}) rate, the convergence statement is false.

Figures

Figures reproduced from arXiv: 2606.26662 by Mingsong Yan, Sui Tang, Zhida Wang.

**Figure 2.** Figure 2: Transfer error for forward trajectory: log-log plot of [PITH_FULL_IMAGE:figures/full_fig_p019_2.png] view at source ↗

**Figure 3.** Figure 3: Transfer error for hidden-state gradients: log-log plot of [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗

**Figure 4.** Figure 4: Transfer error for parameter gradients: log-log plot of [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

**Figure 5.** Figure 5: Temporal discretization error: log-log plot of [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗

**Figure 6.** Figure 6: DTO–OTD gradient discrepancy: log-log plots of the DTO–OTD gradient discrepan [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗

**Figure 7.** Figure 7: Two graphons used in the pattern-formation experiment, with representative sample [PITH_FULL_IMAGE:figures/full_fig_p024_7.png] view at source ↗

**Figure 8.** Figure 8: Ground truth versus GNDE prediction on source graphs with [PITH_FULL_IMAGE:figures/full_fig_p027_8.png] view at source ↗

read the original abstract

Graph Neural Differential Equations (GNDEs) model continuous-time graph dynamics by parameterizing Neural ODE velocity fields with Graph Neural Networks. Their local, size-independent filters suggest a zero-shot size-transfer principle: train on a small graph and deploy on larger, similar graphs without retraining. We develop a quantitative theory for this principle on sparse random graphs sampled from graphons. We consider Graphon Neural Differential Equations (Graphon-NDEs) and adjoint Graphon-NDEs as the infinite-node limits of the forward and adjoint GNDE systems, and establish well-posedness. For an $n$-node random graph with sparsity parameter $\alpha_n$, we prove trajectory-wise convergence of GNDE solutions to Graphon-NDE solutions at rate $O((\alpha_n n)^{-1/2})$, up to logarithmic factors, with high probability. We also establish uniform-in-time convergence bounds for adjoint systems governing hidden-state and parameter gradients. We further study discretize-then-optimize (DTO) and optimize-then-discretize (OTD) training. Under explicit Euler discretization with $M$ steps, we show that DTO and OTD are asymptotically consistent, with hidden-state and local parameter-gradient discrepancies of orders $O(1/M)$ and $O(1/M^2)$, respectively, up to sparsity and logarithmic factors. Experiments on HSBM and tent graphons support the theoretical rates, while zero-shot transfer experiments across four graphon classes demonstrate accurate deployment of learned GNDEs on larger independently sampled graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper supplies explicit high-probability rates for GNDE-to-graphon-NDE convergence on sparse graphs plus adjoint and training consistency, which directly supports the zero-shot transfer claim under the stated sampling assumptions.

read the letter

The core contribution here is the set of quantitative bounds: trajectory convergence at O((α_n n)^{-1/2}) up to logs, uniform adjoint bounds, and O(1/M), O(1/M^2) consistency for DTO/OTD under Euler steps. These are new extensions of graphon analysis into parameterized continuous-time graph models.

The work does the standard graphon sampling plus Gronwall lifting cleanly enough to produce the rates, and the experiments on HSBM and tent graphons line up with the predicted scaling. The zero-shot transfer tests across four graphon families give some practical evidence that the limit object is useful for deployment. The assumptions are stated up front—i.i.d. sampling from a fixed graphon and local size-independent filters—so there is no hidden circularity or n-dependent Lipschitz fudge.

The main soft spot is that everything rests on graphs that really are drawn from a single fixed graphon; many empirical networks are not, so the rates may degrade in practice without extra robustness work. The discretization analysis looks standard but would benefit from seeing the full error propagation in the manuscript. No load-bearing contradictions appear from the abstract and stress-test.

This is aimed at researchers who need scaling guarantees for continuous graph dynamics or who build on graphon theory. It is grounded enough and novel enough on the rates to deserve a serious referee, even if revisions will be needed on the experiment details and assumption scope.

Referee Report

0 major / 2 minor

Summary. The manuscript develops a quantitative theory for zero-shot size transfer in Graph Neural Differential Equations (GNDEs) on sparse random graphs sampled from graphons. It defines Graphon-NDEs (and their adjoints) as the infinite-node limits of the forward and adjoint GNDE systems, establishes well-posedness, and proves trajectory-wise convergence of n-node GNDE solutions to the Graphon-NDE limit at rate O((α_n n)^{-1/2}) (up to logarithmic factors) with high probability. Uniform-in-time convergence bounds are derived for the adjoint systems. Under explicit Euler discretization with M steps, discretize-then-optimize and optimize-then-discretize schemes are shown to be asymptotically consistent, with hidden-state and local parameter-gradient discrepancies of orders O(1/M) and O(1/M^2) (up to sparsity and log factors). Experiments on HSBM and tent graphons validate the rates and demonstrate zero-shot transfer across graphon classes.

Significance. If the central claims hold, the work supplies explicit, high-probability convergence rates and adjoint bounds that justify size-independent deployment of trained GNDEs, a practically relevant property for continuous-time graph models. The derivation of the O((α_n n)^{-1/2}) rate directly from graphon sampling assumptions and standard Gronwall arguments, together with the asymptotic consistency results for the two training paradigms, constitutes a clear technical contribution. The paper supplies parameter-free rate expressions and high-probability bounds rather than data-fitted quantities.

minor comments (2)

[§3.2] §3.2: the statement of the local, size-independent filter assumption could be restated more explicitly as a hypothesis on the GNN velocity field to make the invocation of the graphon limit fully self-contained.
[Figure 2] Figure 2 caption: the legend for the four graphon classes is slightly compressed; enlarging the font or adding a separate key would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thorough reading and positive recommendation to accept the manuscript. The report accurately summarizes the contributions, and we have no major comments to address.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation relies on explicit graphon sampling assumptions, standard concentration inequalities, and Gronwall lifting from graphon approximation error to ODE trajectories. These are external to the paper's fitted quantities and do not reduce by construction to self-defined predictions, fitted inputs renamed as outputs, or load-bearing self-citations. The central claims (trajectory-wise convergence at O((α_n n)^{-1/2}) and adjoint bounds) follow from the stated setup without internal redefinition or smuggling of ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The theory rests on standard graphon sampling and ODE well-posedness; no free parameters are fitted inside the proofs, and the only new entity is the Graphon-NDE limit itself, which is defined rather than postulated with external evidence.

axioms (2)

domain assumption Well-posedness of the Graphon-NDE and adjoint Graphon-NDE systems
Invoked to establish the infinite-node limit before proving convergence of finite GNDEs.
domain assumption Graphs are sampled independently from a fixed graphon with sparsity α_n
Central modeling assumption used for all convergence statements.

invented entities (1)

Graphon-NDE no independent evidence
purpose: Infinite-node limit of the GNDE forward system
Defined as the continuum object to which finite GNDEs converge; no independent falsifiable prediction is given beyond the convergence itself.

pith-pipeline@v0.9.1-grok · 5825 in / 1518 out tokens · 23139 ms · 2026-06-26T05:23:41.549853+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 12 canonical work pages · 5 internal anchors

[1]

Neural ODEs as the deep limit of ResNets with constant weights

Benny Avelin and Kaj Nystr \"o m. Neural ODEs as the deep limit of ResNets with constant weights. Analysis and Applications, 19 0 (03): 0 397--437, 2021

2021
[2]

Mean-field and graph limits for collective dynamics models with time-varying weights

Nathalie Ayi and Nastassia Pouradier Duteil. Mean-field and graph limits for collective dynamics models with time-varying weights. Journal of Differential Equations, 299: 0 65--110, 2021

2021
[3]

Graphon mean field systems

Erhan Bayraktar, Suman Chakraborty, and Ruoyu Wu. Graphon mean field systems. The Annals of Applied Probability, 33 0 (5): 0 3587--3619, 2023

2023
[4]

Permutation equivariant neural controlled differential equations for dynamic graph representation learning

Torben Berndt, Benjamin Walker, Tiexin Qin, Jan St \"u hmer, and Andrey Kormilitzin. Permutation equivariant neural controlled differential equations for dynamic graph representation learning. Advances in Neural Information Processing Systems, 38: 0 98276--98311, 2026

2026
[5]

Brooks, Philip S

Heather Z. Brooks, Philip S. Chodrow, and Mason A. Porter. Emergence of polarization in a sigmoidal bounded-confidence model of opinion dynamics. SIAM Journal on Applied Dynamical Systems, 23 0 (2): 0 1442--1475, 2024

2024
[6]

Grand: Graph neural diffusion

Ben Chamberlain, James Rowbottom, Maria I Gorinova, Michael Bronstein, Stefan Webb, and Emanuele Rossi. Grand: Graph neural diffusion. In International Conference on Machine Learning, pages 1407--1418. PMLR, 2021

2021
[7]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018

2018
[8]

GREAD : Graph neural reaction-diffusion networks

Jeongwhan Choi, Seoyoung Hong, Noseong Park, and Sung-Bae Cho. GREAD : Graph neural reaction-diffusion networks. In International Conference on Machine Learning, pages 5722--5747. PMLR, 2023

2023
[9]

Fan R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997

1997
[10]

Diffusion and elastic equations on networks

Soon-Yeong Chung, Yun-Sung Chung, and Jong-Ho Kim. Diffusion and elastic equations on networks. Publications of the Research Institute for Mathematical Sciences, 43 0 (3): 0 699--726, 2007

2007
[11]

Some G ronwall type inequalities and applications

Sever Silvestru Dragomir. Some G ronwall type inequalities and applications. Science Direct Working Paper, 0 (S1574-0358): 0 04, 2003

2003
[12]

The fisher--kpp equation over simple graphs: Varied persistence states in river networks

Yihong Du, Bendong Lou, Rui Peng, and Maolin Zhou. The fisher--kpp equation over simple graphs: Varied persistence states in river networks. Journal of Mathematical Biology, 80: 0 1559--1616, 2020

2020
[13]

Convex analysis and variational problems

Ivar Ekeland and Roger Temam. Convex analysis and variational problems. SIAM, 1999

1999
[14]

Spatial-temporal graph ODE networks for traffic flow forecasting

Zheng Fang, Qingqing Long, Guojie Song, and Kunqing Xie. Spatial-temporal graph ODE networks for traffic flow forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 364--373, 2021

2021
[15]

A stable and scalable method for solving initial value PDEs with neural networks

Marc Finzi, Andres Potapczynski, Matthew Choptuik, and Andrew Gordon Wilson. A stable and scalable method for solving initial value PDEs with neural networks. arXiv preprint arXiv:2304.14994, 2023

work page arXiv 2023
[16]

R. A. Fisher. The wave of advance of advantageous genes. Annals of Eugenics, 7 0 (4): 0 355--369, 1937

1937
[17]

Global convergence in neural ODEs : Impact of activation functions

Tianxiang Gao, Siyuan Sun, Hailiang Liu, and Hongyang Gao. Global convergence in neural ODEs : Impact of activation functions. arXiv preprint arXiv:2509.22436, 2025

work page arXiv 2025
[18]

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Amir Gholami, Kurt Keutzer, and George Biros. ANODE : Unconditionally accurate memory-efficient gradients for neural odes. arXiv preprint arXiv:1902.10298, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[19]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016

2016
[20]

Opinion dynamics and bounded confidence models, analysis, and simulation

Rainer Hegselmann and Ulrich Krause. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5 0 (3): 0 2, 2002

2002
[21]

Higher-order graphon neural networks: Approximation and cut distance

Daniel Herbst and Stefanie Jegelka. Higher-order graphon neural networks: Approximation and cut distance. arXiv preprint arXiv:2503.14338, 2025

work page arXiv 2025
[22]

Invasion fronts on graphs: The fisher--kpp equation on homogeneous trees and erd o s--r \'e nyi graphs

Aaron Hoffman and Matt Holzer. Invasion fronts on graphs: The fisher--kpp equation on homogeneous trees and erd o s--r \'e nyi graphs. Discrete and Continuous Dynamical Systems - Series B, 24 0 (2): 0 671--694, 2019

2019
[23]

Generalizing graph ODE for learning complex system dynamics across environments

Zijie Huang, Yizhou Sun, and Wei Wang. Generalizing graph ODE for learning complex system dynamics across environments. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 798--809, 2023

2023
[24]

Causal graph ODE : Continuous treatment effect modeling in multi-agent dynamical systems

Zijie Huang, Jeehyun Hwang, Junkai Zhang, Jinwoo Baik, Weitong Zhang, Dominik Wodarz, Yizhou Sun, Quanquan Gu, and Wei Wang. Causal graph ODE : Continuous treatment effect modeling in multi-agent dynamical systems. In Proceedings of the ACM Web Conference 2024, pages 4607--4617, 2024

2024
[25]

Sparse M onte C arlo method for nonlocal diffusion problems

Dmitry Kaliuzhnyi-Verbovetskyi and Georgi S Medvedev. Sparse M onte C arlo method for nonlocal diffusion problems. SIAM Journal on Numerical Analysis, 60 0 (6): 0 3001--3028, 2022

2022
[26]

Convergence and stability of graph convolutional networks on large random graphs

Nicolas Keriven, Alberto Bietti, and Samuel Vaiter. Convergence and stability of graph convolutional networks on large random graphs. Advances in Neural Information Processing Systems, 33: 0 21512--21523, 2020

2020
[27]

On neural differential equations

Patrick Kidger. On neural differential equations. arXiv preprint arXiv:2202.02435, 2022

work page arXiv 2022
[28]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[29]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[30]

A. N. Kolmogorov, I. G. Petrovskii, and N. S. Piskunov. A study of the diffusion equation with increase in the amount of substance, and its application to a biological problem. Bulletin of Moscow University, Mathematics and Mechanics, 1 0 (6): 0 1--26, 1937

1937
[31]

Ana Lajmanovich and James A. Yorke. A deterministic model for gonorrhea in a nonhomogeneous population. Mathematical Biosciences, 28 0 (3--4): 0 221--236, 1976

1976
[32]

Limits, approximation and size transferability for GNNs on sparse graphs via graphops

Thien Le and Stefanie Jegelka. Limits, approximation and size transferability for GNNs on sparse graphs via graphops. Advances in Neural Information Processing Systems, 36: 0 41305--41342, 2023

2023
[33]

Transferability of spectral graph convolutional neural networks

Ron Levie, Wei Huang, Lorenzo Bucci, Michael Bronstein, and Gitta Kutyniok. Transferability of spectral graph convolutional neural networks. Journal of Machine Learning Research, 22 0 (272): 0 1--59, 2021

2021
[34]

Graph ODEs and beyond: A comprehensive survey on integrating differential equations with graph neural networks

Zewen Liu, Xiaoda Wang, Bohan Wang, Zijie Huang, Carl Yang, and Wei Jin. Graph ODEs and beyond: A comprehensive survey on integrating differential equations with graph neural networks. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 6118--6128, 2025

2025
[35]

Large networks and graph limits, volume 60

L \'a szl \'o Lov \'a sz. Large networks and graph limits, volume 60. American Mathematical Society, 2012

2012
[36]

HOPE : High-order graph ODE for modeling interacting dynamics

Xiao Luo, Jingyang Yuan, Zijie Huang, Huiyu Jiang, Yifang Qin, Wei Ju, Ming Zhang, and Yizhou Sun. HOPE : High-order graph ODE for modeling interacting dynamics. In International Conference on Machine Learning, pages 23124--23139. PMLR, 2023

2023
[37]

Transferability of graph neural networks: an extended graphon approach

Sohir Maskey, Ron Levie, and Gitta Kutyniok. Transferability of graph neural networks: an extended graphon approach. Applied and Computational Harmonic Analysis, 63: 0 48--83, 2023

2023
[38]

The nonlinear heat equation on dense graphs and graph limits

Georgi S Medvedev. The nonlinear heat equation on dense graphs and graph limits. SIAM Journal on Mathematical Analysis, 46 0 (4): 0 2743--2766, 2014 a

2014
[39]

The nonlinear heat equation on W -random graphs

Georgi S Medvedev. The nonlinear heat equation on W -random graphs. Archive for Rational Mechanics and Analysis, 212 0 (3): 0 781--803, 2014 b

2014
[40]

Heterophilious dynamics enhances consensus

S \'e bastien Motsch and Eitan Tadmor. Heterophilious dynamics enhances consensus. SIAM Review, 56 0 (4): 0 577--621, 2014

2014
[41]

Discretize-optimize vs

Derek Onken and Lars Ruthotto. Discretize-optimize vs. optimize-discretize for time-series regression and continuous normalizing flows. arXiv preprint arXiv:2005.13420, 2020

work page arXiv 2005
[42]

Mean field, hydrodynamic and graph limits for deterministic interacting particle systems: a survey with quantitative estimates

Thierry Paul and Emmanuel Tr \'e lat. From microscopic to macroscopic scale equations: mean field, hydrodynamic and graph limits. arXiv preprint arXiv:2209.08832, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[43]

Graph neural ordinary differential equations

Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. Graph neural ordinary differential equations. arXiv preprint arXiv:1911.07532, 2019

work page arXiv 1911
[44]

Graphon neural networks and the transferability of graph neural networks

Luana Ruiz, Luiz Chamon, and Alejandro Ribeiro. Graphon neural networks and the transferability of graph neural networks. Advances in Neural Information Processing Systems, 33: 0 1702--1712, 2020

2020
[45]

Graph-coupled oscillator networks

T Konstantin Rusch, Ben Chamberlain, James Rowbottom, Siddhartha Mishra, and Michael Bronstein. Graph-coupled oscillator networks. In International Conference on Machine Learning, pages 18888--18909. PMLR, 2022

2022
[46]

Do residual neural networks discretize neural ordinary differential equations? Advances in Neural Information Processing Systems, 35: 0 36520--36532, 2022

Michael Sander, Pierre Ablin, and Gabriel Peyr \'e . Do residual neural networks discretize neural ordinary differential equations? Advances in Neural Information Processing Systems, 35: 0 36520--36532, 2022

2022
[47]

The graph neural network model

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20 0 (1): 0 61--80, 2008

2008
[48]

The N -intertwined SIS epidemic network model

Piet Van Mieghem. The N -intertwined SIS epidemic network model. Computing, 93 0 (2--4): 0 147--169, 2011

2011
[49]

Virus spread in networks

Piet Van Mieghem, Jasmina Omic, and Robert Kooij. Virus spread in networks. IEEE/ACM Transactions on Networking, 17 0 (1): 0 1--14, 2009

2009
[50]

Correcting auto-differentiation in neural- ODE training

Yewei Xu, Shi Chen, and Qin Li. Correcting auto-differentiation in neural- ODE training. arXiv preprint arXiv:2306.02192, 2023

work page arXiv 2023
[51]

On the Convergence and Size Transferability of Continuous-depth Graph Neural Networks

Mingsong Yan, Charles Kulick, and Sui Tang. On the convergence and size transferability of continuous-depth graph neural networks. arXiv preprint arXiv:2510.03923, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

Neural ODEs as the deep limit of ResNets with constant weights

Benny Avelin and Kaj Nystr \"o m. Neural ODEs as the deep limit of ResNets with constant weights. Analysis and Applications, 19 0 (03): 0 397--437, 2021

2021

[2] [2]

Mean-field and graph limits for collective dynamics models with time-varying weights

Nathalie Ayi and Nastassia Pouradier Duteil. Mean-field and graph limits for collective dynamics models with time-varying weights. Journal of Differential Equations, 299: 0 65--110, 2021

2021

[3] [3]

Graphon mean field systems

Erhan Bayraktar, Suman Chakraborty, and Ruoyu Wu. Graphon mean field systems. The Annals of Applied Probability, 33 0 (5): 0 3587--3619, 2023

2023

[4] [4]

Permutation equivariant neural controlled differential equations for dynamic graph representation learning

Torben Berndt, Benjamin Walker, Tiexin Qin, Jan St \"u hmer, and Andrey Kormilitzin. Permutation equivariant neural controlled differential equations for dynamic graph representation learning. Advances in Neural Information Processing Systems, 38: 0 98276--98311, 2026

2026

[5] [5]

Brooks, Philip S

Heather Z. Brooks, Philip S. Chodrow, and Mason A. Porter. Emergence of polarization in a sigmoidal bounded-confidence model of opinion dynamics. SIAM Journal on Applied Dynamical Systems, 23 0 (2): 0 1442--1475, 2024

2024

[6] [6]

Grand: Graph neural diffusion

Ben Chamberlain, James Rowbottom, Maria I Gorinova, Michael Bronstein, Stefan Webb, and Emanuele Rossi. Grand: Graph neural diffusion. In International Conference on Machine Learning, pages 1407--1418. PMLR, 2021

2021

[7] [7]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018

2018

[8] [8]

GREAD : Graph neural reaction-diffusion networks

Jeongwhan Choi, Seoyoung Hong, Noseong Park, and Sung-Bae Cho. GREAD : Graph neural reaction-diffusion networks. In International Conference on Machine Learning, pages 5722--5747. PMLR, 2023

2023

[9] [9]

Fan R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997

1997

[10] [10]

Diffusion and elastic equations on networks

Soon-Yeong Chung, Yun-Sung Chung, and Jong-Ho Kim. Diffusion and elastic equations on networks. Publications of the Research Institute for Mathematical Sciences, 43 0 (3): 0 699--726, 2007

2007

[11] [11]

Some G ronwall type inequalities and applications

Sever Silvestru Dragomir. Some G ronwall type inequalities and applications. Science Direct Working Paper, 0 (S1574-0358): 0 04, 2003

2003

[12] [12]

The fisher--kpp equation over simple graphs: Varied persistence states in river networks

Yihong Du, Bendong Lou, Rui Peng, and Maolin Zhou. The fisher--kpp equation over simple graphs: Varied persistence states in river networks. Journal of Mathematical Biology, 80: 0 1559--1616, 2020

2020

[13] [13]

Convex analysis and variational problems

Ivar Ekeland and Roger Temam. Convex analysis and variational problems. SIAM, 1999

1999

[14] [14]

Spatial-temporal graph ODE networks for traffic flow forecasting

Zheng Fang, Qingqing Long, Guojie Song, and Kunqing Xie. Spatial-temporal graph ODE networks for traffic flow forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 364--373, 2021

2021

[15] [15]

A stable and scalable method for solving initial value PDEs with neural networks

Marc Finzi, Andres Potapczynski, Matthew Choptuik, and Andrew Gordon Wilson. A stable and scalable method for solving initial value PDEs with neural networks. arXiv preprint arXiv:2304.14994, 2023

work page arXiv 2023

[16] [16]

R. A. Fisher. The wave of advance of advantageous genes. Annals of Eugenics, 7 0 (4): 0 355--369, 1937

1937

[17] [17]

Global convergence in neural ODEs : Impact of activation functions

Tianxiang Gao, Siyuan Sun, Hailiang Liu, and Hongyang Gao. Global convergence in neural ODEs : Impact of activation functions. arXiv preprint arXiv:2509.22436, 2025

work page arXiv 2025

[18] [18]

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Amir Gholami, Kurt Keutzer, and George Biros. ANODE : Unconditionally accurate memory-efficient gradients for neural odes. arXiv preprint arXiv:1902.10298, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902

[19] [19]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016

2016

[20] [20]

Opinion dynamics and bounded confidence models, analysis, and simulation

Rainer Hegselmann and Ulrich Krause. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5 0 (3): 0 2, 2002

2002

[21] [21]

Higher-order graphon neural networks: Approximation and cut distance

Daniel Herbst and Stefanie Jegelka. Higher-order graphon neural networks: Approximation and cut distance. arXiv preprint arXiv:2503.14338, 2025

work page arXiv 2025

[22] [22]

Invasion fronts on graphs: The fisher--kpp equation on homogeneous trees and erd o s--r \'e nyi graphs

Aaron Hoffman and Matt Holzer. Invasion fronts on graphs: The fisher--kpp equation on homogeneous trees and erd o s--r \'e nyi graphs. Discrete and Continuous Dynamical Systems - Series B, 24 0 (2): 0 671--694, 2019

2019

[23] [23]

Generalizing graph ODE for learning complex system dynamics across environments

Zijie Huang, Yizhou Sun, and Wei Wang. Generalizing graph ODE for learning complex system dynamics across environments. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 798--809, 2023

2023

[24] [24]

Causal graph ODE : Continuous treatment effect modeling in multi-agent dynamical systems

Zijie Huang, Jeehyun Hwang, Junkai Zhang, Jinwoo Baik, Weitong Zhang, Dominik Wodarz, Yizhou Sun, Quanquan Gu, and Wei Wang. Causal graph ODE : Continuous treatment effect modeling in multi-agent dynamical systems. In Proceedings of the ACM Web Conference 2024, pages 4607--4617, 2024

2024

[25] [25]

Sparse M onte C arlo method for nonlocal diffusion problems

Dmitry Kaliuzhnyi-Verbovetskyi and Georgi S Medvedev. Sparse M onte C arlo method for nonlocal diffusion problems. SIAM Journal on Numerical Analysis, 60 0 (6): 0 3001--3028, 2022

2022

[26] [26]

Convergence and stability of graph convolutional networks on large random graphs

Nicolas Keriven, Alberto Bietti, and Samuel Vaiter. Convergence and stability of graph convolutional networks on large random graphs. Advances in Neural Information Processing Systems, 33: 0 21512--21523, 2020

2020

[27] [27]

On neural differential equations

Patrick Kidger. On neural differential equations. arXiv preprint arXiv:2202.02435, 2022

work page arXiv 2022

[28] [28]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[29] [29]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[30] [30]

A. N. Kolmogorov, I. G. Petrovskii, and N. S. Piskunov. A study of the diffusion equation with increase in the amount of substance, and its application to a biological problem. Bulletin of Moscow University, Mathematics and Mechanics, 1 0 (6): 0 1--26, 1937

1937

[31] [31]

Ana Lajmanovich and James A. Yorke. A deterministic model for gonorrhea in a nonhomogeneous population. Mathematical Biosciences, 28 0 (3--4): 0 221--236, 1976

1976

[32] [32]

Limits, approximation and size transferability for GNNs on sparse graphs via graphops

Thien Le and Stefanie Jegelka. Limits, approximation and size transferability for GNNs on sparse graphs via graphops. Advances in Neural Information Processing Systems, 36: 0 41305--41342, 2023

2023

[33] [33]

Transferability of spectral graph convolutional neural networks

Ron Levie, Wei Huang, Lorenzo Bucci, Michael Bronstein, and Gitta Kutyniok. Transferability of spectral graph convolutional neural networks. Journal of Machine Learning Research, 22 0 (272): 0 1--59, 2021

2021

[34] [34]

Graph ODEs and beyond: A comprehensive survey on integrating differential equations with graph neural networks

Zewen Liu, Xiaoda Wang, Bohan Wang, Zijie Huang, Carl Yang, and Wei Jin. Graph ODEs and beyond: A comprehensive survey on integrating differential equations with graph neural networks. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 6118--6128, 2025

2025

[35] [35]

Large networks and graph limits, volume 60

L \'a szl \'o Lov \'a sz. Large networks and graph limits, volume 60. American Mathematical Society, 2012

2012

[36] [36]

HOPE : High-order graph ODE for modeling interacting dynamics

Xiao Luo, Jingyang Yuan, Zijie Huang, Huiyu Jiang, Yifang Qin, Wei Ju, Ming Zhang, and Yizhou Sun. HOPE : High-order graph ODE for modeling interacting dynamics. In International Conference on Machine Learning, pages 23124--23139. PMLR, 2023

2023

[37] [37]

Transferability of graph neural networks: an extended graphon approach

Sohir Maskey, Ron Levie, and Gitta Kutyniok. Transferability of graph neural networks: an extended graphon approach. Applied and Computational Harmonic Analysis, 63: 0 48--83, 2023

2023

[38] [38]

The nonlinear heat equation on dense graphs and graph limits

Georgi S Medvedev. The nonlinear heat equation on dense graphs and graph limits. SIAM Journal on Mathematical Analysis, 46 0 (4): 0 2743--2766, 2014 a

2014

[39] [39]

The nonlinear heat equation on W -random graphs

Georgi S Medvedev. The nonlinear heat equation on W -random graphs. Archive for Rational Mechanics and Analysis, 212 0 (3): 0 781--803, 2014 b

2014

[40] [40]

Heterophilious dynamics enhances consensus

S \'e bastien Motsch and Eitan Tadmor. Heterophilious dynamics enhances consensus. SIAM Review, 56 0 (4): 0 577--621, 2014

2014

[41] [41]

Discretize-optimize vs

Derek Onken and Lars Ruthotto. Discretize-optimize vs. optimize-discretize for time-series regression and continuous normalizing flows. arXiv preprint arXiv:2005.13420, 2020

work page arXiv 2005

[42] [42]

Mean field, hydrodynamic and graph limits for deterministic interacting particle systems: a survey with quantitative estimates

Thierry Paul and Emmanuel Tr \'e lat. From microscopic to macroscopic scale equations: mean field, hydrodynamic and graph limits. arXiv preprint arXiv:2209.08832, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[43] [43]

Graph neural ordinary differential equations

Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. Graph neural ordinary differential equations. arXiv preprint arXiv:1911.07532, 2019

work page arXiv 1911

[44] [44]

Graphon neural networks and the transferability of graph neural networks

Luana Ruiz, Luiz Chamon, and Alejandro Ribeiro. Graphon neural networks and the transferability of graph neural networks. Advances in Neural Information Processing Systems, 33: 0 1702--1712, 2020

2020

[45] [45]

Graph-coupled oscillator networks

T Konstantin Rusch, Ben Chamberlain, James Rowbottom, Siddhartha Mishra, and Michael Bronstein. Graph-coupled oscillator networks. In International Conference on Machine Learning, pages 18888--18909. PMLR, 2022

2022

[46] [46]

Do residual neural networks discretize neural ordinary differential equations? Advances in Neural Information Processing Systems, 35: 0 36520--36532, 2022

Michael Sander, Pierre Ablin, and Gabriel Peyr \'e . Do residual neural networks discretize neural ordinary differential equations? Advances in Neural Information Processing Systems, 35: 0 36520--36532, 2022

2022

[47] [47]

The graph neural network model

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20 0 (1): 0 61--80, 2008

2008

[48] [48]

The N -intertwined SIS epidemic network model

Piet Van Mieghem. The N -intertwined SIS epidemic network model. Computing, 93 0 (2--4): 0 147--169, 2011

2011

[49] [49]

Virus spread in networks

Piet Van Mieghem, Jasmina Omic, and Robert Kooij. Virus spread in networks. IEEE/ACM Transactions on Networking, 17 0 (1): 0 1--14, 2009

2009

[50] [50]

Correcting auto-differentiation in neural- ODE training

Yewei Xu, Shi Chen, and Qin Li. Correcting auto-differentiation in neural- ODE training. arXiv preprint arXiv:2306.02192, 2023

work page arXiv 2023

[51] [51]

On the Convergence and Size Transferability of Continuous-depth Graph Neural Networks

Mingsong Yan, Charles Kulick, and Sui Tang. On the convergence and size transferability of continuous-depth graph neural networks. arXiv preprint arXiv:2510.03923, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025