pith. machine review for the scientific record.

arxiv: 2604.27052 · v1 · submitted 2026-04-29 · 🧮 math.OC · cs.LG · math.DG

Recognition: unknown

Man, Machine, and Mathematics

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 09:00 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG · math.DG
keywords unified theory · learning · optimization · convergence theorem · dynamical systems · geometry · fundamental physics · solvable problems
0 comments

The pith

A unified framework reduces the study of learning to a small set of ideas from dynamical systems, geometry, and physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an outline for a unified theory of learning by defining solvable problems and the parametrized methods used to learn their solutions. It frames learning as an interconnected process of problem setup, method selection, and analysis through imposed optimization dynamics. A sympathetic reader would care because the success of nonlinear models across many tasks hints that minimal foundations, like those in physics, could simplify discussion and guide new work. The constructions are meant to support a sketched universal convergence theorem specifying how and when problems yield to the methods chosen for them. The authors find that these constructions reduce the study of learning to remarkably few ideas and tools, adapted from dynamical systems theory, geometry, and fundamental physics.

Core claim

We posit and examine a possible outline for a unified theory, interpreting learning broadly as an interconnected process on multiple levels: problem setup, choosing methods, and the analysis of their interplay via imposed optimisation dynamics. We begin by proposing a precise yet versatile definition for solvable problems. We then define the parametrized methods by which their solution(s) may be learned. Our goal is to sketch a universal convergence theorem, specifying how and when solvable problems become amenable to the methods chosen for them. These constructions reduce the study of learning down to remarkably few ideas and tools, many of which are simply adapted from existing ones in dynamical systems theory, geometry, and fundamental physics.

What carries the argument

The universal convergence theorem, which specifies how and when solvable problems become amenable to chosen parametrized methods under optimization dynamics.
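The paper only sketches that theorem. For orientation, and not as the manuscript's own statement, the single-objective instance such a result would have to generalize is the classical Łojasiewicz-type convergence guarantee for gradient flow (cf. [11] and [62–64] in the reference graph), written here under the usual assumptions of a smooth loss and a gradient inequality near a critical point.

    % Minimal sketch, assuming gradient-flow dynamics and a Lojasiewicz inequality;
    % this is the classical template, not the paper's theorem.
    \[
      \dot w(t) = -\nabla L(w(t)), \qquad
      |L(w) - L(w^\ast)|^{\theta} \le C\,\|\nabla L(w)\| \quad \text{for } w \text{ near } w^\ast,\ \theta \in [\tfrac{1}{2}, 1).
    \]
    \[
      \text{If the trajectory stays in that neighbourhood, then } \int_0^{\infty} \|\dot w(t)\|\,dt < \infty \ \text{ and } \ w(t) \to w^\ast .
    \]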

Load-bearing premise

Precise yet versatile definitions for solvable problems and parametrized methods can be given such that they support a universal convergence theorem applicable across a broad landscape of tasks.

What would settle it

A concrete learning task, such as training a nonlinear model on image classification data, where the proposed definitions cannot be applied or the convergence conditions fail to hold would show the framework does not cover the claimed broad landscape.
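A minimal sketch of what such a check could look like in practice. Everything below is an assumption made for illustration rather than the paper's construction: the toy model, the target, and the use of a Polyak–Łojasiewicz-style ratio ‖∇L‖²/L as a stand-in for the unspecified convergence conditions. If that ratio collapses toward zero while the loss is still large, gradient dynamics can stall and a guarantee of this type would not cover the run.

    import numpy as np

    # Hypothetical toy task: fit y = sin(3x) with a one-hidden-layer tanh network.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(64, 1))
    y = np.sin(3.0 * X)

    M = 16                                     # hidden width (number of tunable units)
    W = rng.normal(size=(1, M))                # input weights (trained)
    v = rng.normal(size=(M, 1)) / np.sqrt(M)   # output weights (trained)

    def loss_and_grads(W, v):
        H = np.tanh(X @ W)                     # hidden activations, shape (N, M)
        r = H @ v - y                          # residuals, shape (N, 1)
        L = 0.5 * np.mean(r ** 2)
        gv = H.T @ r / len(X)                  # dL/dv
        gW = X.T @ ((r @ v.T) * (1.0 - H ** 2)) / len(X)  # dL/dW via the chain rule
        return L, gW, gv

    eta = 0.2
    for step in range(2001):
        L, gW, gv = loss_and_grads(W, v)
        grad_sq = float(np.vdot(gW, gW) + np.vdot(gv, gv))
        pl_ratio = grad_sq / max(L, 1e-12)     # assumes the attainable minimum is ~0
        if step % 500 == 0:
            print(f"step {step:4d}  loss {L:.4f}  PL-ratio {pl_ratio:.4f}")
        W -= eta * gW
        v -= eta * gv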

Figures

Figures reproduced from arXiv: 2604.27052 by Akshunna S. Dogra.

Figure 1. (a) Schema for learning; (b) some common problems, usual methods, and invoked fields of analysis. view at source ↗
Figure 2. (a) A generic schema for problems; (b) nominal gradient flow dynamics for a solvable problem. view at source ↗
Figure 3. (a) Schema for methods; (b) some typical parametrised architectures and the models produced by them. view at source ↗
Figure 4. (a) NN with a = 2, M = 5, and sine activations; (b) parametric gradient flow dynamics. view at source ↗
Figure 5. A linear 1-parameter architecture vs a nonlinear 1-parameter spiral architecture. view at source ↗
Figure 6. (a) Analysis is interconnected; (b) types of analysis and concepts needed for different facets of learning. view at source ↗
Figure 7. (a) Where do inclusively expanding A_i limit to? (b) Can we prune A_i without sacrificing performance? view at source ↗
Figure 8. (a) Finite-parameter architectures as sub-architectures of larger architectures; here A_5(w') = A(w) for all w' ∈ R^5, w ∈ R^∞ such that proj_{R^5}(w) = w' and w_{2a} = 0 for a > 2. (b) Expanded dynamics. view at source ↗
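Figure 4's caption describes a small sine-activation architecture evolving under parametric gradient flow. The sketch below is one plausible reading, with the exact functional form, target, and step size assumed for illustration: a five-parameter model with two sine units whose coefficients and frequencies are both tunable (a = 2, M = 5), driven by an Euler discretisation of gradient flow toward a target in L²([−π, π]).

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(-np.pi, np.pi, 256)           # grid approximating [-pi, pi]
    target = np.sign(np.sin(x))                   # an assumed L2 target to fit

    def A(w, x):
        # Assumed form: A(w)(x) = w0 + w2*sin(w1*x) + w4*sin(w3*x), i.e. a = 2 sine terms, M = 5 parameters.
        return w[0] + w[2] * np.sin(w[1] * x) + w[4] * np.sin(w[3] * x)

    def loss(w):
        return 0.5 * np.mean((A(w, x) - target) ** 2)   # discretised L2([-pi, pi]) objective

    def grad(w, eps=1e-6):
        # Central finite differences keep the sketch short; autodiff would do the same job.
        g = np.zeros_like(w)
        for i in range(len(w)):
            e = np.zeros_like(w)
            e[i] = eps
            g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
        return g

    w = rng.normal(scale=0.5, size=5)
    dt = 0.1                                      # Euler step for the flow dw/dt = -grad L(w)
    for step in range(3001):
        w -= dt * grad(w)
        if step % 1000 == 0:
            print(f"t = {step * dt:6.1f}   loss = {loss(w):.4f}")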
read the original abstract

Nonlinear models and optimization methods have successfully tackled a rapidly growing set of problems in recent years. Indeed, a relatively small toolbox of such models and methods can provide sufficient performance across a large landscape of tasks: deep learning alone has made significant recent contributions in scientific modelling, natural language processing, visual analysis, etc. A similar relationship exists between physical theories and phenomena, where many applications and observations emerge neatly from remarkably minimal foundations. It is natural to wonder if sparse unified frameworks could be built to steer discussion and discovery in the fields concerned with learning, optimization, and modelling. In this work, we posit and examine a possible outline for such a unified theory, interpreting the notion of ''learning'' in a broad sense. In particular, we pursue our goals by viewing learning as an inter-connected process on multiple levels: problem setup, choosing methods, and the analysis of their interplay via imposed optimisation dynamics. We begin by proposing a precise yet versatile definition for ''solvable'' problems. We then define the ''parametrised methods'' by which their solution(s) may be ''learned''. Our goal is to sketch a ''universal convergence theorem'', specifying how and when solvable problems become amenable to the methods chosen for them. We find these constructions reduce the study of learning down to remarkably few ideas and tools - many of which are simply adapted from existing ones in dynamical systems theory, geometry, and fundamental physics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes an outline for a unified theory of learning, optimization, and modeling by viewing learning as an interconnected process across problem setup, method selection, and optimization dynamics. It introduces precise yet versatile definitions for 'solvable' problems and 'parametrised methods', sketches a 'universal convergence theorem' on when such problems become amenable to the methods, and claims that these reduce the study of learning to remarkably few ideas and tools adapted from dynamical systems theory, geometry, and fundamental physics.

Significance. If the sketched definitions and theorem can be formalized with explicit statements and derivations, the work could provide a valuable high-level unifying perspective linking machine learning to minimal foundations in physics and geometry, similar to how sparse toolboxes succeed across tasks. The manuscript currently offers no such formalization, derivations, or examples, so its significance remains conceptual rather than substantive.

major comments (2)
  1. [Abstract] Abstract: The manuscript states that 'precise yet versatile' definitions for solvable problems and parametrised methods will be proposed and that a universal convergence theorem will be sketched, yet supplies neither the definitions nor the theorem statement. This leaves the central claim—that these constructions reduce learning to few adapted ideas from dynamical systems, geometry, and physics—unsupported by any explicit mathematics.
  2. [Main text] Main text (universal convergence theorem sketch): The claim that the definitions support a theorem applicable across a broad landscape of tasks requires that the notions of solvability and parametrization be stated formally enough to admit a derivation from dynamical systems or geometric tools. No such statements, assumptions (e.g., compactness or regularity conditions), or reduction steps appear, so it is impossible to assess whether hidden restrictions are needed for convergence to hold.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and valuable comments on our manuscript. We are pleased that the potential significance of the unifying perspective is recognized, and we respond to the major comments below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The manuscript states that 'precise yet versatile' definitions for solvable problems and parametrised methods will be proposed and that a universal convergence theorem will be sketched, yet supplies neither the definitions nor the theorem statement. This leaves the central claim—that these constructions reduce learning to few adapted ideas from dynamical systems, geometry, and physics—unsupported by any explicit mathematics.

    Authors: The paper is explicitly positioned as an outline for a unified theory, and the abstract reflects this by describing what is proposed and sketched rather than fully derived. The definitions and theorem are given in precise conceptual terms in the main text, reducing the study to few ideas without requiring explicit symbolic formalization or derivations at this stage. This approach mirrors successful high-level frameworks in physics and other fields. We do not believe additional formal mathematics is necessary for the manuscript's goals, and thus no revision is planned on this point. revision: no

  2. Referee: [Main text] Main text (universal convergence theorem sketch): The claim that the definitions support a theorem applicable across a broad landscape of tasks requires that the notions of solvability and parametrization be stated formally enough to admit a derivation from dynamical systems or geometric tools. No such statements, assumptions (e.g., compactness or regularity conditions), or reduction steps appear, so it is impossible to assess whether hidden restrictions are needed for convergence to hold.

    Authors: We maintain that the sketched notions are stated with enough versatility and precision to support the broad applicability of the universal convergence theorem as outlined. The framework deliberately avoids specifying particular assumptions like compactness to emphasize the general reduction to dynamical systems and geometric tools. The absence of detailed reduction steps is by design, as the paper focuses on the high-level interconnections rather than technical proofs. If the referee seeks a fully derived theorem, that would constitute a different, more technical paper. We see no need for revision here. revision: no

Circularity Check

0 steps flagged

No significant circularity; high-level proposal without explicit self-referential reductions

full rationale

The manuscript proposes definitions for solvable problems and parametrized methods then sketches a universal convergence theorem, explicitly noting that the tools are adapted from dynamical systems theory, geometry, and fundamental physics. No equations, formal definitions, or derivation steps are supplied in the text that would permit exhibiting a reduction of any claimed result to its own inputs by construction. The central claim is framed as an outline and observation rather than a closed mathematical chain that loops back on fitted parameters or self-citations. This is the normal outcome for a high-level conceptual sketch that does not contain load-bearing derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the existence of versatile definitions for solvable problems and parametrized methods that enable a universal convergence theorem; without the full text these remain high-level assumptions whose details and any free parameters cannot be audited.

axioms (2)
  • domain assumption Solvable problems admit a precise yet versatile definition.
    The paper begins by proposing such a definition as the foundation for the framework.
  • domain assumption Parametrised methods exist by which solutions to solvable problems may be learned.
    Defined as the means by which solutions are obtained within the proposed theory.
invented entities (1)
  • Universal convergence theorem no independent evidence
    purpose: To specify how and when solvable problems become amenable to the chosen methods.
    Introduced as the goal of the constructions; no independent evidence or proof is supplied.

pith-pipeline@v0.9.0 · 5546 in / 1392 out tokens · 39916 ms · 2026-05-07T09:00:47.026919+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

113 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    Gradient Flows: In Metric Spaces and in the Space of Probability Measures

    Luigi Ambrosio, Nicola Gigli, and Giuseppe Savare. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Birkhäuser Basel, 2008

  2. [2]

    D. Ba, A. S. Dogra, R. Gambhir, A. Tasissa, and J. Thaler. Shaper: Can you hear the shape of a jet? Journal of High Energy Physics , 2023(195), 2023

  3. [3]

    Edward Bierstone and Pierre D. Milman. Semianalytic and subanalytic sets. Publications Mathématiques de l’IHÉS, 67:5–42, 1988

  4. [4]

    A family of functional inequalities: Łojasiewicz inequalities and displacement convex functions

    Adrien Blanchet and Jérôme Bolte. A family of functional inequalities: Łojasiewicz inequalities and displacement convex functions. Journal of Functional Analysis, 275(7):1650–1673, 2018

  5. [5]

    Real Algebraic Geometry

    Jacek Bochnak, Michel Coste, and Marie-Françoise Roy. Real Algebraic Geometry. Springer Berlin, Heidelberg, 1998

  6. [6]

    A mathematical guide to operator learning

    Nicolas Boullé and Alex Townsend. A mathematical guide to operator learning. arXiv:2312.14688, 2023

  7. [7]

    Convex Optimization

    Stephen Boyd and Lieven Vandenberghe. Convex Optimization . Cambridge University Press, 2004

  8. [8]

    On the existence of universal lottery tickets

    Rebekka Burkholz, Nilanjana Laha, Rajarshi Mukherjee, and Alkis Gotovos. On the existence of universal lottery tickets. arXiv, arXiv:2111.11146, 2021

  9. [9]

    Every ordinary differential equation with a strict Lyapunov function is a gradient system

    T. Bárta, R. Chill, and E. Fašangová. Every ordinary differential equation with a strict lyapunov function is a gradient system. Monatshefte für Mathematik , 166, 2012

  10. [10]

    A comparative study of efficient initialization methods for the k-means clustering algorithm

    M. Emre Celebi, Hassan A. Kingravi, and Patricio A. Vela. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications , 40(1):200–210, 2013

  11. [11]

    On the Łojasiewicz–simon gradient inequality

    Ralph Chill. On the Łojasiewicz–simon gradient inequality. Journal of Functional Analysis , 201(2):572–601, 2003

  12. [12]

    Convergence and decay rate to equilibrium of bounded solutions of quasilinear parabolic equations

    Ralph Chill and Alberto Fiorenza. Convergence and decay rate to equilibrium of bounded solutions of quasilinear parabolic equations. Journal of Differential Equations , 228:611–632, 2006

  13. [13]

    G. Cybenko. Approximation by superpositions of a sigmoidal function. Math. Control Signal Systems 2 , 2, 1989

  14. [14]

    Distance by de-correlation: Computing distance with heterogeneous grid cells

    Pritipriya Dasbehera, Akshunna S. Dogra, and William T. Redman. Distance by de-correlation: Computing distance with heterogeneous grid cells, 2025

  15. [15]

    Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, and Jason D. Lee. Stochastic subgradient method converges on tame functions. Foundations of Computational Mathematics , 20:119–154, 2020

  16. [16]

    Convergence rates for learning linear operators from noisy data

    Maarten V. de Hoop, Nikola B. Kovachki, Nicholas H. Nelsen, and Andrew M. Stuart. Convergence rates for learning linear operators from noisy data. SIAM/ASA Journal on Uncertainty Quantification , 11(2):480–513, 2023

  17. [17]

    On the Koopman operator of algorithms

    F. Dietrich, T. N. Thiem, and I. G. Kevrekidis. On the Koopman operator of algorithms. SIAM Journal on Applied Dynamical Systems, 19:860–885, 2020

  18. [18]

    A. S. Dogra. Manyfold Learning: A geometric framework for the analysis, optimization, and convergence of nonlinearly parametrised models. PhD thesis, Imperial College London, 2025

  19. [19]

    A. S. Dogra. Neural tangent kernels, pullback metrics, koopman operators, and the many names for geometric control on nonlinear optimization. under preparation, 2025

  20. [20]

    A. S. Dogra. Dynamical systems and neural networks. arXiv:2004.11826, under revision, 2020

  21. [21]

    A. S. Dogra, J. B. Lai, Z. Wang, M. Peev, W. T. Redman, and T. Chen. Solver: Solution learning via equation residuals allows unsupervised error analysis and correction. under revision, 2025

  22. [22]

    A. S. Dogra and W. T. Redman. Optimizing neural networks via koopman operator theory. Advances in Neural Information Processing Systems (NeurIPS) , 33, 2020

  23. [23]

    Applied Regression Analysis

    Norman R. Draper and Harry Smith. Applied Regression Analysis. John Wiley & Sons, New York, 3 edition, 1998

  24. [24]

    Gradient descent provably optimizes over-parameterized neural networks

    Simon S. Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes over-parameterized neural networks. In International Conference on Learning Representations, 2019

  25. [25]

    Lottery tickets in linear models: An analysis of iterative magnitude pruning

    Bryn Elesedy, Varun Kanade, and Yee Whye Teh. Lottery tickets in linear models: An analysis of iterative magnitude pruning. arXiv preprint , arXiv:2007.08243, 2021

  26. [26]

    L. C. Evans. Partial Differential Equations , volume 19 of Graduate Studies in Mathematics . American Mathematical Society, Providence, Rhode Island, 1998

  27. [27]

    Resolution of singularities and geometric proofs of the Łojasiewicz inequalities

    Paul M N Feehan. Resolution of singularities and geometric proofs of the Łojasiewicz inequalities. Geometry and Topology, 23, 2019

  28. [28]

    Geometric data analysis, beyond convolutions

    Jean Feydy. Geometric data analysis, beyond convolutions . ENS Paris-Saclay, 2020

  29. [29]

    Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

    Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M Roy, and Surya Ganguli. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information ...

  30. [30]

    The lottery ticket hypothesis: Finding sparse, trainable neural networks

    Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  31. [31]

    Linear mode connectivity and the lottery ticket hypothesis

    Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, and Michael Carbin. Linear mode connectivity and the lottery ticket hypothesis. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 3259–3269. PMLR, 13–18 Jul 2020

  32. [32]

    Free-energy and the brain

    Karl J. Friston and Klaas E. Stephan. Free-energy and the brain. Synthese, 159(3):417–458, 2007

  33. [33]

    Global convergence in training large-scale transformers

    Cheng Gao, Yuan Cao, Zihao Li, Yihan He, Mengdi Wang, Han Liu, Jason Matthew Klusowski, and Jianqing Fan. Global convergence in training large-scale transformers. In The Thirty-eighth Annual Conference on Neural Information Processing Systems , 2024

  34. [34]

    Diffusions for global optimization

    Stuart Geman and Chii-Ruey Hwang. Diffusions for global optimization. SIAM Journal on Control and Optimization, 24(5):1031–1043, 1986

  35. [35]

    Error bounds for approximations with deep ReLU neural networks in W^{s,p} norms

    Ingo Gühring, Gitta Kutyniok, and Philipp Petersen. Error bounds for approximations with deep ReLU neural networks in W^{s,p} norms. Analysis and Applications, 18(05):803–859, 2020

  36. [36]

    Approximation capability of two hidden layer feedforward neural networks with fixed weights

    Namig J. Guliyev and Vugar E. Ismailov. Approximation capability of two hidden layer feedforward neural networks with fixed weights. Neurocomputing, 316:262–269, 2018

  37. [37]

    Cooling schedules for optimal annealing

    Bruce Hajek. Cooling schedules for optimal annealing. MATHEMATICS OF OPERATIONS RESEARCH , 13, 1988

  38. [38]

    J. Han, A. Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences , 2018

  39. [39]

    Approximating continuous functions by relu nets of minimal width

    B. Hanin and M. Sellke. Approximating continuous functions by relu nets of minimal width. arXiv:1710.11278, 2018

  40. [40]

    Iterative magnitude pruning as a renormalisation group: A study in the context of the lottery ticket hypothesis

    Abu-Al Hassan. Iterative magnitude pruning as a renormalisation group: A study in the context of the lottery ticket hypothesis. arXiv:2308.03128, 2023

  41. [41]

    Uwe Helmke and John B. Moore. Optimization and Dynamical Systems . Communications and Control Engineering. Springer-Verlag, London & New York, 1st edition, 1994

  42. [42]

    Infinite-dimensional manifolds are open subsets of Hilbert space

    David W. Henderson. Infinite-dimensional manifolds are open subsets of hilbert space. Topology, 9(1):25–33, 1970

  43. [43]

    Rigorous a posteriori error bounds for pde-defined pinns

    Birgit Hillebrecht and Benjamin Unger. Rigorous a posteriori error bounds for pde-defined pinns. IEEE Transactions on Neural Networks and Learning Systems , pages 1–11, 2023

  44. [44]

    K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4:251–257, 1991

  45. [45]

    Weak formulations of the nonlinear poisson-boltzmann equation in biomolecular electrostatics

    José A Iglesias and Svetoslav Nakov. Weak formulations of the nonlinear poisson-boltzmann equation in biomolecular electrostatics. Journal of Mathematical Analysis and Applications , page 126065, 2022

  46. [46]

    A convergence result of a continuous model of deep learning via Łojasiewicz–Simon inequality

    Noboru Isobe. A convergence result of a continuous model of deep learning via łojasiewicz–simon inequality. arXiv:2311.15365, 2024

  47. [47]

    Neural tangent kernel: Convergence and generalization in neural networks

    Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems , volume 31. Curran Associates, Inc., 2018

  48. [48]

    Deqing Jiang, Justin Sirignano, and Samuel N. Cohen. Global convergence of deep galerkin and pinns methods for solving partial differential equations. arXiv:2305.06000, 2023

  49. [49]

    Universal approximation with deep narrow networks

    P. Kidger and T. Lyons. Universal approximation with deep narrow networks. arXiv:1905.08539, 2019

  50. [50]

    On universal approximation and error bounds for fourier neural operators

    Nikola Kovachki, Samuel Lanthaler, and Siddhartha Mishra. On universal approximation and error bounds for fourier neural operators. J. Mach. Learn. Res. , 22(1), January 2021

  51. [51]

    Neural operator: Learning maps between function spaces with applications to pdes

    Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to pdes. Journal of Machine Learning Research , 24(89):1–97, 2023

  52. [52]

    The Convenient Setting of Global Analysis

    Andreas Kriegl and Peter W. Michor. The Convenient Setting of Global Analysis. Mathematical Surveys and Monographs, no. 53. American Mathematical Society, Providence, R.I., 1997

  53. [53]

    Characterizing possible failure modes in physics-informed neural networks

    Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W Mahoney. Characterizing possible failure modes in physics-informed neural networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems , volume 34, pages 26548–26560. Curran Associates, Inc., 2021

  54. [54]

    On gradients of functions definable in o-minimal structures

    Krzysztof Kurdyka. On gradients of functions definable in o-minimal structures. Annales de l’Institut Fourier , 48:769–783, 1998

  55. [55]

    Error estimates for deeponets: a deep learning framework in infinite dimensions

    Samuel Lanthaler, Siddhartha Mishra, and George E Karniadakis. Error estimates for deeponets: a deep learning framework in infinite dimensions. Transactions of Mathematics and Its Applications , 6(1):tnac001, 03 2022

  56. [56]

    Deep learning

    Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521, 2015

  57. [57]

    Wide neural networks of any depth evolve as linear models under gradient descent

    Jaehoon Lee, Lechao Xiao, Samuel Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. Wide neural networks of any depth evolve as linear models under gradient descent. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems , volume 3...

  58. [58]

    Learning overparameterized neural networks via stochastic gradient descent on structured data

    Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018

  59. [59]

    Fourier Neural Operator for Parametric Partial Differential Equations

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv:2010.08895, 2020

  60. [60]

    Loss landscapes and optimization in over-parameterized non- linear systems and neural networks

    Chaoyue Liu, Libin Zhu, and Mikhail Belkin. Loss landscapes and optimization in over-parameterized non- linear systems and neural networks. Applied and Computational Harmonic Analysis , 59:85–116, 2022. Special Issue on Harmonic Analysis and Machine Learning

  61. [61]

    KAN: Kolmogorov–Arnold networks

    Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljacic, Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov–arnold networks. In The Thirteenth International Conference on Learning Representations, 2025

  62. [62]

    Une propriete topologique des sous-ensembles analytiques reels

    S. Lojasiewicz. Une propriete topologique des sous-ensembles analytiques reels. Colloques internationaux du C.N.R.S 117. Les Équations aux Dérivées Partielles , 1963

  63. [63]

    Ensembles semi-analytiques

    S. Lojasiewicz. Ensembles semi-analytiques. preprint IHES , 1965

  64. [64]

    Sur les trajectoires du gradient d'une fonction analytique

    S. Lojasiewicz. Sur les trajectoires du gradient d’une fonction analytique. Seminari di Geometria, Bologna (1982/83), Universita’ degli Studi di Bologna, Bologna , 1983

  65. [65]

    Some geometric calculations on Wasserstein space

    John Lott. Some geometric calculations on Wasserstein space. Comm. Math. Phys. , 277(2):423–437, 2008

  66. [66]

    A universal approximation theorem of deep neural networks for expressing probability distributions

    Yulong Lu and Jianfeng Lu. A universal approximation theorem of deep neural networks for expressing probability distributions. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020. Curran Associates Inc

  67. [67]

    Towards understanding iterative magnitude pruning: Why lottery tickets win

    Jaron Maene, Mingxiao Li, and Marie-Francine Moens. Towards understanding iterative magnitude pruning: Why lottery tickets win. arXiv preprint , arXiv:2106.06955, 2021

  68. [68]

    Lower bounds for approximation by mlp neural networks

    Vitaly Maiorov and Allan Pinkus. Lower bounds for approximation by mlp neural networks. Neurocomputing, 25(1):81–91, 1999

  69. [69]

    Exponential convergence of deep operator networks for elliptic partial differential equations

    Carlo Marcati and Christoph Schwab. Exponential convergence of deep operator networks for elliptic partial differential equations. SIAM Journal on Numerical Analysis , 61(3):1513–1545, 2023

  70. [70]

    Hamiltonian neural networks for solving differential equations

    M. Mattheakis, D. Sondak, A. S. Dogra, and P. Protopapas. Hamiltonian neural networks for solving differential equations. Physical Review E 105, 065305, 2022

  71. [71]

    Sparse transfer learning via winning lottery tickets

    Rahul Mehta. Sparse transfer learning via winning lottery tickets. arXiv preprint , arXiv:1905.07785, 2019

  72. [72]

    Numerical Verification Methods and Computer-Assisted Proofs for Partial Differential Equations

    Mitsuhiro T. Nakao, Michael Plum, and Yoshitaka Watanabe. Numerical Verification Methods and Computer-Assisted Proofs for Partial Differential Equations. Springer Singapore, 2019

  73. [73]

    Finder: Feature inference on noisy datasets using eigenspace residuals

    Trajan Murphy, Akshunna S. Dogra, Hanfeng Gu, Caleb Meredith, Mark Kon, and Julio Enrique Castrillón-Candás, for the Alzheimer's Disease Neuroimaging Initiative. Finder: Feature inference on noisy datasets using eigenspace residuals. arXiv:2510.19917, 2025

  74. [74]

    Universal approximation property of Banach space-valued random feature models including random neural networks

    Ariel Neufeld and Philipp Schmocker. Universal approximation property of banach space-valued random feature models including random neural networks, 2024

  75. [75]

    The geometry of dissipative evolution equations: the porous medium equation

    Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations , 26(1-2):101–174, 2001

  76. [76]

    W. T. Redman, M. Fonoberova, R. Mohr, Y. Kevrekidis, and I. Mezic. An operator theoretic view on pruning deep neural networks. International Conference on Learning Representations 2021 , 2022

  77. [77]

    William T Redman, Tianlong Chen, Zhangyang Wang, and Akshunna S. Dogra. Universality of winning tickets: A renormalization group perspective. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 18483–18498. PMLR, 17–23 Jul 2022

  78. [78]

    The principles of deep learning theory

    Daniel A. Roberts, Sho Yaida, and Boris Hanin. The principles of deep learning theory. arXiv preprint , arXiv:2106.10165, 2021

  79. [79]

    On the Łojasiewicz–simon gradient inequality on submanifolds

    Fabian Rupp. On the Łojasiewicz–simon gradient inequality on submanifolds. Journal of Functional Analysis , 279(8):108708, 2020

  80. [80]

    Analyzing the neural tangent kernel of periodically activated coordinate networks

    Hemanth Saratchandran, Shin-Fang Chng, and Simon Lucey. Analyzing the neural tangent kernel of periodically activated coordinate networks. arXiv:2402.04783, 2024

Showing first 80 references.