Function graph transformers universally approximate operators between function spaces

David Mis; Ivan Dokmanić; Maarten V. de Hoop; Matti Lassas; Takashi Furuya

arxiv: 2605.17968 · v1 · pith:5L4FWBGSnew · submitted 2026-05-18 · 💻 cs.LG

Pith reviewed 2026-05-20 13:03 UTC · model grok-4.3

keywords: function graph transformers · universal approximation · operator learning · measure theoretic transformers · self-attention · discretization invariance · Sobolev spaces · nonlinear operators

The pith

Transformers can approximate any nonlinear operator between function spaces when functions are lifted to graph measures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that transformers can learn nonlinear operators between function spaces in a discretization-invariant manner. It does this by lifting each function to a measure supported on its graph and applying a measure-theoretic perspective on transformers. The key step is introducing function graph transformers that preserve the graph structure so that outputs remain valid functions. This structure still permits universal approximation through compositions of ordinary attention layers and MLPs, covering operators on Sobolev spaces and other challenging settings.

Core claim

Function graph transformers are graph-preserving maps from graph measures to graph measures that can be approximated arbitrarily well by finite sequences of softmax self-attention layers and pointwise multilayer perceptrons. This yields universal approximation theorems for wide families of nonlinear operators between function spaces. The same construction accommodates regularized negative-order Sobolev inputs and output query points defined on separate domains.
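As a rough illustration of the claim, the sketch below treats one softmax self-attention step as a map on empirical measures: each token is updated by a weighted average over the token cloud, so the update at a point depends only on the underlying measure and not on how many times each atom is listed. The names `attention_on_measure` and `pointwise_mlp`, the scalar embedding, and the parameters `scale`, `slope`, and `bias` are illustrative assumptions, not the paper's construction.

```python
import math

def attention_on_measure(tokens, scale=1.0):
    """One residual softmax self-attention step on tokens (x, y).

    The position x is kept; the attention average is added to the value y.
    The scalar embedding below is a toy choice (assumption).
    """
    def embed(token):
        x, y = token
        return x + y

    updated = []
    for x, y in tokens:
        scores = [scale * embed((x, y)) * embed(other) for other in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        average = sum(w * v for w, (_, v) in zip(weights, tokens)) / sum(weights)
        updated.append((x, y + average))
    return updated

def pointwise_mlp(tokens, slope=0.5, bias=0.1):
    """Token-wise map on the value channel: y -> slope * relu(y) + bias."""
    return [(x, slope * max(y, 0.0) + bias) for x, y in tokens]

def h(x):
    return math.sin(2.0 * math.pi * x)

tokens = [((j + 0.5) / 8, h((j + 0.5) / 8)) for j in range(8)]
out_once = pointwise_mlp(attention_on_measure(tokens))
# Listing every atom twice describes the same empirical measure.
out_twice = pointwise_mlp(attention_on_measure(tokens + tokens))
```

Comparing `out_once` with the first eight entries of `out_twice` shows the same values up to rounding, which is the sense in which such a layer acts on measures rather than on ordered token lists.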

What carries the argument

Function graph transformers: a subclass of measure-theoretic transformers that preserve graph structure by mapping graph-supported measures to graph-supported measures, thereby guaranteeing single-valued function outputs while allowing approximation by standard transformer components.
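To make "graph-preserving" concrete at the token level, the following sketch checks whether a finite token set is single-valued (no position carries two different values). It also shows one simple sufficient condition: a layer that rewrites only the value channel and leaves positions untouched cannot break that property. The helpers `is_graph`, `value_only_layer`, and `mean_shift` are hypothetical names introduced here, and the paper's actual construction may work differently.

```python
def is_graph(tokens, tol=1e-12):
    """True if no two tokens share a position x but differ in value y.

    Positions are compared by exact float equality, which is enough
    for this toy example.
    """
    seen = {}
    for x, y in tokens:
        if x in seen and abs(seen[x] - y) > tol:
            return False
        seen.setdefault(x, y)
    return True

def value_only_layer(tokens, update):
    """Keep every position x; replace y by update(x, y, tokens)."""
    return [(x, update(x, y, tokens)) for x, y in tokens]

def mean_shift(x, y, tokens):
    # Toy update: pull each value halfway toward the mean value of the cloud.
    mean_y = sum(v for _, v in tokens) / len(tokens)
    return 0.5 * y + 0.5 * mean_y

graph_tokens = [(0.0, 1.0), (0.5, -1.0), (0.5, -1.0), (1.0, 2.0)]
split_tokens = [(0.0, 1.0), (0.5, -1.0), (0.5, 1.0)]  # two values at x = 0.5

after_layer = value_only_layer(graph_tokens, mean_shift)
```

Here `is_graph(graph_tokens)` and `is_graph(after_layer)` both hold, while `is_graph(split_tokens)` fails because the position 0.5 carries two values.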

If this is right

Universal approximation holds for operators acting on regularized negative-order Sobolev function spaces.
Output query locations may be chosen independently of the input discretization points.
Refinement of discretizations corresponds to convergence in the space of measures.
The roles of positional encodings and graph connectivity become explicit in the operator-learning setting.
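Two of the consequences above can be sketched numerically: reading out at query points chosen independently of the input samples, and getting nearly the same answer from a coarse and a fine discretization of one function. The readout below is a single softmax cross-attention step with an assumed positional score `-(q - x)^2 / tau`; the function names, the test function `h`, and `tau` are illustrative choices, not taken from the paper.

```python
import math

def h(x):
    return math.sin(2.0 * math.pi * x)

def midpoint_grid(n):
    """n sample positions in [0, 1], taken at cell midpoints."""
    return [(j + 0.5) / n for j in range(n)]

def query_attention(q, tokens, tau=0.05):
    """Softmax cross-attention readout at a query point q.

    Scores depend on the query and token positions only; values are y.
    """
    scores = [-(q - x) ** 2 / tau for x, _ in tokens]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return sum(w * y for w, (_, y) in zip(weights, tokens)) / sum(weights)

coarse = [(x, h(x)) for x in midpoint_grid(50)]
fine = [(x, h(x)) for x in midpoint_grid(400)]

# Query locations are not tied to either input grid.
queries = [0.13, 0.37, 0.61, 0.89]
gaps = [abs(query_attention(q, coarse) - query_attention(q, fine)) for q in queries]
```

Because both token sets are empirical approximations of the same graph measure, the softmax average behaves like a ratio of two integrals against that measure, and the entries of `gaps` stay small as the grids are refined.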

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar graph-measure ideas could be applied to other architectures such as graph neural networks for operator learning.
Practical training procedures might enforce the graph-preserving property through additional loss terms or architectural constraints.
This viewpoint suggests new ways to prove discretization invariance for existing transformer-based PDE solvers.

Load-bearing premise

Representing functions by measures on their graphs and adopting a measure-theoretic view of transformers is general enough to include all operators one wishes to approximate.

What would settle it

The result would be falsified by a concrete nonlinear operator from one function space to another that, with functions represented by their graph measures, cannot be approximated to arbitrary accuracy by any finite composition of standard softmax attention layers and pointwise MLPs.

Figures

Figures reproduced from arXiv: 2605.17968 by David Mis, Ivan Dokmanić, Maarten V. de Hoop, Matti Lassas, Takashi Furuya.

**Figure 2.** Representative two-dimensional FNO-teacher recovery examples for the trained same …

Original abstract

We study the approximation of nonlinear operators between function spaces by transformers. Our approach is to lift functions to measures supported on their graphs and leverage a recently introduced measure-theoretic view of transformers. A function $h$ is represented by its graph measure $\gamma_h$, with finite tokens $\{(x_j,h(x_j))\}_{j=1}^N$ being its empirical approximations. We show that this framework elegantly models discretization refinement via convergence of measures and provides a natural setting for operator learning. Within this framework, we introduce function graph transformers, a graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures, which is to say that outputs remain single-valued functions. Crucially, this additional structure does not reduce generality: we prove that the resulting graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators. Unlike existing theoretical approaches to operator learning with transformers, the measure-theoretic framework also accommodates regularized negative-order Sobolev inputs for which discretization invariance is particularly challenging, as well as query points on different output domains. Overall, function graph transformers provide a continuum viewpoint and mathematical toolkit for transformer-based operator learning, clarifying the roles of positional encodings, graph structure, regularization, and ensuring consistency across discretizations.
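The statement that finite tokens are empirical approximations of the graph measure can be checked numerically on a small case. The sketch below assumes a uniform base measure on [0, 1], midpoint sampling, the function `h(x) = x^2`, and the test function `phi(x, y) = x * y`; all of these are choices made here for illustration. Integrating `phi` against the empirical measure should approach the integral of `x * h(x)`, which is 1/4, as the discretization is refined.

```python
def h(x):
    return x * x

def empirical_graph_measure(n):
    """Tokens (x_j, h(x_j)) on n cell midpoints of [0, 1], each with mass 1/n."""
    return [((j + 0.5) / n, h((j + 0.5) / n)) for j in range(n)]

def integrate(phi, tokens):
    """Integral of a test function phi(x, y) against the empirical measure."""
    return sum(phi(x, y) for x, y in tokens) / len(tokens)

def phi(x, y):
    return x * y  # continuous test function on the (x, y) plane

exact = 0.25  # integral of x * h(x) = x^3 over [0, 1]
errors = {
    n: abs(integrate(phi, empirical_graph_measure(n)) - exact)
    for n in (10, 100, 1000)
}
```

The entries of `errors` shrink as `n` grows, which is the behaviour one expects if refinement of the discretization corresponds to convergence of measures tested against continuous functions.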

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper introduces function graph transformers that preserve graph measures to keep outputs single-valued while still claiming universal approximation of nonlinear operators via ordinary softmax attention and MLPs.

The main point is that lifting functions to graph measures lets the authors treat transformers as maps on measures and then restrict to a graph-preserving subclass that still approximates any reasonable operator between function spaces. This setup makes discretization refinement look like weak convergence of measures and extends naturally to negative-order Sobolev inputs and queries on mismatched domains.

Referee Report

2 major / 2 minor

Summary. The paper lifts functions to measures supported on their graphs and uses a measure-theoretic view of transformers to introduce function graph transformers, a graph-preserving subclass that maps graph measures to graph measures (ensuring single-valued outputs). It claims to prove that these graph-preserving maps can be approximated arbitrarily closely by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation for broad classes of nonlinear operators between function spaces. The framework is asserted to handle discretization refinement via measure convergence, regularized negative-order Sobolev inputs, and query points on mismatched domains without loss of generality.

Significance. If the central approximation result holds with the claimed preservation of graph support, the work supplies a continuum viewpoint and mathematical toolkit for transformer-based operator learning. It addresses discretization invariance and regularity challenges that are difficult for existing approaches, while clarifying roles of positional encodings and graph structure. The explicit accommodation of negative-order Sobolev inputs and cross-domain queries would be a notable advance if rigorously established.

major comments (2)

[abstract and main approximation theorem] The central claim that graph-preserving maps can be approximated by unrestricted softmax attention layers without restricting the class of operators (stated in the abstract and developed in the main results) requires explicit verification that the limit preserves single-valued functional outputs. In weak measure metrics such as Wasserstein or weak-*, small perturbations can split mass across multiple y-values for the same x; the argument appears to rely on density of graph-preserving maps plus an implicit projection step whose details are not secured for negative-order Sobolev inputs or mismatched query domains.
[framework section and universal approximation result] The framework assumes that lifting to graph measures combined with the prior measure-theoretic transformer view provides a sufficiently general setting without restricting approximable operators. However, the dependence on that prior work for the operator approximation result introduces grounding that is not fully external; the manuscript should clarify independence and verify that the graph-preservation constraint does not implicitly narrow the operator class for the Sobolev cases highlighted as a strength.
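The mass-splitting concern in the first major comment can be illustrated with a toy construction. Starting from the graph of the zero function, the sketch places two atoms at heights `+eps` and `-eps` over every sample position. An explicit coupling shows that the transport cost to the original graph measure is at most `eps`, yet the perturbed measure is no longer the graph of a function. The function names and the choice of the zero function are assumptions made for this example only.

```python
import math

def split_tokens(xs, eps):
    """Two atoms per position, at heights +eps and -eps (not a function graph)."""
    return [(x, s * eps) for x in xs for s in (+1.0, -1.0)]

def coupling_cost(xs, eps):
    """Cost of one explicit coupling with the graph of the zero function.

    Each graph atom (x, 0) has mass 1/n; half of it is sent to (x, +eps)
    and half to (x, -eps). The cost of this coupling is an upper bound
    on the Wasserstein-1 distance between the two measures.
    """
    n = len(xs)
    return sum(
        (0.5 / n) * math.hypot(0.0, s * eps)
        for _ in xs
        for s in (+1.0, -1.0)
    )

def max_values_per_position(tokens):
    """Largest number of distinct y-values attached to a single x."""
    values = {}
    for x, y in tokens:
        values.setdefault(x, set()).add(y)
    return max(len(v) for v in values.values())

xs = [(j + 0.5) / 20 for j in range(20)]
eps = 1e-3
cost = coupling_cost(xs, eps)
multiplicity = max_values_per_position(split_tokens(xs, eps))
```

The cost equals `eps` while `multiplicity` is 2, so closeness in a transport metric does not by itself guarantee a single-valued output. That is why a separate argument for the limit is being requested.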

minor comments (2)

[introduction and framework] Notation for empirical graph measures (finite tokens {(x_j, h(x_j))}) and their convergence under discretization refinement should be made fully explicit with a dedicated definition or equation to aid readability.
[section 2] The manuscript would benefit from a short table or diagram contrasting the function graph transformer construction with standard measure-theoretic transformers to highlight the graph-preservation mechanism.
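One way the explicit definition requested in the first minor comment could read is sketched below. The notation is an assumption made here, not the paper's: $\mu$ is a reference probability measure on the input domain $D$, $(\mathrm{id}, h)_{\#}\mu$ denotes a pushforward, and $C_b$ is the space of bounded continuous test functions on the $(x, y)$ space.

```latex
% Assumed notation: \mu is a reference probability measure on the domain D.
\[
\begin{aligned}
\gamma_h &:= (\mathrm{id}, h)_{\#}\mu,
&\qquad
\int \varphi \, d\gamma_h &= \int_D \varphi\bigl(x, h(x)\bigr)\, d\mu(x), \\
\gamma_h^N &:= \frac{1}{N}\sum_{j=1}^{N} \delta_{(x_j,\, h(x_j))},
&\qquad
\int \varphi \, d\gamma_h^N &= \frac{1}{N}\sum_{j=1}^{N} \varphi\bigl(x_j, h(x_j)\bigr),
\end{aligned}
\]
\[
\gamma_h^N \rightharpoonup \gamma_h
\quad\Longleftrightarrow\quad
\int \varphi \, d\gamma_h^N \to \int \varphi \, d\gamma_h
\quad \text{for all } \varphi \in C_b .
\]
```

Under this reading, refining the discretization means choosing sample points $x_j$ whose empirical distribution converges to $\mu$, so that the token sums converge to the integral against $\gamma_h$.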

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments. We address each major point below, indicating the revisions we will incorporate to strengthen the manuscript while preserving the core contributions.

Referee: [abstract and main approximation theorem] The central claim that graph-preserving maps can be approximated by unrestricted softmax attention layers without restricting the class of operators (stated in the abstract and developed in the main results) requires explicit verification that the limit preserves single-valued functional outputs. In weak measure metrics such as Wasserstein or weak-*, small perturbations can split mass across multiple y-values for the same x; the argument appears to rely on density of graph-preserving maps plus an implicit projection step whose details are not secured for negative-order Sobolev inputs or mismatched query domains.

Authors: We agree that explicit verification of preservation under limits is necessary for full rigor. In the revised manuscript we will insert a new lemma establishing that the weak-* limit of a sequence of graph-preserving maps remains graph-preserving when the underlying measures arise from functions in the regularized negative-order Sobolev spaces considered in the paper. The lemma will also treat the projection onto graph measures explicitly, showing that the projection is continuous in the Wasserstein metric for the relevant function classes and that it introduces no additional error that would affect the universal-approximation guarantee. The same argument extends directly to query points on mismatched domains by viewing the query as a marginal of the lifted measure. These additions will be placed immediately after the statement of the main approximation theorem. revision: yes
Referee: [framework section and universal approximation result] The framework assumes that lifting to graph measures combined with the prior measure-theoretic transformer view provides a sufficiently general setting without restricting approximable operators. However, the dependence on that prior work for the operator approximation result introduces grounding that is not fully external; the manuscript should clarify independence and verify that the graph-preservation constraint does not implicitly narrow the operator class for the Sobolev cases highlighted as a strength.

Authors: We will add a dedicated paragraph in the framework section that separates the contributions: the measure-theoretic transformer construction is taken as background, but the density of graph-preserving maps within the space of all continuous maps on graph measures, together with the approximation by standard softmax attention, is proved in a self-contained way in our Theorems 3.4 and 4.2. Because every operator between the function spaces lifts uniquely to a graph-preserving map on the corresponding graph measures, the restriction to graph-preserving maps does not reduce the class of approximable operators. A short appendix subsection will verify that the same density and approximation statements hold uniformly for the regularized negative-order Sobolev inputs, confirming that the highlighted strength is retained. revision: yes
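The lifting step invoked in this response can be written out as follows. This is a sketch in assumed notation, not the paper's statement: $G : X \to Y$ is an operator between function spaces, $\mu$ and $\nu$ are reference probability measures on the input and output domains, and $\widehat{G}$ is the induced map on graph measures.

```latex
% Assumed notation: G : X -> Y, with reference measures \mu (inputs) and \nu (outputs).
\[
\gamma_h = (\mathrm{id}, h)_{\#}\mu,
\qquad
\gamma_{G(h)} = (\mathrm{id}, G(h))_{\#}\nu,
\qquad
\widehat{G}(\gamma_h) := \gamma_{G(h)} \quad \text{for } h \in X .
\]
% Well-definedness: \gamma_h = \gamma_g forces h = g \mu-almost everywhere,
% because the conditional law of y given x under \gamma_h is \delta_{h(x)}.
\[
\gamma_h = \gamma_g
\;\Longrightarrow\;
\delta_{h(x)} = \delta_{g(x)} \ \text{for } \mu\text{-a.e. } x
\;\Longrightarrow\;
h = g \ \ \mu\text{-a.e.}
\]
```

The second display is what makes $\widehat{G}$ well defined when elements of $X$ are identified up to $\mu$-null sets. The regularized negative-order Sobolev inputs discussed in the paper would presumably need their own treatment, which this sketch does not cover.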

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent proofs within the measure-theoretic framework.

The paper defines function graph transformers as a graph-preserving subclass of measure-theoretic transformers and claims to prove that such maps can be approximated by standard softmax attention plus MLPs, yielding universal operator approximation. This is presented as a mathematical result rather than a reduction by construction, self-definition, or fitted input. The reliance on a 'recently introduced measure-theoretic view' is a citation to prior work; per guidelines, a cited result counts as independent support unless it is shown to reduce the central claim to an unverified self-citation chain or ansatz. No equations or steps in the provided text exhibit the specific reduction (e.g., Eq. X equivalent to input by definition or prediction forced by fit). The framework is self-contained against external benchmarks for the stated universal approximation claims, making this the normal honest non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on representing functions via graph measures and extending a prior measure-theoretic transformer view; these are introduced without independent empirical or formal verification beyond the theoretical construction itself.

axioms (2)

domain assumption: Functions can be represented by measures supported on their graphs, with empirical approximations given by finite tokens.
Stated directly in the abstract as the starting point for the framework.
domain assumption: The recently introduced measure-theoretic view of transformers extends to graph measures for operator learning.
The abstract says the framework leverages this view to model discretization refinement and operator learning.

invented entities (1)

function graph transformer (no independent evidence)
purpose: A graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures so outputs remain single-valued functions.
Newly defined in the paper to add structure while claiming no loss of generality for approximation.

pith-pipeline@v0.9.0 · 5773 in / 1499 out tokens · 69825 ms · 2026-05-20T13:03:30.300014+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

functions are represented by graph measures and transformers by graph-preserving measure maps, yielding universality results that extend to negative-order Sobolev spaces

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


A neural ODE interpretation of transformer layers , author =. 2022 , journal =

work page 2022

[33] [33]

2025 , journal =

A unified perspective on the dynamics of deep transformers , author =. 2025 , journal =

work page 2025

[34] [34]

2021 , booktitle =

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author =. 2021 , booktitle =

work page 2021

[35] [35]

2004 , publisher =

An introduction to partial differential equations , author =. 2004 , publisher =

work page 2004

[36] [36]

2026 , eprint=

Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Science , author=. 2026 , eprint=

work page 2026

[37] [37]

2017 , booktitle =

Attention is All you Need , author =. 2017 , booktitle =

work page 2017

[38] [38]

2023 , journal =

Bayesian posterior perturbation analysis with integral probability metrics , author =. 2023 , journal =

work page 2023

[39] [39]

1974 , journal =

Calculation of the Wasserstein Distance Between Probability Distributions on the Line , author =. 1974 , journal =. doi:10.1137/1118101 , url =

work page doi:10.1137/1118101 1974

[40] [40]

2021 , booktitle =

Choose a transformer: fourier or galerkin , author =. 2021 , booktitle =

work page 2021

[41] [41]

2015 , journal =

Control to flocking of the kinetic Cucker--Smale model , author =. 2015 , journal =

work page 2015

[42] [42]

Billingsley,Convergence of Probability Measures

Convergence of probability measures , author =. 1999 , publisher =. doi:10.1002/9780470316962 , isbn =

work page doi:10.1002/9780470316962 1999

[43] [43]

Pappas and Paris Perdikaris , year =

Sifan Wang and Jacob H Seidman and Shyam Sankaran and Hanwen Wang and George J. Pappas and Paris Perdikaris , year =. The Thirteenth International Conference on Learning Representations , url =

work page

[44] [44]

2023 , journal =

Diffusion models: A comprehensive survey of methods and applications , author =. 2023 , journal =

work page 2023

[45] [45]

2024 , url =

From microscopic to macroscopic scale equations: mean field, hydrodynamic and graph limits , author =. 2024 , url =. 2209.08832 , archiveprefix =

work page internal anchor Pith review arXiv 2024

[46] [46]

2023 , booktitle =

GNOT: a general neural operator transformer for operator learning , author =. 2023 , booktitle =

work page 2023

[47] [47]

Proceedings of the 41st International Conference on Machine Learning , publisher =

How Smooth Is Attention? , author =. Proceedings of the 41st International Conference on Machine Learning , publisher =. 2024 , month =

work page 2024

[48] [48]

2010 , journal =

Inverse problems: a Bayesian perspective , author =. 2010 , journal =

work page 2010

[49] [49]

2024 , journal =

Learning stochastic dynamics and predicting emergent behavior using transformers , author =. 2024 , journal =

work page 2024

[50] [50]

2024 , journal =

Measure-to-measure interpolation using Transformers , author =. 2024 , journal =

work page 2024

[51] [51]

Methods of modern mathematical physics

Reed, Michael and Simon, Barry , year =. Methods of modern mathematical physics

work page

[52] [52]

Obstructions to extension of

Lombardini, Luca and Rossi, Francesco , year =. Obstructions to extension of. Proc. Amer. Math. Soc. , volume =. doi:10.1090/proc/16030 , issn =

work page doi:10.1090/proc/16030

[53] [53]

1958 , journal =

On the Convergence of Sample Probability Distributions , author =. 1958 , journal =

work page 1958

[54] [54]

2016 , booktitle =

On the dynamics of large particle systems in the mean field limit , author =. 2016 , booktitle =

work page 2016

[55] [55]

2020 , journal =

On the local Lipschitz stability of Bayesian inverse problems , author =. 2020 , journal =

work page 2020

[56] [56]

Operator Learning with Domain Decomposition for Geometry Generalization in

Jianing Huang and Kaixuan Zhang and Youjia Wu and Ze Cheng , year =. Operator Learning with Domain Decomposition for Geometry Generalization in. The Fourteenth International Conference on Learning Representations , url =

work page

[57] [57]

, year =

Optimal Transport: Old and New , author =. 2009 , publisher =. doi:10.1007/978-3-540-71050-9 , isbn =

work page doi:10.1007/978-3-540-71050-9 2009

[58] [58]

2023 , journal =

Pattern formation of the Cucker--Smale type kinetic models based on gradient flow , author =. 2023 , journal =

work page 2023

[59] [59]

Periodic homogenization and effective mass theorems for the

Allaire, Gr\'. Periodic homogenization and effective mass theorems for the. 2008 , booktitle =. doi:10.1007/978-3-540-79574-2\_1 , url =

work page doi:10.1007/978-3-540-79574-2 2008

[60] [60]

Poseidon: Efficient foundation models for PDEs

Poseidon: Efficient Foundation Models for PDEs , author =. 2024 , url =. 2405.19101 , archiveprefix =

work page arXiv 2024

[61] [61]

2024 , booktitle =

Positional knowledge is all you need: position-induced transformer (PiT) for operator learning , author =. 2024 , booktitle =

work page 2024

[62] [62]

2020 , publisher =

Probability theory---a comprehensive course , author =. 2020 , publisher =. doi:10.1007/978-3-030-56402-5 , isbn =

work page doi:10.1007/978-3-030-56402-5 2020

[63] [63]

Garrido, Quentin and Kiani, Bobak and Lawrence, Hannah and Lecun, Yann and Mialon, Gr. Self-. 2023 , booktitle =. doi:10.52202/075280-1262 , isbn =

work page doi:10.52202/075280-1262 2023

[64] [64]

2008 , booktitle =

Separability and completeness for the Wasserstein distance , author =. 2008 , booktitle =

work page 2008

[65] [65]

2013 , publisher =

Stochastic differential equations: an introduction with applications , author =. 2013 , publisher =

work page 2013

[66] [66]

2015 , booktitle =

The Bayesian approach to inverse problems , author =. 2015 , booktitle =

work page 2015

[67] [67]

2021 , booktitle =

The lipschitz constant of self-attention , author =. 2021 , booktitle =

work page 2021

[68] [68]

2024 , journal =

Theoretical foundations of deep selective state-space models , author =. 2024 , journal =

work page 2024

[69] [69]

2024 , journal =

Towards understanding the universality of transformers for next-token prediction , author =. 2024 , journal =

work page 2024

[70] [70]

Transformer for Partial Differential Equations

Zijie Li and Kazem Meidani and Amir Barati Farimani , year =. Transformer for Partial Differential Equations. Transactions on Machine Learning Research , issn =

work page

[71] [71]

2025 , booktitle =

Transformers are Universal In-context Learners , author =. 2025 , booktitle =

work page 2025

[72] [72]

2025 , journal =

Transformers as neural operators for solutions of differential equations with finite regularity , author =. 2025 , journal =. doi:https://doi.org/10.1016/j.cma.2024.117560 , issn =

work page doi:10.1016/j.cma.2024.117560 2025

[73] [73]

2025 , journal =

Transformers through the Lens of Support-Preserving Maps between Measures , author =. 2025 , journal =

work page 2025

[74] [74]

2021 , booktitle =

Trumpets: Injective Flows for Inference and Inverse Problems , author =. 2021 , booktitle =

work page 2021

[75] [75]

2024 , journal =

Understanding the expressive power and mechanisms of transformer for sequence modeling , author =. 2024 , journal =

work page 2024

[76] [76]

2024 , journal =

Universal Approximation of Mean-Field Models via Transformers , author =. 2024 , journal =

work page 2024

[77] [77]

2024 , booktitle =

Universal physics transformers: a framework for efficiently scaling neural operators , author =. 2024 , booktitle =

work page 2024

[78] [78]

2025 , journal =

Upper and lower bounds for local Lipschitz stability of Bayesian posteriors , author =. 2025 , journal =

work page 2025

[79] [79]

Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684, 2025

Walrus: A Cross-Domain Foundation Model for Continuum Dynamics , author =. 2025 , url =. 2511.15684 , archiveprefix =

work page arXiv 2025