pith. machine review for the scientific record.

arxiv: 2603.12725 · v3 · submitted 2026-03-13 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Graph In-Context Operator Networks for Generalizable Spatiotemporal Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 11:51 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords in-context operator learning · graph neural networks · spatiotemporal prediction · air quality forecasting · neural operators · generalization · contextual examples · operator networks

The pith

In-context operator learning outperforms classical single-operator models on spatiotemporal air quality prediction across regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that in-context operator learning enables neural networks to infer solution operators from contextual examples without weight updates, outperforming classical operator learning when both use identical training data and steps. This is demonstrated through controlled experiments on air quality prediction tasks across two Chinese regions, where the approach generalizes across spatial domains and improves as more examples are supplied at inference. The authors introduce GICON to support this on real spatiotemporal systems by combining graph message passing for geometric handling with example-aware positional encoding for varying context sizes. A sympathetic reader would care because the method suggests models can adapt to new domains or conditions using only contextual examples rather than full retraining. The results indicate robust scaling from few training examples to 100 at inference time.
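The protocol described above can be caricatured in a few lines: prediction is a fixed attention rule over (input, output) context pairs, so nothing is retrained and the context size is arbitrary. A minimal numpy sketch, not GICON itself — the squared-distance kernel and temperature here are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def in_context_predict(examples, query, temperature=1.0):
    """Predict the query's output by attending over (input, output) example
    pairs: no weight updates, and the context size k is arbitrary."""
    xs = np.stack([x for x, _ in examples])   # (k, d_in)
    ys = np.stack([y for _, y in examples])   # (k, d_out)
    scores = -np.sum((xs - query) ** 2, axis=1) / temperature
    weights = softmax(scores)                 # attention over the k examples
    return weights @ ys                       # blend of example outputs

# Toy operator y = 2x: the rule must be inferred from the context alone.
rng = np.random.default_rng(0)
q = np.array([0.5, -0.2, 0.1])
ctx = [(x, 2 * x) for x in rng.normal(size=(10, 3))]
pred_10 = in_context_predict(ctx, q, temperature=0.1)
# Supplying more (and closer) examples sharpens the answer at zero training cost:
pred_more = in_context_predict(ctx + [(q, 2 * q)], q, temperature=1e-6)
```

The same call handles 10 or 100 context pairs, which is the mechanical sense in which "more examples at inference" can help without any retraining.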

Core claim

The authors propose GICON, which integrates graph message passing for geometric generalization with example-aware positional encoding for cardinality generalization, and use it to show that in-context operator learning outperforms classical operator learning on complex spatiotemporal tasks such as air quality prediction. Under matched training data and steps, the in-context models generalize better across spatial domains and scale robustly from a few examples to 100 at inference.

What carries the argument

GICON, the Graph In-Context Operator Network, which combines graph message passing to generalize across geometries with example-aware positional encoding to handle different numbers of contextual examples.
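As a hedged sketch of the first ingredient: one round of message passing applies the same learned update to every node of whatever graph it is given, which is the structural reason a graph-based operator can transfer across spatial domains of different sizes. The mean-aggregation rule and tanh update below are illustrative choices, not the paper's architecture:

```python
import numpy as np

def message_passing_step(h, edges, W_self, W_nbr):
    """One round of mean-aggregation message passing.

    h     : (n, d) node features (e.g. per-station readings)
    edges : list of (src, dst) pairs; messages flow src -> dst
    The same weights apply to any graph, of any size.
    """
    n, _ = h.shape
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for s, t in edges:
        agg[t] += h[s]
        deg[t] += 1
    agg = agg / np.maximum(deg, 1)[:, None]      # mean over neighbours
    return np.tanh(h @ W_self + agg @ W_nbr)     # shared node update

rng = np.random.default_rng(1)
d = 4
W_self, W_nbr = rng.normal(size=(d, d)), rng.normal(size=(d, d))
# Two different "regions" (graphs) run through the *same* weights:
h_a = rng.normal(size=(5, d))
out_a = message_passing_step(h_a, [(0, 1), (1, 2), (2, 0)], W_self, W_nbr)
h_b = rng.normal(size=(8, d))
out_b = message_passing_step(h_b, [(i, i + 1) for i in range(7)], W_self, W_nbr)
```

Nothing in the update depends on the number of nodes or the edge layout, which is the property the geometric-generalization claim leans on.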

If this is right

  • In-context operator learning generalizes better across different spatial regions than single-operator training on the same data.
  • Performance scales reliably as the number of contextual examples provided at inference increases up to 100.
  • The approach succeeds on complex real-world spatiotemporal tasks such as air quality forecasting with matched training budgets.
  • Contextual examples enable inference of solution operators without requiring weight updates during testing.
  • Controlled experiments with identical training data and steps isolate the advantage of in-context mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar in-context methods could extend generalization benefits to other spatiotemporal domains such as weather forecasting or traffic flow.
  • Further experiments that swap architectures while holding the in-context mechanism fixed would better isolate the paradigm's contribution.
  • Adoption in dynamic environments could lower costs by avoiding full retraining when conditions or locations change.
  • Testing on datasets with greater spatial diversity would clarify the boundaries of the geometric generalization provided by graph passing.

Load-bearing premise

The performance gains in the controlled comparisons stem specifically from the in-context learning mechanism rather than from the graph message passing or positional encoding components of the GICON architecture.

What would settle it

Training a classical single-operator model with the same graph message passing and positional encoding but without contextual examples, then finding it matches or exceeds GICON accuracy on the air quality prediction tasks across regions, would falsify the claim.
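A toy version of that falsification logic, with a least-squares stand-in for the classical single-operator baseline and a kernel-attention stand-in for the in-context model (both illustrative, not the paper's models): a frozen operator cannot track a shifted target operator, while a context-driven predictor can — which is exactly the gap the proposed control experiment would have to close with a matched backbone.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_single_operator(xs, ys):
    """'Classical' baseline: one operator fit on training data, then frozen."""
    W, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return lambda q: q @ W

def in_context_operator(ctx_x, ctx_y, q, temp=0.05):
    """Infer the operator from context pairs at inference time (no refit)."""
    scores = -np.sum((ctx_x - q) ** 2, axis=1) / temp
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ ctx_y

# Training operator: y = 2x. Deployment operator (new region/conditions): y = -3x.
xs = rng.normal(size=(200, 2))
single = fit_single_operator(xs, 2 * xs)
ctx_x = rng.normal(size=(100, 2))            # context drawn from the new operator
q = np.array([0.1, -0.4])
target = -3 * q
err_single = np.linalg.norm(single(q) - target)   # frozen model is stuck at y = 2x
err_icon = np.linalg.norm(in_context_operator(ctx_x, -3 * ctx_x, q) - target)
```

If a single-operator model with the identical backbone closed this kind of gap without context, the in-context claim would be falsified; if it could not, the mechanism itself carries the advantage.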

Figures

Figures reproduced from arXiv: 2603.12725 by Boai Sun, Chenghan Wu, Liu Yang, Zongmin Yu.

Figure 1. Illustration of In-Context Operator Networks (ICON). Given …
Figure 2. Overview of the Graph In-Context Operator Network (GICON) architecture.
Figure 3. Example cardinality generalization on BTHSA for simple to moderate operators. Top: …
Figure 4. Example cardinality generalization on BTHSA at ∆ …
Figure 5. Operator extrapolation to ∆t = 48 (out-of-distribution) on BTHSA. Left: PM2.5. Right: O3. Single-operator shows flat curves, while example-trained ICON models improve with examples. Models with k = 5 achieve best extrapolation, with sustained improvement for PM2.5 and a sharp initial drop for O3.
Figure 6. Geometric generalization: models trained on BTHSA or YRD, both evaluated on YRD.
Figure 7. Single-operator setting (∆t = 24, on BTHSA). Left: PM2.5. Right: O3. Models trained without contextual examples show no difference between noise and quality examples. The model trained with k = 5 approaches and slightly surpasses the baseline for PM2.5 at 100 quality examples. However, when given noise, example-trained models show clear degradation, confirming that they attend to example content.
Figure 8. Training dynamics: multi-operator vs. single-operator comparison at ∆ …
Figure 9. Training dynamics: single-operator ablation at ∆ …
Figure 10. Example cardinality generalization on YRD for simple operators. Left: PM …
Figure 11. Example cardinality generalization on YRD for complex operators. Left: PM …
Original abstract

In-context operator learning enables neural networks to infer solution operators from contextual examples without weight updates. While prior work has demonstrated the effectiveness of this paradigm in leveraging vast datasets, a systematic comparison against single-operator learning using identical training data has been absent. We address this gap through controlled experiments comparing in-context operator learning against classical operator learning (single-operator models trained without contextual examples), under the same training steps and dataset. To enable this investigation on real-world spatiotemporal systems, we propose GICON (Graph In-Context Operator Network), combining graph message passing for geometric generalization with example-aware positional encoding for cardinality generalization. Experiments on air quality prediction across two Chinese regions show that in-context operator learning outperforms classical operator learning on complex tasks, generalizing across spatial domains and scaling robustly from few training examples to 100 at inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces GICON (Graph In-Context Operator Network), which augments operator learning with graph message passing for geometric generalization and example-aware positional encoding for cardinality generalization. It claims that in-context operator learning with GICON outperforms classical single-operator learning (models trained without contextual examples) on air quality prediction across two Chinese regions, under identical training data and steps, with improved generalization across spatial domains and robust scaling from few-shot to 100-example inference.

Significance. If the central empirical comparison holds after isolating the in-context mechanism, the result would strengthen the case for in-context paradigms in operator networks for real-world spatiotemporal systems, offering a route to domain-generalizable predictions without per-domain retraining.

major comments (2)
  1. [Abstract / Experiments] Abstract and Experiments section: the claim that in-context operator learning outperforms classical operator learning rests on a controlled comparison under identical data and training steps, but the manuscript does not state that the classical baselines use the identical GICON backbone with only the in-context components removed. Because GICON introduces graph message passing and example-aware positional encoding, any performance gap could be driven by these new inductive biases rather than the in-context training/inference procedure itself.
  2. [Abstract] Abstract: the central empirical claim of outperformance, generalization across regions, and scaling from few to 100 examples is asserted without any quantitative metrics, error bars, dataset sizes, or implementation details in the provided text, leaving the result unverifiable from the manuscript as presented.
minor comments (1)
  1. [Experiments] Add explicit statements of dataset sizes, number of spatial locations, and exact training/inference protocols in the Experiments section to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and describe the revisions that will be made.

read point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: the claim that in-context operator learning outperforms classical operator learning rests on a controlled comparison under identical data and training steps, but the manuscript does not state that the classical baselines use the identical GICON backbone with only the in-context components removed. Because GICON introduces graph message passing and example-aware positional encoding, any performance gap could be driven by these new inductive biases rather than the in-context training/inference procedure itself.

    Authors: We agree that the manuscript text should explicitly confirm the architecture used for the baselines to isolate the in-context mechanism. The controlled experiments employ the identical GICON backbone for both the in-context and classical settings, with the classical models trained on single examples and without the example-aware positional encoding. To eliminate ambiguity, we will revise the Abstract and Experiments section to state that classical baselines use the same GICON architecture with only the in-context components removed. This change will be incorporated in the revised manuscript. revision: yes

  2. Referee: [Abstract] Abstract: the central empirical claim of outperformance, generalization across regions, and scaling from few to 100 examples is asserted without any quantitative metrics, error bars, dataset sizes, or implementation details in the provided text, leaving the result unverifiable from the manuscript as presented.

    Authors: The abstract provides a high-level summary of the findings. All quantitative metrics, error bars from repeated runs, dataset sizes, and implementation details appear in the Experiments section. To improve verifiability from the abstract itself, we will add concise quantitative highlights (e.g., average error reduction and scaling behavior) while respecting length limits. This constitutes a partial revision. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical comparison rests on controlled experiments, not self-referential definitions or fitted predictions

full rationale

The paper's central claim is an empirical result from controlled experiments on air quality data: in-context operator learning (via GICON) outperforms single-operator classical learning under identical training steps and dataset. GICON is introduced as an enabling architecture (graph message passing + example-aware positional encoding) rather than a derived quantity. No equations, parameters, or self-citations reduce the reported outperformance to a tautology or input by construction. The comparison is presented as isolating the in-context paradigm; any architectural confounding is a methodological question, not a circular reduction in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions that graph message passing suffices for geometric generalization in spatiotemporal data and that example-aware positional encoding enables handling variable context sizes; no free parameters or invented entities are specified in the abstract.

axioms (2)
  • domain assumption Graph message passing captures geometric structure for generalization across spatial domains
    Invoked in the design of GICON for spatiotemporal systems
  • domain assumption Example-aware positional encoding enables cardinality generalization without retraining
    Key component proposed to handle varying numbers of contextual examples
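For the second axiom, a standard transformer-style sinusoidal encoding over the example index illustrates the property being assumed: because the encoding is a fixed function of position, the first k rows are identical no matter how many examples follow, so a model can be queried with any context size. This is a generic stand-in, not the paper's example-aware scheme:

```python
import numpy as np

def example_positional_encoding(k, d):
    """Sinusoidal encoding over example indices 0..k-1 (d must be even).

    A fixed function of the index extends to any number of context
    examples -- the "cardinality generalization" property assumed above.
    """
    pos = np.arange(k)[:, None]                  # (k, 1)
    i = np.arange(d // 2)[None, :]               # (1, d/2)
    angles = pos / (10000.0 ** (2 * i / d))
    pe = np.zeros((k, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe5 = example_positional_encoding(5, 8)      # 5-example context
pe100 = example_positional_encoding(100, 8)  # 100-example context
```

The encoding of the first five examples is unchanged whether the context holds 5 or 100 pairs, which is what lets a model trained with few examples accept many more at inference.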

pith-pipeline@v0.9.0 · 5437 in / 1308 out tokens · 67872 ms · 2026-05-15T11:51:38.111761+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evolutionary Ensemble of Agents

    cs.NE 2026-05 unverdicted novelty 7.0

    EvE uses co-evolving populations of solvers and guidance states with Elo-based evaluation to autonomously discover a rescale-then-interpolate mechanism for better generalization in In-Context Operator Networks.

  2. Evolutionary Ensemble of Agents

    cs.NE 2026-05 unverdicted novelty 5.0

    EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper

  1. [1] Weinan E, Jiequn Han, and Arnulf Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380, 2017.
  2. [2] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  3. [3] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  4. [4] Weinan E and Bing Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
  5. [5] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from data. In International Conference on Machine Learning (ICML), pages 3208–3216. PMLR, 2018.
  6. [6] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  7. [7] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
  8. [8] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations (ICLR), 2021.
  9. [9] Liu Yang, Siting Liu, Tingwei Meng, and Stanley J. Osher. In-context operator learning with data prompts for differential equation problems. Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023.
  10. [10] Liu Yang, Siting Liu, and Stanley J. Osher. Fine-tune language models as multi-modal differential equation solvers. Neural Networks, 2025.
  11. [11] Liu Yang and Stanley J. Osher. PDE generalization of in-context operator networks: A study on 1D scalar nonlinear conservation laws. Journal of Computational Physics, 519:113379, 2024.
  12. [12] Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley J. Osher. VICON: Vision in-context operator networks for multi-physics fluid dynamics prediction. Transactions on Machine Learning Research, 2026.
  13. [13] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. Technical report, OpenAI, 2019.
  14. [14] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 1877–1901, 2020.
  15. [15] Shuo Wang, Yun Cheng, Qingye Meng, Olga Saukh, Jiang Zhang, Jingfang Fan, Yuanting Zhang, Xingyuan Yuan, and Lothar Thiele. KnowAir-V2: A benchmark dataset for air quality forecasting with PCDCNet. Data set, 2025.
  16. [16] Tianping Chen and Hong Chen. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Transactions on Neural Networks, 6(4):904–910, 1995.
  17. [17] Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.
  18. [18] Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang, and George Em Karniadakis. A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022.
  19. [19] Pengzhan Jin, Shuai Meng, and Lu Lu. MIONet: Learning multiple-input operators via tensor product. SIAM Journal on Scientific Computing, 44(6):A3490–A3514, 2022.
  20. [20] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
  21. [21] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances, 7(40):eabi8605, 2021.
  22. [22] Zecheng Zhang, Christian Moya, Wing Tat Leung, Guang Lin, and Hayden Schaeffer. Bayesian deep operator learning for homogenized to fine-scale maps for multiscale PDE. Multiscale Modeling & Simulation, 22(3):956–972, 2024.
  23. [23] Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M. Benson. U-FNO—an enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163:104180, 2022.
  24. [24] Zecheng Zhang, Wing Tat Leung, and Hayden Schaeffer. BelNet: Basis enhanced learning, a mesh-free neural operator. Proceedings of the Royal Society A, 479(2276):20230043, 2023.
  25. [25] Zhongkai Hao, Zhengyi Wang, Hang Su, Chengyang Ying, Yinpeng Dong, Songming Liu, Ze Cheng, Jian Song, and Jun Zhu. GNOT: A general neural operator transformer for operator learning. In International Conference on Machine Learning (ICML), volume 202, pages 12556–12569. PMLR, 2023.
  26. [26] Zongyi Li, Nikola Kovachki, Chris Choy, Boyi Li, Jean Kossaifi, Shourya Otta, Mohammad Amin Nabian, Maximilian Stadler, Christian Hundt, Kamyar Azizzadenesheli, et al. Geometry-informed neural operator for large-scale 3D PDEs. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023.
  27. [27] Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, and Hayden Schaeffer. D2NO: Efficient handling of heterogeneous input function spaces with distributed deep neural operators. Computer Methods in Applied Mechanics and Engineering, 428:117084, 2024.
  28. [28] Thorsten Kurth, Shashank Subramanian, Peter Harrington, Jaideep Pathak, Morteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, and Anima Anandkumar. FourCastNet: Accelerating global high-resolution weather forecasting using adaptive Fourier neural operators. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC ’23), …
  29. [29] Zhongyi Jiang, Min Zhu, and Lu Lu. Fourier-MIONet: Fourier-enhanced multiple-input neural operators for multiphase modeling of geological carbon sequestration. Reliability Engineering & System Safety, 251:110392, 2024.
  30. [30] Minglang Yin, Nicolas Charon, Ryan Brody, Lu Lu, Natalia A. Trayanova, and Mauro Maggioni. A scalable framework for learning the geometry-dependent solution operators of partial differential equations. Nature Computational Science, 4(12):928–940, 2024.
  31. [31] Christian Moya, Amirhossein Mollaali, Zecheng Zhang, Lu Lu, and Guang Lin. Conformalized-DeepONet: A distribution-free framework for uncertainty quantification in deep operator networks. Physica D: Nonlinear Phenomena, 471:134418, 2025.
  32. [32] Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, and Siddhartha Mishra. Poseidon: Efficient foundation models for PDEs. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024.
  33. [33] Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. LeMON: Learning to learn multi-operator networks, 2024.
  34. [34] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierst…
  35. [35] Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter W. Battaglia. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning (ICML), volume 119, pages 8459–8468. PMLR, 2020.
  36. [36] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. In International Conference on Learning Representations (ICLR), 2021. Outstanding Paper Award.
  37. [37] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural PDE solvers. In International Conference on Learning Representations (ICLR), 2022. Spotlight.
  38. [38] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations, 2020.
  39. [39] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Multipole graph neural operator for parametric partial differential equations. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 6755–6766, 2020.
  40. [40] Sepehr Mousavi, Shizheng Wen, Levi Lingsch, Maximilian Herde, Bogdan Raonić, and Siddhartha Mishra. RIGNO: A graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains. In Advances in Neural Information Processing Systems (NeurIPS), volume 38, 2025.
  41. [41] Benjamin J. Zhang, Siting Liu, Stanley J. Osher, and Markos A. Katsoulakis. Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations, 2025.
  42. [42] Tingwei Meng, Moritz Voß, Nils Detering, Giulio Farolfi, Stanley J. Osher, and Georg Menz. Solving optimal execution problems via in-context operator networks, 2025.
  43. [43] Frank Cole, Dixi Wang, Yineng Chen, Yulong Lu, and Rongjie Lai. In-context operator learning on the space of probability measures, 2026.
  44. [44] Jerry Weihong Liu, N. Benjamin Erichson, Kush Bhatia, Michael W. Mahoney, and Christopher Re. Does in-context operator learning generalize to domain-shifted settings? In NeurIPS 2023 Workshop on the Symbiosis of Deep Learning and Differential Equations (DLDE III), 2023.
  45. [45] Frank Cole, Yulong Lu, Wuzhe Xu, and Tianhao Zhang. In-context learning of linear systems: Generalization theory and applications to operator learning, 2024.
  46. [46] Abhiti Mishra, Yash Patel, and Ambuj Tewari. Continuum transformers perform in-context learning by operator gradient descent, 2025.
  47. [47] Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. PROSE: Predicting operators and symbolic expressions using multimodal transformers. Neural Networks, 180:106707, 2024.
  48. [48] Yuxuan Liu, Jingmin Sun, Xinjie He, Griffin Pinney, Zecheng Zhang, and Hayden Schaeffer. PROSE-FD: A multimodal PDE foundation model for learning multiple operators for forecasting fluid dynamics. In NeurIPS 2024 Workshop on Foundation Models for Science (FM4Science), 2024.
  49. [49] Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, and Michael Mahoney. Data-efficient operator learning via unsupervised pretraining and in-context learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024.
  50. [50] Louis Serrano, Armand Kassaï Koupaï, Thomas X Wang, Pierre Erbacher, and Patrick Gallinari. Zebra: In-context and generative pretraining for solving parametric PDEs. In International Conference on Machine Learning (ICML). PMLR, 2025.
  51. [51] Armand Kassaï Koupaï, Lise Le Boudec, Louis Serrano, and Patrick Gallinari. ENMA: Tokenwise autoregression for continuous neural PDE operators. In Advances in Neural Information Processing Systems (NeurIPS), volume 38, 2025.
  52. [52] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The Faiss library. IEEE Transactions on Big Data, 2025.
  53. [53] Shuo Wang, Yun Cheng, Qingye Meng, Olga Saukh, Jiang Zhang, Jingfang Fan, Yuanting Zhang, Xingyuan Yuan, and Lothar Thiele. PCDCNet: A surrogate model for air quality forecasting with physical-chemical dynamics and constraints, 2025.
  54. [54] Ryan M. May, Kevin H. Goebbert, Jonathan E. Thielen, John R. Leeman, M. Drew Camron, Zachary Bruick, Eric C. Bruning, Russell P. Manser, Sean C. Arms, and Patrick T. Marsh. MetPy: A meteorological Python library for data analysis and visualization. Bulletin of the American Meteorological Society, 103(10):E2273–E2284, 2022.
  55. [55] Biao Zhang and Rico Sennrich. Root mean square layer normalization. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  56. [56] Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Franz Cesista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks. https://github.com/KellerJordan/Muon, 2024.