pith. machine review for the scientific record.

arxiv: 2603.12725 · v3 · submitted 2026-03-13 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Graph In-Context Operator Networks for Generalizable Spatiotemporal Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 11:51 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords in-context operator learning · graph neural networks · spatiotemporal prediction · air quality forecasting · neural operators · generalization · contextual examples · operator networks

The pith

In-context operator learning outperforms classical single-operator models on spatiotemporal air quality prediction across regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that in-context operator learning enables neural networks to infer solution operators from contextual examples without weight updates, outperforming classical operator learning when both use identical training data and steps. This is demonstrated through controlled experiments on air quality prediction tasks across two Chinese regions, where the approach generalizes across spatial domains and improves as more examples are supplied at inference. The authors introduce GICON to support this on real spatiotemporal systems by combining graph message passing for geometric handling with example-aware positional encoding for varying context sizes. A sympathetic reader would care because the method suggests models can adapt to new domains or conditions using only contextual examples rather than full retraining. The results indicate robust scaling from few training examples to 100 at inference time.
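The protocol described above can be caricatured in a few lines: prediction is a fixed attention rule over (input, output) context pairs, so nothing is retrained and the context size is arbitrary. A minimal numpy sketch, not GICON itself — the squared-distance kernel and temperature here are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def in_context_predict(examples, query, temperature=1.0):
    """Predict the query's output by attending over (input, output) example
    pairs: no weight updates, and the context size k is arbitrary."""
    xs = np.stack([x for x, _ in examples])   # (k, d_in)
    ys = np.stack([y for _, y in examples])   # (k, d_out)
    scores = -np.sum((xs - query) ** 2, axis=1) / temperature
    weights = softmax(scores)                 # attention over the k examples
    return weights @ ys                       # blend of example outputs

# Toy operator y = 2x: the rule must be inferred from the context alone.
rng = np.random.default_rng(0)
q = np.array([0.5, -0.2, 0.1])
ctx = [(x, 2 * x) for x in rng.normal(size=(10, 3))]
pred_10 = in_context_predict(ctx, q, temperature=0.1)
# Supplying more (and closer) examples sharpens the answer at zero training cost:
pred_more = in_context_predict(ctx + [(q, 2 * q)], q, temperature=1e-6)
```

The same call handles 10 or 100 context pairs, which is the mechanical sense in which "more examples at inference" can help without any retraining.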

Core claim

The authors propose GICON, which integrates graph message passing for geometric generalization with example-aware positional encoding for cardinality generalization, and use it to show that in-context operator learning outperforms classical operator learning on complex spatiotemporal tasks such as air quality prediction. Under matched training data and steps, the in-context models generalize better across spatial domains and scale robustly from a few examples to 100 at inference.

What carries the argument

GICON, the Graph In-Context Operator Network, which combines graph message passing to generalize across geometries with example-aware positional encoding to handle different numbers of contextual examples.
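As a hedged sketch of the first ingredient: one round of message passing applies the same learned update to every node of whatever graph it is given, which is the structural reason a graph-based operator can transfer across spatial domains of different sizes. The mean-aggregation rule and tanh update below are illustrative choices, not the paper's architecture:

```python
import numpy as np

def message_passing_step(h, edges, W_self, W_nbr):
    """One round of mean-aggregation message passing.

    h     : (n, d) node features (e.g. per-station readings)
    edges : list of (src, dst) pairs; messages flow src -> dst
    The same weights apply to any graph, of any size.
    """
    n, _ = h.shape
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for s, t in edges:
        agg[t] += h[s]
        deg[t] += 1
    agg = agg / np.maximum(deg, 1)[:, None]      # mean over neighbours
    return np.tanh(h @ W_self + agg @ W_nbr)     # shared node update

rng = np.random.default_rng(1)
d = 4
W_self, W_nbr = rng.normal(size=(d, d)), rng.normal(size=(d, d))
# Two different "regions" (graphs) run through the *same* weights:
h_a = rng.normal(size=(5, d))
out_a = message_passing_step(h_a, [(0, 1), (1, 2), (2, 0)], W_self, W_nbr)
h_b = rng.normal(size=(8, d))
out_b = message_passing_step(h_b, [(i, i + 1) for i in range(7)], W_self, W_nbr)
```

Nothing in the update depends on the number of nodes or the edge layout, which is the property the geometric-generalization claim leans on.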

If this is right

  • In-context operator learning generalizes better across different spatial regions than single-operator training on the same data.
  • Performance scales reliably as the number of contextual examples provided at inference increases up to 100.
  • The approach succeeds on complex real-world spatiotemporal tasks such as air quality forecasting with matched training budgets.
  • Contextual examples enable inference of solution operators without requiring weight updates during testing.
  • Controlled experiments with identical training data and steps isolate the advantage of in-context mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar in-context methods could extend generalization benefits to other spatiotemporal domains such as weather forecasting or traffic flow.
  • Further experiments that swap architectures while holding the in-context mechanism fixed would better isolate the paradigm's contribution.
  • Adoption in dynamic environments could lower costs by avoiding full retraining when conditions or locations change.
  • Testing on datasets with greater spatial diversity would clarify the boundaries of the geometric generalization provided by graph passing.

Load-bearing premise

The performance gains in the controlled comparisons stem specifically from the in-context learning mechanism rather than from the graph message passing or positional encoding components of the GICON architecture.

What would settle it

Training a classical single-operator model with the same graph message passing and positional encoding but without contextual examples, then finding it matches or exceeds GICON accuracy on the air quality prediction tasks across regions, would falsify the claim.
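A toy version of that falsification logic, with a least-squares stand-in for the classical single-operator baseline and a kernel-attention stand-in for the in-context model (both illustrative, not the paper's models): a frozen operator cannot track a shifted target operator, while a context-driven predictor can — which is exactly the gap the proposed control experiment would have to close with a matched backbone.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_single_operator(xs, ys):
    """'Classical' baseline: one operator fit on training data, then frozen."""
    W, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return lambda q: q @ W

def in_context_operator(ctx_x, ctx_y, q, temp=0.05):
    """Infer the operator from context pairs at inference time (no refit)."""
    scores = -np.sum((ctx_x - q) ** 2, axis=1) / temp
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ ctx_y

# Training operator: y = 2x. Deployment operator (new region/conditions): y = -3x.
xs = rng.normal(size=(200, 2))
single = fit_single_operator(xs, 2 * xs)
ctx_x = rng.normal(size=(100, 2))            # context drawn from the new operator
q = np.array([0.1, -0.4])
target = -3 * q
err_single = np.linalg.norm(single(q) - target)   # frozen model is stuck at y = 2x
err_icon = np.linalg.norm(in_context_operator(ctx_x, -3 * ctx_x, q) - target)
```

If a single-operator model with the identical backbone closed this kind of gap without context, the in-context claim would be falsified; if it could not, the mechanism itself carries the advantage.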

Figures

Figures reproduced from arXiv: 2603.12725 by Boai Sun, Chenghan Wu, Liu Yang, Zongmin Yu.

Figure 1. Illustration of In-Context Operator Networks (ICON). Given …
Figure 2. Overview of the Graph In-Context Operator Network (GICON) architecture.
Figure 3. Example cardinality generalization on BTHSA for simple to moderate operators. Top: …
Figure 4. Example cardinality generalization on BTHSA at ∆ …
Figure 5. Operator extrapolation to ∆t = 48 (out-of-distribution) on BTHSA. Left: PM2.5. Right: O3. Single-operator shows flat curves, while example-trained ICON models improve with examples. Models with k = 5 achieve best extrapolation, with sustained improvement for PM2.5 and a sharp initial drop for O3.
Figure 6. Geometric generalization: models trained on BTHSA or YRD, both evaluated on YRD.
Figure 7. Single-operator setting (∆t = 24, on BTHSA). Left: PM2.5. Right: O3. Models trained without contextual examples show no difference between noise and quality examples. The model trained with k = 5 approaches and slightly surpasses the baseline for PM2.5 at 100 quality examples. However, when given noise, example-trained models show clear degradation, confirming that they attend to example content.
Figure 8. Training dynamics: multi-operator vs. single-operator comparison at ∆ …
Figure 9. Training dynamics: single-operator ablation at ∆ …
Figure 10. Example cardinality generalization on YRD for simple operators. Left: PM …
Figure 11. Example cardinality generalization on YRD for complex operators. Left: PM …
Original abstract

In-context operator learning enables neural networks to infer solution operators from contextual examples without weight updates. While prior work has demonstrated the effectiveness of this paradigm in leveraging vast datasets, a systematic comparison against single-operator learning using identical training data has been absent. We address this gap through controlled experiments comparing in-context operator learning against classical operator learning (single-operator models trained without contextual examples), under the same training steps and dataset. To enable this investigation on real-world spatiotemporal systems, we propose GICON (Graph In-Context Operator Network), combining graph message passing for geometric generalization with example-aware positional encoding for cardinality generalization. Experiments on air quality prediction across two Chinese regions show that in-context operator learning outperforms classical operator learning on complex tasks, generalizing across spatial domains and scaling robustly from few training examples to 100 at inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces GICON (Graph In-Context Operator Network), which augments operator learning with graph message passing for geometric generalization and example-aware positional encoding for cardinality generalization. It claims that in-context operator learning with GICON outperforms classical single-operator learning (models trained without contextual examples) on air quality prediction across two Chinese regions, under identical training data and steps, with improved generalization across spatial domains and robust scaling from few-shot to 100-example inference.

Significance. If the central empirical comparison holds after isolating the in-context mechanism, the result would strengthen the case for in-context paradigms in operator networks for real-world spatiotemporal systems, offering a route to domain-generalizable predictions without per-domain retraining.

major comments (2)
  1. [Abstract / Experiments] Abstract and Experiments section: the claim that in-context operator learning outperforms classical operator learning rests on a controlled comparison under identical data and training steps, but the manuscript does not state that the classical baselines use the identical GICON backbone with only the in-context components removed. Because GICON introduces graph message passing and example-aware positional encoding, any performance gap could be driven by these new inductive biases rather than the in-context training/inference procedure itself.
  2. [Abstract] Abstract: the central empirical claim of outperformance, generalization across regions, and scaling from few to 100 examples is asserted without any quantitative metrics, error bars, dataset sizes, or implementation details in the provided text, leaving the result unverifiable from the manuscript as presented.
minor comments (1)
  1. [Experiments] Add explicit statements of dataset sizes, number of spatial locations, and exact training/inference protocols in the Experiments section to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and describe the revisions that will be made.

read point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: the claim that in-context operator learning outperforms classical operator learning rests on a controlled comparison under identical data and training steps, but the manuscript does not state that the classical baselines use the identical GICON backbone with only the in-context components removed. Because GICON introduces graph message passing and example-aware positional encoding, any performance gap could be driven by these new inductive biases rather than the in-context training/inference procedure itself.

    Authors: We agree that the manuscript text should explicitly confirm the architecture used for the baselines to isolate the in-context mechanism. The controlled experiments employ the identical GICON backbone for both the in-context and classical settings, with the classical models trained on single examples and without the example-aware positional encoding. To eliminate ambiguity, we will revise the Abstract and Experiments section to state that classical baselines use the same GICON architecture with only the in-context components removed. This change will be incorporated in the revised manuscript. revision: yes

  2. Referee: [Abstract] Abstract: the central empirical claim of outperformance, generalization across regions, and scaling from few to 100 examples is asserted without any quantitative metrics, error bars, dataset sizes, or implementation details in the provided text, leaving the result unverifiable from the manuscript as presented.

    Authors: The abstract provides a high-level summary of the findings. All quantitative metrics, error bars from repeated runs, dataset sizes, and implementation details appear in the Experiments section. To improve verifiability from the abstract itself, we will add concise quantitative highlights (e.g., average error reduction and scaling behavior) while respecting length limits. This constitutes a partial revision. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical comparison rests on controlled experiments, not self-referential definitions or fitted predictions

full rationale

The paper's central claim is an empirical result from controlled experiments on air quality data: in-context operator learning (via GICON) outperforms single-operator classical learning under identical training steps and dataset. GICON is introduced as an enabling architecture (graph message passing + example-aware positional encoding) rather than a derived quantity. No equations, parameters, or self-citations reduce the reported outperformance to a tautology or input by construction. The comparison is presented as isolating the in-context paradigm; any architectural confounding is a methodological question, not a circular reduction in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions that graph message passing suffices for geometric generalization in spatiotemporal data and that example-aware positional encoding enables handling variable context sizes; no free parameters or invented entities are specified in the abstract.

axioms (2)
  • domain assumption Graph message passing captures geometric structure for generalization across spatial domains
    Invoked in the design of GICON for spatiotemporal systems
  • domain assumption Example-aware positional encoding enables cardinality generalization without retraining
    Key component proposed to handle varying numbers of contextual examples
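For the second axiom, a standard transformer-style sinusoidal encoding over the example index illustrates the property being assumed: because the encoding is a fixed function of position, the first k rows are identical no matter how many examples follow, so a model can be queried with any context size. This is a generic stand-in, not the paper's example-aware scheme:

```python
import numpy as np

def example_positional_encoding(k, d):
    """Sinusoidal encoding over example indices 0..k-1 (d must be even).

    A fixed function of the index extends to any number of context
    examples -- the "cardinality generalization" property assumed above.
    """
    pos = np.arange(k)[:, None]                  # (k, 1)
    i = np.arange(d // 2)[None, :]               # (1, d/2)
    angles = pos / (10000.0 ** (2 * i / d))
    pe = np.zeros((k, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe5 = example_positional_encoding(5, 8)      # 5-example context
pe100 = example_positional_encoding(100, 8)  # 100-example context
```

The encoding of the first five examples is unchanged whether the context holds 5 or 100 pairs, which is what lets a model trained with few examples accept many more at inference.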

pith-pipeline@v0.9.0 · 5437 in / 1308 out tokens · 67872 ms · 2026-05-15T11:51:38.111761+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evolutionary Ensemble of Agents

    cs.NE 2026-05 unverdicted novelty 7.0

    EvE uses co-evolving populations of solvers and guidance states with Elo-based evaluation to autonomously discover a rescale-then-interpolate mechanism for better generalization in In-Context Operator Networks.

  2. Evolutionary Ensemble of Agents

    cs.NE 2026-05 unverdicted novelty 5.0

    EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper

  1. [1] Weinan E, Jiequn Han, and Arnulf Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380, 2017.
  2. [2] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  3. [3] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  4. [4] Weinan E and Bing Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
  5. [5] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from data. In International Conference on Machine Learning (ICML), pages 3208–3216. PMLR, 2018.
  6. [6] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  7. [7] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
  8. [8] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations (ICLR), 2021.
  9. [9] Liu Yang, Siting Liu, Tingwei Meng, and Stanley J. Osher. In-context operator learning with data prompts for differential equation problems. Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023.
  10. [10] Liu Yang, Siting Liu, and Stanley J. Osher. Fine-tune language models as multi-modal differential equation solvers. Neural Networks, 2025.
  11. [11] Liu Yang and Stanley J. Osher. PDE generalization of in-context operator networks: A study on 1D scalar nonlinear conservation laws. Journal of Computational Physics, 519:113379, 2024.
  12. [12] Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley J. Osher. VICON: Vision in-context operator networks for multi-physics fluid dynamics prediction. Transactions on Machine Learning Research, 2026.
  13. [13] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. Technical report, OpenAI, 2019.
  14. [14] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 1877–1901, 2020.
  15. [15] Shuo Wang, Yun Cheng, Qingye Meng, Olga Saukh, Jiang Zhang, Jingfang Fan, Yuanting Zhang, Xingyuan Yuan, and Lothar Thiele. KnowAir-V2: A benchmark dataset for air quality forecasting with PCDCNet. Data set, 2025.
  16. [16] Tianping Chen and Hong Chen. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Transactions on Neural Networks, 6(4):904–910, 1995.
  17. [17] Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.
  18. [18] Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang, and George Em Karniadakis. A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022.
  19. [19] Pengzhan Jin, Shuai Meng, and Lu Lu. MIONet: Learning multiple-input operators via tensor product. SIAM Journal on Scientific Computing, 44(6):A3490–A3514, 2022.
  20. [20] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
  21. [21] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances, 7(40):eabi8605, 2021.
  22. [22] Zecheng Zhang, Christian Moya, Wing Tat Leung, Guang Lin, and Hayden Schaeffer. Bayesian deep operator learning for homogenized to fine-scale maps for multiscale PDE. Multiscale Modeling & Simulation, 22(3):956–972, 2024.
  23. [23] Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M. Benson. U-FNO—an enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163:104180, 2022.
  24. [24] Zecheng Zhang, Wing Tat Leung, and Hayden Schaeffer. BelNet: Basis enhanced learning, a mesh-free neural operator. Proceedings of the Royal Society A, 479(2276):20230043, 2023.
  25. [25] Zhongkai Hao, Zhengyi Wang, Hang Su, Chengyang Ying, Yinpeng Dong, Songming Liu, Ze Cheng, Jian Song, and Jun Zhu. GNOT: A general neural operator transformer for operator learning. In International Conference on Machine Learning (ICML), volume 202, pages 12556–12569. PMLR, 2023.
  26. [26] Zongyi Li, Nikola Kovachki, Chris Choy, Boyi Li, Jean Kossaifi, Shourya Otta, Mohammad Amin Nabian, Maximilian Stadler, Christian Hundt, Kamyar Azizzadenesheli, et al. Geometry-informed neural operator for large-scale 3D PDEs. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023.
  27. [27] Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, and Hayden Schaeffer. D2NO: Efficient handling of heterogeneous input function spaces with distributed deep neural operators. Computer Methods in Applied Mechanics and Engineering, 428:117084, 2024.
  28. [28] Thorsten Kurth, Shashank Subramanian, Peter Harrington, Jaideep Pathak, Morteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, and Anima Anandkumar. FourCastNet: Accelerating global high-resolution weather forecasting using adaptive Fourier neural operators. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC ’23), …
  29. [29] Zhongyi Jiang, Min Zhu, and Lu Lu. Fourier-MIONet: Fourier-enhanced multiple-input neural operators for multiphase modeling of geological carbon sequestration. Reliability Engineering & System Safety, 251:110392, 2024.
  30. [30] Minglang Yin, Nicolas Charon, Ryan Brody, Lu Lu, Natalia A. Trayanova, and Mauro Maggioni. A scalable framework for learning the geometry-dependent solution operators of partial differential equations. Nature Computational Science, 4(12):928–940, 2024.
  31. [31] Christian Moya, Amirhossein Mollaali, Zecheng Zhang, Lu Lu, and Guang Lin. Conformalized-DeepONet: A distribution-free framework for uncertainty quantification in deep operator networks. Physica D: Nonlinear Phenomena, 471:134418, 2025.
  32. [32] Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, and Siddhartha Mishra. Poseidon: Efficient foundation models for PDEs. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024.
  33. [33] Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. LeMON: Learning to learn multi-operator networks, 2024.
  34. [34] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierst…
  35. [35] Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter W. Battaglia. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning (ICML), volume 119, pages 8459–8468. PMLR, 2020.
  36. [36] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. In International Conference on Learning Representations (ICLR), 2021. Outstanding Paper Award.
  37. [37] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural PDE solvers. In International Conference on Learning Representations (ICLR), 2022. Spotlight.
  38. [38] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations, 2020.
  39. [39] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Multipole graph neural operator for parametric partial differential equations. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 6755–6766, 2020.
  40. [40] Sepehr Mousavi, Shizheng Wen, Levi Lingsch, Maximilian Herde, Bogdan Raonić, and Siddhartha Mishra. RIGNO: A graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains. In Advances in Neural Information Processing Systems (NeurIPS), volume 38, 2025.
  41. [41] Benjamin J. Zhang, Siting Liu, Stanley J. Osher, and Markos A. Katsoulakis. Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations, 2025.
  42. [42] Tingwei Meng, Moritz Voß, Nils Detering, Giulio Farolfi, Stanley J. Osher, and Georg Menz. Solving optimal execution problems via in-context operator networks, 2025.
  43. [43] Frank Cole, Dixi Wang, Yineng Chen, Yulong Lu, and Rongjie Lai. In-context operator learning on the space of probability measures, 2026.
  44. [44] Jerry Weihong Liu, N. Benjamin Erichson, Kush Bhatia, Michael W. Mahoney, and Christopher Re. Does in-context operator learning generalize to domain-shifted settings? In NeurIPS 2023 Workshop on the Symbiosis of Deep Learning and Differential Equations (DLDE III), 2023.
  45. [45] Frank Cole, Yulong Lu, Wuzhe Xu, and Tianhao Zhang. In-context learning of linear systems: Generalization theory and applications to operator learning, 2024.
  46. [46] Abhiti Mishra, Yash Patel, and Ambuj Tewari. Continuum transformers perform in-context learning by operator gradient descent, 2025.
  47. [47] Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. PROSE: Predicting operators and symbolic expressions using multimodal transformers. Neural Networks, 180:106707, 2024.
  48. [48] Yuxuan Liu, Jingmin Sun, Xinjie He, Griffin Pinney, Zecheng Zhang, and Hayden Schaeffer. PROSE-FD: A multimodal PDE foundation model for learning multiple operators for forecasting fluid dynamics. In NeurIPS 2024 Workshop on Foundation Models for Science (FM4Science), 2024.
  49. [49] Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, and Michael Mahoney. Data-efficient operator learning via unsupervised pretraining and in-context learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024.
  50. [50] Louis Serrano, Armand Kassaï Koupaï, Thomas X Wang, Pierre Erbacher, and Patrick Gallinari. Zebra: In-context and generative pretraining for solving parametric PDEs. In International Conference on Machine Learning (ICML). PMLR, 2025.
  51. [51] Armand Kassaï Koupaï, Lise Le Boudec, Louis Serrano, and Patrick Gallinari. ENMA: Tokenwise autoregression for continuous neural PDE operators. In Advances in Neural Information Processing Systems (NeurIPS), volume 38, 2025.
  52. [52] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The Faiss library. IEEE Transactions on Big Data, 2025.
  53. [53] Shuo Wang, Yun Cheng, Qingye Meng, Olga Saukh, Jiang Zhang, Jingfang Fan, Yuanting Zhang, Xingyuan Yuan, and Lothar Thiele. PCDCNet: A surrogate model for air quality forecasting with physical-chemical dynamics and constraints, 2025.
  54. [54] Ryan M. May, Kevin H. Goebbert, Jonathan E. Thielen, John R. Leeman, M. Drew Camron, Zachary Bruick, Eric C. Bruning, Russell P. Manser, Sean C. Arms, and Patrick T. Marsh. MetPy: A meteorological Python library for data analysis and visualization. Bulletin of the American Meteorological Society, 103(10):E2273–E2284, 2022.
  55. [55] Biao Zhang and Rico Sennrich. Root mean square layer normalization. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  56. [56] Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Franz Cesista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks. https://github.com/KellerJordan/Muon, 2024.