Recognition: 2 theorem links
Discovering Physical Directions in Weight Space: Composing Neural PDE Experts
Pith reviewed 2026-05-15 01:59 UTC · model grok-4.3
The pith
Fine-tuning endpoint experts on a shared neural PDE operator reveals a reusable physical direction in weight space for training-free regime composition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from a shared family anchor, fine-tuning to low- and high-regime endpoints separates the resulting weight updates into a family-shared adaptation and a direction aligned with the physical parameter. Endpoint experts therefore function as finite-difference probes of a local physical direction in weight space. This perspective motivates Calibration-Conditioned Merge, which infers a composition coordinate from physical metadata, a calibrated mapping, or a short observed rollout prefix and deploys a single merged checkpoint for the remaining rollout. On the evaluated benchmarks the method reduces out-of-distribution rollout error relative to the family anchor by 54.2 percent, 42.8 percent, and 13.8 percent, respectively.
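The separation admits a compact summary. Below is a minimal sketch (not the authors' code) of how the two endpoint experts yield a family-shared adaptation and a signed physical direction, and how a merged checkpoint is formed along that direction; the symmetric/antisymmetric split and the convention that the coordinate runs from -1 at the low endpoint to +1 at the high endpoint are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): split the two endpoint updates into a
# family-shared adaptation (d_plus) and a signed physical direction (d_minus),
# then merge along that direction: theta(alpha) = theta0 + d_plus + alpha * d_minus.
import numpy as np

rng = np.random.default_rng(0)
theta0 = rng.normal(size=1000)                             # shared family anchor (flattened weights)
theta_low = theta0 + rng.normal(scale=0.05, size=1000)     # stand-in low-regime endpoint expert
theta_high = theta0 + rng.normal(scale=0.05, size=1000)    # stand-in high-regime endpoint expert

d_low, d_high = theta_low - theta0, theta_high - theta0
d_plus = 0.5 * (d_low + d_high)     # family-shared adaptation
d_minus = 0.5 * (d_high - d_low)    # signed physical direction (finite-difference probe)

def merged_weights(alpha: float) -> np.ndarray:
    """Assumed convention: alpha = -1 recovers the low endpoint, +1 the high endpoint."""
    return theta0 + d_plus + alpha * d_minus

assert np.allclose(merged_weights(-1.0), theta_low)
assert np.allclose(merged_weights(+1.0), theta_high)
```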
What carries the argument
Calibration-Conditioned Merge (CCM), a post-hoc coordinate readout that composes neural PDE experts along the discovered physical direction in weight space using metadata or a rollout prefix.
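How the prefix-based readout might operate is worth spelling out. The sketch below is an assumed mechanism, not the paper's implementation: candidate coordinates are scored by short-horizon error on the observed prefix, and the single best merged checkpoint is then deployed for the remaining rollout. The helper names, grid range, and toy dynamics are all illustrative.

```python
# Assumed mechanics of the prefix-based readout (not the paper's implementation):
# score candidate coordinates by short-horizon error on the observed prefix and
# deploy the single best merged checkpoint for the remaining rollout.
import numpy as np

def prefix_error(alpha: float, prefix_obs: np.ndarray, rollout_fn) -> float:
    """One-step error of the alpha-merged surrogate on the observed prefix."""
    pred = rollout_fn(alpha, prefix_obs[:-1])
    return float(np.mean((pred - prefix_obs[1:]) ** 2))

def calibrate_alpha(prefix_obs, rollout_fn, grid=np.linspace(-1.5, 1.5, 61)) -> float:
    errs = [prefix_error(a, prefix_obs, rollout_fn) for a in grid]
    return float(grid[int(np.argmin(errs))])   # fixed for the remainder of the rollout

# Toy stand-in: a scalar decaying state whose decay rate plays the role of the
# physical parameter; the "true" coordinate is 0.7 and lies between the endpoints.
true_alpha = 0.7
prefix = np.cumprod(np.full(8, np.exp(-0.1 * (1 + true_alpha))))
toy_rollout = lambda a, x: x * np.exp(-0.1 * (1 + a))
print(calibrate_alpha(prefix, toy_rollout))     # -> approximately 0.7
```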
If this is right
- Static averaging of endpoint experts attenuates regime-specific physics and yields higher error than direction-aware merging.
- A single merged checkpoint suffices for the full rollout once the composition coordinate is inferred from metadata or a short prefix.
- Error reductions are largest in extrapolative regimes lying outside the fine-tuned endpoints.
- The physical direction remains consistent when the underlying operator is scaled or replaced by a DPOT-style backbone.
- Endpoint fine-tuning produces reusable structure rather than isolated regime experts.
Where Pith is reading between the lines
- If the direction proves approximately linear, the same separation could support zero-shot adaptation to continuous physical parameters never seen during fine-tuning.
- Analogous directions may exist in other continuous-attribute domains such as vision models conditioned on scale or physics-informed language models.
- Extending the approach to three-dimensional or coupled multiphysics systems would test whether the separation survives increased complexity.
Load-bearing premise
The observed separation of weight updates into a family-shared part and a physical-parameter-aligned direction is stable across fine-tuning procedures and generalizes beyond the tested regimes.
What would settle it
If the vector difference between high- and low-regime fine-tuned weights, after removal of the shared adaptation component, fails to produce accurate merged predictions for an unseen intermediate physical parameter when used as the CCM direction, while independent retraining succeeds.
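One way to make the "removal of the shared adaptation component" concrete, assuming it is implemented as an orthogonal projection (the operation is not specified here), is sketched below; all names are illustrative.

```python
# Minimal sketch of the test's first step, assuming "removal of the shared
# adaptation component" means an orthogonal projection (the operation is not
# specified here): the projected difference is what would serve as the CCM direction.
import numpy as np

def ccm_direction(theta0, theta_low, theta_high):
    d_low, d_high = theta_low - theta0, theta_high - theta0
    d_shared = 0.5 * (d_low + d_high)    # family-shared adaptation
    d_diff = d_high - d_low              # raw high-minus-low endpoint difference
    coeff = (d_diff @ d_shared) / (d_shared @ d_shared + 1e-12)
    return d_diff - coeff * d_shared     # component along the shared adaptation removed

rng = np.random.default_rng(1)
theta0 = rng.normal(size=500)
theta_low = theta0 + rng.normal(scale=0.05, size=500)
theta_high = theta0 + rng.normal(scale=0.05, size=500)
d = ccm_direction(theta0, theta_low, theta_high)
d_shared = 0.5 * ((theta_low - theta0) + (theta_high - theta0))
print(abs(d @ d_shared) < 1e-8)          # True: no remaining shared component
```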
Original abstract
Recent advances in neural operators have made partial differential equation (PDE) surrogate modeling increasingly scalable and transferable through large-scale pretraining and in-context adaptation. However, after a shared operator is fine-tuned to multiple regimes within a continuous physical family, it remains unclear whether the resulting weight-space updates merely form isolated regime experts or reveal reusable physical structure. Starting from a shared family anchor, we fine-tune low- and high-regime endpoint experts and show that their updates can be separated into a family-shared adaptation and a direction aligned with the underlying physical parameter. This separation reinterprets endpoint experts as finite-difference probes of a local physical direction in weight space, explaining why static averaging can interpolate between regimes but attenuates endpoint-specific physics. Building on this perspective, we propose Calibration-Conditioned Merge (CCM), a post-hoc coordinate readout method for composing neural PDE experts along this physical direction. Given physical metadata, a calibrated coordinate mapping, or a short observed rollout prefix, CCM infers the target composition coordinate and deploys a single merged checkpoint for the remaining rollout. We evaluate CCM on the reaction-diffusion system, viscosity-parameterized two-dimensional Navier-Stokes equations, and radial dam-break dynamics. Across these benchmarks, CCM achieves its strongest gains in extrapolative regimes, reducing out-of-distribution rollout error relative to the family anchor by 54.2%, 42.8%, and 13.8%, respectively. Further experiments across FNO scales, a DPOT-style backbone, and ablations confirm that endpoint fine-tuning is not arbitrary checkpoint drift, but reveals a calibratable physical direction for training-free transfer across PDE regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that fine-tuning a shared neural operator to low- and high-regime endpoints within a continuous physical family separates weight updates into a family-shared adaptation component and a direction aligned with the underlying physical parameter. This reinterpretation motivates the Calibration-Conditioned Merge (CCM) method, which uses physical metadata, a calibrated coordinate mapping, or a short rollout prefix to infer a composition coordinate and deploy a merged checkpoint for training-free transfer. Evaluations on reaction-diffusion, viscosity-parameterized 2D Navier-Stokes, and radial dam-break dynamics report out-of-distribution rollout error reductions of 54.2%, 42.8%, and 13.8% relative to the family anchor, with further ablations across FNO scales and a DPOT-style backbone.
Significance. If the claimed physical direction proves robust rather than procedure-dependent, the work could offer a principled mechanism for composing neural PDE experts along continuous physical parameters, enabling efficient extrapolation without retraining. The empirical gains in extrapolative regimes across three distinct benchmarks indicate potential practical value for scalable surrogate modeling, though the absence of controls for optimization details limits current confidence in the separation's stability.
Major comments (2)
- [Abstract] The separation of updates into family-shared adaptation and physical direction is performed by taking the difference between low- and high-regime endpoint fine-tunings relative to the shared anchor (as described in the abstract). This vector is then treated as aligned with the physical parameter for CCM composition. For the claim to hold, the direction must be dominated by the parameter change and insensitive to optimization details, yet the abstract reports ablations only across backbones and scales with no explicit controls varying learning rate, step count, optimizer, or anchor perturbation.
- [Experiments] The reported error reductions (54.2%, 42.8%, 13.8%) are presented without error bars, details on data splits, or ablation controls for the CCM method versus the family anchor. This makes it difficult to assess whether the gains are statistically reliable or sensitive to the specific fine-tuning trajectories used to discover the direction.
Minor comments (1)
- [Abstract] The abstract refers to 'a calibrated coordinate mapping' and 'short observed rollout prefix' for inferring the composition coordinate, but the precise functional form of the readout and how it is fitted from metadata or prefix data is not equationally specified in the provided summary.
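Since the functional form is not given, the following is one plausible minimal choice rather than the paper's definition: a coordinate that is affine in the logarithm of the physical parameter, anchored so that the two fine-tuned endpoints map to -1 and +1.

```python
# One plausible minimal form for the calibrated coordinate mapping (an assumption,
# not the paper's definition): affine in the log of the physical parameter, anchored
# so the low and high fine-tuned endpoints map to -1 and +1.
import math

def alpha_from_metadata(mu: float, mu_low: float, mu_high: float) -> float:
    """Map a physical parameter (e.g. viscosity) to the composition coordinate."""
    t = (math.log(mu) - math.log(mu_low)) / (math.log(mu_high) - math.log(mu_low))
    return 2.0 * t - 1.0   # |alpha| > 1 corresponds to extrapolative regimes

print(alpha_from_metadata(1e-4, mu_low=1e-5, mu_high=1e-3))   # ~0.0 (geometric midpoint)
```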
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps clarify the robustness of the claimed physical direction and the statistical presentation of results. We address each major comment below and commit to revisions that strengthen the evidence without altering the core claims.
Point-by-point responses
-
Referee: [Abstract] The separation of updates into family-shared adaptation and physical direction is performed by taking the difference between low- and high-regime endpoint fine-tunings relative to the shared anchor (as described in the abstract). This vector is then treated as aligned with the physical parameter for CCM composition. For the claim to hold, the direction must be dominated by the parameter change and insensitive to optimization details, yet the abstract reports ablations only across backbones and scales with no explicit controls varying learning rate, step count, optimizer, or anchor perturbation.
Authors: We agree that explicit controls for optimization details would strengthen the interpretation that the discovered direction is dominated by the physical parameter rather than the fine-tuning procedure. The existing ablations across FNO scales and a DPOT-style backbone already show consistency, but they do not vary learning rate, step count, optimizer, or anchor initialization. In the revised manuscript we will add a dedicated ablation subsection that systematically varies these factors on at least one benchmark and reports the resulting direction stability, measured by cosine similarity to the original direction and by downstream CCM error; a sketch of this check appears after these responses. Revision: yes.
-
Referee: [Experiments] The reported error reductions (54.2%, 42.8%, 13.8%) are presented without error bars, details on data splits, or ablation controls for the CCM method versus the family anchor. This makes it difficult to assess whether the gains are statistically reliable or sensitive to the specific fine-tuning trajectories used to discover the direction.
Authors: We acknowledge that the current manuscript lacks error bars, explicit data-split descriptions, and direct statistical comparisons of CCM against the family anchor. In the revision we will (i) recompute all reported rollout errors over at least five independent random seeds and include standard-error bars, (ii) add a table specifying train/validation/test splits and trajectory counts for each benchmark, and (iii) include an ablation table that directly contrasts CCM against the anchor with paired statistical tests (e.g., Wilcoxon signed-rank) to quantify the significance of the observed gains; a sketch of this comparison also follows. Revision: yes.
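Two of the promised revisions admit compact illustrations. First, a minimal sketch of the direction-stability check named in the response to the first point: cosine similarity between the physical direction recovered under the original fine-tuning recipe and under a perturbed one. The stand-in weight vectors and noise scales are placeholders.

```python
# Sketch of the stability check: cosine similarity between the physical direction
# recovered under the original recipe and under a perturbed recipe (different seed,
# learning rate, step count, ...). Stand-in vectors and noise scales are placeholders.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def direction(theta0, theta_low, theta_high):
    return 0.5 * ((theta_high - theta0) - (theta_low - theta0))   # signed physical direction

rng = np.random.default_rng(2)
theta0 = rng.normal(size=2000)
base = rng.normal(scale=0.05, size=(2, 2000))        # stand-in low/high endpoint updates
noise = rng.normal(scale=0.005, size=(2, 2000))      # stand-in optimization-detail perturbation
d_ref = direction(theta0, theta0 + base[0], theta0 + base[1])
d_alt = direction(theta0, theta0 + base[0] + noise[0], theta0 + base[1] + noise[1])
print(cosine(d_ref, d_alt))   # near 1.0 if the direction is recipe-insensitive
```

Second, a sketch of the paired seed-level comparison promised in the response to the second point, using SciPy's Wilcoxon signed-rank test; the per-seed error values are hypothetical placeholders, not results from the paper.

```python
# Sketch of the paired seed-level comparison: Wilcoxon signed-rank test on per-seed
# rollout errors for CCM versus the family anchor. Error values are hypothetical
# placeholders, not results from the paper.
import numpy as np
from scipy.stats import wilcoxon

ccm_err = np.array([0.081, 0.079, 0.084, 0.080, 0.083])       # hypothetical per-seed errors
anchor_err = np.array([0.175, 0.181, 0.169, 0.178, 0.172])

stat, p_value = wilcoxon(ccm_err, anchor_err, alternative="less")
reduction = 1.0 - ccm_err.mean() / anchor_err.mean()
print(f"mean reduction = {reduction:.1%}, Wilcoxon p = {p_value:.3f}")
```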
Circularity Check
No significant circularity: empirical discovery of weight-space directions remains self-contained
Full rationale
The paper presents an empirical procedure: fine-tune endpoint experts from a shared anchor, observe that their difference vector separates family-shared adaptation from a direction that empirically aligns with the physical parameter, then deploy CCM as a post-hoc linear composition along that observed vector. No equation or derivation reduces the claimed physical direction to a fitted quantity defined from the same evaluation data by construction; the alignment is tested via rollout error on held-out and extrapolative regimes rather than being tautological. Self-citations are not load-bearing for the central claim, and no ansatz or uniqueness theorem is smuggled in to force the result. The method is therefore a coordinate readout on an independently observed vector, not a self-referential loop.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
"endpoint–anchor residuals define a shared adaptation Δ+ and a signed physical direction Δ− ... θ(α) = θ0 + Δ+ + αΔ−"
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
"finite-difference probes of a local physical direction in weight space"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.