Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks
Pith reviewed 2026-05-24 02:18 UTC · model grok-4.3
The pith
NEON uses one operator network to match deep ensembles in composite Bayesian optimization over function spaces while using orders of magnitude fewer parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NEON is an architecture for generating predictions with uncertainty using a single operator network backbone, which has orders of magnitude fewer trainable parameters than deep ensembles of comparable performance. When applied to composite Bayesian optimization of f = g ∘ h, where h : X → C(𝒴, ℝ^{d_s}) is an unknown map outputting elements of a function space and g is a known, cheap-to-compute functional, NEON achieves state-of-the-art performance on toy and real-world scenarios.
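To make the composite structure concrete, here is a minimal sketch (not the paper's implementation) of how f = g ∘ h can be scored with draws from an uncertainty-aware surrogate of h. The toy surrogate `sample_h`, the functional `g`, the grid over 𝒴, and the lower-confidence-bound score are all illustrative assumptions.

```python
import math
import random
import statistics

# Hypothetical discretization of the output domain: h(x) is represented
# by its values on this grid.
GRID = [i / 10 for i in range(11)]

def sample_h(x, z):
    """Toy stand-in for one surrogate draw of h(x), indexed by a random z.

    Different z values give different plausible output functions. This is
    an assumption for illustration, not the NEON architecture.
    """
    return [math.sin(2 * math.pi * (y - x)) + 0.1 * z * y for y in GRID]

def g(values):
    """Known, cheap functional: mean squared value over the grid."""
    return sum(v * v for v in values) / len(values)

def composite_scores(x, n_samples=64, seed=0):
    """Monte Carlo mean and spread of f(x) = g(h(x)) under the surrogate."""
    rng = random.Random(seed)
    fs = [g(sample_h(x, rng.gauss(0.0, 1.0))) for _ in range(n_samples)]
    return statistics.fmean(fs), statistics.pstdev(fs)

def lcb(x, kappa=2.0):
    """Lower confidence bound for minimization: smaller is more promising."""
    mean, std = composite_scores(x)
    return mean - kappa * std

# Pick the next query point from a small candidate set.
candidates = [i / 20 for i in range(21)]
next_x = min(candidates, key=lcb)
```

The point of the sketch is that uncertainty about h is pushed through the known g sample by sample, rather than fitting a surrogate to f directly.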
What carries the argument
NEON (Neural Epistemic Operator Networks), a single operator network backbone that supplies epistemic uncertainty estimates to guide acquisition in composite Bayesian optimization over function spaces.
If this is right
- Composite Bayesian optimization over functional outputs becomes feasible with far lower memory and training cost.
- Operator learning models can supply the uncertainty needed for sequential decision making without maintaining multiple independent networks.
- The same backbone can be reused across multiple composite problems that share the same functional output space.
- Real-time or resource-constrained applications of function-space optimization become practical.
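The memory argument in the first bullet is simple arithmetic. The sketch below counts dense-layer parameters for an ensemble of independent networks versus one backbone plus a small uncertainty head. Every layer size and the ensemble size are hypothetical, chosen only to show the scaling, and are not taken from the paper.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the widths from input to output, e.g. [8, 64, 64, 1].
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Hypothetical sizes, for illustration only.
backbone = mlp_param_count([8, 64, 64, 16])   # one deterministic network
head = mlp_param_count([16 + 4, 32, 32, 1])   # small head on features + index
ensemble_size = 100

ensemble_total = ensemble_size * backbone     # grows linearly with members
single_total = backbone + head                # backbone is paid for once

print(ensemble_total, single_total, ensemble_total / single_total)
```

Under these assumptions the ensemble cost scales with the number of members, while the single-backbone cost adds only the head.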
Where Pith is reading between the lines
- NEON-style single-backbone uncertainty could be tested on other sequential tasks that already use operator networks, such as control of PDE-governed systems.
- If the uncertainty quality generalizes, similar single-network designs might replace ensembles in related operator-learning settings that currently rely on them for calibration.
- The method invites direct comparison against other cheap uncertainty mechanisms, such as last-layer Laplace approximations, inside the same composite BO loop.
Load-bearing premise
A single operator network backbone can produce epistemic uncertainty estimates of quality comparable to deep ensembles for the purpose of guiding composite Bayesian optimization.
What would settle it
A controlled composite Bayesian optimization benchmark in which NEON-guided search reaches demonstrably worse final values than an ensemble baseline of matched predictive accuracy.
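Such a comparison is usually scored by tracking the best objective value found so far for each method. A minimal sketch, assuming minimization and a known optimum; the two traces are made-up placeholders, not results from the paper.

```python
from itertools import accumulate

def best_so_far(values):
    """Running minimum of the observed objective values."""
    return list(accumulate(values, min))

def simple_regret(values, f_opt):
    """Gap between the running best value and a known optimum f_opt."""
    return [best - f_opt for best in best_so_far(values)]

# Placeholder traces, not measurements.
trace_a = [3.0, 2.5, 2.7, 1.2, 1.4]
trace_b = [3.1, 2.9, 2.0, 2.2, 1.9]

regret_a = simple_regret(trace_a, f_opt=1.0)
regret_b = simple_regret(trace_b, f_opt=1.0)
```

In practice the curves would be averaged over repeated runs with different initial designs before drawing any conclusion.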
Original abstract
Operator learning is a rising field of scientific computing where inputs or outputs of a machine learning model are functions defined in infinite-dimensional spaces. In this paper, we introduce NEON (Neural Epistemic Operator Networks), an architecture for generating predictions with uncertainty using a single operator network backbone, which presents orders of magnitude less trainable parameters than deep ensembles of comparable performance. We showcase the utility of this method for sequential decision-making by examining the problem of composite Bayesian Optimization (BO), where we aim to optimize a function $f=g\circ h$, where $h:X\to C(\mathcal{Y},\mathbb{R}^{d_s})$ is an unknown map which outputs elements of a function space, and $g: C(\mathcal{Y},\mathbb{R}^{d_s})\to \mathbb{R}$ is a known and cheap-to-compute functional. By comparing our approach to other state-of-the-art methods on toy and real world scenarios, we demonstrate that NEON achieves state-of-the-art performance while requiring orders of magnitude less trainable parameters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces NEON (Neural Epistemic Operator Networks), an operator-learning architecture that produces epistemic uncertainty estimates from a single network backbone rather than an ensemble. The method is applied to composite Bayesian optimization of the form f = g ∘ h, where h maps to a function space and g is a known, cheap functional; experiments on toy problems and real-world tasks are reported to show state-of-the-art optimization performance while using orders of magnitude fewer trainable parameters than comparable deep ensembles.
Significance. If the experimental results hold, the work supplies a concrete, parameter-efficient mechanism for epistemic uncertainty in infinite-dimensional operator learning that directly supports sequential decision-making. The evaluation measures optimization regret rather than isolated predictive metrics, and the architecture description supplies an explicit route to uncertainty that is shown to be competitive with ensembles; these elements strengthen the central claim.
minor comments (2)
- The notation for the composite objective (h : X → C(𝒴, ℝ^{d_s})) is introduced in the abstract but would benefit from an explicit reminder in the first paragraph of §3, where the BO acquisition functions are defined.
- Figure 2 caption states that NEON uses 'a single backbone'; a one-sentence clarification of how the epistemic head is attached without increasing the parameter count relative to a deterministic operator network would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review, positive assessment of the significance of NEON for uncertainty-aware operator learning in composite Bayesian optimization, and recommendation to accept the manuscript.
Circularity Check
No significant circularity detected
full rationale
The paper's central claims rest on introducing the NEON architecture for epistemic uncertainty via a single operator-network backbone and then validating its utility for composite Bayesian optimization through direct empirical comparisons against SOTA baselines on toy and real-world tasks. These comparisons measure optimization performance (not merely internal predictive metrics) and are independent of any self-referential definitions, parameter fits renamed as predictions, or load-bearing self-citations. No equations or architectural choices in the provided description reduce by construction to the target results; the derivation chain is therefore self-contained against external benchmarks.
Reference graph
Works this paper leans on
(References as listed in: L. Ferreira Guilhoto & P. Perdikaris, A Preprint, April 5, 2024.)
- [1] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, 2006
- [2] Ethan Goan and Clinton Fookes. Bayesian neural networks: An introduction and survey. In Case Studies in Applied Bayesian Data Science, pages 45–87. Springer International Publishing, 2020
- [3] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles, 2016
- [4] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, mar 2021
- [5] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations, 2021
- [6] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to pdes. Journal of Machine Learning Research, 24(89):1–97, 2023
- [7] Georgios Kissas, Jacob H Seidman, Leonardo Ferreira Guilhoto, Victor M Preciado, George J Pappas, and Paris Perdikaris. Learning operators with coupled attention. The Journal of Machine Learning Research, 23(1):9636–9698, 2022
- [8] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed deeponets. Science advances, 7(40):eabi8605, 2021
- [9] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Improved architectures and training algorithms for deep operator networks. Journal of Scientific Computing, 92(2):35, 2022
- [10] Yibo Yang, Georgios Kissas, and Paris Perdikaris. Scalable uncertainty quantification for deep operator networks using randomized priors. Computer Methods in Applied Mechanics and Engineering, 399:115399, 2022
- [11] Apostolos F Psaros, Xuhui Meng, Zongren Zou, Ling Guo, and George Em Karniadakis. Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons. Journal of Computational Physics, 477:111902, 2023
- [12] Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, and Yarin Gal. A systematic comparison of bayesian deep learning robustness in diabetic retinopathy tasks, 2019
- [13] Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, January 2017
- [14] Yu Huang and Yue Chen. Autonomous driving with deep learning: A survey of state-of-art technologies, 2020
- [15] Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning, 2011
- [16] Andreas Kirsch, Joost van Amersfoort, and Yarin Gal. Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning, 2019
- [17] Ian Osband, Zheng Wen, Mohammad Asghari, Morteza Ibrahimi, Xiyuan Lu, and Benjamin Van Roy. Epistemic neural networks. CoRR, abs/2107.08924, 2021
- [18] Xilu Wang, Yaochu Jin, Sebastian Schmitt, and Markus Olhofer. Recent advances in bayesian optimization, 2022
- [19] Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. Botorch: Programmable bayesian optimization in pytorch. CoRR, abs/1910.06403, 2019
- [20] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning, 2016
- [21] Jacob H. Seidman, Georgios Kissas, Paris Perdikaris, and George J. Pappas. Nomad: Nonlinear manifold decoders for operator learning, 2022
- [22] Mohamed Aziz Bhouri, Michael Joly, Robert Yu, Soumalya Sarkar, and Paris Perdikaris. Scalable bayesian optimization with randomized prior networks. Computer Methods in Applied Mechanics and Engineering, 417:116428, 2023
- [23] Wesley J Maddox, Maximilian Balandat, Andrew G Wilson, and Eytan Bakshy. Bayesian optimization with high-dimensional outputs. Advances in neural information processing systems, 34:19274–19287, 2021
- [24] Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995
- [25] Patricio Clark Di Leoni, Lu Lu, Charles Meneveau, George Em Karniadakis, and Tamer A Zaki. Neural operator prediction of linear instability waves in high-speed boundary layers. Journal of Computational Physics, 474:111793, 2023
- [26] Pengzhan Jin, Shuai Meng, and Lu Lu. Mionet: Learning multiple-input operators via tensor product, 2022
- [27] Raul Astudillo and Peter I. Frazier. Bayesian optimization of composite functions, 2019
- [28] Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, and Eytan Bakshy. Joint composite latent space bayesian optimization. arXiv preprint arXiv:2311.02213, 2023
- [29] Ryan M Dreifuerst, Samuel Daulton, Yuchen Qian, Paul Varkey, Maximilian Balandat, Sanjay Kasturia, Anoop Tomar, Ali Yazdan, Vish Ponnampalam, and Robert W Heath. Optimizing coverage and capacity in cellular networks using machine learning. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8138–814...
- [30] Samuel Kim, Peter Y Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, and Marin Soljačić. Deep learning for bayesian optimization of scientific problems with high-dimensional structure. Transactions on Machine Learning Research, 2022
- [31] Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020
- [32] Daniel Rebain, Mark J. Matthews, Kwang Moo Yi, Gopal Sharma, Dmitry Lagun, and Andrea Tagliasacchi. Attention beats concatenation for conditioning neural fields, 2022
- [33] Razvan Pascanu, Tomás Mikolov, and Yoshua Bengio. Understanding the exploding gradient problem. CoRR, abs/1211.5063, 2012
- [34] Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30-1, page 3. Atlanta, GA, 2013
- [35] Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Unexpected improvements to expected improvement for bayesian optimization. Advances in Neural Information Processing Systems, 36, 2024
- [36] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Gaussian process bandits without regret: An experimental design approach. CoRR, abs/0912.3995, 2009
- [37] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. M...
- [38] Dong C Liu and Jorge Nocedal. On the limited memory bfgs method for large scale optimization. Mathematical programming, 45(1):503–528, 1989
- [39] David Ginsbourger, Rodolphe Le Riche, and Laurent Carraro. Kriging is well-suited to parallelize optimization. In Computational intelligence in expensive optimization problems, pages 131–162. Springer, 2010
- [40] Samuel Daulton, Maximilian Balandat, and Eytan Bakshy. Differentiable expected hypervolume improvement for parallel multi-objective bayesian optimization. Advances in Neural Information Processing Systems, 33:9851–9864, 2020
- [41] Jialei Wang, Scott C Clark, Eric Liu, and Peter I Frazier. Parallel bayesian global optimization of expensive functions. arXiv preprint arXiv:1602.05149, 2016
- [43] David Zwicker. py-pde: A python package for solving partial differential equations. Journal of Open Source Software, 5(48):2158, 2020
- [44] Dmitry Sorokin, Alexander Ulanov, Ekaterina Sazhina, and Alexander Lvovsky. Interferobot: aligning an optical interferometer by a reinforcement learning agent, 2021
- [45] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018
- [46] Jonathan Heek, Anselm Levskaya, Avital Oliver, Marvin Ritter, Bertrand Rondepierre, Andreas Steiner, and Marc van Zee. Flax: A neural network library and ecosystem for JAX, 2023
- [47] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007
- [48] Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...
- [49] Nikolay Bliznyuk, David Ruppert, Christine Shoemaker, Rommel Regis, Stefan Wild, and Pradeep Mugunthan. Bayesian calibration and uncertainty analysis for computationally expensive models using optimization and radial basis function approximation. Journal of Computational and Graphical Statistics, 17(2):270–294, 2008
- [50] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017
- [51] James T. Wilson, Frank Hutter, and Marc Peter Deisenroth. Maximizing acquisition functions for bayesian optimization, 2018

Author contributions statement: L.F.G. and P.P. conceived the methodology. L.F.G. conducted the experiments and analysed the results. P.P. provided funding and supervised this study. All authors reviewed the manuscript. Competing Inte...

The EpiNet architecture we used consisted of a trainable MLP with two hidden layers of dimension 32, and for the prior component an ensemble of 16 MLPs with 2 hidden layers of width 5 each and a scale parameter of 1. We trained this network for 4,000 steps using full batch and the Adam [50] optimizer and exponential learning rate decay. Figure 6: Full expe...
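The EpiNet description above can be sketched in a few lines, following the general epinet pattern of [17]: a trainable network that sees the features and a random index z, plus a fixed, untrained prior ensemble weighted by z. Only the hidden widths (32 and 5), the 16 prior members, and the scale of 1 come from the text. The feature dimension, the index dimension, the tanh activation, the scalar output, and the initialization are assumptions made for illustration.

```python
import math
import random

def init_mlp(sizes, rng):
    """Random weights and zero biases for a dense network with widths `sizes`."""
    return [
        ([[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
          for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def mlp(params, x):
    """Forward pass with tanh on hidden layers and a linear output layer."""
    for i, (w, b) in enumerate(params):
        x = [sum(wij * xj for wij, xj in zip(row, x)) + bi
             for row, bi in zip(w, b)]
        if i < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x

rng = random.Random(0)
FEATURE_DIM = 8    # assumed size of the features passed to the EpiNet
INDEX_DIM = 16     # assumed: one index coordinate per prior member
PRIOR_SCALE = 1.0  # scale parameter quoted in the text

# Trainable part: two hidden layers of width 32, input is [features, z].
learnable = init_mlp([FEATURE_DIM + INDEX_DIM, 32, 32, INDEX_DIM], rng)
# Fixed prior: 16 small MLPs with two hidden layers of width 5 (never trained).
prior = [init_mlp([FEATURE_DIM, 5, 5, 1], rng) for _ in range(INDEX_DIM)]

def epinet(features, z):
    """Scalar correction for index z: learnable term plus scaled fixed prior."""
    learned = sum(o * zi for o, zi in zip(mlp(learnable, features + z), z))
    prior_out = sum(mlp(p, features)[0] * zi for p, zi in zip(prior, z))
    return learned + PRIOR_SCALE * prior_out

# Resampling z for a fixed input gives a spread of outputs, which serves
# as the epistemic uncertainty estimate.
features = [0.1 * i for i in range(FEATURE_DIM)]
draws = [epinet(features, [rng.gauss(0, 1) for _ in range(INDEX_DIM)])
         for _ in range(100)]
```

In this pattern the correction is added to the backbone's prediction, so a single trained backbone can be reused for every draw of z.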