arxiv: 2404.03099 · v1 · pith:YBBNUV4Gnew · submitted 2024-04-03 · 💻 cs.LG · cs.AI · cs.CE · cs.IT · math.IT · stat.ML

Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks

Pith reviewed 2026-05-24 02:18 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CE · cs.IT · math.IT · stat.ML
keywords neural operators · Bayesian optimization · epistemic uncertainty · function spaces · composite optimization · operator learning · sequential decision making

The pith

NEON uses one operator network to match deep ensembles in composite Bayesian optimization over function spaces while using orders of magnitude fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NEON as an architecture that produces predictions together with epistemic uncertainty from a single operator network backbone. It applies this to composite Bayesian optimization, where one seeks to maximize an unknown composition f = g ∘ h and h maps inputs to elements of a function space. Experiments on toy problems and real-world tasks show that the resulting acquisition strategy reaches state-of-the-art performance. A sympathetic reader cares because the approach removes the need to train and store large ensembles when the objective involves functional intermediates.

Core claim

NEON is an architecture for generating predictions with uncertainty using a single operator network backbone, which has orders of magnitude fewer trainable parameters than deep ensembles of comparable performance. When applied to composite Bayesian optimization of f = g ∘ h, where h : X → C(𝒴, ℝ^{d_s}) is an unknown map outputting elements of a function space and g is a known, cheap-to-compute functional, NEON achieves state-of-the-art performance on toy and real-world scenarios.
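
To make the composite structure concrete, the following is a minimal sketch and is not taken from the paper. A hypothetical `h` maps a two-dimensional input to a function sampled on a grid over 𝒴 = [0, 1] (so d_s = 1), and a hypothetical cheap functional `g` scores that function. The names `h`, `g`, `f`, `Y_GRID`, the grid size, and the target curve are all illustrative assumptions.

```python
import numpy as np

# Discretization of the output domain Y = [0, 1]; the grid size is an arbitrary choice.
Y_GRID = np.linspace(0.0, 1.0, 101)

def h(u):
    """Hypothetical expensive map: input u in R^2 -> function on Y (here d_s = 1)."""
    amplitude, frequency = u
    return amplitude * np.sin(2.0 * np.pi * frequency * Y_GRID)

def g(s):
    """Hypothetical known, cheap functional: negative mean squared distance to a target."""
    target = np.sin(2.0 * np.pi * Y_GRID)
    return -float(np.mean((s - target) ** 2))

def f(u):
    """Composite objective f = g ∘ h."""
    return g(h(u))
```

In the composite setting only `h` would be replaced by a learned surrogate, while `g` is applied exactly to whatever function the surrogate predicts.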

What carries the argument

NEON (Neural Epistemic Operator Networks), a single operator network backbone that supplies epistemic uncertainty estimates to guide acquisition in composite Bayesian optimization over function spaces.
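
One way uncertainty estimates can guide acquisition is sketched below, under loudly stated assumptions: `surrogate` is a toy stand-in for an epistemic operator model that takes an input and a random epistemic index, and the score is a Monte Carlo upper-confidence-bound rule. The paper's own acquisition functions may differ; `surrogate`, `g`, `ucb_acquisition`, and `beta` are hypothetical names chosen for illustration.

```python
import numpy as np

Y_GRID = np.linspace(0.0, 1.0, 51)  # discretized output domain (assumed)

def surrogate(u, z):
    """Toy stand-in for an epistemic operator model.

    Maps a scalar input u and an epistemic index z to a function on Y_GRID.
    The disagreement across z grows with |u|, purely for illustration.
    """
    mean = np.sin(2.0 * np.pi * u * Y_GRID)
    spread = 0.1 * abs(u) * np.cos(2.0 * np.pi * Y_GRID)
    return mean + z[0] * spread

def g(s):
    """Hypothetical cheap functional: peak value of the predicted function."""
    return float(np.max(s))

def ucb_acquisition(u, n_samples=64, beta=2.0, seed=0):
    """Monte Carlo UCB-style score: mean + beta * std of g over sampled indices."""
    rng = np.random.default_rng(seed)
    indices = rng.standard_normal((n_samples, 1))
    values = np.array([g(surrogate(u, z)) for z in indices])
    return float(values.mean() + beta * values.std())
```

Because every sample shares the same network and differs only in the index `z`, the cost of one acquisition evaluation is a batch of forward passes through a single model rather than one pass per ensemble member.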

If this is right

  • Composite Bayesian optimization over functional outputs becomes feasible with far lower memory and training cost.
  • Operator learning models can supply the uncertainty needed for sequential decision making without maintaining multiple independent networks.
  • The same backbone can be reused across multiple composite problems that share the same functional output space.
  • Real-time or resource-constrained applications of function-space optimization become practical.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • NEON-style single-backbone uncertainty could be tested on other sequential tasks that already use operator networks, such as control of PDE-governed systems.
  • If the uncertainty quality generalizes, similar single-network designs might replace ensembles in related operator-learning settings that currently rely on them for calibration.
  • The method invites direct comparison against other cheap uncertainty mechanisms, such as last-layer Laplace approximations, inside the same composite BO loop.

Load-bearing premise

A single operator network backbone can produce epistemic uncertainty estimates of quality comparable to deep ensembles for the purpose of guiding composite Bayesian optimization.

What would settle it

A controlled composite Bayesian optimization benchmark in which NEON-guided search reaches demonstrably worse final values than an ensemble baseline of matched predictive accuracy.
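
A comparison of this kind could be scored along the following lines. This is a sketch only: `best_so_far` and `final_gap` are hypothetical helper names, the inputs are assumed to be arrays of observed objective values with shape (trials, iterations), and a real study would also need matched predictive accuracy and a significance test across trials.

```python
import numpy as np

def best_so_far(values):
    """Running maximum of observed objective values along the last axis."""
    return np.maximum.accumulate(np.asarray(values, dtype=float), axis=-1)

def final_gap(trials_a, trials_b):
    """Mean final best value of method A minus that of method B (positive favors A)."""
    final_a = best_so_far(trials_a)[..., -1]
    final_b = best_so_far(trials_b)[..., -1]
    return float(final_a.mean() - final_b.mean())
```

A consistently negative gap for the single-backbone method against the ensemble baseline, over enough trials, would be the outcome described above.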

Figures

Figures reproduced from arXiv: 2404.03099 by Leonardo Ferreira Guilhoto, Paris Perdikaris.

Figure 1: Example of h(u) ∈ C([0, 221]^2, ℝ^2) for the Cell Towers problem. The input u ∈ ℝ^{30} encodes transmission parameters of 15 cell towers, which are used to produce the function seen above, where signal intensity and interference are plotted, respectively. This information is then used to compute a score f(u) = g(h(u)) ∈ ℝ which evaluates the quality of cellular service in the region. By using operator co…
Figure 2: Diagrams for the architectures used in this paper. The NEON architecture (top) combines the deterministic …
Figure 3: Diagrams representing the two decoders used in the experiments considered in this paper. On the left, the …
Figure 4: Experimental results for the Environment Model (left) and Brusselator PDE (right) problems. In both cases …
Figure 5: Experimental results for the Optical Interferometer (left) and Cell Towers (right) problems. For the …
Figure 6: Full experimental results for the Environmental Modeling (left) and Brusselator PDE (right) problems. The …
Figure 7: Best result obtained by NEON for the Optical Interferometer problem. Here we plot the 16 components of …
Figure 8: Best result obtained by NEON for the Cell Towers problem. Here we plot the signal strength and interference …
Figure 9: Full experimental results for the Optical Interferometer (left) and Cell Towers (right) problems. The dashed …
Figure 10: Average results among 5 trials comparing parallel acquisition functions using …

Original abstract

Operator learning is a rising field of scientific computing where inputs or outputs of a machine learning model are functions defined in infinite-dimensional spaces. In this paper, we introduce NEON (Neural Epistemic Operator Networks), an architecture for generating predictions with uncertainty using a single operator network backbone, which presents orders of magnitude less trainable parameters than deep ensembles of comparable performance. We showcase the utility of this method for sequential decision-making by examining the problem of composite Bayesian Optimization (BO), where we aim to optimize a function $f=g\circ h$, where $h:X\to C(\mathcal{Y},\mathbb{R}^{d_s})$ is an unknown map which outputs elements of a function space, and $g: C(\mathcal{Y},\mathbb{R}^{d_s})\to \mathbb{R}$ is a known and cheap-to-compute functional. By comparing our approach to other state-of-the-art methods on toy and real world scenarios, we demonstrate that NEON achieves state-of-the-art performance while requiring orders of magnitude less trainable parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces NEON (Neural Epistemic Operator Networks), an operator-learning architecture that produces epistemic uncertainty estimates from a single network backbone rather than an ensemble. The method is applied to composite Bayesian optimization of the form f = g ∘ h, where h maps to a function space and g is a known, cheap functional; experiments on toy problems and real-world tasks are reported to show state-of-the-art optimization performance while using orders of magnitude fewer trainable parameters than comparable deep ensembles.

Significance. If the experimental results hold, the work supplies a concrete, parameter-efficient mechanism for epistemic uncertainty in infinite-dimensional operator learning that directly supports sequential decision-making. The evaluation measures optimization regret rather than isolated predictive metrics, and the architecture description supplies an explicit route to uncertainty that is shown to be competitive with ensembles; these elements strengthen the central claim.

minor comments (2)
  1. The notation for the composite objective (h : X → C(Y, R^{d_s})) is introduced in the abstract but would benefit from an explicit reminder in the first paragraph of §3 when the BO acquisition functions are defined.
  2. Figure 2 caption states that NEON uses 'a single backbone'; a one-sentence clarification of how the epistemic head is attached without increasing the parameter count relative to a deterministic operator network would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thorough review, positive assessment of the significance of NEON for uncertainty-aware operator learning in composite Bayesian optimization, and recommendation to accept the manuscript.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central claims rest on introducing the NEON architecture for epistemic uncertainty via a single operator-network backbone and then validating its utility for composite Bayesian optimization through direct empirical comparisons against SOTA baselines on toy and real-world tasks. These comparisons measure optimization performance (not merely internal predictive metrics) and are independent of any self-referential definitions, parameter fits renamed as predictions, or load-bearing self-citations. No equations or architectural choices in the provided description reduce by construction to the target results; the derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5722 in / 920 out tokens · 20300 ms · 2026-05-24T02:18:59.225521+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 3 internal anchors

  1. [1]

    Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, 2006

  2. [2]

    Bayesian neural networks: An introduction and survey

    Ethan Goan and Clinton Fookes. Bayesian neural networks: An introduction and survey. In Case Studies in Applied Bayesian Data Science, pages 45–87. Springer International Publishing, 2020

  3. [3]

    Simple and scalable predictive uncertainty estimation using deep ensembles, 2016

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles, 2016

  4. [4]

    Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, mar 2021. L. Ferreira Guilhoto & P. Pedikaris 10 A Preprint - April 5, 2024 Composite Bayesian Optimization In Function Spaces Using N...

  5. [5]

    Fourier neural operator for parametric partial differential equations, 2021

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations, 2021

  6. [6]

    Neural operator: Learning maps between function spaces with applications to pdes

    Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to pdes. Journal of Machine Learning Research, 24(89):1–97, 2023

  7. [7]

    Learning operators with coupled attention

    Georgios Kissas, Jacob H Seidman, Leonardo Ferreira Guilhoto, Victor M Preciado, George J Pappas, and Paris Perdikaris. Learning operators with coupled attention. The Journal of Machine Learning Research, 23(1):9636–9698, 2022

  8. [8]

    Learning the solution operator of parametric partial differential equations with physics-informed deeponets

    Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed deeponets. Science advances, 7(40):eabi8605, 2021

  9. [9]

    Improved architectures and training algorithms for deep operator networks

    Sifan Wang, Hanwen Wang, and Paris Perdikaris. Improved architectures and training algorithms for deep operator networks. Journal of Scientific Computing, 92(2):35, 2022

  10. [10]

    Scalable uncertainty quantification for deep operator networks using randomized priors

    Yibo Yang, Georgios Kissas, and Paris Perdikaris. Scalable uncertainty quantification for deep operator networks using randomized priors. Computer Methods in Applied Mechanics and Engineering, 399:115399, 2022

  11. [11]

    Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons

    Apostolos F Psaros, Xuhui Meng, Zongren Zou, Ling Guo, and George Em Karniadakis. Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons. Journal of Computational Physics, 477:111902, 2023

  12. [12]

    A systematic comparison of bayesian deep learning robustness in diabetic retinopathy tasks, 2019

    Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, and Yarin Gal. A systematic comparison of bayesian deep learning robustness in diabetic retinopathy tasks, 2019

  13. [13]

    Dermatologist-level classification of skin cancer with deep neural networks

    Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, January 2017

  14. [14]

    Autonomous driving with deep learning: A survey of state-of-art technologies, 2020

    Yu Huang and Yue Chen. Autonomous driving with deep learning: A survey of state-of-art technologies, 2020

  15. [15]

    Bayesian active learning for classification and preference learning, 2011

    Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning, 2011

  16. [16]

    Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning, 2019

    Andreas Kirsch, Joost van Amersfoort, and Yarin Gal. Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning, 2019

  17. [17]

    Epistemic neural networks

    Ian Osband, Zheng Wen, Mohammad Asghari, Morteza Ibrahimi, Xiyuan Lu, and Benjamin Van Roy. Epistemic neural networks. CoRR, abs/2107.08924, 2021

  18. [18]

    Recent advances in bayesian optimization, 2022

    Xilu Wang, Yaochu Jin, Sebastian Schmitt, and Markus Olhofer. Recent advances in bayesian optimization, 2022

  19. [19]

    Botorch: Programmable bayesian optimization in pytorch

    Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. Botorch: Programmable bayesian optimization in pytorch. CoRR, abs/1910.06403, 2019

  20. [20]

    Dropout as a bayesian approximation: Representing model uncertainty in deep learning, 2016

    Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning, 2016

  21. [21]

    Nomad: Nonlinear manifold decoders for operator learning, 2022

    Jacob H. Seidman, Georgios Kissas, Paris Perdikaris, and George J. Pappas. Nomad: Nonlinear manifold decoders for operator learning, 2022

  22. [22]

    Scalable bayesian optimization with randomized prior networks

    Mohamed Aziz Bhouri, Michael Joly, Robert Yu, Soumalya Sarkar, and Paris Perdikaris. Scalable bayesian optimization with randomized prior networks. Computer Methods in Applied Mechanics and Engineering, 417:116428, 2023

  23. [23]

    Bayesian optimization with high-dimensional outputs

    Wesley J Maddox, Maximilian Balandat, Andrew G Wilson, and Eytan Bakshy. Bayesian optimization with high-dimensional outputs. Advances in neural information processing systems, 34:19274–19287, 2021

  24. [24]

    Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems

    Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995

  25. [25]

    Neural operator prediction of linear instability waves in high-speed boundary layers

    Patricio Clark Di Leoni, Lu Lu, Charles Meneveau, George Em Karniadakis, and Tamer A Zaki. Neural operator prediction of linear instability waves in high-speed boundary layers. Journal of Computational Physics, 474:111793, 2023

  26. [26]

    Mionet: Learning multiple-input operators via tensor product, 2022

    Pengzhan Jin, Shuai Meng, and Lu Lu. Mionet: Learning multiple-input operators via tensor product, 2022

  27. [27]

    Raul Astudillo and Peter I. Frazier. Bayesian optimization of composite functions, 2019.

  28. [28]

    Joint composite latent space bayesian optimization

    Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, and Eytan Bakshy. Joint composite latent space bayesian optimization. arXiv preprint arXiv:2311.02213, 2023

  29. [29]

    Optimizing coverage and capacity in cellular networks using machine learning

    Ryan M Dreifuerst, Samuel Daulton, Yuchen Qian, Paul Varkey, Maximilian Balandat, Sanjay Kasturia, Anoop Tomar, Ali Yazdan, Vish Ponnampalam, and Robert W Heath. Optimizing coverage and capacity in cellular networks using machine learning. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8138–814...

  30. [30]

    Deep learning for bayesian optimization of scientific problems with high-dimensional structure

    Samuel Kim, Peter Y Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, and Marin Soljačić. Deep learning for bayesian optimization of scientific problems with high-dimensional structure. Transactions on Machine Learning Research, 2022

  31. [31]

    Fourier features let networks learn high frequency functions in low dimensional domains

    Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020

  32. [32]

    Attention beats concatenation for conditioning neural fields, 2022

    Daniel Rebain, Mark J. Matthews, Kwang Moo Yi, Gopal Sharma, Dmitry Lagun, and Andrea Tagliasacchi. Attention beats concatenation for conditioning neural fields, 2022

  33. [33]

    On the difficulty of training Recurrent Neural Networks

    Razvan Pascanu, Tomás Mikolov, and Yoshua Bengio. Understanding the exploding gradient problem. CoRR, abs/1211.5063, 2012

  34. [34]

    Rectifier nonlinearities improve neural network acoustic models

    Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30-1, page 3. Atlanta, GA, 2013

  35. [35]

    Unexpected improvements to expected improvement for bayesian optimization

    Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Unexpected improvements to expected improvement for bayesian optimization. Advances in Neural Information Processing Systems, 36, 2024

  36. [36]

    Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

    Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Gaussian process bandits without regret: An experimental design approach. CoRR, abs/0912.3995, 2009

  37. [37]

    Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. M...

  38. [38]

    On the limited memory bfgs method for large scale optimization

    Dong C Liu and Jorge Nocedal. On the limited memory bfgs method for large scale optimization. Mathematical programming, 45(1):503–528, 1989

  39. [39]

    Kriging is well-suited to parallelize optimization

    David Ginsbourger, Rodolphe Le Riche, and Laurent Carraro. Kriging is well-suited to parallelize optimization. In Computational intelligence in expensive optimization problems, pages 131–162. Springer, 2010

  40. [40]

    Differentiable expected hypervolume improvement for parallel multi-objective bayesian optimization

    Samuel Daulton, Maximilian Balandat, and Eytan Bakshy. Differentiable expected hypervolume improvement for parallel multi-objective bayesian optimization. Advances in Neural Information Processing Systems, 33:9851–9864, 2020

  41. [41]

    Parallel Bayesian Global Optimization of Expensive Functions

    Jialei Wang, Scott C Clark, Eric Liu, and Peter I Frazier. Parallel bayesian global optimization of expensive functions. arXiv preprint arXiv:1602.05149, 2016

  42. [43]

    py-pde: A python package for solving partial differential equations

    David Zwicker. py-pde: A python package for solving partial differential equations. Journal of Open Source Software, 5(48):2158, 2020

  43. [44]

    Interferobot: aligning an optical interferometer by a reinforcement learning agent, 2021

    Dmitry Sorokin, Alexander Ulanov, Ekaterina Sazhina, and Alexander Lvovsky. Interferobot: aligning an optical interferometer by a reinforcement learning agent, 2021

  44. [45]

    JAX: composable transformations of Python+NumPy programs, 2018

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018

  45. [46]

    Flax: A neural network library and ecosystem for JAX, 2023

    Jonathan Heek, Anselm Levskaya, Avital Oliver, Marvin Ritter, Bertrand Rondepierre, Andreas Steiner, and Marc van Zee. Flax: A neural network library and ecosystem for JAX, 2023

  46. [47]

    J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.

  47. [48]

    Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...

  48. [49]

    Bayesian calibration and uncertainty analysis for computationally expensive models using optimization and radial basis function approximation

    Nikolay Bliznyuk, David Ruppert, Christine Shoemaker, Rommel Regis, Stefan Wild, and Pradeep Mugunthan. Bayesian calibration and uncertainty analysis for computationally expensive models using optimization and radial basis function approximation. Journal of Computational and Graphical Statistics, 17(2):270–294, 2008

  49. [50]

    Adam: A method for stochastic optimization, 2017

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017

  50. [51]

    Maximizing acquisition functions for bayesian optimization, 2018

    James T. Wilson, Frank Hutter, and Marc Peter Deisenroth. Maximizing acquisition functions for bayesian optimization, 2018.

    Author contributions statement: L.F.G. and P.P. conceived the methodology. L.F.G. conducted the experiments and analysed the results. P.P. provided funding and supervised this study. All authors reviewed the manuscript. Competing Inte...

  51. [52]

    The EpiNet architecture we used consisted of a trainable MLP with two hidden layers of dimension 32, and for the prior component an ensemble of 16 MLPs with 2 hidden layers of width 5 each and a scale parameter of 1. We trained this network for 4,000 steps using full batch and the Adam [50] optimizer and exponential learning rate decay. Figure 6: Full expe...
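
The EpiNet sizes quoted above can be turned into a forward-pass sketch, loosely following the epinet construction of Osband et al. [17]. Only the layer widths, the ensemble size of 16, and the scale of 1 come from the text. The feature dimension, the index dimension, the output dimension, the ReLU activation, the concatenation of features with the index, and the way the prior MLPs are weighted by the index are all assumptions made for illustration, and training (4,000 Adam steps with learning rate decay) is omitted.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for an MLP with the given layer sizes."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass with ReLU on hidden layers and a linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

FEATURE_DIM = 8    # assumption: size of the backbone features fed to the EpiNet
INDEX_DIM = 16     # assumption: one epistemic index coordinate per prior MLP
OUT_DIM = 1        # assumption: scalar output per query point
PRIOR_SCALE = 1.0  # "scale parameter of 1" from the text

rng = np.random.default_rng(0)
# Trainable part: two hidden layers of dimension 32 (from the text).
learnable = init_mlp(rng, [FEATURE_DIM + INDEX_DIM, 32, 32, OUT_DIM])
# Prior part: 16 fixed MLPs, each with two hidden layers of width 5 (from the text).
priors = [init_mlp(rng, [FEATURE_DIM, 5, 5, OUT_DIM]) for _ in range(INDEX_DIM)]

def epinet(features, z):
    """EpiNet output for one feature vector and one epistemic index z."""
    # Assumption: the trainable MLP sees features and index concatenated.
    learned = mlp(learnable, np.concatenate([features, z]))
    # Assumption: prior MLPs are mixed linearly by the index coordinates.
    prior = sum(z_i * mlp(p, features) for z_i, p in zip(z, priors))
    return learned + PRIOR_SCALE * prior
```

In a full model this output would be added to the deterministic backbone prediction, and the prior MLPs would stay frozen while only the trainable part and the backbone are optimized.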