
arxiv: 2501.15458 · v3 · submitted 2025-01-26 · 💻 cs.LG

Amortized Safe Active Learning for Real-Time Data Acquisition: Pretrained Neural Policies From Simulated Nonparametric Functions

Pith reviewed 2026-05-23 04:42 UTC · model grok-4.3

classification 💻 cs.LG
keywords amortized active learning · safe active learning · neural policies · Gaussian processes · Fourier features · real-time decision making · regression

The pith

A neural policy pretrained on simulated Gaussian process samples selects safe queries for active learning via a single forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops amortized safe active learning by pretraining a neural network policy on functions sampled from Gaussian processes using Fourier features. This policy then replaces the standard online steps of updating a GP model and optimizing an acquisition function during the active learning process. At test time, queries are chosen instantly via one network evaluation, yielding large speedups. The approach maintains the quality of learned models while enabling real-time decisions and extends to non-safe active learning as well.

Core claim

By training a neural policy on nonparametric functions generated via Fourier feature approximations of Gaussian processes, the method allows the selection of informative and safe data points for regression tasks through a single forward pass of the network, bypassing repeated Gaussian process inference and constrained optimization at each step of the active learning loop.
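The Fourier feature construction mentioned above can be illustrated with a minimal sketch. It assumes an RBF kernel and the standard random Fourier feature approximation of Rahimi and Recht [35]; this page does not state the paper's kernel choices, feature counts, or hyperparameter priors, so `sample_rff_function`, `num_features`, `lengthscale`, and `signal_std` are illustrative names and values, not the authors' settings.

```python
import math
import random


def sample_rff_function(dim, num_features=256, lengthscale=0.3,
                        signal_std=1.0, rng=None):
    """Draw one approximate sample from a zero-mean GP prior with an RBF kernel.

    Assumed (not from the paper): RBF kernel, fixed lengthscale, and the
    cosine-with-random-phase form of random Fourier features.
    """
    if rng is None:
        rng = random.Random()
    # Spectral frequencies of the RBF kernel: omega ~ N(0, I / lengthscale^2).
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
              for _ in range(num_features)]
    # Random phases b ~ Uniform[0, 2*pi] and feature weights ~ N(0, 1).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    scale = signal_std * math.sqrt(2.0 / num_features)

    def f(x):
        total = 0.0
        for omega, phase, weight in zip(omegas, phases, weights):
            inner = sum(o * xi for o, xi in zip(omega, x))
            total += weight * math.cos(inner + phase)
        return scale * total

    return f


# Each call yields a new cheap-to-evaluate function, usable as a simulated task.
f = sample_rff_function(dim=2, rng=random.Random(0))
value = f([0.2, 0.7])
```

Because each sampled function is a finite sum of cosines, it can be evaluated at any query point without conditioning a GP, which is what makes it convenient as a pretraining simulator.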

What carries the argument

The pretrained neural policy that takes current observations and outputs the next query point, optimized using a differentiable safety-aware acquisition function during pretraining on simulated data.
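A rough sketch of what such a policy's forward pass could look like is given here. It borrows two details from the appendix fragments listed as items [56] to [59] in the reference graph: a history encoder whose output does not depend on the order of observations, and a final hyperbolic tangent that bounds the output. Everything else is an assumption for illustration: `make_policy`, the layer sizes, sum pooling as the order-invariant operation, and the random (untrained) weights.

```python
import math
import random


def make_policy(in_dim, hidden=8, seed=0):
    """Build a toy permutation-invariant policy with untrained random weights.

    Hypothetical stand-in: a real policy would be pretrained on simulated
    functions; sum pooling is an assumed choice of order-invariant encoder.
    """
    rng = random.Random(seed)
    w_enc = [[rng.gauss(0.0, 1.0) for _ in range(in_dim + 1)]
             for _ in range(hidden)]
    w_out = [[rng.gauss(0.0, 1.0) for _ in range(hidden)]
             for _ in range(in_dim)]

    def encode_pair(x, y):
        # Embed one observation (x, y) into a hidden vector.
        features = list(x) + [y]
        return [math.tanh(sum(w * v for w, v in zip(row, features)))
                for row in w_enc]

    def policy(history):
        # Sum pooling: the embedding E_t ignores the order of observations.
        pooled = [0.0] * hidden
        for x, y in history:
            for j, e in enumerate(encode_pair(x, y)):
                pooled[j] += e
        # Final tanh keeps every output coordinate inside [-1, 1].
        return [math.tanh(sum(w * p for w, p in zip(row, pooled)))
                for row in w_out]

    return policy


policy = make_policy(in_dim=2)
history = [([0.1, 0.2], 0.5), ([0.9, 0.4], -1.0), ([0.3, 0.3], 0.2)]
next_query = policy(history)  # one forward pass, no GP update
```

Rescaling the tanh output from [-1, 1] to the actual input domain is omitted; the page does not say how the authors handle it.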

If this is right

  • Real-time active learning becomes feasible for applications requiring immediate decisions.
  • Learning quality remains comparable to traditional Gaussian process based methods.
  • The framework supports both safe and unconstrained active learning through modularity.
  • Safety constraints can be incorporated without additional online computation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pretraining strategy might accelerate other sequential decision processes that rely on Gaussian process models.
  • Performance on real data will depend on how representative the simulated training functions are of the target domain.
  • Extensions could include adapting the policy online if generalization gaps appear.

Load-bearing premise

A policy trained only on functions drawn from Gaussian processes will produce high-quality and safe queries when applied to arbitrary unknown real-world functions.

What would settle it

Running the pretrained policy on a real regression task and observing that the selected queries either violate the safety constraints or yield slower learning progress than a standard safe active learning method using Gaussian processes.

Figures

Figures reproduced from arXiv: 2501.15458 by Barbara Rakitsch, Cen-You Li, Christoph Zimmer, Marc Toussaint.

Figure 1
Figure 1: Conventional safe AL relies on computationally expensive (orange) GP fitting and constrained acquisition. Our amortized approach meta trains a safe learner up-front on synthetic data, allowing fast, real-time (green) deployment. Active learning (AL) is a sequential design of experiments, aiming to learn a task with reduced data labeling effort [24, 43, 49]. Each label is queried by optimizing an acquisi…
Figure 2
Figure 2: Empirical results on standard AL. Left: RMSE on airfoil dataset vs number of queries T. Our trained policy is deployed at T = 2, 4, ..., 40. DAD trains separate policies for each T (shown for T = 10, 20, 30, 40). All methods improve as T increases. Right: total query time at T = 20 across datasets. Our approach is significantly faster, requiring only a single NN forward pass for each data acquisition.…
Figure 3
Figure 3: Empirical results on safe AL. Left: Our method (blue) achieves competitive RMSE and outperforms Safe Random across all tasks. The RMSE is evaluated on safe test data only. Middle: Safety awareness is quantified as the proportion of safe queries out of the total T queries. Our method reaches the required threshold 1−γ on all tasks. Right: Our approach is significantly faster, as we avoid GP modeling and acq…
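The safety-awareness metric in the Figure 3 caption (proportion of safe queries, compared against 1−γ) can be written down in a few lines. This is a minimal sketch under one assumption: a query counts as safe when its safety measurement z is at or above 0, matching the unsafe event z(x) < 0 that appears in item [72] of the reference graph. The function names are hypothetical.

```python
def safe_query_rate(safety_values, threshold=0.0):
    """Fraction of queries whose safety measurement z is >= threshold.

    Assumption: "safe" means z >= 0, mirroring the unsafe event z(x) < 0.
    """
    if not safety_values:
        raise ValueError("need at least one query")
    return sum(z >= threshold for z in safety_values) / len(safety_values)


def meets_safety_target(safety_values, gamma, threshold=0.0):
    """True if the proportion of safe queries reaches 1 - gamma."""
    return safe_query_rate(safety_values, threshold) >= 1.0 - gamma


z_observed = [0.2, 1.0, -0.1, 0.5]      # illustrative values, not paper data
rate = safe_query_rate(z_observed)       # 3 of 4 queries are safe
ok = meets_safety_target(z_observed, gamma=0.05)
```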
Original abstract

Safe active learning (AL) is a sequential scheme for learning unknown systems while respecting safety constraints during data acquisition. Existing methods often rely on Gaussian processes (GPs) to model the task and safety constraints, requiring repeated GP updates and constrained acquisition optimization--incurring significant computations which are challenging for real-time decision-making. We propose amortized AL for regression and amortized safe AL, replacing expensive online computations with a pretrained neural policy. Inspired by recent advances in amortized Bayesian experimental design, we leverage GPs as pretraining simulators. We train our policy prior to the AL deployment on simulated nonparametric functions, using Fourier feature-based GP sampling and a differentiable acquisition objective that is safety-aware in the safe AL setting. At deployment, our policy selects informative and (if desired) safe queries via a single forward pass, eliminating GP inference and acquisition optimization. This leads to magnitudes of speed improvements while preserving learning quality. Our framework is modular and, without the safety component, yields fast unconstrained AL for time-sensitive tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes amortized safe active learning (AL) for regression, replacing repeated Gaussian process (GP) inference and constrained acquisition optimization with a pretrained neural policy. The policy is trained offline on nonparametric functions sampled from GPs via Fourier feature approximations, using a differentiable safety-aware acquisition objective. At deployment, the policy performs query selection (informative and optionally safe) in a single forward pass, claiming orders-of-magnitude speedups while preserving learning quality. The framework is modular and also supports fast unconstrained AL without the safety component.

Significance. If the sim-to-real generalization holds, the approach could enable real-time safe AL in latency-critical settings by amortizing expensive online computations. The use of simulation-based pretraining with differentiable objectives follows recent amortized Bayesian experimental design work and offers a modular design. However, the central claim of preserved quality rests on unproven transfer from GP-simulated training distributions to arbitrary real target functions.

major comments (2)
  1. [Abstract] The load-bearing claim that the pretrained policy 'preserves learning quality' on real unknown targets is not supported by any reported empirical results, ablation studies, or comparisons to online GP-based safe AL on functions outside the Fourier-feature GP prior. Without such validation, the speed-quality tradeoff cannot be assessed.
  2. The generalization assumption—that a policy trained solely on functions sampled from a stationary GP prior via Fourier features will produce high-quality and safe queries on real targets exhibiting non-stationarity, discontinuities, or mismatched length scales—is not guaranteed and directly undermines both the quality-preservation and safety claims if the sim-to-real gap is large.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. We address the two major comments point-by-point below, providing the strongest honest defense of the manuscript while acknowledging where clarifications or additions are warranted.

Point-by-point responses
  1. Referee: [Abstract] The load-bearing claim that the pretrained policy 'preserves learning quality' on real unknown targets is not supported by any reported empirical results, ablation studies, or comparisons to online GP-based safe AL on functions outside the Fourier-feature GP prior. Without such validation, the speed-quality tradeoff cannot be assessed.

    Authors: Our experiments evaluate the amortized policy against online GP baselines on held-out functions drawn from the identical Fourier-feature GP prior used for pretraining. In these settings the policy achieves statistically comparable regression performance (learning curves and final MSE) while delivering the reported speedups. The abstract claim of 'preserving learning quality' is therefore grounded in the reported results, which match the training distribution. We do not present results on real-world data or functions drawn from qualitatively different distributions. We will revise the abstract and introduction to state the evaluation scope explicitly and to avoid any implication of validation outside the GP-simulated regime. revision: partial

  2. Referee: The generalization assumption—that a policy trained solely on functions sampled from a stationary GP prior via Fourier features will produce high-quality and safe queries on real targets exhibiting non-stationarity, discontinuities, or mismatched length scales—is not guaranteed and directly undermines both the quality-preservation and safety claims if the sim-to-real gap is large.

    Authors: We agree that transfer performance is not guaranteed when the target function deviates substantially from the stationary GP prior (e.g., strong non-stationarity or discontinuities). The manuscript positions GPs as flexible simulators rather than claiming universal generalization; the Fourier-feature construction already permits a range of length scales, but cannot cover all possible real-world behaviors. Safety guarantees at deployment inherit the same distributional assumption. We will add an explicit limitations paragraph discussing the sim-to-real gap, the role of prior choice, and possible mitigations such as domain randomization during pretraining. revision: yes

Circularity Check

0 steps flagged

No circularity; pretraining on GP simulations is independent of deployment targets

Full rationale

The paper's chain is: (1) sample nonparametric functions from GP prior via Fourier features, (2) optimize neural policy to mimic a differentiable acquisition objective on those simulations, (3) deploy the fixed policy via single forward pass on real data. This is standard amortized simulation-based training; the policy outputs are not defined in terms of, or fitted to, the eventual target function's observations. No equations reduce a claimed prediction back to its own inputs by construction. No self-citations appear as load-bearing premises in the abstract or described method. Generalization from GP-simulated training distribution to real functions is an empirical assumption, not a definitional loop. The derivation remains self-contained against external benchmarks.
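Step (3) of this chain, deployment of a fixed policy, can be sketched as a plain loop in which each query costs one policy call and nothing is refit. The sketch is hedged: `run_amortized_al` and `toy_policy` are hypothetical names, and `toy_policy` is a hand-written gap-filling rule on [0, 1] standing in for a pretrained network, not the authors' policy.

```python
import math


def run_amortized_al(policy, oracle, initial_data, num_queries):
    """Collect num_queries points; each query is a single policy call."""
    history = list(initial_data)       # [(x, y), ...] observed so far
    for _ in range(num_queries):
        x_next = policy(history)       # forward pass only: no GP refit,
                                       # no acquisition optimization
        y_next = oracle(x_next)        # measurement on the target system
        history.append((x_next, y_next))
    return history


def toy_policy(history):
    """Placeholder policy (assumption): midpoint of the largest gap in [0, 1]."""
    xs = sorted([0.0, 1.0] + [x for x, _ in history])
    gaps = [(b - a, (a + b) / 2.0) for a, b in zip(xs, xs[1:])]
    return max(gaps)[1]


def target(x):
    # Illustrative unknown function; any callable oracle works here.
    return math.sin(6.0 * x)


data = run_amortized_al(toy_policy, target, [(0.5, target(0.5))], num_queries=3)
```

The policy never sees the target function itself, only the observed pairs, which is the sense in which the deployment step is independent of how the policy was trained.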

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that simulated nonparametric functions drawn from GPs are sufficiently representative for policy training to transfer to real tasks. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Gaussian processes with Fourier feature sampling can generate representative nonparametric functions for pretraining active learning policies.
    Used as the simulator for policy training prior to deployment.

pith-pipeline@v0.9.0 · 5712 in / 1120 out tokens · 23779 ms · 2026-05-23T04:42:30.508700+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages

  1. [1]

    Constrained markov decision processes

    Altman, E. Constrained markov decision processes. Routledge, 1999

  2. [2]

    Deep learning quadcopter control via risk-aware active learning

    Andersson, O., Wzorek, M., and Doherty, P. Deep learning quadcopter control via risk-aware active learning. AAAI Conference on Artificial Intelligence, 2017

  3. [3]

    Safe controller optimization for quadrotors with gaussian processes

    Berkenkamp, F., Schoellig, A. P., and Krause, A. Safe controller optimization for quadrotors with gaussian processes. International Conference on Robotics and Automation, 2016

  4. [4]

    Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics

    Berkenkamp, F., Krause, A., and Schoellig, A. P. Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics. Machine Learning, 2020

  5. [5]

    Pyro: Deep Universal Probabilistic Programming

    Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. Pyro: Deep Universal Probabilistic Programming. Journal of Machine Learning Research, 2018

  6. [6]

    Amortized inference for gaussian process hyperparameters of structured kernels

    Bitzer, M., Meister, M., and Zimmer, C. Amortized inference for gaussian process hyperparameters of structured kernels. Conference on Uncertainty in Artificial Intelligence, 2023

  7. [7]

    Information-theoretic safe exploration with gaussian processes

    Bottero, A., Luis, C., Vinogradska, J., Berkenkamp, F., and Peters, J. R. Information-theoretic safe exploration with gaussian processes. Advances in Neural Information Processing Systems, 2022

  8. [8]

    A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning

    Brochu, E., Cora, V. M., and de Freitas, N. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv, 2010

  9. [9]

    Safe learning in robotics: From learning-based control to safe reinforcement learning

    Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., and Schoellig, A. P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 2022

  10. [10]

    Learning to learn without gradient descent by gradient descent

    Chen, Y., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., and de Freitas, N. Learning to learn without gradient descent by gradient descent. International Conference on Machine Learning, 2017

  11. [11]

    Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design

    Foster, A., Ivanova, D. R., Malik, I., and Rainforth, T. Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design. International Conference on Machine Learning, 2021

  12. [12]

    A comprehensive survey on safe reinforcement learning

    García, J. and Fernández, F. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 2015

  13. [13]

    Bayesian optimization with unknown constraints

    Gelbart, M. A., Snoek, J., and Adams, R. P. Bayesian optimization with unknown constraints. Conference on Uncertainty in Artificial Intelligence, 2014

  14. [14]

    Modeling an augmented lagrangian for blackbox constrained optimization

    Gramacy, R. B., Gray, G. A., Digabel, S. L., Lee, H. K. H., Ranjan, P., Wells, G., and Wild, S. M. Modeling an augmented lagrangian for blackbox constrained optimization. arXiv, 2015

  15. [15]

    Constrained Bayesian optimization for automatic chemical design using variational autoencoders

    Griffiths, R.-R. and Hernández-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 2020

  16. [16]

    A review of safe reinforcement learning: Methods, theories and applications

    Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., and Knoll, A. A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  17. [17]

    Near-optimal sensor placements in gaussian processes

    Guestrin, C., Krause, A., and Singh, A. P. Near-optimal sensor placements in gaussian processes. International Conference on Machine Learning, 2005

  18. [18]

    Scalable Variational Gaussian Process Classification

    Hensman, J., Matthews, A., and Ghahramani, Z. Scalable Variational Gaussian Process Classification. International Conference on Artificial Intelligence and Statistics, 2015

  19. [19]

    Amortized bayesian experimental design for decision-making

    Huang, D., Guo, Y., Acerbi, L., and Kaski, S. Amortized bayesian experimental design for decision-making. Advances in Neural Information Processing Systems, 2024

  20. [20]

    Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods

    Ivanova, D. R., Foster, A., Kleinegesse, S., Gutmann, M. U., and Rainforth, T. Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods. Advances in Neural Information Processing Systems, 2021

  21. [21]

    Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces

    Kirschner, J., Mutny, M., Hiller, N., Ischebeck, R., and Krause, A. Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces. International Conference on Machine Learning, 2019

  22. [22]

    Nonmyopic active learning of gaussian processes: An exploration-exploitation approach

    Krause, A. and Guestrin, C. Nonmyopic active learning of gaussian processes: An exploration-exploitation approach. International Conference on Machine Learning, 2007

  23. [23]

    Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies

    Krause, A., Singh, A., and Guestrin, C. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 2008

  24. [24]

    Active learning query strategies for classification, regression, and clustering: A survey

    Kumar, P. and Gupta, A. Active learning query strategies for classification, regression, and clustering: A survey. Journal of Computer Science and Technology, 2020

  25. [25]

    Gaussian process-based real-time learning for safety critical applications

    Lederer, A., Conejo, A. J. O., Maier, K. A., Xiao, W., Umlauft, J., and Hirche, S. Gaussian process-based real-time learning for safety critical applications. International Conference on Machine Learning, 2021

  26. [26]

    Safe active learning for multi-output gaussian processes

    Li, C.-Y., Rakitsch, B., and Zimmer, C. Safe active learning for multi-output gaussian processes. International Conference on Artificial Intelligence and Statistics, 2022

  27. [27]

    Global safe sequential learning via efficient knowledge transfer

    Li, C.-Y., Dünnbier, O., Toussaint, M., Rakitsch, B., and Zimmer, C. Global safe sequential learning via efficient knowledge transfer. Transactions on Machine Learning Research, 2025

  28. [28]

    On the variance of the adaptive learning rate and beyond

    Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J. On the variance of the adaptive learning rate and beyond. International Conference on Learning Representations, 2020

  29. [29]

    Task-agnostic amortized inference of gaussian process hyperparameters

    Liu, S., Sun, X., Ramadge, P. J., and Adams, R. P. Task-agnostic amortized inference of gaussian process hyperparameters. Advances in Neural Information Processing Systems, 2020

  30. [30]

    Inducing point allocation for sparse gaussian processes in high-throughput bayesian optimisation

    Moss, H. B., Ober, S. W., and Picheny, V. Inducing point allocation for sparse gaussian processes in high-throughput bayesian optimisation. International Conference on Artificial Intelligence and Statistics, 2023

  31. [31]

    Incremental sparsification for real-time online model learning

    Nguyen-Tuong, D. and Peters, J. Incremental sparsification for real-time online model learning. International Conference on Artificial Intelligence and Statistics, 2010

  32. [32]

    Numerical optimization

    Nocedal, J. and Wright, S. J. Numerical optimization. Springer, 2006

  33. [33]

    Aerodynamic characteristics and glide-back performance of langley glide-back booster

    Pamadi, B. N., Covell, P. F., Tartabini, P. V., and Murphy, K. J. Aerodynamic characteristics and glide-back performance of langley glide-back booster. Applied Aerodynamics Conference and Exhibit, 2004

  34. [34]

    "how big is big enough?" adjusting model size in continual gaussian processes

    Pescador-Barrios, G., Filippi, S., and van der Wilk, M. "how big is big enough?" adjusting model size in continual gaussian processes. arXiv, 2024

  35. [35]

    Random features for large-scale kernel machines

    Rahimi, A. and Recht, B. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 2007

  36. [36]

    Modern Bayesian Experimental Design

    Rainforth, T., Foster, A., Ivanova, D. R., and Smith, F. B. Modern Bayesian Experimental Design. Statistical Science, 2024

  37. [37]

    Gaussian processes for machine learning

    Rasmussen, C. and Williams, C. Gaussian processes for machine learning. MIT Press, 2006

  38. [38]

    Automated cfd parameter studies on distributed parallel computers

    Rogers, S. E., Aftosmis, M. J., Pandya, S. A., Chaderjian, N. M., T., E. T., and Ahmad, J. U. Automated cfd parameter studies on distributed parallel computers. AIAA Computational Fluid Dynamics Conference, 2003

  39. [39]

    Pacoh: Bayes-optimal meta-learning with pac-guarantees

    Rothfuss, J., Fortuin, V., Josifoski, M., and Krause, A. Pacoh: Bayes-optimal meta-learning with pac-guarantees. International Conference on Machine Learning, 2021

  40. [40]

    Safe exploration for active learning with gaussian processes

    Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., and Toussaint, M. Safe exploration for active learning with gaussian processes. Machine Learning and Knowledge Discovery in Databases, 2015

  41. [41]

    Low rank updates for the cholesky decomposition

    Seeger, M. Low rank updates for the cholesky decomposition. 2004

  42. [42]

    Gaussian process regression: active data selection and test point rejection

    Seo, S., Wallat, M., Graepel, T., and Obermayer, K. Gaussian process regression: active data selection and test point rejection. International Joint Conference on Neural Networks, 2000

  43. [43]

    Active learning literature survey

    Settles, B. Active learning literature survey. University of Wisconsin-Madison, 2010

  44. [44]

    Computer-aided graphing and simulation tools for autocad users

    Simionescu, P. Computer-aided graphing and simulation tools for autocad users. Computer-Aided Graphing and Simulation Tools for AutoCAD Users, 2014

  45. [45]

    Information-theoretic regret bounds for gaussian process optimization in the bandit setting

    Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 2012

  46. [46]

    Safe exploration for optimization with gaussian processes

    Sui, Y., Gotovos, A., Burdick, J., and Krause, A. Safe exploration for optimization with gaussian processes. International Conference on Machine Learning, 2015

  47. [47]

    Stagewise Safe Bayesian Optimization with Gaussian Processes

    Sui, Y., Zhuang, V., Burdick, J. W., and Yue, Y. Stagewise Safe Bayesian Optimization with Gaussian Processes. International Conference on Machine Learning, 80, 2018

  48. [48]

    Amortized bayesian optimization over discrete spaces

    Swersky, K., Rubanova, Y., Dohan, D., and Murphy, K. Amortized bayesian optimization over discrete spaces. Conference on Uncertainty in Artificial Intelligence, 2020

  49. [49]

    A survey on active learning: State-of-the-art, practical challenges and research directions

    Tharwat, A. and Schenck, W. A survey on active learning: State-of-the-art, practical challenges and research directions. Mathematics, 2023

  50. [50]

    Variational learning of inducing variables in sparse gaussian processes

    Titsias, M. Variational learning of inducing variables in sparse gaussian processes. International Conference on Artificial Intelligence and Statistics, 2009

  51. [51]

    Constrained optimization in chebfun

    Townsend, A. Constrained optimization in chebfun. chebfun.org, 2017

  52. [52]

    Attention is all you need

    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems, 2017

  53. [53]

    Efficiently sampling functions from gaussian process posteriors

    Wilson, J. T., Borovitskiy, V., Terenin, A., Mostowsky, P., and Deisenroth, M. P. Efficiently sampling functions from gaussian process posteriors. International Conference on Machine Learning, 2020

  54. [54]

    Near-optimal active learning of multi-output gaussian processes

    Zhang, Y., Hoang, T. N., Low, K. H., and Kankanhalli, M. Near-optimal active learning of multi-output gaussian processes. AAAI Conference on Artificial Intelligence, 2016

  55. [55]

    Safe active learning for time-series modeling with gaussian processes

    Zimmer, C., Meister, M., and Nguyen-Tuong, D. Safe active learning for time-series modeling with gaussian processes. Advances in Neural Information Processing Systems, 2018

  56. [56]

    the history encoder ($\{(x_i, y_i)\}_{i=1}^{t} \to E_t$) and the decision feed-forward MLP (originally $E_t \to x_{t+1}$) are taken from Ivanova et al. [20]

  57. [57]

    we add a hyperbolic tangent function as the last layer to ensure the policy output is in our bounded $\mathcal{X}$, which was not needed in the original Bayesian experimental design problems

  58. [58]

    we add another history encoder to handle the safety data

  59. [59]

    we add a budget encoder to handle the budget variable. Note that the history encoder incorporates the inductive bias that observed data are order-invariant (see [11, 20] for details). This can be seen by noticing that a conventional AL computes the acquisition score conditioned on the past observations and the order of the past data does not matter. We ...

  60. [60]

    Consider $Q = I_D$, then $\frac{1}{D}\sum_{d=1}^{D} w_d [u]_d^2 = \frac{1}{D}\sum_{d=1}^{D} w_d ([x]_d - 0.5)^2$ is an ellipsoid centering around $(0.5, ..., 0.5) \in \mathbb{R}^D$

  61. [61]

    We can see that $\mu_q$ has the center area being a safe ellipsoid as long as $c > 0$, with shape and size controlled by $w_d$, and the orthogonal matrix $Q$ allows us to rotate the ellipsoid around the center $(0.5, ..., 0.5)$. The orthogonal matrix $Q$ is obtained by performing a QR-decomposition of a sampled $A \in \mathbb{R}^{D \times D}$ (each entry from Uniform[−1, 1])

  62. [62]

    The above steps describe variables $w_d$, $c$, $Q$, and we then describe the constants

  63. [63]

    If we consider $c = 1$, $w_d / D = 10$, $\forall d \le D$ (e.g. $w_d = 20$, $D = 2$) and $Q = I_D$, then the central safe area is a ball and it takes about half of the space, i.e. the mean function $\mu_q$ brings half of the space safe and half unsafe. We will later sample the shape and the half-safe space is only for an initial design

  64. [64]

    With the same $c$, $w_d$, $Q$, the constants 3.2 and −0.47 ensure zero mean and unit variance of this $\mu_q$ function, which aligns with our setup that the deployment problems are normalized, and this provides us an estimated variance of $\mu_q \approx c^2$.

    Table A.E.2: Batch sizes in training.

    loss functions           I     DAD    SH
    N_k = |{(θ, θ_q)}|       10    10     10
    N_{f,q} = |{(f, q)}|     5     200    5
    B = ...

  65. [65]

    Our AAL, our ASAL, DAD, Random: we deploy our amortized (safe) AL or the DAD, Random baselines to collect data. After the specified T data points are collected, we use the initial and queried data to fit a GP model with Type II maximum likelihood (optimization: L-BFGS-B algorithm)

  66. [66]

    GP AL, Safe GP AL, Safe Random: We deploy conventional AL and safe AL (Algorithms A.8 and A.9), which update GP iteratively with Type II maximum likelihood (optimization: L-BFGS-B algorithm)

  67. [67]

    AGP AL, Safe AGP AL, Safe ARandom: We deploy conventional AL and safe AL (Algorithms A.8 and A.9), which update GP iteratively with an amortized inference developed by Bitzer et al. [6] (AGP). Bitzer et al. [6] sampled GP data and trained a transformer model to approximate the Type II maximum likelihood. This AGP is a model with a transformer module. Wh...

  68. [68]

    safe AL of policy trained on our main SH, γ = 5% (Eq. (8)),

  69. [69]

    safe AL of policy trained on our appendix SHmean, γ = 5% (unconstrained Hmean decorated with our main min unsafe likelihood, see Figure A.C.6),

  70. [70]

    safe AL of policy trained on our appendix SH,division (unconstrained H decorated with our appendix max safe likelihood, see Eq. (A.18) and Figure A.C.6),

  71. [71]

    safe AL of policy trained on our appendix SHmean,division (unconstrained Hmean decorated with our appendix max safe likelihood, see Figure A.C.6), and

  72. [72]

    conventional GP based safe AL (Algorithm A.9) but we add the unconstrained safety-aware acquisition criterion (Eq. (7)), named MinUnsafe GP AL: $x_t = \arg\max \{ H[y(x) \mid y_{1:t-1}, Y_{init}] - \log \max(\gamma, p(z(x) < 0 \mid z_{1:t-1}, Z_{init})) \}$ ($\gamma = 0.05$, this is the same as Eq. (7) if we take expectation over the forecasted $y(x)$, and this corresponds to objectives SH, SHmean, see...
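The MinUnsafe criterion in item [72], entropy of y(x) minus the log of the unsafe probability clipped from below at γ, can be evaluated in closed form when both predictive distributions are Gaussian, as they are for a GP. The following is a minimal sketch under that assumption. The function names, the finite candidate list (standing in for a continuous optimizer), and the numbers are illustrative, and the predictive means and variances are taken as given inputs rather than computed by a GP here.

```python
import math


def gaussian_entropy(var_y):
    """Differential entropy of N(mean, var_y): 0.5 * log(2 * pi * e * var_y)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var_y)


def unsafe_probability(mean_z, std_z):
    """P(z < 0) for z ~ N(mean_z, std_z^2), via the Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(-mean_z / (std_z * math.sqrt(2.0))))


def min_unsafe_score(var_y, mean_z, std_z, gamma=0.05):
    """H[y(x)] - log(max(gamma, P(z(x) < 0))), as in the item [72] formula."""
    clipped = max(gamma, unsafe_probability(mean_z, std_z))
    return gaussian_entropy(var_y) - math.log(clipped)


def select_query(candidates, gamma=0.05):
    """Pick the best x from (x, var_y, mean_z, std_z) tuples.

    Assumption: a finite candidate list replaces continuous optimization.
    """
    best = max(candidates,
               key=lambda c: min_unsafe_score(c[1], c[2], c[3], gamma))
    return best[0]


# Illustrative numbers: x = 0.9 is more uncertain but probably unsafe.
candidates = [(0.1, 1.0, 3.0, 1.0), (0.9, 4.0, -1.0, 1.0)]
chosen = select_query(candidates)
```

The clipping at γ means that once a point's unsafe probability drops below γ, making it even safer earns no extra score, so among sufficiently safe points the ranking is driven by entropy alone.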