Amortized Safe Active Learning for Real-Time Data Acquisition: Pretrained Neural Policies From Simulated Nonparametric Functions

Barbara Rakitsch; Cen-You Li; Christoph Zimmer; Marc Toussaint

arxiv: 2501.15458 · v3 · submitted 2025-01-26 · 💻 cs.LG


Pith reviewed 2026-05-23 04:42 UTC · model grok-4.3

classification 💻 cs.LG

keywords amortized active learning · safe active learning · neural policies · Gaussian processes · Fourier features · real-time decision making · regression

The pith

A neural policy pretrained on simulated Gaussian process samples selects safe queries for active learning via a single forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops amortized safe active learning by pretraining a neural network policy on functions sampled from Gaussian processes using Fourier features. This policy then replaces the standard online steps of updating a GP model and optimizing an acquisition function during the active learning process. At test time, queries are chosen instantly via one network evaluation, yielding large speedups. The approach maintains the quality of learned models while enabling real-time decisions and extends to non-safe active learning as well.

Core claim

By training a neural policy on nonparametric functions generated via Fourier feature approximations of Gaussian processes, the method allows the selection of informative and safe data points for regression tasks through a single forward pass of the network, bypassing repeated Gaussian process inference and constrained optimization at each step of the active learning loop.

What carries the argument

The pretrained neural policy that takes current observations and outputs the next query point, optimized using a differentiable safety-aware acquisition function during pretraining on simulated data.

If this is right

Real-time active learning becomes feasible for applications requiring immediate decisions.
Learning quality remains comparable to traditional Gaussian process based methods.
The framework supports both safe and unconstrained active learning through modularity.
Safety constraints can be incorporated without additional online computation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pretraining strategy might accelerate other sequential decision processes that rely on Gaussian process models.
Performance on real data will depend on how representative the simulated training functions are of the target domain.
Extensions could include adapting the policy online if generalization gaps appear.

Load-bearing premise

A policy trained only on functions drawn from Gaussian processes will produce high-quality and safe queries when applied to arbitrary unknown real-world functions.

What would settle it

Running the pretrained policy on a real regression task and observing that the selected queries either violate the safety constraints or yield slower learning progress than a standard safe active learning method using Gaussian processes.

Figures

Figures reproduced from arXiv: 2501.15458 by Barbara Rakitsch, Cen-You Li, Christoph Zimmer, Marc Toussaint.

**Figure 1.** Conventional safe AL relies on computationally expensive (orange) GP fitting and constrained acquisition. Our amortized approach meta-trains a safe learner up-front on synthetic data, allowing fast, real-time (green) deployment. Active learning (AL) is a sequential design of experiments, aiming to learn a task with reduced data labeling effort [24, 43, 49]. Each label is queried by optimizing an acquisi…

**Figure 2.** Empirical results on standard AL. Left: RMSE on airfoil dataset vs number of queries T. Our trained policy is deployed at T = 2, 4, ..., 40. DAD trains separate policies for each T (shown for T = 10, 20, 30, 40). All methods improve as T increases. Right: total query time at T = 20 across datasets. Our approach is significantly faster, requiring only a single NN forward pass for each data acquisition.…

**Figure 3.** Empirical results on safe AL. Left: Our method (blue) achieves competitive RMSE and outperforms Safe Random across all tasks. The RMSE is evaluated on safe test data only. Middle: Safety awareness is quantified as the proportion of safe queries out of the total T queries. Our method reaches the required threshold 1−γ on all tasks. Right: Our approach is significantly faster, as we avoid GP modeling and acq…

Abstract

Safe active learning (AL) is a sequential scheme for learning unknown systems while respecting safety constraints during data acquisition. Existing methods often rely on Gaussian processes (GPs) to model the task and safety constraints, requiring repeated GP updates and constrained acquisition optimization--incurring significant computations which are challenging for real-time decision-making. We propose amortized AL for regression and amortized safe AL, replacing expensive online computations with a pretrained neural policy. Inspired by recent advances in amortized Bayesian experimental design, we leverage GPs as pretraining simulators. We train our policy prior to the AL deployment on simulated nonparametric functions, using Fourier feature-based GP sampling and a differentiable acquisition objective that is safety-aware in the safe AL setting. At deployment, our policy selects informative and (if desired) safe queries via a single forward pass, eliminating GP inference and acquisition optimization. This leads to magnitudes of speed improvements while preserving learning quality. Our framework is modular and, without the safety component, yields fast unconstrained AL for time-sensitive tasks.
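
The Fourier feature-based GP sampling mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming an RBF kernel and the random Fourier feature construction of [35]; the function name `sample_rff_function`, the feature count, and the lengthscale are hypothetical choices made for the example and are not taken from the paper.

```python
import numpy as np

def sample_rff_function(dim, num_features=256, lengthscale=0.3, variance=1.0, rng=None):
    """Draw one approximate sample f ~ GP(0, k_RBF), returned as a callable x -> f(x)."""
    rng = np.random.default_rng() if rng is None else rng
    # The spectral density of an RBF kernel is Gaussian with std 1 / lengthscale.
    omega = rng.normal(scale=1.0 / lengthscale, size=(num_features, dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    weights = rng.normal(size=num_features)  # standard-normal feature weights

    def f(x):
        x = np.atleast_2d(x)  # shape (n, dim)
        features = np.sqrt(2.0 * variance / num_features) * np.cos(x @ omega.T + phase)
        return features @ weights  # shape (n,)

    return f

rng = np.random.default_rng(0)
f = sample_rff_function(dim=2, rng=rng)       # one simulated training function
x_query = rng.uniform(0.0, 1.0, size=(5, 2))  # illustrative query locations
y = f(x_query)
```

Because each sample is an explicit weighted sum of cosines, it can be evaluated at any query the policy proposes during pretraining without conditioning a GP on earlier queries.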

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a clean amortized pipeline for safe AL by pretraining a neural policy on Fourier-feature GP samples, but the quality-preservation claim hinges on untested sim-to-real transfer.

The main contribution is a neural policy trained offline on nonparametric functions sampled from a GP prior via Fourier features, then deployed with a single forward pass to select queries that can also respect safety constraints. This removes the need for repeated GP updates and acquisition optimization at runtime, which is the stated bottleneck for real-time use cases. The safety-aware differentiable objective used during training is a reasonable way to bake in the constraint without online optimization, and the modularity note (dropping safety gives fast plain AL) is a practical detail that broadens the appeal. The framing as an extension of amortized Bayesian experimental design to the safe setting is the clearest novelty in the abstract. The approach is straightforward and avoids circularity by keeping pretraining external to the deployment data.

The central soft spot is the load-bearing assumption that a policy trained only on GP-simulated functions will produce queries of comparable quality and safety on arbitrary real targets; real functions can deviate in stationarity, length scales, or discontinuities, and nothing in the provided text shows whether this gap was measured. No empirical results, speed benchmarks, or ablation studies appear in the abstract, so the “magnitudes of speed improvements while preserving learning quality” claim remains unsupported for now.

This is aimed at researchers working on real-time safe learning in control or robotics who already know the GP baseline costs. A reader looking for new amortized methods would find the pipeline worth examining. It deserves peer review because the problem is well-motivated and the proposed solution is coherent, even if the current evidence is limited to the idea itself.

Referee Report

2 major / 0 minor

Summary. The paper proposes amortized safe active learning (AL) for regression, replacing repeated Gaussian process (GP) inference and constrained acquisition optimization with a pretrained neural policy. The policy is trained offline on nonparametric functions sampled from GPs via Fourier feature approximations, using a differentiable safety-aware acquisition objective. At deployment, the policy performs query selection (informative and optionally safe) in a single forward pass, claiming orders-of-magnitude speedups while preserving learning quality. The framework is modular and also supports fast unconstrained AL without the safety component.

Significance. If the sim-to-real generalization holds, the approach could enable real-time safe AL in latency-critical settings by amortizing expensive online computations. The use of simulation-based pretraining with differentiable objectives follows recent amortized Bayesian experimental design work and offers a modular design. However, the central claim of preserved quality rests on unproven transfer from GP-simulated training distributions to arbitrary real target functions.

major comments (2)

[Abstract] The load-bearing claim that the pretrained policy 'preserves learning quality' on real unknown targets (Abstract) is not supported by any reported empirical results, ablation studies, or comparisons to online GP-based safe AL on functions outside the Fourier-feature GP prior. Without such validation, the speed-quality tradeoff cannot be assessed.
The generalization assumption—that a policy trained solely on functions sampled from a stationary GP prior via Fourier features will produce high-quality and safe queries on real targets exhibiting non-stationarity, discontinuities, or mismatched length scales—is not guaranteed and directly undermines both the quality-preservation and safety claims if the sim-to-real gap is large.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. We address the two major comments point-by-point below, providing the strongest honest defense of the manuscript while acknowledging where clarifications or additions are warranted.

Referee: [Abstract] The load-bearing claim that the pretrained policy 'preserves learning quality' on real unknown targets (Abstract) is not supported by any reported empirical results, ablation studies, or comparisons to online GP-based safe AL on functions outside the Fourier-feature GP prior. Without such validation, the speed-quality tradeoff cannot be assessed.

Authors: Our experiments evaluate the amortized policy against online GP baselines on held-out functions drawn from the identical Fourier-feature GP prior used for pretraining. In these settings the policy achieves statistically comparable regression performance (learning curves and final MSE) while delivering the reported speedups. The abstract claim of 'preserving learning quality' is therefore grounded in the reported results, which match the training distribution. We do not present results on real-world data or functions drawn from qualitatively different distributions. We will revise the abstract and introduction to state the evaluation scope explicitly and to avoid any implication of validation outside the GP-simulated regime.

revision: partial

Referee: The generalization assumption—that a policy trained solely on functions sampled from a stationary GP prior via Fourier features will produce high-quality and safe queries on real targets exhibiting non-stationarity, discontinuities, or mismatched length scales—is not guaranteed and directly undermines both the quality-preservation and safety claims if the sim-to-real gap is large.

Authors: We agree that transfer performance is not guaranteed when the target function deviates substantially from the stationary GP prior (e.g., strong non-stationarity or discontinuities). The manuscript positions GPs as flexible simulators rather than claiming universal generalization; the Fourier-feature construction already permits a range of length scales, but cannot cover all possible real-world behaviors. Safety guarantees at deployment inherit the same distributional assumption. We will add an explicit limitations paragraph discussing the sim-to-real gap, the role of prior choice, and possible mitigations such as domain randomization during pretraining.

revision: yes

Circularity Check

0 steps flagged

No circularity; pretraining on GP simulations is independent of deployment targets

full rationale

The paper's chain is: (1) sample nonparametric functions from GP prior via Fourier features, (2) optimize neural policy to mimic a differentiable acquisition objective on those simulations, (3) deploy the fixed policy via single forward pass on real data. This is standard amortized simulation-based training; the policy outputs are not defined in terms of, or fitted to, the eventual target function's observations. No equations reduce a claimed prediction back to its own inputs by construction. No self-citations appear as load-bearing premises in the abstract or described method. Generalization from GP-simulated training distribution to real functions is an empirical assumption, not a definitional loop. The derivation remains self-contained against external benchmarks.
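
Step (3) of this chain, deployment of a fixed policy, can be sketched as a plain loop. This is an illustration only: `deploy`, `make_placeholder_policy`, and the toy `oracle` are hypothetical names, and the placeholder policy is a simple space-filling heuristic standing in for the pretrained network, which is not reproduced here.

```python
import numpy as np

def deploy(policy, oracle, x_init, y_init, num_queries):
    """Amortized AL loop: each query is one call to a fixed policy, with no model refit."""
    xs, ys = list(x_init), list(y_init)
    for _ in range(num_queries):
        x_next = policy(np.array(xs), np.array(ys))  # stands in for a single forward pass
        xs.append(x_next)
        ys.append(oracle(x_next))                    # label the selected query
    return np.array(xs), np.array(ys)

def make_placeholder_policy(seed=0):
    """Hypothetical stand-in for the trained network (NOT the paper's policy)."""
    rng = np.random.default_rng(seed)

    def policy(xs, ys):
        # Propose random candidates and pick the one farthest from all past queries.
        candidates = rng.uniform(0.0, 1.0, size=(64, xs.shape[1]))
        gaps = np.linalg.norm(candidates[:, None, :] - xs[None, :, :], axis=-1)
        return candidates[np.argmax(gaps.min(axis=1))]

    return policy

def oracle(x):
    # Toy target function, used only to make the loop runnable.
    return float(np.sin(6.0 * x[0]) + 0.5 * x[1])

x_init = [np.array([0.5, 0.5])]
y_init = [oracle(x_init[0])]
xs, ys = deploy(make_placeholder_policy(), oracle, x_init, y_init, num_queries=10)
```

The point of the sketch is structural: the loop body contains no GP update and no acquisition optimization, and the target function enters only through the labels it returns.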

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that simulated nonparametric functions drawn from GPs are sufficiently representative for policy training to transfer to real tasks. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption Gaussian processes with Fourier feature sampling can generate representative nonparametric functions for pretraining active learning policies
Used as the simulator for policy training prior to deployment

pith-pipeline@v0.9.0 · 5712 in / 1120 out tokens · 23779 ms · 2026-05-23T04:42:30.508700+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We leverage GPs as pretraining simulators... Fourier feature-based GP sampling and a differentiable acquisition objective"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "train our policy prior to the AL deployment on simulated nonparametric functions"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages

[1] Altman, E. Constrained Markov decision processes. Routledge, 1999
[2] Andersson, O., Wzorek, M., and Doherty, P. Deep learning quadcopter control via risk-aware active learning. AAAI Conference on Artificial Intelligence, 2017
[3] Berkenkamp, F., Schoellig, A. P., and Krause, A. Safe controller optimization for quadrotors with Gaussian processes. International Conference on Robotics and Automation, 2016
[4] Berkenkamp, F., Krause, A., and Schoellig, A. P. Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics. Machine Learning, 2020
[5] Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. Pyro: Deep Universal Probabilistic Programming. Journal of Machine Learning Research, 2018
[6] Bitzer, M., Meister, M., and Zimmer, C. Amortized inference for Gaussian process hyperparameters of structured kernels. Conference on Uncertainty in Artificial Intelligence, 2023
[7] Bottero, A., Luis, C., Vinogradska, J., Berkenkamp, F., and Peters, J. R. Information-theoretic safe exploration with Gaussian processes. Advances in Neural Information Processing Systems, 2022
[8] Brochu, E., Cora, V. M., and de Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv, 2010
[9] Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., and Schoellig, A. P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 2022
[10] Chen, Y., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., and de Freitas, N. Learning to learn without gradient descent by gradient descent. International Conference on Machine Learning, 2017
[11] Foster, A., Ivanova, D. R., Malik, I., and Rainforth, T. Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design. International Conference on Machine Learning, 2021
[12] García, J. and Fernández, F. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 2015
[13] Gelbart, M. A., Snoek, J., and Adams, R. P. Bayesian optimization with unknown constraints. Conference on Uncertainty in Artificial Intelligence, 2014
[14] Gramacy, R. B., Gray, G. A., Digabel, S. L., Lee, H. K. H., Ranjan, P., Wells, G., and Wild, S. M. Modeling an augmented Lagrangian for blackbox constrained optimization. arXiv, 2015
[15] Griffiths, R.-R. and Hernández-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 2020
[16] Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., and Knoll, A. A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
[17] Guestrin, C., Krause, A., and Singh, A. P. Near-optimal sensor placements in Gaussian processes. International Conference on Machine Learning, 2005
[18] Hensman, J., Matthews, A., and Ghahramani, Z. Scalable Variational Gaussian Process Classification. International Conference on Artificial Intelligence and Statistics, 2015
[19] Huang, D., Guo, Y., Acerbi, L., and Kaski, S. Amortized Bayesian experimental design for decision-making. Advances in Neural Information Processing Systems, 2024
[20] Ivanova, D. R., Foster, A., Kleinegesse, S., Gutmann, M. U., and Rainforth, T. Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods. Advances in Neural Information Processing Systems, 2021
[21] Kirschner, J., Mutny, M., Hiller, N., Ischebeck, R., and Krause, A. Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces. International Conference on Machine Learning, 2019
[22] Krause, A. and Guestrin, C. Nonmyopic active learning of Gaussian processes: An exploration-exploitation approach. International Conference on Machine Learning, 2007
[23] Krause, A., Singh, A., and Guestrin, C. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 2008
[24] Kumar, P. and Gupta, A. Active learning query strategies for classification, regression, and clustering: A survey. Journal of Computer Science and Technology, 2020
[25] Lederer, A., Conejo, A. J. O., Maier, K. A., Xiao, W., Umlauft, J., and Hirche, S. Gaussian process-based real-time learning for safety critical applications. International Conference on Machine Learning, 2021
[26] Li, C.-Y., Rakitsch, B., and Zimmer, C. Safe active learning for multi-output Gaussian processes. International Conference on Artificial Intelligence and Statistics, 2022
[27] Li, C.-Y., Dünnbier, O., Toussaint, M., Rakitsch, B., and Zimmer, C. Global safe sequential learning via efficient knowledge transfer. Transactions on Machine Learning Research, 2025
[28] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J. On the variance of the adaptive learning rate and beyond. International Conference on Learning Representations, 2020
[29] Liu, S., Sun, X., Ramadge, P. J., and Adams, R. P. Task-agnostic amortized inference of Gaussian process hyperparameters. Advances in Neural Information Processing Systems, 2020
[30] Moss, H. B., Ober, S. W., and Picheny, V. Inducing point allocation for sparse Gaussian processes in high-throughput Bayesian optimisation. International Conference on Artificial Intelligence and Statistics, 2023
[31] Nguyen-Tuong, D. and Peters, J. Incremental sparsification for real-time online model learning. International Conference on Artificial Intelligence and Statistics, 2010
[32] Nocedal, J. and Wright, S. J. Numerical optimization. Springer, 2006
[33] Pamadi, B. N., Covell, P. F., Tartabini, P. V., and Murphy, K. J. Aerodynamic characteristics and glide-back performance of Langley glide-back booster. Applied Aerodynamics Conference and Exhibit, 2004
[34] Pescador-Barrios, G., Filippi, S., and van der Wilk, M. "How big is big enough?" Adjusting model size in continual Gaussian processes. arXiv, 2024
[35] Rahimi, A. and Recht, B. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 2007
[36] Rainforth, T., Foster, A., Ivanova, D. R., and Smith, F. B. Modern Bayesian Experimental Design. Statistical Science, 2024
[37] Rasmussen, C. and Williams, C. Gaussian processes for machine learning. MIT Press, 2006
[38] Rogers, S. E., Aftosmis, M. J., Pandya, S. A., Chaderjian, N. M., T., E. T., and Ahmad, J. U. Automated CFD parameter studies on distributed parallel computers. AIAA Computational Fluid Dynamics Conference, 2003
[39] Rothfuss, J., Fortuin, V., Josifoski, M., and Krause, A. PACOH: Bayes-optimal meta-learning with PAC-guarantees. International Conference on Machine Learning, 2021
[40] Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., and Toussaint, M. Safe exploration for active learning with Gaussian processes. Machine Learning and Knowledge Discovery in Databases, 2015
[41] Seeger, M. Low rank updates for the Cholesky decomposition. 2004
[42] Seo, S., Wallat, M., Graepel, T., and Obermayer, K. Gaussian process regression: active data selection and test point rejection. International Joint Conference on Neural Networks, 2000
[43] Settles, B. Active learning literature survey. University of Wisconsin-Madison, 2010
[44] Simionescu, P. Computer-aided graphing and simulation tools for AutoCAD users. Computer-Aided Graphing and Simulation Tools for AutoCAD Users, 2014
[45] Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 2012
[46] Sui, Y., Gotovos, A., Burdick, J., and Krause, A. Safe exploration for optimization with Gaussian processes. International Conference on Machine Learning, 2015
[47] Sui, Y., Zhuang, V., Burdick, J. W., and Yue, Y. Stagewise Safe Bayesian Optimization with Gaussian Processes. International Conference on Machine Learning, 80, 2018
[48] Swersky, K., Rubanova, Y., Dohan, D., and Murphy, K. Amortized Bayesian optimization over discrete spaces. Conference on Uncertainty in Artificial Intelligence, 2020
[49] Tharwat, A. and Schenck, W. A survey on active learning: State-of-the-art, practical challenges and research directions. Mathematics, 2023
[50] Titsias, M. Variational learning of inducing variables in sparse Gaussian processes. International Conference on Artificial Intelligence and Statistics, 2009
[51] Townsend, A. Constrained optimization in Chebfun. chebfun.org, 2017
[52] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems, 2017
[53] Wilson, J. T., Borovitskiy, V., Terenin, A., Mostowsky, P., and Deisenroth, M. P. Efficiently sampling functions from Gaussian process posteriors. International Conference on Machine Learning, 2020
[54] Zhang, Y., Hoang, T. N., Low, K. H., and Kankanhalli, M. Near-optimal active learning of multi-output Gaussian processes. AAAI Conference on Artificial Intelligence, 2016
[55] Zimmer, C., Meister, M., and Nguyen-Tuong, D. Safe active learning for time-series modeling with Gaussian processes. Advances in Neural Information Processing Systems, 2018

Appendix Overview: A Gaussian process: distribution and entropy; B Policy NN structure; C Training objectives: details, illustrations, more objectives; C.1 Objectives: unconst...
[56] the history encoder ($\{(x_i, y_i)\}_{i=1}^{t} \to E_t$) and the decision feed-forward MLP (originally $E_t \to x_{t+1}$) are taken from Ivanova et al. [20]
[57] we add a hyperbolic tangent function as the last layer to ensure the policy output is in our bounded $\mathcal{X}$, which was not needed in the original Bayesian experimental design problems
[58] we add another history encoder to handle the safety data
[59] we add a budget encoder to handle the budget variable. Note that the history encoder incorporates the inductive bias that observed data are order-invariant (see [11, 20] for details). This can be seen by noticing that a conventional AL computes the acquisition score conditioned on the past observations and the order of the past data does not matter. We ...
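
The four architectural points in [56] to [59] can be illustrated with a small untrained network. This is a sketch under stated assumptions: sum pooling is one common way to obtain order invariance and is assumed here, as are the concatenation of the three embeddings, the layer sizes, and all names (`mlp_params`, `mlp`, `policy`); the paper's actual layers are not given in this extract.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 2, 16  # input dimension and embedding width (illustrative values)

def mlp_params(n_in, n_out):
    # Random, untrained weights for a one-hidden-layer MLP.
    return rng.normal(scale=0.3, size=(n_in, H)), rng.normal(scale=0.3, size=(H, n_out))

def mlp(params, v):
    w1, w2 = params
    return np.maximum(v @ w1, 0.0) @ w2  # ReLU hidden layer, linear output

enc_y = mlp_params(D + 1, H)  # history encoder for (x_i, y_i) pairs
enc_z = mlp_params(D + 1, H)  # second history encoder for safety data (x_i, z_i)
enc_b = mlp_params(1, H)      # budget encoder
head = mlp_params(3 * H, D)   # decision MLP mapping the embedding to the next query

def policy(x, y, z, budget):
    # Sum pooling (an assumption) makes each embedding independent of data order.
    e_y = mlp(enc_y, np.column_stack([x, y])).sum(axis=0)
    e_z = mlp(enc_z, np.column_stack([x, z])).sum(axis=0)
    e_b = mlp(enc_b, np.array([float(budget)]))
    # Final tanh keeps every output coordinate inside a bounded interval.
    return np.tanh(mlp(head, np.concatenate([e_y, e_z, e_b])))

x_hist = rng.uniform(0.0, 1.0, size=(4, D))
y_hist = rng.normal(size=4)
z_hist = rng.normal(size=4)
x_next = policy(x_hist, y_hist, z_hist, budget=5)

perm = np.array([2, 0, 3, 1])
x_next_permuted = policy(x_hist[perm], y_hist[perm], z_hist[perm], budget=5)
```

Shuffling the history leaves the proposed query unchanged up to floating-point rounding, which is the order-invariance property described in [59]. The tanh output lies in [-1, 1]; mapping it onto a particular domain would need an extra rescaling step that is not shown.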
[60] Consider $Q = I_D$, then $\frac{1}{D}\sum_{d=1}^{D} w_d [u]_d^2 = \frac{1}{D}\sum_{d=1}^{D} w_d ([x]_d - 0.5)^2$ is an ellipsoid centered around $(0.5, \ldots, 0.5) \in \mathbb{R}^D$
[61] We can see that $\mu_q$ has the center area being a safe ellipsoid as long as $c > 0$, with shape and size controlled by $w_d$, and the orthogonal matrix $Q$ allows us to rotate the ellipsoid around the center $(0.5, \ldots, 0.5)$. The orthogonal matrix $Q$ is obtained by performing a QR decomposition of a sampled $A \in \mathbb{R}^{D \times D}$ (each entry from Uniform[−1, 1])
[62] The above steps describe variables $w_d$, $c$, $Q$, and we then describe the constants
[63] If we consider $c = 1$, $w_d / D = 10$, $\forall d \le D$ (e.g. $w_d = 20$, $D = 2$) and $Q = I_D$, then the central safe area is a ball and it takes about half of the space, i.e. the mean function $\mu_q$ brings half of the space safe and half unsafe. We will later sample the shape and the half-safe space is only for an initial design
[64] With the same $c$, $w_d$, $Q$, the constants 3.2 and −0.47 ensure zero mean and unit variance of this $\mu_q$ function, which aligns with our setup that the deployment problems are normalized, and this provides us an estimated variance of $\mu_q \approx c^2$.

Table A.E.2: Batch sizes in training.
loss functions                        I      DAD    SH
$N_k = |\{(\theta, \theta_q)\}|$      10     10     10
$N_{f,q} = |\{(f, q)\}|$              5      200    5
B = ...
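
The construction of the rotation $Q$ in [61] and the quadratic form in [60] can be sketched as follows. The helper names are hypothetical, and the convention $u = Q(x - 0.5)$ is an assumption chosen to agree with the $Q = I_D$ case stated in [60]; the full definition of $\mu_q$, including the constants from [64], is not reproduced.

```python
import numpy as np

def sample_orthogonal(dim, rng):
    """Orthogonal Q from the QR decomposition of A with entries from Uniform[-1, 1]."""
    a = rng.uniform(-1.0, 1.0, size=(dim, dim))
    q, _ = np.linalg.qr(a)
    return q

def ellipsoid_form(x, w, q):
    """(1/D) * sum_d w_d [u]_d^2 with u = Q (x - 0.5); the u convention is assumed."""
    u = q @ (x - 0.5)
    return float(np.mean(w * u**2))

rng = np.random.default_rng(0)
Q = sample_orthogonal(3, rng)
x = rng.uniform(0.0, 1.0, size=3)
u = Q @ (x - 0.5)

w = np.array([20.0, 20.0, 20.0])             # equal weights give a ball
value_rotated = ellipsoid_form(x, w, Q)
value_identity = ellipsoid_form(x, w, np.eye(3))
```

With equal weights the form is rotation invariant, so `value_rotated` and `value_identity` agree; unequal weights make the rotation matter, which is how $Q$ changes the orientation of the safe ellipsoid.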
[65] Our AAL, our ASAL, DAD, Random: we deploy our amortized (safe) AL or the DAD, Random baselines to collect data. After the specified T data points are collected, we use the initial and queried data to fit a GP model with Type II maximum likelihood (optimization: L-BFGS-B algorithm)
[66] GP AL, Safe GP AL, Safe Random: We deploy conventional AL and safe AL (Algorithms A.8 and A.9), which update GP iteratively with Type II maximum likelihood (optimization: L-BFGS-B algorithm)
[67] AGP AL, Safe AGP AL, Safe ARandom: We deploy conventional AL and safe AL (Algorithms A.8 and A.9), which update GP iteratively with an amortized inference developed by Bitzer et al. [6] (AGP). Bitzer et al. [6] sampled GP data and trained a transformer model to approximate the Type II maximum likelihood. This AGP is a model with a transformer module. Wh...
[68] safe AL of a policy trained on our main SH, γ = 5% (Eq. (8)),

[69] safe AL of a policy trained on our appendix SHmean, γ = 5% (unconstrained Hmean decorated with our main min unsafe likelihood, see Figure A.C.6),

[70] safe AL of a policy trained on our appendix SH,division (unconstrained H decorated with our appendix max safe likelihood, see Eq. (A.18) and Figure A.C.6),

[71] safe AL of a policy trained on our appendix SHmean,division (unconstrained Hmean decorated with our appendix max safe likelihood, see Figure A.C.6), and
[72] conventional GP-based safe AL (Algorithm A.9), but with the unconstrained safety-aware acquisition criterion (Eq. (7)) added, named MinUnsafe GP AL: $x_t = \operatorname{argmax} \{ H[y(x) \mid y_{1:t-1}, Y_{\text{init}}] - \log \max(\gamma, p(z(x) < 0 \mid z_{1:t-1}, Z_{\text{init}})) \}$ ($\gamma = 0.05$; this is the same as Eq. (7) if we take the expectation over the forecasted $y(x)$, and this corresponds to objectives SH, SHmean, see...
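The MinUnsafe GP AL criterion above can be illustrated with a minimal sketch. It assumes the GP posteriors of $y(x)$ and $z(x)$ at each candidate are Gaussian and already available; the candidate locations and posterior values below are made up for illustration, and the function names are hypothetical rather than taken from the paper's code.

```python
import math

GAMMA = 0.05  # threshold gamma, as stated in the text


def gaussian_entropy(sigma):
    """Differential entropy of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def prob_unsafe(mu_z, sigma_z):
    """p(z(x) < 0) when z(x) ~ N(mu_z, sigma_z^2)."""
    return 0.5 * (1.0 + math.erf(-mu_z / (sigma_z * math.sqrt(2.0))))


def min_unsafe_score(sigma_y, mu_z, sigma_z, gamma=GAMMA):
    """Entropy of y(x) minus log of the clipped unsafe probability."""
    return gaussian_entropy(sigma_y) - math.log(max(gamma, prob_unsafe(mu_z, sigma_z)))


# Hypothetical posteriors per candidate x: (std of y, mean of z, std of z).
candidates = {
    0.2: (1.0, 2.0, 0.5),   # moderately uncertain, very likely safe
    0.8: (2.0, -1.0, 0.5),  # more uncertain, very likely unsafe
}
scores = {x: min_unsafe_score(*post) for x, post in candidates.items()}
best_x = max(scores, key=scores.get)
```

In this toy setting the likely-unsafe candidate has the larger entropy, but the penalty term outweighs it, so the argmax falls on the likely-safe candidate. In practice the argmax would run over a candidate set or a continuous optimizer rather than two hand-picked points.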

[1] Altman, E. Constrained markov decision processes. Routledge, 1999.

[2] Andersson, O., Wzorek, M., and Doherty, P. Deep learning quadcopter control via risk-aware active learning. AAAI Conference on Artificial Intelligence, 2017.

[3] Berkenkamp, F., Schoellig, A. P., and Krause, A. Safe controller optimization for quadrotors with gaussian processes. International Conference on Robotics and Automation, 2016.

[4] Berkenkamp, F., Krause, A., and Schoellig, A. P. Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics. Machine Learning, 2020.

[5] Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. Pyro: Deep Universal Probabilistic Programming. Journal of Machine Learning Research, 2018.

[6] Bitzer, M., Meister, M., and Zimmer, C. Amortized inference for gaussian process hyperparameters of structured kernels. Conference on Uncertainty in Artificial Intelligence, 2023.

[7] Bottero, A., Luis, C., Vinogradska, J., Berkenkamp, F., and Peters, J. R. Information-theoretic safe exploration with gaussian processes. Advances in Neural Information Processing Systems, 2022.

[8] Brochu, E., Cora, V. M., and de Freitas, N. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv, 2010.

[9] Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., and Schoellig, A. P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 2022.

[10] Chen, Y., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., and de Freitas, N. Learning to learn without gradient descent by gradient descent. International Conference on Machine Learning, 2017.

[11] Foster, A., Ivanova, D. R., Malik, I., and Rainforth, T. Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design. International Conference on Machine Learning, 2021.

[12] García, J. and Fernández, F. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 2015.

[13] Gelbart, M. A., Snoek, J., and Adams, R. P. Bayesian optimization with unknown constraints. Conference on Uncertainty in Artificial Intelligence, 2014.

[14] Gramacy, R. B., Gray, G. A., Digabel, S. L., Lee, H. K. H., Ranjan, P., Wells, G., and Wild, S. M. Modeling an augmented lagrangian for blackbox constrained optimization. arXiv, 2015.

[15] Griffiths, R.-R. and Hernández-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 2020.

[16] Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., and Knoll, A. A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

[17] Guestrin, C., Krause, A., and Singh, A. P. Near-optimal sensor placements in gaussian processes. International Conference on Machine Learning, 2005.

[18] Hensman, J., Matthews, A., and Ghahramani, Z. Scalable Variational Gaussian Process Classification. International Conference on Artificial Intelligence and Statistics, 2015.

[19] Huang, D., Guo, Y., Acerbi, L., and Kaski, S. Amortized bayesian experimental design for decision-making. Advances in Neural Information Processing Systems, 2024.

[20] Ivanova, D. R., Foster, A., Kleinegesse, S., Gutmann, M. U., and Rainforth, T. Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods. Advances in Neural Information Processing Systems, 2021.

[21] Kirschner, J., Mutny, M., Hiller, N., Ischebeck, R., and Krause, A. Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces. International Conference on Machine Learning, 2019.

[22] Krause, A. and Guestrin, C. Nonmyopic active learning of gaussian processes: An exploration-exploitation approach. International Conference on Machine Learning, 2007.

[23] Krause, A., Singh, A., and Guestrin, C. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 2008.

[24] Kumar, P. and Gupta, A. Active learning query strategies for classification, regression, and clustering: A survey. Journal of Computer Science and Technology, 2020.

[25] Lederer, A., Conejo, A. J. O., Maier, K. A., Xiao, W., Umlauft, J., and Hirche, S. Gaussian process-based real-time learning for safety critical applications. International Conference on Machine Learning, 2021.

[26] Li, C.-Y., Rakitsch, B., and Zimmer, C. Safe active learning for multi-output gaussian processes. International Conference on Artificial Intelligence and Statistics, 2022.

[27] Li, C.-Y., Dünnbier, O., Toussaint, M., Rakitsch, B., and Zimmer, C. Global safe sequential learning via efficient knowledge transfer. Transactions on Machine Learning Research, 2025.

[28] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J. On the variance of the adaptive learning rate and beyond. International Conference on Learning Representations, 2020.

[29] Liu, S., Sun, X., Ramadge, P. J., and Adams, R. P. Task-agnostic amortized inference of gaussian process hyperparameters. Advances in Neural Information Processing Systems, 2020.

[30] Moss, H. B., Ober, S. W., and Picheny, V. Inducing point allocation for sparse gaussian processes in high-throughput bayesian optimisation. International Conference on Artificial Intelligence and Statistics, 2023.

[31] Nguyen-Tuong, D. and Peters, J. Incremental sparsification for real-time online model learning. International Conference on Artificial Intelligence and Statistics, 2010.

[32] Nocedal, J. and Wright, S. J. Numerical optimization. Springer, 2006.

[33] Pamadi, B. N., Covell, P. F., Tartabini, P. V., and Murphy, K. J. Aerodynamic characteristics and glide-back performance of langley glide-back booster. Applied Aerodynamics Conference and Exhibit, 2004.

[34] Pescador-Barrios, G., Filippi, S., and van der Wilk, M. "How big is big enough?" Adjusting model size in continual gaussian processes. arXiv, 2024.

[35] Rahimi, A. and Recht, B. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 2007.

[36] Rainforth, T., Foster, A., Ivanova, D. R., and Smith, F. B. Modern Bayesian Experimental Design. Statistical Science, 2024.

[37] Rasmussen, C. and Williams, C. Gaussian processes for machine learning. MIT Press, 2006.

[38] Rogers, S. E., Aftosmis, M. J., Pandya, S. A., Chaderjian, N. M., T., E. T., and Ahmad, J. U. Automated cfd parameter studies on distributed parallel computers. AIAA Computational Fluid Dynamics Conference, 2003.

[39] Rothfuss, J., Fortuin, V., Josifoski, M., and Krause, A. Pacoh: Bayes-optimal meta-learning with pac-guarantees. International Conference on Machine Learning, 2021.

[40] Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., and Toussaint, M. Safe exploration for active learning with gaussian processes. Machine Learning and Knowledge Discovery in Databases, 2015.

[41] Seeger, M. Low rank updates for the cholesky decomposition. 2004.

[42] Seo, S., Wallat, M., Graepel, T., and Obermayer, K. Gaussian process regression: active data selection and test point rejection. International Joint Conference on Neural Networks, 2000.

[43] Settles, B. Active learning literature survey. University of Wisconsin-Madison, 2010.

[44] Simionescu, P. Computer-aided graphing and simulation tools for autocad users. Computer-Aided Graphing and Simulation Tools for AutoCAD Users, 2014.

[45] Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 2012.

[46] Sui, Y., Gotovos, A., Burdick, J., and Krause, A. Safe exploration for optimization with gaussian processes. International Conference on Machine Learning, 2015.

[47] Sui, Y., Zhuang, V., Burdick, J. W., and Yue, Y. Stagewise Safe Bayesian Optimization with Gaussian Processes. International Conference on Machine Learning, 80, 2018.

[48] Swersky, K., Rubanova, Y., Dohan, D., and Murphy, K. Amortized bayesian optimization over discrete spaces. Conference on Uncertainty in Artificial Intelligence, 2020.

[49] Tharwat, A. and Schenck, W. A survey on active learning: State-of-the-art, practical challenges and research directions. Mathematics, 2023.

[50] Titsias, M. Variational learning of inducing variables in sparse gaussian processes. International Conference on Artificial Intelligence and Statistics, 2009.

[51] Townsend, A. Constrained optimization in chebfun. chebfun.org, 2017.

[52] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems, 2017.

[53] Wilson, J. T., Borovitskiy, V., Terenin, A., Mostowsky, P., and Deisenroth, M. P. Efficiently sampling functions from gaussian process posteriors. International Conference on Machine Learning, 2020.

[54] Zhang, Y., Hoang, T. N., Low, K. H., and Kankanhalli, M. Near-optimal active learning of multi-output gaussian processes. AAAI Conference on Artificial Intelligence, 2016.

[55] Zimmer, C., Meister, M., and Nguyen-Tuong, D. Safe active learning for time-series modeling with gaussian processes. Advances in Neural Information Processing Systems, 2018.

Appendix Overview
A Gaussian process: distribution and entropy
B Policy NN structure
C Training objectives: details, illustrations, more objectives
C.1 Objectives: unconst...

[56] the history encoder ($\{(x_i, y_i)\}_{i=1}^{t} \to E_t$) and the decision feed-forward MLP (originally $E_t \to x_{t+1}$) are taken from Ivanova et al. [20]

[57] we add a hyperbolic tangent function as the last layer to ensure the policy output is in our bounded X, which was not needed in the original Bayesian experimental design problems

[58] we add another history encoder to handle the safety data

[59] we add a budget encoder to handle the budget variable. Note that the history encoder incorporates the inductive bias that observed data are order-invariant (see [11, 20] for details). This can be seen by noticing that a conventional AL computes the acquisition score conditioned on the past observations and the order of the past data does not matter. We ...
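Two of the design points listed above, the order-invariant history encoder and the tanh output layer, can be illustrated with a toy sketch. This is not the architecture of the paper or of Ivanova et al. [20]: the per-point embedding, the sum pooling, and the fixed weights are all hypothetical stand-ins for learned components, chosen only to make the two properties visible.

```python
import math


def embed(x, y):
    """Toy per-point embedding (hypothetical; stands in for a learned network)."""
    return [math.sin(x + y), math.cos(x - y), x * y]


def encode_history(points):
    """Sum-pool the per-point embeddings so the result ignores data order."""
    pooled = [0.0, 0.0, 0.0]
    for x, y in points:
        for i, e in enumerate(embed(x, y)):
            pooled[i] += e
    return pooled


def decide(encoding, weights=(0.7, -1.3, 2.1)):
    """Linear head followed by tanh, so the proposed query lies in (-1, 1)."""
    return math.tanh(sum(w * e for w, e in zip(weights, encoding)))


history = [(0.1, 0.5), (0.4, -0.2), (0.9, 1.3)]
query = decide(encode_history(history))
query_shuffled = decide(encode_history(list(reversed(history))))
```

Because pooling is a sum, reordering the history leaves the encoding unchanged up to floating-point rounding, and the tanh keeps the output bounded. Mapping the (-1, 1) output onto a specific bounded input domain would need an additional rescaling, which is left out here.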

[60] Consider $Q = I_D$; then $\frac{1}{D}\sum_{d=1}^{D} w_d [u]_d^2 = \frac{1}{D}\sum_{d=1}^{D} w_d ([x]_d - 0.5)^2$ is an ellipsoid centered at $(0.5, \dots, 0.5) \in \mathbb{R}^D$.

[61] We can see that $\mu_q$ has a safe ellipsoid as its center area as long as $c > 0$, with shape and size controlled by $w_d$, and the orthogonal matrix $Q$ allows us to rotate the ellipsoid around the center $(0.5, \dots, 0.5)$. The orthogonal matrix $Q$ is obtained by performing a QR decomposition of a sampled $A \in \mathbb{R}^{D \times D}$ (each entry drawn from Uniform$[-1, 1]$).
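The sampling of $Q$ and the quadratic form above can be sketched in a few lines. The sketch below is illustrative only: it computes the Q factor with modified Gram-Schmidt instead of a library QR routine, it assumes the convention $u = Q(x - 0.5)$ (the section only fixes $u$ for $Q = I_D$), and it does not reproduce the full $\mu_q$ with its constants. All function names are hypothetical.

```python
import random


def sample_orthogonal(D, rng):
    """Q factor of A with entries from Uniform[-1, 1], via modified Gram-Schmidt."""
    cols = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(D)]
    q_cols = []
    for v in cols:
        w = list(v)
        for q in q_cols:
            proj = sum(a * b for a, b in zip(q, w))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        q_cols.append([wi / norm for wi in w])
    # Return Q as a list of rows, with the orthonormal vectors as columns.
    return [[q_cols[j][i] for j in range(D)] for i in range(D)]


def quadratic_form(x, w, Q):
    """(1/D) * sum_d w_d [u]_d^2 with u = Q (x - 0.5); this convention is an assumption."""
    D = len(x)
    shifted = [xi - 0.5 for xi in x]
    u = [sum(Q[i][j] * shifted[j] for j in range(D)) for i in range(D)]
    return sum(wd * ud * ud for wd, ud in zip(w, u)) / D


rng = random.Random(0)
D = 2
Q = sample_orthogonal(D, rng)
w = [20.0, 20.0]  # the example values w_d = 20, D = 2 from the text
center_value = quadratic_form([0.5, 0.5], w, Q)
corner_value = quadratic_form([1.0, 1.0], w, Q)  # 5 for any rotation, since all w_d are equal
```

With equal weights the level sets are balls, so the rotation has no effect; choosing unequal $w_d$ makes the level sets ellipsoids whose axes are rotated by $Q$.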
