Amortized Safe Active Learning for Real-Time Data Acquisition: Pretrained Neural Policies From Simulated Nonparametric Functions
Pith reviewed 2026-05-23 04:42 UTC · model grok-4.3
The pith
A neural policy pretrained on simulated Gaussian process samples selects safe queries for active learning via a single forward pass.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training a neural policy on nonparametric functions generated via Fourier feature approximations of Gaussian processes, the method allows the selection of informative and safe data points for regression tasks through a single forward pass of the network, bypassing repeated Gaussian process inference and constrained optimization at each step of the active learning loop.
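The Fourier feature construction named in the core claim can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an RBF kernel and the standard random-feature approximation of Rahimi and Recht [35], and the function name `sample_rff_function` and its default values (`num_features`, `lengthscale`, `variance`) are hypothetical.

```python
import numpy as np

def sample_rff_function(dim, num_features=256, lengthscale=0.3, variance=1.0, rng=None):
    """Draw one approximate sample from a zero-mean GP prior (RBF kernel assumed)."""
    rng = np.random.default_rng(rng)
    # The RBF kernel's spectral density is Gaussian with std 1 / lengthscale.
    omega = rng.normal(0.0, 1.0 / lengthscale, size=(num_features, dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    # Standard-normal weights turn the random features into a GP prior sample.
    weights = rng.normal(0.0, 1.0, size=num_features)
    scale = np.sqrt(2.0 * variance / num_features)

    def f(x):
        x = np.atleast_2d(x)                            # expects shape (n, dim)
        features = scale * np.cos(x @ omega.T + phase)  # shape (n, num_features)
        return features @ weights                       # shape (n,)

    return f
```

Each call returns a cheap, closed-form function that can be evaluated at any input, which is what makes this kind of sampler usable as a pretraining simulator.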
What carries the argument
The pretrained neural policy that takes current observations and outputs the next query point, optimized using a differentiable safety-aware acquisition function during pretraining on simulated data.
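A rough sketch of such a policy's forward pass is given below. It assumes only what the excerpts further down state: an order-invariant history encoder and a tanh output layer. The layer sizes, the sum pooling, the random untrained weights, and the names `init_policy` and `policy_forward` are illustrative assumptions; the safety and budget encoders are omitted.

```python
import numpy as np

def init_policy(dim, hidden=32, embed=16, seed=0):
    """Random (untrained) weights for a toy policy; sizes are assumptions."""
    rng = np.random.default_rng(seed)

    def layer(n_in, n_out):
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

    return {
        "enc1": layer(dim + 1, hidden),
        "enc2": layer(hidden, embed),
        "head1": layer(embed, hidden),
        "head2": layer(hidden, dim),
    }

def policy_forward(params, xs, ys):
    """Map a history {(x_i, y_i)} to the next query in [-1, 1]^dim."""
    pairs = np.concatenate([xs, ys[:, None]], axis=1)   # shape (t, dim + 1)
    w, b = params["enc1"]
    h = np.maximum(pairs @ w + b, 0.0)                  # per-observation ReLU layer
    w, b = params["enc2"]
    h = h @ w + b
    summary = h.sum(axis=0)                             # sum pooling ignores data order
    w, b = params["head1"]
    h = np.maximum(summary @ w + b, 0.0)
    w, b = params["head2"]
    return np.tanh(h @ w + b)                           # tanh bounds the query
```

If the input domain is a different box, for example the unit cube, the output can be rescaled affinely (such as `0.5 * (q + 1.0)`); that mapping is also an assumption here.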
If this is right
- Real-time active learning becomes feasible for applications requiring immediate decisions.
- Learning quality remains comparable to traditional Gaussian process based methods.
- The framework supports both safe and unconstrained active learning through modularity.
- Safety constraints can be incorporated without additional online computation.
Where Pith is reading between the lines
- The same pretraining strategy might accelerate other sequential decision processes that rely on Gaussian process models.
- Performance on real data will depend on how representative the simulated training functions are of the target domain.
- Extensions could include adapting the policy online if generalization gaps appear.
Load-bearing premise
A policy trained only on functions drawn from Gaussian processes will produce high-quality and safe queries when applied to arbitrary unknown real-world functions.
What would settle it
Running the pretrained policy on a real regression task and observing that the selected queries either violate the safety constraints or yield slower learning progress than a standard safe active learning method using Gaussian processes.
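One way to set up the safety half of that check is sketched below, assuming a policy callable `policy(xs, ys)` that returns the next query, a target `f`, and a safety function `z` where a query counts as unsafe when `z(x) < 0` (the convention used in the MinUnsafe excerpt further down). The helper name `rollout` is hypothetical.

```python
import numpy as np

def rollout(policy, f, z, xs_init, num_queries):
    """Run a fixed policy and report the fraction of unsafe queries."""
    xs = np.array(xs_init, dtype=float)               # shape (n0, dim)
    ys = np.array([f(x) for x in xs])
    unsafe = 0
    for _ in range(num_queries):
        x_next = np.asarray(policy(xs, ys), dtype=float)
        unsafe += int(z(x_next) < 0.0)                # count safety violations
        xs = np.vstack([xs, x_next])
        ys = np.append(ys, f(x_next))
    return xs, ys, unsafe / num_queries
```

Running the same harness with a GP-based safe AL routine as `policy`, then comparing violation rates and the error of a model fitted to the collected data, would give the comparison described above.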
Original abstract
Safe active learning (AL) is a sequential scheme for learning unknown systems while respecting safety constraints during data acquisition. Existing methods often rely on Gaussian processes (GPs) to model the task and safety constraints, requiring repeated GP updates and constrained acquisition optimization--incurring significant computations which are challenging for real-time decision-making. We propose amortized AL for regression and amortized safe AL, replacing expensive online computations with a pretrained neural policy. Inspired by recent advances in amortized Bayesian experimental design, we leverage GPs as pretraining simulators. We train our policy prior to the AL deployment on simulated nonparametric functions, using Fourier feature-based GP sampling and a differentiable acquisition objective that is safety-aware in the safe AL setting. At deployment, our policy selects informative and (if desired) safe queries via a single forward pass, eliminating GP inference and acquisition optimization. This leads to magnitudes of speed improvements while preserving learning quality. Our framework is modular and, without the safety component, yields fast unconstrained AL for time-sensitive tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes amortized safe active learning (AL) for regression, replacing repeated Gaussian process (GP) inference and constrained acquisition optimization with a pretrained neural policy. The policy is trained offline on nonparametric functions sampled from GPs via Fourier feature approximations, using a differentiable safety-aware acquisition objective. At deployment, the policy performs query selection (informative and optionally safe) in a single forward pass, claiming orders-of-magnitude speedups while preserving learning quality. The framework is modular and also supports fast unconstrained AL without the safety component.
Significance. If the sim-to-real generalization holds, the approach could enable real-time safe AL in latency-critical settings by amortizing expensive online computations. The use of simulation-based pretraining with differentiable objectives follows recent amortized Bayesian experimental design work and offers a modular design. However, the central claim of preserved quality rests on unproven transfer from GP-simulated training distributions to arbitrary real target functions.
major comments (2)
- [Abstract] The load-bearing claim that the pretrained policy 'preserves learning quality' on real unknown targets (Abstract) is not supported by any reported empirical results, ablation studies, or comparisons to online GP-based safe AL on functions outside the Fourier-feature GP prior. Without such validation, the speed-quality tradeoff cannot be assessed.
- The generalization assumption (that a policy trained solely on functions sampled from a stationary GP prior via Fourier features will produce high-quality and safe queries on real targets exhibiting non-stationarity, discontinuities, or mismatched length scales) is not guaranteed and directly undermines both the quality-preservation and safety claims if the sim-to-real gap is large.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. We address the two major comments point-by-point below, providing the strongest honest defense of the manuscript while acknowledging where clarifications or additions are warranted.
- Referee: [Abstract] The load-bearing claim that the pretrained policy 'preserves learning quality' on real unknown targets (Abstract) is not supported by any reported empirical results, ablation studies, or comparisons to online GP-based safe AL on functions outside the Fourier-feature GP prior. Without such validation, the speed-quality tradeoff cannot be assessed.
  Authors: Our experiments evaluate the amortized policy against online GP baselines on held-out functions drawn from the identical Fourier-feature GP prior used for pretraining. In these settings the policy achieves statistically comparable regression performance (learning curves and final MSE) while delivering the reported speedups. The abstract claim of 'preserving learning quality' is therefore grounded in the reported results, which match the training distribution. We do not present results on real-world data or functions drawn from qualitatively different distributions. We will revise the abstract and introduction to state the evaluation scope explicitly and to avoid any implication of validation outside the GP-simulated regime.
  Revision: partial
- Referee: The generalization assumption (that a policy trained solely on functions sampled from a stationary GP prior via Fourier features will produce high-quality and safe queries on real targets exhibiting non-stationarity, discontinuities, or mismatched length scales) is not guaranteed and directly undermines both the quality-preservation and safety claims if the sim-to-real gap is large.
  Authors: We agree that transfer performance is not guaranteed when the target function deviates substantially from the stationary GP prior (e.g., strong non-stationarity or discontinuities). The manuscript positions GPs as flexible simulators rather than claiming universal generalization; the Fourier-feature construction already permits a range of length scales, but cannot cover all possible real-world behaviors. Safety guarantees at deployment inherit the same distributional assumption. We will add an explicit limitations paragraph discussing the sim-to-real gap, the role of prior choice, and possible mitigations such as domain randomization during pretraining.
  Revision: yes
Circularity Check
No circularity; pretraining on GP simulations is independent of deployment targets
The paper's chain is: (1) sample nonparametric functions from GP prior via Fourier features, (2) optimize neural policy to mimic a differentiable acquisition objective on those simulations, (3) deploy the fixed policy via single forward pass on real data. This is standard amortized simulation-based training; the policy outputs are not defined in terms of, or fitted to, the eventual target function's observations. No equations reduce a claimed prediction back to its own inputs by construction. No self-citations appear as load-bearing premises in the abstract or described method. Generalization from GP-simulated training distribution to real functions is an empirical assumption, not a definitional loop. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Gaussian processes with Fourier feature sampling can generate representative nonparametric functions for pretraining active learning policies
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We leverage GPs as pretraining simulators... Fourier feature-based GP sampling and a differentiable acquisition objective"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "train our policy prior to the AL deployment on simulated nonparametric functions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Altman, E. Constrained markov decision processes. Routledge, 1999.
- [2] Andersson, O., Wzorek, M., and Doherty, P. Deep learning quadcopter control via risk-aware active learning. AAAI Conference on Artificial Intelligence, 2017.
- [3] Berkenkamp, F., Schoellig, A. P., and Krause, A. Safe controller optimization for quadrotors with gaussian processes. International Conference on Robotics and Automation, 2016.
- [4] Berkenkamp, F., Krause, A., and Schoellig, A. P. Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics. Machine Learning, 2020.
- [5] Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. Pyro: Deep Universal Probabilistic Programming. Journal of Machine Learning Research, 2018.
- [6] Bitzer, M., Meister, M., and Zimmer, C. Amortized inference for gaussian process hyperparameters of structured kernels. Conference on Uncertainty in Artificial Intelligence, 2023.
- [7] Bottero, A., Luis, C., Vinogradska, J., Berkenkamp, F., and Peters, J. R. Information-theoretic safe exploration with gaussian processes. Advances in Neural Information Processing Systems, 2022.
- [8] Brochu, E., Cora, V. M., and de Freitas, N. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv, 2010.
- [9] Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., and Schoellig, A. P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 2022.
- [10] Chen, Y., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., and de Freitas, N. Learning to learn without gradient descent by gradient descent. International Conference on Machine Learning, 2017.
- [11] Foster, A., Ivanova, D. R., Malik, I., and Rainforth, T. Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design. International Conference on Machine Learning, 2021.
- [12] García, J. and Fernández, F. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 2015.
- [13] Gelbart, M. A., Snoek, J., and Adams, R. P. Bayesian optimization with unknown constraints. Conference on Uncertainty in Artificial Intelligence, 2014.
- [14] Gramacy, R. B., Gray, G. A., Digabel, S. L., Lee, H. K. H., Ranjan, P., Wells, G., and Wild, S. M. Modeling an augmented lagrangian for blackbox constrained optimization. arXiv, 2015.
- [15] Griffiths, R.-R. and Hernández-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 2020.
- [16] Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., and Knoll, A. A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
- [17] Guestrin, C., Krause, A., and Singh, A. P. Near-optimal sensor placements in gaussian processes. International Conference on Machine Learning, 2005.
- [18] Hensman, J., Matthews, A., and Ghahramani, Z. Scalable Variational Gaussian Process Classification. International Conference on Artificial Intelligence and Statistics, 2015.
- [19] Huang, D., Guo, Y., Acerbi, L., and Kaski, S. Amortized bayesian experimental design for decision-making. Advances in Neural Information Processing Systems, 2024.
- [20] Ivanova, D. R., Foster, A., Kleinegesse, S., Gutmann, M. U., and Rainforth, T. Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods. Advances in Neural Information Processing Systems, 2021.
- [21] Kirschner, J., Mutny, M., Hiller, N., Ischebeck, R., and Krause, A. Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces. International Conference on Machine Learning, 2019.
- [22] Krause, A. and Guestrin, C. Nonmyopic active learning of gaussian processes: An exploration-exploitation approach. International Conference on Machine Learning, 2007.
- [23] Krause, A., Singh, A., and Guestrin, C. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 2008.
- [24] Kumar, P. and Gupta, A. Active learning query strategies for classification, regression, and clustering: A survey. Journal of Computer Science and Technology, 2020.
- [25] Lederer, A., Conejo, A. J. O., Maier, K. A., Xiao, W., Umlauft, J., and Hirche, S. Gaussian process-based real-time learning for safety critical applications. International Conference on Machine Learning, 2021.
- [26] Li, C.-Y., Rakitsch, B., and Zimmer, C. Safe active learning for multi-output gaussian processes. International Conference on Artificial Intelligence and Statistics, 2022.
- [27] Li, C.-Y., Dünnbier, O., Toussaint, M., Rakitsch, B., and Zimmer, C. Global safe sequential learning via efficient knowledge transfer. Transactions on Machine Learning Research, 2025.
- [28] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J. On the variance of the adaptive learning rate and beyond. International Conference on Learning Representations, 2020.
- [29] Liu, S., Sun, X., Ramadge, P. J., and Adams, R. P. Task-agnostic amortized inference of gaussian process hyperparameters. Advances in Neural Information Processing Systems, 2020.
- [30] Moss, H. B., Ober, S. W., and Picheny, V. Inducing point allocation for sparse gaussian processes in high-throughput bayesian optimisation. International Conference on Artificial Intelligence and Statistics, 2023.
- [31] Nguyen-Tuong, D. and Peters, J. Incremental sparsification for real-time online model learning. International Conference on Artificial Intelligence and Statistics, 2010.
- [32]
- [33] Pamadi, B. N., Covell, P. F., Tartabini, P. V., and Murphy, K. J. Aerodynamic characteristics and glide-back performance of langley glide-back booster. Applied Aerodynamics Conference and Exhibit, 2004.
- [34] Pescador-Barrios, G., Filippi, S., and van der Wilk, M. "how big is big enough?" adjusting model size in continual gaussian processes. arXiv, 2024.
- [35] Rahimi, A. and Recht, B. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 2007.
- [36] Rainforth, T., Foster, A., Ivanova, D. R., and Smith, F. B. Modern Bayesian Experimental Design. Statistical Science, 2024.
- [37] Rasmussen, C. and Williams, C. Gaussian processes for machine learning. MIT Press, 2006.
- [38] Rogers, S. E., Aftosmis, M. J., Pandya, S. A., Chaderjian, N. M., T., E. T., and Ahmad, J. U. Automated cfd parameter studies on distributed parallel computers. AIAA Computational Fluid Dynamics Conference, 2003.
- [39] Rothfuss, J., Fortuin, V., Josifoski, M., and Krause, A. Pacoh: Bayes-optimal meta-learning with pac-guarantees. International Conference on Machine Learning, 2021.
- [40] Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., and Toussaint, M. Safe exploration for active learning with gaussian processes. Machine Learning and Knowledge Discovery in Databases, 2015.
- [41] Seeger, M. Low rank updates for the cholesky decomposition. 2004.
- [42] Seo, S., Wallat, M., Graepel, T., and Obermayer, K. Gaussian process regression: active data selection and test point rejection. International Joint Conference on Neural Networks, 2000.
- [43] Settles, B. Active learning literature survey. University of Wisconsin-Madison, 2010.
- [44] Simionescu, P. Computer-aided graphing and simulation tools for autocad users. Computer-Aided Graphing and Simulation Tools for AutoCAD Users, 2014.
- [45] Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 2012.
- [46] Sui, Y., Gotovos, A., Burdick, J., and Krause, A. Safe exploration for optimization with gaussian processes. International Conference on Machine Learning, 2015.
- [47] Sui, Y., Zhuang, V., Burdick, J. W., and Yue, Y. Stagewise Safe Bayesian Optimization with Gaussian Processes. International Conference on Machine Learning, 80, 2018.
- [48] Swersky, K., Rubanova, Y., Dohan, D., and Murphy, K. Amortized bayesian optimization over discrete spaces. Conference on Uncertainty in Artificial Intelligence, 2020.
- [49] Tharwat, A. and Schenck, W. A survey on active learning: State-of-the-art, practical challenges and research directions. Mathematics, 2023.
- [50] Titsias, M. Variational learning of inducing variables in sparse gaussian processes. International Conference on Artificial Intelligence and Statistics, 2009.
- [51] Townsend, A. Constrained optimization in chebfun. chebfun.org, 2017.
- [52] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
- [53] Wilson, J. T., Borovitskiy, V., Terenin, A., Mostowsky, P., and Deisenroth, M. P. Efficiently sampling functions from gaussian process posteriors. International Conference on Machine Learning, 2020.
- [54] Zhang, Y., Hoang, T. N., Low, K. H., and Kankanhalli, M. Near-optimal active learning of multi-output gaussian processes. AAAI Conference on Artificial Intelligence, 2016.
- [55] Zimmer, C., Meister, M., and Nguyen-Tuong, D. Safe active learning for time-series modeling with gaussian processes. Advances in Neural Information Processing Systems, 2018.
  Appendix Overview: A Gaussian process: distribution and entropy; B Policy NN structure; C Training objectives: details, illustrations, more objectives; C.1 Objectives: unconst...
[56]
the history encoder ( {(xi, yi)}t i=1 → Et) and the decision feed forward MLP (originally Et → xt+1) are taken from Ivanova et al. [20]
-
[57]
we add a hyperbolic tangent function as the last layer to ensure the policy output is in our bounded X , which was not needed in the original Bayesian experimental design problems
-
[58]
we add another history encoder to handle the safety data
-
[59]
we add a budget encoder to handle the budget variable. Note that the history encoder incorporates the inductive bias that observed data are order-invariant (see [11, 20] for details). This can be seen by noticing that a conventional AL computes the ac- quisition score conditioned on the past observations and the order of the past data does not matter. We ...
- [60] Consider $Q = I_D$; then $\frac{1}{D}\sum_{d=1}^{D} w_d [u]_d^2 = \frac{1}{D}\sum_{d=1}^{D} w_d ([x]_d - 0.5)^2$ is an ellipsoid centered at $(0.5, \dots, 0.5) \in \mathbb{R}^D$
- [61] We can see that the center area of $\mu_q$ is a safe ellipsoid as long as $c > 0$, with shape and size controlled by $w_d$, and the orthogonal matrix $Q$ allows us to rotate the ellipsoid around the center $(0.5, \dots, 0.5)$. The orthogonal matrix $Q$ is obtained by performing a QR decomposition of a sampled $A \in \mathbb{R}^{D \times D}$ (each entry from Uniform[−1, 1])
- [62] The above steps describe the variables $w_d$, $c$, $Q$, and we then describe the constants
- [63] If we consider $c = 1$, $w_d / D = 10$, $\forall d \le D$ (e.g. $w_d = 20$, $D = 2$) and $Q = I_D$, then the central safe area is a ball and it takes about half of the space, i.e. the mean function $\mu_q$ makes half of the space safe and half unsafe. We will later sample the shape, and the half-safe space is only for an initial design
- [64] With the same $c$, $w_d$, $Q$, the constants 3.2 and −0.47 ensure zero mean and unit variance of this $\mu_q$ function, which aligns with our setup that the deployment problems are normalized, and this provides us an estimated variance of $\mu_q$ of approximately $c^2$.
  Table A.E.2: Batch sizes in training.
  loss functions           I     DAD   SH
  N_k = |{(θ, θ_q)}|       10    10    10
  N_{f,q} = |{(f, q)}|     5     200   5
  B = ...
- [65] Our AAL, our ASAL, DAD, Random: we deploy our amortized (safe) AL or the DAD, Random baselines to collect data. After the specified $T$ data points are collected, we use the initial and queried data to fit a GP model with Type II maximum likelihood (optimization: L-BFGS-B algorithm)
- [66] GP AL, Safe GP AL, Safe Random: We deploy conventional AL and safe AL (Algorithms A.8 and A.9), which update GP iteratively with Type II maximum likelihood (optimization: L-BFGS-B algorithm)
- [67] AGP AL, Safe AGP AL, Safe ARandom: We deploy conventional AL and safe AL (Algorithms A.8 and A.9), which update GP iteratively with an amortized inference developed by Bitzer et al. [6] (AGP). Bitzer et al. [6] sampled GP data and trained a transformer model to approximate the Type II maximum likelihood. This AGP is a model with a transformer module. Wh...
- [68] safe AL of policy trained on our main SH, γ = 5% (Eq. (8)),
- [69] safe AL of policy trained on our appendix SHmean, γ = 5% (unconstrained Hmean decorated with our main min unsafe likelihood, see Figure A.C.6),
- [70] safe AL of policy trained on our appendix SH,division (unconstrained H decorated with our appendix max safe likelihood, see Eq. (A.18) and Figure A.C.6),
- [71] safe AL of policy trained on our appendix SHmean,division (unconstrained Hmean decorated with our appendix max safe likelihood, see Figure A.C.6), and
- [72] conventional GP based safe AL (Algorithm A.9) but we add the unconstrained safety-aware acquisition criterion (Eq. (7)), named MinUnsafe GP AL: $x_t = \operatorname{argmax} \{ H[y(x) \mid y_{1:t-1}, Y_{\text{init}}] - \log \max(\gamma, p(z(x) < 0 \mid z_{1:t-1}, Z_{\text{init}})) \}$ ($\gamma = 0.05$; this is the same as Eq. (7) if we take expectation over the forecasted $y(x)$, and this corresponds to objectives SH, SHmean, see...
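The MinUnsafe criterion quoted in the last excerpt can be evaluated in closed form when both predictive distributions are Gaussian. The sketch below assumes exactly that: the entropy term becomes 0.5 log(2πe σ_y²) and the unsafe probability becomes Φ(−μ_z/σ_z). The function name `min_unsafe_score` and its argument names are hypothetical, and the GP posterior moments are taken as given inputs rather than computed.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function without SciPy

def min_unsafe_score(var_y, mean_z, std_z, gamma=0.05):
    """Entropy of y(x) minus log of the clipped unsafe probability p(z(x) < 0)."""
    var_y = np.asarray(var_y, dtype=float)
    mean_z = np.asarray(mean_z, dtype=float)
    std_z = np.asarray(std_z, dtype=float)
    # Differential entropy of a Gaussian predictive distribution.
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * var_y)
    # Gaussian CDF evaluated at zero: p(z(x) < 0) = Phi(-mean_z / std_z).
    p_unsafe = 0.5 * (1.0 + _erf(-mean_z / (std_z * math.sqrt(2.0))))
    # Clipping at gamma caps the bonus for points that are already likely safe.
    return entropy - np.log(np.maximum(gamma, p_unsafe))
```

The next query under this sketch is the candidate with the largest score, for example `candidates[np.argmax(scores)]` over a finite candidate set; the excerpt does not say how the argmax is optimized in the paper.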