Generative Bayesian Optimization: Generative Models as Acquisition Functions

Daniel M. Steinberg; Edwin V. Bonilla; Rafael Oliveira

arxiv: 2510.25240 · v3 · pith:ZBLWUS4Tnew · submitted 2025-10-29 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-18 03:39 UTC · model grok-4.3

keywords: Bayesian optimization · generative models · acquisition functions · black-box optimization · batch optimization · proposal distributions · direct preference optimization

The pith

Generative models trained directly on observed utilities can sample points according to a Bayesian optimization acquisition function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a strategy to repurpose generative models as samplers for batch Bayesian optimization. Instead of fitting a surrogate regression model to predict the black-box objective, the method trains the generative model on simple, noisy utility values computed straight from the observed data points. Once trained, the model produces proposals whose probability density is proportional to the expected utility, which is exactly what an acquisition function encodes. This removes the usual surrogate step and supports large batches, high dimensions, and non-continuous design spaces. Theory shows that the sequence of generative models converges asymptotically to the optimal target distribution under stated conditions.

Core claim

A generative model can be trained with noisy utility values computed directly from observations so that its sampling distribution has density proportional to the expected utility; the resulting proposals therefore serve as the acquisition function for Bayesian optimization without constructing an explicit surrogate model.
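As a minimal numerical sketch of the proportionality claim, assume a small discrete design space, designs proposed uniformly at random, a probability-of-improvement style utility u = 1[y > threshold], and a categorical generative model fitted by utility-weighted maximum likelihood. None of these choices come from the paper; the function names, the toy objective `f`, the noise level, and the threshold are all hypothetical. Under those assumptions the fitted sampling distribution approaches the normalised expected utility:

```python
import math
import numpy as np

def fit_utility_weighted_categorical(xs, utilities, n_designs):
    """Closed-form maximiser of sum_i u_i * log q(x_i) over categorical q."""
    weights = np.bincount(xs, weights=utilities, minlength=n_designs)
    total = weights.sum()
    if total <= 0:  # no positive utility observed: fall back to uniform
        return np.full(n_designs, 1.0 / n_designs)
    return weights / total

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def run_demo(seed=0, n_designs=5, n_obs=200_000, noise=0.3, threshold=0.5):
    rng = np.random.default_rng(seed)
    f = np.linspace(0.0, 1.0, n_designs)             # hypothetical objective
    xs = rng.integers(0, n_designs, size=n_obs)      # uniform proposals (assumption)
    ys = f[xs] + rng.normal(0.0, noise, size=n_obs)  # noisy observations
    utilities = (ys > threshold).astype(float)       # noisy 0/1 utility per point
    q = fit_utility_weighted_categorical(xs, utilities, n_designs)
    # Exact expected utility P(y > threshold | x), normalised over designs.
    expected_u = np.array([1.0 - normal_cdf((threshold - m) / noise) for m in f])
    target = expected_u / expected_u.sum()
    return q, target

if __name__ == "__main__":
    q, target = run_demo()
    print(np.round(q, 3))       # fitted sampling distribution
    print(np.round(target, 3))  # normalised expected utility
```

The agreement relies on the uniform proposals: with non-uniform training data the same fit returns a distribution proportional to the data density times the expected utility.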

What carries the argument

Generative model trained on direct utility feedback from observations to produce proposal distributions whose densities match the acquisition-function target.

If this is right

Large-batch Bayesian optimization becomes feasible because sampling replaces per-point acquisition optimization.
Non-continuous, combinatorial, and high-dimensional design spaces can be handled directly by the generative sampler.
The framework extends beyond preference data to arbitrary reward signals and loss functions.
The generative models trace a sequence of distributions that approach the optimal acquisition target under the paper's conditions.
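To make the first two points concrete, here is a hedged sketch of drawing a large batch of discrete sequences in one vectorised call, with no inner acquisition optimisation. The factorised categorical model and the name `sample_batch` are illustrative assumptions, not the architecture used in the paper:

```python
import numpy as np

def sample_batch(position_probs, batch_size, rng):
    """Draw batch_size sequences from a factorised categorical model.

    position_probs has shape (length, alphabet); each row sums to 1.
    Returns an integer array of shape (batch_size, length).
    """
    length, alphabet = position_probs.shape
    cdf = np.cumsum(position_probs, axis=1)
    u = rng.random((batch_size, length, 1))
    # Inverse-CDF sampling: count how many CDF entries lie below each draw.
    idx = (u > cdf[None, :, :]).sum(axis=2)
    # Guard against round-off when the last CDF entry is slightly below 1.
    return np.minimum(idx, alphabet - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = np.full((8, 4), 0.25)  # 8 positions, 4 symbols, uniform
    batch = sample_batch(probs, batch_size=1024, rng=rng)
    print(batch.shape)
```

The cost of a batch grows linearly with its size, whereas a per-point acquisition maximiser would need a separate search for each candidate.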

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The direct-training view may combine with existing large-scale generative architectures such as diffusion or autoregressive models for structured design problems.
Avoiding surrogate fitting could reduce per-iteration compute in settings where model training dominates the budget.
Similar direct-utility training loops could be tested in other sequential black-box decision tasks that currently rely on fitted surrogates.

Load-bearing premise

The generative model can be trained so that its sampling distribution asymptotically approximates the optimal acquisition-function target from noisy utilities alone, without an explicit surrogate.

What would settle it

Run the method on a high-dimensional continuous benchmark and compare final best value and regret against standard surrogate-based BO; if the generative approach consistently underperforms or fails to scale with batch size, the central claim is falsified.
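The comparison can be scored with a simple-regret curve. The sketch below assumes a maximisation problem whose optimum is known, as is the case for synthetic benchmarks; the function names are hypothetical:

```python
import numpy as np

def simple_regret_curve(observed_values, optimum):
    """Simple regret after each evaluation (maximisation)."""
    values = np.asarray(observed_values, dtype=float)
    best_so_far = np.maximum.accumulate(values)
    return optimum - best_so_far

def regret_per_round(observed_values, optimum, batch_size):
    """Simple regret at the end of each batch of batch_size evaluations."""
    curve = simple_regret_curve(observed_values, optimum)
    return curve[batch_size - 1::batch_size]

if __name__ == "__main__":
    ys = [0.2, 0.5, 0.4, 0.9]
    print(simple_regret_curve(ys, optimum=1.0))
    print(regret_per_round(ys, optimum=1.0, batch_size=2))
```

Reporting regret per round as well as per evaluation separates the effect of batch size from the effect of the total evaluation budget.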

Original abstract

We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-continuous design spaces, and high-dimensional and combinatorial design. Inspired by the success of direct preference optimization (DPO), we show that one can train a generative model with noisy, simple utility values directly computed from observations to then form proposal distributions whose densities are proportional to the expected utility, i.e., BO's acquisition function values. Furthermore, this approach is generalizable beyond preference-based feedback to general types of reward signals and loss functions. This perspective avoids the construction of surrogate (regression or classification) models, common in previous methods that have used generative models for black-box optimization. Theoretically, we show that the generative models within the BO process follow a sequence of distributions which asymptotically approximate an optimal target under certain conditions. We also evaluate the performance through experiments on challenging optimization problems involving large batches in high dimensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames generative models trained directly on observed utilities as acquisition function samplers for BO, skipping surrogates, but the step from raw points to posterior expectations is underspecified.

The new angle here is training a generative model on noisy utility values computed straight from observations, then using its sampling distribution as a stand-in for the acquisition function. This draws from DPO-style losses but extends to general rewards, and it aims to handle large batches plus non-continuous or high-dimensional spaces without fitting a regression surrogate like a GP each iteration. They also claim a sequence of these models asymptotically approaches an optimal target under certain conditions, which is the theoretical hook. The experiments on high-dimensional problems with large batches are at least a concrete starting point for testing the scaling claims.

That said, the central mechanism is thin. Standard acquisitions like expected improvement are posterior expectations that rely on a surrogate to quantify uncertainty and extrapolate. Training only on direct point utilities risks reducing to reweighted sampling of seen data rather than true Bayesian acquisition. The abstract gives no derivation, error bounds, or loss details showing how the proportionality to expected utility emerges without that surrogate step, so the asymptotic result rests on an unverified assumption. Baseline comparisons and ablation on the training loss would help clarify whether it actually improves over existing generative BO methods.

This is worth attention from people working on batch or combinatorial black-box optimization who want to explore generative alternatives to GPs. It has enough of a distinct idea and some empirical grounding to go to referees for a proper check on the math and results.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Generative Bayesian Optimization, a method that trains generative models directly on noisy utility values computed from observations to produce proposal distributions whose densities are proportional to expected utility (i.e., standard BO acquisition functions such as EI). This approach, inspired by direct preference optimization, avoids explicit surrogate models. It claims a theoretical result that the sequence of generative distributions asymptotically approximates an optimal target under stated conditions and reports experiments on high-dimensional, large-batch optimization problems.

Significance. If the central claims are substantiated, the work offers a scalable alternative for batch BO in high-dimensional, combinatorial, and non-continuous spaces by leveraging generative sampling directly. The generalization to arbitrary reward signals and loss functions, plus the avoidance of surrogate fitting, would be notable strengths. The manuscript does not yet provide machine-checked proofs, reproducible code artifacts, or parameter-free derivations that would strengthen the assessment.

major comments (3)

[Abstract] Abstract and theoretical section: the claim that training on 'noisy, simple utility values directly computed from observations' yields proposal densities proportional to the posterior-expected acquisition function E[u(f(x))|data] is not supported by any derivation or mechanism; standard acquisitions require a surrogate to quantify uncertainty and extrapolate, and the manuscript provides no explicit link showing how the generalized DPO-style loss converts raw point observations into implicit posterior expectations.
[Theoretical Analysis] Theoretical result on asymptotic approximation: the statement that 'the generative models within the BO process follow a sequence of distributions which asymptotically approximate an optimal target under certain conditions' lacks derivation details, error analysis, convergence rates, or the precise conditions under which the approximation holds, leaving the central theoretical contribution unverified.
[Experiments] Experiments section: no baseline comparisons with standard surrogate-based BO (e.g., GP-EI or UCB) or ablation studies on the effect of direct utility training versus surrogate-based acquisition are reported, undermining the claim of practical advantage for large-batch high-dimensional problems.

minor comments (2)

[Introduction] Notation for the utility function and the proportionality claim should be defined more precisely with respect to the data-generating process.
[Experiments] Figure captions and table headers lack sufficient detail on the exact optimization benchmarks and batch sizes used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments and the recommendation for major revision. We address each of the major comments point by point below. We believe the clarifications and proposed revisions will strengthen the manuscript.

Referee: [Abstract] Abstract and theoretical section: the claim that training on 'noisy, simple utility values directly computed from observations' yields proposal densities proportional to the posterior-expected acquisition function E[u(f(x))|data] is not supported by any derivation or mechanism; standard acquisitions require a surrogate to quantify uncertainty and extrapolate, and the manuscript provides no explicit link showing how the generalized DPO-style loss converts raw point observations into implicit posterior expectations.

Authors: We appreciate this observation. The generalized DPO-style loss is designed such that its optimum corresponds to a distribution with density proportional to the utility, and when utilities are noisy observations of the black-box function, this implicitly targets the expected utility under the posterior induced by the data. However, we acknowledge that the link to the full posterior expectation E[u(f(x))|data] could be made more explicit. In the revised manuscript, we will include a detailed derivation showing how the training objective leads to proposals approximating the acquisition function without an intermediate surrogate model. revision: yes
Referee: [Theoretical Analysis] Theoretical result on asymptotic approximation: the statement that 'the generative models within the BO process follow a sequence of distributions which asymptotically approximate an optimal target under certain conditions' lacks derivation details, error analysis, convergence rates, or the precise conditions under which the approximation holds, leaving the central theoretical contribution unverified.

Authors: The theoretical analysis builds on results from direct preference optimization and generative model convergence. The sequence of distributions approximates the optimal target (the distribution proportional to the acquisition function) as the number of observations increases and under assumptions of model expressivity and optimization convergence. We agree that more details are needed. The revised version will expand this section with the full proof sketch, error analysis, convergence rates where possible, and the precise conditions (e.g., infinite data limit, perfect optimization of the loss). revision: yes
Referee: [Experiments] Experiments section: no baseline comparisons with standard surrogate-based BO (e.g., GP-EI or UCB) or ablation studies on the effect of direct utility training versus surrogate-based acquisition are reported, undermining the claim of practical advantage for large-batch high-dimensional problems.

Authors: Our experiments focus on regimes where standard surrogate-based methods like GPs struggle due to high dimensionality and large batch sizes. We compared against other generative and evolutionary baselines suitable for those settings. That said, we recognize the value of including comparisons to GP-EI or UCB on lower-dimensional problems for context, as well as ablations isolating the direct training approach. We will add these in the revised manuscript to better substantiate the practical advantages. revision: yes

Circularity Check

1 step flagged

Training generative model on direct observation utilities and claiming proportionality to expected acquisition function reduces to fitted sampling by construction

specific steps

fitted input called prediction [Abstract]
"we show that one can train a generative model with noisy, simple utility values directly computed from observations to then form proposal distributions whose densities are proportional to the expected utility, i.e., BO's acquisition function values. ... This perspective avoids the construction of surrogate (regression or classification) models"

The generative model is explicitly trained on the simple utility values from observations; the claim that its sampling density becomes proportional to the 'expected utility' (standard BO acquisition) is therefore the direct output of that training fit. No separate mechanism computes the posterior expectation E[u(f(x))|data] that would normally require a surrogate; the proportionality is enforced by the loss (generalized from DPO) on the same observed utilities, rendering the 'prediction' of acquisition-function densities equivalent to the fitted distribution by construction.

full rationale

The central derivation claims that training a generative model directly on noisy utilities computed from raw observations produces proposal densities proportional to the expected utility (i.e., BO acquisition values) without an explicit surrogate. This reduces to a fitted-input-called-prediction pattern because the model is optimized to match the observed utilities, making the resulting density proportional to those utilities by the training objective itself; relabeling the fitted distribution as approximating the posterior-expected acquisition function adds no independent Bayesian content. The theoretical claim of asymptotic approximation to an optimal target under stated conditions inherits the same reduction, as the target is defined via the same utility signals. No external surrogate or posterior is constructed, so the proportionality is enforced by construction rather than derived from first principles.
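The "by construction" argument can be made explicit for one simple training objective. The derivation below uses a utility-weighted log-likelihood, which is an assumed stand-in and not necessarily the loss in the paper; the symbols $p_0$ (density of the designs in the training data), $\bar{u}$, and $J$ are likewise assumed notation, and the utility is taken to be non-negative.

```latex
% Assumed notation, not the paper's: p_0 is the density of the designs in the
% training data, u >= 0 is the noisy utility, \bar{u}(x) = E[u | x].
\begin{align}
J(q) &= \mathbb{E}_{x \sim p_0}\,\mathbb{E}\!\left[u \log q(x) \mid x\right]
      = \int p_0(x)\,\bar{u}(x)\,\log q(x)\,\mathrm{d}x, \\
\mathcal{L}(q, \lambda) &= J(q) - \lambda\left(\int q(x)\,\mathrm{d}x - 1\right), \\
\frac{\delta \mathcal{L}}{\delta q(x)} &= \frac{p_0(x)\,\bar{u}(x)}{q(x)} - \lambda = 0
\;\Longrightarrow\;
q^\star(x) = \frac{p_0(x)\,\bar{u}(x)}{\int p_0(x')\,\bar{u}(x')\,\mathrm{d}x'}.
\end{align}
```

In this simplified setting the expectation in $\bar{u}$ is over observation noise at a fixed $x$, not over a posterior on the objective, and the optimum carries the factor $p_0$. Whether the paper's loss removes either restriction is the point the rationale above asks the authors to establish.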

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that a generative model can be trained to produce densities proportional to expected utility from noisy observations alone, plus standard assumptions on the existence of an optimal target distribution in BO.

axioms (1)

Domain assumption: A generative model can be trained such that its output distribution approximates the acquisition function density from utility values computed directly from observations.
Invoked in: the description of the training strategy and the theoretical sequence of distributions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each tag gives the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear
Paper passage: "train a generative model with noisy, simple utility values directly computed from observations to then form proposal distributions whose densities are proportional to the expected utility, i.e., BO's acquisition function values"

IndisputableMonolith/Foundation/AbsoluteFloorClosure · reality_from_one_distinction · unclear
Paper passage: "avoids the construction of surrogate (regression or classification) models"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.