arxiv: 2606.08438 · v1 · pith:G4GJHROYnew · submitted 2026-06-07 · 📊 stat.ML · cs.LG

Improving Bayesian Optimization via Training-Aware Conditional Diffusion Models

Pith reviewed 2026-06-27 18:17 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords Bayesian optimization · conditional diffusion models · acquisition functions · global optimum · mode seeking · black-box optimization · sub-optimality guarantee

The pith

Conditional diffusion models trained on Bayesian optimization tasks can approximate the distribution of the global optimum x* and support a new acquisition strategy with sub-optimality guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that conditional diffusion models, when trained with strategies derived from Bayesian optimization itself, can efficiently approximate the distribution of the global optimum without the high cost of repeated Gaussian process sampling. This approximation then enables a mode-seeking acquisition rule that guides sequential evaluations toward the optimum. A reader would care because information-based acquisition functions have long been limited by the expense of sampling the optimum's distribution, making them impractical for many black-box problems. If the approach holds, it would allow these stronger acquisition functions to be used at larger scale while preserving theoretical control on sub-optimality.

Core claim

We leverage Conditional Diffusion Models (CDMs) to efficiently approximate the distribution of x* and develop BO-inherent training strategies for CDMs. Motivated by the structural properties of the CDM-learned distribution, we further develop an acquisition strategy termed Diffusion-based Mode Seeking (DMS) to guide the sequential evaluation. We establish a sub-optimality guarantee for the CDM-learned distribution and demonstrate through extensive experiments that DMS outperforms standard BO baselines.

What carries the argument

Conditional Diffusion Models trained with BO-inherent strategies that approximate the distribution of the global optimum x*, from which the Diffusion-based Mode Seeking (DMS) acquisition rule extracts sequential decisions.
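One way to picture the mode-seeking step is as clustering over candidate optima drawn from the CDM. The sketch below is illustrative only. It assumes Gaussian-kernel mean shift (the Figure 7 caption mentions mean-shift clustering, but the paper's exact DMS rule is not given on this page), it uses synthetic 2-D points as a stand-in for CDM samples of x*, and the names `mean_shift_modes`, `candidates`, and `bandwidth` are hypothetical.

```python
import math
import random


def mean_shift_modes(points, bandwidth, iters=50, tol=1e-6):
    """Gaussian-kernel mean shift: move each point uphill on a kernel
    density estimate, then merge points that end up close together."""

    def shift(p):
        # Kernel-weighted average of all points, centred on p.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * bandwidth ** 2))
            for q in points
        ]
        total = sum(weights)
        return tuple(
            sum(w * q[k] for w, q in zip(weights, points)) / total
            for k in range(len(p))
        )

    modes = []  # each entry is [center, number_of_points_attracted]
    for p in points:
        for _ in range(iters):
            new_p = shift(p)
            moved = math.dist(p, new_p)
            p = new_p
            if moved < tol:
                break
        for m in modes:
            if math.dist(m[0], p) < bandwidth:
                m[1] += 1
                break
        else:
            modes.append([p, 1])
    # Heaviest cluster first.
    return sorted(modes, key=lambda m: -m[1])


# Hypothetical stand-in for CDM samples of x*: two clusters in 2-D.
rng = random.Random(0)
candidates = (
    [(rng.gauss(0.2, 0.03), rng.gauss(0.7, 0.03)) for _ in range(60)]
    + [(rng.gauss(0.8, 0.03), rng.gauss(0.3, 0.03)) for _ in range(20)]
)
modes = mean_shift_modes(candidates, bandwidth=0.1)
next_x = modes[0][0]  # assumed rule: query the center of the heaviest cluster
print(len(modes), next_x)
```

Querying the heaviest cluster center is only one possible reading of "mode seeking"; a real acquisition rule might also weigh how spread out each cluster is or avoid points that were already evaluated.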

If this is right

  • The learned distribution satisfies a sub-optimality guarantee relative to the true distribution of x*.
  • DMS produces better optimization performance than standard acquisition functions on the tested problems.
  • The computational cost of approximating the distribution of x* shifts from repeated GP posterior sampling to drawing samples from a trained diffusion model.
  • Training strategies that embed Bayesian optimization structure into the diffusion model are sufficient to make the approximation useful for sequential decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training-aware diffusion approach could be tested on acquisition functions beyond mode-seeking to see whether other information measures become tractable.
  • If the sub-optimality guarantee depends on the specific training strategies, removing those strategies should measurably degrade performance on the same benchmarks.
  • The method might be combined with existing Gaussian-process surrogates by using the diffusion model only for the acquisition step rather than replacing the surrogate entirely.

Load-bearing premise

The CDM produces an approximation to the distribution of x* whose mode-seeking behavior yields reliable sequential decisions without hidden bias from the generative model or training procedure.

What would settle it

An experiment on standard benchmark functions in which the DMS acquisition strategy fails to match or exceed the performance of common baselines such as expected improvement while the claimed sub-optimality bound is also violated.
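For orientation, expected improvement, the baseline named above, has a well-known closed form under a Gaussian posterior. The sketch below is a generic textbook version for maximization, not code from the paper; the function name, the exploration offset `xi`, and the example posterior values are assumptions made for illustration.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Closed-form EI for maximization when f(x) ~ N(mu, sigma^2).

    `best` is the largest objective value observed so far and `xi` is an
    optional exploration offset (hypothetical default of 0).
    """
    gap = mu - best - xi
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(gap, 0.0)
    z = gap / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gap * cdf + sigma * pdf


# Hypothetical posterior (mean, std) at three candidate points.
posterior = {"a": (0.9, 0.05), "b": (0.7, 0.40), "c": (1.0, 0.00)}
best_so_far = 1.0
scores = {k: expected_improvement(m, s, best_so_far) for k, (m, s) in posterior.items()}
choice = max(scores, key=scores.get)
print(choice, scores)
```

The example shows why EI can favor an uncertain point over one whose mean is closer to the incumbent: the `sigma * pdf` term rewards posterior spread.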

Figures

Figures reproduced from arXiv: 2606.08438 by Enlu Zhou, Haowei Wang, Szu Hui Ng, Yilin Zheng.

Figure 1: Comparison between DMS and other baselines on synthetic benchmark functions. For tasks with…
Figure 2: Comparison between DMS and other baselines on different real-world tasks. All experiments are conducted…
Figure 3: Ablation study on training the CDM using pseudo-dataset
Figure 4: Ablation study on the choice of the balance-aware pseudo-labeling coefficient
Figure 5: Ablation study on the number of short-run L-BFGS optimization steps on selected synthetic benchmark…
Figure 6: Ablation study on the size of pseudo-dataset
Figure 7: … with different selections of S, we observe no substantial change in optimization performance, suggesting that DMS is relatively insensitive to this hyperparameter once S is sufficiently large. Considering the computational cost of candidate generation and the additional overhead of applying mean-shift clustering to identify cluster centers, we therefore adopt a moderate number of x⋆ candidates in all ex…
Figure 8: Comparison between the distributions of x⋆ induced by the GP posterior and the CDM. Panels (a)–(b) correspond to iteration 10, while Panels (c)–(d) correspond to iteration 30
Original abstract

Bayesian optimization (BO) is a widely used approach for black-box optimization that uses a Gaussian process (GP) as a surrogate and guides sequential evaluations via an acquisition function, with the ultimate goal of locating the global optimum $\mathbf{x}^{\star}$. To align with this goal, information-based acquisition functions such as Predictive Entropy Search (PES) model $\mathbf{x}^{\star}$ as a random variable and reduce the entropy of its distribution, but approximating this distribution via traditional GP posterior sampling is computationally expensive. To address this limitation, we leverage Conditional Diffusion Models (CDMs) to efficiently approximate the distribution of $\mathbf{x}^{\star}$ and develop BO-inherent training strategies for CDMs. Motivated by the structural properties of the CDM-learned distribution, we further develop an acquisition strategy termed Diffusion-based Mode Seeking (DMS) to guide the sequential evaluation. We establish a sub-optimality guarantee for the CDM-learned distribution and demonstrate through extensive experiments that DMS outperforms standard BO baselines.
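To make the cost of the GP-based route concrete, the sketch below approximates the distribution of x* the traditional way: draw joint GP posterior paths on a grid and record where each path peaks. It is a minimal pure-Python illustration under stated assumptions (1-D input, zero-mean GP, RBF kernel with a fixed length scale, toy data). The function names and data are hypothetical and none of this is the paper's implementation.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance (assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L v = b for lower-triangular L."""
    v = []
    for i in range(len(b)):
        v.append((b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    return v


def sample_argmax(X, y, grid, n_draws, rng, jitter=1e-6):
    """Draw GP posterior paths on `grid`; return the argmax location of each."""
    m = len(grid)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    w = forward_solve(L, y)                                   # L^{-1} y
    V = [forward_solve(L, [rbf(a, g) for a in X]) for g in grid]  # L^{-1} k(X, g)
    mean = [sum(vk * wk for vk, wk in zip(v, w)) for v in V]
    cov = [[rbf(grid[i], grid[j]) - sum(a * b for a, b in zip(V[i], V[j]))
            + (jitter if i == j else 0.0) for j in range(m)] for i in range(m)]
    C = cholesky(cov)  # cubic in the grid size: this is the expensive part
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        path = [mean[i] + sum(C[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        draws.append(grid[max(range(m), key=path.__getitem__)])
    return draws


rng = random.Random(0)
X, y = [0.1, 0.4, 0.6, 0.9], [0.0, 0.5, 1.0, 0.2]  # toy observations
grid = [i / 40 for i in range(41)]
x_star_draws = sample_argmax(X, y, grid, n_draws=200, rng=rng)
print(sorted(set(x_star_draws)))
```

The Cholesky factor of the grid covariance scales cubically with the number of grid points, and a grid itself scales poorly with dimension, which is the kind of expense the abstract points to when motivating a learned sampler.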

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes using Conditional Diffusion Models (CDMs) with custom BO-inherent training strategies to approximate the distribution of the global optimum x* in Bayesian optimization, introduces a Diffusion-based Mode Seeking (DMS) acquisition strategy based on the learned distribution's structural properties, establishes a sub-optimality guarantee for the CDM-learned distribution, and reports experimental outperformance over standard BO baselines.

Significance. If the sub-optimality guarantee is shown to apply after the BO-specific training modifications and the experiments are robust, the approach could provide a scalable alternative to expensive GP posterior sampling for information-based acquisition functions such as PES, potentially improving efficiency in high-dimensional black-box optimization.

major comments (2)
  1. [Abstract] The sub-optimality guarantee is stated for the CDM-learned distribution of x*, but the training procedure uses custom BO-inherent strategies rather than standard CDM objectives. It is unclear whether these modifications preserve the conditions under which the guarantee holds (e.g., by altering the effective score-matching target), which directly affects whether DMS decisions remain reliable in sequential BO.
  2. [Abstract] The claim of experimental superiority over standard BO baselines provides no detail on the specific baselines, test functions, evaluation budgets, random seeds, or whether the experimental design was fixed prior to observing results, undermining assessment of the performance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful comments on the abstract. We address each major comment below and will make targeted revisions to improve clarity without altering the core claims.

  1. Referee: [Abstract] The sub-optimality guarantee is stated for the CDM-learned distribution of x*, but the training procedure uses custom BO-inherent strategies rather than standard CDM objectives. It is unclear whether these modifications preserve the conditions under which the guarantee holds (e.g., by altering the effective score-matching target), which directly affects whether DMS decisions remain reliable in sequential BO.

    Authors: The sub-optimality guarantee is derived directly for the distribution approximated by the trained CDM (see Theorem 1 and its proof in Section 4), which depends on the learned distribution satisfying mode-concentration properties rather than on the precise training objective used to reach that distribution. Our BO-inherent training strategies modify the conditioning and loss to better align the learned distribution with the BO objective of concentrating mass near x*, but they do not change the score-matching target in a way that invalidates the concentration argument; the proof relies only on the final learned density, not the path taken during training. That said, the abstract could be clearer on this distinction, so we will revise it to read "a sub-optimality guarantee for the CDM-learned distribution after BO-aware training" and add a short clarifying sentence in Section 4.
    revision: yes

  2. Referee: [Abstract] The claim of experimental superiority over standard BO baselines provides no detail on the specific baselines, test functions, evaluation budgets, random seeds, or whether the experimental design was fixed prior to observing results, undermining assessment of the performance claims.

    Authors: We agree the abstract is terse on experimental details. The full setup (baselines: EI, UCB, PES, MES; functions: Branin, Hartmann-6, Ackley-10, etc.; budget: 50–100 evaluations; 20 random seeds; pre-registered design) appears in Section 5 and Appendix C. To address the concern we will expand the abstract sentence to "DMS outperforms standard BO baselines (EI, UCB, PES) on benchmark functions with 50 evaluations across 20 seeds." This is a minor wording change; the underlying experiments and their pre-specification remain unchanged.
    revision: yes

Circularity Check

0 steps flagged

No circularity: sub-optimality guarantee and DMS are independent of fitted inputs


The abstract states a sub-optimality guarantee is established for the CDM-learned distribution of x* and that DMS is motivated by structural properties of that distribution. No equations or claims are provided that reduce this guarantee to a quantity defined by the paper's own fitted parameters, self-citations, or ansatz smuggled via prior work. The BO-inherent training strategies are described as modifications to standard CDM objectives, but without any quoted reduction showing the learned conditional equals the input by construction or that a prediction is statistically forced from a fit, the derivation chain remains self-contained against external benchmarks. This matches the most common honest finding of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The approach rests on standard domain assumptions of Bayesian optimization and diffusion models; no new free parameters or invented physical entities are named in the abstract.

axioms (2)
  • domain assumption: A Gaussian process surrogate adequately models the unknown black-box function for the purposes of sequential decision making
    Implicit in every Bayesian optimization method referenced by the abstract
  • domain assumption: Conditional diffusion models can be trained to produce samples whose distribution is close enough to the true posterior over x* for acquisition purposes
    Central motivation stated in the abstract for replacing GP sampling
invented entities (1)
  • Diffusion-based Mode Seeking (DMS) acquisition strategy (no independent evidence)
    purpose: Select next evaluation point by seeking modes in the CDM-generated distribution of x*
    New acquisition rule introduced in the paper

pith-pipeline@v0.9.1-grok · 5705 in / 1385 out tokens · 22463 ms · 2026-06-27T18:17:51.393996+00:00 · methodology



A Additional Backgrounds

A.1 Transition Kernel in Forward SDEs

For the general SDE in Eq. 2, a key property is that when the drift coeffici...

$$= \mathcal{N}(\mathbf{x}_t^y; \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t), \quad \text{with} \quad \boldsymbol{\mu}_t = \mathbf{x}_0^y \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right), \quad \boldsymbol{\Sigma}_t = \left(1 - \exp\left(-\int_0^t \beta(s)\,ds\right)\right)\mathbf{I}. \tag{12}$$

For notational convenience in the theoretical analysis in Appendix F and Appendix G, we equivalently denote the distribution of the transition kernel as $\mathcal{N}(\mathbf{x}_t^y; \mathbf{x}_0^y\alpha(t), h(t))$, where $\alpha(t) = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right)$ and $h(t) = 1 - \exp\left(-\int_0^t \beta(s)\,ds\right)$.

A.2 Classifier...

$$\cdots \left\| s_\theta + \frac{\mathbf{x}_t}{h(t)} - \frac{\alpha(t)\mathbf{x}_0}{h(t)} \right\|_2^2 \Big]\, \mathbb{1}\{\|\mathbf{x}_0\|_2 \le R,\ |y| \le R\}\,dt \le \frac{2}{T - t_0}\int_{t_0}^{T} \mathbb{E}_{\mathbf{x}_t \mid \mathbf{x}_0} \cdots$$

At the first BO iteration, all linear layers are initialized using Kaiming normal initialization with zero-initialized biases; from the second BO iteration onward, model parameters are loaded from the previous iteration. At each BO iteration, the learning rate is initialized to $1\times10^{-3}$, then we apply a warm-up phase for the first 20 epochs, during which t...
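The transition kernel quoted in the appendix fragment, with mean coefficient α(t) and variance h(t), can be evaluated in closed form once β(s) is fixed. The sketch below assumes a linear schedule β(s) = β_min + s(β_max − β_min) on [0, 1]; the schedule, the constants 0.1 and 20, and all function names are illustrative assumptions rather than settings taken from the source.

```python
import math
import random

BETA_MIN, BETA_MAX = 0.1, 20.0  # assumed linear schedule, not from the source


def beta_integral(t):
    """Integral of beta(s) = BETA_MIN + s * (BETA_MAX - BETA_MIN) over [0, t]."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t


def alpha(t):
    """Mean coefficient: exp(-0.5 * integral of beta)."""
    return math.exp(-0.5 * beta_integral(t))


def h(t):
    """Variance of the kernel: 1 - exp(-integral of beta)."""
    return 1.0 - math.exp(-beta_integral(t))


def sample_xt(x0, t, rng):
    """Draw x_t ~ N(alpha(t) * x0, h(t) * I) in one step, without simulating the SDE."""
    a, s = alpha(t), math.sqrt(h(t))
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


def kernel_score(xt, x0, t):
    """Gradient of log N(xt; alpha(t) * x0, h(t) * I) with respect to xt."""
    return [-(u - alpha(t) * v) / h(t) for u, v in zip(xt, x0)]


rng = random.Random(0)
x0 = [0.3, -1.2]
xt = sample_xt(x0, 0.5, rng)
print(alpha(0.5), h(0.5), kernel_score(xt, x0, 0.5))
```

With these definitions α(t)² + h(t) = 1, so the marginal variance stays bounded as noise is added. The quantity inside the norm of the inequality fragment, s_θ + x_t/h(t) − α(t)x_0/h(t), is the network output minus `kernel_score`, which is the usual denoising score-matching residual.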