arXiv: 2501.18756 · v2 · pith:ARMEWEVInew · submitted 2025-01-30 · 📊 stat.ML · cs.LG · math.OC

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

Pith reviewed 2026-05-23 04:17 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.OC
keywords Bayesian optimization · Expected Improvement · Max-value Entropy Search · Variational inference · Acquisition functions · Entropy search · Black-box optimization

The pith

Expected Improvement arises as a variational inference approximation to Max-value Entropy Search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Variational Entropy Search as a framework that links Expected Improvement to Max-value Entropy Search in Bayesian optimization. It shows that the popular Expected Improvement acquisition function can be recovered as a variational approximation to the information-theoretic objective of Max-value Entropy Search. This connection motivates a new acquisition function, VES-Gamma, designed to retain useful properties from both. Experiments across synthetic and real benchmarks indicate that VES-Gamma is competitive with or better than standard choices. A sympathetic reader would therefore see the long-standing separation between improvement-based and entropy-based acquisition functions as less sharp than usually assumed.

Core claim

The authors claim that Expected Improvement can be derived as a variational inference approximation of Max-value Entropy Search. Under the Variational Entropy Search framework this approximation yields a practical hybrid acquisition function called VES-Gamma that inherits strengths from both parents and performs competitively on low- and high-dimensional test problems.

What carries the argument

The Variational Entropy Search framework, which recasts the entropy-reduction goal of Max-value Entropy Search as a variational inference problem whose solution recovers Expected Improvement.
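
For orientation, the two acquisition functions being linked can be written in their standard textbook forms. This is a reference sketch drawn from the wider Bayesian optimization literature, not the paper's own notation or its variational derivation. The symbols are assumptions introduced here: $\mu(x)$ and $\sigma(x)$ are the Gaussian process posterior mean and standard deviation, $y_{\mathrm{best}}$ is the best value observed so far (maximization), $y^*_k$ are $K$ sampled values of the global maximum, and observations are taken to be noiseless.

```latex
\begin{align}
\alpha_{\mathrm{EI}}(x)
  &= \mathbb{E}\big[\max\big(f(x) - y_{\mathrm{best}},\, 0\big)\big]
   = \sigma(x)\,\big[z\,\Phi(z) + \phi(z)\big],
  \qquad z = \frac{\mu(x) - y_{\mathrm{best}}}{\sigma(x)}, \\
\alpha_{\mathrm{MES}}(x)
  &= H\big[p(y \mid \mathcal{D}, x)\big]
   - \mathbb{E}_{y^*}\Big[H\big[p(y \mid \mathcal{D}, x, y^*)\big]\Big]
   \approx \frac{1}{K}\sum_{k=1}^{K}
     \left[\frac{\gamma_k\,\phi(\gamma_k)}{2\,\Phi(\gamma_k)} - \log\Phi(\gamma_k)\right],
  \qquad \gamma_k = \frac{y^*_k - \mu(x)}{\sigma(x)}.
\end{align}
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution function. Both expressions depend on the posterior only through $\mu(x)$ and $\sigma(x)$ plus a reference level ($y_{\mathrm{best}}$ or $y^*_k$), which is one informal way to see why a link between them is plausible.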

If this is right

  • Expected Improvement and Max-value Entropy Search are related through a shared variational objective rather than being fundamentally distinct.
  • VES-Gamma can be used as a drop-in acquisition function that balances the computational simplicity of Expected Improvement with the uncertainty-reduction focus of Max-value Entropy Search.
  • New acquisition functions can be constructed by varying the variational approximation within the same unified framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variational lens might be applied to other acquisition functions to reveal additional hidden connections.
  • Implementation in existing Bayesian optimization libraries could default to VES-Gamma without requiring separate code paths for Expected Improvement and Max-value Entropy Search.
  • The framework may extend naturally to settings with noisy observations or batch selection where both improvement and information gain remain relevant.

Load-bearing premise

The variational inference step that turns Max-value Entropy Search into Expected Improvement does not materially change which point the acquisition function would select next.

What would settle it

On a fixed set of benchmark functions, compare the argmax locations chosen by the original Max-value Entropy Search with those chosen by its variational approximation, Expected Improvement. If the locations differ systematically and the optimization trajectories diverge, the approximation is too loose to support the claimed unification.
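
A minimal sketch of such an argmax comparison, in Python with only the standard library, might look as follows. It is not the paper's experiment: the posterior mean and standard deviation on the grid are hypothetical toy values, closed-form EI is used as a stand-in for the variational approximation, and the max-value samples are drawn independently per grid point (ignoring posterior correlations), which is a simplification assumed here for brevity. All function and variable names are illustrative.

```python
import math
import random

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization at a single point."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return sigma * (z * norm_cdf(z) + norm_pdf(z))

def max_value_entropy_search(mu, sigma, max_samples):
    """Monte Carlo MES estimate at a single point, averaged over sampled maxima."""
    total = 0.0
    for y_star in max_samples:
        gamma = (y_star - mu) / sigma
        cdf = max(norm_cdf(gamma), 1e-300)  # guard against log(0)
        total += gamma * norm_pdf(gamma) / (2.0 * cdf) - math.log(cdf)
    return total / len(max_samples)

# Hypothetical posterior on a 1-D grid (toy values, not taken from the paper).
grid = [i / 100.0 for i in range(101)]
mu = [math.sin(3.0 * x) for x in grid]
sigma = [0.05 + 0.6 * abs(x - 0.3) for x in grid]
best_observed = 0.9  # assumed incumbent value

# Crude max-value samples: independent draws per grid point, ignoring
# posterior correlations, clipped from below at the incumbent.
rng = random.Random(0)
max_samples = []
for _ in range(32):
    draw = max(rng.gauss(m, s) for m, s in zip(mu, sigma))
    max_samples.append(max(draw, best_observed))

ei_values = [expected_improvement(m, s, best_observed) for m, s in zip(mu, sigma)]
mes_values = [max_value_entropy_search(m, s, max_samples) for m, s in zip(mu, sigma)]

# Grid indices where each acquisition function is maximized.
ei_argmax = max(range(len(grid)), key=lambda i: ei_values[i])
mes_argmax = max(range(len(grid)), key=lambda i: mes_values[i])

print("EI  argmax x =", grid[ei_argmax])
print("MES argmax x =", grid[mes_argmax])
print("argmax gap   =", abs(grid[ei_argmax] - grid[mes_argmax]))
```

A real check would replace the toy `mu` and `sigma` with a fitted Gaussian process posterior, draw max-value samples from joint posterior samples, and repeat the comparison over many iterations and benchmark functions so that systematic differences can be separated from noise.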

Original abstract

Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function's optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of the popular information-theoretic acquisition function, named Max-value Entropy Search. Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Variational Entropy Search as a unified framework for Bayesian optimization acquisition functions. It claims that Expected Improvement (EI) arises as a variational inference approximation to Max-value Entropy Search (MES), proposes the hybrid VES-Gamma acquisition function, and reports that VES-Gamma is competitive with or outperforms EI and MES on low- and high-dimensional synthetic and real-world benchmarks.

Significance. If the claimed variational relationship holds with controlled error, the work would usefully connect two families of acquisition functions that are usually treated as distinct, potentially enabling principled hybrids. The empirical section supplies reproducible benchmark results across multiple problem classes, which is a positive feature.

major comments (3)
  1. [§3] Derivation of EI as VI approximation to MES: no quantitative bounds, KL estimates, or surface/argmax comparisons are supplied to characterize the approximation error between the two acquisition functions on the same posterior; this error analysis is load-bearing for the central unification claim.
  2. [§4] VES-Gamma construction: the new acquisition function is defined by combining EI and MES under the variational link, yet the manuscript provides no analysis of when the implicit variational family becomes loose or how that affects the location of the next query point.
  3. [§5] Empirical evaluation: the benchmark tables compare final performance but do not include side-by-side plots or statistics of the EI versus MES acquisition surfaces or their argmax locations, leaving the practical fidelity of the claimed approximation untested.
minor comments (2)
  1. [§2] Notation for the variational distribution q and the entropy terms is introduced without an explicit table of symbols, which would aid readability.
  2. [Figure 2] Figure captions for the acquisition-function visualizations do not state the kernel hyperparameters or the number of posterior samples used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of the variational unification claim that would benefit from additional analysis. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

Point-by-point responses
  1. Referee: [§3] Derivation of EI as VI approximation to MES: no quantitative bounds, KL estimates, or surface/argmax comparisons are supplied to characterize the approximation error between the two acquisition functions on the same posterior; this error analysis is load-bearing for the central unification claim.

    Authors: We agree that the manuscript would be strengthened by quantitative characterization of the approximation error. The derivation in §3 establishes the variational relationship conceptually, but does not include explicit KL bounds or direct surface comparisons. We will add a new subsection with KL divergence estimates computed on example Gaussian process posteriors, along with visualizations comparing the EI and MES acquisition surfaces and their argmax locations to quantify the practical fidelity of the approximation.

    revision: yes

  2. Referee: [§4] VES-Gamma construction: the new acquisition function is defined by combining EI and MES under the variational link, yet the manuscript provides no analysis of when the implicit variational family becomes loose or how that affects the location of the next query point.

    Authors: The referee is correct that the manuscript lacks analysis of the conditions under which the variational family may become loose. We will expand §4 to include a discussion of the variational family's limitations, supported by targeted experiments that examine how approximation tightness influences the location of the next query point under different posterior regimes.

    revision: yes

  3. Referee: [§5] Empirical evaluation: the benchmark tables compare final performance but do not include side-by-side plots or statistics of the EI versus MES acquisition surfaces or their argmax locations, leaving the practical fidelity of the claimed approximation untested.

    Authors: We acknowledge that the current empirical section focuses on final optimization performance and does not directly visualize or statistically compare the acquisition surfaces. We will augment §5 with side-by-side plots of EI and MES surfaces on selected benchmark problems, along with statistics on argmax locations, to provide direct evidence of the approximation's practical behavior.

    revision: yes

Circularity Check

0 steps flagged

No circularity: variational derivation of EI as MES approximation is independent

Full rationale

The paper derives the relationship between EI and MES via standard variational inference techniques without reducing any central claim to a fitted parameter, self-referential definition, or load-bearing self-citation. The abstract and description present the unified framework as obtained from first-principles inference steps that remain externally verifiable. No equations or steps are shown to be equivalent to their inputs by construction, and the VES-Gamma construction builds on this independent link rather than assuming it tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are identified. The framework appears to rest on standard variational inference techniques from prior literature.

pith-pipeline@v0.9.0 · 5694 in / 995 out tokens · 62470 ms · 2026-05-23T04:17:15.199877+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Exploring Exploration in Bayesian Optimization

    cs.LG · 2025-02 · unverdicted · novelty 7.0

    Introduces observation traveling salesman distance and observation entropy to quantify exploration in Bayesian optimization acquisition functions and links them to empirical performance.