
arxiv: 2601.21357 · v3 · pith:QO3ZASMD · submitted 2026-01-29 · 💻 cs.LG

Beyond Objective-Based Improvement: Stationarity-Aware Expected Improvement for Bayesian Optimization

Pith reviewed 2026-05-21 14:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords Bayesian optimization · Expected improvement · Acquisition function · Gradient norms · Stationarity · Gaussian process · Black-box optimization

The pith

A new acquisition function for Bayesian optimization adds gradient-norm information to guide sampling toward both better values and stationary points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to fix a limitation in standard Expected Improvement, which only tracks objective-value gains and can lose its signal in uncertain or flat regions. By extending the improvement idea to also reward proximity to points where the gradient is near zero, the method promotes sampling that is both high-performing and close to first-order optima. The authors derive a closed-form expression that stays within the existing improvement framework and remains tractable. They report consistent gains on benchmark problems and show the approach works for control-policy learning. A sympathetic reader would see this as making the acquisition criterion richer without breaking the core logic of Bayesian optimization.

Core claim

We propose Expected Improvement via Gradient Norms (EI-GN), a novel acquisition function that extends the improvement principle to incorporate first-order stationarity, promoting sampling in regions that are both high-performing and close to stationary points. We derive a tractable closed-form expression for EI-GN and show that it remains consistent with the improvement-based acquisition framework. By embedding progress toward stationarity into the acquisition criterion, EI-GN provides a richer and more informative notion of improvement.

What carries the argument

Expected Improvement via Gradient Norms (EI-GN), an acquisition function that augments standard expected improvement with the norm of the gradient from the surrogate model to favor regions near stationary points.
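The definition can be made concrete with a Monte Carlo sketch. The quantity estimated is the one the referee report writes as EI-GN(x) = E[(f* − f(x))_+ · h(‖∇f(x)‖)], sampled from an assumed joint Gaussian posterior over (f(x), ∇f(x)). This is not the paper's closed form. The weighting `h_gaussian` (a Gaussian bump in the gradient norm with scale `ell`) and the function names are hypothetical choices for illustration, because the functional form of h is not given here.

```python
import numpy as np


def h_gaussian(grad_norm, ell=1.0):
    # Hypothetical stationarity weight: equals 1 at a zero gradient, decays as ||grad|| grows.
    return np.exp(-0.5 * (grad_norm / ell) ** 2)


def ei_gn_monte_carlo(mean, cov, f_best, h=h_gaussian, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(f_best - f(x))_+ * h(||grad f(x)||)].

    mean: shape (1 + d,), posterior mean of (f(x), grad f(x)) at one candidate x.
    cov:  shape (1 + d, 1 + d), joint posterior covariance of the same vector.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    f = samples[:, 0]
    grad = samples[:, 1:]
    improvement = np.maximum(f_best - f, 0.0)   # (f* - f(x))_+ for minimization
    weight = h(np.linalg.norm(grad, axis=1))    # stationarity weighting
    return float(np.mean(improvement * weight))


# Two candidates with the same value posterior but different mean gradients (d = 2).
cov = np.diag([1.0, 0.1, 0.1])
near_stationary = ei_gn_monte_carlo(np.array([0.0, 0.0, 0.0]), cov, f_best=0.0)
steep = ei_gn_monte_carlo(np.array([0.0, 2.0, 2.0]), cov, f_best=0.0)
print(near_stationary, steep)
```

Standard EI gives these two candidates the same score, because their value posteriors are identical. The weighted estimate ranks the near-stationary candidate far higher. With `h=lambda g: np.ones_like(g)`, the estimator falls back to plain EI.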

If this is right

  • The acquisition function stays informative even when pure value improvement becomes uninformative.
  • Sampling is steered toward regions that are simultaneously high-value and near first-order stationary points.
  • The closed-form expression keeps the method computationally practical for Gaussian-process surrogates.
  • Empirical performance improves consistently over baseline acquisition functions on standard benchmarks.
  • The same stationarity-aware principle applies directly to policy-learning tasks in control.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same stationarity signal could be added to other acquisition functions beyond expected improvement.
  • In high-dimensional or highly multimodal problems, balancing value and gradient norms may reduce samples wasted in regions that are neither high-value nor close to stationary points.
  • Testing whether the method still works when the surrogate gradient is noisy would reveal the practical robustness of the approach.

Load-bearing premise

Gradient information from the surrogate model can be reliably used to promote stationarity without introducing new biases or degrading performance where model uncertainty is high.

What would settle it

On a simple known multimodal test function, compare the fraction of samples EI-GN places near true stationary points versus standard expected improvement; a clear reversal of the claimed advantage would falsify the central benefit.
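The proposed check needs two pieces: the true stationary points of the test function and a proximity count. The sketch below assumes a 1-D test function with a known derivative `df` and a tolerance `delta`. The helper names are hypothetical and do not come from the paper. Zeros of the derivative are bracketed by sign changes on a dense grid, and each sample is counted if it lies within `delta` of one of them.

```python
import numpy as np


def stationary_points_1d(df, lo, hi, n_grid=10_001):
    """Approximate zeros of a known derivative df on [lo, hi]."""
    xs = np.linspace(lo, hi, n_grid)
    vals = df(xs)
    # Indices where the derivative changes sign between neighbouring grid points
    # (a grid point landing exactly on a zero is not detected; fine for a sketch).
    idx = np.nonzero(vals[:-1] * vals[1:] < 0)[0]
    x0, x1 = xs[idx], xs[idx + 1]
    v0, v1 = vals[idx], vals[idx + 1]
    # One secant (linear interpolation) step inside each bracket.
    return x0 - v0 * (x1 - x0) / (v1 - v0)


def fraction_near_stationary(samples, stationary, delta):
    """Share of sampled locations within delta of any stationary point."""
    samples = np.asarray(samples, dtype=float)
    dist = np.min(np.abs(samples[:, None] - stationary[None, :]), axis=1)
    return float(np.mean(dist <= delta))


# Example: f(x) = sin(x) on [0, 7], so f'(x) = cos(x) vanishes at pi/2 and 3*pi/2.
roots = stationary_points_1d(np.cos, 0.0, 7.0)
print(roots)
print(fraction_near_stationary([1.57, 3.0, 4.70, 6.0], roots, delta=0.05))
```

For the proposed comparison, the same fraction would be computed over the query locations chosen by EI-GN and by EI across matched random seeds.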

Original abstract

Bayesian Optimization (BO) is a principled framework for optimizing expensive black-box functions, with Expected Improvement (EI) among its most widely used acquisition functions. Despite its empirical success, EI is agnostic to first-order optimality conditions, relying solely on objective-value improvement. As a result, it can exhibit vanishing acquisition signals where the improvement criterion is uninformative, limiting its effectiveness in guiding search. We propose Expected Improvement via Gradient Norms (EI-GN), a novel acquisition function that extends the improvement principle to incorporate first-order stationarity, promoting sampling in regions that are both high-performing and close to stationary points. We derive a tractable closed-form expression for EI-GN and show that it remains consistent with the improvement-based acquisition framework. By embedding progress toward stationarity into the acquisition criterion, EI-GN provides a richer and more informative notion of improvement. Empirical results on standard BO benchmarks demonstrate consistent gains over baseline methods, and we further illustrate its applicability to control policy learning.
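For reference, standard EI has a well-known closed form under a Gaussian posterior f(x) ~ N(μ, σ²). The minimal Python sketch below uses the minimization convention with incumbent `f_best`; the function name is illustrative. It shows the vanishing-signal behaviour the abstract describes: when the posterior mean sits well above the incumbent and σ is small, EI collapses toward zero.

```python
import math


def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Closed-form EI for minimization: E[max(f_best - f, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # erfc avoids cancellation in the lower tail of the normal CDF.
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    # Guard against tiny negative values from floating-point round-off.
    return max((f_best - mu) * cdf + sigma * pdf, 0.0)


# Same incumbent, increasingly unpromising candidates.
for mu, sigma in [(0.0, 1.0), (1.0, 0.5), (3.0, 0.3), (5.0, 0.2)]:
    print(f"mu={mu:.1f} sigma={sigma:.1f} EI={expected_improvement(mu, sigma, 0.0):.3e}")
```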

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Expected Improvement via Gradient Norms (EI-GN) as a novel acquisition function for Bayesian optimization. It extends standard EI by multiplying the improvement term by a weighting function of the gradient norm from the GP posterior, to promote sampling near first-order stationary points. It derives a claimed tractable closed-form expression, proves consistency with the improvement framework, and reports empirical gains over baselines on synthetic and control benchmarks.

Significance. If the closed-form derivation correctly handles the joint posterior of function value and gradient and the stationarity weighting does not distort the improvement signal under realistic uncertainty, EI-GN could supply a richer acquisition criterion that guides search toward both high-value and stationary regions. This would be especially relevant for policy optimization and other settings where first-order stationarity carries semantic meaning. The empirical results on standard benchmarks provide initial evidence of practical benefit, but the strength of the contribution hinges on whether the theoretical consistency holds when predictive variance is non-negligible.

major comments (2)
  1. [§3.2, Eq. (8)] The claimed closed-form for EI-GN = E[(f* − f(x))_+ ⋅ h(‖∇f(x)‖)] relies on the joint Gaussianity of f(x) and ∇f(x). However, the variance of ∇f(x) scales with predictive uncertainty; when this variance is large, the expectation over h can assign high weight to points whose mean gradient is moderate but whose posterior mass includes near-zero gradients, even when the mean improvement is small. This interaction is load-bearing for the consistency claim and requires either an explicit bound or a counter-example showing that the weighting remains subordinate to the improvement term.
  2. [§5.1, Figure 3 and Table 1] The reported regret reductions versus EI are shown only for fixed GP hyperparameters and a single choice of h(·). No ablation isolates whether gains arise from stationarity awareness or from incidental regularization of high-variance regions; without this, the central empirical claim that EI-GN “provides a richer notion of improvement” remains under-supported.
minor comments (2)
  1. Notation for the stationarity weighting function h(·) is introduced without an explicit functional form or hyper-parameter schedule in the main text; a short appendix table listing the concrete choices used in each experiment would improve reproducibility.
  2. The abstract states “consistent gains over baseline methods,” yet the main text does not report statistical significance or number of random seeds for the control-policy experiment; adding these details would strengthen the empirical section.
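Major comment 1 can be illustrated with a small calculation. Assume, purely for illustration, a Gaussian weighting h(‖g‖) = exp(−‖g‖²/(2ℓ²)); the paper's h is not specified here. Also assume a Gaussian gradient posterior g ~ N(m, S). The expected weight then has the closed form ℓ^d det(ℓ²I + S)^(−1/2) exp(−½ mᵀ(ℓ²I + S)⁻¹ m). The sketch evaluates it for a point whose mean gradient is far from zero. It looks at the weight alone and ignores the correlation between f(x) and ∇f(x) that the full EI-GN expectation would include.

```python
import numpy as np


def expected_gaussian_weight(m, S, ell):
    """E[exp(-||g||^2 / (2 ell^2))] for g ~ N(m, S) in d dimensions (closed form)."""
    m = np.atleast_1d(np.asarray(m, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    d = m.shape[0]
    A = ell ** 2 * np.eye(d) + S
    _, logdet = np.linalg.slogdet(A)   # A is positive definite, so its sign is +1
    quad = m @ np.linalg.solve(A, m)
    return float(np.exp(d * np.log(ell) - 0.5 * logdet - 0.5 * quad))


# Same mean gradient (2.0) and scale ell = 0.5; low versus high gradient variance.
low_var = expected_gaussian_weight([2.0], [[0.1 ** 2]], ell=0.5)
high_var = expected_gaussian_weight([2.0], [[2.0 ** 2]], ell=0.5)
print(low_var, high_var)
```

With these numbers the expected weight rises by more than two orders of magnitude, from roughly 4.5e-4 to roughly 0.15. The mean gradient is unchanged; only the gradient posterior widens. This is the mechanism the referee asks the authors to bound.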

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating the revisions we plan to incorporate to strengthen the theoretical and empirical support for EI-GN.

  1. Referee: [§3.2, Eq. (8)] The claimed closed-form for EI-GN = E[(f* − f(x))_+ ⋅ h(‖∇f(x)‖)] relies on the joint Gaussianity of f(x) and ∇f(x). However, the variance of ∇f(x) scales with predictive uncertainty; when this variance is large, the expectation over h can assign high weight to points whose mean gradient is moderate but whose posterior mass includes near-zero gradients, even when the mean improvement is small. This interaction is load-bearing for the consistency claim and requires either an explicit bound or a counter-example showing that the weighting remains subordinate to the improvement term.

    Authors: We thank the referee for this insightful observation on the interplay between predictive uncertainty and the gradient-norm weighting. The closed-form expression follows directly from the exact joint Gaussian posterior of (f(x), ∇f(x)) under the GP model, which permits analytic integration of the product of the improvement term and h(‖∇f(x)‖). To clarify the consistency claim under non-negligible variance, we will add a short appendix section deriving an explicit upper bound, EI-GN(x) ≤ C · EI(x), where C is a constant depending only on the chosen h and the GP kernel length-scale. This bound ensures the improvement term remains dominant. We will also include a brief numerical example in a high-variance regime to illustrate that the weighting does not invert the acquisition ordering relative to standard EI. revision: yes

  2. Referee: [§5.1, Figure 3 and Table 1] The reported regret reductions versus EI are shown only for fixed GP hyperparameters and a single choice of h(·). No ablation isolates whether gains arise from stationarity awareness or from incidental regularization of high-variance regions; without this, the central empirical claim that EI-GN “provides a richer notion of improvement” remains under-supported.

    Authors: We agree that the current experimental design leaves room for alternative explanations of the observed gains. In the revised manuscript we will add an ablation study that (i) compares EI-GN against a variance-regularized variant of EI that does not use gradient information, (ii) reports results for multiple functional forms of h(·), and (iii) includes runs with both fixed and optimized GP hyperparameters. These additions will provide direct evidence that the performance improvements are attributable to the stationarity-aware component rather than generic regularization effects. revision: yes
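The simplest bound of the kind promised in the first response follows directly from the definition. It assumes, as an illustration, that the weighting is non-negative and bounded, 0 ≤ h(·) ≤ h_max; the paper's h is not stated here. Because (f* − f(x))_+ ≥ 0 pointwise:

```latex
\mathrm{EI\text{-}GN}(x)
  = \mathbb{E}\!\left[(f^* - f(x))_+ \, h\!\left(\lVert \nabla f(x) \rVert\right)\right]
  \le h_{\max}\, \mathbb{E}\!\left[(f^* - f(x))_+\right]
  = h_{\max}\, \mathrm{EI}(x).
```

In this simple version C = h_max, with no dependence on the kernel length-scale. The bound caps EI-GN at a multiple of EI at every x. On its own, though, it does not guarantee that EI-GN ranks two candidates in the same order as EI; that still needs a separate argument or numerical check.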

Circularity Check

0 steps flagged

No significant circularity in EI-GN derivation


The paper defines EI-GN as an explicit extension of standard EI that multiplies the improvement term by a function of the gradient norm, then derives its closed-form expectation under the joint Gaussian posterior of f and ∇f. This is a direct mathematical construction from the GP model properties rather than a redefinition of quantities in terms of themselves or a fitted parameter renamed as a prediction. No self-citation load-bearing steps, uniqueness theorems from prior author work, or ansatz smuggling are present in the abstract or description. The claimed consistency with the improvement framework follows from the algebraic reduction when the gradient term is constant, which is an independent verification step, not a tautology. The derivation chain is therefore self-contained against external GP mathematics.
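The reduction mentioned above is a one-line check. If the weighting is constant, h(·) ≡ c > 0, the gradient drops out of the expectation and EI-GN becomes a rescaled EI. It then has the usual closed form in terms of the posterior mean μ(x), the standard deviation σ(x), and the standard normal CDF Φ and PDF φ (minimization convention):

```latex
h(\cdot) \equiv c
\;\Longrightarrow\;
\mathrm{EI\text{-}GN}(x) = c\,\mathbb{E}\!\left[(f^* - f(x))_+\right]
  = c\left[(f^* - \mu(x))\,\Phi(z) + \sigma(x)\,\varphi(z)\right],
\qquad z = \frac{f^* - \mu(x)}{\sigma(x)}.
```

A positive constant factor does not change the maximizer, so EI-GN with constant h selects the same next query as EI.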

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The contribution rests on standard Bayesian optimization assumptions with a probabilistic surrogate (typically Gaussian process) for computing expectations and gradients. The main addition is the new EI-GN acquisition function itself.

axioms (1)
  • domain assumption: Standard assumptions of Gaussian process surrogate models in Bayesian optimization for computing predictive means, variances, and gradients.
    The method relies on these to derive the closed-form EI-GN and to estimate stationarity via gradient norms.
invented entities (1)
  • EI-GN acquisition function (no independent evidence)
    purpose: To extend expected improvement by incorporating first-order stationarity via gradient norms.
    Newly defined in the paper to address vanishing signals in standard EI.
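The axiom above is what makes ∇f(x) available at all. Differentiation is a linear operator, so under a GP prior the pair (f(x), f′(x)) is jointly Gaussian with the training data. The minimal 1-D sketch below computes the joint posterior mean and covariance that an EI-GN-style acquisition would integrate against. It assumes an RBF kernel with hypothetical hyperparameters `ell`, `sf2` and `noise`; the paper's kernel is not specified here.

```python
import numpy as np


def rbf_value_grad_posterior(X, y, x_star, ell=0.5, sf2=1.0, noise=1e-6):
    """Joint GP posterior of (f(x*), f'(x*)) for 1-D inputs and an RBF kernel.

    Assumed kernel: k(x, x') = sf2 * exp(-(x - x')^2 / (2 ell^2)).
    Returns (mean, cov) with shapes (2,) and (2, 2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = X[:, None] - X[None, :]
    K = sf2 * np.exp(-0.5 * diff ** 2 / ell ** 2) + noise * np.eye(len(X))

    r = x_star - X
    k_star = sf2 * np.exp(-0.5 * r ** 2 / ell ** 2)  # Cov(f(x*), f(X))
    dk_star = -r / ell ** 2 * k_star                 # Cov(f'(x*), f(X)) = d/dx* k(x*, X)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    B = np.stack([k_star, dk_star])                      # shape (2, n)
    V = np.linalg.solve(L, B.T)                          # L^{-1} B^T, shape (n, 2)

    mean = B @ alpha
    # A priori the value and the slope at the same point are uncorrelated.
    prior_cov = np.diag([sf2, sf2 / ell ** 2])
    cov = prior_cov - V.T @ V
    return mean, cov


X = np.array([-1.0, -0.3, 0.4, 1.2])
y = np.sin(2.0 * X)
mean_near, cov_near = rbf_value_grad_posterior(X, y, 0.4)
mean_far, cov_far = rbf_value_grad_posterior(X, y, 5.0)
print(mean_near, np.diag(cov_near))
print(mean_far, np.diag(cov_far))
```

Far from the data, the gradient variance returns to its prior value `sf2 / ell**2`. That is the regime where, per the referee report, the stationarity weighting needs the most care.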

pith-pipeline@v0.9.0 · 5704 in / 1483 out tokens · 71519 ms · 2026-05-21T14:06:55.795235+00:00

