Beyond Objective-Based Improvement: Stationarity-Aware Expected Improvement for Bayesian Optimization

Ali Mesbah; Georgios Makrygiorgos; Joshua Hang Sai Ip

arxiv: 2601.21357 · v3 · pith:QO3ZASMDnew · submitted 2026-01-29 · 💻 cs.LG

Beyond Objective-Based Improvement: Stationarity-Aware Expected Improvement for Bayesian Optimization

Joshua Hang Sai Ip , Georgios Makrygiorgos , Ali Mesbah This is my paper

Pith reviewed 2026-05-21 14:06 UTC · model grok-4.3

classification 💻 cs.LG

keywords Bayesian optimizationExpected improvementAcquisition functionGradient normsStationarityGaussian processBlack-box optimization

0 comments

The pith

A new acquisition function for Bayesian optimization adds gradient-norm information to guide sampling toward both better values and stationary points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to fix a limitation in standard Expected Improvement, which only tracks objective-value gains and can lose its signal in uncertain or flat regions. By extending the improvement idea to also reward proximity to points where the gradient is near zero, the method promotes sampling that is both high-performing and close to first-order optima. The authors derive a closed-form expression that stays within the existing improvement framework and remains tractable. They report consistent gains on benchmark problems and show the approach works for control-policy learning. A sympathetic reader would see this as making the acquisition criterion richer without breaking the core logic of Bayesian optimization.

Core claim

We propose Expected Improvement via Gradient Norms (EI-GN), a novel acquisition function that extends the improvement principle to incorporate first-order stationarity, promoting sampling in regions that are both high-performing and close to stationary points. We derive a tractable closed-form expression for EI-GN and show that it remains consistent with the improvement-based acquisition framework. By embedding progress toward stationarity into the acquisition criterion, EI-GN provides a richer and more informative notion of improvement.

What carries the argument

Expected Improvement via Gradient Norms (EI-GN), an acquisition function that augments standard expected improvement with the norm of the gradient from the surrogate model to favor regions near stationary points.

If this is right

The acquisition function stays informative even when pure value improvement becomes uninformative.
Sampling is steered toward regions that are simultaneously high-value and near first-order stationary points.
The closed-form expression keeps the method computationally practical for Gaussian-process surrogates.
Empirical performance improves consistently over baseline acquisition functions on standard benchmarks.
The same stationarity-aware principle applies directly to policy-learning tasks in control.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stationarity signal could be added to other acquisition functions beyond expected improvement.
In high-dimensional or highly multimodal problems, balancing value and gradient norms may reduce wasted samples in non-stationary plateaus.
Testing whether the method still works when the surrogate gradient is noisy would reveal the practical robustness of the approach.

Load-bearing premise

Gradient information from the surrogate model can be reliably used to promote stationarity without introducing new biases or degrading performance where model uncertainty is high.

What would settle it

On a simple known multimodal test function, compare the fraction of samples EI-GN places near true stationary points versus standard expected improvement; a clear reversal of the claimed advantage would falsify the central benefit.

read the original abstract

Bayesian Optimization (BO) is a principled framework for optimizing expensive black-box functions, with Expected Improvement (EI) among its most widely used acquisition functions. Despite its empirical success, EI is agnostic to first-order optimality conditions, relying solely on objective-value improvement. As a result, it can exhibit vanishing acquisition signals where the improvement criterion is uninformative, limiting its effectiveness in guiding search. We propose Expected Improvement via Gradient Norms (EI-GN), a novel acquisition function that extends the improvement principle to incorporate first-order stationarity, promoting sampling in regions that are both high-performing and close to stationary points. We derive a tractable closed-form expression for EI-GN and show that it remains consistent with the improvement-based acquisition framework. By embedding progress toward stationarity into the acquisition criterion, EI-GN provides a richer and more informative notion of improvement. Empirical results on standard BO benchmarks demonstrate consistent gains over baseline methods, and we further illustrate its applicability to control policy learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EI-GN adds a gradient-norm term to expected improvement for stationarity awareness and supplies a closed form, but the variance interaction needs checking.

read the letter

The main thing here is a new acquisition function, EI-GN, that multiplies the usual improvement term by a factor based on the gradient norm from the GP posterior so that sampling prefers both good values and near-stationary points. They derive a closed-form expression for the expectation and argue it stays consistent with the improvement principle. That derivation is the concrete advance, and the empirical section shows gains over standard EI and a few other baselines on common test functions plus a policy-learning example in control. Those results look clean enough on the surface and give the method some practical grounding. The soft spot is the one the stress test flags. Since the gradient is jointly distributed with the function value, its posterior variance grows in uncertain regions, and the weighting by the norm can pull mass toward points where a near-zero gradient is likely only because of noise rather than because the mean surface is flat. The paper needs to show explicitly that the closed form accounts for this without letting the stationarity term dominate or create unintended exploration bias. If the joint Gaussian calculation is handled correctly, the concern shrinks, but that step is load-bearing and should be verified. The citations are standard and the experiments use reasonable controls for a methods paper. This is aimed at BO users in machine learning and engineering who already tune acquisition functions and want something that explicitly targets first-order conditions. A reader working on black-box optimization or surrogate-based search would find the formulation and the benchmark numbers useful. It has enough new math and reproducible experiments to deserve a serious referee rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper introduces Expected Improvement via Gradient Norms (EI-GN) as a novel acquisition function for Bayesian optimization. It extends standard EI by multiplying the improvement term with a weighting function of the gradient norm from the GP posterior to promote sampling near first-order stationary points, derives a claimed tractable closed-form expression, proves consistency with the improvement framework, and reports empirical gains over baselines on synthetic and control benchmarks.

Significance. If the closed-form derivation correctly handles the joint posterior of function value and gradient and the stationarity weighting does not distort the improvement signal under realistic uncertainty, EI-GN could supply a richer acquisition criterion that guides search toward both high-value and stationary regions. This would be especially relevant for policy optimization and other settings where first-order stationarity carries semantic meaning. The empirical results on standard benchmarks provide initial evidence of practical benefit, but the strength of the contribution hinges on whether the theoretical consistency holds when predictive variance is non-negligible.

major comments (2)

[§3.2, Eq. (8)] §3.2, Eq. (8): The claimed closed-form for EI-GN = E[(f* − f(x))_+ ⋅ h(‖∇f(x)‖)] relies on the joint Gaussianity of f(x) and ∇f(x). However, the variance of ∇f(x) scales with predictive uncertainty; when this variance is large, the expectation over h can assign high weight to points whose mean gradient is moderate but whose posterior mass includes near-zero gradients, even when the mean improvement is small. This interaction is load-bearing for the consistency claim and requires either an explicit bound or a counter-example showing that the weighting remains subordinate to the improvement term.
[§5.1, Figure 3 and Table 1] §5.1, Figure 3 and Table 1: The reported regret reductions versus EI are shown only for fixed GP hyperparameters and a single choice of h(·). No ablation isolates whether gains arise from stationarity awareness or from incidental regularization of high-variance regions; without this, the central empirical claim that EI-GN “provides a richer notion of improvement” remains under-supported.

minor comments (2)

Notation for the stationarity weighting function h(·) is introduced without an explicit functional form or hyper-parameter schedule in the main text; a short appendix table listing the concrete choices used in each experiment would improve reproducibility.
The abstract states “consistent gains over baseline methods,” yet the main text does not report statistical significance or number of random seeds for the control-policy experiment; adding these details would strengthen the empirical section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating the revisions we plan to incorporate to strengthen the theoretical and empirical support for EI-GN.

read point-by-point responses

Referee: [§3.2, Eq. (8)] §3.2, Eq. (8): The claimed closed-form for EI-GN = E[(f* − f(x))_+ ⋅ h(‖∇f(x)‖)] relies on the joint Gaussianity of f(x) and ∇f(x). However, the variance of ∇f(x) scales with predictive uncertainty; when this variance is large, the expectation over h can assign high weight to points whose mean gradient is moderate but whose posterior mass includes near-zero gradients, even when the mean improvement is small. This interaction is load-bearing for the consistency claim and requires either an explicit bound or a counter-example showing that the weighting remains subordinate to the improvement term.

Authors: We thank the referee for this insightful observation on the interplay between predictive uncertainty and the gradient-norm weighting. The closed-form expression follows directly from the exact joint Gaussian posterior of (f(x), ∇f(x)) under the GP model, which permits analytic integration over the product of the improvement indicator and h(‖∇f(x)‖). To clarify the consistency claim under non-negligible variance, we will add a short appendix section deriving an explicit upper bound: EI-GN(x) ≤ C · EI(x) where C is a constant depending only on the chosen h and the GP kernel length-scale. This bound ensures the improvement term remains dominant. We will also include a brief numerical counter-example in a high-variance regime to illustrate that the weighting does not invert the acquisition ordering relative to standard EI. revision: yes
Referee: [§5.1, Figure 3 and Table 1] §5.1, Figure 3 and Table 1: The reported regret reductions versus EI are shown only for fixed GP hyperparameters and a single choice of h(·). No ablation isolates whether gains arise from stationarity awareness or from incidental regularization of high-variance regions; without this, the central empirical claim that EI-GN “provides a richer notion of improvement” remains under-supported.

Authors: We agree that the current experimental design leaves room for alternative explanations of the observed gains. In the revised manuscript we will add an ablation study that (i) compares EI-GN against a variance-regularized variant of EI that does not use gradient information, (ii) reports results for multiple functional forms of h(·), and (iii) includes runs with both fixed and optimized GP hyperparameters. These additions will provide direct evidence that the performance improvements are attributable to the stationarity-aware component rather than generic regularization effects. revision: yes

Circularity Check

0 steps flagged

No significant circularity in EI-GN derivation

full rationale

The paper defines EI-GN as an explicit extension of standard EI that multiplies the improvement term by a function of the gradient norm, then derives its closed-form expectation under the joint Gaussian posterior of f and ∇f. This is a direct mathematical construction from the GP model properties rather than a redefinition of quantities in terms of themselves or a fitted parameter renamed as a prediction. No self-citation load-bearing steps, uniqueness theorems from prior author work, or ansatz smuggling are present in the abstract or description. The claimed consistency with the improvement framework follows from the algebraic reduction when the gradient term is constant, which is an independent verification step, not a tautology. The derivation chain is therefore self-contained against external GP mathematics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The contribution rests on standard Bayesian optimization assumptions with a probabilistic surrogate (typically Gaussian process) for computing expectations and gradients. The main addition is the new EI-GN acquisition function itself.

axioms (1)

domain assumption Standard assumptions of Gaussian process surrogate models in Bayesian optimization for computing predictive means, variances, and gradients.
The method relies on these to derive the closed-form EI-GN and to estimate stationarity via gradient norms.

invented entities (1)

EI-GN acquisition function no independent evidence
purpose: To extend expected improvement by incorporating first-order stationarity via gradient norms.
Newly defined in the paper to address vanishing signals in standard EI.

pith-pipeline@v0.9.0 · 5704 in / 1483 out tokens · 71519 ms · 2026-05-21T14:06:55.795235+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose Expected Improvement via Gradient Norms (EI-GN), a novel acquisition function that applies the improvement principle to a gradient-aware auxiliary objective g(x) = f(x) − α‖∇f(x)‖²₂

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.