Rethinking Trust Region Bayesian Optimization in High Dimensions
Pith reviewed 2026-05-08 09:48 UTC · model grok-4.3
The pith
Scaling the GP lengthscale with dimension and trust region size keeps local models from degenerating in high-dimensional Bayesian optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TuRBO's local GP can end up either excessively complex or overly simple as the dimension D and trust region side length L vary, because its lengthscale is not adjusted accordingly. The proposed AdaScale-TuRBO variant scales the GP lengthscale with both D and L, thereby preserving kernel geometry and maintaining consistent prior complexity. Empirical tests show this variant robustly outperforms standard TuRBO and other popular high-dimensional BO methods on synthetic benchmarks and real-world trajectory planning tasks.
What carries the argument
The lengthscale scaling rule that multiplies the GP kernel lengthscale by factors of dimension D and trust-region side length L to hold kernel geometry and prior complexity fixed.
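The effect of such a rule can be checked numerically. The sketch below is not the paper's formula, which the abstract does not state: it assumes an isotropic RBF kernel, points drawn uniformly from a cubic trust region, and a hypothetical rule `scaled_lengthscale` that grows the lengthscale in proportion to L times the square root of D. It compares the average kernel value between random point pairs under a fixed lengthscale and under the scaled one.

```python
import numpy as np


def mean_rbf_correlation(dim, side, lengthscale, n_pairs=2000, seed=0):
    """Average RBF kernel value between random point pairs in [0, side]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, side, size=(n_pairs, dim))
    y = rng.uniform(0.0, side, size=(n_pairs, dim))
    sq_dist = np.sum((x - y) ** 2, axis=1)
    return float(np.mean(np.exp(-0.5 * sq_dist / lengthscale**2)))


def scaled_lengthscale(base, dim, side):
    # Hypothetical rule, NOT taken from the paper: follow the typical
    # pairwise distance, which is proportional to side * sqrt(dim).
    return base * side * np.sqrt(dim)


if __name__ == "__main__":
    base = 0.5
    for dim in (10, 100, 1000):
        for side in (0.8, 0.1):
            fixed = mean_rbf_correlation(dim, side, base)
            scaled = mean_rbf_correlation(
                dim, side, scaled_lengthscale(base, dim, side)
            )
            print(f"D={dim:4d} L={side:.1f} fixed={fixed:.3e} scaled={scaled:.3f}")
```

Under these assumptions the fixed-lengthscale correlation drifts toward 1 for small, low-dimensional regions and toward 0 for large, high-dimensional ones, while the scaled version stays roughly constant. That is the behaviour the summary describes as holding kernel geometry fixed.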
If this is right
- Local GP models maintain roughly constant complexity as problems grow in dimension or change in trust-region size.
- Optimization performance becomes more reliable on high-dimensional synthetic functions without additional hyperparameter search.
- Real-world tasks such as trajectory planning see consistent gains over both the original TuRBO and competing high-dimensional BO approaches.
- The method requires no new computational overhead beyond the simple scaling multiplication at each trust-region step.
Where Pith is reading between the lines
- The same scaling idea could be tested inside other local surrogate methods that also rely on stationary kernels inside shrinking regions.
- If the scaling rule proves stable, it might reduce the need for repeated lengthscale optimization inside each trust region.
- Applying the rule to problems whose noise level also varies with dimension would show whether the complexity preservation still holds.
Load-bearing premise
That scaling only the lengthscale with dimension and region size is enough to preserve the intended kernel behavior without creating new mismatches that would require retuning other hyperparameters.
What would settle it
Run both standard TuRBO and AdaScale-TuRBO on the same high-dimensional test function with fixed hyperparameters and measure whether the scaled version eliminates the performance drop that occurs when the local GP degenerates.
Original abstract
Trust Region Bayesian Optimization (TuRBO) is an effective strategy for alleviating the curse of dimensionality in high-dimensional black-box optimization. However, inappropriate lengthscale design can cause the local Gaussian process (GP) model within the trust region to degenerate, leading to suboptimal performance in high dimensions. In this work, we show that TuRBO's local GP may remain either excessively complex or overly simple as the dimension $D$ and trust region side length $L$ vary. To address this issue, we propose a straightforward variant, AdaScale-TuRBO, which scales the GP lengthscale with both the problem dimension and trust region size, thereby preserving kernel geometry and maintaining consistent prior complexity. Empirically, we show that AdaScale-TuRBO can robustly outperform standard TuRBO and other popular high-dimensional BO methods on synthetic benchmarks and real-world trajectory planning tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies that TuRBO's local Gaussian process models can degenerate in high dimensions because the lengthscale choice fails to maintain consistent prior complexity as dimension D and trust-region side length L vary. It proposes AdaScale-TuRBO, a simple variant that rescales the GP lengthscale with both D and L to preserve kernel geometry. The authors report that this change yields robust empirical gains over standard TuRBO and other high-dimensional BO baselines on synthetic functions and real-world trajectory-planning tasks.
Significance. A lightweight, hyperparameter-light fix that demonstrably improves TuRBO on both synthetic and applied problems would be a useful practical contribution to high-dimensional Bayesian optimization. The empirical results on trajectory planning add relevance, but the absence of an explicit scaling formula, derivation of prior-complexity invariance, and controlled ablation of outputscale/mean-function effects limits the depth and reproducibility of the claimed mechanism.
major comments (2)
- [§3] §3 (AdaScale-TuRBO definition): the manuscript states that lengthscale scaling with D and L 'preserves kernel geometry and maintains consistent prior complexity' but supplies neither an explicit formula (e.g., l_new = l * g(D,L)) nor a derivation showing that the rescaled kernel keeps the same effective degrees of freedom or marginal variance when the trust-region volume scales as L^D; without this, it is unclear whether outputscale or mean-function adjustments are implicitly required.
- [§5] §5 (Experiments): performance tables and figures report point estimates without error bars, seed counts, or statistical significance tests; moreover, the text does not state whether the outputscale (signal variance) or constant mean were held fixed or re-optimized after lengthscale rescaling, leaving open the possibility that observed gains arise from implicit retuning rather than geometry preservation alone.
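To make the first comment concrete, here is one form the requested derivation could take. It is a sketch under assumptions that do not come from the manuscript: an isotropic RBF kernel with lengthscale $\ell$, inputs $x, x'$ drawn independently and uniformly from a trust region $[0, L]^D$, and a base lengthscale $\ell_0$. The paper's actual rule may differ.

$$
k(x, x') = \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right),
\qquad
\mathbb{E}\,\lVert x - x' \rVert^2 = \sum_{i=1}^{D} \mathbb{E}\,(x_i - x_i')^2 = \frac{D L^2}{6}.
$$

$$
\ell = \ell_0 \, L \sqrt{D}
\quad\Longrightarrow\quad
\frac{\mathbb{E}\,\lVert x - x' \rVert^2}{2\ell^2} = \frac{1}{12\,\ell_0^2}.
$$

Under these assumptions the typical exponent of the kernel no longer depends on $D$ or $L$, so the typical correlation between two points in the region is set by $\ell_0$ alone. Whether the outputscale or mean function would also need adjusting is a separate question that this sketch does not address.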
minor comments (2)
- [Abstract / §1] The abstract and introduction repeatedly use 'kernel geometry' without a precise definition (e.g., in terms of correlation length relative to domain volume or eigenvalue decay); a short clarifying sentence would help readers.
- [§2] Notation for trust-region side length L versus the normalized hypercube side length should be made explicit in §2 to avoid confusion when D changes.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment of our work. The comments identify important opportunities to strengthen the clarity and rigor of the presentation, and we will revise the manuscript accordingly.
- Referee: §3 (AdaScale-TuRBO definition): the manuscript states that lengthscale scaling with D and L 'preserves kernel geometry and maintains consistent prior complexity' but supplies neither an explicit formula (e.g., l_new = l * g(D,L)) nor a derivation showing that the rescaled kernel keeps the same effective degrees of freedom or marginal variance when the trust-region volume scales as L^D; without this, it is unclear whether outputscale or mean-function adjustments are implicitly required.
  Authors: We agree that an explicit formula and supporting derivation would improve the manuscript. In the revised version we will add the precise scaling rule used by AdaScale-TuRBO together with a short derivation showing that the chosen rescaling keeps the effective correlation structure (and thus prior complexity) invariant with respect to changes in D and L. We will also state explicitly that the outputscale and mean function remain exactly as in the original TuRBO implementation and are not adjusted.
  Revision: yes
- Referee: §5 (Experiments): performance tables and figures report point estimates without error bars, seed counts, or statistical significance tests; moreover, the text does not state whether the outputscale (signal variance) or constant mean were held fixed or re-optimized after lengthscale rescaling, leaving open the possibility that observed gains arise from implicit retuning rather than geometry preservation alone.
  Authors: We acknowledge these reporting omissions. The revised manuscript will include error bars (standard deviations across independent runs), state the number of random seeds employed for every experiment, and report statistical significance tests comparing AdaScale-TuRBO against the baselines. We will also clarify in the experimental protocol that the outputscale and constant mean function are held fixed at the default TuRBO values and are not re-optimized after lengthscale rescaling, confirming that performance differences arise from the geometry-preserving change alone.
  Revision: yes
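The reporting promised in the second response could be computed along these lines. This is a minimal sketch, not the authors' protocol: the function names are hypothetical, the paired sign-flip permutation test is one choice among several that the response leaves open, and it assumes both methods were run on the same seeds so that results can be paired.

```python
import numpy as np


def summarize_runs(final_values):
    """Return (mean, sample standard deviation, number of seeds)."""
    values = np.asarray(final_values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1)), int(values.size)


def paired_sign_flip_pvalue(a, b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on per-seed differences a - b.

    Assumes a[i] and b[i] come from the same seed (paired design).
    """
    diffs = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(diffs.mean())
    # Under the null hypothesis each difference is equally likely to be
    # positive or negative, so flip signs at random and recompute the mean.
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, diffs.size))
    resampled = np.abs((signs * diffs).mean(axis=1))
    # Add-one correction keeps the estimated p-value strictly positive.
    return float((np.sum(resampled >= observed) + 1) / (n_resamples + 1))
```

To use it, pass the per-seed final objective values of each method to `summarize_runs` for the error bars and seed count, and pass both arrays to `paired_sign_flip_pvalue` for the comparison. A permutation test avoids a normality assumption, which matters when only a handful of seeds are available.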
Circularity Check
No circularity: heuristic lengthscale scaling proposed and validated empirically
The paper introduces AdaScale-TuRBO as a direct heuristic that multiplies the GP lengthscale by factors involving D and L. No derivation chain, equations, or fitted parameters are shown that reduce the claimed geometry preservation to a self-defined quantity or to a prior self-citation. The central contribution is an empirical demonstration of improved performance on benchmarks; the scaling rule is presented as a straightforward adjustment rather than a result derived from the method's own outputs. No self-definitional, fitted-input, or uniqueness-imported steps appear.