pith. sign in

arxiv: 1907.08953 · v1 · pith:CBDK2272new · submitted 2019-07-21 · 💻 cs.LG · stat.ML

High Dimensional Bayesian Optimization via Supervised Dimension Reduction

Pith reviewed 2026-05-24 18:46 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords Bayesian optimizationhigh-dimensional optimizationdimension reductionsliced inverse regressionkernel methodsregret bounds
0
0 comments X

The pith

Sliced inverse regression recovers the intrinsic low-dimensional subspace of an objective function to make high-dimensional Bayesian optimization feasible.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies sliced inverse regression directly inside the Bayesian optimization loop so that each new evaluation helps identify which directions in the input space actually affect the output. This supervised reduction step replaces the assumption of a fixed additive or linear embedding with a data-driven estimate of the effective subspace. A kernel extension handles nonlinear structure while keeping the per-iteration cost manageable. Regret bounds are stated for the resulting algorithm, and experiments on synthetic functions plus two real tasks show lower regret than prior high-dimensional BO methods.

Core claim

Supervised dimension reduction via sliced inverse regression can be inserted into Bayesian optimization to learn the intrinsic low-dimensional sub-structure of the unknown objective from the growing set of evaluated points and their values, yielding both computational savings and provable regret bounds.

What carries the argument

Sliced Inverse Regression (SIR), which partitions the observed function values into slices and solves a supervised regression problem to recover the effective dimension reduction subspace.

If this is right

  • The per-iteration cost of Gaussian-process modeling drops from scaling with the ambient dimension to scaling with the recovered subspace dimension.
  • A kernelized version of SIR extends the method to objective functions whose relevant structure is nonlinear in the original coordinates.
  • Explicit regret bounds are available that depend on the quality of the recovered subspace rather than the ambient dimension.
  • The same supervised reduction step can be reused across multiple optimization runs that share similar objective structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The recovered subspace could be warm-started from previous runs on related tasks, turning the method into a form of transfer learning for Bayesian optimization.
  • If the subspace estimate stabilizes after a modest number of evaluations, later iterations could switch to a cheaper low-dimensional optimizer without further SIR computations.
  • The approach suggests a broader pattern: any supervised dimension-reduction technique whose training cost is linear in the number of observations could replace SIR inside the same BO wrapper.

Load-bearing premise

The unknown objective function possesses an intrinsic low-dimensional subspace that sliced inverse regression can recover from the sequence of evaluated points and their values.

What would settle it

Construct a high-dimensional test function whose level sets have no low-dimensional linear or kernel subspace and measure whether the SIR-augmented optimizer still achieves lower simple regret than standard high-dimensional BO baselines after a fixed budget of evaluations.

Figures

Figures reproduced from arXiv: 1907.08953 by Huiqi Li, Miao Zhang, Steven Su.

Figure 1
Figure 1. Figure 1: Simple regrets over iterations of our SIR-BO and KISIR-BO with compared approaches on two synthetic functions under different [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Performance for configuring neural network on Boston dataset and controlling a three-link walk robot. We plot the mean and [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance for Walk Robot controlling with different assumed dimensions of subspace. We plot means and [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
read the original abstract

Bayesian optimization (BO) has been broadly applied to computational expensive problems, but it is still challenging to extend BO to high dimensions. Existing works are usually under strict assumption of an additive or a linear embedding structure for objective functions. This paper directly introduces a supervised dimension reduction method, Sliced Inverse Regression (SIR), to high dimensional Bayesian optimization, which could effectively learn the intrinsic sub-structure of objective function during the optimization. Furthermore, a kernel trick is developed to reduce computational complexity and learn nonlinear subset of the unknowing function when applying SIR to extremely high dimensional BO. We present several computational benefits and derive theoretical regret bounds of our algorithm. Extensive experiments on synthetic examples and two real applications demonstrate the superiority of our algorithms for high dimensional Bayesian optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes applying Sliced Inverse Regression (SIR) directly to high-dimensional Bayesian optimization to recover an intrinsic low-dimensional subspace of the unknown objective from the sequence of evaluated points and function values. It develops a kernelized version of SIR to handle nonlinear embeddings and reduce computational cost in very high dimensions, derives regret bounds for the resulting algorithm, and reports empirical superiority over baselines on synthetic functions and two real-world applications.

Significance. If the regret bounds are valid and the SIR subspace estimates remain consistent under BO's adaptive sampling, the work would offer a principled way to relax the additive or linear-embedding assumptions common in high-dimensional BO while retaining theoretical guarantees. The explicit derivation of regret bounds and the kernel trick for nonlinear cases are concrete strengths; the experiments provide supporting evidence but do not substitute for the missing analysis of SIR consistency under sequential selection.

major comments (2)
  1. [§4, Theorem 2] §4 (Regret Analysis), Theorem 2: the stated regret bound assumes that the SIR estimate converges to the true effective dimension reduction (EDR) space at a rate sufficient to preserve the sublinear regret; however, the proof sketch does not verify that the elliptical-symmetry and sufficient-coverage conditions of SIR continue to hold when the design points are chosen by acquisition-function maximization rather than drawn from a fixed distribution.
  2. [§3.1] §3.1 (Online SIR Update): the recursive update for the SIR matrix is presented without an accompanying consistency argument or concentration inequality that accounts for the dependence between successive points induced by the BO loop; this leaves open whether the subspace estimate remains reliable after O(d) iterations.
minor comments (2)
  1. The abstract states that 'several computational benefits' are presented, yet the main text does not tabulate wall-clock times or flop counts alongside the regret curves; adding such a table would strengthen the practical-utility claim.
  2. Notation for the kernel matrix in the nonlinear SIR variant is introduced without an explicit definition of the feature map or the centering step; a short appendix paragraph would remove ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments. We address the two major comments point by point below, acknowledging the gaps in the current theoretical analysis while proposing targeted revisions.

read point-by-point responses
  1. Referee: [§4, Theorem 2] §4 (Regret Analysis), Theorem 2: the stated regret bound assumes that the SIR estimate converges to the true effective dimension reduction (EDR) space at a rate sufficient to preserve the sublinear regret; however, the proof sketch does not verify that the elliptical-symmetry and sufficient-coverage conditions of SIR continue to hold when the design points are chosen by acquisition-function maximization rather than drawn from a fixed distribution.

    Authors: We agree that the proof of Theorem 2 invokes the standard SIR assumptions (elliptical symmetry and sufficient coverage) without explicitly verifying their preservation under adaptive BO sampling. The regret bound is derived conditionally on the subspace estimation error decaying sufficiently fast, as is common in such analyses. In revision we will clarify this conditional nature in the theorem statement and add a short discussion noting that the acquisition function's exploration component helps maintain coverage, while acknowledging that a full verification of the SIR conditions under sequential selection remains an open technical question. We will not claim unconditional validity of the bound. revision: partial

  2. Referee: [§3.1] §3.1 (Online SIR Update): the recursive update for the SIR matrix is presented without an accompanying consistency argument or concentration inequality that accounts for the dependence between successive points induced by the BO loop; this leaves open whether the subspace estimate remains reliable after O(d) iterations.

    Authors: The recursive SIR update is introduced primarily for computational efficiency. We recognize that the manuscript lacks a dedicated consistency or concentration result that handles the dependence induced by the BO acquisition loop. The current regret analysis assumes the subspace estimate remains sufficiently accurate after the initial burn-in phase. In the revision we will add an explicit remark in §3.1 stating this limitation and indicating that a martingale-based analysis would be needed for a complete guarantee; we will also reference related online dimension-reduction literature. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper applies an existing supervised dimension reduction technique (SIR) to high-dimensional BO and states that it derives theoretical regret bounds. No load-bearing steps reduce by construction to fitted inputs, self-citations, or ansatzes imported from the authors' prior work. The central claim is an application of SIR during optimization rather than a self-referential derivation; the provided text exhibits no equations or definitions where a prediction equals its own fitting procedure. This is the expected non-circular outcome for a methods paper that imports an external technique.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the domain assumption that the objective possesses a recoverable low-dimensional effective subspace; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption The objective function has an intrinsic low-dimensional sub-structure recoverable by sliced inverse regression from evaluated points.
    This premise is required for SIR to learn a useful embedding during optimization.

pith-pipeline@v0.9.0 · 5647 in / 1173 out tokens · 21854 ms · 2026-05-24T18:46:30.982268+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces

    stat.ML 2020-09 unverdicted novelty 6.0

    BO algorithms expand unknown search spaces via hyperharmonic series control to achieve sub-linear cumulative regret bounds, with a high-dimensional variant.

  2. Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization

    stat.ML 2019-11 unverdicted novelty 6.0

    A high-dimensional BO algorithm maximizes acquisition functions over discrete low-dimensional subspaces without assuming low-dimensional structure on the objective, achieving sub-linear regret with a tunable bound tha...

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 2 Pith papers · 1 internal anchor

  1. [1]

    Bayesian optimization of com- binatorial structures

    [Baptista and Poloczek, 2018] Ricardo Baptista and Matthias Poloczek. Bayesian optimization of com- binatorial structures. In International Conference on Machine Learning, pages 462–471,

  2. [2]

    Algorithms for hyper- parameter optimization

    [Bergstra et al., 2011] James S Bergstra, R ´emi Bardenet, Yoshua Bengio, and Bal ´azs K ´egl. Algorithms for hyper- parameter optimization. In Advances in neural informa- tion processing systems, pages 2546–2554,

  3. [3]

    Castro, and Andreas Krause

    [Chen et al., 2012] Bo Chen, Rui M. Castro, and Andreas Krause. Joint optimization and variable selection of high- dimensional gaussian processes. pages 1423–1430,

  4. [4]

    Linear dimensionality reduction: Survey, insights, and generalizations

    [Cunningham and Ghahramani, 2015] John P Cunningham and Zoubin Ghahramani. Linear dimensionality reduction: Survey, insights, and generalizations. The Journal of Ma- chine Learning Research, 16(1):2859–2900,

  5. [5]

    Gaussian processes for data-efficient learning in robotics and control

    [Deisenroth et al., 2015] Marc Peter Deisenroth, Dieter Fox, and Carl Edward Rasmussen. Gaussian processes for data-efficient learning in robotics and control. IEEE transactions on pattern analysis and machine intelligence, 37(2):408–423,

  6. [6]

    High-dimensional gaussian process ban- dits

    [Djolonga et al., 2013] Josip Djolonga, Andreas Krause, and V olkan Cevher. High-dimensional gaussian process ban- dits. In Advances in Neural Information Processing Sys- tems, pages 1025–1033,

  7. [7]

    An efficient approach for assessing hy- perparameter importance

    [Hutter et al., 2014] Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. An efficient approach for assessing hy- perparameter importance. In International Conference on Machine Learning, pages 754–762,

  8. [8]

    High dimensional bayesian optimisation and bandits via additive models

    [Kandasamy et al., 2015] Kirthevasan Kandasamy, Jeff Schneider, and Barnab ´as P ´oczos. High dimensional bayesian optimisation and bandits via additive models. In International Conference on Machine Learning , pages 295–304,

  9. [9]

    Neural architecture search with bayesian optimisa- tion and optimal transport

    [Kandasamy et al., 2018] Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric P Xing. Neural architecture search with bayesian optimisa- tion and optimal transport. In Advances in Neural Infor- mation Processing Systems, pages 2020–2029,

  10. [10]

    High dimensional bayesian optimization via restricted projection pursuit models

    [Li et al., 2016] Chun-Liang Li, Kirthevasan Kandasamy, Barnab´as P ´oczos, and Jeff Schneider. High dimensional bayesian optimization via restricted projection pursuit models. In Artificial Intelligence and Statistics , pages 884–892,

  11. [11]

    High dimensional bayesian optimization using dropout

    [Li et al., 2017] Cheng Li, Sunil Gupta, Santu Rana, Vu Nguyen, Svetha Venkatesh, and Alistair Shilton. High dimensional bayesian optimization using dropout. In Pro- ceedings of the 26th International Joint Conference on Ar- tificial Intelligence, pages 2096–2102. AAAI Press,

  12. [12]

    Sliced inverse regression for dimen- sion reduction

    [Li, 1991] Ker-Chau Li. Sliced inverse regression for dimen- sion reduction. Journal of the American Statistical Asso- ciation, 86(414):316–327,

  13. [13]

    Practical bayesian op- timization

    [Lizotte, 2008] Daniel James Lizotte. Practical bayesian op- timization. PhD thesis, University of Alberta,

  14. [14]

    [Rasmussen and Williams, 2004] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning . MIT Press, Cambridge, Mas- sachusetts,

  15. [15]

    High-dimensional bayesian optimization via additive models with overlap- ping groups

    [Rolland et al., 2018] Paul Rolland, Jonathan Scarlett, Il- ija Bogunovic, and V olkan Cevher. High-dimensional bayesian optimization via additive models with overlap- ping groups. In International Conference on Artificial In- telligence and Statistics, pages 298–307,

  16. [16]

    Practical bayesian optimization of ma- chine learning algorithms

    [Snoek et al., 2012] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of ma- chine learning algorithms. In Advances in neural informa- tion processing systems, pages 2951–2959,

  17. [17]

    Information- theoretic regret bounds for gaussian process optimization in the bandit setting

    [Srinivas et al., 2012] Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias W Seeger. Information- theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265,

  18. [18]

    Learning Integral Representations of Gaussian Processes

    [Tan and Mukherjee, 2018] Zilong Tan and Sayan Mukher- jee. Subspace-induced gaussian processes. arXiv preprint arXiv:1802.07528,

  19. [19]

    Bayesian optimization with dimension scheduling: Application to biological systems

    [Ulmasov et al., 2016] Doniyor Ulmasov, Caroline Baroukh, Benoit Chachuat, Marc Peter Deisenroth, and Ruth Mis- ener. Bayesian optimization with dimension scheduling: Application to biological systems. In Computer Aided Chemical Engineering, volume 38, pages 1051–1056. El- sevier,

  20. [20]

    Bayesian optimization in high dimensions via random embeddings

    [Wang et al., 2013] Ziyu Wang, Masrour Zoghi, Frank Hut- ter, David Matheson, Nando De Freitas, et al. Bayesian optimization in high dimensions via random embeddings. In IJCAI, pages 1778–1784,

  21. [21]

    Batched high-dimensional bayesian optimization via structural kernel learning

    [Wang et al., 2017] Zi Wang, Chengtao Li, Stefanie Jegelka, and Pushmeet Kohli. Batched high-dimensional bayesian optimization via structural kernel learning. In Interna- tional Conference on Machine Learning , pages 3656– 3664,

  22. [22]

    Feedback control of dynamic bipedal robot locomotion

    [Westervelt et al., 2007] Eric R Westervelt, Christine Chevallereau, Jun Ho Choi, Benjamin Morris, and Jessy W Grizzle. Feedback control of dynamic bipedal robot locomotion. CRC press,

  23. [23]

    Regularized sliced inverse regression for kernel models

    [Wu et al., 2007] Qiang Wu, F Liang, and S Mukherjee. Regularized sliced inverse regression for kernel models. Technical report, Technical Report 07-25, ISDS, Duke Univ,

  24. [24]

    Kernel sliced inverse regression with applications to classification

    [Wu, 2008] Han-Ming Wu. Kernel sliced inverse regression with applications to classification. Journal of Computa- tional and Graphical Statistics, 17(3):590–610,

  25. [25]

    Nonlinear dimension reduction with kernel sliced in- verse regression

    [Yeh et al., 2009] Yi-Ren Yeh, Su-Yun Huang, and Yuh-Jye Lee. Nonlinear dimension reduction with kernel sliced in- verse regression. IEEE Transactions on Knowledge and Data Engineering, 21(11):1590–1603, 2009