pith. sign in

arxiv: 2605.20249 · v1 · pith:ZYR3OH67new · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization

Pith reviewed 2026-05-21 08:22 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords kernel discoverybayesian optimizationgaussian processeshigh-dimensional optimizationlarge language modelsevolutionary algorithmsautomated machine learning
0
0 comments X

The pith

An LLM-driven evolutionary framework discovers novel Gaussian process kernels for high-dimensional Bayesian optimization without conditioning on raw observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an automated method called Kernel Discovery to design kernels for Gaussian processes used in Bayesian optimization when inputs have many dimensions. Manual kernel design is difficult in these settings, and prior automated approaches are restricted to simple combinations of base kernels or must feed raw data into language models, which hits practical limits. The new approach prompts large language models in two stages to first suggest new mathematical kernel expressions and then convert those expressions into working code. Kernels are then chosen using a leave-one-out continuous ranked probability score that discourages overfitting. Tests on five standard high-dimensional benchmarks show the method ranks first on average among seventeen competing approaches.

Core claim

We introduce Kernel Discovery, a LLM-driven evolutionary framework for high-dimensional BO that searches a broader kernel space beyond predefined composition rules and does not require conditioning on observations. Motivated by the observation that directly prompting an LLM to generate kernel code yields syntactically varied but functionally identical kernels, we adopt a two-stage approach: an LLM first proposes novel mathematical forms, then a second LLM call converts each form into validated, executable code. We also propose a leave-one-out continuous ranked probability score (LOO-CRPS) as a selection criterion that penalizes overfitted kernels. On five high-dimensional BO benchmarks, our

What carries the argument

The two-stage LLM process in which one model proposes novel mathematical kernel forms and a second model turns them into executable code, together with LOO-CRPS selection that penalizes overfitting.

Load-bearing premise

Large language models can reliably propose mathematically novel and functionally distinct kernel forms without access to raw observations, and LOO-CRPS will select kernels that generalize rather than overfit in high-dimensional regimes.

What would settle it

Running Kernel Discovery on a new high-dimensional Bayesian optimization benchmark and checking whether the average rank across methods stays near 1.2 or whether the discovered kernels fail to beat standard automatic relevance determination kernels.

Figures

Figures reproduced from arXiv: 2605.20249 by Inhyuck Song, Jaewoo Lee, Jinkyoo Park, Taeyoung Yun, Woocheol Shin.

Figure 1
Figure 1. Figure 1: Motivation Figure. (Left): In high-dimensional BO, a conventional LLM-based BO does not work due to the out-of-context limits and the difficulty of extracting meaningful patterns. (Right): Our pipeline enables us to discover effective kernels for high-dimensional BO beyond compositions. these breakthroughs are promising, identifying such effective priors or transformations still requires extensive manual e… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the method. Given an initial population, we instruct an LLM to generate novel [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Prompts for Kernel Discovery. To discover novel kernels for high-dimensional BO, we [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Learning curve of various methods across standard benchmarks. We visualize the top-2 [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Ablation studies on the two-stage approach for kernel discovery. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Ablation on each compo￾nent of Kernel Discovery [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Post-analysis on kernel discovery. Both experiments are conducted in SVM benchmark. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Mathematical form discovery prompt template [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Code conversion prompt template. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Composition prompt template. Kernel Validation Failure Rates. The failure probability in Tables 2 and 3 denotes the empirical fraction of LLM-generated kernels rejected by our structural validator. A generated kernel knew is accepted only when V(knew) = Vagn(knew) ∧ Vpsd(knew) = 1. Here, Vagn checks whether the generated kernel implementation is dimension- and shape-agnostic, as required by the BO pipelin… view at source ↗
Figure 11
Figure 11. Figure 11: Full learning curves for all methods across five benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Functional redundancy example from direct code generation. Both snippets compute [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Functional redundancy example from direct code generation. Both snippets compute [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Functional redundancy example from direct code generation. Both snippets compute [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Ablation studies on each component of our method across other benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p025_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Ablation studies on the base population for Compositional Search and CAKE baselines. [PITH_FULL_IMAGE:figures/full_fig_p026_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Ablation studies on the evaluation metric for Compositional Search and CAKE baselines. [PITH_FULL_IMAGE:figures/full_fig_p026_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Representative cross-covariance failures caught by [PITH_FULL_IMAGE:figures/full_fig_p027_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Representative PSD failures caught by Vpsd. Both include terms that do not generally preserve positive semi-definiteness and can yield indefinite Gram matrices. Problematic terms are highlighted in red. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Forward code for the discovered kernel k1. 1 def forward ( self , x1 , x2 , diag = False , ** params ): 2 if x1 . dim () == 1: x1 = x1 . unsqueeze ( -1) 3 if x2 . dim () == 1: x2 = x2 . unsqueeze ( -1) 4 arc_weight = self . raw_arc_weight_constraint . transform ( self . raw_arc_weight ) 5 global_scale = self . raw_global_scale_constraint . transform ( self . raw_global_scale ) 6 angular_weights = self . r… view at source ↗
Figure 21
Figure 21. Figure 21: Forward code for the discovered kernel k2. 30 [PITH_FULL_IMAGE:figures/full_fig_p030_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Forward code for the discovered kernel k3. Evolving Population on Other Benchmarks [PITH_FULL_IMAGE:figures/full_fig_p031_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Post-analysis of kernel discovery on four additional benchmarks. For each benchmark, we [PITH_FULL_IMAGE:figures/full_fig_p031_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Analysis on boundary-seeking behavior of several kernels. [PITH_FULL_IMAGE:figures/full_fig_p033_24.png] view at source ↗
Figure 25
Figure 25. Figure 25: Code snippet of the discovered kernel from SVM benchmark. [PITH_FULL_IMAGE:figures/full_fig_p034_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: Code snippet of an additional discovered kernel with strong transferability. This kernel [PITH_FULL_IMAGE:figures/full_fig_p035_26.png] view at source ↗
Figure 27
Figure 27. Figure 27: Code snippet of another additional discovered kernel with strong transferability. This [PITH_FULL_IMAGE:figures/full_fig_p036_27.png] view at source ↗
read the original abstract

Gaussian Process (GP) kernels are central to Bayesian optimization (BO), yet designing effective kernels for high-dimensional problems still relies on extensive manual engineering. Existing automated approaches struggle in high dimensions for two bottlenecks: their kernel search space is limited to additions and multiplications of base kernels, and LLM-based approaches require conditioning on raw observations, which becomes infeasible due to context-length limits and the difficulty of extracting meaningful patterns. We introduce \textbf{Kernel Discovery}, a LLM-driven evolutionary framework for high-dimensional BO that searches a broader kernel space beyond predefined composition rules and does not require conditioning on observations. Motivated by the observation that directly prompting an LLM to generate kernel code yields syntactically varied but functionally identical kernels, we adopt a two-stage approach: an LLM first proposes novel mathematical forms, then a second LLM call converts each form into validated, executable code. We also propose a leave-one-out continuous ranked probability score (LOO-CRPS) as a selection criterion that penalizes overfitted kernels. On five high-dimensional BO benchmarks, our method achieves an average rank of \textbf{1.2 out of 17}, outperforming competitive baselines. We further analyze the discovered kernels to identify which kernels lead to improvements in high-dimensional BO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Kernel Discovery, an LLM-driven evolutionary framework for automated kernel design in high-dimensional Bayesian optimization. It uses a two-stage process in which one LLM proposes novel mathematical kernel forms and a second converts them to executable code, avoiding direct observation conditioning to sidestep context-length limits. A leave-one-out continuous ranked probability score (LOO-CRPS) serves as the selection criterion to penalize overfitting. The method is evaluated on five high-dimensional BO benchmarks, where it reports an average rank of 1.2 out of 17 competing approaches, and the discovered kernels are further analyzed to identify properties that improve high-dimensional BO performance.

Significance. If the performance claims are substantiated, the work offers a practical advance in automated kernel engineering for GPs in regimes where manual design is intractable. The two-stage LLM proposal and LOO-CRPS selection address documented bottlenecks in prior search-based and LLM-based kernel methods. Credit is due for the broader kernel space explored beyond fixed composition rules and for the explicit analysis of which kernel characteristics benefit high-dimensional BO. The empirical nature of the contribution makes rigorous experimental reporting essential to its significance.

major comments (3)
  1. [§4] §4 (Experimental results): the headline claim of average rank 1.2 out of 17 is presented without the number of independent runs, standard deviations across runs, or any statistical significance tests (e.g., paired Wilcoxon or Friedman tests) against the 16 baselines. This information is load-bearing for the central performance assertion.
  2. [§3.3] §3.3 (LOO-CRPS selection): the criterion is motivated as penalizing overfit kernels, yet the manuscript provides no ablation or held-out generalization test demonstrating that LOO-CRPS prefers kernels that extrapolate rather than fit noise in the sparse high-dimensional regime. Given the skeptic concern that leave-one-out on limited data may still reward excessive flexibility, this validation is required to support the selection mechanism.
  3. [§3.2] §3.2 (two-stage proposal): the claim that the two-stage LLM process yields functionally distinct kernels is central to the broader search space argument, but the paper does not quantify functional diversity (e.g., via kernel Gram-matrix distances or predictive behavior on synthetic functions) or describe explicit diversity maintenance in the evolutionary loop.
minor comments (2)
  1. [Abstract / §4] The abstract and §4 refer to “five high-dimensional BO benchmarks” without naming them or providing a summary table of their dimensions and characteristics; explicit identification would aid reproducibility.
  2. [§3.3] Notation for the LOO-CRPS formula in §3.3 could be clarified by explicitly stating the predictive distribution used for each left-out point.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects for strengthening the experimental rigor and supporting arguments. We address each major comment point by point below, indicating the revisions we plan to make.

read point-by-point responses
  1. Referee: [§4] §4 (Experimental results): the headline claim of average rank 1.2 out of 17 is presented without the number of independent runs, standard deviations across runs, or any statistical significance tests (e.g., paired Wilcoxon or Friedman tests) against the 16 baselines. This information is load-bearing for the central performance assertion.

    Authors: We agree that the experimental reporting in §4 requires these details to fully substantiate the performance claims. In the revised manuscript, we will explicitly state the number of independent runs, report standard deviations across runs, and include statistical significance tests such as the Friedman test with post-hoc analysis to compare our method against the baselines. revision: yes

  2. Referee: [§3.3] §3.3 (LOO-CRPS selection): the criterion is motivated as penalizing overfit kernels, yet the manuscript provides no ablation or held-out generalization test demonstrating that LOO-CRPS prefers kernels that extrapolate rather than fit noise in the sparse high-dimensional regime. Given the skeptic concern that leave-one-out on limited data may still reward excessive flexibility, this validation is required to support the selection mechanism.

    Authors: We appreciate the referee's concern regarding the validation of LOO-CRPS. While the criterion is intended to penalize overfitting through its leave-one-out predictive evaluation, we agree that additional evidence would strengthen the case. In the revised manuscript, we will add an ablation study in §3.3 comparing LOO-CRPS to other selection criteria, along with a held-out generalization test on synthetic high-dimensional data to show that it favors kernels with better extrapolation properties. revision: yes

  3. Referee: [§3.2] §3.2 (two-stage proposal): the claim that the two-stage LLM process yields functionally distinct kernels is central to the broader search space argument, but the paper does not quantify functional diversity (e.g., via kernel Gram-matrix distances or predictive behavior on synthetic functions) or describe explicit diversity maintenance in the evolutionary loop.

    Authors: We thank the referee for this observation. To better support the claim that the two-stage process enables a functionally broader kernel space, we will revise §3.2 to include quantitative analysis of functional diversity, for example by computing distances between kernel Gram matrices evaluated on synthetic functions and by examining predictive performance differences. We will also describe the diversity-promoting aspects of the evolutionary loop, such as the mutation and selection procedures. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical search procedure with external benchmark evaluation

full rationale

The paper introduces an LLM-driven evolutionary framework for automated kernel discovery in high-dimensional BO. It relies on a two-stage proposal process and LOO-CRPS selection, then reports comparative performance (avg rank 1.2/17) on five external benchmarks. No mathematical derivations, first-principles predictions, or fitted parameters are presented that could reduce to self-definitions or self-citations. The central claims rest on algorithmic description plus independent experimental results rather than any internal consistency loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unverified capability of current LLMs to generate useful novel kernel expressions and on the effectiveness of LOO-CRPS as a generalizable selection metric; no explicit free parameters or invented physical entities are described.

axioms (2)
  • domain assumption LLMs prompted directly produce syntactically varied but functionally identical kernels, motivating the two-stage separation of form proposal and code generation.
    Stated as motivation in the abstract for adopting the two-stage approach.
  • domain assumption LOO-CRPS penalizes overfitting kernels more effectively than standard likelihood in high-dimensional BO settings.
    Introduced as the selection criterion without further justification in the abstract.

pith-pipeline@v0.9.0 · 5760 in / 1408 out tokens · 44341 ms · 2026-05-21T08:22:27.705539+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 5 internal anchors

  1. [1]

    Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020

    Ryan Turner, David Eriksson, Michael McCourt, Juha Kiili, Eero Laaksonen, Zhen Xu, and Isabelle Guyon. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. InNeurIPS 2020 Competition and Demonstration Track, 2021

  2. [2]

    Pre-training helps bayesian optimization too

    Zi Wang, George E Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, and Zoubin Ghahramani. Pre-training helps bayesian optimization too. InICML Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2022

  3. [3]

    Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

    Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, and Michael A Osborne. Raiders of the lost architecture: Kernels for bayesian optimization in conditional parameter spaces. arXiv preprint arXiv:1409.4011, 2014

  4. [4]

    Neural architecture search with bayesian optimisation and optimal transport

    Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric P Xing. Neural architecture search with bayesian optimisation and optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

  5. [5]

    Local latent space bayesian optimization over structured inputs

    Natalie Maus, Haydn Jones, Juston Moore, Matt J Kusner, John Bradshaw, and Jacob Gard- ner. Local latent space bayesian optimization over structured inputs. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

  6. [6]

    Latent bayesian optimization via autoregressive normalizing flows

    Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, and Hyunwoo J Kim. Latent bayesian optimization via autoregressive normalizing flows. InInternational Conference on Learning Representations (ICLR), 2025

  7. [7]

    Learning search space partition for black-box optimization using monte carlo tree search

    Linnan Wang, Rodrigo Fonseca, and Yuandong Tian. Learning search space partition for black-box optimization using monte carlo tree search. InAdvances in Neural Information Processing Systems (NeurIPS), 2020

  8. [8]

    Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics.Machine Learning, 2023

    Felix Berkenkamp, Andreas Krause, and Angela P Schoellig. Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics.Machine Learning, 2023

  9. [9]

    A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 1964

    Harold J Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 1964

  10. [10]

    Cambridge University Press, 2023

    Roman Garnett.Bayesian optimization. Cambridge University Press, 2023

  11. [11]

    Additive gaussian processes

    David K Duvenaud, Hannes Nickisch, and Carl Rasmussen. Additive gaussian processes. In Advances in Neural Information Processing Systems (NeurIPS), 2011

  12. [12]

    High dimensional bayesian optimisation and bandits via additive models

    Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional bayesian optimisation and bandits via additive models. InInternational Conference on Machine Learning (ICML), 2015

  13. [13]

    Discovering and exploiting additive structure for bayesian optimization

    Jacob Gardner, Chuan Guo, Kilian Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2017

  14. [14]

    Additive gaussian processes revisited

    Xiaoyu Lu, Alexis Boukouvalas, and James Hensman. Additive gaussian processes revisited. InInternational Conference on Machine Learning (ICML), 2022

  15. [15]

    Bayesian optimization in a billion dimensions via random embeddings.Journal of Artificial Intelligence Research, 2016

    Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, and Nando De Feitas. Bayesian optimization in a billion dimensions via random embeddings.Journal of Artificial Intelligence Research, 2016

  16. [16]

    A framework for bayesian optimization in embedded subspaces

    Amin Nayebi, Alexander Munteanu, and Matthias Poloczek. A framework for bayesian optimization in embedded subspaces. InInternational Conference on Machine Learning (ICML), 2019

  17. [17]

    Re-examining linear em- beddings for high-dimensional bayesian optimization

    Ben Letham, Roberto Calandra, Akshara Rai, and Eytan Bakshy. Re-examining linear em- beddings for high-dimensional bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 10

  18. [18]

    High-dimensional bayesian optimization with sparse axis-aligned subspaces

    David Eriksson and Martin Jankowiak. High-dimensional bayesian optimization with sparse axis-aligned subspaces. InConference on Uncertainty in Artificial Intelligence (UAI), 2021

  19. [19]

    Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces

    Leonard Papenmeier, Luigi Nardi, and Matthias Poloczek. Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

  20. [20]

    Scalable global optimization via local bayesian optimization

    David Eriksson, Michael Pearce, Jacob Gardner, Ryan D Turner, and Matthias Poloczek. Scalable global optimization via local bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

  21. [21]

    Scalable constrained bayesian optimization

    David Eriksson and Matthias Poloczek. Scalable constrained bayesian optimization. InInterna- tional Conference on Artificial Intelligence and Statistics (AISTATS), 2021

  22. [22]

    Joint composite latent space bayesian optimization

    Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, and Eytan Bakshy. Joint composite latent space bayesian optimization. InInternational Conference on Machine Learning (ICML), 2024

  23. [23]

    Vanilla bayesian optimization performs great in high dimensions

    Carl Hvarfner, Erik Orm Hellsten, and Luigi Nardi. Vanilla bayesian optimization performs great in high dimensions. InInternational Conference on Machine Learning (ICML), 2024

  24. [24]

    Standard gaussian process is all you need for high-dimensional bayesian optimization

    Zhitong Xu, Haitao Wang, Jeff M Phillips, and Shandian Zhe. Standard gaussian process is all you need for high-dimensional bayesian optimization. InInternational Conference on Learning Representations (ICLR), 2025

  25. [25]

    Bock: Bayesian optimization with cylindrical kernels

    ChangYong Oh, Efstratios Gavves, and Max Welling. Bock: Bayesian optimization with cylindrical kernels. InInternational Conference on Machine Learning (ICML), 2018

  26. [26]

    We still don’t understand high-dimensional bayesian optimization

    Colin Doumont, Donney Fan, Natalie Maus, Jacob R Gardner, Henry Moss, and Geoff Pleiss. We still don’t understand high-dimensional bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2026

  27. [27]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

  28. [28]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

  29. [29]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261, 2025

  30. [30]

    Mathematical discoveries from program search with large language models

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 2024

  31. [31]

    AlphaEvolve: A coding agent for scientific and algorithmic discovery

    Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery.arXiv preprint arXiv:2506.13131, 2025

  32. [32]

    Funbo: Discovering acquisition functions for bayesian optimization with funsearch

    Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco Ruiz, Alan Malek, Alexis Bellot, and Silvia Chiappa. Funbo: Discovering acquisition functions for bayesian optimization with funsearch. InInternational Conference on Machine Learning (ICML), 2025

  33. [33]

    Trajevo: Trajectory prediction heuristics design via llm-driven evolution

    Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, and Jinkyoo Park. Trajevo: Trajectory prediction heuristics design via llm-driven evolution. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026. 11

  34. [34]

    Self-evolverec: Self-evolving recommender systems with llm-based directional feedback.arXiv preprint arXiv:2602.12612, 2026

    Sein Kim, Sangwu Park, Hongseok Kang, Wonjoong Kim, Jimin Seo, Yeonjun In, Kanghoon Yoon, and Chanyoung Park. Self-evolverec: Self-evolving recommender systems with llm-based directional feedback.arXiv preprint arXiv:2602.12612, 2026

  35. [35]

    Adaptive and safe bayesian optimization in high dimensions via one-dimensional subspaces

    Johannes Kirschner, Mojmir Mutny, Nicole Hiller, Rasmus Ischebeck, and Andreas Krause. Adaptive and safe bayesian optimization in high dimensions via one-dimensional subspaces. In International Conference on Machine Learning (ICML), 2019

  36. [36]

    Monte carlo tree search based variable selection for high dimensional bayesian optimization

    Lei Song, Ke Xue, Xiaobin Huang, and Chao Qian. Monte carlo tree search based variable selection for high dimensional bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

  37. [37]

    O., Hvarfner, C., Papenmeier, L., and Nardi, L

    Erik Orm Hellsten, Carl Hvarfner, Leonard Papenmeier, and Luigi Nardi. High-dimensional bayesian optimization with group testing.arXiv preprint arXiv:2310.03515, 2023

  38. [38]

    Multi-objective bayesian optimization over high-dimensional search spaces

    Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Multi-objective bayesian optimization over high-dimensional search spaces. InConference on Uncertainty in Artificial Intelligence (UAI), 2022

  39. [39]

    Discovering many diverse solutions with bayesian optimization

    Natalie Maus, Kaiwen Wu, David Eriksson, and Jacob Gardner. Discovering many diverse solutions with bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2023

  40. [40]

    Feasibility-driven trust region bayesian optimization

    Paolo Ascia, Elena Raponi, Thomas Bäck, and Fabian Duddeck. Feasibility-driven trust region bayesian optimization. InInternational Conference on Automated Machine Learning (AutoML), 2025

  41. [41]

    Understanding high-dimensional bayesian optimization

    Leonard Papenmeier, Matthias Poloczek, and Luigi Nardi. Understanding high-dimensional bayesian optimization. InInternational Conference on Machine Learning (ICML), 2025

  42. [42]

    Cylindrical thompson sampling for high-dimensional bayesian optimization

    Bahador Rashidi, Kerrick Johnstonbaugh, and Chao Gao. Cylindrical thompson sampling for high-dimensional bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024

  43. [43]

    Adaptive acquisition selection for bayesian optimization with large language models

    Giang Ngo, Dat Phan Trong, Dang Nguyen, Sunil Gupta, and Svetha Venkatesh. Adaptive acquisition selection for bayesian optimization with large language models. InInternational Conference on Learning Representations (ICLR), 2026

  44. [44]

    Llamea-bo: A large language model evolutionary algorithm for automatically generating bayesian optimization algorithms

    Wenhu Li, Niki van Stein, Thomas Bäck, and Elena Raponi. Llamea-bo: A large language model evolutionary algorithm for automatically generating bayesian optimization algorithms. arXiv preprint arXiv:2505.21034, 2025

  45. [45]

    Large language models to enhance bayesian optimization

    Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance bayesian optimization. InInternational Conference on Learning Representa- tions (ICLR), 2024

  46. [46]

    Adaptive kernel design for bayesian optimization is a piece of cake with llms

    Richard Cornelius Suwandi, Feng Yin, Juntao Wang, Renjie Li, Tsung-Hui Chang, and Sergios Theodoridis. Adaptive kernel design for bayesian optimization is a piece of cake with llms. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  47. [47]

    We are underselling the modularity of Bayesian optimization

    Austin Tripp. We are underselling the modularity of Bayesian optimization. https://www. austintripp.ca/blog/2026-02-09-bo-modularity/, 2026. Accessed: 2026-04-14

  48. [48]

    Gaussian processes in machine learning

    Carl Edward Rasmussen. Gaussian processes in machine learning. InSummer School on Machine Learning. Springer, 2003

  49. [49]

    The application of bayesian methods for seeking the extremum.Towards Global Optimization, 1998

    Jonas Mockus. The application of bayesian methods for seeking the extremum.Towards Global Optimization, 1998

  50. [50]

    A limited memory algorithm for bound constrained optimization.SIAM Journal on Scientific Computing, 1995

    Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization.SIAM Journal on Scientific Computing, 1995

  51. [51]

    Gaussian processes for regression

    Christopher Williams and Carl Rasmussen. Gaussian processes for regression. InAdvances in Neural Information Processing Systems (NeurIPS), 1995. 12

  52. [52]

    Batched large-scale bayesian optimization in high-dimensional spaces

    Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale bayesian optimization in high-dimensional spaces. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2018

  53. [53]

    Lassobench: A high- dimensional hyperparameter optimization benchmark suite for lasso

    Kenan Šehi ´c, Alexandre Gramfort, Joseph Salmon, and Luigi Nardi. Lassobench: A high- dimensional hyperparameter optimization benchmark suite for lasso. InInternational Confer- ence on Automated Machine Learning (AutoML), 2022

  54. [54]

    An experimental study in adaptive kernel selection for bayesian optimization.IEEE Access, 2019

    Ibai Roman, Roberto Santana, Alexander Mendiburu, and Jose A Lozano. An experimental study in adaptive kernel selection for bayesian optimization.IEEE Access, 2019

  55. [55]

    Structure discovery in nonparametric regression through compositional kernel search

    David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. Structure discovery in nonparametric regression through compositional kernel search. In International Conference on Machine Learning (ICML), 2013

  56. [56]

    Unexpected improvements to expected improvement for bayesian optimization

    Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Unexpected improvements to expected improvement for bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  57. [57]

    Construction and comparison of high-dimensional sobol’generators.Wilmott, 2011

    Ilya M Sobol’, Danil Asotsky, Alexander Kreinin, and Sergei Kucherenko. Construction and comparison of high-dimensional sobol’generators.Wilmott, 2011. 13 Appendix A Task Details We evaluate our method on five standard high-dimensional BO benchmarks covering a range of problem types and dimensionalities. Rover (D= 100 ).The Rover benchmark [ 52] is a traj...

  58. [58]

    The system calls ‘ E v o l v e d K e r n e l ( a r d _ n u m _ d i m s = D ) ‘

    _ _i nit __ S I G N A T U R E : ‘ __init__ ‘ MUST accept ‘ a r d _ n u m _ d i m s : int ‘ as the ONLY re qu ir ed p a r a m e t e r . The system calls ‘ E v o l v e d K e r n e l ( a r d _ n u m _ d i m s = D ) ‘. All other args MUST have d ef au lt s . NEVER add re qu ire d args like ‘q : int ‘ , ‘ center : Tensor ‘ , ‘M : int ‘

  59. [59]

    FORWARD S I G N A T U R E : ‘ def forward ( self , x1 , x2 , diag = False , ** params ) ‘ - always accept ** params

  60. [60]

    NEVER use ‘

    BATCH - SAFE SHAPES ( C RI TIC AL ) : The kernel is tested with these exact shapes - your code MUST handle ALL of them : 13- 2 D : x1 =(5 , D ) vs x2 =(1 , D ) -> output must be (5 , 1) , NOT (5 , 5) 14- 2 D : x1 =(3 , D ) vs x2 =(7 , D ) -> output must be (3 , 7) 15- 3 D batched ( BoTorch acqf o p t i m i z a t i o n ) : x1 =(1 , 4 , D ) vs x2 =(1 , 3 , ...

  61. [61]

    , N1 , N2 ) for ANY N1 != N2

    OUTPUT SHAPE : return (... , N1 , N2 ) for ANY N1 != N2 . If your output is ( N1 , N1 ) instead of ( N1 , N2 ) , you have a bug

  62. [62]

    cdist ( x1s , x2s , p =2) ‘ - NEVER ‘ x1

    D IS TAN CE : ‘ torch . cdist ( x1s , x2s , p =2) ‘ - NEVER ‘ x1 . u n s q u e e z e ( -2) - x2 . u n s q u e e z e ( -3) ‘ ( OOM )

  63. [63]

    t r a n s p o s e ( -1 , -2) ‘

    INNER PRODUCT : ‘ x1 @ x2 . t r a n s p o s e ( -1 , -2) ‘

  64. [64]

    device ‘

    DEVICE : new tensors in forward use ‘ device = x1 . device ‘. Use ‘ register_buffer ‘ in __ in it __

  65. [65]

    NO IN - PLACE : ‘x = x + y ‘ not ‘x += y ‘. ‘x = x . clamp (...) ‘ not ‘x . clamp_ (...) ‘

  66. [66]

    detach () ‘, ‘

    NO DETACH / NUMPY : never ‘. detach () ‘, ‘. numpy () ‘, ‘. item () ‘, ‘ torch . no_grad () ‘ in forward

  67. [67]

    clamp ( min =1 e -15) ‘ before sqrt / log , ‘

    N U M E R I C A L : ‘. clamp ( min =1 e -15) ‘ before sqrt / log , ‘. clamp ( max =20.0) ‘ for exp

  68. [68]

    r a w _ l e n g t h s c a l e

    L E N G T H S C A L E : if ‘ h a s _ l e n g t h s c a l e = True ‘ , do NOT ‘ r e g i s t e r _ p a r a m e t e r (" r a w _ l e n g t h s c a l e " , ...) ‘

  69. [69]

    PARAM ORDER : ‘ r e g i s t e r _ p a r a m e t e r ( name , ...) ‘ BEFORE ‘ r e g i s t e r _ c o n s t r a i n t ( name , ...) ‘

  70. [70]

    d iag on al ( dim1 = -2 , dim2 = -1) ‘

    DIAG : ‘ if diag : return covar . d iag on al ( dim1 = -2 , dim2 = -1) ‘

  71. [71]

    sqrt / pi / log

    IMPORTS : include ‘ import math ‘ if using math . sqrt / pi / log

  72. [72]

    NO SELF - R E A S S I G N M E N T in forward () : never ‘ self . x = ... ‘ - use local v a r i a b l e s

  73. [73]

    ONLY USE EX IS TI NG g py tor ch classes

  74. [74]

    NO SUB - KERNELS : do NOT i n s t a n t i a t e gp yt orc h kernel objects ( e . g . ‘ M a t e r n K e r n e l () ‘, ‘ R B F K e r n e l () ‘) as sub - c o m p o n e n t s . They produce L a z y E v a l u a t e d K e r n e l T e n s o r with u n p r e d i c t a b l e shapes . Instead , i m p l e m e n t the formula dir ec tl y ( e . g . for Matern -5/2: ‘...

  75. [75]

    NO DATA - D E P E N D E N T BOUNDS in forward () : never compute bounds / center from the input data ( e . g . ‘ x1 . max ( dim = -2) ‘ or ‘ x1 . min ( dim = -2) ‘) . These change with N and break shape i n v a r i a n c e . Store fixed bounds via ‘ register_buffer ‘ in __init__ , or use c o n s t a n t s ( e . g . 0 and 1)

  76. [76]

    SCALAR P AI RW ISE DI STA NC E : when c o m p u t i n g d is tan ce between scalar values ( e . g . warped radii of shape (... , N , 1) ) , use ‘ torch . cdist ( r1 , r2 , p =2) ‘ which gives (... , N1 , N2 ) . NEVER use ‘ r1 - r2 ‘ or ‘ torch . abs ( r1 - r2 ) ‘ - this gives (... , N1 , 1) not (... , N1 , N2 ) and breaks cross - pair c o m p u t a t i o ...