Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization

Inhyuck Song; Jaewoo Lee; Jinkyoo Park; Taeyoung Yun; Woocheol Shin

arxiv: 2605.20249 · v1 · pith:ZYR3OH67new · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization

Taeyoung Yun , Woocheol Shin , Inhyuck Song , Jaewoo Lee , Jinkyoo Park This is my paper

Pith reviewed 2026-05-21 08:22 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords kernel discoverybayesian optimizationgaussian processeshigh-dimensional optimizationlarge language modelsevolutionary algorithmsautomated machine learning

0 comments

The pith

An LLM-driven evolutionary framework discovers novel Gaussian process kernels for high-dimensional Bayesian optimization without conditioning on raw observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an automated method called Kernel Discovery to design kernels for Gaussian processes used in Bayesian optimization when inputs have many dimensions. Manual kernel design is difficult in these settings, and prior automated approaches are restricted to simple combinations of base kernels or must feed raw data into language models, which hits practical limits. The new approach prompts large language models in two stages to first suggest new mathematical kernel expressions and then convert those expressions into working code. Kernels are then chosen using a leave-one-out continuous ranked probability score that discourages overfitting. Tests on five standard high-dimensional benchmarks show the method ranks first on average among seventeen competing approaches.

Core claim

We introduce Kernel Discovery, a LLM-driven evolutionary framework for high-dimensional BO that searches a broader kernel space beyond predefined composition rules and does not require conditioning on observations. Motivated by the observation that directly prompting an LLM to generate kernel code yields syntactically varied but functionally identical kernels, we adopt a two-stage approach: an LLM first proposes novel mathematical forms, then a second LLM call converts each form into validated, executable code. We also propose a leave-one-out continuous ranked probability score (LOO-CRPS) as a selection criterion that penalizes overfitted kernels. On five high-dimensional BO benchmarks, our

What carries the argument

The two-stage LLM process in which one model proposes novel mathematical kernel forms and a second model turns them into executable code, together with LOO-CRPS selection that penalizes overfitting.

Load-bearing premise

Large language models can reliably propose mathematically novel and functionally distinct kernel forms without access to raw observations, and LOO-CRPS will select kernels that generalize rather than overfit in high-dimensional regimes.

What would settle it

Running Kernel Discovery on a new high-dimensional Bayesian optimization benchmark and checking whether the average rank across methods stays near 1.2 or whether the discovered kernels fail to beat standard automatic relevance determination kernels.

Figures

Figures reproduced from arXiv: 2605.20249 by Inhyuck Song, Jaewoo Lee, Jinkyoo Park, Taeyoung Yun, Woocheol Shin.

**Figure 1.** Figure 1: Motivation Figure. (Left): In high-dimensional BO, a conventional LLM-based BO does not work due to the out-of-context limits and the difficulty of extracting meaningful patterns. (Right): Our pipeline enables us to discover effective kernels for high-dimensional BO beyond compositions. these breakthroughs are promising, identifying such effective priors or transformations still requires extensive manual e… view at source ↗

**Figure 2.** Figure 2: Overview of the method. Given an initial population, we instruct an LLM to generate novel [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Prompts for Kernel Discovery. To discover novel kernels for high-dimensional BO, we [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Learning curve of various methods across standard benchmarks. We visualize the top-2 [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Ablation studies on the two-stage approach for kernel discovery. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation on each component of Kernel Discovery [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Post-analysis on kernel discovery. Both experiments are conducted in SVM benchmark. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: Mathematical form discovery prompt template [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: Code conversion prompt template. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 10.** Figure 10: Composition prompt template. Kernel Validation Failure Rates. The failure probability in Tables 2 and 3 denotes the empirical fraction of LLM-generated kernels rejected by our structural validator. A generated kernel knew is accepted only when V(knew) = Vagn(knew) ∧ Vpsd(knew) = 1. Here, Vagn checks whether the generated kernel implementation is dimension- and shape-agnostic, as required by the BO pipelin… view at source ↗

**Figure 11.** Figure 11: Full learning curves for all methods across five benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗

**Figure 12.** Figure 12: Functional redundancy example from direct code generation. Both snippets compute [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗

**Figure 13.** Figure 13: Functional redundancy example from direct code generation. Both snippets compute [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗

**Figure 14.** Figure 14: Functional redundancy example from direct code generation. Both snippets compute [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗

**Figure 15.** Figure 15: Ablation studies on each component of our method across other benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p025_15.png] view at source ↗

**Figure 16.** Figure 16: Ablation studies on the base population for Compositional Search and CAKE baselines. [PITH_FULL_IMAGE:figures/full_fig_p026_16.png] view at source ↗

**Figure 17.** Figure 17: Ablation studies on the evaluation metric for Compositional Search and CAKE baselines. [PITH_FULL_IMAGE:figures/full_fig_p026_17.png] view at source ↗

**Figure 18.** Figure 18: Representative cross-covariance failures caught by [PITH_FULL_IMAGE:figures/full_fig_p027_18.png] view at source ↗

**Figure 19.** Figure 19: Representative PSD failures caught by Vpsd. Both include terms that do not generally preserve positive semi-definiteness and can yield indefinite Gram matrices. Problematic terms are highlighted in red. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_19.png] view at source ↗

**Figure 20.** Figure 20: Forward code for the discovered kernel k1. 1 def forward ( self , x1 , x2 , diag = False , ** params ): 2 if x1 . dim () == 1: x1 = x1 . unsqueeze ( -1) 3 if x2 . dim () == 1: x2 = x2 . unsqueeze ( -1) 4 arc_weight = self . raw_arc_weight_constraint . transform ( self . raw_arc_weight ) 5 global_scale = self . raw_global_scale_constraint . transform ( self . raw_global_scale ) 6 angular_weights = self . r… view at source ↗

**Figure 21.** Figure 21: Forward code for the discovered kernel k2. 30 [PITH_FULL_IMAGE:figures/full_fig_p030_21.png] view at source ↗

**Figure 22.** Figure 22: Forward code for the discovered kernel k3. Evolving Population on Other Benchmarks [PITH_FULL_IMAGE:figures/full_fig_p031_22.png] view at source ↗

**Figure 23.** Figure 23: Post-analysis of kernel discovery on four additional benchmarks. For each benchmark, we [PITH_FULL_IMAGE:figures/full_fig_p031_23.png] view at source ↗

**Figure 24.** Figure 24: Analysis on boundary-seeking behavior of several kernels. [PITH_FULL_IMAGE:figures/full_fig_p033_24.png] view at source ↗

**Figure 25.** Figure 25: Code snippet of the discovered kernel from SVM benchmark. [PITH_FULL_IMAGE:figures/full_fig_p034_25.png] view at source ↗

**Figure 26.** Figure 26: Code snippet of an additional discovered kernel with strong transferability. This kernel [PITH_FULL_IMAGE:figures/full_fig_p035_26.png] view at source ↗

**Figure 27.** Figure 27: Code snippet of another additional discovered kernel with strong transferability. This [PITH_FULL_IMAGE:figures/full_fig_p036_27.png] view at source ↗

read the original abstract

Gaussian Process (GP) kernels are central to Bayesian optimization (BO), yet designing effective kernels for high-dimensional problems still relies on extensive manual engineering. Existing automated approaches struggle in high dimensions for two bottlenecks: their kernel search space is limited to additions and multiplications of base kernels, and LLM-based approaches require conditioning on raw observations, which becomes infeasible due to context-length limits and the difficulty of extracting meaningful patterns. We introduce \textbf{Kernel Discovery}, a LLM-driven evolutionary framework for high-dimensional BO that searches a broader kernel space beyond predefined composition rules and does not require conditioning on observations. Motivated by the observation that directly prompting an LLM to generate kernel code yields syntactically varied but functionally identical kernels, we adopt a two-stage approach: an LLM first proposes novel mathematical forms, then a second LLM call converts each form into validated, executable code. We also propose a leave-one-out continuous ranked probability score (LOO-CRPS) as a selection criterion that penalizes overfitted kernels. On five high-dimensional BO benchmarks, our method achieves an average rank of \textbf{1.2 out of 17}, outperforming competitive baselines. We further analyze the discovered kernels to identify which kernels lead to improvements in high-dimensional BO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Kernel Discovery, an LLM-driven evolutionary framework for automated kernel design in high-dimensional Bayesian optimization. It uses a two-stage process in which one LLM proposes novel mathematical kernel forms and a second converts them to executable code, avoiding direct observation conditioning to sidestep context-length limits. A leave-one-out continuous ranked probability score (LOO-CRPS) serves as the selection criterion to penalize overfitting. The method is evaluated on five high-dimensional BO benchmarks, where it reports an average rank of 1.2 out of 17 competing approaches, and the discovered kernels are further analyzed to identify properties that improve high-dimensional BO performance.

Significance. If the performance claims are substantiated, the work offers a practical advance in automated kernel engineering for GPs in regimes where manual design is intractable. The two-stage LLM proposal and LOO-CRPS selection address documented bottlenecks in prior search-based and LLM-based kernel methods. Credit is due for the broader kernel space explored beyond fixed composition rules and for the explicit analysis of which kernel characteristics benefit high-dimensional BO. The empirical nature of the contribution makes rigorous experimental reporting essential to its significance.

major comments (3)

[§4] §4 (Experimental results): the headline claim of average rank 1.2 out of 17 is presented without the number of independent runs, standard deviations across runs, or any statistical significance tests (e.g., paired Wilcoxon or Friedman tests) against the 16 baselines. This information is load-bearing for the central performance assertion.
[§3.3] §3.3 (LOO-CRPS selection): the criterion is motivated as penalizing overfit kernels, yet the manuscript provides no ablation or held-out generalization test demonstrating that LOO-CRPS prefers kernels that extrapolate rather than fit noise in the sparse high-dimensional regime. Given the skeptic concern that leave-one-out on limited data may still reward excessive flexibility, this validation is required to support the selection mechanism.
[§3.2] §3.2 (two-stage proposal): the claim that the two-stage LLM process yields functionally distinct kernels is central to the broader search space argument, but the paper does not quantify functional diversity (e.g., via kernel Gram-matrix distances or predictive behavior on synthetic functions) or describe explicit diversity maintenance in the evolutionary loop.

minor comments (2)

[Abstract / §4] The abstract and §4 refer to “five high-dimensional BO benchmarks” without naming them or providing a summary table of their dimensions and characteristics; explicit identification would aid reproducibility.
[§3.3] Notation for the LOO-CRPS formula in §3.3 could be clarified by explicitly stating the predictive distribution used for each left-out point.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects for strengthening the experimental rigor and supporting arguments. We address each major comment point by point below, indicating the revisions we plan to make.

read point-by-point responses

Referee: [§4] §4 (Experimental results): the headline claim of average rank 1.2 out of 17 is presented without the number of independent runs, standard deviations across runs, or any statistical significance tests (e.g., paired Wilcoxon or Friedman tests) against the 16 baselines. This information is load-bearing for the central performance assertion.

Authors: We agree that the experimental reporting in §4 requires these details to fully substantiate the performance claims. In the revised manuscript, we will explicitly state the number of independent runs, report standard deviations across runs, and include statistical significance tests such as the Friedman test with post-hoc analysis to compare our method against the baselines. revision: yes
Referee: [§3.3] §3.3 (LOO-CRPS selection): the criterion is motivated as penalizing overfit kernels, yet the manuscript provides no ablation or held-out generalization test demonstrating that LOO-CRPS prefers kernels that extrapolate rather than fit noise in the sparse high-dimensional regime. Given the skeptic concern that leave-one-out on limited data may still reward excessive flexibility, this validation is required to support the selection mechanism.

Authors: We appreciate the referee's concern regarding the validation of LOO-CRPS. While the criterion is intended to penalize overfitting through its leave-one-out predictive evaluation, we agree that additional evidence would strengthen the case. In the revised manuscript, we will add an ablation study in §3.3 comparing LOO-CRPS to other selection criteria, along with a held-out generalization test on synthetic high-dimensional data to show that it favors kernels with better extrapolation properties. revision: yes
Referee: [§3.2] §3.2 (two-stage proposal): the claim that the two-stage LLM process yields functionally distinct kernels is central to the broader search space argument, but the paper does not quantify functional diversity (e.g., via kernel Gram-matrix distances or predictive behavior on synthetic functions) or describe explicit diversity maintenance in the evolutionary loop.

Authors: We thank the referee for this observation. To better support the claim that the two-stage process enables a functionally broader kernel space, we will revise §3.2 to include quantitative analysis of functional diversity, for example by computing distances between kernel Gram matrices evaluated on synthetic functions and by examining predictive performance differences. We will also describe the diversity-promoting aspects of the evolutionary loop, such as the mutation and selection procedures. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical search procedure with external benchmark evaluation

full rationale

The paper introduces an LLM-driven evolutionary framework for automated kernel discovery in high-dimensional BO. It relies on a two-stage proposal process and LOO-CRPS selection, then reports comparative performance (avg rank 1.2/17) on five external benchmarks. No mathematical derivations, first-principles predictions, or fitted parameters are presented that could reduce to self-definitions or self-citations. The central claims rest on algorithmic description plus independent experimental results rather than any internal consistency loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unverified capability of current LLMs to generate useful novel kernel expressions and on the effectiveness of LOO-CRPS as a generalizable selection metric; no explicit free parameters or invented physical entities are described.

axioms (2)

domain assumption LLMs prompted directly produce syntactically varied but functionally identical kernels, motivating the two-stage separation of form proposal and code generation.
Stated as motivation in the abstract for adopting the two-stage approach.
domain assumption LOO-CRPS penalizes overfitting kernels more effectively than standard likelihood in high-dimensional BO settings.
Introduced as the selection criterion without further justification in the abstract.

pith-pipeline@v0.9.0 · 5760 in / 1408 out tokens · 44341 ms · 2026-05-21T08:22:27.705539+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 5 internal anchors

[1]

Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020

Ryan Turner, David Eriksson, Michael McCourt, Juha Kiili, Eero Laaksonen, Zhen Xu, and Isabelle Guyon. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. InNeurIPS 2020 Competition and Demonstration Track, 2021

work page 2020
[2]

Pre-training helps bayesian optimization too

Zi Wang, George E Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, and Zoubin Ghahramani. Pre-training helps bayesian optimization too. InICML Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2022

work page 2022
[3]

Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, and Michael A Osborne. Raiders of the lost architecture: Kernels for bayesian optimization in conditional parameter spaces. arXiv preprint arXiv:1409.4011, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[4]

Neural architecture search with bayesian optimisation and optimal transport

Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric P Xing. Neural architecture search with bayesian optimisation and optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

work page 2018
[5]

Local latent space bayesian optimization over structured inputs

Natalie Maus, Haydn Jones, Juston Moore, Matt J Kusner, John Bradshaw, and Jacob Gard- ner. Local latent space bayesian optimization over structured inputs. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[6]

Latent bayesian optimization via autoregressive normalizing flows

Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, and Hyunwoo J Kim. Latent bayesian optimization via autoregressive normalizing flows. InInternational Conference on Learning Representations (ICLR), 2025

work page 2025
[7]

Learning search space partition for black-box optimization using monte carlo tree search

Linnan Wang, Rodrigo Fonseca, and Yuandong Tian. Learning search space partition for black-box optimization using monte carlo tree search. InAdvances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020
[8]

Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics.Machine Learning, 2023

Felix Berkenkamp, Andreas Krause, and Angela P Schoellig. Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics.Machine Learning, 2023

work page 2023
[9]

A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 1964

Harold J Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 1964

work page 1964
[10]

Cambridge University Press, 2023

Roman Garnett.Bayesian optimization. Cambridge University Press, 2023

work page 2023
[11]

Additive gaussian processes

David K Duvenaud, Hannes Nickisch, and Carl Rasmussen. Additive gaussian processes. In Advances in Neural Information Processing Systems (NeurIPS), 2011

work page 2011
[12]

High dimensional bayesian optimisation and bandits via additive models

Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional bayesian optimisation and bandits via additive models. InInternational Conference on Machine Learning (ICML), 2015

work page 2015
[13]

Discovering and exploiting additive structure for bayesian optimization

Jacob Gardner, Chuan Guo, Kilian Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2017

work page 2017
[14]

Additive gaussian processes revisited

Xiaoyu Lu, Alexis Boukouvalas, and James Hensman. Additive gaussian processes revisited. InInternational Conference on Machine Learning (ICML), 2022

work page 2022
[15]

Bayesian optimization in a billion dimensions via random embeddings.Journal of Artificial Intelligence Research, 2016

Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, and Nando De Feitas. Bayesian optimization in a billion dimensions via random embeddings.Journal of Artificial Intelligence Research, 2016

work page 2016
[16]

A framework for bayesian optimization in embedded subspaces

Amin Nayebi, Alexander Munteanu, and Matthias Poloczek. A framework for bayesian optimization in embedded subspaces. InInternational Conference on Machine Learning (ICML), 2019

work page 2019
[17]

Re-examining linear em- beddings for high-dimensional bayesian optimization

Ben Letham, Roberto Calandra, Akshara Rai, and Eytan Bakshy. Re-examining linear em- beddings for high-dimensional bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 10

work page 2020
[18]

High-dimensional bayesian optimization with sparse axis-aligned subspaces

David Eriksson and Martin Jankowiak. High-dimensional bayesian optimization with sparse axis-aligned subspaces. InConference on Uncertainty in Artificial Intelligence (UAI), 2021

work page 2021
[19]

Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces

Leonard Papenmeier, Luigi Nardi, and Matthias Poloczek. Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[20]

Scalable global optimization via local bayesian optimization

David Eriksson, Michael Pearce, Jacob Gardner, Ryan D Turner, and Matthias Poloczek. Scalable global optimization via local bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

work page 2019
[21]

Scalable constrained bayesian optimization

David Eriksson and Matthias Poloczek. Scalable constrained bayesian optimization. InInterna- tional Conference on Artificial Intelligence and Statistics (AISTATS), 2021

work page 2021
[22]

Joint composite latent space bayesian optimization

Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, and Eytan Bakshy. Joint composite latent space bayesian optimization. InInternational Conference on Machine Learning (ICML), 2024

work page 2024
[23]

Vanilla bayesian optimization performs great in high dimensions

Carl Hvarfner, Erik Orm Hellsten, and Luigi Nardi. Vanilla bayesian optimization performs great in high dimensions. InInternational Conference on Machine Learning (ICML), 2024

work page 2024
[24]

Standard gaussian process is all you need for high-dimensional bayesian optimization

Zhitong Xu, Haitao Wang, Jeff M Phillips, and Shandian Zhe. Standard gaussian process is all you need for high-dimensional bayesian optimization. InInternational Conference on Learning Representations (ICLR), 2025

work page 2025
[25]

Bock: Bayesian optimization with cylindrical kernels

ChangYong Oh, Efstratios Gavves, and Max Welling. Bock: Bayesian optimization with cylindrical kernels. InInternational Conference on Machine Learning (ICML), 2018

work page 2018
[26]

We still don’t understand high-dimensional bayesian optimization

Colin Doumont, Donney Fan, Natalie Maus, Jacob R Gardner, Henry Moss, and Geoff Pleiss. We still don’t understand high-dimensional bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2026

work page 2026
[27]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[28]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[29]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Mathematical discoveries from program search with large language models

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 2024

work page 2024
[31]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery.arXiv preprint arXiv:2506.13131, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[32]

Funbo: Discovering acquisition functions for bayesian optimization with funsearch

Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco Ruiz, Alan Malek, Alexis Bellot, and Silvia Chiappa. Funbo: Discovering acquisition functions for bayesian optimization with funsearch. InInternational Conference on Machine Learning (ICML), 2025

work page 2025
[33]

Trajevo: Trajectory prediction heuristics design via llm-driven evolution

Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, and Jinkyoo Park. Trajevo: Trajectory prediction heuristics design via llm-driven evolution. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026. 11

work page 2026
[34]

Self-evolverec: Self-evolving recommender systems with llm-based directional feedback.arXiv preprint arXiv:2602.12612, 2026

Sein Kim, Sangwu Park, Hongseok Kang, Wonjoong Kim, Jimin Seo, Yeonjun In, Kanghoon Yoon, and Chanyoung Park. Self-evolverec: Self-evolving recommender systems with llm-based directional feedback.arXiv preprint arXiv:2602.12612, 2026

work page arXiv 2026
[35]

Adaptive and safe bayesian optimization in high dimensions via one-dimensional subspaces

Johannes Kirschner, Mojmir Mutny, Nicole Hiller, Rasmus Ischebeck, and Andreas Krause. Adaptive and safe bayesian optimization in high dimensions via one-dimensional subspaces. In International Conference on Machine Learning (ICML), 2019

work page 2019
[36]

Monte carlo tree search based variable selection for high dimensional bayesian optimization

Lei Song, Ke Xue, Xiaobin Huang, and Chao Qian. Monte carlo tree search based variable selection for high dimensional bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[37]

O., Hvarfner, C., Papenmeier, L., and Nardi, L

Erik Orm Hellsten, Carl Hvarfner, Leonard Papenmeier, and Luigi Nardi. High-dimensional bayesian optimization with group testing.arXiv preprint arXiv:2310.03515, 2023

work page arXiv 2023
[38]

Multi-objective bayesian optimization over high-dimensional search spaces

Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Multi-objective bayesian optimization over high-dimensional search spaces. InConference on Uncertainty in Artificial Intelligence (UAI), 2022

work page 2022
[39]

Discovering many diverse solutions with bayesian optimization

Natalie Maus, Kaiwen Wu, David Eriksson, and Jacob Gardner. Discovering many diverse solutions with bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2023

work page 2023
[40]

Feasibility-driven trust region bayesian optimization

Paolo Ascia, Elena Raponi, Thomas Bäck, and Fabian Duddeck. Feasibility-driven trust region bayesian optimization. InInternational Conference on Automated Machine Learning (AutoML), 2025

work page 2025
[41]

Understanding high-dimensional bayesian optimization

Leonard Papenmeier, Matthias Poloczek, and Luigi Nardi. Understanding high-dimensional bayesian optimization. InInternational Conference on Machine Learning (ICML), 2025

work page 2025
[42]

Cylindrical thompson sampling for high-dimensional bayesian optimization

Bahador Rashidi, Kerrick Johnstonbaugh, and Chao Gao. Cylindrical thompson sampling for high-dimensional bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024

work page 2024
[43]

Adaptive acquisition selection for bayesian optimization with large language models

Giang Ngo, Dat Phan Trong, Dang Nguyen, Sunil Gupta, and Svetha Venkatesh. Adaptive acquisition selection for bayesian optimization with large language models. InInternational Conference on Learning Representations (ICLR), 2026

work page 2026
[44]

Llamea-bo: A large language model evolutionary algorithm for automatically generating bayesian optimization algorithms

Wenhu Li, Niki van Stein, Thomas Bäck, and Elena Raponi. Llamea-bo: A large language model evolutionary algorithm for automatically generating bayesian optimization algorithms. arXiv preprint arXiv:2505.21034, 2025

work page arXiv 2025
[45]

Large language models to enhance bayesian optimization

Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance bayesian optimization. InInternational Conference on Learning Representa- tions (ICLR), 2024

work page 2024
[46]

Adaptive kernel design for bayesian optimization is a piece of cake with llms

Richard Cornelius Suwandi, Feng Yin, Juntao Wang, Renjie Li, Tsung-Hui Chang, and Sergios Theodoridis. Adaptive kernel design for bayesian optimization is a piece of cake with llms. In Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[47]

We are underselling the modularity of Bayesian optimization

Austin Tripp. We are underselling the modularity of Bayesian optimization. https://www. austintripp.ca/blog/2026-02-09-bo-modularity/, 2026. Accessed: 2026-04-14

work page 2026
[48]

Gaussian processes in machine learning

Carl Edward Rasmussen. Gaussian processes in machine learning. InSummer School on Machine Learning. Springer, 2003

work page 2003
[49]

The application of bayesian methods for seeking the extremum.Towards Global Optimization, 1998

Jonas Mockus. The application of bayesian methods for seeking the extremum.Towards Global Optimization, 1998

work page 1998
[50]

A limited memory algorithm for bound constrained optimization.SIAM Journal on Scientific Computing, 1995

Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization.SIAM Journal on Scientific Computing, 1995

work page 1995
[51]

Gaussian processes for regression

Christopher Williams and Carl Rasmussen. Gaussian processes for regression. InAdvances in Neural Information Processing Systems (NeurIPS), 1995. 12

work page 1995
[52]

Batched large-scale bayesian optimization in high-dimensional spaces

Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale bayesian optimization in high-dimensional spaces. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2018

work page 2018
[53]

Lassobench: A high- dimensional hyperparameter optimization benchmark suite for lasso

Kenan Šehi ´c, Alexandre Gramfort, Joseph Salmon, and Luigi Nardi. Lassobench: A high- dimensional hyperparameter optimization benchmark suite for lasso. InInternational Confer- ence on Automated Machine Learning (AutoML), 2022

work page 2022
[54]

An experimental study in adaptive kernel selection for bayesian optimization.IEEE Access, 2019

Ibai Roman, Roberto Santana, Alexander Mendiburu, and Jose A Lozano. An experimental study in adaptive kernel selection for bayesian optimization.IEEE Access, 2019

work page 2019
[55]

Structure discovery in nonparametric regression through compositional kernel search

David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. Structure discovery in nonparametric regression through compositional kernel search. In International Conference on Machine Learning (ICML), 2013

work page 2013
[56]

Unexpected improvements to expected improvement for bayesian optimization

Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Unexpected improvements to expected improvement for bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023
[57]

Construction and comparison of high-dimensional sobol’generators.Wilmott, 2011

Ilya M Sobol’, Danil Asotsky, Alexander Kreinin, and Sergei Kucherenko. Construction and comparison of high-dimensional sobol’generators.Wilmott, 2011. 13 Appendix A Task Details We evaluate our method on five standard high-dimensional BO benchmarks covering a range of problem types and dimensionalities. Rover (D= 100 ).The Rover benchmark [ 52] is a traj...

work page 2011
[58]

The system calls ‘ E v o l v e d K e r n e l ( a r d _ n u m _ d i m s = D ) ‘

_ _i nit __ S I G N A T U R E : ‘ __init__ ‘ MUST accept ‘ a r d _ n u m _ d i m s : int ‘ as the ONLY re qu ir ed p a r a m e t e r . The system calls ‘ E v o l v e d K e r n e l ( a r d _ n u m _ d i m s = D ) ‘. All other args MUST have d ef au lt s . NEVER add re qu ire d args like ‘q : int ‘ , ‘ center : Tensor ‘ , ‘M : int ‘

work page
[59]

FORWARD S I G N A T U R E : ‘ def forward ( self , x1 , x2 , diag = False , ** params ) ‘ - always accept ** params

work page
[60]

NEVER use ‘

BATCH - SAFE SHAPES ( C RI TIC AL ) : The kernel is tested with these exact shapes - your code MUST handle ALL of them : 13- 2 D : x1 =(5 , D ) vs x2 =(1 , D ) -> output must be (5 , 1) , NOT (5 , 5) 14- 2 D : x1 =(3 , D ) vs x2 =(7 , D ) -> output must be (3 , 7) 15- 3 D batched ( BoTorch acqf o p t i m i z a t i o n ) : x1 =(1 , 4 , D ) vs x2 =(1 , 3 , ...

work page
[61]

, N1 , N2 ) for ANY N1 != N2

OUTPUT SHAPE : return (... , N1 , N2 ) for ANY N1 != N2 . If your output is ( N1 , N1 ) instead of ( N1 , N2 ) , you have a bug

work page
[62]

cdist ( x1s , x2s , p =2) ‘ - NEVER ‘ x1

D IS TAN CE : ‘ torch . cdist ( x1s , x2s , p =2) ‘ - NEVER ‘ x1 . u n s q u e e z e ( -2) - x2 . u n s q u e e z e ( -3) ‘ ( OOM )

work page
[63]

t r a n s p o s e ( -1 , -2) ‘

INNER PRODUCT : ‘ x1 @ x2 . t r a n s p o s e ( -1 , -2) ‘

work page
[64]

device ‘

DEVICE : new tensors in forward use ‘ device = x1 . device ‘. Use ‘ register_buffer ‘ in __ in it __

work page
[65]

NO IN - PLACE : ‘x = x + y ‘ not ‘x += y ‘. ‘x = x . clamp (...) ‘ not ‘x . clamp_ (...) ‘

work page
[66]

detach () ‘, ‘

NO DETACH / NUMPY : never ‘. detach () ‘, ‘. numpy () ‘, ‘. item () ‘, ‘ torch . no_grad () ‘ in forward

work page
[67]

clamp ( min =1 e -15) ‘ before sqrt / log , ‘

N U M E R I C A L : ‘. clamp ( min =1 e -15) ‘ before sqrt / log , ‘. clamp ( max =20.0) ‘ for exp

work page
[68]

r a w _ l e n g t h s c a l e

L E N G T H S C A L E : if ‘ h a s _ l e n g t h s c a l e = True ‘ , do NOT ‘ r e g i s t e r _ p a r a m e t e r (" r a w _ l e n g t h s c a l e " , ...) ‘

work page
[69]

PARAM ORDER : ‘ r e g i s t e r _ p a r a m e t e r ( name , ...) ‘ BEFORE ‘ r e g i s t e r _ c o n s t r a i n t ( name , ...) ‘

work page
[70]

d iag on al ( dim1 = -2 , dim2 = -1) ‘

DIAG : ‘ if diag : return covar . d iag on al ( dim1 = -2 , dim2 = -1) ‘

work page
[71]

sqrt / pi / log

IMPORTS : include ‘ import math ‘ if using math . sqrt / pi / log

work page
[72]

NO SELF - R E A S S I G N M E N T in forward () : never ‘ self . x = ... ‘ - use local v a r i a b l e s

work page
[73]

ONLY USE EX IS TI NG g py tor ch classes

work page
[74]

NO SUB - KERNELS : do NOT i n s t a n t i a t e gp yt orc h kernel objects ( e . g . ‘ M a t e r n K e r n e l () ‘, ‘ R B F K e r n e l () ‘) as sub - c o m p o n e n t s . They produce L a z y E v a l u a t e d K e r n e l T e n s o r with u n p r e d i c t a b l e shapes . Instead , i m p l e m e n t the formula dir ec tl y ( e . g . for Matern -5/2: ‘...

work page
[75]

NO DATA - D E P E N D E N T BOUNDS in forward () : never compute bounds / center from the input data ( e . g . ‘ x1 . max ( dim = -2) ‘ or ‘ x1 . min ( dim = -2) ‘) . These change with N and break shape i n v a r i a n c e . Store fixed bounds via ‘ register_buffer ‘ in __init__ , or use c o n s t a n t s ( e . g . 0 and 1)

work page
[76]

SCALAR P AI RW ISE DI STA NC E : when c o m p u t i n g d is tan ce between scalar values ( e . g . warped radii of shape (... , N , 1) ) , use ‘ torch . cdist ( r1 , r2 , p =2) ‘ which gives (... , N1 , N2 ) . NEVER use ‘ r1 - r2 ‘ or ‘ torch . abs ( r1 - r2 ) ‘ - this gives (... , N1 , 1) not (... , N1 , N2 ) and breaks cross - pair c o m p u t a t i o ...

work page

[1] [1]

Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020

Ryan Turner, David Eriksson, Michael McCourt, Juha Kiili, Eero Laaksonen, Zhen Xu, and Isabelle Guyon. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. InNeurIPS 2020 Competition and Demonstration Track, 2021

work page 2020

[2] [2]

Pre-training helps bayesian optimization too

Zi Wang, George E Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, and Zoubin Ghahramani. Pre-training helps bayesian optimization too. InICML Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2022

work page 2022

[3] [3]

Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, and Michael A Osborne. Raiders of the lost architecture: Kernels for bayesian optimization in conditional parameter spaces. arXiv preprint arXiv:1409.4011, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[4] [4]

Neural architecture search with bayesian optimisation and optimal transport

Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric P Xing. Neural architecture search with bayesian optimisation and optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

work page 2018

[5] [5]

Local latent space bayesian optimization over structured inputs

Natalie Maus, Haydn Jones, Juston Moore, Matt J Kusner, John Bradshaw, and Jacob Gard- ner. Local latent space bayesian optimization over structured inputs. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022

[6] [6]

Latent bayesian optimization via autoregressive normalizing flows

Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, and Hyunwoo J Kim. Latent bayesian optimization via autoregressive normalizing flows. InInternational Conference on Learning Representations (ICLR), 2025

work page 2025

[7] [7]

Learning search space partition for black-box optimization using monte carlo tree search

Linnan Wang, Rodrigo Fonseca, and Yuandong Tian. Learning search space partition for black-box optimization using monte carlo tree search. InAdvances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020

[8] [8]

Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics.Machine Learning, 2023

Felix Berkenkamp, Andreas Krause, and Angela P Schoellig. Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics.Machine Learning, 2023

work page 2023

[9] [9]

A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 1964

Harold J Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 1964

work page 1964

[10] [10]

Cambridge University Press, 2023

Roman Garnett.Bayesian optimization. Cambridge University Press, 2023

work page 2023

[11] [11]

Additive gaussian processes

David K Duvenaud, Hannes Nickisch, and Carl Rasmussen. Additive gaussian processes. In Advances in Neural Information Processing Systems (NeurIPS), 2011

work page 2011

[12] [12]

High dimensional bayesian optimisation and bandits via additive models

Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional bayesian optimisation and bandits via additive models. InInternational Conference on Machine Learning (ICML), 2015

work page 2015

[13] [13]

Discovering and exploiting additive structure for bayesian optimization

Jacob Gardner, Chuan Guo, Kilian Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2017

work page 2017

[14] [14]

Additive gaussian processes revisited

Xiaoyu Lu, Alexis Boukouvalas, and James Hensman. Additive gaussian processes revisited. InInternational Conference on Machine Learning (ICML), 2022

work page 2022

[15] [15]

Bayesian optimization in a billion dimensions via random embeddings.Journal of Artificial Intelligence Research, 2016

Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, and Nando De Feitas. Bayesian optimization in a billion dimensions via random embeddings.Journal of Artificial Intelligence Research, 2016

work page 2016

[16] [16]

A framework for bayesian optimization in embedded subspaces

Amin Nayebi, Alexander Munteanu, and Matthias Poloczek. A framework for bayesian optimization in embedded subspaces. InInternational Conference on Machine Learning (ICML), 2019

work page 2019

[17] [17]

Re-examining linear em- beddings for high-dimensional bayesian optimization

Ben Letham, Roberto Calandra, Akshara Rai, and Eytan Bakshy. Re-examining linear em- beddings for high-dimensional bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 10

work page 2020

[18] [18]

High-dimensional bayesian optimization with sparse axis-aligned subspaces

David Eriksson and Martin Jankowiak. High-dimensional bayesian optimization with sparse axis-aligned subspaces. InConference on Uncertainty in Artificial Intelligence (UAI), 2021

work page 2021

[19] [19]

Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces

Leonard Papenmeier, Luigi Nardi, and Matthias Poloczek. Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022

[20] [20]

Scalable global optimization via local bayesian optimization

David Eriksson, Michael Pearce, Jacob Gardner, Ryan D Turner, and Matthias Poloczek. Scalable global optimization via local bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

work page 2019

[21] [21]

Scalable constrained bayesian optimization

David Eriksson and Matthias Poloczek. Scalable constrained bayesian optimization. InInterna- tional Conference on Artificial Intelligence and Statistics (AISTATS), 2021

work page 2021

[22] [22]

Joint composite latent space bayesian optimization

Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, and Eytan Bakshy. Joint composite latent space bayesian optimization. InInternational Conference on Machine Learning (ICML), 2024

work page 2024

[23] [23]

Vanilla bayesian optimization performs great in high dimensions

Carl Hvarfner, Erik Orm Hellsten, and Luigi Nardi. Vanilla bayesian optimization performs great in high dimensions. InInternational Conference on Machine Learning (ICML), 2024

work page 2024

[24] [24]

Standard gaussian process is all you need for high-dimensional bayesian optimization

Zhitong Xu, Haitao Wang, Jeff M Phillips, and Shandian Zhe. Standard gaussian process is all you need for high-dimensional bayesian optimization. InInternational Conference on Learning Representations (ICLR), 2025

work page 2025

[25] [25]

Bock: Bayesian optimization with cylindrical kernels

ChangYong Oh, Efstratios Gavves, and Max Welling. Bock: Bayesian optimization with cylindrical kernels. InInternational Conference on Machine Learning (ICML), 2018

work page 2018

[26] [26]

We still don’t understand high-dimensional bayesian optimization

Colin Doumont, Donney Fan, Natalie Maus, Jacob R Gardner, Henry Moss, and Geoff Pleiss. We still don’t understand high-dimensional bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2026

work page 2026

[27] [27]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[28] [28]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[29] [29]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

Mathematical discoveries from program search with large language models

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 2024

work page 2024

[31] [31]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery.arXiv preprint arXiv:2506.13131, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[32] [32]

Funbo: Discovering acquisition functions for bayesian optimization with funsearch

Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco Ruiz, Alan Malek, Alexis Bellot, and Silvia Chiappa. Funbo: Discovering acquisition functions for bayesian optimization with funsearch. InInternational Conference on Machine Learning (ICML), 2025

work page 2025

[33] [33]

Trajevo: Trajectory prediction heuristics design via llm-driven evolution

Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, and Jinkyoo Park. Trajevo: Trajectory prediction heuristics design via llm-driven evolution. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026. 11

work page 2026

[34] [34]

Self-evolverec: Self-evolving recommender systems with llm-based directional feedback.arXiv preprint arXiv:2602.12612, 2026

Sein Kim, Sangwu Park, Hongseok Kang, Wonjoong Kim, Jimin Seo, Yeonjun In, Kanghoon Yoon, and Chanyoung Park. Self-evolverec: Self-evolving recommender systems with llm-based directional feedback.arXiv preprint arXiv:2602.12612, 2026

work page arXiv 2026

[35] [35]

Adaptive and safe bayesian optimization in high dimensions via one-dimensional subspaces

Johannes Kirschner, Mojmir Mutny, Nicole Hiller, Rasmus Ischebeck, and Andreas Krause. Adaptive and safe bayesian optimization in high dimensions via one-dimensional subspaces. In International Conference on Machine Learning (ICML), 2019

work page 2019

[36] [36]

Monte carlo tree search based variable selection for high dimensional bayesian optimization

Lei Song, Ke Xue, Xiaobin Huang, and Chao Qian. Monte carlo tree search based variable selection for high dimensional bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022

[37] [37]

O., Hvarfner, C., Papenmeier, L., and Nardi, L

Erik Orm Hellsten, Carl Hvarfner, Leonard Papenmeier, and Luigi Nardi. High-dimensional bayesian optimization with group testing.arXiv preprint arXiv:2310.03515, 2023

work page arXiv 2023

[38] [38]

Multi-objective bayesian optimization over high-dimensional search spaces

Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Multi-objective bayesian optimization over high-dimensional search spaces. InConference on Uncertainty in Artificial Intelligence (UAI), 2022

work page 2022

[39] [39]

Discovering many diverse solutions with bayesian optimization

Natalie Maus, Kaiwen Wu, David Eriksson, and Jacob Gardner. Discovering many diverse solutions with bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2023

work page 2023

[40] [40]

Feasibility-driven trust region bayesian optimization

Paolo Ascia, Elena Raponi, Thomas Bäck, and Fabian Duddeck. Feasibility-driven trust region bayesian optimization. InInternational Conference on Automated Machine Learning (AutoML), 2025

work page 2025

[41] [41]

Understanding high-dimensional bayesian optimization

Leonard Papenmeier, Matthias Poloczek, and Luigi Nardi. Understanding high-dimensional bayesian optimization. InInternational Conference on Machine Learning (ICML), 2025

work page 2025

[42] [42]

Cylindrical thompson sampling for high-dimensional bayesian optimization

Bahador Rashidi, Kerrick Johnstonbaugh, and Chao Gao. Cylindrical thompson sampling for high-dimensional bayesian optimization. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024

work page 2024

[43] [43]

Adaptive acquisition selection for bayesian optimization with large language models

Giang Ngo, Dat Phan Trong, Dang Nguyen, Sunil Gupta, and Svetha Venkatesh. Adaptive acquisition selection for bayesian optimization with large language models. InInternational Conference on Learning Representations (ICLR), 2026

work page 2026

[44] [44]

Llamea-bo: A large language model evolutionary algorithm for automatically generating bayesian optimization algorithms

Wenhu Li, Niki van Stein, Thomas Bäck, and Elena Raponi. Llamea-bo: A large language model evolutionary algorithm for automatically generating bayesian optimization algorithms. arXiv preprint arXiv:2505.21034, 2025

work page arXiv 2025

[45] [45]

Large language models to enhance bayesian optimization

Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance bayesian optimization. InInternational Conference on Learning Representa- tions (ICLR), 2024

work page 2024

[46] [46]

Adaptive kernel design for bayesian optimization is a piece of cake with llms

Richard Cornelius Suwandi, Feng Yin, Juntao Wang, Renjie Li, Tsung-Hui Chang, and Sergios Theodoridis. Adaptive kernel design for bayesian optimization is a piece of cake with llms. In Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025

[47] [47]

We are underselling the modularity of Bayesian optimization

Austin Tripp. We are underselling the modularity of Bayesian optimization. https://www. austintripp.ca/blog/2026-02-09-bo-modularity/, 2026. Accessed: 2026-04-14

work page 2026

[48] [48]

Gaussian processes in machine learning

Carl Edward Rasmussen. Gaussian processes in machine learning. InSummer School on Machine Learning. Springer, 2003

work page 2003

[49] [49]

The application of bayesian methods for seeking the extremum.Towards Global Optimization, 1998

Jonas Mockus. The application of bayesian methods for seeking the extremum.Towards Global Optimization, 1998

work page 1998

[50] [50]

A limited memory algorithm for bound constrained optimization.SIAM Journal on Scientific Computing, 1995

Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization.SIAM Journal on Scientific Computing, 1995

work page 1995

[51] [51]

Gaussian processes for regression

Christopher Williams and Carl Rasmussen. Gaussian processes for regression. InAdvances in Neural Information Processing Systems (NeurIPS), 1995. 12

work page 1995

[52] [52]

Batched large-scale bayesian optimization in high-dimensional spaces

Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale bayesian optimization in high-dimensional spaces. InInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2018

work page 2018

[53] [53]

Lassobench: A high- dimensional hyperparameter optimization benchmark suite for lasso

Kenan Šehi ´c, Alexandre Gramfort, Joseph Salmon, and Luigi Nardi. Lassobench: A high- dimensional hyperparameter optimization benchmark suite for lasso. InInternational Confer- ence on Automated Machine Learning (AutoML), 2022

work page 2022

[54] [54]

An experimental study in adaptive kernel selection for bayesian optimization.IEEE Access, 2019

Ibai Roman, Roberto Santana, Alexander Mendiburu, and Jose A Lozano. An experimental study in adaptive kernel selection for bayesian optimization.IEEE Access, 2019

work page 2019

[55] [55]

Structure discovery in nonparametric regression through compositional kernel search

David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. Structure discovery in nonparametric regression through compositional kernel search. In International Conference on Machine Learning (ICML), 2013

work page 2013

[56] [56]

Unexpected improvements to expected improvement for bayesian optimization

Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Unexpected improvements to expected improvement for bayesian optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023

[57] [57]

Construction and comparison of high-dimensional sobol’generators.Wilmott, 2011

Ilya M Sobol’, Danil Asotsky, Alexander Kreinin, and Sergei Kucherenko. Construction and comparison of high-dimensional sobol’generators.Wilmott, 2011. 13 Appendix A Task Details We evaluate our method on five standard high-dimensional BO benchmarks covering a range of problem types and dimensionalities. Rover (D= 100 ).The Rover benchmark [ 52] is a traj...

work page 2011

[58] [58]

The system calls ‘ E v o l v e d K e r n e l ( a r d _ n u m _ d i m s = D ) ‘

_ _i nit __ S I G N A T U R E : ‘ __init__ ‘ MUST accept ‘ a r d _ n u m _ d i m s : int ‘ as the ONLY re qu ir ed p a r a m e t e r . The system calls ‘ E v o l v e d K e r n e l ( a r d _ n u m _ d i m s = D ) ‘. All other args MUST have d ef au lt s . NEVER add re qu ire d args like ‘q : int ‘ , ‘ center : Tensor ‘ , ‘M : int ‘

work page

[59] [59]

FORWARD S I G N A T U R E : ‘ def forward ( self , x1 , x2 , diag = False , ** params ) ‘ - always accept ** params

work page

[60] [60]

NEVER use ‘

BATCH - SAFE SHAPES ( C RI TIC AL ) : The kernel is tested with these exact shapes - your code MUST handle ALL of them : 13- 2 D : x1 =(5 , D ) vs x2 =(1 , D ) -> output must be (5 , 1) , NOT (5 , 5) 14- 2 D : x1 =(3 , D ) vs x2 =(7 , D ) -> output must be (3 , 7) 15- 3 D batched ( BoTorch acqf o p t i m i z a t i o n ) : x1 =(1 , 4 , D ) vs x2 =(1 , 3 , ...

work page

[61] [61]

, N1 , N2 ) for ANY N1 != N2

OUTPUT SHAPE : return (... , N1 , N2 ) for ANY N1 != N2 . If your output is ( N1 , N1 ) instead of ( N1 , N2 ) , you have a bug

work page

[62] [62]

cdist ( x1s , x2s , p =2) ‘ - NEVER ‘ x1

D IS TAN CE : ‘ torch . cdist ( x1s , x2s , p =2) ‘ - NEVER ‘ x1 . u n s q u e e z e ( -2) - x2 . u n s q u e e z e ( -3) ‘ ( OOM )

work page

[63] [63]

t r a n s p o s e ( -1 , -2) ‘

INNER PRODUCT : ‘ x1 @ x2 . t r a n s p o s e ( -1 , -2) ‘

work page

[64] [64]

device ‘

DEVICE : new tensors in forward use ‘ device = x1 . device ‘. Use ‘ register_buffer ‘ in __ in it __

work page

[65] [65]

NO IN - PLACE : ‘x = x + y ‘ not ‘x += y ‘. ‘x = x . clamp (...) ‘ not ‘x . clamp_ (...) ‘

work page

[66] [66]

detach () ‘, ‘

NO DETACH / NUMPY : never ‘. detach () ‘, ‘. numpy () ‘, ‘. item () ‘, ‘ torch . no_grad () ‘ in forward

work page

[67] [67]

clamp ( min =1 e -15) ‘ before sqrt / log , ‘

N U M E R I C A L : ‘. clamp ( min =1 e -15) ‘ before sqrt / log , ‘. clamp ( max =20.0) ‘ for exp

work page

[68] [68]

r a w _ l e n g t h s c a l e

L E N G T H S C A L E : if ‘ h a s _ l e n g t h s c a l e = True ‘ , do NOT ‘ r e g i s t e r _ p a r a m e t e r (" r a w _ l e n g t h s c a l e " , ...) ‘

work page

[69] [69]

PARAM ORDER : ‘ r e g i s t e r _ p a r a m e t e r ( name , ...) ‘ BEFORE ‘ r e g i s t e r _ c o n s t r a i n t ( name , ...) ‘

work page

[70] [70]

d iag on al ( dim1 = -2 , dim2 = -1) ‘

DIAG : ‘ if diag : return covar . d iag on al ( dim1 = -2 , dim2 = -1) ‘

work page

[71] [71]

sqrt / pi / log

IMPORTS : include ‘ import math ‘ if using math . sqrt / pi / log

work page

[72] [72]

NO SELF - R E A S S I G N M E N T in forward () : never ‘ self . x = ... ‘ - use local v a r i a b l e s

work page

[73] [73]

ONLY USE EX IS TI NG g py tor ch classes

work page

[74] [74]

NO SUB - KERNELS : do NOT i n s t a n t i a t e gp yt orc h kernel objects ( e . g . ‘ M a t e r n K e r n e l () ‘, ‘ R B F K e r n e l () ‘) as sub - c o m p o n e n t s . They produce L a z y E v a l u a t e d K e r n e l T e n s o r with u n p r e d i c t a b l e shapes . Instead , i m p l e m e n t the formula dir ec tl y ( e . g . for Matern -5/2: ‘...

work page

[75] [75]

NO DATA - D E P E N D E N T BOUNDS in forward () : never compute bounds / center from the input data ( e . g . ‘ x1 . max ( dim = -2) ‘ or ‘ x1 . min ( dim = -2) ‘) . These change with N and break shape i n v a r i a n c e . Store fixed bounds via ‘ register_buffer ‘ in __init__ , or use c o n s t a n t s ( e . g . 0 and 1)

work page

[76] [76]

SCALAR P AI RW ISE DI STA NC E : when c o m p u t i n g d is tan ce between scalar values ( e . g . warped radii of shape (... , N , 1) ) , use ‘ torch . cdist ( r1 , r2 , p =2) ‘ which gives (... , N1 , N2 ) . NEVER use ‘ r1 - r2 ‘ or ‘ torch . abs ( r1 - r2 ) ‘ - this gives (... , N1 , 1) not (... , N1 , N2 ) and breaks cross - pair c o m p u t a t i o ...

work page