BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks
Pith reviewed 2026-05-19 20:44 UTC · model grok-4.3
The pith
BoLT supplies lightweight surrogate models from thousands of real LLM runs so black-box optimization researchers can test methods on realistic expensive tasks without prohibitive costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BoLT is the first LLM-centric benchmark that democratizes access to realistic optimization problems by providing lightweight surrogate models fitted to results from thousands of actual LLM experiments. These surrogates embed the practical challenges of multi-fidelity evaluation, multi-objective trade-offs, heteroscedastic noise, and high-dimensional search spaces that arise when tuning LLM training and inference configurations. Benchmarking a broad collection of Bayesian optimization and black-box optimization methods on BoLT shows that particular BO approaches maintain an edge in performance across the tasks.
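The sketch below shows how those four challenges might look to an optimizer issuing a single benchmark query. Everything in it is an assumption for illustration and is not BoLT's interface or one of its fitted surrogates. That includes the `ToyEmulator` class, its closed-form mean, noise, and latency functions, and the reading of `fidelity` as a fraction of the full training budget.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for a fitted surrogate (NOT BoLT's API).

    A configuration x is a list of `dim` knobs scaled to [0, 1]; `fidelity`
    in (0, 1] is read as the fraction of the full training budget spent.
    """
    dim: int = 8

    def mean(self, x, fidelity):
        # Score to maximize (e.g. validation accuracy); peaks at x = 0.3 and
        # is biased downward at low fidelity (shorter training).
        quality = -sum((xi - 0.3) ** 2 for xi in x) / self.dim
        return quality - 0.5 * (1.0 - fidelity)

    def noise_std(self, x, fidelity):
        # Heteroscedastic noise: grows with the first knob (think learning
        # rate) and shrinks as fidelity increases.
        return (0.02 + 0.1 * x[0]) / math.sqrt(fidelity)

    def latency(self, x):
        # Second objective to minimize, traded off against the score.
        return 1.0 + sum(x) / self.dim

    def evaluate(self, x, fidelity=1.0, rng=random):
        if len(x) != self.dim:
            raise ValueError(f"expected {self.dim} knobs, got {len(x)}")
        if not 0.0 < fidelity <= 1.0:
            raise ValueError("fidelity must lie in (0, 1]")
        score = rng.gauss(self.mean(x, fidelity), self.noise_std(x, fidelity))
        return {"score": score, "latency": self.latency(x), "cost": fidelity}


emulator = ToyEmulator()
rng = random.Random(0)
x = [0.3] * emulator.dim
cheap = emulator.evaluate(x, fidelity=0.1, rng=rng)  # noisy, biased, cheap
full = emulator.evaluate(x, fidelity=1.0, rng=rng)   # less noisy, 10x the cost
```

A BBO method sees only the returned dictionary. Low-fidelity calls are cheaper but biased and noisier, the noise level depends on where the method samples, and `score` and `latency` pull in different directions. Raising `dim` turns the same interface into a higher-dimensional problem.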
What carries the argument
Lightweight surrogate models fitted to real LLM experimental data that approximate the objective landscapes for configuration tuning.
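The sketch below shows what fitting such a surrogate can involve. It assumes a small tanh MLP with one hidden layer, meaning two weight layers. The paper passage quoted further down this page mentions 2-layer MLPs for the HPO and DMO objectives, but this page does not give their exact architecture or training setup. The `fit_mlp_surrogate` helper and the synthetic "logged runs" are hypothetical. The code is pure Python so it runs without dependencies; a real pipeline would more likely use a deep-learning library.

```python
import math
import random


def fit_mlp_surrogate(X, y, hidden=8, step_size=0.05, epochs=500, seed=0):
    """Fit a tanh MLP with one hidden layer by full-batch gradient descent on
    mean squared error. Returns predict(x) -> float. Illustrative only."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mu = sum(y) / n
    sd = math.sqrt(sum((t - mu) ** 2 for t in y) / n) or 1.0
    z = [(t - mu) / sd for t in y]  # train on standardized targets

    W1 = [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0, 1 / math.sqrt(hidden)) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(W1[j][k] * x[k] for k in range(d)) + b1[j])
             for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, 0.0
        for x, t in zip(X, z):
            h, out = forward(x)
            err = 2.0 * (out - t) / n  # dLoss/d(out) for this sample
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * h[j]
                dh = err * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                gb1[j] += dh
                for k in range(d):
                    gW1[j][k] += dh * x[k]
        # Apply the full-batch gradient step to every parameter.
        b2 -= step_size * gb2
        for j in range(hidden):
            w2[j] -= step_size * gw2[j]
            b1[j] -= step_size * gb1[j]
            for k in range(d):
                W1[j][k] -= step_size * gW1[j][k]

    return lambda x: mu + sd * forward(x)[1]  # map back to original units


# Hypothetical "logged runs": 40 configurations of two knobs in [0, 1] with a
# noisy score; a stand-in for real LLM experiment results.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
y = [-(a - 0.4) ** 2 - 0.5 * (b - 0.6) ** 2 + data_rng.gauss(0, 0.01) for a, b in X]


def train_mse(f):
    return sum((f(x) - t) ** 2 for x, t in zip(X, y)) / len(X)


surrogate = fit_mlp_surrogate(X, y, epochs=300)
untrained = fit_mlp_surrogate(X, y, epochs=0)  # same initial weights, no steps
```

Once fitted, `surrogate(x)` is a cheap, deterministic call. A benchmark would still need a separate model of the noise to reproduce heteroscedastic behavior. Honest validation would also require real runs that were held out from fitting.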
If this is right
- Researchers can now iterate on new black-box optimization algorithms using realistic LLM-like problems at low cost.
- Performance gaps in existing methods for handling noise and multiple objectives become measurable on LLM-relevant tasks.
- Algorithm designers gain concrete targets for improving sample efficiency on high-dimensional noisy surfaces.
- The benchmark supports reproducible comparisons that were previously blocked by compute barriers.
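The first point is easy to picture. Once the objective is a cheap function call, a full optimization run takes milliseconds, so an algorithm can be repeated over many seeds and budgets. The sketch below shows such an evaluation loop. It uses random search as the baseline optimizer and a hypothetical noise-free toy objective in place of a surrogate; neither is part of BoLT.

```python
import random


def random_search(objective, dim, budget, seed=0):
    """Baseline black-box optimizer: sample configurations uniformly from
    [0, 1]^dim and record the best observed value after each evaluation."""
    rng = random.Random(seed)
    best, trace = float("-inf"), []
    for _ in range(budget):
        x = [rng.random() for _ in range(dim)]
        best = max(best, objective(x))
        trace.append(best)
    return trace


def toy_objective(x):
    # Hypothetical cheap stand-in for a surrogate; maximum 0.0 at x = 0.3.
    return -sum((xi - 0.3) ** 2 for xi in x)


trace = random_search(toy_objective, dim=8, budget=50)
simple_regret = [0.0 - b for b in trace]  # known optimum minus best-so-far
```

Averaging `simple_regret` curves over seeds gives the kind of sample-efficiency comparison described above. On noisy problems, the best *observed* value overstates progress. When the benchmark exposes a noise-free mean, it is safer to score the chosen configuration with that mean.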
Where Pith is reading between the lines
- Methods that excel on the surrogates could transfer to real LLM tuning workflows if the landscapes match closely enough.
- The surrogate-fitting strategy might be reused to create accessible benchmarks for other domains where full experiments are prohibitively expensive.
- Adding newer model families to the benchmark would test whether the observed method rankings remain stable as architectures evolve.
Load-bearing premise
The fitted surrogate models must accurately reproduce the optimization landscapes, noise patterns, and relative performance ordering of methods that would appear on the true expensive LLM tasks.
What would settle it
Execute the top-ranked BO methods identified on BoLT directly on the original LLM tasks and check whether their sample-efficiency and final-performance advantages over other methods still hold.
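One way to quantify "still hold" is to rank the methods by final performance on the surrogate and on the real task. Spearman's rank correlation between the two orderings then measures agreement: a value near 1 means the surrogate preserves the ranking. The sketch below assigns average ranks to ties. The scores are made up for illustration and are not results from the paper.

```python
def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(ra, rb))
    var_a = sum((u - ma) ** 2 for u in ra)
    var_b = sum((v - mb) ** 2 for v in rb)
    return cov / (var_a * var_b) ** 0.5


# Final scores of five hypothetical methods (made-up numbers, not paper results).
on_surrogate = [0.91, 0.88, 0.80, 0.75, 0.70]
on_real_task = [0.89, 0.90, 0.78, 0.74, 0.69]
rho = spearman(on_surrogate, on_real_task)  # 0.9: only the top two swap places
```

Repeating the comparison at several evaluation budgets checks sample efficiency as well as final performance. `scipy.stats.spearmanr` provides a library implementation of the same statistic.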
Original abstract
Optimization of LLM training and inference configurations, such as hyperparameters, data mixtures, and prompts, is critical to performance, but it is often approached heuristically in practice, leading to potentially suboptimal outcomes. By framing them as noisy, expensive, and derivative-free optimization problems, Bayesian optimization (BO) and other black-box optimization (BBO) methods offer a promising yet underexplored direction for principled, sample-efficient methods. However, LLM training and inference costs are prohibitively high for most of the BBO research community, and new methods are often only evaluated on synthetic test functions and small-scale datasets that fail to capture the challenges of modern LLM optimization problems. This impedes the development of BBO methods and makes it difficult to assess their effectiveness on modern LLM tasks. We introduce BoLT, the first LLM-centric benchmark that democratizes LLM research for the BBO community. BoLT is released at https://github.com/chewwt/bolt. BoLT covers broad and well-motivated LLM optimization problems, involving multi-fidelity, multi-objective, heteroscedastic noise, and high-dimensional search spaces. Each problem in BoLT is grounded in real experimental data and made fully reproducible and accessible through lightweight surrogate models fitted to the results of thousands of real LLM experiments. We benchmark BoLT against an extensive range of BO and BBO methods, showing that selected BO methods consistently outperform others across tasks and highlighting gaps in existing BBO methods on LLM tasks, underscoring the need to modernize benchmarks for the BBO community.
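Multi-objective problems are commonly scored by hypervolume. In two objectives, this is the area dominated by the Pareto front of the evaluated configurations and bounded by a reference point. The sketch below assumes both objectives are maximized, so a cost such as latency is negated first. The points and reference point are hypothetical.

```python
def pareto_front(points):
    """Points not dominated by any other point (both objectives maximized)."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front of `points` and bounded below by the
    reference point `ref`; points that do not beat `ref` on both axes are ignored."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    front.sort(key=lambda p: p[0], reverse=True)  # y increases as x decreases
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)  # add the next horizontal slab
        prev_y = y
    return area


# (accuracy, -latency) for four evaluated configurations; made-up values.
evaluated = [(0.80, -3.0), (0.70, -2.0), (0.60, -1.0), (0.65, -2.5)]
hv = hypervolume_2d(evaluated, ref=(0.5, -4.0))  # ≈ 0.6; (0.65, -2.5) is dominated
```

A multi-objective optimizer is then judged by how quickly the hypervolume of its evaluated points grows with the number of (expensive) evaluations.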
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BoLT, the first LLM-centric benchmark for black-box optimization research. It releases lightweight surrogate models fitted to thousands of real LLM experiments to make expensive tasks (hyperparameter tuning, data mixtures, prompt optimization) accessible to the BBO community. The benchmark covers multi-fidelity, multi-objective, heteroscedastic noise, and high-dimensional problems. The authors evaluate a broad range of BO and BBO methods on BoLT and report that selected BO methods consistently outperform others across tasks, while highlighting gaps in existing methods.
Significance. If the surrogates accurately reproduce real LLM optimization landscapes, noise characteristics, and relative method orderings, BoLT would be a substantial contribution: it lowers the barrier for BBO researchers to test methods on practically relevant, expensive problems rather than synthetic functions. The public release of code and reproducible surrogates supports this utility. The work directly addresses the mismatch between current BBO benchmarks and modern LLM-scale tasks.
major comments (2)
- [§3] §3 (Surrogate Model Construction): The manuscript states that surrogates are fitted to results from thousands of real LLM experiments and are intended to reproduce optimization landscapes, heteroscedastic noise, and multi-fidelity behavior, but reports no quantitative fidelity metrics (e.g., held-out predictive MSE, Spearman rank correlation of method performances, or surrogate accuracy at each fidelity level). This validation is load-bearing for the central claim that rankings and behaviors observed on BoLT will transfer to actual expensive LLM tasks.
- [§5] §5 (Benchmarking Experiments): The claim that 'selected BO methods consistently outperform others across tasks' is presented without statistical significance testing or confidence intervals on performance differences. Given the heteroscedastic noise explicitly modeled in the surrogates, this weakens the strength of the comparative conclusions.
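The held-out metrics named in the §3 comment are simple to compute once a surrogate's predictions on real runs withheld from fitting are available. The sketch below reports MSE and R² separately for each fidelity level. It assumes a hypothetical record format of (fidelity, observed, predicted) per held-out run, and the records are made up.

```python
from collections import defaultdict


def mse(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    # Coefficient of determination; undefined if all observations are equal.
    mean = sum(obs) / len(obs)
    ss_tot = sum((o - mean) ** 2 for o in obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - ss_res / ss_tot


def per_fidelity_report(records):
    """records: iterable of (fidelity, observed, predicted) tuples from
    held-out real runs (hypothetical format). Returns {fidelity: (mse, r2)}."""
    groups = defaultdict(lambda: ([], []))
    for fid, obs, pred in records:
        groups[fid][0].append(obs)
        groups[fid][1].append(pred)
    return {fid: (mse(o, p), r2(o, p)) for fid, (o, p) in sorted(groups.items())}


# Made-up held-out records at two fidelity levels.
records = [(0.25, 0.50, 0.52), (0.25, 0.60, 0.57), (0.25, 0.70, 0.71),
           (1.0, 0.62, 0.61), (1.0, 0.75, 0.76), (1.0, 0.88, 0.86)]
report = per_fidelity_report(records)
```

Reporting per level, rather than pooling all runs, matters for multi-fidelity tasks. A surrogate can look accurate overall while misrepresenting the cheap low-fidelity runs that multi-fidelity methods rely on.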
minor comments (2)
- [Abstract and §2] The abstract and §2 could more explicitly state the regression or interpolation technique used to build the surrogates (e.g., Gaussian process, random forest, or neural network) rather than referring only to 'lightweight surrogate models'.
- [Table 2] Table 2 (task characteristics) would benefit from an additional column reporting the number of real experiments used to fit each surrogate, to allow readers to assess data density per task.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments highlight important aspects of surrogate validation and statistical rigor that we have addressed through revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [§3] §3 (Surrogate Model Construction): The manuscript states that surrogates are fitted to results from thousands of real LLM experiments and are intended to reproduce optimization landscapes, heteroscedastic noise, and multi-fidelity behavior, but reports no quantitative fidelity metrics (e.g., held-out predictive MSE, Spearman rank correlation of method performances, or surrogate accuracy at each fidelity level). This validation is load-bearing for the central claim that rankings and behaviors observed on BoLT will transfer to actual expensive LLM tasks.
  Authors: We agree that quantitative fidelity metrics are essential to support the claim that BoLT surrogates accurately reproduce real LLM optimization landscapes, noise characteristics, and relative method orderings. In the revised manuscript, we have added a new validation subsection to §3. This includes held-out predictive MSE and R² scores computed on a disjoint set of real LLM experimental results. We also report Spearman rank correlations between method performance rankings obtained on the surrogates and those from a small collection of held-out real LLM runs. Finally, we provide per-fidelity-level error metrics to verify multi-fidelity behavior. These additions directly bolster the transferability argument.
  Revision: yes
- Referee: [§5] §5 (Benchmarking Experiments): The claim that 'selected BO methods consistently outperform others across tasks' is presented without statistical significance testing or confidence intervals on performance differences. Given the heteroscedastic noise explicitly modeled in the surrogates, this weakens the strength of the comparative conclusions.
  Authors: We acknowledge that the lack of statistical significance testing and confidence intervals weakens the comparative claims, especially in the presence of heteroscedastic noise. In the revision, we have updated §5 and the appendix to include bootstrap confidence intervals around all performance metrics. We have also added results from Wilcoxon signed-rank tests (with p-values) to assess whether observed differences between methods are statistically significant. These changes provide the requested rigor while preserving the original experimental setup.
  Revision: yes
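Both checks mentioned in this response can be sketched for one pair of methods compared on matched seeds. The per-seed scores in the sketch are made up. The interval is a percentile bootstrap of the mean paired difference. The Wilcoxon signed-rank p-value is computed exactly by enumerating sign patterns, which is practical only for a small number of seeds; `scipy.stats.wilcoxon` is a library alternative.

```python
import itertools
import random


def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choice(diffs) for _ in range(n)) / n
                   for _ in range(n_boot))
    k = int(alpha / 2 * n_boot)
    return means[k], means[n_boot - 1 - k]


def wilcoxon_exact(diffs):
    """Two-sided exact Wilcoxon signed-rank p-value. Zero differences are
    dropped; tied |d| values get average ranks."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    if n > 16:
        raise ValueError("exact enumeration of 2**n sign patterns is too slow")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    centre = sum(ranks) / 2  # mean of W+ under the null hypothesis
    dev = abs(sum(r for r, x in zip(ranks, d) if x > 0) - centre)
    # Under the null, every assignment of signs to the ranks is equally likely.
    hits = sum(
        1
        for signs in itertools.product((False, True), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= dev - 1e-12
    )
    return hits / 2 ** n


# Final scores of two methods on the same 10 seeds (made-up values).
method_a = [0.82, 0.85, 0.80, 0.88, 0.84, 0.83, 0.86, 0.81, 0.87, 0.85]
method_b = [0.80, 0.83, 0.81, 0.84, 0.80, 0.82, 0.83, 0.78, 0.85, 0.84]
# Round away float noise so equal differences are treated as ties.
diffs = [round(a - b, 6) for a, b in zip(method_a, method_b)]
low, high = bootstrap_ci(diffs)
p_value = wilcoxon_exact(diffs)  # 8 / 1024 ≈ 0.0078 for these numbers
```

Pairing by seed removes seed-to-seed variation shared by both methods, which matters when noise is heteroscedastic. When many method pairs are tested, the p-values would also need a multiple-comparison correction.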
Circularity Check
No significant circularity; surrogates and evaluations remain independent
The paper constructs lightweight surrogate models by fitting to results from thousands of real LLM experiments (external data) and then evaluates BO/BBO methods on those surrogates. No step reduces the reported method outperformance or benchmark utility to the authors' fitting procedure by construction, self-definition, or self-citation chain. The performance ordering is obtained by running the methods on the surrogates rather than being statistically forced by the surrogate parameters themselves, and the surrogates are presented as reproducible approximations grounded outside the present evaluation loop. This is a standard benchmark construction with no load-bearing circular reduction.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  We fit 2-layer MLPs for the optimization objective of HPO and DMO tasks... Emulators are validated on a Sobol-sampled held-out test set... Spearman’s rank correlation ρ as the validation metric
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat recovery theorem (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  BOLT covers broad and well-motivated LLM optimization problems, involving multi-fidelity, multi-objective, heteroscedastic noise, and high-dimensional search spaces.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.