pith. machine review for the scientific record.

arxiv: 2605.06123 · v1 · submitted 2026-05-07 · 💻 cs.AI

Recognition: unknown

Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 10:30 UTC · model grok-4.3

classification 💻 cs.AI
keywords automatic heuristic design · large language models · combinatorial optimization · knowledge representation · top-down search · heuristic discovery · LLM-based optimization · transfer learning

The pith

Treating high-level knowledge as the primary search target, with code only as an instantiation, improves efficiency, transfer, and generalization in automatic heuristic design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current LLM-based methods for automatic heuristic design search over executable code and use execution feedback to guide refinements, a bottom-up process that leaves learned principles implicit. The paper introduces a complementary top-down view in which knowledge itself becomes the main search object while code serves only to test and apply it. This shift is formalized through a statistical-learning perspective that identifies a distortion-compression trade-off. The approach is implemented in both population-based and tree-based frameworks for combinatorial optimization and related tasks. Results indicate that knowledge-first search often outperforms pure code search, that learned knowledge transfers and generalizes better, and that combining the two strategies yields additional gains.
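The two paradigms can be contrasted as a pair of search loops. The following is an illustrative sketch only: the paper's interfaces are not reproduced in this review, so `propose_code`, `propose_knowledge`, `instantiate`, and `evaluate` are hypothetical stand-ins for LLM calls and a heuristic evaluator.

```python
# Illustrative sketch, not the paper's implementation. All functions passed in
# are hypothetical stand-ins: propose_* would be LLM calls, evaluate a scorer.

def bottom_up_step(code, evaluate, propose_code):
    """Code-centric (bottom-up): mutate the program directly and keep the
    variant that scores better; any learned principle stays implicit."""
    candidate = propose_code(code)
    return candidate if evaluate(candidate) > evaluate(code) else code

def top_down_step(knowledge, evaluate, propose_knowledge, instantiate):
    """Knowledge-first (top-down): revise the explicit principle, compile it
    to code only to test it, and keep the knowledge that wins."""
    candidate = propose_knowledge(knowledge)
    if evaluate(instantiate(candidate)) > evaluate(instantiate(knowledge)):
        return candidate  # the stated principle survives, not the code
    return knowledge
```

In the top-down loop the surviving artifact is the stated principle itself, which is what the review identifies as the source of transfer across problems and trajectories.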

Core claim

Making knowledge, rather than code, the explicit primary search target in LLM-driven automatic heuristic design yields heuristics that are discovered more efficiently, transfer more readily across problems and trajectories, and generalize better; the strongest results are obtained when knowledge-first and code-centric strategies are combined.

What carries the argument

The top-down paradigm that treats knowledge as the primary search object and code merely as its instantiation and test, formalized via a statistical-learning view that exposes a distortion-compression trade-off.
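The review does not reproduce the paper's equations, so the following is only one plausible reading of a distortion-compression trade-off; the symbols (a knowledge state, the set of heuristics consistent with it, and a performance measure) are illustrative assumptions, not the paper's notation.

```latex
% Illustrative sketch, not the paper's formalization.
% k: a knowledge state; \mathcal{H}: the full heuristic space;
% H(k) \subseteq \mathcal{H}: heuristics consistent with k;
% J(h): performance of heuristic h; h^{*}: the best heuristic overall.
\underbrace{\mathbb{E}_{h \sim H(k)}\bigl[J(h^{*}) - J(h)\bigr]}_{\text{distortion: performance lost by abstracting to } k}
\qquad \text{vs.} \qquad
\underbrace{\log\frac{|\mathcal{H}|}{|H(k)|}}_{\text{compression: shrinkage of the effective search space}}
```

Under this reading, coarser knowledge compresses the search more but risks excluding the best heuristics, which is the trade-off the review says the paper exposes.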

If this is right

  • Knowledge-first search improves discovery efficiency over code-centric pipelines.
  • Knowledge extracted during search transfers more effectively to new problem instances and search trajectories.
  • Generalization across tasks increases when the search explicitly targets reusable knowledge rather than implicit patterns in code.
  • Combining knowledge-first and code-centric strategies produces further performance improvements.
  • Sustainable progress in automatic heuristic design requires iteratively building and evolving interpretable hypotheses that retain value beyond a single trajectory.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid systems that alternate between knowledge-level and code-level search could retain the reusability of explicit principles while still allowing fine-grained optimization.
  • The same knowledge-first framing may apply to other LLM-driven design tasks where explicit, reusable reasoning structures are more valuable than opaque code outputs.
  • Extracted knowledge could be tested for reuse in entirely new domains not encountered during the original search to measure its true generality.
  • Adopting knowledge as the primary object may lower the total computational cost of repeated heuristic searches by avoiding rediscovery of the same principles.
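The last point above can be made concrete as a knowledge cache reused across searches. Everything here is a hypothetical sketch, not a mechanism from the paper: `KnowledgeCache`, its task-family keys, and the scoring convention are invented for illustration.

```python
# Hypothetical sketch: a cache of previously extracted principles, keyed by
# task family, used to seed later searches instead of rediscovering them.

class KnowledgeCache:
    def __init__(self):
        self._store = {}  # task_family -> list of (principle, score)

    def record(self, task_family, principle, score):
        """Store a principle together with the score it achieved."""
        self._store.setdefault(task_family, []).append((principle, score))

    def seed(self, task_family, k=3):
        """Best k principles seen for this family, for warm-starting a search."""
        ranked = sorted(self._store.get(task_family, []), key=lambda ps: -ps[1])
        return [p for p, _ in ranked[:k]]
```

A new search on a familiar task family would then start from `seed(...)` rather than from scratch, which is the cost saving the point above speculates about.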

Load-bearing premise

LLMs can reliably propose, refine, and instantiate high-level knowledge in a way that produces measurably better heuristics than direct code search, and the resulting knowledge remains reusable across different problems.

What would settle it

A controlled comparison on multiple combinatorial optimization benchmarks in which the knowledge-first method produces no measurable gains in discovery speed, transfer performance, or generalization relative to code-centric baselines would refute the central claim.

Figures

Figures reproduced from arXiv: 2605.06123 by Bui Dinh Pham, Dao Van Tung, Huynh Thi Thanh Binh, Nguyen Viet Tuan Kiet, Tran Cong Dao.

Figure 1: Comparison of bottom-up and top-down paradigms for AHD. (1a) ReEvo BU evolves
Figure 2: Left: Training trajectories of constructive heuristics on five CO tasks (mean over 5 runs). Lower is better. Right: Bottom-up (code-centric) and top-down (knowledge-centric) search pipelines
Figure 3: Optimality-gap trajectories during training for
Figure 4: Example TSP instances from four spatial distributions at sizes 100, 200, and 500.
Figure 5: Sensitivity to LLM backbones on TSP transfer tasks. We compare bottom-up and top-down
original abstract

Large language models (LLMs) have recently advanced automatic heuristic design (AHD) for combinatorial optimization (CO), where candidate heuristics are iteratively proposed, evaluated, and refined. Most existing approaches search over executable programs and distill insights from execution feedback to guide later iterations. Because this process moves from low-level implementations to high-level principles, we refer to it as a bottom-up paradigm. We argue that this view is incomplete and introduce a complementary top-down perspective: knowledge becomes the primary search object and code merely instantiates and tests it, making what is learned explicit and reusable across problems and trajectories. We formalize this shift through a statistical-learning view that exposes a distortion--compression trade-off, and instantiate it in both population-based and tree-based AHD frameworks. Across CO and tasks beyond it, knowledge-first search improves discovery efficiency, transfer, and generalization, often outperforming code-centric pipelines, while combining both strategies yields further gains. Our results suggest that progress in AHD depends on iteratively constructing and evolving interpretable hypotheses that retain value beyond a single search trajectory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a top-down paradigm for automatic heuristic design (AHD) in combinatorial optimization, positioning high-level knowledge as the primary search object for LLMs while code serves only to instantiate and test it. It formalizes the approach via a statistical-learning lens that exposes a distortion-compression trade-off, implements the idea in both population-based and tree-based AHD frameworks, and reports empirical improvements in discovery efficiency, transfer, and generalization over code-centric baselines, with additional gains from hybrid strategies.

Significance. If the reusability and superiority claims are substantiated with appropriate controls, the work could shift AHD research toward more interpretable and transferable knowledge representations, complementing existing bottom-up code-search methods. The explicit dual-framework instantiation and the trade-off formalization are constructive contributions that could aid future method design.

major comments (3)
  1. [Experimental evaluation] The central claim that knowledge extracted via the top-down approach is reusable across distinct trajectories and problem distributions (as opposed to within-trajectory prompting improvements) is load-bearing for the generalization and transfer assertions in the abstract; the experimental design must explicitly test out-of-distribution instances and report cross-trajectory metrics to support this over a pure within-run prompting advantage.
  2. [Formalization section] The distortion-compression trade-off is presented as the key formalization of the statistical-learning view, yet without visible equations or measurement protocols it is unclear how the two terms are quantified or optimized in the population-based and tree-based instantiations; this risks reducing the trade-off to an expository lens rather than a predictive or prescriptive tool.
  3. [Results and baselines] Reported gains over code-centric pipelines may be confounded by unequal prompt-engineering effort or search budget; the evaluation must document and equalize these factors (or ablate them) and apply multiple-testing corrections, as the abstract's empirical claims cannot otherwise be verified as robust.
minor comments (2)
  1. Define AHD and CO on first use in the abstract and introduction for accessibility.
  2. [Instantiation subsections] Clarify in the methods how knowledge is represented (e.g., natural language hypotheses, structured templates) to enable reproducibility.
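The referee's request about knowledge representation admits a simple concrete form. The record below is a hypothetical illustration, not taken from the paper: one common option is to pair a natural-language hypothesis with its intended scope and accumulated evidence.

```python
# Hypothetical representation, not the paper's: one way to make "knowledge"
# a concrete, reusable search object is a structured record.
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    hypothesis: str  # natural-language principle, e.g. a rule of thumb
    scope: str       # task family the principle is believed to cover
    evidence: list = field(default_factory=list)  # (instance, score) pairs

    def support(self) -> float:
        """Mean score across the evidence collected for this hypothesis."""
        if not self.evidence:
            return 0.0
        return sum(score for _, score in self.evidence) / len(self.evidence)

k = Knowledge(hypothesis="Prefer the nearest unvisited city, but penalize "
                         "edges that cross the current partial tour.",
              scope="TSP-like routing")
k.evidence.append(("tsp100-uniform", 0.97))
```

Whatever representation the paper actually uses, documenting it at this level of detail is what would make the method reproducible.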

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which has helped clarify several aspects of our presentation and evaluation. We address each major comment below and have revised the manuscript to incorporate the suggested improvements.

point-by-point responses
  1. Referee: [Experimental evaluation] The central claim that knowledge extracted via the top-down approach is reusable across distinct trajectories and problem distributions (as opposed to within-trajectory prompting improvements) is load-bearing for the generalization and transfer assertions in the abstract; the experimental design must explicitly test out-of-distribution instances and report cross-trajectory metrics to support this over a pure within-run prompting advantage.

    Authors: We agree that distinguishing true cross-trajectory and out-of-distribution reusability from within-trajectory prompting effects is essential. In the revised manuscript we have added dedicated experiments that apply knowledge extracted from one trajectory to initialize independent searches on OOD instances drawn from different problem distributions. We now report explicit cross-trajectory metrics (e.g., success rate and efficiency gains when transferring knowledge across separate runs) that demonstrate benefits beyond single-trajectory prompting. revision: yes

  2. Referee: [Formalization section] The distortion-compression trade-off is presented as the key formalization of the statistical-learning view, yet without visible equations or measurement protocols it is unclear how the two terms are quantified or optimized in the population-based and tree-based instantiations; this risks reducing the trade-off to an expository lens rather than a predictive or prescriptive tool.

    Authors: The referee is correct that the original description remained largely conceptual. We have expanded the formalization section with explicit equations defining distortion (as the expected performance degradation from knowledge abstraction) and compression (as the reduction in effective search-space size), together with concrete measurement protocols that are applied to both the population-based and tree-based instantiations. These additions make the trade-off directly usable for guiding design choices. revision: yes

  3. Referee: [Results and baselines] Reported gains over code-centric pipelines may be confounded by unequal prompt-engineering effort or search budget; the evaluation must document and equalize these factors (or ablate them) and apply multiple-testing corrections, as the abstract's empirical claims cannot otherwise be verified as robust.

    Authors: We acknowledge the importance of controlling for prompt-engineering effort and search budget. The revised experimental section now documents the precise prompt templates and iteration budgets used for every method, enforces equalized budgets across top-down and code-centric pipelines, includes ablations that vary prompt-engineering intensity, and applies Bonferroni-corrected statistical tests to all reported comparisons. These changes confirm that the observed advantages remain robust under controlled conditions. revision: yes
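Two of the quantities invoked in the responses above can be made concrete. This is a sketch under stated assumptions, not the paper's protocol: distortion is taken as the mean shortfall from the best known score, compression as the log-reduction of the effective search space, and the Bonferroni rule is the standard per-comparison threshold.

```python
# Sketch only: plausible estimators for the rebuttal's verbal definitions,
# not the paper's actual measurement protocol.
import math

def distortion(best_score, scores_under_knowledge):
    """Expected performance degradation from abstraction: mean shortfall of
    heuristics sampled under the knowledge constraint vs. the best known."""
    return best_score - sum(scores_under_knowledge) / len(scores_under_knowledge)

def compression(full_space_size, constrained_space_size):
    """Reduction in effective search-space size, in nats."""
    return math.log(full_space_size / constrained_space_size)

def bonferroni_significant(p_values, alpha=0.05):
    """Standard Bonferroni correction: with m comparisons, a raw p-value
    must clear alpha / m to count as significant."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

With three comparisons at alpha = 0.05, for example, each raw p-value must fall below roughly 0.0167 to survive the correction.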

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper advances a conceptual distinction between bottom-up code-centric and top-down knowledge-first heuristic design, formalizes the latter via a statistical-learning perspective exposing a distortion-compression trade-off, and reports empirical gains in efficiency, transfer, and generalization across CO tasks. No equations, fitted parameters, or self-citations appear in the provided text that would reduce any central claim to a tautology or an input by construction. The argument rests on instantiating the perspective in population- and tree-based frameworks plus experimental outcomes, which are measured against external benchmarks rather than derived from the paper's own premises.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on domain assumptions about LLM capability for knowledge proposal and introduces an ad-hoc statistical trade-off; no free parameters or new physical entities are visible from the abstract.

axioms (2)
  • domain assumption LLMs can generate and iteratively refine high-level, reusable knowledge for heuristic design that is more effective than direct code generation
    Central premise of the top-down paradigm stated in the abstract.
  • ad hoc to paper A distortion-compression trade-off governs the value of learned knowledge in AHD
    Introduced as the formalization of the new perspective.

pith-pipeline@v0.9.0 · 5503 in / 1219 out tokens · 42032 ms · 2026-05-08T10:30:28.738457+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

118 extracted references · 15 canonical work pages · 3 internal anchors

  1. [1]

    Evolution of heuristics: Towards efficient automatic algorithm design using large language model

    Fei Liu, Tong Xialiang, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Evolution of heuristics: Towards efficient automatic algorithm design using large language model. In International Conference on Machine Learning, pages 32201–32223. PMLR, 2024

  2. [2]

    AlphaEvolve: A coding agent for scientific and algorithmic discovery

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025

  3. [3]

    GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, et al. GEPA: Reflective prompt evolution can outperform reinforcement learning. arXiv preprint arXiv:2507.19457, 2025

  4. [4]

    Eureka: Human-level reward design via coding large language models

    Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Eureka: Human-level reward design via coding large language models. In The Twelfth International Conference on Learning Representations, 2024

  5. [5]

    Automatically learning hybrid digital twins of dynamical systems

    Samuel Holt, Tennison Liu, and Mihaela van der Schaar. Automatically learning hybrid digital twins of dynamical systems. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=SOsiObSdU2

  6. [6]

    Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K. Reddy. LLM-SR: Scientific equation discovery via programming with large language models. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=m2nmp8P5in

  7. [7]

    A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

  8. [8]

    Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

  9. [9]

    Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213, 2022

    Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213, 2022

  10. [10]

    Reevo: Large language models as hyper-heuristics with reflective evolution

    Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. Reevo: Large language models as hyper-heuristics with reflective evolution. In Advances in Neural Information Processing Systems, 2024. https://github.com/ai4co/reevo

  11. [11]

    Ant colony system: a cooperative learning approach to the traveling salesman problem

    M. Dorigo and L.M. Gambardella. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, 1997

  12. [12]

    doi: 10.1109/4235.585892

  13. [13]

    Christos Voudouris and Edward P. K. Tsang. Guided Local Search, pages 185–218. Springer US, Boston, MA, 2003. ISBN 978-0-306-48056-0. doi: 10.1007/0-306-48056-5_7. URL https://doi.org/10.1007/0-306-48056-5_7

  14. [14]

    Large language model-driven large neighborhood search for large-scale MILP problems

    Huigen Ye, Hua Xu, An Yan, and Yaoyang Cheng. Large language model-driven large neighborhood search for large-scale MILP problems. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=teUg2pMrF0

  15. [15]

    Pomo: Policy optimization with multiple optima for reinforcement learning. Advances in neural information processing systems, 33:21188–21198, 2020

    Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim, Iljoo Yoon, Youngjune Gwon, and Seungjai Min. Pomo: Policy optimization with multiple optima for reinforcement learning. Advances in neural information processing systems, 33:21188–21198, 2020

  16. [16]

    Algorithm evolution using large language model. arXiv preprint arXiv:2311.15249, 2023

    Fei Liu, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. Algorithm evolution using large language model. arXiv preprint arXiv:2311.15249, 2023

  17. [17]

    Hifo-prompt: Prompting with hindsight and foresight for LLM-based automatic heuristic design

    Chentong Chen, Mengyuan Zhong, Jialong Shi, Jianyong Sun, and Ye Fan. Hifo-prompt: Prompting with hindsight and foresight for LLM-based automatic heuristic design. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=imSLzfZ6av

  18. [18]

    Monte carlo tree search for comprehensive exploration in llm-based automatic heuristic design

    Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, and Bryan Hooi. Monte carlo tree search for comprehensive exploration in llm-based automatic heuristic design. In International Conference on Machine Learning, pages 78338–78373. PMLR, 2025

  19. [19]

    Motif: Multi-strategy optimization via turn-based interactive framework

    Nguyen Viet Tuan Kiet, Dao Van Tung, Tran Cong Dao, and Huynh Thi Thanh Binh. Motif: Multi-strategy optimization via turn-based interactive framework. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Singapore, January 2026. Oral Presentation

  20. [20]

    Generalizable heuristic generation through LLMs with meta-optimization

    Yiding Shi, Jianan Zhou, Wen Song, Jieyi Bi, Yaoxin Wu, Zhiguang Cao, and Jie Zhang. Generalizable heuristic generation through LLMs with meta-optimization. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=tIQZ7pVN6S

  21. [21]

    CALM: Co-evolution of algorithms and language model for automatic heuristic design

    Ziyao Huang, Weiwei Wu, Kui Wu, Wei-Bin Lee, and Jianping Wang. CALM: Co-evolution of algorithms and language model for automatic heuristic design. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=x6bG2Hoqdf

  22. [22]

    Hyper-heuristics: A survey of the state of the art. Journal of the Operational Research Society, 64(12):1695–1724, 2013

    Edmund K Burke, Michel Gendreau, Matthew Hyde, Graham Kendall, Gabriela Ochoa, Ender Özcan, and Rong Qu. Hyper-heuristics: A survey of the state of the art. Journal of the Operational Research Society, 64(12):1695–1724, 2013

  23. [23]

    Exploring hyper-heuristic methodologies with genetic programming

    Edmund K Burke, Mathew R Hyde, Graham Kendall, Gabriela Ochoa, Ender Ozcan, and John R Woodward. Exploring hyper-heuristic methodologies with genetic programming. In Computational intelligence: Collaboration, fusion and emergence, pages 177–201. Springer, 2009

  24. [24]

    Mathematical discoveries from program search with large language models

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625(7995):468–475, 2024

  25. [25]

    Hsevo: Elevating automatic heuristic design with diversity-driven harmony search and genetic algorithm using llms

    Pham Vu Tuan Dat, Long Doan, and Huynh Thi Thanh Binh. Hsevo: Elevating automatic heuristic design with diversity-driven harmony search and genetic algorithm using llms. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 26931–26938, 2025

  26. [26]

    Efficient heuristics generation for solving combinatorial optimization problems using large language models

    Xuan Wu, Di Wang, Chunguo Wu, Lijie Wen, Chunyan Miao, Yubin Xiao, and You Zhou. Efficient heuristics generation for solving combinatorial optimization problems using large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 3228–3239, 2025

  27. [27]

    Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc V Le, and Ed H. Chi. Least-to-most prompting enables complex reasoning in large language models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=WZH7099tgfM

  28. [28]

    Parsel: Algorithmic reasoning with language models by composing decompositions. Advances in Neural Information Processing Systems, 36:31466–31523, 2023

    Eric Zelikman, Qian Huang, Gabriel Poesia, Noah Goodman, and Nick Haber. Parsel: Algorithmic reasoning with language models by composing decompositions. Advances in Neural Information Processing Systems, 36:31466–31523, 2023

  29. [29]

    Planning with large language models for code generation

    Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan. Planning with large language models for code generation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Lr8cOOtYbfL

  30. [30]

    Codeplan: Unlocking reasoning potential in large language models by scaling code-form planning

    Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, and Minlie Huang. Codeplan: Unlocking reasoning potential in large language models by scaling code-form planning. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=dCPF1wlqj8

  31. [31]

    Reasoning-as-logic-units: Scaling test-time reasoning in large language models through logic unit alignment

    Cheryl Li, Tianyuan Xu, and Steven Y Guo. Reasoning-as-logic-units: Scaling test-time reasoning in large language models through logic unit alignment. In International Conference on Machine Learning, pages 36530–36550. PMLR, 2025

  32. [32]

    AutoEP: LLMs-driven automation of hyperparameter evolution for metaheuristic algorithms

    Zhenxing Xu, Yizhe Zhang, Weidong Bao, Hao Wang, Ming Chen, Haoran Ye, Wenzheng Jiang, Hui Yan, and Ji Wang. AutoEP: LLMs-driven automation of hyperparameter evolution for metaheuristic algorithms. In The Fourteenth International Conference on Learning Representations,

  33. [33]

    URLhttps://openreview.net/forum?id=hit3hGBheP

  34. [34]

    Eoh-s: Evolution of heuristic set using llms for automated heuristic design

    Fei Liu, Yilu Liu, Qingfu Zhang, Tong Xialiang, and Mingxuan Yuan. Eoh-s: Evolution of heuristic set using llms for automated heuristic design. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 37090–37098, 2026

  35. [35]

    Combinatorial Optimization: Algorithms and Complexity

    C.H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Dover Books on Computer Science. Dover Publications, 1998. ISBN 9780486402581. URL https://books.google.com.vn/books?id=cDY-joeCGoIC

  36. [36]

    Integer and Combinatorial Optimization

    L.A. Wolsey and G.L. Nemhauser. Integer and Combinatorial Optimization. Wiley Series in Discrete Mathematics and Optimization. Wiley, 1999. ISBN 9780471359432. URL https://books.google.com.vn/books?id=vvm4DwAAQBAJ

  37. [37]

    Reducibility among Combinatorial Problems

    Richard M. Karp. Reducibility among Combinatorial Problems, pages 85–103. Springer US, Boston, MA, 1972. ISBN 978-1-4684-2001-2. doi: 10.1007/978-1-4684-2001-2_9. URL https://doi.org/10.1007/978-1-4684-2001-2_9

  38. [38]

    Pointer networks

    Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://proceedings.neurips.cc/paper_files/paper/2015/file/29921001f2f04bd3baee84a12e98098f-Paper.pdf

  39. [39]

    Neural combinatorial optimization with reinforcement learning

    Irwan Bello*, Hieu Pham*, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning, 2017. URL https://openreview.net/forum?id=rJY3vK9eg

  40. [40]

    Learning combinatorial optimization algorithms over graphs. Advances in neural information processing systems, 30, 2017

    Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. Advances in neural information processing systems, 30, 2017

  41. [41]

    Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019

    Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ByxBFsRqYm

  42. [42]

    Paramils: an automatic algorithm configuration framework. Journal of artificial intelligence research, 36:267–306, 2009

    Frank Hutter, Holger H Hoos, Kevin Leyton-Brown, and Thomas Stützle. Paramils: an automatic algorithm configuration framework. Journal of artificial intelligence research, 36:267–306, 2009

  43. [43]

    Sequential model-based optimization for general algorithm configuration

    Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Carlos A. Coello Coello, editor, Learning and Intelligent Optimization, pages 507–523, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-25566-3

  44. [44]

    The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2(117-129):2, 1978

    Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2(117-129):2, 1978

  45. [45]

    Efficient global optimization of expensive black-box functions

    Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. J. Global Optimization, 13(4):455–492, 1998. URL http://dblp.uni-trier.de/db/journals/jgo/jgo13.html#JonesSW98

  46. [46]

    Gaussian process optimization in the bandit setting: no regret and experimental design

    Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: no regret and experimental design. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pages 1015–1022, 2010

  47. [47]

    Practical bayesian optimization of machine learning algorithms. Advances in neural information processing systems, 25, 2012

    Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. Advances in neural information processing systems, 25, 2012

  48. [48]

    Taking the human out of the loop: A review of Bayesian optimization

    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016. doi: 10.1109/JPROC.2015.2494218

  49. [49]

    Large language models to enhance bayesian optimization

    Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance bayesian optimization. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=OOxotBmGol

  50. [50]

    InstructZero: Efficient instruction optimization for black-box large language models

    Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, and Tianyi Zhou. InstructZero: Efficient instruction optimization for black-box large language models. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Lear...

  51. [51]

    Searching for optimal solutions with llms via bayesian optimization

    Dhruv Agarwal, Manoj Ghuhan Arivazhagan, Rajarshi Das, Sandesh Swamy, Sopan Khosla, and Rashmi Gangadharaiah. Searching for optimal solutions with llms via bayesian optimization. In Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors, International Conference on Learning Representations, volume 2025, pages 67180–67201, 2025. URL https://proceedings.ic...

  52. [52]

    Hyperband-based bayesian optimization for black-box prompt selection

    Lennart Schneider, Martin Wistuba, Aaron Klein, Jacek Golebiowski, Giovanni Zappella, and Felice Antonio Merra. Hyperband-based bayesian optimization for black-box prompt selection. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=Lm9DXFrcHD

  53. [53]

    Evolutionary computation in the era of large language model: Survey and roadmap. IEEE Transactions on Evolutionary Computation, 29(2):534–554, 2024

    Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, and Kay Chen Tan. Evolutionary computation in the era of large language model: Survey and roadmap. IEEE Transactions on Evolutionary Computation, 29(2):534–554, 2024

  54. [54]

    Algorithm discovery with LLMs: Evolutionary search meets reinforcement learning

    Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, and Caglar Gulcehre. Algorithm discovery with LLMs: Evolutionary search meets reinforcement learning. arXiv preprint arXiv:2504.05108, 2025

  55. [55]

    A systematic survey on large language models for algorithm design

    Fei Liu, Yiming Yao, Ping Guo, Zhiyuan Yang, Xi Lin, Zhe Zhao, Xialiang Tong, Kun Mao, Zhichao Lu, Zhenkun Wang, et al. A systematic survey on large language models for algorithm design. ACM Computing Surveys, 58(8):1–32, 2026

  56. [56]

    Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics

    Niki Van Stein and Thomas Bäck. Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics. IEEE Transactions on Evolutionary Computation, 29(2):331–345, 2024

  57. [57]

    Scientific algorithm discovery by augmenting AlphaEvolve with Deep Research

    Gang Liu, Yihan Zhu, Jie Chen, and Meng Jiang. Scientific algorithm discovery by augmenting AlphaEvolve with Deep Research. arXiv preprint arXiv:2510.06056, 2025

  58. [58]

    Dspy: compiling declarative language model calls into state-of-the-art pipelines

    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, Heather Miller, et al. Dspy: compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Representations, 2023

  59. [59]

    Optimizing instructions and demonstrations for multi-stage language model programs

    Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, and Omar Khattab. Optimizing instructions and demonstrations for multi-stage language model programs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9340–9366, 2024

  60. [60]

    Llm-sr: Scientific equation discovery via programming with large language models

    Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K Reddy. Llm-sr: Scientific equation discovery via programming with large language models. In The Thirteenth International Conference on Learning Representations, 2025

  61. [61]

    Symbolic regression with a learned concept library

    Arya Grayeli, Atharva Sehgal, Omar Costilla-Reyes, Miles Cranmer, and Swarat Chaudhuri. Symbolic regression with a learned concept library. Advances in Neural Information Processing Systems, 37:44678–44709, 2024

  62. [62]

    Coevo: Continual evolution of symbolic solutions using large language models

    Ping Guo, Qingfu Zhang, and Xi Lin. Coevo: Continual evolution of symbolic solutions using large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 1810–1818, 2026

  63. [63]

    Sr-scientist: Scientific equation discovery with agentic AI

    Shijie Xia, Yuhan Sun, and Pengfei Liu. Sr-scientist: Scientific equation discovery with agentic AI, 2025. URL https://arxiv.org/abs/2510.11661

  64. [64]

    Drsr: LLM-based scientific equation discovery with dual reasoning from data and experience

    Runxiang Wang, Boxiao Wang, Kai Li, Yifan Zhang, and Jian Cheng. Drsr: LLM-based scientific equation discovery with dual reasoning from data and experience. arXiv preprint arXiv:2506.04282, 2025

  65. [65]

    Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

    Miles Cranmer. Interpretable machine learning for science with PySR and SymbolicRegression.jl. arXiv preprint arXiv:2305.01582, 2023

  66. [66]

    Solving olympiad geometry without human demonstrations

    Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476–482, 2024

  67. [67]

    Evaluating language models for mathematics through interactions

    Katherine M Collins, Albert Q Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B Tenenbaum, William Hart, et al. Evaluating language models for mathematics through interactions. Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024

  68. [68]

    Litsearch: A retrieval benchmark for scientific literature search

    Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, and Tianyu Gao. Litsearch: A retrieval benchmark for scientific literature search. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15068–15083, 2024

  69. [69]

    Mlagentbench: evaluating language agents on machine learning experimentation

    Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec. Mlagentbench: evaluating language agents on machine learning experimentation. In Proceedings of the 41st International Conference on Machine Learning, pages 20271–20309, 2024

  70. [70]

    Scicode: A research coding benchmark curated by scientists

    Minyang Tian, Luyu Gao, Shizhuo D Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, et al. Scicode: A research coding benchmark curated by scientists. Advances in Neural Information Processing Systems, 37:30624–30650, 2024

  71. [71]

    Scimon: Scientific inspiration machines optimized for novelty

    Qingyun Wang, Doug Downey, Heng Ji, and Tom Hope. Scimon: Scientific inspiration machines optimized for novelty. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 279–299, 2024

  72. [72]

    Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers

    Chenglei Si, Diyi Yang, and Tatsunori Hashimoto. Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers. In The Thirteenth International Conference on Learning Representations, 2025

  73. [73]

    Researchagent: Iterative research idea generation over scientific literature with large language models

    Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. Researchagent: Iterative research idea generation over scientific literature with large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pa...

  74. [74]

    The ideation-execution gap: Execution outcomes of LLM-generated versus human research ideas

    Chenglei Si, Tatsunori Hashimoto, and Diyi Yang. The ideation-execution gap: Execution outcomes of LLM-generated versus human research ideas. arXiv preprint arXiv:2506.20803, 2025

  75. [75]

    Leveraging large language models for predictive chemistry

    Kevin Maik Jablonka, Philippe Schwaller, Andres Ortega-Guerrero, and Berend Smit. Leveraging large language models for predictive chemistry. Nature Machine Intelligence, 6(2):161–169, 2024

  76. [76]

    Flip2: Expanding protein fitness landscape benchmarks for real-world machine learning applications

    Kieran Didi, Sarah Alamdari, Alex X. Lu, Bruce Wittmann, Kadina E. Johnston, Ava P. Amini, Ali Madani, Maya Czeneszew, Christian Dallago, and Kevin K. Yang. Flip2: Expanding protein fitness landscape benchmarks for real-world machine learning applications. bioRxiv, 2026. doi: 10.64898/2026.02.23.707496. URL https://www.biorxiv.org/content/early/2026/02/26...

  77. [77]

    Deepaco: Neural-enhanced ant systems for combinatorial optimization

    Haoran Ye, Jiarui Wang, Zhiguang Cao, Helan Liang, and Yong Li. Deepaco: Neural-enhanced ant systems for combinatorial optimization. Advances in Neural Information Processing Systems, 36:43706–43728, 2023

  78. [78]

    Difusco: Graph-based diffusion solvers for combinatorial optimization

    Zhiqing Sun and Yiming Yang. Difusco: Graph-based diffusion solvers for combinatorial optimization. Advances in Neural Information Processing Systems, 36:3706–3731, 2023

  79. [79]

    establishes the central role of NP-completeness for many canonical CO problems. Because exact optimization can be computationally prohibitive at scale, practical CO has long relied on approximation algorithms, local search, metaheuristics, and problem-specific heuristics to obtain high-quality solutions under limited computational budgets. Machine learning...

  80. [80]

    [39] substantially improves neural construction policies for routing problems

    learns greedy policies for graph optimization problems such as minimum vertex cover, maximum cut, and TSP, while the attention model of Kool et al. [39] substantially improves neural construction policies for routing problems. These methods replace hand-engineered decision rules with learned policies, but they typically require task-specific training, arc...
