Recognition: no theorem link
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
Pith reviewed 2026-05-11 01:45 UTC · model grok-4.3
The pith
Four specialized agents in a collaborative loop evolve better heuristics for combinatorial optimization using fewer LLM tokens.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HMACE reconceptualizes heuristic search as an organizational design problem by decomposing each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Evaluations on TSP, Online BPP, MKP, and PFSP show that HMACE achieves a favorable quality-efficiency trade-off compared with state-of-the-art single-agent and multi-agent baselines.
What carries the argument
The HMACE four-agent loop (Proposer, Generator, Evaluator, Reflector), together with behavior-aware retrieval and fitness-grounded archive updates, organizes LLM calls to maintain diversity and reduce wasted evaluations during heuristic evolution.
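One generation of the four-role loop can be sketched as below. This is a minimal illustration of the described Proposer → Generator → Evaluator → Reflector cycle, not the paper's actual API; the class names, the fixed archive capacity, and the callable interfaces are all assumptions.

```python
# Hedged sketch of one HMACE-style generation; names and interfaces are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Candidate:
    code: str                          # executable heuristic source
    fitness: float = float("inf")      # lower is better (e.g. optimality gap)

@dataclass
class Archive:
    elites: List[Candidate] = field(default_factory=list)

    def update(self, cand: Candidate, capacity: int = 10) -> None:
        # Fitness-grounded update: retain only the best `capacity` candidates.
        self.elites.append(cand)
        self.elites.sort(key=lambda c: c.fitness)
        del self.elites[capacity:]

def run_generation(archive: Archive,
                   propose: Callable[[Archive], str],
                   generate: Callable[[str], Candidate],
                   evaluate: Callable[[Candidate], float]) -> Candidate:
    """One role-specialized loop: Proposer -> Generator -> Evaluator -> Reflector."""
    strategy = propose(archive)        # Proposer: choose an exploration direction
    cand = generate(strategy)          # Generator: synthesize executable heuristic
    cand.fitness = evaluate(cand)      # Evaluator: empirical assessment
    archive.update(cand)               # Reflector: archive-backed memory update
    return cand
```

In this reading, the archive is the shared memory that couples generations: the Proposer conditions on it and the Reflector writes back to it.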
If this is right
- HMACE reaches the lowest reported average gaps of 0.464 percent on TSP and 0.223 percent on Online BPP.
- It does so while using only 0.13 million tokens for TSP and 0.42 million for BPP, far fewer than the compared baselines.
- The same four-agent structure with retrieval and filtering extends to MKP and PFSP while preserving the quality-efficiency balance.
- Behavior-aware retrieval and fitness-based updates prevent repeated evaluation of similar or weak candidates.
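The redundancy-avoidance claim in the last point can be made concrete with a small sketch. The behavior signature used here (a heuristic's scores on a tiny probe set) and the exact-match check are assumptions for illustration; the paper's filtering mechanism is not specified at this level of detail.

```python
# Illustrative duplicate filter: skip evaluation when a candidate's behavior
# signature has already been seen. The signature definition is an assumption.
from typing import Set, Tuple

def is_redundant(signature: Tuple[float, ...], seen: Set[Tuple[float, ...]]) -> bool:
    """Return True if this exact behavior was already evaluated; record it otherwise."""
    if signature in seen:
        return True
    seen.add(signature)
    return False

seen: Set[Tuple[float, ...]] = set()
assert not is_redundant((10.2, 7.8), seen)   # first occurrence -> evaluate
assert is_redundant((10.2, 7.8), seen)       # exact repeat -> skip the call
```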
Where Pith is reading between the lines
- The same division of labor might help LLM agents in other code-generation settings where diversity is needed, such as generating scheduling rules or packing strategies for new domains.
- Explicit archive reflection could be added to non-evolutionary multi-agent setups to cut token waste without changing the underlying model.
- Testing the framework on much larger problem instances would reveal whether the token savings scale when solution spaces grow.
Load-bearing premise
The framework assumes that an LLM will consistently produce executable and varied heuristic code plus accurate performance reflections when given role-specific prompts and access to an archive of past results.
What would settle it
If HMACE-evolved heuristics on the standard TSP benchmark instances produce an average gap above 0.5 percent or consume more than 0.2 million tokens while a simpler baseline stays below those thresholds, the claimed quality-efficiency advantage would not hold.
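The 0.5 percent threshold above refers to the average optimality gap. As a minimal sketch, assuming the conventional definition gap = (heuristic_cost − optimal_cost) / optimal_cost:

```python
# Average optimality gap in percent, as commonly reported for TSP-style
# benchmarks; a sketch of the conventional definition, not the paper's script.
from typing import Sequence

def avg_gap_percent(heuristic_costs: Sequence[float],
                    optimal_costs: Sequence[float]) -> float:
    gaps = [(h - o) / o * 100.0 for h, o in zip(heuristic_costs, optimal_costs)]
    return sum(gaps) / len(gaps)

# Costs 100.5 and 201.0 against optima 100 and 200 give 0.5% on each
# instance, so the average gap is 0.5%.
```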
Original abstract
Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress, existing LLM-based methods typically rely on monolithic workflows constrained by rigid templates, thereby restricting memory-guided exploration and triggering premature convergence to local optima. To design an autonomous and collaborative architecture, we introduce HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework that reconceptualizes heuristic search as an organizational design problem. HMACE decomposes each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Extensive evaluations on representative COPs, including TSP, Online BPP, MKP, and PFSP, show that HMACE achieves a favorable quality-efficiency trade-off compared to state-of-the-art single-agent and multi-agent baselines. In the matched LLM-driven reference comparison, HMACE achieves the lowest average gaps on TSP and Online BPP (0.464% and 0.223%, respectively), while requiring only 0.13M and 0.42M tokens for the two tasks, substantially fewer than the compared baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework for automated heuristic design in combinatorial optimization using LLMs. It decomposes each evolutionary generation into a role-specialized loop with four agents (Proposer for strategy exploration, Generator for executable code synthesis, Evaluator for empirical assessment, and Reflector for archive-backed memory update), augmented by behavior-aware retrieval, lightweight filtering, and fitness-grounded archiving to promote diversity and avoid premature convergence. Experiments on TSP, Online BPP, MKP, and PFSP report that HMACE attains the lowest average gaps on TSP (0.464%) and Online BPP (0.223%) while consuming substantially fewer tokens (0.13M and 0.42M) than single-agent and multi-agent LLM baselines.
Significance. If the reported performance holds under the stated protocol, HMACE advances LLM-based automated algorithm design by reframing heuristic search as an organizational multi-agent problem. The quality-efficiency trade-off, achieved through explicit role specialization and memory mechanisms rather than monolithic prompting, offers a concrete template for reducing token costs while improving exploration on NP-hard problems. The framework's emphasis on executable heuristic synthesis and archive updates could generalize to other domains requiring iterative code generation and evaluation.
minor comments (3)
- The abstract reports headline gaps and token counts but omits any mention of the number of instances, instance sizes, or statistical aggregation method (e.g., mean over 10 runs); adding one sentence summarizing the experimental protocol would improve readability for readers who stop at the abstract.
- In the method description, the precise definition of 'behavior-aware retrieval' (e.g., which features of prior heuristics are embedded and how similarity is computed) is referenced but not formalized; a short pseudocode block or equation would clarify the mechanism that purportedly prevents redundant evaluations.
- Table captions and axis labels in the experimental figures should explicitly state the LLM backbone and temperature setting used for all compared methods to allow direct replication of the token-consumption comparison.
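The second comment asks for a formalization of behavior-aware retrieval. One plausible reading, offered here purely as a hypothetical sketch: embed each heuristic as a behavior vector (e.g. its scores on a fixed probe set) and evaluate a new candidate only if no archived behavior is near-identical under cosine similarity. The embedding choice and the 0.98 threshold are assumptions, not values from the paper.

```python
import math
from typing import Sequence

# Hypothetical behavior-aware retrieval filter; the behavior embedding and
# similarity threshold are assumptions made for illustration.
def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def should_evaluate(candidate_vec: Sequence[float],
                    archive_vecs: Sequence[Sequence[float]],
                    threshold: float = 0.98) -> bool:
    """Spend an evaluation only if no archived behavior is near-identical."""
    return all(cosine(candidate_vec, v) < threshold for v in archive_vecs)
```

A pseudocode block of roughly this shape in the paper would let readers verify the claim that retrieval, not just filtering, prevents redundant evaluations.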
Simulated Author's Rebuttal
We thank the referee for the positive assessment of HMACE and the recommendation for minor revision. The summary accurately reflects the framework's decomposition into role-specialized agents, the use of behavior-aware retrieval and fitness-grounded archiving, and the reported gains in optimality gaps and token efficiency on TSP and Online BPP. We will incorporate any minor editorial or clarification changes requested in the revised manuscript.
Circularity Check
No significant circularity
full rationale
The paper introduces an empirical multi-agent LLM framework (HMACE) for heuristic evolution on combinatorial problems and supports its claims exclusively through experimental comparisons of solution gaps and token usage on TSP, Online BPP, MKP, and PFSP instances against single-agent and multi-agent baselines. No mathematical derivations, equations, or first-principles predictions exist that could reduce to fitted parameters, self-definitions, or self-citation chains. The reported results (0.464% TSP gap, 0.223% BPP gap, 0.13M/0.42M tokens) follow directly from the described procedural loop of Proposer/Generator/Evaluator/Reflector agents plus retrieval and archiving steps, which are externally replicable on the stated benchmarks without internal reduction to the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can reliably produce executable code for combinatorial optimization heuristics when given role-specific prompts.
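This axiom can be partially guarded in practice with a lightweight executability check before a candidate reaches the Evaluator, in the spirit of the paper's "lightweight candidate filtering". The sketch below is an assumption about how such a guard might look; in particular, the `heuristic` entry-point name is hypothetical.

```python
# Minimal executability guard for LLM-generated heuristic source; the
# expected entry-point name `heuristic` is an illustrative assumption.
def is_executable(source: str, entry_point: str = "heuristic") -> bool:
    namespace: dict = {}
    try:
        # Compile and run the candidate in an isolated namespace; any
        # syntax or import-time error disqualifies it cheaply.
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception:
        return False
    # The candidate must also expose the expected callable.
    return callable(namespace.get(entry_point))
```

A guard like this converts the axiom from "the LLM always emits runnable code" into the weaker, testable premise that non-runnable outputs are caught before they consume evaluation budget.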