pith. machine review for the scientific record.

arxiv: 2605.07214 · v1 · submitted 2026-05-08 · 💻 cs.AI

Recognition: no theorem link

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:45 UTC · model grok-4.3

classification 💻 cs.AI
keywords: multi-agent collaboration · heuristic evolution · combinatorial optimization · large language models · TSP · bin packing · automated algorithm design

The pith

Four specialized agents in a collaborative loop evolve better heuristics for combinatorial optimization using fewer LLM tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that existing LLM approaches to heuristic design for NP-hard problems are limited by rigid templates and quick convergence to poor solutions, and that a structured multi-agent system overcomes this by dividing the work into distinct roles that explore, create, test, and remember promising ideas. A sympathetic reader would care because solving problems like routing deliveries or packing efficiently matters in logistics and manufacturing, and this method promises to automate the creation of strong solution methods while keeping the expensive LLM calls to a minimum. The central idea is that treating heuristic search as team organization rather than a single workflow leads to more varied and practical results across multiple problem types.

Core claim

HMACE reconceptualizes heuristic search as an organizational design problem by decomposing each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Evaluations on TSP, Online BPP, MKP, and PFSP show that HMACE achieves a favorable quality-efficiency trade-off compared to state-of-the-art single-agent and multi-agent baselines.

What carries the argument

The HMACE four-agent loop (Proposer, Generator, Evaluator, Reflector), together with behavior-aware retrieval and fitness-grounded archive updates, organizes LLM calls to maintain diversity and reduce wasted evaluations during heuristic evolution.
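
To make the division of labor concrete, here is a minimal sketch of one such generation in Python. It is not the paper's implementation: the agent call signatures, the `Candidate` container, and the `archive.is_novel` filter are illustrative assumptions standing in for the prompts and protocols the paper describes.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    strategy: str      # natural-language idea from the Proposer
    code: str          # executable heuristic source from the Generator
    fitness: float     # empirical score assigned by the Evaluator

def run_generation(proposer, generator, evaluator, reflector, archive, parents):
    """One role-specialized loop: propose -> generate -> filter -> evaluate -> reflect."""
    strategies = proposer(parents, archive)                      # strategy exploration
    candidates = [Candidate(s, generator(s), float("inf")) for s in strategies]
    # Lightweight candidate filtering: skip heuristics the archive already covers.
    survivors = [c for c in candidates if archive.is_novel(c.code)]
    for c in survivors:
        c.fitness = evaluator(c.code)                            # empirical assessment
    reflector(survivors, archive)                                # archive-backed memory update
    return min(survivors, key=lambda c: c.fitness, default=None)
```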

If this is right

  • HMACE reaches the lowest reported average gaps of 0.464 percent on TSP and 0.223 percent on Online BPP.
  • It does so while using only 0.13 million tokens for TSP and 0.42 million for BPP, far below the compared baselines.
  • The same four-agent structure with retrieval and filtering extends to MKP and PFSP while preserving the quality-efficiency balance.
  • Behavior-aware retrieval and fitness-based updates prevent repeated evaluation of similar or weak candidates.
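
The last point rests on the archive mechanics. Figure 7 names a CVT-MAP-Elites memory, so a fitness-grounded update plausibly follows the standard MAP-Elites rule: map each heuristic's behavior descriptor to its nearest cell and keep only the best occupant. A sketch under that assumption (the cell centers and descriptor dimensions here are ours, not the paper's):

```python
import numpy as np

class EliteArchive:
    """MAP-Elites-style memory: at most one elite heuristic per behavior cell."""

    def __init__(self, centers: np.ndarray):
        self.centers = centers   # (k, d) CVT cell centers over behavior space
        self.elites = {}         # cell index -> (fitness, heuristic code)

    def _cell(self, descriptor: np.ndarray) -> int:
        # Nearest cell center under Euclidean distance.
        return int(np.argmin(np.linalg.norm(self.centers - descriptor, axis=1)))

    def update(self, descriptor, fitness: float, code: str) -> None:
        """Fitness-grounded update: replace the incumbent only if strictly better."""
        cell = self._cell(np.asarray(descriptor, dtype=float))
        incumbent = self.elites.get(cell)
        if incumbent is None or fitness < incumbent[0]:
            self.elites[cell] = (fitness, code)

    def coverage(self) -> float:
        # Fraction of occupied cells, the quantity tracked in Figure 7.
        return len(self.elites) / len(self.centers)
```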

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same division of labor might help LLM agents in other code-generation settings where diversity is needed, such as generating scheduling rules or packing strategies for new domains.
  • Explicit archive reflection could be added to non-evolutionary multi-agent setups to cut token waste without changing the underlying model.
  • Testing the framework on much larger problem instances would reveal whether the token savings scale when solution spaces grow.

Load-bearing premise

The framework assumes that an LLM will consistently produce executable and varied heuristic code plus accurate performance reflections when given role-specific prompts and access to an archive of past results.

What would settle it

If HMACE-evolved heuristics on the standard TSP benchmark instances produce an average gap above 0.5 percent or consume more than 0.2 million tokens while a simpler baseline stays below those thresholds, the claimed quality-efficiency advantage would not hold.
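
For reference, the gap figure in that test is the usual mean relative optimality gap. A minimal computation, assuming per-instance best-known optima and plain averaging (the abstract does not state the exact aggregation protocol):

```python
def average_gap_percent(costs, optima):
    """Mean relative optimality gap in percent over a set of instances."""
    gaps = [(c - o) / o * 100.0 for c, o in zip(costs, optima)]
    return sum(gaps) / len(gaps)

# Example: a tour of length 7558 on TSPLIB berlin52 (known optimum 7542)
# gives a gap of about 0.212 percent.
print(average_gap_percent([7558.0], [7542.0]))
```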

Figures

Figures reproduced from arXiv: 2605.07214 by Fei Ming, Jirui Han, Yaochu Jin, Yuanshuai Li, Yuping Yan.

Figure 1: Architectural progression of LLM-based solvers for COPs. (1) Fixed Evolutionary Search: …
Figure 2: The overview of the HMACE framework. The proposer conditions on parent heuristics and …
Figure 3: Trajectory-level cost analysis on TSP con…
Figure 4: Best-so-far trajectories over generations for four tasks. Lower is better in all panels. The …
Figure 5: Anytime comparison under cumulative LLM-token budget on PFSP and MKP. Lower is …
Figure 6: PCA projection of the behavior descriptors produced by evaluated heuristics on four tasks.
Figure 7: Archive-cell coverage over generations for the CVT-MAP-Elites memory. Higher is …
Figure 8: Visual summary of the ablation study from Table 4. Lower is better in both panels. The …
Figure 9: Estimated cost–quality scatter on PFSP, MKP, and CVRP. Lower is better on the y-axis.
Figure 10: Total LLM tokens at matched evaluator budget on PFSP, MKP, and CVRP. Lower is better.
read the original abstract

Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress, existing LLM-based methods typically rely on monolithic workflows constrained by rigid templates, thereby restricting memory-guided exploration and triggering premature convergence to local optima. To design an autonomous and collaborative architecture, we introduce HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework that reconceptualizes heuristic search as an organizational design problem. HMACE decomposes each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Extensive evaluations on representative COPs, including TSP, Online BPP, MKP, and PFSP, show that HMACE achieves a favorable quality-efficiency trade-off compared to state-of-the-art single-agent and multi-agent baselines. In the matched LLM-driven reference comparison, HMACE achieves the lowest average gaps on TSP and Online BPP (0.464% and 0.223%, respectively), while requiring only 0.13M and 0.42M tokens for the two tasks, substantially fewer than the compared baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework for automated heuristic design in combinatorial optimization using LLMs. It decomposes each evolutionary generation into a role-specialized loop with four agents (Proposer for strategy exploration, Generator for executable code synthesis, Evaluator for empirical assessment, and Reflector for archive-backed memory update), augmented by behavior-aware retrieval, lightweight filtering, and fitness-grounded archiving to promote diversity and avoid premature convergence. Experiments on TSP, Online BPP, MKP, and PFSP report that HMACE attains the lowest average gaps on TSP (0.464%) and Online BPP (0.223%) while consuming substantially fewer tokens (0.13M and 0.42M) than single-agent and multi-agent LLM baselines.

Significance. If the reported performance holds under the stated protocol, HMACE advances LLM-based automated algorithm design by reframing heuristic search as an organizational multi-agent problem. The quality-efficiency trade-off, achieved through explicit role specialization and memory mechanisms rather than monolithic prompting, offers a concrete template for reducing token costs while improving exploration on NP-hard problems. The framework's emphasis on executable heuristic synthesis and archive updates could generalize to other domains requiring iterative code generation and evaluation.

minor comments (3)
  1. The abstract reports headline gaps and token counts but omits any mention of the number of instances, instance sizes, or statistical aggregation method (e.g., mean over 10 runs); adding one sentence summarizing the experimental protocol would improve readability for readers who stop at the abstract.
  2. In the method description, the precise definition of 'behavior-aware retrieval' (e.g., which features of prior heuristics are embedded and how similarity is computed) is referenced but not formalized; a short pseudocode block or equation would clarify the mechanism that purportedly prevents redundant evaluations (one plausible formalization is sketched after this list).
  3. Table captions and axis labels in the experimental figures should explicitly state the LLM backbone and temperature setting used for all compared methods to allow direct replication of the token-consumption comparison.
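
On the second comment: one plausible formalization of behavior-aware retrieval, offered here as an assumption rather than the paper's definition, is to embed each archived heuristic as a behavior-descriptor vector and rank by cosine similarity, returning the least similar elites so the Proposer sees varied parents:

```python
import numpy as np

def retrieve_diverse_parents(descriptors, codes, query, k=3):
    """Return the k archived heuristics most dissimilar in behavior to `query`.
    The descriptor features and the cosine-similarity choice are illustrative."""
    D = np.asarray(descriptors, dtype=float)   # (n, d) behavior descriptors
    q = np.asarray(query, dtype=float)
    sim = (D @ q) / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(sim)                    # ascending: most dissimilar first
    return [codes[i] for i in order[:k]]
```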

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of HMACE and the recommendation for minor revision. The summary accurately reflects the framework's decomposition into role-specialized agents, the use of behavior-aware retrieval and fitness-grounded archiving, and the reported gains in optimality gaps and token efficiency on TSP and Online BPP. We will incorporate any minor editorial or clarification changes requested in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces an empirical multi-agent LLM framework (HMACE) for heuristic evolution on combinatorial problems and supports its claims exclusively through experimental comparisons of solution gaps and token usage on TSP, Online BPP, MKP, and PFSP instances against single-agent and multi-agent baselines. No mathematical derivations, equations, or first-principles predictions exist that could reduce to fitted parameters, self-definitions, or self-citation chains. The reported results (0.464% TSP gap, 0.223% BPP gap, 0.13M/0.42M tokens) follow directly from the described procedural loop of Proposer/Generator/Evaluator/Reflector agents plus retrieval and archiving steps, which are externally replicable on the stated benchmarks without internal reduction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework depends on LLMs being capable of generating executable, evaluable heuristics and meaningful reflections from natural-language prompts; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)
  • Domain assumption: Large language models can reliably produce executable code for combinatorial optimization heuristics when given role-specific prompts.
    The Generator and Evaluator agents presuppose this capability for the loop to function.
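
This assumption is cheap to police in practice. A sketch of the kind of lightweight executability filter the loop presupposes, where the entry-point name `heuristic` and the probe instance are our assumptions rather than the paper's interface:

```python
def compiles_and_runs(code: str, probe_instance) -> bool:
    """Reject generated heuristics that fail to compile or crash on a tiny probe."""
    try:
        namespace = {}
        exec(compile(code, "<candidate>", "exec"), namespace)
        namespace["heuristic"](probe_instance)   # must exist and run without raising
        return True
    except Exception:
        return False
```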

pith-pipeline@v0.9.0 · 5555 in / 1168 out tokens · 43101 ms · 2026-05-11T01:45:52.055926+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

  1. Henrik Abgaryan, Ararat Harutyunyan, and Tristan Cazenave. LLMs can schedule. arXiv preprint arXiv:2408.06993, 2024.
  2. Ali AhmadiTeshnizi, Wenzhi Gao, and Madeleine Udell. OptiMUS: Scalable optimization modeling with (MI)LP solvers and large language models. arXiv preprint arXiv:2402.10172, 2024.
  3. Tasnim Ahmed and Salimur Choudhury. LM4OPT: Unveiling the potential of large language models in formulating mathematical optimization problems. INFOR: Information Systems and Operational Research, 62(4):559–572, 2024.
  4. Entesar Alanzi and Mohamed El Bachir Menai. Solving the traveling salesman problem with machine learning: a review of recent advances and challenges. Artificial Intelligence Review, 58(9):267, 2025.
  5. Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, and Alexey Skrynnik. MAPF-GPT: Imitation learning for multi-agent pathfinding at scale. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 23126–23134, 2025.
  6. David L. Applegate, Robert E. Bixby, Vašek Chvátal, and William J. Cook. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2006.
  7. Giorgio Ausiello, Pierluigi Crescenzi, Giorgio Gambosi, Viggo Kann, Alberto Marchetti-Spaccamela, and Marco Protasi. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer Science & Business Media, 2012.
  8. Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2017.
  9. Shengkai Chen, Zhiguang Cao, Jianan Zhou, Yaoxin Wu, Senthilnath Jayavelu, Zhuoyi Lin, Xiaoli Li, and Shili Xiang. Dragon: LLM-driven decomposition and reconstruction agents for large-scale combinatorial optimization. arXiv preprint arXiv:2601.06502, 2026.
  10. Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, et al. AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors. In The Twelfth International Conference on Learning Representations, 2023.
  11. P. C. Chu and John E. Beasley. A genetic algorithm for the multidimensional knapsack problem. Journal of Heuristics, 4(1):63–86, 1998.
  12. Edward G. Coffman Jr., Michael R. Garey, and David S. Johnson. Approximation algorithms for bin packing: a survey. In Dorit S. Hochbaum, editor, Approximation Algorithms for NP-hard Problems, pages 46–93. PWS Publishing, 1996.
  13. Francesca Da Ros, Michael Soprano, Luca Di Gaspero, and Kevin Roitero. Large language models for combinatorial optimization: A systematic review. ACM Computing Surveys, 2025.
  14. Pham Vu Tuan Dat, Long Doan, and Huynh Thi Thanh Binh. HSEvo: Elevating automatic heuristic design with diversity-driven harmony search and genetic algorithm using LLMs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 26931–26938, 2025.
  15. Emmanuel Falkenauer. A hybrid grouping genetic algorithm for bin packing. Journal of Heuristics, 2(1):5–30, 1996.
  16. Jatinder N. D. Gupta. A functional heuristic algorithm for the flowshop scheduling problem. Operational Research Quarterly, 22(1):39–47, 1971.
  17. Keld Helsgaun. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106–130, 2000.
  18. Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2023.
  19. Yuxiao Huang, Wenjie Zhang, Liang Feng, Xingyu Wu, and Kay Chen Tan. How multimodal integration boost the performance of LLM for optimization: Case study on capacitated vehicle routing problems. In 2025 IEEE Symposium for Multidisciplinary Computational Intelligence Incubators (MCII), pages 1–7. IEEE, 2025.
  20. Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, and Yang Yu. LLMOPT: Learning to define and solve general optimization problems from scratch. arXiv preprint arXiv:2410.13213, 2024.
  21. Xia Jiang, Yaoxin Wu, Minshuo Li, Zhiguang Cao, and Yingqian Zhang. Large language models as end-to-end combinatorial optimization solvers. arXiv preprint arXiv:2509.16865, 2025.
  22. David S. Johnson. Near-optimal bin packing algorithms. PhD thesis, Massachusetts Institute of Technology, 1973.
  23. Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019.
  24. Yeong-Dae Kwon, Jinho Choo, Byung-In Kim, Iljoo Yoon, Youngjune Gwon, and Seungjai Min. POMO: Policy optimization with multiple optima for reinforcement learning. In Advances in Neural Information Processing Systems, volume 33, pages 21188–21198, 2020.
  25. Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023.
  26. Keyu Li, Junhao Shi, Yang Xiao, Mohan Jiang, Jie Sun, Yunze Wu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, et al. AgencyBench: Benchmarking the frontiers of autonomous agents in 1M-token real-world contexts. arXiv preprint arXiv:2601.11044, 2026.
  27. Xiaoxiao Li. When single-agent with skills replace multi-agent systems and when they fail. arXiv preprint arXiv:2601.04748, 2026.
  28. Fei Liu, Xialiang Tong, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Evolution of heuristics: Towards efficient automatic algorithm design using large language model. arXiv preprint arXiv:2401.02051, 2024.
  29. Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. AgentBench: Evaluating LLMs as agents. In The Twelfth International Conference on Learning Representations, 2024.
  30. Yihong Liu, Junyi Li, Wayne Xin Zhao, Hongyu Lu, and Ji-Rong Wen. Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design. arXiv preprint arXiv:2509.24509, 2025.
  31. Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, and Zhenkun Wang. Neural combinatorial optimization with heavy decoder: Toward large-scale generalization. In Advances in Neural Information Processing Systems, volume 36, pages 19769–19789, 2023.
  32. Silvano Martello and Paolo Toth. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, 1990.
  33. Jean-Baptiste Mouret and Jeff Clune. Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909, 2015.
  34. Muhammad Nawaz, E. Emory Enscore Jr., and Inyong Ham. A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega, 11(1):91–95, 1983.
  35. Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, 1998.
  36. Laurent Perron and Vincent Furnon. OR-Tools. Google Optimization Tools, version 9.10, 2024.
  37. Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, et al. ChatDev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15174–15186, 2024.
  38. Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, et al. Coral: Towards autonomous multi-agent evolution for open-ended discovery. arXiv preprint arXiv:2604.01658, 2026.
  39. Ronald L. Rardin and Reha Uzsoy. Experimental evaluation of heuristic optimization algorithms: A tutorial. Journal of Heuristics, 7(3):261–304, 2001.
  40. Gerhard Reinelt. TSPLIB—a traveling salesman problem library. ORSA Journal on Computing, 3(4):376–384, 1991.
  41. Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625(7995):468–475, 2024.
  42. Yiding Shi, Jianan Zhou, Wen Song, Jieyi Bi, Yaoxin Wu, Zhiguang Cao, and Jie Zhang. Generalizable heuristic generation through LLMs with meta-optimization. In The Fourteenth International Conference on Learning Representations, 2026.
  43. Vinod Chandra SS and Anand HS. Nature inspired meta heuristic algorithms for optimization problems. Computing, 104(2):251–269, 2022.
  44. Eric Taillard. Benchmarks for basic scheduling problems. European Journal of Operational Research, 64(2):278–285, 1993.
  45. Vassilis Vassiliades, Konstantinos Chatzilygeroudis, and Jean-Baptiste Mouret. Using centroidal Voronoi tessellations to scale up the multidimensional archive of phenotypic elites algorithm. IEEE Transactions on Evolutionary Computation, 22(4):623–630, 2018.
  46. Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, volume 28, 2015.
  47. Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling, 2024.
  48. Xuan Wu, Di Wang, Chunguo Wu, Lijie Wen, Chunyan Miao, Yubin Xiao, and You Zhou. Efficient heuristics generation for solving combinatorial optimization problems using large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 3228–3239, 2025.
  49. Shunyu Yao, Fei Liu, Xi Lin, Zhichao Lu, Zhenkun Wang, and Qingfu Zhang. Multi-objective evolution of heuristic using large language model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27144–27152, 2025.
  50. Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. ReEvo: Large language models as hyper-heuristics with reflective evolution. Advances in Neural Information Processing Systems, 37:43571–43608, 2024.
  51. Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, and Bryan Hooi. Monte Carlo tree search for comprehensive exploration in LLM-based automatic heuristic design. arXiv preprint arXiv:2501.08603, 2025.