Recognition: no theorem link
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
Pith reviewed 2026-05-11 01:45 UTC · model grok-4.3
The pith
Four specialized agents in a collaborative loop evolve better heuristics for combinatorial optimization using fewer LLM tokens.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HMACE reconceptualizes heuristic search as an organizational design problem by decomposing each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Evaluations on TSP, Online BPP, MKP, and PFSP show that HMACE achieves a favorable quality-efficiency trade-off compared with state-of-the-art single-agent and multi-agent baselines.
What carries the argument
The HMACE four-agent loop (Proposer, Generator, Evaluator, Reflector), together with behavior-aware retrieval and fitness-grounded archive updates, organizes LLM calls to maintain diversity and reduce wasted evaluations during heuristic evolution.
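One generation of the four-role loop can be sketched as below. This is a minimal illustration of the described Proposer → Generator → Evaluator → Reflector cycle, not the paper's actual API; the class names, the fixed archive capacity, and the callable interfaces are all assumptions.

```python
# Hedged sketch of one HMACE-style generation; names and interfaces are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Candidate:
    code: str                          # executable heuristic source
    fitness: float = float("inf")      # lower is better (e.g. optimality gap)

@dataclass
class Archive:
    elites: List[Candidate] = field(default_factory=list)

    def update(self, cand: Candidate, capacity: int = 10) -> None:
        # Fitness-grounded update: retain only the best `capacity` candidates.
        self.elites.append(cand)
        self.elites.sort(key=lambda c: c.fitness)
        del self.elites[capacity:]

def run_generation(archive: Archive,
                   propose: Callable[[Archive], str],
                   generate: Callable[[str], Candidate],
                   evaluate: Callable[[Candidate], float]) -> Candidate:
    """One role-specialized loop: Proposer -> Generator -> Evaluator -> Reflector."""
    strategy = propose(archive)        # Proposer: choose an exploration direction
    cand = generate(strategy)          # Generator: synthesize executable heuristic
    cand.fitness = evaluate(cand)      # Evaluator: empirical assessment
    archive.update(cand)               # Reflector: archive-backed memory update
    return cand
```

In this reading, the archive is the shared memory that couples generations: the Proposer conditions on it and the Reflector writes back to it.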
If this is right
- HMACE reaches the lowest reported average gaps of 0.464 percent on TSP and 0.223 percent on Online BPP.
- It does so while using only 0.13 million tokens for TSP and 0.42 million for BPP, far fewer than the compared baselines.
- The same four-agent structure with retrieval and filtering extends to MKP and PFSP while preserving the quality-efficiency balance.
- Behavior-aware retrieval and fitness-based updates prevent repeated evaluation of similar or weak candidates.
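The redundancy-avoidance claim in the last point can be made concrete with a small sketch. The behavior signature used here (a heuristic's scores on a tiny probe set) and the exact-match check are assumptions for illustration; the paper's filtering mechanism is not specified at this level of detail.

```python
# Illustrative duplicate filter: skip evaluation when a candidate's behavior
# signature has already been seen. The signature definition is an assumption.
from typing import Set, Tuple

def is_redundant(signature: Tuple[float, ...], seen: Set[Tuple[float, ...]]) -> bool:
    """Return True if this exact behavior was already evaluated; record it otherwise."""
    if signature in seen:
        return True
    seen.add(signature)
    return False

seen: Set[Tuple[float, ...]] = set()
assert not is_redundant((10.2, 7.8), seen)   # first occurrence -> evaluate
assert is_redundant((10.2, 7.8), seen)       # exact repeat -> skip the call
```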
Where Pith is reading between the lines
- The same division of labor might help LLM agents in other code-generation settings where diversity is needed, such as generating scheduling rules or packing strategies for new domains.
- Explicit archive reflection could be added to non-evolutionary multi-agent setups to cut token waste without changing the underlying model.
- Testing the framework on much larger problem instances would reveal whether the token savings scale when solution spaces grow.
Load-bearing premise
The framework assumes that an LLM will consistently produce executable and varied heuristic code plus accurate performance reflections when given role-specific prompts and access to an archive of past results.
What would settle it
If HMACE-evolved heuristics on the standard TSP benchmark instances produce an average gap above 0.5 percent or consume more than 0.2 million tokens while a simpler baseline stays below those thresholds, the claimed quality-efficiency advantage would not hold.
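The 0.5 percent threshold above refers to the average optimality gap. As a minimal sketch, assuming the conventional definition gap = (heuristic_cost − optimal_cost) / optimal_cost:

```python
# Average optimality gap in percent, as commonly reported for TSP-style
# benchmarks; a sketch of the conventional definition, not the paper's script.
from typing import Sequence

def avg_gap_percent(heuristic_costs: Sequence[float],
                    optimal_costs: Sequence[float]) -> float:
    gaps = [(h - o) / o * 100.0 for h, o in zip(heuristic_costs, optimal_costs)]
    return sum(gaps) / len(gaps)

# Costs 100.5 and 201.0 against optima 100 and 200 give 0.5% on each
# instance, so the average gap is 0.5%.
```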
Original abstract
Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress, existing LLM-based methods typically rely on monolithic workflows constrained by rigid templates, thereby restricting memory-guided exploration and triggering premature convergence to local optima. To design an autonomous and collaborative architecture, we introduce HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework that reconceptualizes heuristic search as an organizational design problem. HMACE decomposes each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Extensive evaluations on representative COPs, including TSP, Online BPP, MKP, and PFSP, show that HMACE achieves a favorable quality-efficiency trade-off compared to state-of-the-art single-agent and multi-agent baselines. In the matched LLM-driven reference comparison, HMACE achieves the lowest average gaps on TSP and Online BPP (0.464% and 0.223%, respectively), while requiring only 0.13M and 0.42M tokens for the two tasks, substantially fewer than the compared baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework for automated heuristic design in combinatorial optimization using LLMs. It decomposes each evolutionary generation into a role-specialized loop with four agents (Proposer for strategy exploration, Generator for executable code synthesis, Evaluator for empirical assessment, and Reflector for archive-backed memory update), augmented by behavior-aware retrieval, lightweight filtering, and fitness-grounded archiving to promote diversity and avoid premature convergence. Experiments on TSP, Online BPP, MKP, and PFSP report that HMACE attains the lowest average gaps on TSP (0.464%) and Online BPP (0.223%) while consuming substantially fewer tokens (0.13M and 0.42M) than single-agent and multi-agent LLM baselines.
Significance. If the reported performance holds under the stated protocol, HMACE advances LLM-based automated algorithm design by reframing heuristic search as an organizational multi-agent problem. The quality-efficiency trade-off, achieved through explicit role specialization and memory mechanisms rather than monolithic prompting, offers a concrete template for reducing token costs while improving exploration on NP-hard problems. The framework's emphasis on executable heuristic synthesis and archive updates could generalize to other domains requiring iterative code generation and evaluation.
minor comments (3)
- The abstract reports headline gaps and token counts but omits any mention of the number of instances, instance sizes, or statistical aggregation method (e.g., mean over 10 runs); adding one sentence summarizing the experimental protocol would improve readability for readers who stop at the abstract.
- In the method description, the precise definition of 'behavior-aware retrieval' (e.g., which features of prior heuristics are embedded and how similarity is computed) is referenced but not formalized; a short pseudocode block or equation would clarify the mechanism that purportedly prevents redundant evaluations.
- Table captions and axis labels in the experimental figures should explicitly state the LLM backbone and temperature setting used for all compared methods to allow direct replication of the token-consumption comparison.
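The second comment asks for a formalization of behavior-aware retrieval. One plausible reading, offered here purely as a hypothetical sketch: embed each heuristic as a behavior vector (e.g. its scores on a fixed probe set) and evaluate a new candidate only if no archived behavior is near-identical under cosine similarity. The embedding choice and the 0.98 threshold are assumptions, not values from the paper.

```python
import math
from typing import Sequence

# Hypothetical behavior-aware retrieval filter; the behavior embedding and
# similarity threshold are assumptions made for illustration.
def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def should_evaluate(candidate_vec: Sequence[float],
                    archive_vecs: Sequence[Sequence[float]],
                    threshold: float = 0.98) -> bool:
    """Spend an evaluation only if no archived behavior is near-identical."""
    return all(cosine(candidate_vec, v) < threshold for v in archive_vecs)
```

A pseudocode block of roughly this shape in the paper would let readers verify the claim that retrieval, not just filtering, prevents redundant evaluations.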
Simulated Author's Rebuttal
We thank the referee for the positive assessment of HMACE and the recommendation for minor revision. The summary accurately reflects the framework's decomposition into role-specialized agents, the use of behavior-aware retrieval and fitness-grounded archiving, and the reported gains in optimality gaps and token efficiency on TSP and Online BPP. We will incorporate any minor editorial or clarification changes requested in the revised manuscript.
Circularity Check
No significant circularity
full rationale
The paper introduces an empirical multi-agent LLM framework (HMACE) for heuristic evolution on combinatorial problems and supports its claims exclusively through experimental comparisons of solution gaps and token usage on TSP, Online BPP, MKP, and PFSP instances against single-agent and multi-agent baselines. No mathematical derivations, equations, or first-principles predictions exist that could reduce to fitted parameters, self-definitions, or self-citation chains. The reported results (0.464% TSP gap, 0.223% BPP gap, 0.13M/0.42M tokens) follow directly from the described procedural loop of Proposer/Generator/Evaluator/Reflector agents plus retrieval and archiving steps, which are externally replicable on the stated benchmarks without internal reduction to the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can reliably produce executable code for combinatorial optimization heuristics when given role-specific prompts.
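This axiom can be partially guarded in practice with a lightweight executability check before a candidate reaches the Evaluator, in the spirit of the paper's "lightweight candidate filtering". The sketch below is an assumption about how such a guard might look; in particular, the `heuristic` entry-point name is hypothetical.

```python
# Minimal executability guard for LLM-generated heuristic source; the
# expected entry-point name `heuristic` is an illustrative assumption.
def is_executable(source: str, entry_point: str = "heuristic") -> bool:
    namespace: dict = {}
    try:
        # Compile and run the candidate in an isolated namespace; any
        # syntax or import-time error disqualifies it cheaply.
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception:
        return False
    # The candidate must also expose the expected callable.
    return callable(namespace.get(entry_point))
```

A guard like this converts the axiom from "the LLM always emits runnable code" into the weaker, testable premise that non-runnable outputs are caught before they consume evaluation budget.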