COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space

Abdullah Ali Sivas; Cheikh Ahmed; Mahdi Mostajabdaveh; Mao Kun; Oleksandr Yakovenko; Xiaorui Li; Zirui Zhou

arxiv: 2605.20618 · v1 · pith:M5CZOQ4Jnew · submitted 2026-05-20 · 💻 cs.AI

COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space

Oleksandr Yakovenko , Mahdi Mostajabdaveh , Cheikh Ahmed , Abdullah Ali Sivas , Xiaorui Li , Zirui Zhou , Mao Kun This is my paper

Pith reviewed 2026-05-21 05:21 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-agent frameworkvehicle routing problempartial search graphcombinatorial optimizationVRPTWsearch intensificationdiversificationlearning-based solver

0 comments

The pith

COAgents trains agents on a partial search graph to guide solution search for vehicle routing problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents COAgents as a way to solve vehicle routing problems by treating the search for better routes as a graph navigation task. Solutions are nodes in a graph, connected by either small improvements or big changes called jumps. A partial version of this graph is built on the fly while searching, and three agents learn to pick which solution to work on next, which improvement to apply, and when to make a jump to escape poor areas. This setup keeps the learning of how to search separate from the details of the routing problem itself. A sympathetic reader would care because traditional methods need lots of custom rules for each problem type, and this learned approach could make solvers more flexible for real logistics challenges.

Core claim

COAgents is a cooperative multi-agent framework that models the search process as a graph where nodes represent solutions and edges correspond to local refinements or large perturbations for diversification. It dynamically constructs a Partial Search Graph during search to train a Node Selection Agent and a Move Selection Agent for intensification as well as a Jump Agent to trigger explorations of new regions. The framework separates problem-agnostic search control from compact domain-specific encoding to improve adaptability.

What carries the argument

Partial Search Graph (PSG) dynamically built from solutions during search, used to train Node Selection, Move Selection, and Jump agents that control intensification and diversification.

If this is right

COAgents sets a new state of the art among learning-based methods on VRPTW instances.
Reduces the gap to the best-known solutions by 14% at N=100 and 44% at N=50 relative to POMO.
Reduces the gap by 21% at N=100 and 40% at N=50 relative to ALNS.
Remains competitive with several learn-to-search baselines on CVRP.
The separation of search control from domain encoding supports adaptability across different tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on other combinatorial optimization problems that use local search and diversification strategies.
Learned jump timing might help in problems where manual diversification rules are hard to design.

Load-bearing premise

Dynamically constructing a Partial Search Graph and training specialized agents on it will reliably guide both intensification and diversification across diverse VRP instances without needing extensive problem-specific handcrafted rules.

What would settle it

Evaluating COAgents on a fresh collection of VRPTW instances of size 50 and 100 and measuring whether the optimality gap reductions match or exceed the claimed 44% and 14% improvements over POMO would confirm or refute the performance advantage.

Figures

Figures reproduced from arXiv: 2605.20618 by Abdullah Ali Sivas, Cheikh Ahmed, Mahdi Mostajabdaveh, Mao Kun, Oleksandr Yakovenko, Xiaorui Li, Zirui Zhou.

**Figure 3.** Figure 3: Evolution of a PSG during the algorithm. s denotes solutions and m are moves. In this study, we propose a method that learns how to guide the exploration of the search space from previous experiences and how to generate jumps during exploration. We will use PSGs to identify patterns in solution exploration and to guide further exploration, improving search efficiency by focusing on unexplored or promising… view at source ↗

**Figure 4.** Figure 4: illustrates the general architecture of agents [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of optimality gap for a test set of VRPTW instances. [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: Node & Move Selection training sample. Left: Three PSGs converging from distinct perturbed solutions. Right: A training sample with three sub-graphs sampled from the PSGs on the left with labels. Jump Agent Dataset For our jump model, using the same training instances, we generate multiple trajectories per instance by running ALNS [13] from different random starts until convergence to a local optimum. We r… view at source ↗

read the original abstract

Although Vehicle Routing Problems (VRP) are essential to many real-world systems, they remain computationally intractable at scale due to their combinatorial complexity. Traditional heuristics rely on handcrafted rules for local improvements and occasional \textit{jumps} to escape local minima, but often struggle to generalize across diverse instances. We introduce \textbf{COAgents}, a cooperative multi-agent framework that models the search process as a graph: nodes represent solutions, and edges correspond to either local refinements or large perturbations for diversification (i.e., jumps). A \textit{Partial Search Graph} (PSG) is dynamically constructed during search, enabling COAgents to train a Node Selection Agent and a Move Selection Agent to guide intensification, and a Jump Agent to trigger well-timed explorations of new regions. Unlike end-to-end learning approaches, COAgents cleanly separates problem-agnostic search control from compact domain-specific encoding, facilitating adaptability across tasks. Extensive experiments on the CVRP and VRPTW benchmarks show that COAgents remains competitive with several learn-to-search baselines on CVRP and sets a new state of the art among learning-based methods on the more challenging VRPTW instances, reducing the gap to the best-known solutions by 14\% at $N\!=\!100$ and 44\% at $N\!=\!50$ relative to the strongest neural solver (POMO), and by 21\% and 40\% respectively relative to ALNS. Code is available at https://github.com/mahdims/COAgents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

COAgents gives a clean multi-agent split for VRP search control via partial graphs, but the VRPTW gap reductions need matched budgets and ablations before the claims stick.

read the letter

The core contribution here is the cooperative setup with a Node Selection Agent, Move Selection Agent, and Jump Agent all working on a dynamically built Partial Search Graph. This separates problem-agnostic search decisions from the compact domain encoding for CVRP and VRPTW, which is a practical step beyond pure end-to-end neural solvers. They ship code, which is useful for anyone wanting to test the approach directly. The framework description in the abstract is straightforward and the motivation for using specialized agents for intensification versus diversification tracks with how traditional heuristics handle local search and escapes from minima. That separation could help with generalization across instance distributions without heavy hand-engineering. On the empirical side the abstract reports competitive results on CVRP and new SOTA among learning methods on VRPTW, with gap reductions versus POMO and ALNS. Those numbers are the part that needs scrutiny. The stress-test concern lands: without explicit matching on total local moves, perturbation frequency, wall-clock time, or function evaluations, it is hard to isolate whether the multi-agent PSG is responsible or whether any competent search given extra budget would close a similar fraction of the gap. Missing ablations, such as running without the Jump Agent while keeping the others, leave open the possibility that gains come from implementation details rather than the claimed architecture. The paper assumes the learned policies will guide both intensification and diversification reliably across diverse instances, but the abstract gives no protocol details on hyperparameter tuning, statistical significance, or training distribution coverage, so verification is limited right now. This work is aimed at people building learn-to-search methods for routing and logistics. A reader already working on multi-agent RL for combinatorial problems would find the framework and code worth examining. It deserves a serious referee because the modeling choice is coherent and the code lowers the barrier to checking the results; revisions would mainly need tighter experimental controls and the requested ablations. I would send it to review rather than desk reject.

Referee Report

3 major / 2 minor

Summary. The paper introduces COAgents, a cooperative multi-agent framework for routing problems that dynamically constructs a Partial Search Graph (PSG) during search. Nodes represent solutions and edges represent local moves or jumps; three specialized agents (Node Selection, Move Selection, and Jump) are trained to guide intensification and diversification. The approach separates problem-agnostic search control from domain encoding. Experiments on CVRP and VRPTW benchmarks claim competitiveness with learn-to-search baselines on CVRP and new state-of-the-art results among learning-based methods on VRPTW, with gap reductions to best-known solutions of 14% (N=100) and 44% (N=50) versus POMO and 21%/40% versus ALNS.

Significance. If the empirical claims hold under matched budgets and proper controls, the clean separation of search control from domain encoding and the multi-agent PSG mechanism could provide a more generalizable alternative to end-to-end neural solvers for combinatorial optimization. The open-source code is a positive contribution that aids reproducibility. However, the significance is currently limited by the absence of detailed experimental protocols needed to confirm that gains arise from the proposed framework rather than unequal computational resources.

major comments (3)

[Abstract / Experiments] Abstract and experimental results section: The headline SOTA claims (14% gap reduction at N=100 and 44% at N=50 versus POMO on VRPTW) are presented without any reported details on the number of independent runs, statistical significance tests, variance across instances, or hyperparameter tuning protocol. This information is load-bearing for verifying that the reported improvements are robust and not the result of post-hoc selection or favorable random seeds.
[Method / Experiments] Method and experimental sections: No ablation is described that disables the Jump Agent while retaining the Node/Move Selection agents on the PSG. Without this control, it is impossible to isolate whether the cooperative multi-agent structure (rather than any competent search procedure given extra budget) is responsible for the claimed diversification benefits and gap reductions.
[Experiments] Experiments section: The manuscript does not report explicit matching of search budgets (total local moves, perturbation frequency, wall-clock time, or function evaluations) against the POMO and ALNS baselines. Because the framework description emphasizes problem-agnostic control, the absence of such controls leaves open the possibility that any search procedure with equivalent extra effort could achieve similar gap closures.

minor comments (2)

[Method] The notation distinguishing the three agents and the PSG construction steps would benefit from an explicit diagram or pseudocode listing in the method section to improve readability.
[Abstract / Introduction] A few sentences in the abstract and introduction repeat the same performance numbers; consolidating them would reduce redundancy.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important aspects of experimental rigor that will improve the manuscript. We respond to each major comment below and indicate the planned revisions.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and experimental results section: The headline SOTA claims (14% gap reduction at N=100 and 44% at N=50 versus POMO on VRPTW) are presented without any reported details on the number of independent runs, statistical significance tests, variance across instances, or hyperparameter tuning protocol. This information is load-bearing for verifying that the reported improvements are robust and not the result of post-hoc selection or favorable random seeds.

Authors: We agree that these details are essential for substantiating the claims. In the revised manuscript we will report the number of independent runs, include standard deviations and variance across instances, add statistical significance tests (such as paired t-tests or Wilcoxon tests), and describe the hyperparameter tuning protocol in the Experiments section. revision: yes
Referee: [Method / Experiments] Method and experimental sections: No ablation is described that disables the Jump Agent while retaining the Node/Move Selection agents on the PSG. Without this control, it is impossible to isolate whether the cooperative multi-agent structure (rather than any competent search procedure given extra budget) is responsible for the claimed diversification benefits and gap reductions.

Authors: This is a fair request to isolate component contributions. We will add an ablation study in the revised Experiments section in which the Jump Agent is disabled while retaining the Node Selection and Move Selection agents on the PSG, and we will report the resulting performance to demonstrate the Jump Agent's role in diversification. revision: yes
Referee: [Experiments] Experiments section: The manuscript does not report explicit matching of search budgets (total local moves, perturbation frequency, wall-clock time, or function evaluations) against the POMO and ALNS baselines. Because the framework description emphasizes problem-agnostic control, the absence of such controls leaves open the possibility that any search procedure with equivalent extra effort could achieve similar gap closures.

Authors: We acknowledge the need for explicit budget matching. In the revision we will report detailed search budgets (local moves, perturbations, wall-clock time) for COAgents and the baselines. Where direct equivalence is not already present we will either adjust the experimental protocol or clearly document the differences and their implications. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical SOTA claims rest on external baseline comparisons

full rationale

The paper proposes a multi-agent framework (COAgents) that dynamically builds a Partial Search Graph and trains Node/Move Selection and Jump Agents to guide VRP search. Central claims of gap reductions (14%/44% vs POMO, 21%/40% vs ALNS on VRPTW at N=50/100) are presented as outcomes of experiments on standard CVRP/VRPTW benchmarks. These rest on direct performance comparisons to independent external solvers rather than any internal prediction, fitted parameter, or derivation that reduces to the method's own inputs by construction. No equations, uniqueness theorems, or ansatzes are invoked that loop back to self-defined quantities. The separation of problem-agnostic control from domain encoding supplies independent methodological content, rendering the reported results self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from heuristic search and reinforcement learning plus typical ML hyperparameters; no new physical entities are postulated.

free parameters (1)

Agent network architectures and training hyperparameters
Standard in multi-agent RL; these are chosen or tuned during development to achieve the reported benchmark performance.

axioms (1)

domain assumption The solution search process for VRPs can be effectively modeled as a graph with edges for local refinements and large perturbations.
Core modeling choice described in the abstract for enabling agent-guided navigation.

pith-pipeline@v0.9.0 · 5828 in / 1369 out tokens · 60718 ms · 2026-05-21T05:21:53.624300+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

models the search process as a graph: nodes represent solutions, and edges correspond to either local refinements or large perturbations... Partial Search Graph (PSG) is dynamically constructed
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Node Selection Agent and a Move Selection Agent to guide intensification, and a Jump Agent

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 3 internal anchors

[1]

Residual Gated Graph ConvNets

Bresson, X., Laurent, T.: Residual gated graph convnets. arXiv:1711.07553 (2023). https://doi.org/https://doi.org/10.48550/arXiv.1711.07553

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1711.07553 2023
[2]

Handbook of metaheuristics pp

Burke, E.K., Hyde, M.R., Kendall, G., Ochoa, G., Özcan, E., Woodward, J.R.: A classification of hyper-heuristic approaches: revisited. Handbook of metaheuristics pp. 453–477 (2019)

work page 2019
[3]

Advances in neural information processing systems32(2019)

Chen, X., Tian, Y.: Learning to perform local rewriting for combinatorial opti- mization. Advances in neural information processing systems32(2019)

work page 2019
[4]

Advances in Neural Information Processing Systems35, 8760–8772 (2022)

Choo, J., Kwon, Y.D., Kim, J., Jae, J., Hottung, A., Tierney, K., Gwon, Y.: Simulation-guided beam search for neural combinatorial optimization. Advances in Neural Information Processing Systems35, 8760–8772 (2022)

work page 2022
[5]

Rethinking Attention with Performers

Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., Belanger, D., Colwell, L., Weller, A.:Rethinkingattention withperformers. In:InternationalConferenceon Learning Representations (ICLR) (2021), https://arxiv.org/abs/2009.14794

work page internal anchor Pith review Pith/arXiv arXiv 2021
[6]

In: Proceedings of the 15th International Joint Conference on Artifical Intelligence - Volume 1

Denzinger, J., Fuchs, M., Fuchs, M.: High performance ATP systems by combining several AI methods. In: Proceedings of the 15th International Joint Conference on Artifical Intelligence - Volume 1. p. 102–107. IJCAI’97, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1997)

work page 1997
[7]

European Journal of Operational Research285(2), 405–428 (2020)

Drake,J.H.,Kheiri,A.,Özcan,E.,Burke,E.K.:Recentadvancesinselectionhyper- heuristics. European Journal of Operational Research285(2), 405–428 (2020)

work page 2020
[8]

Roskilde: Roskilde University12, 966–980 (2017)

Helsgaun, K.: An extension of the lin-kernighan-helsgaun tsp solver for constrained traveling salesman and vehicle routing problems. Roskilde: Roskilde University12, 966–980 (2017)

work page 2017
[9]

In: International Conference on Learning Representations (2021)

Hottung, A., Bhandari, B., Tierney, K.: Learning a latent search space for routing problems using variational autoencoders. In: International Conference on Learning Representations (2021)

work page 2021
[10]

International Conference on Learning Representations (2021)

Hottung, A., Kwon, Y.D., Tierney, K.: Efficient active search for combinatorial op- timization problems. International Conference on Learning Representations (2021)

work page 2021
[11]

In: Giacomo, G.D., et al

Hottung, A., Tierney, K.: Neural large neighborhood search for the capaci- tated vehicle routing problem. In: Giacomo, G.D., et al. (eds.) ECAI 2020 Title Suppressed Due to Excessive Length 15 – 24th European Conference on Artificial Intelligence, Frontiers in Artifi- cial Intelligence and Applications, vol. 325, pp. 443–450. IOS Press (2020). https://doi...

work page doi:10.3233/faia200124 2020
[12]

Artificial Intelligence313, 103786 (2022)

Hottung, A., Tierney, K.: Neural large neighborhood search for routing problems. Artificial Intelligence313, 103786 (2022)

work page 2022
[13]

Computers & Opera- tions Research172, 106791 (2024)

Johnn,S.N.,Darvariu,V.A.,Handl,J.,Kalcsics,J.:Agraphreinforcementlearning framework for neural adaptive large neighbourhood search. Computers & Opera- tions Research172, 106791 (2024)

work page 2024
[14]

European Journal of Operational Research309(1), 446–468 (2023)

Kallestad, J., Hasibi, R., Hemmati, A., Sörensen, K.: A general deep reinforcement learning hyperheuristic framework for solving combinatorial optimization prob- lems. European Journal of Operational Research309(1), 446–468 (2023)

work page 2023
[15]

Advances in neural information processing sys- tems30(2017)

Khalil, E., Dai, H., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial opti- mization algorithms over graphs. Advances in neural information processing sys- tems30(2017)

work page 2017
[16]

In: The Eleventh International Conference on Learning Repre- sentations (2023)

Kim, M., Park, J., Park, J.: Learning to cross exchange to solve min-max vehicle routing problems. In: The Eleventh International Conference on Learning Repre- sentations (2023)

work page 2023
[17]

Advances in Neural Information Processing Systems34, 10418–10430 (2021)

Kim, M., Park, J., et al.: Learning collaborative policies to solve np-hard routing problems. Advances in Neural Information Processing Systems34, 10418–10430 (2021)

work page 2021
[18]

Advances in Neural Information Processing Systems35, 1936–1949 (2022)

Kim, M., Park, J., Park, J.: Sym-nco: Leveraging symmetricity for neural com- binatorial optimization. Advances in Neural Information Processing Systems35, 1936–1949 (2022)

work page 1936
[19]

In: International conference on integration of constraint programming, artificial intelligence, and operations research

Kool, W., van Hoof, H., Gromicho, J., Welling, M.: Deep policy dynamic program- ming for vehicle routing problems. In: International conference on integration of constraint programming, artificial intelligence, and operations research. pp. 190–

work page
[20]

Advances in Neural Information Processing Systems33, 21188–21198 (2020)

Kwon, Y.D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S.: Pomo: Policy op- timization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems33, 21188–21198 (2020)

work page 2020
[21]

European Journal of Operational Research312(1), 70–91 (2024)

Lagos, F., Pereira, J.: Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems. European Journal of Operational Research312(1), 70–91 (2024)

work page 2024
[22]

Journal of Machine Learning Research 23(54), 1–9 (2022), http://jmlr.org/papers/v23/21-0888.html

Lindauer, M., Eggensperger, K., Feurer, M., Biedenkapp, A., Deng, D., Benjamins, C., Ruhkopf, T., Sass, R., Hutter, F.: Smac3: A versatile bayesian optimization package for hyperparameter optimization. Journal of Machine Learning Research 23(54), 1–9 (2022), http://jmlr.org/papers/v23/21-0888.html

work page 2022
[23]

In: International conference on learning representations (2019)

Lu, H., Zhang, X., Yang, S.: A learning-based iterative method for solving vehicle routing problems. In: International conference on learning representations (2019)

work page 2019
[24]

Advances in Neural Information Processing Systems34, 11096–11107 (2021)

Ma, Y., Li, J., Cao, Z., Song, W., Zhang, L., Chen, Z., Tang, J.: Learning to itera- tively solve routing problems with dual-aspect collaborative transformer. Advances in Neural Information Processing Systems34, 11096–11107 (2021)

work page 2021
[25]

Advances in Neural Information Processing Systems36, 49555–49578 (2023)

Ma, Y., Cao, Z., Chee, Y.M.: Learning to search feasible and infeasible regions of routing problems with flexible neural k-opt. Advances in Neural Information Processing Systems36, 49555–49578 (2023)

work page 2023
[26]

European Journal of Operational Research324(2), 522–537 (2025)

Mostajabdaveh,M.,Salman,F.S.,Gutjahr,W.J.:Abranch-and-pricealgorithmfor fast and equitable last-mile relief aid distribution. European Journal of Operational Research324(2), 522–537 (2025)

work page 2025
[27]

Yakovenko et al

Petropoulos, F., Laporte, G., Aktas, E., Alumur, S.A., Archetti, C., Ayhan, H., Battarra, M., Bennell, J.A., Bourjolly, J.M., Boylan, J.E., Breton, M., Canca, 16 O. Yakovenko et al. D., Charlin, L., Chen, B., Cicek, C.T., Jr, L.A.C., Currie, C.S., Demeulemeester, E., Ding, L., Disney, S.M., Ehrgott, M., Eppler, M.J., Erdoğan, G., Fortz, B., Franco, L.A., ...

work page 2024
[28]

Recipe for a General, Powerful, Scalable Graph Transformer

Rampášek, L., Galkin, M., Dwivedi, V.P., Luu, A.T., Wolf, G., Beaini, D.: Recipe for a general, powerful, scalable graph transformer. In: Advances in Neural Information Processing Systems. vol. 35, pp. 14501–14515 (2022), https://arxiv.org/abs/2205.12454

work page arXiv 2022
[29]

In: Proceedings of the 36th International Conference on Neural Information Processing Systems

Rampášek, L., Galkin, M., Dwivedi, V.P., Luu, A.T., Wolf, G., Beaini, D.: Recipe for a general, powerful, scalable graph transformer. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22, Curran Associates Inc., Red Hook, NY, USA (2024)

work page 2024
[30]

In: Proceedings of the International Conference on Automated Planning and Scheduling

Reijnen, R., Zhang, Y., Lau, H.C., Bukhsh, Z.: Online control of adaptive large neighborhood search using deep reinforcement learning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 34, pp. 475– 483 (2024)

work page 2024
[31]

European Journal of Operational Research260(3), 972–983 (2017)

Soria-Alcaraz, J.A., Ochoa, G., Sotelo-Figeroa, M.A., Burke, E.K.: A method- ology for determining an effective subset of heuristics in selection hyper- heuristics. European Journal of Operational Research260(3), 972–983 (2017). https://doi.org/https://doi.org/10.1016/j.ejor.2017.01.042

work page doi:10.1016/j.ejor.2017.01.042 2017
[32]

Computers & Operations Research140, 105643 (2022)

Vidal, T.: Hybrid genetic search for the cvrp: Open-source implementa- tion and swap* neighborhood. Computers & Operations Research140, 105643 (2022). https://doi.org/https://doi.org/10.1016/j.cor.2021.105643, https://www.sciencedirect.com/science/article/pii/S030505482100349X

work page doi:10.1016/j.cor.2021.105643 2022
[33]

Computers & operations research40(1), 475–489 (2013)

Vidal, T., Crainic, T.G., Gendreau, M., Prins, C.: A hybrid genetic algorithm with adaptive diversity management for a large class of vehicle routing problems with time-windows. Computers & operations research40(1), 475–489 (2013)

work page 2013
[34]

Linformer: Self-Attention with Linear Complexity

Wang, S., Li, B.Z., Khabsa, M., Fang, H., Ma, H.: Linformer: Self- attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020), https://arxiv.org/abs/2006.04768

work page internal anchor Pith review Pith/arXiv arXiv 2006
[35]

INFORMS Journal on Computing36(4), 943–955 (2024)

Wouda, N.A., Lan, L., Kool, W.: PyVRP: a high-performance VRP solver package. INFORMS Journal on Computing36(4), 943–955 (2024). https://doi.org/10.1287/ijoc.2023.0055, https://doi.org/10.1287/ijoc.2023.0055

work page doi:10.1287/ijoc.2023.0055 2024
[36]

IEEE transactions on neural networks and learning systems33(9), 5057–5069 (2021)

Wu, Y., Song, W., Cao, Z., Zhang, J., Lim, A.: Learning improvement heuristics for solving routing problems. IEEE transactions on neural networks and learning systems33(9), 5057–5069 (2021)

work page 2021
[37]

Ye, H., Wang, J., Cao, Z., Berto, F., Hua, C., Kim, H., Park, J., Song, G.: ReEvo: Largelanguagemodelsashyper-heuristicswithreflectiveevolution.In:Proceedings of the 38th Conference on Neural Information Processing (NeurIPS 2024). pp. 10–

work page 2024
[38]

Vancouver, Canada (2024) Title Suppressed Due to Excessive Length 17

work page 2024
[39]

perturb-and-improve

Zhou, J., Cao, Z., Wu, Y., Song, W., Ma, Y., Zhang, J., Xu, C.: Mvmoe: Multi- task vehicle routing solver with mixture-of-experts. In: International Conference on Machine Learning (2024) 18 O. Yakovenko et al. A Problem-Specific Feature Definitions Below we summarize the global and local features used for each problem domain in our experiments. Vehicle Ro...

work page 2024

[1] [1]

Residual Gated Graph ConvNets

Bresson, X., Laurent, T.: Residual gated graph convnets. arXiv:1711.07553 (2023). https://doi.org/https://doi.org/10.48550/arXiv.1711.07553

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1711.07553 2023

[2] [2]

Handbook of metaheuristics pp

Burke, E.K., Hyde, M.R., Kendall, G., Ochoa, G., Özcan, E., Woodward, J.R.: A classification of hyper-heuristic approaches: revisited. Handbook of metaheuristics pp. 453–477 (2019)

work page 2019

[3] [3]

Advances in neural information processing systems32(2019)

Chen, X., Tian, Y.: Learning to perform local rewriting for combinatorial opti- mization. Advances in neural information processing systems32(2019)

work page 2019

[4] [4]

Advances in Neural Information Processing Systems35, 8760–8772 (2022)

Choo, J., Kwon, Y.D., Kim, J., Jae, J., Hottung, A., Tierney, K., Gwon, Y.: Simulation-guided beam search for neural combinatorial optimization. Advances in Neural Information Processing Systems35, 8760–8772 (2022)

work page 2022

[5] [5]

Rethinking Attention with Performers

Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., Belanger, D., Colwell, L., Weller, A.:Rethinkingattention withperformers. In:InternationalConferenceon Learning Representations (ICLR) (2021), https://arxiv.org/abs/2009.14794

work page internal anchor Pith review Pith/arXiv arXiv 2021

[6] [6]

In: Proceedings of the 15th International Joint Conference on Artifical Intelligence - Volume 1

Denzinger, J., Fuchs, M., Fuchs, M.: High performance ATP systems by combining several AI methods. In: Proceedings of the 15th International Joint Conference on Artifical Intelligence - Volume 1. p. 102–107. IJCAI’97, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1997)

work page 1997

[7] [7]

European Journal of Operational Research285(2), 405–428 (2020)

Drake,J.H.,Kheiri,A.,Özcan,E.,Burke,E.K.:Recentadvancesinselectionhyper- heuristics. European Journal of Operational Research285(2), 405–428 (2020)

work page 2020

[8] [8]

Roskilde: Roskilde University12, 966–980 (2017)

Helsgaun, K.: An extension of the lin-kernighan-helsgaun tsp solver for constrained traveling salesman and vehicle routing problems. Roskilde: Roskilde University12, 966–980 (2017)

work page 2017

[9] [9]

In: International Conference on Learning Representations (2021)

Hottung, A., Bhandari, B., Tierney, K.: Learning a latent search space for routing problems using variational autoencoders. In: International Conference on Learning Representations (2021)

work page 2021

[10] [10]

International Conference on Learning Representations (2021)

Hottung, A., Kwon, Y.D., Tierney, K.: Efficient active search for combinatorial op- timization problems. International Conference on Learning Representations (2021)

work page 2021

[11] [11]

In: Giacomo, G.D., et al

Hottung, A., Tierney, K.: Neural large neighborhood search for the capaci- tated vehicle routing problem. In: Giacomo, G.D., et al. (eds.) ECAI 2020 Title Suppressed Due to Excessive Length 15 – 24th European Conference on Artificial Intelligence, Frontiers in Artifi- cial Intelligence and Applications, vol. 325, pp. 443–450. IOS Press (2020). https://doi...

work page doi:10.3233/faia200124 2020

[12] [12]

Artificial Intelligence313, 103786 (2022)

Hottung, A., Tierney, K.: Neural large neighborhood search for routing problems. Artificial Intelligence313, 103786 (2022)

work page 2022

[13] [13]

Computers & Opera- tions Research172, 106791 (2024)

Johnn,S.N.,Darvariu,V.A.,Handl,J.,Kalcsics,J.:Agraphreinforcementlearning framework for neural adaptive large neighbourhood search. Computers & Opera- tions Research172, 106791 (2024)

work page 2024

[14] [14]

European Journal of Operational Research309(1), 446–468 (2023)

Kallestad, J., Hasibi, R., Hemmati, A., Sörensen, K.: A general deep reinforcement learning hyperheuristic framework for solving combinatorial optimization prob- lems. European Journal of Operational Research309(1), 446–468 (2023)

work page 2023

[15] [15]

Advances in neural information processing sys- tems30(2017)

Khalil, E., Dai, H., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial opti- mization algorithms over graphs. Advances in neural information processing sys- tems30(2017)

work page 2017

[16] [16]

In: The Eleventh International Conference on Learning Repre- sentations (2023)

Kim, M., Park, J., Park, J.: Learning to cross exchange to solve min-max vehicle routing problems. In: The Eleventh International Conference on Learning Repre- sentations (2023)

work page 2023

[17] [17]

Advances in Neural Information Processing Systems34, 10418–10430 (2021)

Kim, M., Park, J., et al.: Learning collaborative policies to solve np-hard routing problems. Advances in Neural Information Processing Systems34, 10418–10430 (2021)

work page 2021

[18] [18]

Advances in Neural Information Processing Systems35, 1936–1949 (2022)

Kim, M., Park, J., Park, J.: Sym-nco: Leveraging symmetricity for neural com- binatorial optimization. Advances in Neural Information Processing Systems35, 1936–1949 (2022)

work page 1936

[19] [19]

In: International conference on integration of constraint programming, artificial intelligence, and operations research

Kool, W., van Hoof, H., Gromicho, J., Welling, M.: Deep policy dynamic program- ming for vehicle routing problems. In: International conference on integration of constraint programming, artificial intelligence, and operations research. pp. 190–

work page

[20] [20]

Advances in Neural Information Processing Systems33, 21188–21198 (2020)

Kwon, Y.D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S.: Pomo: Policy op- timization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems33, 21188–21198 (2020)

work page 2020

[21] [21]

European Journal of Operational Research312(1), 70–91 (2024)

Lagos, F., Pereira, J.: Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems. European Journal of Operational Research312(1), 70–91 (2024)

work page 2024

[22] [22]

Journal of Machine Learning Research 23(54), 1–9 (2022), http://jmlr.org/papers/v23/21-0888.html

Lindauer, M., Eggensperger, K., Feurer, M., Biedenkapp, A., Deng, D., Benjamins, C., Ruhkopf, T., Sass, R., Hutter, F.: Smac3: A versatile bayesian optimization package for hyperparameter optimization. Journal of Machine Learning Research 23(54), 1–9 (2022), http://jmlr.org/papers/v23/21-0888.html

work page 2022

[23] [23]

In: International conference on learning representations (2019)

Lu, H., Zhang, X., Yang, S.: A learning-based iterative method for solving vehicle routing problems. In: International conference on learning representations (2019)

work page 2019

[24] [24]

Advances in Neural Information Processing Systems34, 11096–11107 (2021)

Ma, Y., Li, J., Cao, Z., Song, W., Zhang, L., Chen, Z., Tang, J.: Learning to itera- tively solve routing problems with dual-aspect collaborative transformer. Advances in Neural Information Processing Systems34, 11096–11107 (2021)

work page 2021

[25] [25]

Advances in Neural Information Processing Systems36, 49555–49578 (2023)

Ma, Y., Cao, Z., Chee, Y.M.: Learning to search feasible and infeasible regions of routing problems with flexible neural k-opt. Advances in Neural Information Processing Systems36, 49555–49578 (2023)

work page 2023

[26] [26]

European Journal of Operational Research324(2), 522–537 (2025)

Mostajabdaveh,M.,Salman,F.S.,Gutjahr,W.J.:Abranch-and-pricealgorithmfor fast and equitable last-mile relief aid distribution. European Journal of Operational Research324(2), 522–537 (2025)

work page 2025

[27] [27]

Yakovenko et al

Petropoulos, F., Laporte, G., Aktas, E., Alumur, S.A., Archetti, C., Ayhan, H., Battarra, M., Bennell, J.A., Bourjolly, J.M., Boylan, J.E., Breton, M., Canca, 16 O. Yakovenko et al. D., Charlin, L., Chen, B., Cicek, C.T., Jr, L.A.C., Currie, C.S., Demeulemeester, E., Ding, L., Disney, S.M., Ehrgott, M., Eppler, M.J., Erdoğan, G., Fortz, B., Franco, L.A., ...

work page 2024

[28] [28]

Recipe for a General, Powerful, Scalable Graph Transformer

Rampášek, L., Galkin, M., Dwivedi, V.P., Luu, A.T., Wolf, G., Beaini, D.: Recipe for a general, powerful, scalable graph transformer. In: Advances in Neural Information Processing Systems. vol. 35, pp. 14501–14515 (2022), https://arxiv.org/abs/2205.12454

work page arXiv 2022

[29] [29]

In: Proceedings of the 36th International Conference on Neural Information Processing Systems

Rampášek, L., Galkin, M., Dwivedi, V.P., Luu, A.T., Wolf, G., Beaini, D.: Recipe for a general, powerful, scalable graph transformer. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22, Curran Associates Inc., Red Hook, NY, USA (2024)

work page 2024

[30] [30]

In: Proceedings of the International Conference on Automated Planning and Scheduling

Reijnen, R., Zhang, Y., Lau, H.C., Bukhsh, Z.: Online control of adaptive large neighborhood search using deep reinforcement learning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 34, pp. 475– 483 (2024)

work page 2024

[31] [31]

European Journal of Operational Research260(3), 972–983 (2017)

Soria-Alcaraz, J.A., Ochoa, G., Sotelo-Figeroa, M.A., Burke, E.K.: A method- ology for determining an effective subset of heuristics in selection hyper- heuristics. European Journal of Operational Research260(3), 972–983 (2017). https://doi.org/https://doi.org/10.1016/j.ejor.2017.01.042

work page doi:10.1016/j.ejor.2017.01.042 2017

[32] [32]

Computers & Operations Research140, 105643 (2022)

Vidal, T.: Hybrid genetic search for the cvrp: Open-source implementa- tion and swap* neighborhood. Computers & Operations Research140, 105643 (2022). https://doi.org/https://doi.org/10.1016/j.cor.2021.105643, https://www.sciencedirect.com/science/article/pii/S030505482100349X

work page doi:10.1016/j.cor.2021.105643 2022

[33] [33]

Computers & operations research40(1), 475–489 (2013)

Vidal, T., Crainic, T.G., Gendreau, M., Prins, C.: A hybrid genetic algorithm with adaptive diversity management for a large class of vehicle routing problems with time-windows. Computers & operations research40(1), 475–489 (2013)

work page 2013

[34] [34]

Linformer: Self-Attention with Linear Complexity

Wang, S., Li, B.Z., Khabsa, M., Fang, H., Ma, H.: Linformer: Self- attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020), https://arxiv.org/abs/2006.04768

work page internal anchor Pith review Pith/arXiv arXiv 2006

[35] [35]

INFORMS Journal on Computing36(4), 943–955 (2024)

Wouda, N.A., Lan, L., Kool, W.: PyVRP: a high-performance VRP solver package. INFORMS Journal on Computing36(4), 943–955 (2024). https://doi.org/10.1287/ijoc.2023.0055, https://doi.org/10.1287/ijoc.2023.0055

work page doi:10.1287/ijoc.2023.0055 2024

[36] [36]

IEEE transactions on neural networks and learning systems33(9), 5057–5069 (2021)

Wu, Y., Song, W., Cao, Z., Zhang, J., Lim, A.: Learning improvement heuristics for solving routing problems. IEEE transactions on neural networks and learning systems33(9), 5057–5069 (2021)

work page 2021

[37] [37]

Ye, H., Wang, J., Cao, Z., Berto, F., Hua, C., Kim, H., Park, J., Song, G.: ReEvo: Largelanguagemodelsashyper-heuristicswithreflectiveevolution.In:Proceedings of the 38th Conference on Neural Information Processing (NeurIPS 2024). pp. 10–

work page 2024

[38] [38]

Vancouver, Canada (2024) Title Suppressed Due to Excessive Length 17

work page 2024

[39] [39]

perturb-and-improve

Zhou, J., Cao, Z., Wu, Y., Song, W., Ma, Y., Zhang, J., Xu, C.: Mvmoe: Multi- task vehicle routing solver with mixture-of-experts. In: International Conference on Machine Learning (2024) 18 O. Yakovenko et al. A Problem-Specific Feature Definitions Below we summarize the global and local features used for each problem domain in our experiments. Vehicle Ro...

work page 2024