arxiv: 2605.14416 · v1 · submitted 2026-05-14 · 💻 cs.AI

Recognition: no theorem link

A Unified Knowledge Embedded Reinforcement Learning-based Framework for Generalized Capacitated Vehicle Routing Problems

Wen Wang , Xiangchen Wu , Liang Wang , Hao Hu , Xianping Tao

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:06 UTC · model grok-4.3

classification 💻 cs.AI

keywords Capacitated Vehicle RoutingReinforcement LearningKnowledge EmbeddingHeuristic DecompositionDynamic ProgrammingGeneralizationCombinatorial OptimizationVehicle Routing Problem

0 comments

The pith

A framework embedding classical routing knowledge into RL achieves better solutions for diverse CVRP variants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to create a unified RL framework for generalized Capacitated Vehicle Routing Problems by incorporating knowledge from Route-First Cluster-Second heuristics. It decomposes the problem so that dynamic programming solves the clustering part and guides the RL agent on route construction. A history-enhanced context module helps with the partial observability from this split. If correct, this hybrid approach delivers higher quality solutions than pure learning methods while generalizing well to variants with extra constraints like time windows. This would matter for practical logistics where problems vary and end-to-end RL often falls short of classical methods.

Core claim

By decomposing generalized CVRPs into a route-first subproblem solved by RL and a cluster-second subproblem solved by dynamic programming whose results guide the RL solver, and adding a unified history-enhanced context processing module to handle partial observability, the framework produces superior solution quality compared to state-of-the-art learning-based methods and shows strong generalization across CVRP variants.

What carries the argument

The Route-First Cluster-Second decomposition combined with dynamic programming guidance for clustering to inform the RL-based route constructor, along with the history-enhanced context processing module.

If this is right

Superior solution quality over state-of-the-art learning-based methods for CVRPs.
Smaller performance gap to classical heuristics.
Strong generalization to diverse CVRP variants including those with time windows or backhauls.
Unified framework applicable to multiple problem types without custom redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar knowledge embedding could improve RL for other combinatorial problems like scheduling or packing.
Integration with more advanced DP or exact methods might further close the gap to optimal solutions.
Testing on real-world logistics data with noisy constraints could reveal practical robustness.

Load-bearing premise

That decomposing the problem via Route-First Cluster-Second and using dynamic programming to guide RL will consistently mitigate partial observability and yield generalizable gains without adding new biases or causing overfitting to specific variants.

What would settle it

Running the framework on an unseen CVRP variant, such as one involving pickup and delivery with time windows not in the original experiments, and verifying if it still outperforms learning baselines by a similar margin.

Figures

Figures reproduced from arXiv: 2605.14416 by Hao Hu, Liang Wang, Wen Wang, Xiangchen Wu, Xianping Tao.

**Figure 2.** Figure 2: The Average Optimality Gap. 5.4 Results for RQ2 We evaluate the zero-shot transfer capability of RFCS across different capacity distributions encountered during training. During training, for scenarios with n = 50, the vehicle capacity was set to 40. We directly evaluate the model obtained in RQ1 on instances with varying vehicle capacities to examine its performance changes. The target distribution of ve… view at source ↗

read the original abstract

The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem with broad applications in logistics and transportation. Real-world CVRPs often involve diverse objectives and complex constraints, such as time windows or backhaul requirements, motivating the development of a unified solution framework. Recent reinforcement learning (RL) approaches have shown promise in combinatorial optimization, yet they rely on end-to-end learning and lack explicit problem-solving knowledge, limiting solution quality. In this paper, we propose a knowledge-embedded framework inspired by the Route-First Cluster-Second heuristics. It incorporates knowledge at two levels: (1) decomposing CVRPs into the route-first and cluster-second subproblems, and (2) leveraging dynamic programming to solve the second subproblem, whose results guide the RL-based constructive solver to solve the first problem. To mitigate partial observability caused by problem decomposition, we introduce a unified history-enhanced context processing module. Extensive experiments show that this framework achieves superior solution quality compared with state-of-the-art learning-based methods, with a smaller gap to classical heuristics, demonstrating strong generalization across diverse CVRP variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a concrete way to mix Route-First Cluster-Second decomposition with DP guidance inside an RL solver for CVRP variants, but the reported gains rest on experiments whose details are not visible in the abstract.

read the letter

The core move is to split the problem the way classical heuristics do, let dynamic programming handle the cluster subproblem exactly, and feed its output back as guidance to an RL policy that builds the routes. They add a single history-enhanced context module to patch the partial observability that the split creates. That combination is new relative to the pure end-to-end RL papers they cite, and it is a direct, workable way to bring in external knowledge rather than hoping the network learns it from scratch. The approach is grounded in existing ideas and avoids circular definitions, which is a plus. The abstract claims better solution quality than other learning methods and a narrower gap to classical heuristics, plus good generalization across variants. If the full experiments actually show that with proper baselines and ablations, the framework is worth attention for people who need solvers that handle time windows, backhauls, and similar constraints without retraining from zero each time. The main soft spot is that the abstract gives no numbers, no error bars, no list of baselines, and no ablation on the history module or the DP feedback. Without those, it is impossible to tell whether the claimed smaller gap is real or an artifact of how the test instances were generated. The stress-test worry about hidden bias from the decomposition is reasonable until the paper shows results on out-of-distribution constraint combinations. This is for researchers who already work on learning-based routing solvers and want to see how explicit heuristic knowledge can be injected without losing the end-to-end flavor. A reader who cares about practical performance on logistics instances would get value if the numbers hold up. It deserves a serious referee because the construction is testable and the motivation is clear, even if the current write-up leaves the evidence thin.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a knowledge-embedded RL framework for generalized CVRPs by decomposing into route-first (RL) and cluster-second (DP) subproblems, using DP outputs to guide the RL policy, and introducing a history-enhanced context module to handle partial observability from the split. It claims this yields better solution quality than SOTA learning methods and closer performance to classical heuristics, with strong generalization across CVRP variants.

Significance. If the experimental claims hold, the result would be significant for combinatorial optimization because it supplies a concrete mechanism for injecting domain knowledge (heuristic decomposition plus exact DP sub-solvers) into RL, directly addressing the generalization limits of end-to-end learning highlighted in the abstract. The hybrid Route-First Cluster-Second structure with feedback is a clear, non-circular strength that could influence future hybrid solvers.

major comments (2)

Abstract: the central claim of 'superior solution quality' and 'strong generalization' is asserted without any quantitative metrics, baselines, instance sizes, error bars, or ablation results; this is load-bearing because the abstract supplies no evidence that the history module or DP guidance actually delivers the claimed improvements.
Framework description (Route-First Cluster-Second + DP guidance): the skeptic concern lands—the decomposition induces partial observability whose repair by the unified history-enhanced module is not shown to be robust; no referenced test demonstrates that DP-derived guidance avoids systematic bias on out-of-distribution constraint combinations (e.g., unseen time-window + backhaul mixes).

minor comments (1)

Abstract: the phrase 'smaller gap to classical heuristics' is imprecise; name the specific heuristics and report the numerical gap reduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract should contain quantitative evidence and will revise it. We also acknowledge the need to demonstrate robustness of the history module on mixed out-of-distribution constraints and will add targeted experiments. Point-by-point responses are provided below.

read point-by-point responses

Referee: Abstract: the central claim of 'superior solution quality' and 'strong generalization' is asserted without any quantitative metrics, baselines, instance sizes, error bars, or ablation results; this is load-bearing because the abstract supplies no evidence that the history module or DP guidance actually delivers the claimed improvements.

Authors: We agree that the abstract is currently qualitative and should be strengthened with concrete evidence. In the revised manuscript we will insert specific metrics: average optimality gaps (with standard deviations over 5 runs) on CVRP instances of 20-100 customers, direct comparisons to learning baselines (AM, POMO) and classical solvers (LKH, OR-Tools), plus ablation results isolating the DP guidance and history-enhanced module contributions. revision: yes
Referee: Framework description (Route-First Cluster-Second + DP guidance): the skeptic concern lands—the decomposition induces partial observability whose repair by the unified history-enhanced module is not shown to be robust; no referenced test demonstrates that DP-derived guidance avoids systematic bias on out-of-distribution constraint combinations (e.g., unseen time-window + backhaul mixes).

Authors: We appreciate the concern. The history-enhanced module maintains a running record of prior route decisions to compensate for the split-induced partial observability, and our current experiments already cover separate CVRP variants (time windows, backhauls, etc.) with consistent gains. To directly test mixed OOD combinations, we will add new experiments in the revision: models trained on single-constraint data will be evaluated on unseen time-window + backhaul mixes, with explicit reporting of any systematic bias in the DP-guided policy. Because the DP sub-solver is exact, we expect limited bias, but the new results will quantify this. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected; derivation rests on external heuristics and DP components.

full rationale

The paper's framework decomposes CVRP via the established Route-First Cluster-Second heuristic and feeds DP outputs as guidance to an RL constructive solver, with a history-enhanced module added to address partial observability. These steps are presented as incorporation of independent problem-solving knowledge rather than self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or claims in the abstract or described chain reduce the reported generalization or solution quality to the inputs by construction; performance claims are tied to external experimental baselines. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The framework rests on the assumption that the Route-First Cluster-Second heuristic decomposition is a valid and useful split for RL, plus standard RL training assumptions and the correctness of dynamic programming for the cluster subproblem.

axioms (2)

domain assumption Route-First Cluster-Second heuristic provides a beneficial decomposition for CVRP variants
Invoked in the framework design to justify splitting the problem and using DP on the second subproblem.
domain assumption Dynamic programming solves the cluster-second subproblem exactly and its outputs are useful guidance for RL
Central to the knowledge-embedding claim.

invented entities (1)

unified history-enhanced context processing module no independent evidence
purpose: Mitigate partial observability caused by problem decomposition
New module introduced to handle information loss from the split; no independent evidence provided beyond the claim.

pith-pipeline@v0.9.0 · 5496 in / 1438 out tokens · 33911 ms · 2026-05-15T02:06:16.113961+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

[1]

Routefinder: Towards foundation models for vehicle routing problems

[Bertoet al., 2024 ] Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, Andr ´e Hottung, Niels Wouda, Leon Lan, Kevin Tierney, and Jinkyoo Park. Routefinder: Towards foundation models for vehicle routing problems. InICML 2024 Workshop on Foundation Models in the Wild,

work page 2024
[2]

Learn- ing to handle complex constraints for vehicle routing prob- lems.Advances in Neural Information Processing Sys- tems, 37:93479–93509,

[Biet al., 2024 ] Jieyi Bi, Yining Ma, Jianan Zhou, Wen Song, Zhiguang Cao, Yaoxin Wu, and Jie Zhang. Learn- ing to handle complex constraints for vehicle routing prob- lems.Advances in Neural Information Processing Sys- tems, 37:93479–93509,

work page 2024
[3]

Learning to perform local rewriting for combinatorial opti- mization.Advances in neural information processing sys- tems, 32,

[Chen and Tian, 2019] Xinyun Chen and Yuandong Tian. Learning to perform local rewriting for combinatorial opti- mization.Advances in neural information processing sys- tems, 32,

work page 2019
[4]

Select and optimize: Learning to solve large-scale tsp instances

[Chenget al., 2023 ] Hanni Cheng, Haosi Zheng, Ya Cong, Weihao Jiang, and Shiliang Pu. Select and optimize: Learning to solve large-scale tsp instances. InInterna- tional conference on artificial intelligence and statistics, pages 1219–1231. PMLR,

work page 2023
[5]

Learning 2-opt heuristics for the traveling salesman problem via deep re- inforcement learning

[d O Costaet al., 2020 ] Paulo R d O Costa, Jason Rhugge- naath, Yingqian Zhang, and Alp Akcay. Learning 2-opt heuristics for the traveling salesman problem via deep re- inforcement learning. InAsian conference on machine learning, pages 465–480. PMLR,

work page 2020
[6]

Moco: A learnable meta optimizer for combinatorial optimization

[Derneddeet al., 2025 ] Tim Dernedde, Daniela Thyssens, S¨oren Dittrich, Maximilian Stubbemann, and Lars Schmidt-Thieme. Moco: A learnable meta optimizer for combinatorial optimization. InPacific-Asia Conference on Knowledge Discovery and Data Mining, pages 236–248. Springer,

work page 2025
[7]

Goal: A generalist combinatorial optimization agent learner

[Drakulicet al., 2025 ] Darko Drakulic, Sofia Michel, and Jean-Marc Andreoli. Goal: A generalist combinatorial optimization agent learner. InProceedings of the Inter- national Conference on Learning Representations (ICLR) 2025,

work page 2025
[8]

[Furnon and Perron, ] Vincent Furnon and Laurent Perron

Poster. [Furnon and Perron, ] Vincent Furnon and Laurent Perron. Or-tools routing library. [Gaoet al., 2024 ] Chengrui Gao, Haopu Shang, Ke Xue, Dong Li, and Chao Qian. Towards generalizable neural solvers for vehicle routing problems via ensemble with transferrable local policy. InProceedings of the Thirty- Third International Joint Conference on Artific...

work page 2024
[9]

Long short-term memory.Neural computation, 9(8):1735–1780,

[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and J¨urgen Schmidhuber. Long short-term memory.Neural computation, 9(8):1735–1780,

work page 1997
[10]

Neural large neighborhood search for the capacitated vehicle routing problem

[Hottung and Tierney, 2020] Andr´e Hottung and Kevin Tier- ney. Neural large neighborhood search for the capacitated vehicle routing problem. InECAI 2020, pages 443–450. IOS Press,

work page 2020
[11]

Optimizing urban logistics: Vehicle routing prob- lem with underground transportation.IEEE Transactions on Intelligent Transportation Systems,

[Jeong and Song, 2025] Ho Young Jeong and Byung Duk Song. Optimizing urban logistics: Vehicle routing prob- lem with underground transportation.IEEE Transactions on Intelligent Transportation Systems,

work page 2025
[12]

Sym-nco: Leveraging symmetricity for neural com- binatorial optimization.Advances in Neural Information Processing Systems, 35:1936–1949,

[Kimet al., 2022a ] Minsu Kim, Junyoung Park, and Jinkyoo Park. Sym-nco: Leveraging symmetricity for neural com- binatorial optimization.Advances in Neural Information Processing Systems, 35:1936–1949,

work page 1936
[13]

Scale-conditioned adaptation for large scale combinatorial optimization

[Kimet al., 2022b ] Minsu Kim, Jiwoo Son, Hyeonah Kim, and Jinkyoo Park. Scale-conditioned adaptation for large scale combinatorial optimization. InNeurIPS 2022 Work- shop on Distribution Shifts: Connecting Methods and Ap- plications,

work page 2022
[14]

Attention, learn to solve routing problems! In International Conference on Learning Representations,

[Koolet al., 2018 ] Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations,

work page 2018
[15]

Deep policy dynamic pro- gramming for vehicle routing problems

[Koolet al., 2022 ] Wouter Kool, Herke van Hoof, Joaquim Gromicho, and Max Welling. Deep policy dynamic pro- gramming for vehicle routing problems. InInternational conference on integration of constraint programming, ar- tificial intelligence, and operations research, pages 190–

work page 2022
[16]

Pomo: Policy optimization with multiple optima for reinforcement learning.Advances in Neural Information Processing Systems, 33:21188–21198,

[Kwonet al., 2020 ] Yeong-Dae Kwon, Jinho Choo, By- oungjip Kim, Iljoo Yoon, Youngjune Gwon, and Seungjai Min. Pomo: Policy optimization with multiple optima for reinforcement learning.Advances in Neural Information Processing Systems, 33:21188–21198,

work page 2020
[17]

Distri- bution path optimization for intelligent logistics vehicles of urban rail transportation using vrp optimization model

[Leng and Li, 2021] Kaijun Leng and Shanghong Li. Distri- bution path optimization for intelligent logistics vehicles of urban rail transportation using vrp optimization model. IEEE Transactions on Intelligent Transportation Systems, 23(2):1661–1669,

work page 2021
[18]

Complexity of vehicle routing and scheduling problems.Networks, 11(2):221–227,

[Lenstra and Kan, 1981] Jan Karel Lenstra and AHG Rin- nooy Kan. Complexity of vehicle routing and scheduling problems.Networks, 11(2):221–227,

work page 1981
[19]

The open vehicle routing problem: Algorithms, large-scale test problems, and computational results.Com- puters & operations research, 34(10):2918–2930,

[Liet al., 2007 ] Feiyue Li, Bruce Golden, and Edward Wasil. The open vehicle routing problem: Algorithms, large-scale test problems, and computational results.Com- puters & operations research, 34(10):2918–2930,

work page 2007
[20]

Destroy and repair using hyper-graphs for rout- ing

[Liet al., 2025 ] Ke Li, Fei Liu, Zhenkun Wang, and Qingfu Zhang. Destroy and repair using hyper-graphs for rout- ing. InProceedings of the AAAI Conference on Artificial Intelligence, pages 18341–18349,

work page 2025
[21]

Neural combinatorial optimiza- tion with heavy decoder: Toward large scale generaliza- tion.Advances in Neural Information Processing Systems, 36:8845–8864,

[Luoet al., 2023 ] Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, and Zhenkun Wang. Neural combinatorial optimiza- tion with heavy decoder: Toward large scale generaliza- tion.Advances in Neural Information Processing Systems, 36:8845–8864,

work page 2023
[22]

Order-first split-second methods for vehicle routing problems: A review.Transportation Re- search Part C-emerging Technologies, 40:179–200,

[Prinset al., 2014 ] Christian Prins, Philippe Lacomme, and Caroline Prodhon. Order-first split-second methods for vehicle routing problems: A review.Transportation Re- search Part C-emerging Technologies, 40:179–200,

work page 2014
[23]

Two meta-heuristics for solving the capacitated ve- hicle routing problem: the case of the tunisian post office

[Sbaiet al., 2022 ] Ines Sbai, Saoussen Krichen, and Olfa Li- mam. Two meta-heuristics for solving the capacitated ve- hicle routing problem: the case of the tunisian post office. Operational Research, 22(1):507–549,

work page 2022
[24]

Symmetry-informed marl: A de- centralized and cooperative uav swarm control approach for communication coverage.IEEE Transactions on Mo- bile Computing, 24(01):1–18,

[Shiet al., 2025 ] Rongye Shi, Xin Yu, Yandong Wang, Yongkai Tian, Zhenyu Liu, Wenjun Wu, Xiao-Ping Zhang, and Manuela M Veloso. Symmetry-informed marl: A de- centralized and cooperative uav swarm control approach for communication coverage.IEEE Transactions on Mo- bile Computing, 24(01):1–18,

work page 2025
[25]

Equity-transformer: Solving np-hard min-max routing problems as sequential generation with equity context

[Sonet al., 2024 ] Jiwoo Son, Minsu Kim, Sanghyeok Choi, Hyeonah Kim, and Jinkyoo Park. Equity-transformer: Solving np-hard min-max routing problems as sequential generation with equity context. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20265–20273,

work page 2024
[26]

Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning

[Suiet al., 2021 ] Jingyan Sui, Shizhe Ding, Ruizhi Liu, Liming Xu, and Dongbo Bu. Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning. InAsian conference on machine learning, pages 1301–1316. PMLR,

work page 2021
[27]

An optimization model for vehicle routing problem in last-mile delivery.Expert Systems with Applications, 222:119789,

[Tiwari and Sharma, 2023] Krishna Veer Tiwari and Satyen- dra Kumar Sharma. An optimization model for vehicle routing problem in last-mile delivery.Expert Systems with Applications, 222:119789,

work page 2023
[28]

A unified solution framework for multi-attribute vehicle routing problems

[Vidalet al., 2014 ] Thibaut Vidal, Teodor Gabriel Crainic, Michel Gendreau, and Christian Prins. A unified solution framework for multi-attribute vehicle routing problems. European Journal of Operational Research, 234(3):658– 673,

work page 2014
[29]

Technical note: Split algorithm in o(n) for the capacitated vehicle routing problem.Com- put

[Vidal, 2015] Thibaut Vidal. Technical note: Split algorithm in o(n) for the capacitated vehicle routing problem.Com- put. Oper. Res., 69:40–47,

work page 2015
[30]

Pointer networks.Advances in neural in- formation processing systems, 28,

[Vinyalset al., 2015 ] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks.Advances in neural in- formation processing systems, 28,

work page 2015
[31]

Solv- ing the min-max multiple traveling salesmen problem via learning-based path generation and optimal splitting

[Wanget al., 2025 ] Wen Wang, Xiangchen Wu, Liang Wang, Hao Hu, Xianping Tao, and Linghao Zhang. Solv- ing the min-max multiple traveling salesmen problem via learning-based path generation and optimal splitting. arXiv preprint arXiv:2508.17087,

work page arXiv 2025
[32]

Williams

[Williams, 1992] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforce- ment learning.Mach. Learn., 8(3–4):229–256, May

work page 1992
[33]

Wouda, Leon Lan, and Wouter Kool

[Woudaet al., 2024 ] Niels A. Wouda, Leon Lan, and Wouter Kool. PyVRP: a high-performance VRP solver package. INFORMS Journal on Computing, 36(4):943–955,

work page 2024
[34]

Learning improvement heuris- tics for solving routing problems.IEEE transactions on neural networks and learning systems, 33(9):5057–5069,

[Wuet al., 2021 ] Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang, and Andrew Lim. Learning improvement heuris- tics for solving routing problems.IEEE transactions on neural networks and learning systems, 33(9):5057–5069,

work page 2021
[35]

Neural combinatorial optimiza- tion algorithms for solving vehicle routing problems: A comprehensive survey with perspectives.arXiv preprint arXiv:2406.00415,

[Wuet al., 2024 ] Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L Maskell, and You Zhou. Neural combinatorial optimiza- tion algorithms for solving vehicle routing problems: A comprehensive survey with perspectives.arXiv preprint arXiv:2406.00415,

work page arXiv 2024
[36]

[Xinet al., 2021 ] Liang Xin, Wen Song, Zhiguang Cao, and Jie Zhang. Neurolkh: Combining deep learning model with lin-kernighan-helsgaun heuristic for solving the trav- eling salesman problem.Advances in Neural Information Processing Systems, 34:7472–7483,

work page 2021
[37]

Deep neural network ap- proximated dynamic programming for combinatorial opti- mization

[Xuet al., 2020 ] Shenghe Xu, Shivendra S Panwar, Murali Kodialam, and TV Lakshman. Deep neural network ap- proximated dynamic programming for combinatorial opti- mization. InProceedings of the AAAI Conference on Arti- ficial Intelligence, volume 34, pages 1684–1691,

work page 2020
[38]

Leveraging partial symmetry for multi-agent reinforcement learning

[Yuet al., 2024 ] Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Simin Li, Shuhao Liao, and Wenjun Wu. Leveraging partial symmetry for multi-agent reinforcement learning. InProceedings of the AAAI Conference on Artificial Intel- ligence, volume 38, pages 17583–17590,

work page 2024
[39]

Review of vehicle rout- ing problems: Models, classification and solving algo- rithms.Archives of Computational Methods in Engineer- ing, 29(1):195–221,

[Zhanget al., 2022 ] Haifei Zhang, Hongwei Ge, Jinlong Yang, and Yubing Tong. Review of vehicle rout- ing problems: Models, classification and solving algo- rithms.Archives of Computational Methods in Engineer- ing, 29(1):195–221,

work page 2022
[40]

Combining reinforce- ment learning with lin-kernighan-helsgaun algorithm for the traveling salesman problem

[Zhenget al., 2021 ] Jiongzhi Zheng, Kun He, Jianrong Zhou, Yan Jin, and Chu-Min Li. Combining reinforce- ment learning with lin-kernighan-helsgaun algorithm for the traveling salesman problem. InProceedings of the AAAI conference on artificial intelligence, volume 35, pages 12445–12452,

work page 2021
[41]

Dpn: Decoupling partition and navigation for neural solvers of min-max vehicle routing problems

[Zhenget al., 2024 ] Zhi Zheng, Shunyu Yao, Zhenkun Wang, Tong Xialiang, Mingxuan Yuan, and Ke Tang. Dpn: Decoupling partition and navigation for neural solvers of min-max vehicle routing problems. InInternational Conference on Machine Learning, pages 61559–61592. PMLR,

work page 2024
[42]

Mvmoe: Multi-task vehicle routing solver with mixture-of-experts

[Zhouet al., 2024 ] Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, and Xu Chi. Mvmoe: Multi-task vehicle routing solver with mixture-of-experts. InInternational Conference on Machine Learning, pages 61804–61824. PMLR, 2024

work page 2024