{"paper":{"title":"A Unified Knowledge Embedded Reinforcement Learning-based Framework for Generalized Capacitated Vehicle Routing Problems","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A framework embedding classical routing knowledge into RL achieves better solutions for diverse CVRP variants.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Hao Hu, Liang Wang, Wen Wang, Xiangchen Wu, Xianping Tao","submitted_at":"2026-05-14T06:05:22Z","abstract_excerpt":"The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem with broad applications in logistics and transportation. Real-world CVRPs often involve diverse objectives and complex constraints, such as time windows or backhaul requirements, motivating the development of a unified solution framework. Recent reinforcement learning (RL) approaches have shown promise in combinatorial optimization, yet they rely on end-to-end learning and lack explicit problem-solving knowledge, limiting solution quality. 
In this paper, we propose a knowledge-embedded framework inspired by the Rou"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Extensive experiments show that this framework achieves superior solution quality compared with state-of-the-art learning-based methods, with a smaller gap to classical heuristics, demonstrating strong generalization across diverse CVRP variants.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the Route-First Cluster-Second decomposition plus dynamic programming guidance will reliably mitigate partial observability and produce generalizable improvements without introducing new biases or overfitting to the tested CVRP variants.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A knowledge-embedded RL framework decomposes generalized CVRPs into route-first and cluster-second subproblems, using dynamic programming to guide the RL solver and a history-enhanced context module to handle partial observability, yielding better solutions than prior learning methods.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A framework embedding classical routing knowledge into RL achieves better solutions for diverse CVRP variants.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a5a7cbe3e560807b7369002c3a1c2b30d7bb080094d48e4791871fcccf748068"},"source":{"id":"2605.14416","kind":"arxiv","version":1},"verdict":{"id":"def30479-0570-414b-a712-e49a8c6ca532","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:06:16.113961Z","strongest_claim":"Extensive experiments show that this framework achieves superior solution quality compared with state-of-the-art learning-based 
methods, with a smaller gap to classical heuristics, demonstrating strong generalization across diverse CVRP variants.","one_line_summary":"A knowledge-embedded RL framework decomposes generalized CVRPs into route-first and cluster-second subproblems, using dynamic programming to guide the RL solver and a history-enhanced context module to handle partial observability, yielding better solutions than prior learning methods.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the Route-First Cluster-Second decomposition plus dynamic programming guidance will reliably mitigate partial observability and produce generalizable improvements without introducing new biases or overfitting to the tested CVRP variants.","pith_extraction_headline":"A framework embedding classical routing knowledge into RL achieves better solutions for diverse CVRP variants."},"references":{"count":42,"sample":[{"doi":"","year":2024,"title":"Routefinder: Towards foundation models for vehicle routing problems","work_id":"65b94abc-7a2f-4393-bd62-6da2c721cfea","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Learning to handle complex constraints for vehicle routing problems. Advances in Neural Information Processing Systems, 37:93479–93509,","work_id":"78783065-615f-42d4-8053-c9b54afe3327","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Learning to perform local rewriting for combinatorial optimization. Advances in neural information processing systems, 32,","work_id":"6d943f93-f5a7-4c4c-bbd4-57d3c98f04b5","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Select and optimize: Learning to solve large-scale tsp instances","work_id":"87d25f8c-0301-493d-a781-843da427f284","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning","work_id":"f079835f-7cd1-4c98-8ea8-dc44846d6b10","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":42,"snapshot_sha256":"8db56c4f55f4230dde80621239975e5ce37b6a865226bbef87ac9b758da6e7ac","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}