pith. machine review for the scientific record. sign in

arxiv: 2604.07872 · v1 · submitted 2026-04-09 · 💻 cs.NE · cs.AI

Recognition: 2 theorem links

· Lean Theorem

PyVRP^+: LLM-Driven Metacognitive Heuristic Evolution for Hybrid Genetic Search in Vehicle Routing Problems

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:16 UTC · model grok-4.3

classification 💻 cs.NE cs.AI
keywords vehicle routing problemhybrid genetic searchlarge language modelsmetaheuristic evolutionmetacognitive programmingcombinatorial optimization
0
0 comments X

The pith

Metacognitive Evolutionary Programming makes LLMs reason about failures to evolve superior heuristics for vehicle routing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Metacognitive Evolutionary Programming as a way to move beyond reactive LLM code mutations in metaheuristic design. MEP forces the model into an explicit Reason-Act-Reflect loop: it must diagnose why a current heuristic underperforms, form a hypothesis grounded in supplied VRP domain knowledge, and then code a targeted change. When this process is used to evolve operators inside the Hybrid Genetic Search algorithm, the resulting heuristics deliver better solution quality and lower runtimes than the hand-designed baseline across multiple VRP variants.

Core claim

By replacing black-box performance-driven mutations with a structured metacognitive cycle that requires explicit failure diagnosis and hypothesis formation from domain knowledge, MEP discovers new local-search and crossover operators for Hybrid Genetic Search that improve solution quality by up to 2.70 percent and reduce runtime by more than 45 percent on challenging VRP instances.

What carries the argument

Metacognitive Evolutionary Programming (MEP), the framework that compels an LLM to follow a Reason-Act-Reflect cycle rather than performing direct code edits based only on fitness scores.

If this is right

  • Heuristics produced by MEP generalize across different VRP problem classes without additional hand-tuning.
  • The same metacognitive loop can be applied to evolve other core components of metaheuristics beyond local search and crossover.
  • Focusing the LLM on the exploration-exploitation balance yields both faster and higher-quality search trajectories.
  • The approach lowers the expertise barrier for creating competitive solvers for NP-hard routing problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Structured reasoning prompts may let LLMs contribute more reliably to algorithm design in other combinatorial domains such as scheduling or packing.
  • The performance edge could shrink if the evaluation set used during evolution is too narrow or correlated with the final test instances.
  • Combining MEP with traditional parameter-tuning methods might further improve results on very large-scale VRP instances.

Load-bearing premise

That the LLM, when guided through the structured cycle with pre-supplied domain knowledge, will generate genuinely novel and better heuristics instead of superficial variations that simply score well on the instances used for evaluation.

What would settle it

Measure whether the evolved heuristics maintain their reported quality and speed advantages when tested on a fresh collection of VRP instances drawn from different distributions or larger sizes than those used during the evolution process.

Figures

Figures reproduced from arXiv: 2604.07872 by Jianan Zhou, Manuj Malik, Shashank Reddy Chirra, Zhiguang Cao.

Figure 1
Figure 1. Figure 1: An overview of MEP, which outlines a process for evolving high-performing heuristics. The process continues until [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The prompt template used to guide the LLM. It [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Evolution of average cost values over 10 generations [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Hybrid Adaptive Stratified Diversity-Pressure Parent Selector. [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Hybrid Adaptive Stratified Diversity-Pressure Parent Selector. [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Hybrid Adaptive Stratified Diversity-Pressure Parent Selector. [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Hybrid Adaptive Stratified Diversity-Pressure Parent Selector. [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
read the original abstract

Designing high-performing metaheuristics for NP-hard combinatorial optimization problems, such as the Vehicle Routing Problem (VRP), remains a significant challenge, often requiring extensive domain expertise and manual tuning. Recent advances have demonstrated the potential of large language models (LLMs) to automate this process through evolutionary search. However, existing methods are largely reactive, relying on immediate performance feedback to guide what are essentially black-box code mutations. Our work departs from this paradigm by introducing Metacognitive Evolutionary Programming (MEP), a framework that elevates the LLM to a strategic discovery agent. Instead of merely reacting to performance scores, MEP compels the LLM to engage in a structured Reason-Act-Reflect cycle, forcing it to explicitly diagnose failures, formulate design hypotheses, and implement solutions grounded in pre-supplied domain knowledge. By applying MEP to evolve core components of the state-of-the-art Hybrid Genetic Search (HGS) algorithm, we discover novel heuristics that significantly outperform the original baseline. By steering the LLM to reason strategically about the exploration-exploitation trade-off, our approach discovers more effective and efficient heuristics applicable across a wide spectrum of VRP variants. Our results show that MEP discovers heuristics that yield significant performance gains over the original HGS baseline, improving solution quality by up to 2.70\% and reducing runtime by over 45\% on challenging VRP variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Metacognitive Evolutionary Programming (MEP), a framework that elevates LLMs from reactive code mutators to strategic agents via a structured Reason-Act-Reflect cycle grounded in pre-supplied domain knowledge. Applied to evolving core components of the Hybrid Genetic Search (HGS) algorithm for Vehicle Routing Problems (VRP), the work claims discovery of novel heuristics that outperform the HGS baseline, with reported gains of up to 2.70% in solution quality and over 45% reduction in runtime across challenging VRP variants.

Significance. If the empirical claims hold and the evolved heuristics prove both novel and causally attributable to the metacognitive structure, the work would advance automated metaheuristic design by showing how LLMs can be steered toward hypothesis-driven discovery rather than black-box mutation. This extends prior LLM-evolution approaches with a more deliberate reasoning loop and could reduce reliance on manual domain expertise in combinatorial optimization. The framework's emphasis on pre-supplied knowledge and explicit failure diagnosis is a constructive step, though its impact hinges on verifiable novelty and reproducibility.

major comments (3)
  1. [Results and Discussion] The central claim that MEP 'discovers novel heuristics' (abstract) that are 'significantly outperform[ing]' the baseline rests on an unverified assumption of genuine novelty; the manuscript provides no pseudocode, explicit diffs against original HGS operators, or expert assessment comparing the evolved components to established VRP literature (e.g., standard crossover, mutation, or local-search operators). This is load-bearing because without it the reported gains could arise from selection bias across LLM trials rather than the Reason-Act-Reflect structure.
  2. [Experimental Evaluation] The experimental claims of 2.70% quality improvement and >45% runtime reduction lack supporting details on instance sets (e.g., specific CVRP, VRPTW, or multi-depot benchmarks), number of runs, statistical tests, confidence intervals, or ablation studies isolating the Reflect phase; these omissions prevent assessment of whether the gains are robust or instance-specific.
  3. [Methods] The description of the MEP framework does not specify the exact prompting strategy, how domain knowledge is encoded and supplied to the LLM, or safeguards against the model simply rephrasing known heuristics, making it impossible to replicate the 'strategic discovery' process or confirm it differs from prior reactive LLM methods.
minor comments (2)
  1. [Title and Abstract] The title references PyVRP^+ but the abstract and introduction do not define PyVRP or clarify its relationship to the proposed MEP framework and HGS modifications.
  2. [Methods] Notation for the Reason-Act-Reflect cycle and the evolved heuristic components could be formalized with pseudocode or a clear diagram to improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas where additional rigor and transparency will strengthen the manuscript. We address each major comment below and commit to revisions that directly incorporate the requested elements.

read point-by-point responses
  1. Referee: [Results and Discussion] The central claim that MEP 'discovers novel heuristics' (abstract) that are 'significantly outperform[ing]' the baseline rests on an unverified assumption of genuine novelty; the manuscript provides no pseudocode, explicit diffs against original HGS operators, or expert assessment comparing the evolved components to established VRP literature (e.g., standard crossover, mutation, or local-search operators). This is load-bearing because without it the reported gains could arise from selection bias across LLM trials rather than the Reason-Act-Reflect structure.

    Authors: We agree that explicit verification of novelty is essential to substantiate the central claim. In the revised manuscript we will add: (1) full pseudocode for each evolved heuristic component, (2) a side-by-side diff table highlighting modifications relative to the original HGS operators, and (3) a comparison against canonical VRP operators drawn from the literature (e.g., OX crossover, 2-opt, and standard local-search moves). To address selection bias, we will report the complete distribution of heuristic quality across all LLM-generated candidates and include an ablation contrasting the full Reason-Act-Reflect cycle against a reactive mutation-only baseline. These additions will allow readers to assess whether the performance gains are attributable to the metacognitive structure rather than trial selection. revision: yes

  2. Referee: [Experimental Evaluation] The experimental claims of 2.70% quality improvement and >45% runtime reduction lack supporting details on instance sets (e.g., specific CVRP, VRPTW, or multi-depot benchmarks), number of runs, statistical tests, confidence intervals, or ablation studies isolating the Reflect phase; these omissions prevent assessment of whether the gains are robust or instance-specific.

    Authors: We will expand the experimental section with the missing details. The revised manuscript will specify the exact benchmark sets (Christofides CVRP, Solomon VRPTW, and multi-depot instances), the number of independent runs per instance (10), the statistical tests performed (paired t-tests with reported p-values), 95% confidence intervals on all reported improvements, and dedicated ablation studies that isolate the Reflect phase by comparing full MEP against variants lacking reflection. These additions will demonstrate that the reported gains are statistically robust and not instance-specific. revision: yes

  3. Referee: [Methods] The description of the MEP framework does not specify the exact prompting strategy, how domain knowledge is encoded and supplied to the LLM, or safeguards against the model simply rephrasing known heuristics, making it impossible to replicate the 'strategic discovery' process or confirm it differs from prior reactive LLM methods.

    Authors: We will add a new appendix containing the complete prompting strategy, including verbatim system and user prompts used at each stage of the Reason-Act-Reflect cycle. Domain knowledge is supplied as curated textual summaries of VRP literature and HGS design principles, injected into the system prompt. To guard against simple rephrasing, the Reflect phase explicitly requires the LLM to (a) diagnose concrete failure modes observed in the current heuristic and (b) justify proposed changes with reference to the supplied domain knowledge or first-principles reasoning. We will also release the full prompt templates and generation code to enable exact replication and differentiation from prior reactive approaches. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with measured gains

full rationale

The paper describes an empirical process: an LLM is guided through a Reason-Act-Reflect cycle (MEP) to mutate components of the existing Hybrid Genetic Search algorithm, after which solution quality and runtime are measured on VRP benchmark instances. No equations, fitted parameters, or self-referential derivations appear in the abstract or described claims. Performance deltas (2.70% quality, 45% runtime) are reported as experimental outcomes, not as quantities that reduce by construction to the input heuristics or to prior self-citations. The central claim therefore rests on external measurement rather than definitional or self-citation loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the effectiveness of the introduced MEP cycle and the assumption that pre-supplied domain knowledge can be productively used by the LLM; no explicit free parameters or invented physical entities are described.

axioms (1)
  • domain assumption The LLM can meaningfully diagnose heuristic failures and generate design hypotheses when prompted with domain knowledge about exploration-exploitation trade-offs.
    Invoked in the description of the Reason-Act-Reflect cycle as the mechanism that elevates the LLM beyond black-box mutation.
invented entities (1)
  • Metacognitive Evolutionary Programming (MEP) framework no independent evidence
    purpose: To structure LLM interactions as strategic discovery rather than reactive code editing for heuristic evolution.
    New framework introduced in the paper to guide the LLM through diagnosis, hypothesis, and reflection steps.

pith-pipeline@v0.9.0 · 5559 in / 1329 out tokens · 37557 ms · 2026-05-10T18:16:44.350102+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 8 canonical work pages · 3 internal anchors

  1. [1]

    Tian Bai, Yongwang Cao, Yan Ge, and Haitao Yu. 2025. MP: Endowing Large Language Models with Lateral Thinking. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 23460–23468

  2. [2]

    Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. 2021. Machine learning for combinatorial optimization: a methodological tour d’horizon.European Journal of Operational Research290, 2 (2021), 405–421

  3. [3]

    Federico Berto, Chuanbo Hua, Junyoung Park, Laurin Luttmann, Yining Ma, Fanchen Bu, Jiarui Wang, Haoran Ye, Minsu Kim, Sanghyeok Choi, Nayeli Gast Zepeda, André Hottung, Jianan Zhou, Jieyi Bi, Yu Hu, Fei Liu, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Davide Angioni, Wouter Kool, Zhiguang Cao, Qingfu Zhang, Joungho Kim, Jie Zhang, Kijung Shin, Cathy Wu, Sungsoo...

  4. [4]

    Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels A Wouda, Leon Lan, Junyoung Park, Kevin Tierney, and Jinkyoo Park. 2025. RouteFinder: Towards Foundation Models for Vehicle Routing Problems.Trans- actions on Machine Learning Research2025 (2025)

  5. [5]

    Quentin Cappart, Didier Chételat, Elias B Khalil, Andrea Lodi, Christopher Morris, and Petar Veličković. 2023. Combinatorial optimization and reasoning with graph neural networks.Journal of Machine Learning Research24, 130 (2023), 1–61

  6. [6]

    Angelica Chen, David Dohan, and David So. 2023. EvoPrompting: Language Models for Code-Level Neural Architecture Search. InThirty-seventh Conference on Neural Information Processing Systems. https://openreview.net/forum?id= ifbF4WdT8f

  7. [7]

    Francesca Da Ros, Michael Soprano, Luca Di Gaspero, and Kevin Roitero. 2025. Large Language Models for Combinatorial Optimization: A Systematic Review. arXiv preprint arXiv:2507.03637(2025)

  8. [8]

    Pham Vu Tuan Dat, Long Doan, and Huynh Thi Thanh Binh. 2025. Hsevo: Elevating automatic heuristic design with diversity-driven harmony search and genetic algorithm using llms. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 26931–26938

  9. [9]

    Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, and Shimon Whiteson. 2025. How Should We Meta-Learn Reinforcement Learning Algo- rithms?. InReinforcement Learning Conference. https://openreview.net/forum? id=jKzQ6af2DU

  10. [10]

    Jin Huang, Qihao Liu, Xinyu Li, Liang Gao, and Yue Teng. 2026. Automatic pro- gramming via large language models with population self-evolution for dynamic fuzzy job shop scheduling problem.IEEE Transactions on Fuzzy Systems(2026)

  11. [11]

    Xia Jiang, Yaoxin Wu, Minshuo Li, Zhiguang Cao, and Yingqian Zhang. 2025. Large language models as end-to-end combinatorial optimization solvers. In Advances in Neural Information Processing Systems

  12. [12]

    Fei Liu, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. 2023. Algorithm evolution using large language model.arXiv preprint arXiv:2311.15249(2023)

  13. [13]

    Fei Liu, Tong Xialiang, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. 2024. Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model. InInternational Conference on Machine Learning

  14. [14]

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery.arXiv preprint arXiv:2408.06292(2024). arXiv:2408.06292 [cs.AI] https://arxiv.org/abs/2408.06292

  15. [15]

    Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shikha Prabhumoye, Yiming Yang, et al. 2023. Self-Refine: Iterative Refinement with Self-Feedback. InAdvances in Neural Infor- mation Processing Systems

  16. [16]

    Kalyan Varma Nadimpalli, Shashank Reddy Chirra, Pradeep Varakantham, and Stefan Bauer. 2025. Evolving RL: Discovering New Activation Functions using LLMs. InTowards Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation. https://openreview.net/forum?id=H2x9juCuJg

  17. [17]

    Deepak Nathani, Lovish Madaan, Nicholas Roberts, Nikolay Bashlykov, Ajay Menon, Vincent Moens, Mikhail Plekhanov, Amar Budhiraja, Despoina Magka, Vladislav Vorotilov, et al. [n.d.]. MLGym: A New Framework and Benchmark for Advancing AI Research Agents. InSecond Conference on Language Modeling

  18. [18]

    Alexander Novikov, Ngân V ˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. 2025. AlphaEvolve: A coding agent for scientific and algorithmic discovery.arXiv preprint arXiv:2506.13131(2025)

  19. [19]

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. 2024. Mathematical discoveries from program search with large language models.Nature625, 7995 (2024), 468–475

  20. [20]

    Yiding Shi, Jianan Zhou, Wen Song, Jieyi Bi, Yaoxin Wu, and Jie Zhang. 2025. Generalizable Heuristic Generation Through Large Language Models with Meta- Optimization.arXiv preprint arXiv:2505.20881(2025)

  21. [21]

    Yiwen Sun, Xianyin Zhang, Shiyu Huang, Shaowei Cai, BingZhen Zhang, and Ke Wei. 2024. AutoSAT: Automatically optimize sat solvers via large language models.arXiv preprint arXiv:2402.10705(2024)

  22. [22]

    Anja Šurina, Amin Mansouri, Lars CPM Quaedvlieg, Amal Seddas, Maryna Via- zovska, Emmanuel Abbe, and Caglar Gulcehre. [n.d.]. Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning. InSecond Conference on Language Modeling

  23. [23]

    Nguyen Thach, Aida Riahifar, Nathan Huynh, and Hau Chan. 2025. RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models.arXiv preprint arXiv:2505.20242(2025)

  24. [24]

    Thibaut Vidal. 2022. Hybrid genetic search for the CVRP: Open-source implemen- tation and SWAP* neighborhood.Computers & Operations Research140 (2022), 105643

  25. [25]

    Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. InInternational Conference on Learning Representations

  26. [26]

    Niels A Wouda, Leon Lan, and Wouter Kool. 2024. PyVRP: A high-performance VRP solver package.INFORMS Journal on Computing36, 4 (2024), 943–955

  27. [27]

    Xuan Wu, Di Wang, Chunguo Wu, Lijie Wen, Chunyan Miao, Yubin Xiao, and You Zhou. 2025. Efficient Heuristics Generation for Solving Combinatorial Optimization Problems Using Large Language Models. InSIGKDD Conference on Knowledge Discovery and Data Mining

  28. [28]

    Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. 2025. The AI Scientist-v2: Workshop- Level Automated Scientific Discovery via Agentic Tree Search.arXiv preprint arXiv:2504.08066(2025). arXiv:2504.08066 [cs.AI] https://arxiv.org/abs/2504.08066

  29. [29]

    Xianliang Yang, Lei Song, Yapu Zhang, and Jiang Bian. 2025. HeurAgenix: A Multi-Agent LLM-Based Paradigm for Adaptive Heuristic Evolution and Selection in Combinatorial Optimization. https://openreview.net/forum?id=xxSK3ZNAhh

  30. [30]

    Shunyu Yao, Xi Lin, Jiashu Wang, Qingfu Zhang, and Zhenkun Wang. 2024. Rethinking supervised learning based neural combinatorial optimization for routing problem.ACM Transactions on Evolutionary Learning(2024)

  31. [31]

    Shunyu Yao, Fei Liu, Xi Lin, Zhichao Lu, Zhenkun Wang, and Qingfu Zhang. 2025. Multi-objective evolution of heuristic using large language model. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 27144–27152

  32. [32]

    Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. 2024. ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution. InAdvances in Neural Information Processing Systems

  33. [33]

    Huigen Ye, Hua Xu, An Yan, and Yaoyang Cheng. 2025. Large Language Model- driven Large Neighborhood Search for Large-Scale MILP Problems. InInterna- tional Conference on Machine Learning

  34. [34]

    Ni Zhang, Zhiguang Cao, Jianan Zhou, Cong Zhang, and Yew-Soon Ong. 2025. An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems. InInternational Conference on Learning Representations

  35. [35]

    Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, and Bryan Hooi. 2025. Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design. InInternational Conference on Machine Learning

  36. [36]

    Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, and Xu Chi. 2024. MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts. InInternational Conference on Machine Learning

  37. [37]

    Rongjie Zhu, Cong Zhang, and Zhiguang Cao. 2025. Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM. InInternational Conference on Learning Representations. A VRP ATTRIBUTE TAXONOMY To handle the vast landscape of VRP variants in a structured manner, we describe VRP instances via composable attributes. This allows us to repre...

  38. [38]

    (1) Pitfalls ( 𝐾𝑝 ):List three ways a parent-selector can go wrong in VRP (e.g., over-exploitation, diversity collapse, cost-feasibility trade-offs)

    Planning Phase Before you inspect any code, pause and map out your thinking. (1) Pitfalls ( 𝐾𝑝 ):List three ways a parent-selector can go wrong in VRP (e.g., over-exploitation, diversity collapse, cost-feasibility trade-offs). (2) Strategies (𝐾𝑠 ):For each pitfall, propose one high-level counter- measure (e.g., partial elitism + random injection). (3) Hid...

  39. [39]

    Reasoning Phase Now dig intosel1andsel2—but think aloud. • Identify Patterns:What exploration/exploitation bias does each show? • Assess Shortcomings:Where does each stumble relative to your 𝐾𝑝 /𝐾𝑡 list? • Design Sketch:Using your 𝐾𝑠 strategies, outline in prose how sel3will avoid those failures. Your Task in This Interaction:You will be presented with tw...

  40. [40]

    – Use cost_evaluator.penalised_cost(solution) to assess solu- tion quality

    ->tuple[ Solution , Solution ]: •Behavior: – Prioritize feasible solutions but allow occasional infeasible ones if beneficial. – Use cost_evaluator.penalised_cost(solution) to assess solu- tion quality. –Ensure diversity between selected parents. –Avoid modifying the population in-place. •Performance Hints: –Cache penalized costs if reused multiple times....

  41. [41]

    8defload_penalty ( 9self , load :int, capacity :int, dimension :int

    -> None : ... 8defload_penalty ( 9self , load :int, capacity :int, dimension :int

  42. [42]

    11deftw_penalty ( self , time_warp :int) ->int:

    ->int: ... 11deftw_penalty ( self , time_warp :int) ->int: ... 12defdist_penalty ( self , distance :int, max_distance :int) -> int: ... 13defpenalised_cost ( self , solution : Solution ) ->int: ... 14defcost ( self , solution : Solution ) ->int: ... 15 16classSolution : 17def__init__ ( 18self , 19data : ProblemData , 20routes :list[ Route ] |list[list[int]] ,

  43. [43]

    22@classmethod 23defmake_random ( 24cls , data : ProblemData , rng : R a n d o m N u m b e r G e n e r a t o r

    -> None : ... 22@classmethod 23defmake_random ( 24cls , data : ProblemData , rng : R a n d o m N u m b e r G e n e r a t o r

  44. [44]

    26defis_feasible ( self ) ->bool:

    -> Solution : ... 26defis_feasible ( self ) ->bool: ... 27# ... ( additional methods omitted for brevity ) 28 29classR a n d o m N u m b e r G e n e r a t o r : 30@overload 31def__init__ ( self , seed :int) -> None : ... 32@overload 33def__init__ ( self , state :list[int]) -> None : ... 34defrand ( self ) ->float: ... 35defrandint ( self , high :int) ->in...

  45. [45]

    Reflection Phase After you draftsel3, self-evaluate before returning code. (1) Which of your original𝐾𝑝 pitfalls does the final code best address? Which remain? (2) Did any new traps surface during coding? How would you tweak sel3to handle them? (3) Proposetwoconcrete revisions to improve quality/diversity trade- off further. Response Format:Enclose your ...

  46. [46]

    " " 10[ Brief description of your novel parent selection strategy ] 11

    ->tuple[ Solution , Solution ]: 9" " " 10[ Brief description of your novel parent selection strategy ] 11" " " 12# [ Your implementation here ] 13parent1 = ... 14parent2 = ... 15returnparent1 , parent2 C PARENT SELECTION CODE This section presents the complete implementation, from Figures 4 to 7, of the parent selection algorithm that was discovered. The ...

  47. [47]

    ** Precompute Costs & Stratification **: 19- Cache penalized costs and feasibility flags for all solutions upfront . 20- Sort population by cost and partition into three strata : 21* Elite : top 1/6 solutions ( highest quality ) 22* Mid : solutions from 1/6 to 2/3 ( moderate quality ) 23* Tail : bottom 1/3 solutions ( lower quality ) 24

  48. [48]

    27- Tournament emphasizes best feasible solutions with occasional infeasible survivors 28(20% probability ) for diversity maintenance

    ** Parent 1 Selection **: 26- Select from elite stratum using feasibility - cost biased tournament . 27- Tournament emphasizes best feasible solutions with occasional infeasible survivors 28(20% probability ) for diversity maintenance . 29

  49. [49]

    ** Adaptive Parent 2 Strategy **: 31- Determine complementary strata based on Parent 1 characteristics : 32* If Parent 1 is elite & feasible : select from mid + tail ( exploration ) 33* If Parent 1 is infeasible or non - elite : select from elite + mid ( intensification ) 34

  50. [50]

    ** Population - Size Adaptive Selection **: 36- ** Large populations ( >500) **: Skip structural analysis entirely , use simplified scoring : 37* Cost advantage (60% weight ) 38* Feasibility bonus (30% weight ) 39* Basic diversity bonus (10% weight ) 40- ** Small populations (= <500) **: Full structural analysis with lazy encoding : 41* Structural dissimi...

  51. [51]

    " " 3Combined feasibility - cost biased tournament emphasizing best feasible solutions 4with occasional infeasible survivors for diversity maintenance . 5

    ** Lazy Encoding & Sampling **: 46- Only encode solutions that are actually evaluated as candidates . 47- Adaptive sampling limits : 20 -30 candidates per stratum depending on population size . 48- Add small randomness to scores to avoid local minima . 49 50- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -...