Learn to Relax with Large Language Models: Solving Constraint Optimization Problems via Bidirectional Coevolution

Beidan Liu; Chen Gao; Quanjun Yin; Tianle Pu; Wei Qi; Yong Zhao; Zhengqiu Zhu

arxiv: 2509.12643 · v4 · submitted 2025-09-16 · 💻 cs.AI

Learn to Relax with Large Language Models: Solving Constraint Optimization Problems via Bidirectional Coevolution

Beidan Liu , Zhengqiu Zhu , Chen Gao , Tianle Pu , Yong Zhao , Wei Qi , Quanjun Yin This is my paper

Pith reviewed 2026-05-18 16:59 UTC · model grok-4.3

classification 💻 cs.AI

keywords constraint optimizationlarge language modelsconstraint relaxationbidirectional coevolutionMonte Carlo Tree Searchevolutionary algorithmsautomated problem solving

0 comments

The pith

Large language models can solve hard constraint optimization problems by synthesizing principled relaxation strategies and evolving them bidirectionally with search algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AutoCO as a way to shift LLMs from passive constraint checkers to proactive designers of optimization strategies. It binds relaxation ideas from operations research to executable code through a single triple representation so the model can justify and implement changes that simplify the original problem. A bidirectional loop then runs Monte Carlo Tree Search to explore different relaxation paths globally while evolutionary algorithms refine solutions locally, exchanging information to keep search from getting stuck. Experiments on three tough benchmarks show the approach works better than prior methods, especially when problems are most difficult. If correct, this points to LLMs that can actively reshape hard problems rather than merely verify candidate answers.

Core claim

AutoCO couples operations-research principles of constraint relaxation with LLM reasoning via a unified triple-representation that links relaxation strategies, algorithmic principles, and executable codes, allowing the model to synthesize, justify, and instantiate both principled and directly runnable relaxations, while a bidirectional global-local coevolution mechanism pairs MCTS for exploring relaxation trajectories with evolutionary algorithms for local intensification to balance diversification and intensification and prevent premature convergence.

What carries the argument

Unified triple-representation binding relaxation strategies, algorithmic principles, and executable codes, together with bidirectional coevolution between global Monte Carlo Tree Search trajectories and local evolutionary intensification.

If this is right

LLM-based solvers can shift from verification to active problem reformulation on complex instances.
Hard constraint problems become more tractable when relaxation trajectories are explored globally and refined locally in a closed loop.
Optimization outputs gain verifiability because each relaxation step stays grounded in established operations-research principles.
Premature convergence is reduced through explicit exchange of priors and feedback between global and local search components.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same relaxation-plus-coevolution pattern could be tested on scheduling or logistics problems where constraint tightening is equally central.
Future checks might examine whether the method remains effective when the underlying LLM changes or when problem size grows beyond current benchmarks.
One open question is how to add automated soundness checks so that synthesized relaxations never admit solutions that violate the original constraints.

Load-bearing premise

Large language models can reliably synthesize, justify, and instantiate relaxation strategies that are both principled and directly executable as code.

What would settle it

Applying the method to the same hard COP benchmarks and observing that performance does not exceed current approaches or that many generated relaxations produce invalid or infeasible solutions.

Figures

Figures reproduced from arXiv: 2509.12643 by Beidan Liu, Chen Gao, Quanjun Yin, Tianle Pu, Wei Qi, Yong Zhao, Zhengqiu Zhu.

**Figure 2.** Figure 2: AutoCO (blue) vs. Current LLM method (green) performance on VRPTW-fuel problems over 100 iterations. The autonomous constraint relaxation temporarily expands feasible regions, enhancing optimization feedback. ration through Monte Carlo Tree Search (MCTS), enabling effective discovery of performant strategyconcepts-code triples. The contributions of this work are concluded as follows: • We propose an end-… view at source ↗

**Figure 3.** Figure 3: Architecture of AutoCO. Initially, we use LLMs to parse user-input problems and generate initial constraint [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Time distribution of AutoCO components. 5.3 Time Efficiency Analysis We assess AutoCO’s computational performance on VRPTW through three metrics: end-to-end runtime (Te2e), time to first feasible solution (Ttf f ), and performance stagnation time (Tstag). As shown in [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Effectiveness of various constraint relaxation [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Convergence dynamics and fitness analysis of [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: The triple individual representation example [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Constraint importance analysis prompt Prompt for Constraint Importance Analysis You are an expert in solving optimization problems, here are the detailed information of an unsolved nonlinear combinatorial optimization problem. ```json {"constraint 1" , "Importance": 0.9}, {"constraint 2" , "Importance": 1.0} ``` Do not output anything other than the specified requirements. <Problem Information> First, plea… view at source ↗

**Figure 11.** Figure 11: The prompt detail of mutation strategy 1 [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗

**Figure 12.** Figure 12: The prompt detail of mutation strategy 2 [PITH_FULL_IMAGE:figures/full_fig_p016_12.png] view at source ↗

**Figure 13.** Figure 13: The prompt detail of mutation strategy 3 [PITH_FULL_IMAGE:figures/full_fig_p016_13.png] view at source ↗

**Figure 15.** Figure 15: The prompt detail of crossover strategy 2 [PITH_FULL_IMAGE:figures/full_fig_p017_15.png] view at source ↗

**Figure 16.** Figure 16: The prompt detail of crossover strategy 3 [PITH_FULL_IMAGE:figures/full_fig_p017_16.png] view at source ↗

read the original abstract

Large Language Model (LLM)-based optimization has recently shown promise for autonomous problem solving, yet most approaches still cast LLMs as passive constraint checkers rather than proactive strategy designers, limiting their effectiveness on complex Constraint Optimization Problems (COPs). To address this, we present AutoCO, an end-to-end Automated Constraint Optimization method that tightly couples operations-research principles of constraint relaxation with LLM reasoning. A core innovation is a unified triple-representation that binds relaxation strategies, algorithmic principles, and executable codes. This design enables the LLM to synthesize, justify, and instantiate relaxation strategies that are both principled and executable. To navigate fragmented solution spaces, AutoCO employs a bidirectional global-local coevolution mechanism, synergistically coupling Monte Carlo Tree Search (MCTS) for global relaxation-trajectory exploration with Evolutionary Algorithms (EAs) for local solution intensification. This continuous exchange of priors and feedback explicitly balances diversification and intensification, thus preventing premature convergence. Extensive experiments on three challenging COP benchmarks validate AutoCO's consistent effectiveness and superior performance, especially in hard regimes where current methods degrade. Results highlight AutoCO as a principled and effective path toward proactive, verifiable LLM-driven optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AutoCO proposes a unified representation and bidirectional MCTS-EA coevolution to let LLMs generate principled relaxation strategies for COPs, but the performance edge still needs concrete numbers to evaluate.

read the letter

The main thing to know is that this paper moves LLMs from just checking constraints to actively synthesizing relaxation strategies for constraint optimization problems. It ties those strategies to algorithmic principles and executable code in one representation, then runs a back-and-forth loop between global search with MCTS and local intensification with evolutionary algorithms to keep the search from collapsing too soon.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces AutoCO, an end-to-end Automated Constraint Optimization method that tightly couples operations-research principles of constraint relaxation with LLM reasoning. A core innovation is a unified triple-representation that binds relaxation strategies, algorithmic principles, and executable codes, enabling the LLM to synthesize, justify, and instantiate principled and executable relaxation strategies. AutoCO employs a bidirectional global-local coevolution mechanism that synergistically couples Monte Carlo Tree Search (MCTS) for global relaxation-trajectory exploration with Evolutionary Algorithms (EAs) for local solution intensification, explicitly balancing diversification and intensification to prevent premature convergence. The paper claims that extensive experiments on three challenging COP benchmarks validate AutoCO's consistent effectiveness and superior performance, especially in hard regimes where current methods degrade.

Significance. If the empirical claims hold, this work could meaningfully advance LLM-driven optimization by positioning LLMs as proactive strategy designers grounded in OR principles rather than passive constraint checkers. The unified triple-representation and bidirectional coevolution offer a coherent architecture for navigating fragmented solution spaces in COPs, with potential for more verifiable and robust solvers if the performance gains are reproducible and the relaxation strategies are shown to be reliably principled.

major comments (2)

Abstract: the central claim of 'consistent effectiveness and superior performance' on three COP benchmarks, along with prevention of premature convergence, is load-bearing for the paper's contribution, yet the abstract provides no quantitative results, error bars, baseline comparisons, or details on how the relaxation strategies were validated as principled; the results section must supply these to allow assessment of the effectiveness claims.
Framework description (unified triple-representation and bidirectional coevolution): the assumption that LLMs can reliably synthesize OR-grounded relaxation strategies that are both principled and directly executable is central to the method, but without concrete examples, ablation studies, or metrics on the quality of generated strategies (e.g., feasibility rates or adherence to OR principles), it remains unclear whether this component functions as asserted or reduces to heuristic prompting.

minor comments (2)

Abstract: consider specifying the three COP benchmarks by name to provide immediate context for the claimed superiority.
Notation: ensure the unified triple-representation is formally defined (e.g., as a tuple or data structure) rather than described only at a high level, to improve clarity for readers implementing the method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments. We address each major comment point by point below, indicating where revisions will be incorporated to strengthen the manuscript.

read point-by-point responses

Referee: Abstract: the central claim of 'consistent effectiveness and superior performance' on three COP benchmarks, along with prevention of premature convergence, is load-bearing for the paper's contribution, yet the abstract provides no quantitative results, error bars, baseline comparisons, or details on how the relaxation strategies were validated as principled; the results section must supply these to allow assessment of the effectiveness claims.

Authors: We agree that the abstract would benefit from explicit quantitative highlights to better support the central claims. In the revised manuscript, we will update the abstract to include key performance metrics such as average improvement percentages over baselines, standard deviations from repeated runs, and brief notes on how relaxation strategies were validated for feasibility and OR-principle adherence. The results section (Sections 4.1–4.4) already supplies detailed tables with baseline comparisons, error bars, and analysis of strategy quality; we will add explicit cross-references from the abstract to these sections for easier assessment. revision: yes
Referee: Framework description (unified triple-representation and bidirectional coevolution): the assumption that LLMs can reliably synthesize OR-grounded relaxation strategies that are both principled and directly executable is central to the method, but without concrete examples, ablation studies, or metrics on the quality of generated strategies (e.g., feasibility rates or adherence to OR principles), it remains unclear whether this component functions as asserted or reduces to heuristic prompting.

Authors: The manuscript already provides concrete examples of the unified triple-representation in Section 3.2, showing LLM-generated relaxation strategies with explicit OR-principle justifications and corresponding executable code snippets. The bidirectional coevolution mechanism is illustrated with algorithmic descriptions and interaction diagrams in Section 3.3. Ablation studies in Section 4.3 quantify the contribution of these components to overall performance. To further strengthen the validation, we will add explicit metrics on strategy feasibility rates and a summary of adherence to OR principles in the revised version, making clear that the approach is grounded rather than purely heuristic. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes AutoCO as an architectural framework that couples LLM-based synthesis of relaxation strategies with a unified triple-representation and bidirectional MCTS-EA coevolution for COPs. No equations, parameter-fitting procedures, or derivation steps are presented that reduce any claimed prediction or result to the inputs by construction. The central claims rest on the design of the triple-representation and coevolution mechanism plus external experimental validation on three benchmarks, rather than any self-referential loop or self-citation that bears the load of the effectiveness argument. The method choices are presented as independent engineering decisions grounded in OR principles, with no evidence of renaming known results or smuggling ansatzes via prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the assumption that LLMs possess sufficient reasoning capability to produce executable, OR-principled relaxation strategies and that the bidirectional exchange between MCTS and EA reliably balances exploration and intensification. No free parameters or invented physical entities are mentioned; the triple-representation and coevolution mechanism are introduced constructs whose effectiveness is asserted rather than derived from prior results.

axioms (2)

domain assumption LLMs can synthesize, justify, and instantiate relaxation strategies that are both principled and executable
This premise is invoked when the abstract states that the unified triple-representation enables the LLM to produce such strategies.
domain assumption Continuous exchange of priors and feedback between global MCTS and local EA prevents premature convergence
Stated directly in the description of the bidirectional coevolution mechanism.

invented entities (2)

unified triple-representation no independent evidence
purpose: Binds relaxation strategies, algorithmic principles, and executable codes so the LLM can synthesize and instantiate them
Introduced as a core innovation that enables the LLM to act as a proactive strategy designer.
bidirectional global-local coevolution mechanism no independent evidence
purpose: Couples MCTS for global trajectory exploration with EAs for local intensification to balance diversification and intensification
Presented as the mechanism that navigates fragmented solution spaces and prevents premature convergence.

pith-pipeline@v0.9.0 · 5749 in / 1635 out tokens · 40922 ms · 2026-05-18T16:59:42.409787+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

8 extracted references · 8 canonical work pages · 1 internal anchor

[1]

Large Language Models as Optimizers

Novel mutation strategy for enhancing shade and lshade algorithms for global numerical opti- mization.Swarm and Evolutionary Computation, 50:100455. Mahdi Mokhtarzadeh, Reza Tavakkoli-Moghaddam, Chefi Triki, and Yaser Rahimi. 2021. A hybrid of clustering and meta-heuristic algorithms to solve a p-mobile hub location–allocation problem with the depreciatio...

work page internal anchor Pith review Pith/arXiv arXiv 2021
[3]

First, read the parent's constraint relaxation strategy and problem information, and understand the current problem's constraint relaxation processing logic

Executable Code ： Do not output anything other than the specified requirements. First, read the parent's constraint relaxation strategy and problem information, and understand the current problem's constraint relaxation processing logic. Second, read the parent's algorithmic concept and executable code, learn and locate code segments related to constraint...

work page
[5]

First, read the parent's algorithmic concept and executable code, and understand the current code design logic

Executable Code ： Do not output anything other than the specified requirements. First, read the parent's algorithmic concept and executable code, and understand the current code design logic. Second, read the current constraint relaxation strategy, and clarify the constraint relaxation strategy that needs to be implemented. Then, design a new constraint r...

work page
[7]

First, read the parent's algorithmic concept and executable code, and understand the current code design logic

Executable Code ： Do not output anything other than the specified requirements. First, read the parent's algorithmic concept and executable code, and understand the current code design logic. Second, read the current constraint relaxation strategy, clarify the constraint relaxation strategy to be implemented and the current implementation approach for con...

work page
[9]

Executable Code ： Do not output anything other than the specified requirements. Parent1 :[Constraint Relaxation Strategy 1]; [Algorithmic Concept1]; [Executable Code 1] Parent2 :[Constraint Relaxation Strategy 2]; [Algorithmic Concept2]; [Executable Code 2] Figure 14: The prompt detail of crossover strategy 1 Crossover Strategy 2: Code Implementation Prom...

work page
[11]

Executable Code ： Do not output anything other than the specified requirements. Parent1 :[Constraint Relaxation Strategy 1]; [Algorithmic Concept1]; [Executable Code 1] Parent2 :[Constraint Relaxation Strategy 2]; [Algorithmic Concept2]; [Executable Code 2] Figure 15: The prompt detail of crossover strategy 2 Crossover Strategy 3: Algorithmic concepts Pro...

work page
[12]

Algorithmic Concept：

work page
[13]

particle

Executable Code ： Do not output anything other than the specified requirements. Parent1 :[Constraint Relaxation Strategy 1]; [Algorithmic Concept1]; [Executable Code 1] Parent2 :[Constraint Relaxation Strategy 2]; [Algorithmic Concept2]; [Executable Code 2] Figure 16: The prompt detail of crossover strategy 3 C Experimental Details C.1 Benchmarks To valid...

work page 1987

[1] [1]

Large Language Models as Optimizers

Novel mutation strategy for enhancing shade and lshade algorithms for global numerical opti- mization.Swarm and Evolutionary Computation, 50:100455. Mahdi Mokhtarzadeh, Reza Tavakkoli-Moghaddam, Chefi Triki, and Yaser Rahimi. 2021. A hybrid of clustering and meta-heuristic algorithms to solve a p-mobile hub location–allocation problem with the depreciatio...

work page internal anchor Pith review Pith/arXiv arXiv 2021

[2] [3]

First, read the parent's constraint relaxation strategy and problem information, and understand the current problem's constraint relaxation processing logic

Executable Code ： Do not output anything other than the specified requirements. First, read the parent's constraint relaxation strategy and problem information, and understand the current problem's constraint relaxation processing logic. Second, read the parent's algorithmic concept and executable code, learn and locate code segments related to constraint...

work page

[3] [5]

First, read the parent's algorithmic concept and executable code, and understand the current code design logic

Executable Code ： Do not output anything other than the specified requirements. First, read the parent's algorithmic concept and executable code, and understand the current code design logic. Second, read the current constraint relaxation strategy, and clarify the constraint relaxation strategy that needs to be implemented. Then, design a new constraint r...

work page

[4] [7]

First, read the parent's algorithmic concept and executable code, and understand the current code design logic

Executable Code ： Do not output anything other than the specified requirements. First, read the parent's algorithmic concept and executable code, and understand the current code design logic. Second, read the current constraint relaxation strategy, clarify the constraint relaxation strategy to be implemented and the current implementation approach for con...

work page

[5] [9]

Executable Code ： Do not output anything other than the specified requirements. Parent1 :[Constraint Relaxation Strategy 1]; [Algorithmic Concept1]; [Executable Code 1] Parent2 :[Constraint Relaxation Strategy 2]; [Algorithmic Concept2]; [Executable Code 2] Figure 14: The prompt detail of crossover strategy 1 Crossover Strategy 2: Code Implementation Prom...

work page

[6] [11]

Executable Code ： Do not output anything other than the specified requirements. Parent1 :[Constraint Relaxation Strategy 1]; [Algorithmic Concept1]; [Executable Code 1] Parent2 :[Constraint Relaxation Strategy 2]; [Algorithmic Concept2]; [Executable Code 2] Figure 15: The prompt detail of crossover strategy 2 Crossover Strategy 3: Algorithmic concepts Pro...

work page

[7] [12]

Algorithmic Concept：

work page

[8] [13]

particle

Executable Code ： Do not output anything other than the specified requirements. Parent1 :[Constraint Relaxation Strategy 1]; [Algorithmic Concept1]; [Executable Code 1] Parent2 :[Constraint Relaxation Strategy 2]; [Algorithmic Concept2]; [Executable Code 2] Figure 16: The prompt detail of crossover strategy 3 C Experimental Details C.1 Benchmarks To valid...

work page 1987