A Hybrid Learning-to-Optimize Framework for Mixed-Integer Quadratic Programming

Mu Xie; Rahul Mangharam; Viet-Anh Le

arxiv: 2511.19383 · v2 · pith:CDOGN3FAnew · submitted 2025-11-24 · 📡 eess.SY · cs.SY

Viet-Anh Le, Mu Xie, Rahul Mangharam

Pith reviewed 2026-05-17 05:03 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords learning to optimize · mixed-integer quadratic programming · model predictive control · differentiable optimization · hybrid supervised self-supervised loss · neural networks for integer decisions · parametric optimization

The pith

A neural network predicts integer variables in parametric mixed-integer quadratic programs while a differentiable QP layer solves for the continuous part using a hybrid loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a hybrid learning-to-optimize method for parametric mixed-integer quadratic programming problems that arise in mixed-integer model predictive control. A neural network is trained to map problem parameters directly to integer decisions, after which a differentiable quadratic programming layer computes the matching continuous variables. Training uses a hybrid loss that adds a supervised term matching known global optima to a self-supervised term enforcing the original objective and constraints. The resulting framework is evaluated on two standard MI-MPC benchmark problems against purely supervised and purely self-supervised baselines. If the approach holds, repeated online optimizations become faster while retaining near-optimality and feasibility for new parameter values.

Core claim

The framework learns a neural network to predict optimal integer solutions from problem parameters and integrates a differentiable QP layer to solve for continuous variables given those integers, trained with a hybrid loss that includes supervised terms for global optimality and self-supervised terms derived from the objective and constraints to ensure feasibility and performance on unseen instances.

What carries the argument

The hybrid loss function combining supervised loss with respect to the global optimal solution and self-supervised loss derived from the problem objective and constraints, together with the differentiable QP layer that computes exact continuous solutions once integers are fixed.
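To make the structure of such a loss concrete, here is a minimal sketch for a single binary decision. The function names, the weights `w_sup` and `w_ssl`, the penalty factor `rho`, and the use of binary cross-entropy are illustrative assumptions; the paper's exact loss terms are not given in the text above.

```python
import math

def bce(p, target, eps=1e-12):
    """Binary cross-entropy between a predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def hybrid_loss(p, delta_star, objective, violation,
                w_sup=1.0, w_ssl=1.0, rho=10.0):
    """Weighted sum of a supervised and a self-supervised term (assumed form).

    p          : predicted probability that the binary variable equals 1
    delta_star : label taken from an exact MIQP solve (0 or 1)
    objective  : problem objective evaluated at the predicted solution
    violation  : nonnegative total constraint violation at that solution
    """
    supervised = bce(p, delta_star)                  # pulls toward the known optimum
    self_supervised = objective + rho * violation    # rewards low cost, penalizes infeasibility
    return w_sup * supervised + w_ssl * self_supervised

# Same prediction and objective value, with and without a constraint violation.
loss_good = hybrid_loss(p=0.9, delta_star=1, objective=0.5, violation=0.0)
loss_bad = hybrid_loss(p=0.9, delta_star=1, objective=0.5, violation=0.3)
```

In this sketch the two calls differ only in the violation term, so the gap between `loss_bad` and `loss_good` is `rho * 0.3`; the supervised term is unaffected.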

If this is right

Online solution times for repeated parametric MIQP instances drop because the expensive integer search is replaced by a forward pass through the network.
Feasibility of the returned solutions improves relative to purely supervised models because the self-supervised term penalizes constraint violations during training.
Optimality gaps stay small on the tested MI-MPC benchmarks because the supervised term pulls predictions toward known global optima.
The overall pipeline remains differentiable end-to-end, allowing gradient-based training even though the original MIQP is combinatorial.
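The first consequence can be illustrated on a toy problem. The sketch below assumes a hypothetical one-dimensional parametric MIQP, min 0.5(x - θ)² + 0.5δ subject to 0 ≤ x ≤ 2δ with δ ∈ {0, 1}, and uses a hand-written threshold rule as a stand-in for the trained network; none of these names or numbers come from the paper.

```python
def solve_qp_given_delta(theta, delta, big_m=2.0):
    """Continuous subproblem: min 0.5*(x - theta)^2 s.t. 0 <= x <= big_m*delta."""
    return min(max(theta, 0.0), big_m * delta)

def cost(theta, x, delta, switch_cost=0.5):
    """Objective of the toy MIQP (tracking error plus a cost for delta = 1)."""
    return 0.5 * (x - theta) ** 2 + switch_cost * delta

def exact_miqp(theta):
    """Reference solve by enumerating delta; only viable for tiny problems."""
    candidates = []
    for delta in (0, 1):
        x = solve_qp_given_delta(theta, delta)
        candidates.append((cost(theta, x, delta), delta, x))
    return min(candidates)

def predict_delta(theta):
    """Hypothetical stand-in for the network's rounded integer prediction."""
    return 1 if theta > 1.0 else 0

def fast_solve(theta):
    """One prediction plus one continuous solve, with no integer search."""
    delta = predict_delta(theta)
    x = solve_qp_given_delta(theta, delta)
    return cost(theta, x, delta), delta, x
```

For this toy problem the threshold rule happens to reproduce the exact integer decision, so `fast_solve` and `exact_miqp` agree. With a learned predictor that agreement is an empirical question, which is what the benchmarks are meant to measure.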

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same architecture could be applied to other parametric mixed-integer programs whose continuous relaxations are convex and efficiently solvable once integers are fixed.
Performance may degrade when the distribution of runtime parameters drifts far from the training distribution, suggesting periodic retraining or domain randomization as safeguards.
Scaling the network size or adding explicit feasibility layers might further tighten the gap to exact solvers on larger problem dimensions.
The method naturally supplies a warm-start integer guess that could accelerate exact branch-and-bound solvers when higher precision is required.

Load-bearing premise

A neural network trained on a finite collection of problem instances will still produce integer predictions whose corresponding QP solutions remain near-optimal and feasible when the parameters change at runtime.

What would settle it

Evaluate the trained model on a fresh collection of parameter values never seen during training and measure whether the resulting solutions violate any constraints or have objective values substantially higher than those returned by an exact MIQP solver.
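Such a check could be scored along the following lines. This is a sketch under assumptions: the record keys, the tolerance, and the choice to average the relative gap only over feasible instances are illustrative, and the sample numbers are made up for demonstration rather than taken from the paper.

```python
def evaluate(records, tol=1e-6):
    """Summarize held-out performance.

    records : list of dicts with keys
        'violation'  - total constraint violation of the learned solution
        'cost'       - objective value of the learned solution
        'cost_exact' - objective value from an exact MIQP solver
    """
    n = len(records)
    violated = sum(1 for r in records if r["violation"] > tol)
    # Relative optimality gap, computed on feasible instances only.
    gaps = [
        (r["cost"] - r["cost_exact"]) / max(abs(r["cost_exact"]), tol)
        for r in records
        if r["violation"] <= tol
    ]
    return {
        "violation_rate": violated / n,
        "mean_gap": sum(gaps) / len(gaps) if gaps else float("nan"),
    }

# Made-up records for illustration only.
sample = [
    {"violation": 0.0, "cost": 1.1, "cost_exact": 1.0},
    {"violation": 0.0, "cost": 2.0, "cost_exact": 2.0},
    {"violation": 0.0, "cost": 3.3, "cost_exact": 3.0},
    {"violation": 0.4, "cost": 0.9, "cost_exact": 1.0},
]
report = evaluate(sample)
```

Excluding infeasible instances from the gap avoids crediting a solution that beats the exact optimum only by violating constraints, as the last sample record does.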

Figures

Figures reproduced from arXiv: 2511.19383 by Mu Xie, Rahul Mangharam, Viet-Anh Le.

**Figure 1.** Architecture of the proposed hybrid framework (c) compared with supervised learning (a) and self-supervised learning (b). In our framework, the NN takes the problem parameters θ to predict the integer solution δ, while the QP layer computes the continuous solution x based on θ and δ. In conventional SL and SSL, the NN is trained to predict the integer solution without considering the continuous solution or…

**Figure 2.** Statistical comparison of the three models: hybrid L2O (H-L2O), supervised learning (SL), and self-supervised learning (SSL), for the robot navigation example. Recoverable panel text: Constraint violation (Integer), violation rate SL: 5.7%, SSL: 0.0%, H-L2O: 1.1%; Constraint violation (Continuous), violation rate SL: 4.8%, SSL: 11.0%, H-L2O: 5.6%; …

**Figure 3.** Statistical comparison of the three models: hybrid L2O (H-L2O), supervised learning (SL), and self-supervised learning (SSL), for the thermal energy tank example.

… each example, a multilayer perceptron network is constructed with four hidden layers, 128 neurons per layer, and ReLU activation functions. Our implementation and examples are available at https://github.com/mlab-upenn/L2O-MIQP. We compare the propo…

Original abstract

In this paper, we propose a learning-to-optimize (L2O) framework to accelerate solving parametric mixed-integer quadratic programming (MIQP) problems, with a particular focus on mixed-integer model predictive control (MI-MPC) applications. The framework learns to predict integer solutions with enhanced optimality and feasibility by integrating supervised learning (for optimality), self-supervised learning (for feasibility), and a differentiable quadratic programming (QP) layer, resulting in a hybrid L2O framework. Specifically, a neural network (NN) is used to learn the mapping from problem parameters to optimal integer solutions, while a differentiable QP layer is integrated to compute the corresponding continuous variables given the predicted integers and problem parameters. Moreover, a hybrid loss function is proposed, which combines a supervised loss with respect to the global optimal solution, and a self-supervised loss derived from the problem's objective and constraints. The effectiveness of the proposed framework is demonstrated on two benchmark MI-MPC problems, with comparative results against purely supervised and self-supervised learning models.
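For orientation, the problem class and the two-stage prediction described in the abstract can be written in a generic form. The symbols θ, δ, and x follow the Figure 1 caption; the matrices Q and A, the vectors q and b, the dimensions n and m, and the network name π_w are illustrative assumptions rather than the paper's notation.

```latex
% Generic parametric MIQP (assumed standard form)
\begin{aligned}
\min_{x \in \mathbb{R}^{n},\ \delta \in \mathbb{Z}^{m}} \quad
  & \tfrac{1}{2}\, z^{\top} Q\, z + q(\theta)^{\top} z,
  \qquad z = \begin{bmatrix} x \\ \delta \end{bmatrix}, \\
\text{s.t.} \quad & A\, z \le b(\theta).
\end{aligned}

% Two-stage prediction: network for integers, QP layer for continuous variables
\hat{\delta} = \pi_{w}(\theta), \qquad
\hat{x} = \arg\min_{x \in \mathbb{R}^{n}}
  \left\{ \tfrac{1}{2}\, z^{\top} Q\, z + q(\theta)^{\top} z
  \;:\; z = \begin{bmatrix} x \\ \hat{\delta} \end{bmatrix},\ A\, z \le b(\theta) \right\}.
```

Once the integer block is fixed, the remaining problem in x is a QP, which is convex when the block of Q acting on x is positive semidefinite.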

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a hybrid NN-plus-differentiable-QP setup with mixed supervised and self-supervised losses for parametric MIQP, but the supporting evidence stays thin and the generalization claim rests on untested assumptions.

The main takeaway is a hybrid learning-to-optimize method for parametric mixed-integer quadratic programs, focused on mixed-integer model predictive control. A neural network predicts the integer variables from the problem parameters, a differentiable QP layer then solves for the continuous variables, and training uses a combined loss: one term supervised on known global optima and another self-supervised term pulled from the objective and constraints. The abstract positions this three-part construction as the new element and reports comparisons against pure supervised and pure self-supervised baselines on two benchmark MI-MPC problems.

That integration is the clearest contribution; it tries to get both optimality signals and feasibility signals into the same training loop without forcing the network to learn everything from scratch. The benchmarks and the direct comparison to the two single-loss variants are reasonable first steps for showing where the hybrid sits.

The soft spots are straightforward. No numerical results, ablation numbers, or out-of-distribution tests appear in the available description, so it is impossible to judge whether the hybrid loss actually improves solution quality or just matches the supervised case. The central assumption, that a network trained on a finite set of instances will keep producing integer predictions whose downstream QP solutions stay near-optimal and feasible on unseen parameters, carries the load, and nothing in the abstract supplies a bound, recovery mechanism, or empirical check on that point. If the integer guesses drift, the framework has no obvious fallback.

This is the sort of paper that control and optimization researchers working on fast MI-MPC solvers would want to see. A reader looking for concrete ideas on blending learned predictors with embedded differentiable layers could extract useful architecture details even before stronger numbers arrive. It deserves a serious referee. The motivation and the hybrid construction are clear enough that referees can usefully ask for the missing metrics, generalization experiments, and loss-weighting details rather than starting from scratch.

Referee Report

2 major / 2 minor

Summary. The paper proposes a hybrid learning-to-optimize framework for parametric mixed-integer quadratic programming (MIQP), focused on mixed-integer model predictive control (MI-MPC). A neural network predicts integer solutions from problem parameters; a differentiable QP layer then computes the corresponding continuous variables. Training uses a hybrid loss that combines a supervised term on global optima with a self-supervised term derived from the problem objective and constraints. Effectiveness is shown via comparative experiments on two benchmark MI-MPC problems against purely supervised and self-supervised baselines.

Significance. If the empirical results hold, the framework offers a practical route to accelerate MIQP solves in real-time control by learning integer predictions while enforcing feasibility and near-optimality through the embedded differentiable QP layer. The hybrid loss design and end-to-end differentiability are technically attractive strengths that could generalize beyond the two benchmarks if out-of-distribution behavior is demonstrated.

major comments (2)

[§4, Experiments] The reported comparisons on the two MI-MPC benchmarks supply no quantitative metrics (optimality gap, feasibility rate, or solve-time reduction) for parameter values outside the training distribution, leaving the central generalization claim unverified.
[§3.2, Hybrid Loss] The weighting coefficients between the supervised and self-supervised terms are treated as free hyperparameters with no sensitivity analysis or justification for the chosen balance; this directly affects whether the feasibility signal remains independent of the fitted network as asserted.

minor comments (2)

[§3] Notation for the integer prediction mapping and the subsequent QP layer should be introduced with explicit variable definitions before the loss is defined.
[§4] Figure captions for the benchmark results should state the exact number of test instances and whether they are drawn from the same distribution as training data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address the major comments point by point below, proposing revisions to enhance the clarity and completeness of our work regarding generalization and the hybrid loss formulation.

Referee: §4 (Experiments): the reported comparisons on the two MI-MPC benchmarks supply no quantitative metrics (optimality gap, feasibility rate, or solve-time reduction) for parameter values outside the training distribution, leaving the central generalization claim unverified.

Authors: We agree that explicit evaluation on out-of-distribution (OOD) parameters is important to substantiate the generalization claims. While the benchmarks include parametric variations, they may not fully cover OOD cases. In the revised manuscript, we will include additional experiments reporting optimality gap, feasibility rate, and solve-time reduction for parameter values outside the training distribution, such as scaled or extrapolated problem parameters. This will provide quantitative evidence for the framework's robustness beyond the training set.

Revision: yes

Referee: §3.2 (Hybrid Loss): the weighting coefficients between the supervised and self-supervised terms are treated as free hyperparameters with no sensitivity analysis or justification for the chosen balance; this directly affects whether the feasibility signal remains independent of the fitted network as asserted.

Authors: The coefficients were chosen through empirical tuning to ensure effective training where the self-supervised feasibility loss provides a signal independent of the network's predictions on integers. To address the concern, we will add a sensitivity analysis in the revised version, varying the weights and showing their impact on performance metrics and confirming that the feasibility term remains effective and largely independent. This will justify the chosen balance and strengthen the assertion.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in hybrid L2O MIQP framework

The paper's core derivation uses a neural network to map parameters to integer solutions, followed by a differentiable QP layer to recover continuous variables, with training via a hybrid loss. The supervised branch targets global optima while the self-supervised branch is explicitly constructed from the problem objective and constraints; this provides an external feasibility/optimality signal independent of the fitted network weights. No equation or step reduces a prediction to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no load-bearing self-citation or uniqueness theorem is invoked. The architecture remains self-contained against the benchmark MI-MPC instances and external QP solves.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Because only the abstract is available, the precise neural-network architecture, loss-weighting coefficients, and training-data generation procedure remain unspecified; the framework implicitly relies on standard assumptions that a differentiable QP layer exists and that the integer-to-continuous mapping is well-defined.

free parameters (2)

neural-network weights and biases
Learned parameters that map problem parameters to integer decisions; their values are fitted during training and not reported in the abstract.
loss weighting coefficients between supervised and self-supervised terms
Hand-chosen or tuned scalars that balance the two loss components; not specified in the abstract.

axioms (1)

Domain assumption: a differentiable quadratic-programming layer can be embedded inside the training graph and back-propagated through.
Invoked when the framework integrates the QP layer after the integer prediction; standard in differentiable optimization literature but treated as given.
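What back-propagating through a QP layer means can be seen on a scalar example. The sketch below assumes a hypothetical toy layer, min 0.5(x - θ)² subject to 0 ≤ x ≤ Mδ with a relaxed δ ∈ [0, 1], whose solution is a clip; its derivative with respect to δ is checked against a finite difference. General-purpose layers, such as those in the differentiable-optimization works listed in the reference section, obtain the same kind of gradient by differentiating the optimality conditions instead of using a closed form.

```python
def qp_layer(theta, delta, big_m=2.0):
    """Solution of min 0.5*(x - theta)^2 s.t. 0 <= x <= big_m*delta."""
    return min(max(theta, 0.0), big_m * delta)

def qp_layer_grad_delta(theta, delta, big_m=2.0):
    """Derivative of the solution w.r.t. the relaxed integer input delta.

    Nonzero only when the upper bound is active; the kink where the bound
    just becomes active is non-differentiable and is treated as inactive here.
    """
    upper_active = max(theta, 0.0) > big_m * delta
    return big_m if upper_active else 0.0

def finite_diff(theta, delta, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (qp_layer(theta, delta + h) - qp_layer(theta, delta - h)) / (2.0 * h)

# Upper bound active: the solution moves with delta.
g_active = qp_layer_grad_delta(1.5, 0.5)
# Interior solution: delta has no local effect on x.
g_interior = qp_layer_grad_delta(0.5, 0.5)
```

When the bound is inactive the gradient with respect to δ vanishes, so in this toy setting a training signal reaches the integer predictor through the QP layer only on instances where the integer decision actually constrains the continuous solution.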

pith-pipeline@v0.9.0 · 5478 in / 1384 out tokens · 38972 ms · 2026-05-17T05:03:37.746958+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear


hybrid loss function... combines a supervised loss with respect to the global optimal solution, and a self-supervised loss derived from the problem's objective and constraints

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 1 internal anchor

[1] Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and J Zico Kolter. Differentiable convex optimization layers. Advances in Neural Information Processing Systems, 32, 2019.
[2] Brandon Amos and J Zico Kolter. OptNet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning, pages 136-145. PMLR, 2017.
[3] Calin Belta and Sadra Sadraddini. Formal methods for control synthesis: An optimization perspective. Annual Review of Control, Robotics, and Autonomous Systems, 2(1):115-140, 2019.
[4] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
[5] Dimitri P Bertsekas. Constrained optimization and Lagrange multiplier methods. Academic Press, 2014.
[6] Ján Boldocký, Shahriar Dadras Javan, Martin Gulan, Martin Mönnigmann, and Ján Drgoňa. Learning to solve parametric mixed-integer optimal control problems via differentiable predictive control. arXiv preprint arXiv:2506.19646, 2025.
[7] Andrea Camisa, Andrea Testa, and Giuseppe Notarstefano. Multi-robot pickup and delivery via distributed resource allocation. IEEE Transactions on Robotics, 39(2):1106-1118, 2022.
[8] Abhishek Cauligi, Preston Culbertson, Edward Schmerling, Mac Schwager, Bartolomeo Stellato, and Marco Pavone. CoCo: Online mixed-integer control via supervised learning. IEEE Robotics and Automation Letters, 7(2):1447-1454, 2021.
[9] Abhishek Cauligi, Ankush Chakrabarty, Stefano Di Cairano, and Rien Quirynen. PRISM: Recurrent neural networks and presolve methods for fast mixed-integer optimal control. In Learning for Dynamics and Control Conference, pages 34-46. PMLR, 2022.
[10] Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2021. URL http://www.gurobi.com
[11] Viet-Anh Le and Andreas A Malikopoulos. Distributed Optimization for Traffic Light Control and Connected Automated Vehicle Coordination in Mixed-Traffic Intersections. IEEE Control Systems Letters, 8:2721-2726, 2024.
[12] Viet-Anh Le, Panagiotis Kounatidis, and Andreas A. Malikopoulos. Combining Graph Attention Networks and Distributed Optimization for Multi-Robot Mixed-Integer Convex Programming. In 2025 64th IEEE Conference on Decision and Control, 2025.
[13] Rien Quirynen, Sleiman Safaoui, and Stefano Di Cairano. Real-time mixed-integer quadratic programming for vehicle decision-making and motion planning. IEEE Transactions on Control Systems Technology, 2024.
[14] João Salvado, Robert Krug, Masoumeh Mansouri, and Federico Pecora. Motion planning and goal assignment for robot fleets using trajectory optimization. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7939-7946. IEEE, 2018.
[15] Bo Tang, Elias B Khalil, and Ján Drgoňa. Learning to optimize for mixed-integer non-linear programming. arXiv preprint arXiv:2410.11061, 2024.
