Warm-starting active-set solvers using graph neural networks

Daniel Arnstr\"om; Ella J. Schmidtobreick; Jens Sj\"olund; Paul H\"ausner

arxiv: 2511.13174 · v2 · pith:R4ZPAVD2new · submitted 2025-11-17 · 💻 cs.LG

Warm-starting active-set solvers using graph neural networks

Ella J. Schmidtobreick , Daniel Arnstr\"om , Paul H\"ausner , Jens Sj\"olund This is my paper

Pith reviewed 2026-05-21 18:33 UTC · model grok-4.3

classification 💻 cs.LG

keywords quadratic programminggraph neural networksactive-set solverswarm-startinglearning to optimizemodel predictive controlbipartite graphs

0 comments

The pith

Graph neural networks can warm-start active-set quadratic programming solvers by predicting active constraints from bipartite graph representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes using graph neural networks to learn which constraints are active in quadratic programs, represented as bipartite graphs. This prediction serves as a warm-start for the dual active-set solver DAQP, cutting down the number of iterations needed. The approach performs similarly to a standard multilayer perceptron but uniquely generalizes to problem dimensions not encountered during training. This matters for time-sensitive applications like model predictive control where repeated QP solves occur under varying conditions.

Core claim

By encoding quadratic programs as bipartite graphs and applying graph neural network message passing, the method learns to approximate the optimal active set, enabling effective warm-starts that reduce solver iterations across different problem sizes while maintaining generalization to unseen dimensions unlike a multilayer perceptron baseline.

What carries the argument

Bipartite graph representation of the quadratic program with graph neural network message passing to predict the active set for warm-starting the DAQP solver.

If this is right

Reduces the number of iterations in the active-set solver compared to cold-starting for problems of varying sizes.
Performs comparably to a multilayer perceptron but generalizes better to unseen problem dimensions.
Supports scalability in real-time optimization tasks such as model predictive control.
The structure-aware learning exploits the graph structure of QPs for better flexibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar GNN warm-starting could apply to other optimization solvers beyond active-set methods.
Testing on specific control benchmarks like MPC for robotics might reveal practical speedups.
Combining with other learning-to-optimize techniques could further improve performance on high-dimensional problems.

Load-bearing premise

A bipartite graph representation of the quadratic program plus standard graph neural network message passing suffices to accurately predict the active set for warm-starting on problem instances with dimensions never seen in training.

What would settle it

Running the GNN warm-start on a set of quadratic programs with dimensions outside the training range and finding that the number of solver iterations does not decrease relative to cold-starting or that accuracy drops significantly.

Figures

Figures reproduced from arXiv: 2511.13174 by Daniel Arnstr\"om, Ella J. Schmidtobreick, Jens Sj\"olund, Paul H\"ausner.

**Figure 2.** Figure 2: A quadratic program can be represented as a bipartite graph having a set of variable nodes [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of iterations when warm-starting the DAQP solver with predictions from our graph neural network, a standard multilayer perceptron and cold-starting the solver without any learned prediction [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of solve time and prediction time for (a) our graph neural network, (b) a [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of iterations (left) and solve time (right) of cold-starting (purple) and warm [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Comparison of iterations (left) and solve time (right) of cold-starting (purple) and warm [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of solve time (red) and prediction time (purple) for our graph neural network [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Comparison of iterations when warm-starting the DAQP solver with predictions from our graph neural network, a standard multilayer perceptron and cold-starting the solver without any learned prediction, shown on logarithmic axes. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗

read the original abstract

Quadratic programming (QP) solvers are widely used in real-time control and optimization, but their computational cost often limits applicability in time-critical settings. To resolve this, we propose a learning-to-optimize approach using graph neural networks (GNNs) to predict active constraints in the dual active-set solver DAQP. Our method exploits the structural properties of QPs by representing them as bipartite graphs and learns to approximate the optimal active set for effectively warm-starting the solver. Across varying problem sizes, the GNN consistently reduces the number of solver iterations compared to cold-starting, while performance is comparable to a multilayer perceptron baseline. In contrast to the baseline, our GNN-based approach trained on varying problem sizes generalizes to unseen dimensions, demonstrating flexibility and scalability. These results highlight the potential of structure-aware learning to accelerate optimization in real-time applications such as model predictive control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GNN on bipartite QP graphs cuts iterations and generalizes to new sizes better than MLP, but local message passing may limit how far that holds when dimensions grow a lot.

read the letter

The paper trains a GNN on bipartite graphs of quadratic programs to predict active sets and warm-start the DAQP solver. It reports fewer iterations than cold starts across different problem sizes, and unlike an MLP baseline the GNN keeps working on dimensions it never saw in training. That combination is the concrete advance here: a structure-aware predictor tied to one specific active-set method plus explicit cross-size tests for control-type QPs. The experiments apparently deliver usable iteration reductions, which matters for real-time MPC where every iteration saved helps on embedded hardware. Credit to the authors for moving past fixed-size training and showing the practical difference in generalization behavior. The pipeline is straightforward and the baseline comparison is fair enough to make the claim readable. The soft spot is the one the stress-test note flags. Standard GNN layers aggregate locally, yet the active set in a QP can depend on couplings across the entire KKT system. If the test problems only stretch from modest training sizes to slightly larger ones, the reported generalization could be an artifact of the sampling rather than true scale invariance. The paper would be stronger with explicit numbers on the size jumps tested, error bars across runs, and perhaps a check with deeper or global-attention variants to see whether receptive field is the bottleneck. This is aimed at researchers who already work on learning-to-optimize for embedded control or who need fast QP solves in varying dimensions. A reader who wants empirical evidence that graph structure helps warm-starting more than a plain network will get something out of the results. The work is coherent on its own terms and ships a usable pipeline, so it deserves a serious referee even if the generalization story needs tighter bounds in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes using graph neural networks (GNNs) to predict active sets for warm-starting the dual active-set QP solver DAQP. Quadratic programs are represented as bipartite graphs between variables and constraints; standard GNN message passing learns an approximation to the optimal active set that is then used to initialize the solver. The central empirical claims are that the GNN reduces solver iterations relative to cold-start across varying problem sizes, performs comparably to an MLP baseline, and—unlike the MLP—generalizes to problem dimensions never seen during training.

Significance. If the generalization results hold under rigorous testing, the work demonstrates a practical advantage of structure-aware GNNs over fixed-size MLPs for variable-dimension optimization problems, with potential utility in real-time model predictive control. The approach is empirical and data-driven; credit is due for directly measuring iteration reductions against cold-start and MLP baselines rather than relying on algebraic forcing. However, the absence of reported quantitative tables, error bars, or statistical tests in the provided abstract limits immediate assessment of effect sizes and robustness.

major comments (2)

[Abstract and Experiments] Abstract and Experiments section: The central claim that the GNN 'consistently reduces the number of solver iterations' and 'generalizes to unseen dimensions' while the MLP does not is load-bearing for the contribution. No quantitative tables, mean iteration counts, reduction percentages, standard deviations, or details on the exact training/test size ranges (e.g., training on n ≤ 20 vs. testing on n = 50) are referenced, making it impossible to judge whether the reported generalization reflects architectural scale invariance or modest extrapolation within a narrow sampling distribution.
[Method] Method section (bipartite graph + GNN message passing): The weakest assumption—that fixed-depth standard message passing on the bipartite QP graph suffices to learn globally consistent active-set predictions that remain accurate on much larger unseen graphs—is not directly tested. Because the KKT system couples all variables and constraints, long-range dependencies may not propagate when graph size increases; the MLP baseline comparison does not isolate this issue since the MLP lacks a variable-size input representation altogether. Concrete evidence (e.g., ablation on message-passing depth or receptive-field analysis) is needed to support the generalization claim.

minor comments (2)

[Abstract] Abstract: Consider briefly defining the bipartite-graph construction (variable nodes, constraint nodes, edge features) to improve accessibility for readers unfamiliar with QP-to-graph encodings.
[Notation] Notation: Ensure consistent use of symbols for problem dimension n, number of constraints m, and active-set size throughout; any mismatch between text and figures should be corrected.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We appreciate the referee's careful reading and the opportunity to address the concerns. We respond point-by-point to the major comments below and will revise the manuscript to improve clarity and provide additional supporting analysis.

read point-by-point responses

Referee: [Abstract and Experiments] Abstract and Experiments section: The central claim that the GNN 'consistently reduces the number of solver iterations' and 'generalizes to unseen dimensions' while the MLP does not is load-bearing for the contribution. No quantitative tables, mean iteration counts, reduction percentages, standard deviations, or details on the exact training/test size ranges (e.g., training on n ≤ 20 vs. testing on n = 50) are referenced, making it impossible to judge whether the reported generalization reflects architectural scale invariance or modest extrapolation within a narrow sampling distribution.

Authors: We agree that the abstract and experiments would benefit from explicit quantitative reporting. In the revised manuscript we will add a results table that reports mean iteration counts, percentage reductions relative to cold-start, standard deviations, and the precise training/test size ranges used (training on problems with n ≤ 20 variables/constraints and testing on n = 50). We will also insert a concise quantitative statement into the abstract. These additions will make the effect sizes and generalization behavior directly verifiable. revision: yes
Referee: [Method] Method section (bipartite graph + GNN message passing): The weakest assumption—that fixed-depth standard message passing on the bipartite QP graph suffices to learn globally consistent active-set predictions that remain accurate on much larger unseen graphs—is not directly tested. Because the KKT system couples all variables and constraints, long-range dependencies may not propagate when graph size increases; the MLP baseline comparison does not isolate this issue since the MLP lacks a variable-size input representation altogether. Concrete evidence (e.g., ablation on message-passing depth or receptive-field analysis) is needed to support the generalization claim.

Authors: We acknowledge that long-range dependency propagation is a legitimate concern for fixed-depth message passing on growing bipartite graphs. Our existing experiments already demonstrate that the GNN trained on smaller instances generalizes to larger unseen dimensions while the MLP does not, which indirectly supports that the learned message-passing captures the necessary couplings. Nevertheless, to directly test the assumption we will add an ablation on message-passing depth together with a short receptive-field analysis in the revised Method and Experiments sections. This will provide the concrete evidence requested. revision: partial

Circularity Check

0 steps flagged

No significant circularity in the empirical GNN-based warm-starting method

full rationale

The paper describes an empirical learning-to-optimize pipeline: QPs are encoded as bipartite graphs, a GNN is trained to predict active sets, and the resulting warm-starts are evaluated by measuring iteration counts against cold-start and MLP baselines on both in-distribution and out-of-distribution problem sizes. No algebraic derivation, uniqueness theorem, or first-principles claim is presented that reduces to its own inputs by construction. Generalization to unseen dimensions is asserted via experimental results rather than via any self-referential definition or fitted-parameter renaming. The central performance claims therefore remain independent of the training procedure itself.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard supervised training of a GNN whose weights are free parameters fitted to a (unspecified) collection of QP instances; no new mathematical axioms or invented physical entities are introduced beyond the conventional assumption that message-passing on the bipartite graph captures active-set structure.

free parameters (1)

GNN weights and hyperparameters
Network parameters learned from training data on QP instances of varying sizes.

pith-pipeline@v0.9.0 · 5689 in / 1273 out tokens · 56227 ms · 2026-05-21T18:33:46.851167+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · 2 internal anchors

[1]

Relational inductive biases, deep learning, and graph networks

arXiv:1806.01261. Francesco Borrelli, Alberto Bemporad, and Manfred Morari.Predictive control for linear and hybrid systems. Cambridge University Press,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

arXiv:2104.13478. Quentin Cappart, Didier Ch ´etelat, Elias B. Khalil, Andrea Lodi, Christopher Morris, and Petar Veliˇckovi´c. Combinatorial Optimization and Reasoning with Graph Neural Networks.Journal of Machine Learning Research, 24(130):1–61,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Henri Doerks, Paul H ¨ausner, Daniel Hern´andez Escobar, and Jens Sj ¨olund

arXiv:2406.05938. Henri Doerks, Paul H ¨ausner, Daniel Hern´andez Escobar, and Jens Sj ¨olund. Learning to accelerate distributed ADMM using graph neural networks.arXiv preprint arXiv:2509.05288,

work page arXiv
[4]

Rajiv Sambharya and Bartolomeo Stellato

arXiv:1911.07979. Rajiv Sambharya and Bartolomeo Stellato. Learning algorithm hyperparameters for fast parametric convex optimization.arXiv preprint arXiv:2411.15717,

work page arXiv 1911
[5]

Vrushabh Zinage, Ahmed Khalil, and Efstathios Bakolas

arXiv:2202.00264. Vrushabh Zinage, Ahmed Khalil, and Efstathios Bakolas. TransformerMPC: Accelerating Model Predictive Control via Transformers,

work page arXiv
[6]

13 Appendix A

arXiv:2409.09266. 13 Appendix A. Algorithm of theDAQPsolver Algorithm 1Dual active-set method for solving a dual QP (Arnstr ¨om et al.,

work page arXiv

[1] [1]

Relational inductive biases, deep learning, and graph networks

arXiv:1806.01261. Francesco Borrelli, Alberto Bemporad, and Manfred Morari.Predictive control for linear and hybrid systems. Cambridge University Press,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

arXiv:2104.13478. Quentin Cappart, Didier Ch ´etelat, Elias B. Khalil, Andrea Lodi, Christopher Morris, and Petar Veliˇckovi´c. Combinatorial Optimization and Reasoning with Graph Neural Networks.Journal of Machine Learning Research, 24(130):1–61,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Henri Doerks, Paul H ¨ausner, Daniel Hern´andez Escobar, and Jens Sj ¨olund

arXiv:2406.05938. Henri Doerks, Paul H ¨ausner, Daniel Hern´andez Escobar, and Jens Sj ¨olund. Learning to accelerate distributed ADMM using graph neural networks.arXiv preprint arXiv:2509.05288,

work page arXiv

[4] [4]

Rajiv Sambharya and Bartolomeo Stellato

arXiv:1911.07979. Rajiv Sambharya and Bartolomeo Stellato. Learning algorithm hyperparameters for fast parametric convex optimization.arXiv preprint arXiv:2411.15717,

work page arXiv 1911

[5] [5]

Vrushabh Zinage, Ahmed Khalil, and Efstathios Bakolas

arXiv:2202.00264. Vrushabh Zinage, Ahmed Khalil, and Efstathios Bakolas. TransformerMPC: Accelerating Model Predictive Control via Transformers,

work page arXiv

[6] [6]

13 Appendix A

arXiv:2409.09266. 13 Appendix A. Algorithm of theDAQPsolver Algorithm 1Dual active-set method for solving a dual QP (Arnstr ¨om et al.,

work page arXiv