pith. machine review for the scientific record. sign in

arxiv: 2605.07113 · v1 · submitted 2026-05-08 · 💻 cs.LG · math.OC

Recognition: no theorem link

Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks

Authors on Pith no claims yet

Pith reviewed 2026-05-11 00:50 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords Max-CutGraph Neural NetworksSemidefinite ProgrammingBranch-and-BoundCombinatorial OptimizationSelf-supervised LearningExact Algorithms
0
0 comments X

The pith

A graph neural network can predict always-valid primal and dual SDP solutions for Max-Cut and replace expensive solvers inside branch-and-bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a graph neural network tailored to Max-Cut that outputs both primal and dual feasible solutions to the semidefinite programming relaxation used for bounding. The network is trained in a self-supervised way so that no solved SDP instances are required as labels, and its update rules are constructed to guarantee feasibility by design. When inserted into a branch-and-bound solver, the network replaces repeated calls to a commercial SDP solver, cutting the time spent on bounding by as much as 10.6 times on the tested instances. A reader cares because many practical problems are instances of Max-Cut and the main barrier to exact global solutions has been the cost of solving the convex relaxations at every node of the search tree.

Core claim

We propose a Max-Cut-specific graph neural network that serves as a principled, lightweight neural proxy for SDP solvers and can be plugged directly into an exact branch-and-bound framework. The proposed architecture has update steps of complexity O(n² + ne) and predicts both primal- and dual-feasible SDP solutions. The primal SDP solutions yield feasible Max-Cut solutions via the Goemans-Williamson algorithm. In addition, it is trained in a self-supervised fashion without requiring solved SDP relaxations as labels. Empirically, we show that our architecture can substantially reduce the cost of bounding in exact Max-Cut solving by up to 10.6 × compared with using the state-of-the-art SDP sol

What carries the argument

Max-Cut-specific graph neural network whose message-passing steps are constructed to enforce primal and dual feasibility of the predicted SDP solutions at every iteration.

If this is right

  • The GNN can be inserted directly into existing branch-and-bound codes for exact Max-Cut without changing the search logic.
  • Primal outputs immediately produce feasible cuts through the Goemans-Williamson rounding procedure.
  • Self-supervised training removes the need to pre-solve SDP relaxations on the subproblems that arise during search.
  • Per-iteration cost of O(n² + ne) keeps the method practical for graphs of moderate size inside the search tree.
  • The reported 10.6× reduction in bounding time is measured against the commercial solver Mosek on the same instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feasibility-preserving construction could be tried on other combinatorial problems whose branch-and-bound routines rely on SDP relaxations.
  • Because the model never requires solved SDP labels, it could be fine-tuned online on the subproblems encountered in a single long run.
  • If the learned bounds remain tight across different graph distributions, the method would reduce dependence on commercial convex solvers for structured problems.
  • A natural next measurement is whether the total number of branch-and-bound nodes explored changes when the learned bounds replace the exact SDP bounds.

Load-bearing premise

The self-supervised GNN will produce sufficiently tight and always-valid primal/dual SDP solutions on the unseen subproblems generated during branch-and-bound search, without any labeled SDP data from those subproblems.

What would settle it

Run the same branch-and-bound solver on a collection of standard Max-Cut benchmark graphs once with the GNN bounding routine and once with Mosek; if the GNN version explores more nodes, returns a worse objective, or takes longer total time, the claim does not hold.

Figures

Figures reproduced from arXiv: 2605.07113 by Andrea Lodi, Can Li, Chendi Qian, Christopher Morris, Hao Chen.

Figure 1
Figure 1. Figure 1: Overview of our neural SDP surrogate for Max-Cut. (1) Data Generation: Subgraphs are generated dynamically from random branch-and-bound search trees. (2) GNN Prediction: MC-MPNN predicts unconstrained primal and dual features. (3) Feasibility Guarantee: A radial projection shifts the yˆ and Sˆ to the feasible region, while the primal matrix is factorized by construc￾tion. The resulting valid bounds are use… view at source ↗
Figure 2
Figure 2. Figure 2: Effect of normalization layers on objective gaps. The absolute (left) and relative (right) [PITH_FULL_IMAGE:figures/full_fig_p025_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance of MC-MPNN and δ-MC-MPNN under different normalizations. Absolute (top) and relative (bottom) dual objective gaps evaluated on the test set [PITH_FULL_IMAGE:figures/full_fig_p026_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Effect of normalization layers and layer variants on objective gaps, trained within the [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Effect of training distribution on objective gaps, with [PITH_FULL_IMAGE:figures/full_fig_p027_5.png] view at source ↗
read the original abstract

Exact solution of hard combinatorial optimization problems often relies on strong convex relaxations, but solving these relaxations repeatedly inside a branch-and-bound algorithm can be prohibitively expensive. Hence, we consider this challenge for Max-Cut, where branch and bound commonly uses semidefinite programming (SDP) relaxations to bound subproblems. We propose a Max-Cut-specific graph neural network that serves as a principled, lightweight neural proxy for these SDP solvers and can be plugged directly into an exact branch-and-bound framework. The proposed architecture has update steps of complexity $\mathcal{O}(n^2 + ne)$, and predicts both primal- and dual-feasible SDP solutions. The primal SDP solutions yield feasible Max-Cut solutions via the Goemans--Williamson algorithm. In addition, it is trained in a self-supervised fashion without requiring solved SDP relaxations as labels. Empirically, we show that our architecture can substantially reduce the cost of bounding in exact Max-Cut solving by up to $10.6 \times$ compared with using the state-of-the-art SDP solver Mosek. Our work highlights the potential of learned, validity-preserving surrogates for accelerating exact optimization over structured convex relaxations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Max-Cut-specific graph neural network that acts as a lightweight, feasibility-preserving surrogate for SDP relaxations. The architecture uses O(n² + ne) message-passing layers to output both primal- and dual-feasible solutions, is trained self-supervised without labeled SDP data, and is intended to be plugged into branch-and-bound to accelerate bounding. Empirically, it reports up to 10.6× reduction in bounding cost versus Mosek on Max-Cut instances.

Significance. If the GNN produces bounds of comparable tightness to Mosek on the modified subproblems arising inside branch-and-bound, the approach could meaningfully accelerate exact Max-Cut solvers and serve as a template for learned surrogates on other structured convex relaxations. The self-supervised training regime and explicit primal/dual feasibility projection are concrete strengths that avoid the usual labeling bottleneck and validity issues common in ML-for-optimization work.

major comments (2)
  1. [§5] §5 (experimental results): The headline claim is a 10.6× reduction in bounding cost, yet the manuscript reports neither the number of branch-and-bound nodes explored nor the total wall-clock time to optimality when the GNN replaces Mosek. Without these quantities it is impossible to determine whether any loss in bound tightness on dynamically generated subproblems (fixed variables, reweighted edges) increases tree size enough to offset the per-call speedup.
  2. [§4.2] §4.2 (architecture and training): The feasibility projection is claimed to guarantee valid primal/dual SDP solutions on unseen subproblems, but no quantitative comparison of duality gaps or objective values versus Mosek is provided on the specific subproblem distribution encountered during search. If average gap widens even modestly, the central empirical claim for exact solving does not follow.
minor comments (2)
  1. [§3] The notation 'e' for number of edges is used in the complexity statement but never defined in the text; add an explicit sentence in §3.
  2. Table captions in the experimental section should state the graph families, sizes, and whether the reported times include only the relaxation or the full B&B run.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and positive review of our manuscript. We address the two major comments point by point below and will incorporate revisions to strengthen the empirical validation of the approach.

read point-by-point responses
  1. Referee: [§5] §5 (experimental results): The headline claim is a 10.6× reduction in bounding cost, yet the manuscript reports neither the number of branch-and-bound nodes explored nor the total wall-clock time to optimality when the GNN replaces Mosek. Without these quantities it is impossible to determine whether any loss in bound tightness on dynamically generated subproblems (fixed variables, reweighted edges) increases tree size enough to offset the per-call speedup.

    Authors: We agree that reporting the number of branch-and-bound nodes and the total wall-clock time to optimality under the GNN surrogate would provide a more complete assessment of net performance gains. Our current results isolate the per-bounding-call speedup because bounding dominates runtime in exact Max-Cut solvers; however, we acknowledge that any degradation in bound tightness on dynamically generated subproblems could enlarge the search tree. In the revised manuscript we will add full branch-and-bound experiments that integrate the GNN, reporting node counts and end-to-end solve times on the same benchmark instances used for the bounding-cost comparison. revision: yes

  2. Referee: [§4.2] §4.2 (architecture and training): The feasibility projection is claimed to guarantee valid primal/dual SDP solutions on unseen subproblems, but no quantitative comparison of duality gaps or objective values versus Mosek is provided on the specific subproblem distribution encountered during search. If average gap widens even modestly, the central empirical claim for exact solving does not follow.

    Authors: The primal/dual feasibility projection guarantees that every output is a valid SDP solution and therefore yields a valid bound, independent of the subproblem distribution. Nevertheless, we did not include a direct side-by-side comparison of duality gaps or objective values between the GNN and Mosek on the specific subproblems (with fixed variables and reweighted edges) that arise inside branch-and-bound. We will add this quantitative analysis to the revised version by sampling subproblems from the search tree and reporting the resulting gaps and objective values relative to Mosek. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical speedup measured against external solver

full rationale

The paper's derivation chain consists of an architecture definition (O(n² + ne) message-passing layers plus feasibility projection), a self-supervised training objective that does not require solved SDP labels, and an empirical runtime comparison of bounding cost inside branch-and-bound against the external Mosek solver. No equation or claim reduces by construction to a fitted parameter, self-citation, or renamed input; the 10.6× factor is an observed aggregate wall-clock improvement on dynamically generated subproblems, not a tautological restatement of the model. The central claim therefore remains externally falsifiable and independent of its own fitted values.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed from abstract only; full architectural and training details unavailable. The main external dependency is the standard Goemans-Williamson rounding procedure.

axioms (1)
  • standard math Goemans-Williamson rounding applied to a primal-feasible SDP solution yields a feasible Max-Cut cut.
    Classic result in approximation algorithms, invoked implicitly when the paper states that primal SDP solutions yield feasible Max-Cut solutions.

pith-pipeline@v0.9.0 · 5518 in / 1212 out tokens · 62417 ms · 2026-05-11T00:50:44.589997+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

90 extracted references · 15 canonical work pages · 3 internal anchors

  1. [1]

    Achterberg.Constraint integer programming

    T. Achterberg.Constraint integer programming. PhD thesis, 2007. 3

  2. [2]

    Achterberg, T

    T. Achterberg, T. Koch, and A. Martin. Branching rules revisited.Operations Research Letters, 33(1):42–54, 2005. 3

  3. [3]

    Agrawal, B

    A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter. Differentiable convex optimization layers.Advances in Neural Information Processing Systems, 2019. 3

  4. [4]

    A. M. Alvarez, Q. Louveaux, and L. Wehenkel. A machine learning-based approximation of strong branching.INFORMS Journal on Computing, 29(1):185–195, 2017. 3

  5. [5]

    Amos and J

    B. Amos and J. Z. Kolter. Optnet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning. PMLR, 2017. 3

  6. [6]

    M. ApS. Mosek optimizer api for python.Version, 9(17):6–4, 2022. 2, 7, 22

  7. [7]

    J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization.arXiv preprint arXiv:1607.06450,

  8. [8]

    L. Babai. Lectures on graph isomorphism. University of Toronto, Department of Computer Science. Mimeographed lecture notes, October 1979, 1979. 16

  9. [9]

    Barahona, M

    F. Barahona, M. Grötschel, M. Jünger, and G. Reinelt. An application of combinatorial optimization to statistical physics and circuit layout design.Operations Research, 36(3): 493–513, 1988. doi: 10.1287/opre.36.3.493. 2

  10. [10]

    Bengio, A

    Y . Bengio, A. Lodi, and A. Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon.European Journal of Operational Research, 290(2):405–421,

  11. [11]

    Sdplib 1.2, a library of semidefinite program- ming test problems.Optimization Methods and Software, 11(1-4):683–690, 1999

    X. Bresson and T. Laurent. Residual gated graph ConvNets.arXiv preprint arXiv:1711.07553v2,

  12. [12]

    Burer and R

    S. Burer and R. D. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization.Mathematical programming, 95(2):329–357, 2003. 22

  13. [13]

    J.-Y . Cai, M. Fürer, and N. Immerman. An optimal lower bound on the number of variables for graph identification.Combinatorica, 12(4):389–410, 1992. 4, 16

  14. [14]

    T. Cai, S. Luo, K. Xu, D. He, T.-y. Liu, and L. Wang. Graphnorm: A principled approach to accelerating graph neural network training. InInternational Conference on Machine Learning,

  15. [15]

    Cappart, D

    Q. Cappart, D. Chételat, E. B. Khalil, A. Lodi, C. Morris, and P. Veliˇckovi´c. Combinatorial optimization and reasoning with graph neural networks.Journal of Machine Learning Research, 24(130):1–61, 2023. 1

  16. [16]

    Chambolle and T

    A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal–dual algorithm.Mathematical Programming, 159(1):253–287, 2016. 20

  17. [17]

    B. Chen, P. L. Donti, K. Baker, J. Z. Kolter, and M. Bergés. Enforcing policy feasibility constraints through differentiable projection for energy optimization. InProceedings of the Twelfth ACM International Conference on Future Energy Systems, pages 199–210, 2021. 3

  18. [18]

    X. Chen, J. Liu, and W. Yin. Learning to optimize: A tutorial for continuous and mixed-integer optimization.Science China Mathematics, 67(6):1191–1262, 2024. 3

  19. [19]

    Z. Chen, J. Liu, X. Wang, and W. Yin. On representing linear programs by graph neural networks. InThe Eleventh International Conference on Learning Representations, 2023. 1, 2

  20. [20]

    Z. Chen, J. Liu, X. Wang, and W. Yin. On representing mixed-integer linear programs by graph neural networks. InThe Eleventh International Conference on Learning Representations, 2023. 2 11

  21. [21]

    Z. Chen, J. Liu, X. Chen, X. Wang, and W. Yin. Rethinking the capacity of graph neural networks for branching strategy.Advances in Neural Information Processing Systems, 2024. 1, 3

  22. [22]

    Z. Chen, X. Chen, J. Liu, X. Wang, and W. Yin. Expressive power of graph neural networks for (mixed-integer) quadratic programs. InForty-second International Conference on Machine Learning, 2025. 2

  23. [23]

    Deza and E

    A. Deza and E. B. Khalil. Machine learning for cutting planes in integer programming: A survey. InIJCAI, 2023. 1

  24. [24]

    J.-Y . Ding, C. Zhang, L. Shen, S. Li, B. Wang, Y . Xu, and L. Song. Accelerating primal solution findings for mixed integer programs based on solution prediction. InAAAI, 2020. 2

  25. [25]

    Donti, D

    P. Donti, D. Rolnick, and J. Z. Kolter. DC3: A learning method for optimization with hard constraints. InInternational Conference on Learning Representations, 2021. 3

  26. [26]

    Duvenaud, D

    D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints.arXiv preprint arXiv:1509.09292, 2015. 2

  27. [27]

    Erd ˝os and A

    P. Erd ˝os and A. Rényi. On random graphs i.Publ. math. debrecen, 6(290-297):18, 1959. 7

  28. [28]

    Fioretto, T

    F. Fioretto, T. W. Mak, and P. Van Hentenryck. Predicting ac optimal power flows: Combining deep learning and lagrangian dual methods. InProceedings of the AAAI conference on artificial intelligence, volume 34, pages 630–637, 2020. 3

  29. [29]

    Fioretto, P

    F. Fioretto, P. Van Hentenryck, T. W. Mak, C. Tran, F. Baldo, and M. Lombardi. Lagrangian duality for constrained deep learning. InJoint European conference on machine learning and knowledge discovery in databases, pages 118–135. Springer, 2020. 3

  30. [30]

    G. E. C. Flores, H. Chen, and C. Li. Enforcing hard linear constraints in deep learning models with decision rules. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 3

  31. [31]

    Gasse, D

    M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi.Exact combinatorial optimization with graph convolutional neural networks. Curran Associates Inc., Red Hook, NY , USA, 2019. 1, 2, 3

  32. [32]

    Gilmer, S

    J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. InInternational Conference on Machine Learning, 2017. 1, 2

  33. [33]

    M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.Journal of the ACM (JACM), 42 (6):1115–1145, 1995. 3, 5

  34. [34]

    Grohe.Descriptive complexity, canonisation, and definable graph structure theory

    M. Grohe.Descriptive complexity, canonisation, and definable graph structure theory. Cam- bridge University Press, 2017. 16

  35. [35]

    Gusmeroli, T

    N. Gusmeroli, T. Hrga, B. Lužar, J. Povh, M. Siebenhofer, and A. Wiegele. Biqbin: a par- allel branch-and-bound solver for binary quadratic problems with linear constraints.ACM Transactions on Mathematical Software (TOMS), 48(2):1–31, 2022. 3, 7, 22, 25

  36. [36]

    Hamilton, Z

    W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. NeurIPS, 2017. 2

  37. [37]

    Q. Han, C. Li, Z. Lin, C. Chen, Q. Deng, D. Ge, H. Liu, and Y . Ye. A low-rank admm splitting approach for semidefinite programming.INFORMS Journal on Computing, 2025. 2

  38. [38]

    Helmberg and F

    C. Helmberg and F. Rendl. Solving quadratic (0, 1)-problems by semidefinite programs and cutting planes.Mathematical programming, 82(3):291–315, 1998. 3

  39. [39]

    R. A. Horn and C. R. Johnson.Topics in matrix analysis. Cambridge university press, 1994. 17, 19 12

  40. [40]

    Hrga and J

    T. Hrga and J. Povh. Madam: a parallel exact solver for max-cut based on semidefinite programming and admm.Computational Optimization and Applications, 80:347–375, 2021. ISSN 1573-2894. doi: 10.1007/s10589-021-00310-6. 3

  41. [41]

    Immerman and E

    N. Immerman and E. Lander. Describing graphs: A first-order approach to graph canonization. InComplexity Theory Retrospective: In Honor of Juris Hartmanis on the Occasion of His Sixtieth Birthday, July 5, 1988, pages 59–81, 1990. 16

  42. [42]

    Khalil, P

    E. Khalil, P. Le Bodic, L. Song, G. Nemhauser, and B. Dilkina. Learning to branch in mixed integer programming. InProceedings of the AAAI conference on artificial intelligence, volume 30, 2016. 3

  43. [43]

    E. B. Khalil, C. Morris, and A. Lodi. Mip-gnn: A data-driven framework for guiding combina- torial solvers. InAAAI, 2022. 1, 2

  44. [44]

    D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. InICLR, 2015. 22

  45. [45]

    Korte and J

    B. Korte and J. Vygen.Combinatorial Optimization: Theory and Algorithms. Springer, 5th edition, 2012. 1

  46. [46]

    Kotary and F

    J. Kotary and F. Fioretto. Learning constrained optimization with deep augmented lagrangian methods.arXiv preprint arXiv:2403.03454, 2024. 3

  47. [47]

    Krislock, J

    N. Krislock, J. Malick, and F. Roupin. Biqcrunch: A semidefinite branch-and-bound method for solving binary quadratic problems.ACM Trans. Math. Softw., 43(4), Jan. 2017. ISSN 0098-3500. doi: 10.1145/3005345. 3, 9, 27

  48. [48]

    B. Li, L. Yang, Y . Chen, S. Wang, H. Mao, Q. Chen, Y . Ma, A. Wang, T. Ding, J. Tang, and R. Sun. Pdhg-unrolled learning-to-optimize method for large-scale linear programming. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org,

  49. [49]

    M. Li, S. Kolouri, and J. Mohammadi. Learning to solve optimization problems with hard linear constraints.IEEE Access, 11:59995–60004, 2023. ISSN 21693536. doi: 10.1109/access.2023. 3285199. 3

  50. [50]

    Q. Li, T. Ding, L. Yang, M. Ouyang, Q. Shi, and R. Sun. On the power of small-size graph neural networks for linear programming. InNeurIPS, 2024. 2

  51. [51]

    Q. Li, M. Ouyang, T. Ding, Y . Wang, Q. Shi, and R. Sun. Towards explaining the power of constant-depth graph neural networks for structured linear programming. InICLR, 2025. 2

  52. [52]

    R. Li, E. Liang, and M. Chen. On the expressivity of GNN for solving second order cone programming. InNeurIPS Workshop on GPU-Accelerated and Scalable Optimization, 2025. 2

  53. [53]

    S. Li, W. Ouyang, M. B. Paulus, and C. Wu. Learning to configure separators in branch-and-cut. InNeurIPS, 2023. 1

  54. [54]

    Liang, M

    E. Liang, M. Chen, and S. Low. Low complexity homeomorphic projection to ensure neural- network solution feasibility for optimization over (Non-)Convex set. InInternational Conference on Machine Learning. PMLR, 2023. 3

  55. [55]

    Liang, M

    E. Liang, M. Chen, and S. H. Low. Homeomorphic projection to ensure neural-network solution feasibility for constrained optimization.Journal of Machine Learning Research, 25:1–55, 2024. 3

  56. [56]

    J. Lin, J. Zhu, H. Wang, and T. Zhang. Learning to branch with tree-aware branching transform- ers.Knowledge-Based Systems, 252:109455, 2022. 3

  57. [57]

    Maron, H

    H. Maron, H. Ben-Hamu, H. Serviansky, and Y . Lipman. Provably powerful graph networks. In NeurIPS, 2019. 2, 21

  58. [58]

    Morris, M

    C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. InAAAI, 2019. 2, 4 13

  59. [59]

    Morris, G

    C. Morris, G. Rattan, and P. Mutzel. Weisfeiler and Leman go sparse: Towards higher-order graph embeddings. InNeurIPS, 2020. 2

  60. [60]

    Morris, G

    C. Morris, G. Rattan, S. Kiefer, and S. Ravanbakhsh. SpeqNets: Sparsity-aware permutation- equivariant graph networks. InICML, 2022. 2

  61. [61]

    V . Nair, S. Bartunov, F. Gimeno, I. von Glehn, P. Lichocki, I. Lobov, B. O’Donoghue, N. Son- nerat, C. Tjandraatmadja, P. Wang, R. Addanki, T. Hapuarachchi, T. Keck, J. Keeling, P. Kohli, I. Ktena, Y . Li, O. Vinyals, and Y . Zwols. Solving mixed integer programs using neural networks. CoRR, abs/2012.13349, 2020. 1

  62. [62]

    O’Donoghue

    B. O’Donoghue. Operator splitting for a homogeneous embedding of the linear complementarity problem.SIAM Journal on Optimization, 31(3):1999–2023, 2021. 22

  63. [63]

    X. Pan, M. Chen, T. Zhao, and S. H. Low. Deepopf: A feasibility-optimized deep neural network approach for ac optimal power flow problems.IEEE Systems Journal, 17(1):673–683,

  64. [64]

    Park and P

    S. Park and P. Van Hentenryck. Self-supervised primal-dual learning for constrained optimiza- tion. InAAAI, 2023. 3

  65. [65]

    M. B. Paulus, G. Zarpellon, A. Krause, L. Charlin, and C. Maddison. Learning to cut by looking ahead: Cutting plane selection via imitation learning. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research...

  66. [66]

    Poljak and Z

    S. Poljak and Z. Tuza. The max-cut problem — a survey. InSpecial Year on Combinatorial Optimization. American Mathematical Society, 1995. 2

  67. [67]

    At- tending to graph transformers.arXiv preprint, 2023

    C. Qian and C. Morris. Principled data augmentation for learning to solve quadratic program- ming problems.arXiv preprint arXiv:2506.01728, 2025. 2

  68. [68]

    Qian and C

    C. Qian and C. Morris. Towards graph neural networks for provably solving convex optimization problems.arXiv preprint arXiv:2502.02446, 2025. 3

  69. [69]

    On the Expressive Power of GNNs to Solve Linear SDPs

    C. Qian and C. Morris. On the expressive power of gnns to solve linear sdps.arXiv preprint arXiv:2604.27786, 2026. 2, 4, 5, 7, 17, 20, 21, 24

  70. [70]

    C. Qian, D. Chételat, and C. Morris. Exploring the power of graph neural networks in solving linear optimization problems. InAISTATS, 2024. 1, 2, 3

  71. [71]

    Rashwan, K

    A. Rashwan, K. Briggs, C. Budd, and L. M. Kreusser. Enforcing convex constraints in graph neural networks. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. 3

  72. [72]

    Rendl, G

    F. Rendl, G. Rinaldi, and A. Wiegele. A branch and bound algorithm for max-cut based on combining semidefinite and polyhedral relaxations. InInternational Conference on Integer Programming and Combinatorial Optimization, pages 295–309. Springer, 2007. 2

  73. [73]

    Rendl, G

    F. Rendl, G. Rinaldi, and A. Wiegele. Solving Max-Cut to optimality by intersecting semidefinite and polyhedral relaxations.Math. Programming, 121(2):307, 2010. 3, 5, 7, 22, 25

  74. [74]

    Scarselli, M

    F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model.IEEE Transactions on Neural Networks, 20(1):61–80, 2008. 1, 2

  75. [75]

    Scavuzzo, K

    L. Scavuzzo, K. Aardal, A. Lodi, and N. Yorke-Smith. Machine learning augmented branch and bound for mixed integer linear programming.Mathematical Programming, pages 1–44, 2024. 3

  76. [76]

    B. Tang, E. B. Khalil, and J. Drgona. Learning to optimize for mixed-integer non-linear pro- gramming with feasibility guarantees. InWorkshop on Differentiable Learning of Combinatorial Algorithms, 2025. 3

  77. [77]

    Tanneau and P

    M. Tanneau and P. Van Hentenryck. Dual lagrangian learning for conic optimization.NeurIPS,

  78. [78]

    RAYEN: Imposition of Hard Convex Constraints on Neural Networks

    J. Tordesillas, J. P. How, and M. Hutter. Rayen: Imposition of hard convex constraints on neural networks.arXiv preprint arXiv:2307.08336, 2023. 3

  79. [79]

    Vandenberghe and S

    L. Vandenberghe and S. Boyd. Semidefinite programming.SIAM review, 38(1):49–95, 1996. 2

  80. [80]

    Veliˇckovi´c, G

    P. Veliˇckovi´c, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y . Bengio. Graph attention networks. InInternational Conference on Learning Representations, 2018. 2

Showing first 80 references.