Recognition: unknown
On the Expressive Power of GNNs to Solve Linear SDPs
Pith reviewed 2026-05-07 06:20 UTC · model grok-4.3
The pith
Standard GNNs cannot recover optimal solutions to linear semidefinite programs, but a more expressive architecture can emulate the updates of first-order solvers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Standard GNN architectures are provably unable to recover optimal solutions for linear SDPs, while a more expressive architecture that captures the bipartite variable-constraint graph of the SDP can emulate the updates of a first-order solver and thereby produce high-quality approximate solutions.
What carries the argument
An enhanced graph neural network whose layers explicitly model the interactions between primal variables and dual constraints, allowing it to replicate the primal-dual update rules of a first-order SDP solver.
If this is right
- The enhanced architecture consistently achieves lower prediction error than standard GNNs on both synthetic and SdpLib SDP instances.
- It produces smaller gaps to the optimal objective value across multiple classes of linear SDPs.
- Using its predictions to initialize a first-order solver reduces total running time by up to 80 percent on benchmark problems.
- The same architecture applies uniformly to synthetic instances and real-world SDP collections without class-specific redesign.
Where Pith is reading between the lines
- Similar structural enhancements to GNNs may allow fast approximation of other convex programs whose constraint graphs are bipartite.
- The ability to emulate solver steps suggests that learned models could replace or accelerate iterative methods in large-scale combinatorial relaxations.
- If the training procedure generalizes across SDP families, it could reduce the number of full solver runs needed in repeated optimization tasks such as branch-and-bound.
- The negative results for standard GNNs indicate that sparsity alone is insufficient and that explicit modeling of dual variables is required for this class of problems.
Load-bearing premise
The enhanced architecture can be trained to match first-order solver steps on new SDP instances without requiring per-problem architectural changes or extensive hyper-parameter retuning.
What would settle it
A concrete test set of linear SDPs on which the enhanced architecture produces iterates that diverge from those of a first-order solver or fails to reduce wall-clock time when used as a warm start.
Figures
read the original abstract
Semidefinite programs (SDPs) are a powerful framework for convex optimization and for constructing strong relaxations of hard combinatorial problems. However, solving large SDPs can be computationally expensive, motivating the use of machine learning models as fast computational surrogates. Graph neural networks (GNNs) are a natural candidate in this setting due to their sparsity-awareness and ability to model variable-constraint interactions. In this work, we study what expressive power is sufficient to recover optimal SDP solutions. We first prove negative results showing that standard GNN architectures fail on recovering linear SDP solutions. We then identify a more expressive architecture that captures the key structure of SDPs and can, in particular, emulate the updates of a standard first-order solver. Empirically, on both synthetic and \textsc{SdpLib} benchmarks of various classes of SDPs, this more expressive architecture achieves consistently lower prediction error and objective gap than theoretically weaker baselines. Finally, using the learned high-quality predictions to warm-start the first-order solver yields practical speedups of up to 80%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves that standard GNN architectures lack sufficient expressive power to recover optimal solutions to linear SDPs (via invariance arguments on the SDP constraint graph). It then constructs a more expressive GNN variant that captures SDP structure and can emulate the iterative updates of a standard first-order solver. Empirically, this architecture yields lower prediction error and objective gap than weaker baselines on synthetic instances and SdpLib benchmarks across SDP classes, and the learned predictions provide warm-starts that accelerate the solver by up to 80%.
Significance. If the central claims hold, the work supplies a concrete theoretical characterization of the expressive power needed for GNNs to solve linear SDPs and demonstrates a practical bridge between learned models and first-order optimization. The negative results plus constructive architecture, together with reproducible speedups on standard benchmarks, would strengthen the case for using GNN surrogates in large-scale convex relaxations of combinatorial problems.
major comments (3)
- [§4] §4 (architecture definition): the claim that the proposed GNN 'emulates the updates of a standard first-order solver' requires explicit equations showing how the message functions implement (or learn) the projection, dual ascent, or eigenvalue steps. If these functions are hand-crafted rather than end-to-end learned, the result demonstrates an SDP-tailored model rather than a general increase in GNN expressive power.
- [§5] §5 (SdpLib experiments): the tables and text must clarify whether a single set of parameters is trained across all SDP classes or whether separate models are trained per benchmark family. Per-class training would indicate that the reported gains rely on class-specific tuning, undermining the generality asserted in the abstract and introduction.
- [§3] §3 (negative results): the invariance argument establishing that standard GNNs fail must be checked against the precise graph representation (variable-constraint bipartite graph or lifted graph) used in the positive experiments; any mismatch would invalidate the separation between the negative and positive results.
minor comments (3)
- [Abstract] Abstract: 'linear SDPs' should be defined on first use (linear objective and affine constraints) to avoid ambiguity with other SDP variants.
- [Figures] Figures 2-4: axis labels and legend entries should explicitly name the GNN variants (e.g., 'Standard MPNN', 'Proposed SDP-GNN') rather than using generic colors.
- [Related Work] Related work: the discussion of prior GNNs for combinatorial optimization should cite at least one recent result on GNNs for SDP relaxations or quadratic programming to situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify key aspects of our theoretical and empirical contributions. We address each major comment below and will incorporate revisions to improve the manuscript.
read point-by-point responses
-
Referee: [§4] §4 (architecture definition): the claim that the proposed GNN 'emulates the updates of a standard first-order solver' requires explicit equations showing how the message functions implement (or learn) the projection, dual ascent, or eigenvalue steps. If these functions are hand-crafted rather than end-to-end learned, the result demonstrates an SDP-tailored model rather than a general increase in GNN expressive power.
Authors: We agree that explicit equations will strengthen the presentation. In the revised §4, we will add the precise update rules for the message functions, showing how they can represent the projection onto the positive semidefinite cone, dual ascent iterations, and eigenvalue decompositions via learnable linear transformations and nonlinear activations within the GNN layers. These components are fully parameterized and trained end-to-end on the SDP instances, rather than being fixed hand-crafted operations. This demonstrates a genuine increase in expressive power for GNNs that is specialized to SDP structure but remains a general architectural extension. revision: yes
-
Referee: [§5] §5 (SdpLib experiments): the tables and text must clarify whether a single set of parameters is trained across all SDP classes or whether separate models are trained per benchmark family. Per-class training would indicate that the reported gains rely on class-specific tuning, undermining the generality asserted in the abstract and introduction.
Authors: We will revise the text and table captions in §5 to explicitly state that a single set of parameters is trained across all SDP classes. The model is trained on a combined dataset comprising instances from multiple SdpLib families (with appropriate class balancing), which directly supports the generality claims in the abstract and introduction. Separate per-class models were not used; the shared-parameter training regime is what enables the reported cross-class performance and warm-start speedups. revision: yes
-
Referee: [§3] §3 (negative results): the invariance argument establishing that standard GNNs fail must be checked against the precise graph representation (variable-constraint bipartite graph or lifted graph) used in the positive experiments; any mismatch would invalidate the separation between the negative and positive results.
Authors: The graph representation is identical in both sections: the variable-constraint bipartite graph constructed from the SDP's linear constraints and variables (as defined in §2 and used throughout). The invariance arguments in §3 apply directly to this representation, establishing the separation from the more expressive architecture in §4. We will add a short clarifying sentence in §3 that cross-references the graph construction details to eliminate any potential ambiguity. revision: partial
Circularity Check
No significant circularity; proofs and empirical validation remain independent
full rationale
The paper establishes negative results via explicit proofs on standard GNN limitations for SDP recovery (likely invariance-based), then constructs a more expressive architecture shown capable of emulating first-order solver updates by design. Training occurs on synthetic and SdpLib instances with reported generalization across classes, and warm-start speedups are measured separately. No quoted equations reduce predictions to fitted inputs by construction, no load-bearing self-citations close the chain, and the architecture's expressivity claim rests on its structural match to SDPs rather than renaming or smuggling prior results. The derivation is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 1 Pith paper
-
Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks
A Max-Cut-specific graph neural network predicts primal- and dual-feasible SDP solutions in linearithmic time, cutting bounding costs in exact branch-and-bound by up to 10.6 times versus a commercial SDP solver while ...
Reference graph
Works this paper leans on
-
[1]
M., and Maron, H
5, 27 Bevilacqua, B., Frasca, F., Lim, D., Srinivasan, B., Cai, C., Balamurugan, G., Bronstein, M. M., and Maron, H. Equivariant subgraph aggregation networks. InICLR,
-
[2]
Efficient subgraph gnns by learning effec- tive selection policies.arXiv preprint arXiv:2310.20082,
3 Bevilacqua, B., Eliasof, M., Meirom, E., Ribeiro, B., and Maron, H. Efficient subgraph gnns by learning effec- tive selection policies.arXiv preprint arXiv:2310.20082,
-
[3]
3 Borchers, B. Sdplib 1.2, a library of semidefinite program- ming test problems.Optimization Methods and Software, 11(1-4):683–690, 1999. 6 Boyd, S. and Vandenberghe, L.Convex Optimization. Cam- bridge University Press, 2004. 1, 21 Boyd, S., El Ghaoui, L., Feron, E., and Balakrishnan, V . Linear matrix inequalities in system and control theory. SIAM, 199...
-
[4]
Graph isomorphism testing without numerics for graphs of bounded eigenvalue multiplicity
3 Fürer, M. Graph isomorphism testing without numerics for graphs of bounded eigenvalue multiplicity. InSODA,
-
[5]
On the power of combinatorial and spectral in- variants.Linear algebra and its applications, 432(9): 2373–2380, 2010
24 Fürer, M. On the power of combinatorial and spectral in- variants.Linear algebra and its applications, 432(9): 2373–2380, 2010. 24 Galli, L. and Letchford, A. N. On the lovász theta function and some variants.Discrete Optimization, 25:159–174,
2010
-
[6]
A novel neural network for nonlinear convex programming.IEEE Transactions on Neural Networks, 15(3):613–621, 2004
1, 33 Gao, X.-B. A novel neural network for nonlinear convex programming.IEEE Transactions on Neural Networks, 15(3):613–621, 2004. 2 Gasse, M., Chételat, D., Ferroni, N., Charlin, L., and Lodi, A. Exact combinatorial optimization with graph convolu- tional neural networks.NeurIPS, 2019. 1, 2, 44 Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., a...
2004
-
[7]
The logic of graph neural networks
3, 14, 15 Grohe, M. The logic of graph neural networks. InSympo- sium on Logic in Computer Science, 2021. 14 Hamilton, W., Ying, Z., and Leskovec, J. Inductive repre- sentation learning on large graphs.NeurIPS, 2017. 2 Han, Q., Li, C., Lin, Z., Chen, C., Deng, Q., Ge, D., Liu, H., and Ye, Y . A low-rank admm splitting approach for semidefinite programming...
-
[8]
Weisfeiler and Leman go sparse: Towards higher-order graph embeddings
1, 2, 3, 14, 29 Morris, C., Rattan, G., and Mutzel, P. Weisfeiler and Leman go sparse: Towards higher-order graph embeddings. In NeurIPS, 2020. 1, 3, 6, 38 Morris, C., Lipman, Y ., Maron, H., Rieck, B., Kriege, N. M., Grohe, M., Fey, M., and Borgwardt, K. Weisfeiler and Leman go machine learning: The story so far.arXiv preprint arXiv:2112.09992, 2021. 3 1...
-
[9]
At- tending to graph transformers.arXiv preprint, 2023
3 Müller, L., Galkin, M., Morris, C., and Rampášek, L. At- tending to graph transformers.arXiv preprint, 2023. 3 Müller, L., Kusuma, D., Bonet, B., and Morris, C. Towards principled graph transformers.NeurIPS, 2024. 3, 6, 42 Nikseresht, A. and Nazemi, A. A novel neural network for solving semidefinite programming problems with some applications.Journal of...
-
[10]
3 Qian, C., Chételat, D., and Morris, C. Exploring the power of graph neural networks in solving linear optimization problems. InAISTATS, 2024. 1, 2 Rampášek, L., Galkin, M., Dwivedi, V . P., Luu, A. T., Wolf, G., and Beaini, D. Recipe for a general, powerful, scal- able graph transformer.NeurIPS, 2022. 3 Rontsis, N., Goulart, P., and Nakatsukasa, Y . Eff...
-
[11]
Here, the diagonal entries are distinguished (a̸=f), allowing the VC-2-WL algorithm to distinguish the solutions
In contrast, VC-2-WL converges to a b c b e d c d f , where{a,b,c,···}are from an alphabet to represent colors. Here, the diagonal entries are distinguished (a̸=f), allowing the VC-2-WL algorithm to distinguish the solutions. Therefore,VC-2-WL⊏VC-WL. Similarly,VC-2-FWL⊏VC-WLcan be proven in exactly the same way with the same example; we omit the pr...
1992
-
[12]
We define the functionf in Fact 2 asf(c∞ k ) =y t k
Primal Update.In the primal update step, we must evaluate the term A∗(yt) = ∑m k=1yt kAk. We define the functionf in Fact 2 asf(c∞ k ) =y t k. This function is well-defined because, by the inductive hypothesis,yt k is refined by the color c∞ k , i.e.,c∞ k =c∞ l =⇒yt k =y t l . Applying Fact 2 with thisf, we obtain: v∞ ij =v∞ pq =⇒ ∑ k∈[m] Ak,ijyt k = ∑ k∈...
-
[13]
Since both Xt+1 and Xt are refined by v∞, so is Wt+1
Dual Update.In the dual update step, we evaluate the term inside the operatorA: Wt+1 :=X t+1 +θt ( Xt+1−Xt) . Since both Xt+1 and Xt are refined by v∞, so is Wt+1. We define the functiong in Fact 3 asg ( v∞ ij ) = ( Wt+1) ij. This is well-defined because the value of ( Wt+1) ij is refined by the colorv∞ ij . Applying Fact 3 with thisg, we obtain: c∞ k =c∞...
2017
-
[14]
For every variable node with index (i,j) , we constructn nodes with index (i,u,j) foru∈[n]
We further construct a node setVtri. For every variable node with index (i,j) , we constructn nodes with index (i,u,j) foru∈[n]. These are initialized with colorsc0 iuj =αtri({{Ciu,Cuj}}). Since the initialization functionsα(·)have disjoint ranges, the color classes of three node sets remain disjoint throughout the execution. The total number of nodes isN...
2019
-
[15]
Now, scale the constraint vector toαbwithα= 2ϵ/δ
Letδ=|X∗ 11−X∗ 33|>0. Now, scale the constraint vector toαbwithα= 2ϵ/δ. By Lemma C.24, the new optimal solution isαX∗, and the gap between the entries becomes|αX∗ 11−αX∗ 33|=αδ= 2ϵ. Since the scaling of b preserves the structural symmetry of the graph, the unrolling trees for variable nodes v11 and v33 remain isomorphic, the VC-MPNN must still predict ide...
1995
-
[16]
Input symmetry: Since Xij =X ji, operations that distinguish rows from columns become redundant, e.g., summing over rows yields the same vector as summing over columns
-
[17]
For instance, broadcasting a vector to rows must be paired with broadcasting to columns
Output symmetry: To guarantee the output is symmetric, the corresponding basis operations must share weights. For instance, broadcasting a vector to rows must be paired with broadcasting to columns. Formally, let1∈Rn be the all-ones vector,Ibe the identity matrix,d=diag(X)∈R n be the vector of diagonal entries, and s=X1∈Rn be the vector of row sums. We id...
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.