arxiv: 2604.27786 · v2 · submitted 2026-04-30 · 💻 cs.LG

Recognition: unknown

On the Expressive Power of GNNs to Solve Linear SDPs

Chendi Qian , Christopher Morris

Authors on Pith no claims yet

Pith reviewed 2026-05-07 06:20 UTC · model grok-4.3

classification 💻 cs.LG

keywords Graph Neural NetworksSemidefinite ProgrammingExpressive PowerConvex OptimizationFirst-order MethodsWarm-startingMachine Learning SurrogatesLinear SDPs

0 comments

The pith

Standard GNNs cannot recover optimal solutions to linear semidefinite programs, but a more expressive architecture can emulate the updates of first-order solvers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that ordinary graph neural networks lack sufficient expressive power to find good solutions to linear semidefinite programs. It then constructs a stronger architecture whose message-passing rules directly reflect the variable-constraint structure of an SDP. This architecture is shown to reproduce the iteration steps of a standard first-order solver. On both synthetic instances and real SdpLib benchmarks the stronger model yields smaller prediction errors and objective gaps than weaker baselines. Feeding its output into an existing solver as a warm start produces measured speedups of up to 80 percent.

Core claim

Standard GNN architectures are provably unable to recover optimal solutions for linear SDPs, while a more expressive architecture that captures the bipartite variable-constraint graph of the SDP can emulate the updates of a first-order solver and thereby produce high-quality approximate solutions.

What carries the argument

An enhanced graph neural network whose layers explicitly model the interactions between primal variables and dual constraints, allowing it to replicate the primal-dual update rules of a first-order SDP solver.

If this is right

The enhanced architecture consistently achieves lower prediction error than standard GNNs on both synthetic and SdpLib SDP instances.
It produces smaller gaps to the optimal objective value across multiple classes of linear SDPs.
Using its predictions to initialize a first-order solver reduces total running time by up to 80 percent on benchmark problems.
The same architecture applies uniformly to synthetic instances and real-world SDP collections without class-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar structural enhancements to GNNs may allow fast approximation of other convex programs whose constraint graphs are bipartite.
The ability to emulate solver steps suggests that learned models could replace or accelerate iterative methods in large-scale combinatorial relaxations.
If the training procedure generalizes across SDP families, it could reduce the number of full solver runs needed in repeated optimization tasks such as branch-and-bound.
The negative results for standard GNNs indicate that sparsity alone is insufficient and that explicit modeling of dual variables is required for this class of problems.

Load-bearing premise

The enhanced architecture can be trained to match first-order solver steps on new SDP instances without requiring per-problem architectural changes or extensive hyper-parameter retuning.

What would settle it

A concrete test set of linear SDPs on which the enhanced architecture produces iterates that diverge from those of a first-order solver or fails to reduce wall-clock time when used as a warm start.

Figures

Figures reproduced from arXiv: 2604.27786 by Chendi Qian, Christopher Morris.

**Figure 1.** Figure 1: Our framework initializes variable and constraint embeddings directly from the SDP problem data. At each step t, the model iteratively updates these representations by aggregating messages from variable and constraint neighbors via permutation-equivariant functions. Finally, the variable embeddings are decoded to predict the optimal solution X∗ . as VC-2-FWL, is capable of capturing linear SDPs. Concretely… view at source ↗

read the original abstract

Semidefinite programs (SDPs) are a powerful framework for convex optimization and for constructing strong relaxations of hard combinatorial problems. However, solving large SDPs can be computationally expensive, motivating the use of machine learning models as fast computational surrogates. Graph neural networks (GNNs) are a natural candidate in this setting due to their sparsity-awareness and ability to model variable-constraint interactions. In this work, we study what expressive power is sufficient to recover optimal SDP solutions. We first prove negative results showing that standard GNN architectures fail on recovering linear SDP solutions. We then identify a more expressive architecture that captures the key structure of SDPs and can, in particular, emulate the updates of a standard first-order solver. Empirically, on both synthetic and \textsc{SdpLib} benchmarks of various classes of SDPs, this more expressive architecture achieves consistently lower prediction error and objective gap than theoretically weaker baselines. Finally, using the learned high-quality predictions to warm-start the first-order solver yields practical speedups of up to 80%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Standard GNNs provably cannot recover linear SDP solutions, but the authors build a more expressive architecture that emulates first-order solver steps and delivers up to 80% warm-start speedups, though the model may be too SDP-specific to count as a general expressivity advance.

read the letter

The main thing to know is that ordinary GNNs are too weak for linear SDPs while a custom architecture can copy first-order solver behavior and cut solve times when used as a warm start. The negative results are the cleanest part. The authors prove that standard message-passing models cannot recover optimal SDP solutions, most likely because they cannot capture the right variable-constraint interactions on the SDP graph. That bound is new enough to be worth citing if you work on GNN limits for structured optimization problems. They then define a stronger architecture that incorporates SDP structure directly and can replicate the iteration steps of a standard first-order method. On both synthetic instances and SdpLib benchmarks the model reports lower prediction error and smaller objective gaps than the weaker baselines, and feeding its output into the solver produces the claimed speedups. Those numbers are concrete and the experimental setup covers multiple SDP classes, which is better than many pure theory papers. The soft spot is exactly the one the stress-test raises. If the new architecture achieves the emulation by baking in projection or dual-update operations rather than learning them from data, and if training still needs to be redone per SDP family, then the result is an SDP-tailored network rather than a general increase in GNN power. The abstract says the gains are consistent across classes, but the paper would be stronger if it showed that a single set of learned parameters works without per-class retuning or hand-crafted components. This paper is aimed at researchers who already use SDPs for relaxations and are looking for fast surrogates or warm-starts. Anyone thinking about GNN expressivity for convex problems will find the negative bounds useful, and the practical speedups will interest people building hybrid solvers. It has enough new theory and reproducible empirical claims to deserve a serious referee. I would send it out for review rather than desk-reject; the proofs and the warm-start results are specific enough that referees can check the architecture details and the generality question directly.

Referee Report

3 major / 3 minor

Summary. The paper proves that standard GNN architectures lack sufficient expressive power to recover optimal solutions to linear SDPs (via invariance arguments on the SDP constraint graph). It then constructs a more expressive GNN variant that captures SDP structure and can emulate the iterative updates of a standard first-order solver. Empirically, this architecture yields lower prediction error and objective gap than weaker baselines on synthetic instances and SdpLib benchmarks across SDP classes, and the learned predictions provide warm-starts that accelerate the solver by up to 80%.

Significance. If the central claims hold, the work supplies a concrete theoretical characterization of the expressive power needed for GNNs to solve linear SDPs and demonstrates a practical bridge between learned models and first-order optimization. The negative results plus constructive architecture, together with reproducible speedups on standard benchmarks, would strengthen the case for using GNN surrogates in large-scale convex relaxations of combinatorial problems.

major comments (3)

[§4] §4 (architecture definition): the claim that the proposed GNN 'emulates the updates of a standard first-order solver' requires explicit equations showing how the message functions implement (or learn) the projection, dual ascent, or eigenvalue steps. If these functions are hand-crafted rather than end-to-end learned, the result demonstrates an SDP-tailored model rather than a general increase in GNN expressive power.
[§5] §5 (SdpLib experiments): the tables and text must clarify whether a single set of parameters is trained across all SDP classes or whether separate models are trained per benchmark family. Per-class training would indicate that the reported gains rely on class-specific tuning, undermining the generality asserted in the abstract and introduction.
[§3] §3 (negative results): the invariance argument establishing that standard GNNs fail must be checked against the precise graph representation (variable-constraint bipartite graph or lifted graph) used in the positive experiments; any mismatch would invalidate the separation between the negative and positive results.

minor comments (3)

[Abstract] Abstract: 'linear SDPs' should be defined on first use (linear objective and affine constraints) to avoid ambiguity with other SDP variants.
[Figures] Figures 2-4: axis labels and legend entries should explicitly name the GNN variants (e.g., 'Standard MPNN', 'Proposed SDP-GNN') rather than using generic colors.
[Related Work] Related work: the discussion of prior GNNs for combinatorial optimization should cite at least one recent result on GNNs for SDP relaxations or quadratic programming to situate the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of our theoretical and empirical contributions. We address each major comment below and will incorporate revisions to improve the manuscript.

read point-by-point responses

Referee: [§4] §4 (architecture definition): the claim that the proposed GNN 'emulates the updates of a standard first-order solver' requires explicit equations showing how the message functions implement (or learn) the projection, dual ascent, or eigenvalue steps. If these functions are hand-crafted rather than end-to-end learned, the result demonstrates an SDP-tailored model rather than a general increase in GNN expressive power.

Authors: We agree that explicit equations will strengthen the presentation. In the revised §4, we will add the precise update rules for the message functions, showing how they can represent the projection onto the positive semidefinite cone, dual ascent iterations, and eigenvalue decompositions via learnable linear transformations and nonlinear activations within the GNN layers. These components are fully parameterized and trained end-to-end on the SDP instances, rather than being fixed hand-crafted operations. This demonstrates a genuine increase in expressive power for GNNs that is specialized to SDP structure but remains a general architectural extension. revision: yes
Referee: [§5] §5 (SdpLib experiments): the tables and text must clarify whether a single set of parameters is trained across all SDP classes or whether separate models are trained per benchmark family. Per-class training would indicate that the reported gains rely on class-specific tuning, undermining the generality asserted in the abstract and introduction.

Authors: We will revise the text and table captions in §5 to explicitly state that a single set of parameters is trained across all SDP classes. The model is trained on a combined dataset comprising instances from multiple SdpLib families (with appropriate class balancing), which directly supports the generality claims in the abstract and introduction. Separate per-class models were not used; the shared-parameter training regime is what enables the reported cross-class performance and warm-start speedups. revision: yes
Referee: [§3] §3 (negative results): the invariance argument establishing that standard GNNs fail must be checked against the precise graph representation (variable-constraint bipartite graph or lifted graph) used in the positive experiments; any mismatch would invalidate the separation between the negative and positive results.

Authors: The graph representation is identical in both sections: the variable-constraint bipartite graph constructed from the SDP's linear constraints and variables (as defined in §2 and used throughout). The invariance arguments in §3 apply directly to this representation, establishing the separation from the more expressive architecture in §4. We will add a short clarifying sentence in §3 that cross-references the graph construction details to eliminate any potential ambiguity. revision: partial

Circularity Check

0 steps flagged

No significant circularity; proofs and empirical validation remain independent

full rationale

The paper establishes negative results via explicit proofs on standard GNN limitations for SDP recovery (likely invariance-based), then constructs a more expressive architecture shown capable of emulating first-order solver updates by design. Training occurs on synthetic and SdpLib instances with reported generalization across classes, and warm-start speedups are measured separately. No quoted equations reduce predictions to fitted inputs by construction, no load-bearing self-citations close the chain, and the architecture's expressivity claim rests on its structural match to SDPs rather than renaming or smuggling prior results. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The 'more expressive architecture' is described at a high level without detailing any new postulated components or fitted constants.

pith-pipeline@v0.9.0 · 5478 in / 1152 out tokens · 77248 ms · 2026-05-07T06:20:12.795612+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks
cs.LG 2026-05 unverdicted novelty 7.0

A Max-Cut-specific graph neural network predicts primal- and dual-feasible SDP solutions in linearithmic time, cutting bounding costs in exact branch-and-bound by up to 10.6 times versus a commercial SDP solver while ...

Reference graph

Works this paper leans on

17 extracted references · 6 canonical work pages · cited by 1 Pith paper

[1]

M., and Maron, H

5, 27 Bevilacqua, B., Frasca, F., Lim, D., Srinivasan, B., Cai, C., Balamurugan, G., Bronstein, M. M., and Maron, H. Equivariant subgraph aggregation networks. InICLR,
[2]

Efficient subgraph gnns by learning effec- tive selection policies.arXiv preprint arXiv:2310.20082,

3 Bevilacqua, B., Eliasof, M., Meirom, E., Ribeiro, B., and Maron, H. Efficient subgraph gnns by learning effec- tive selection policies.arXiv preprint arXiv:2310.20082,

work page arXiv
[3]

Sdplib 1.2, a library of semidefinite program- ming test problems.Optimization Methods and Software, 11(1-4):683–690, 1999

3 Borchers, B. Sdplib 1.2, a library of semidefinite program- ming test problems.Optimization Methods and Software, 11(1-4):683–690, 1999. 6 Boyd, S. and Vandenberghe, L.Convex Optimization. Cam- bridge University Press, 2004. 1, 21 Boyd, S., El Ghaoui, L., Feron, E., and Balakrishnan, V . Linear matrix inequalities in system and control theory. SIAM, 199...

work page arXiv 1999
[4]

Graph isomorphism testing without numerics for graphs of bounded eigenvalue multiplicity

3 Fürer, M. Graph isomorphism testing without numerics for graphs of bounded eigenvalue multiplicity. InSODA,
[5]

On the power of combinatorial and spectral in- variants.Linear algebra and its applications, 432(9): 2373–2380, 2010

24 Fürer, M. On the power of combinatorial and spectral in- variants.Linear algebra and its applications, 432(9): 2373–2380, 2010. 24 Galli, L. and Letchford, A. N. On the lovász theta function and some variants.Discrete Optimization, 25:159–174,

2010
[6]

A novel neural network for nonlinear convex programming.IEEE Transactions on Neural Networks, 15(3):613–621, 2004

1, 33 Gao, X.-B. A novel neural network for nonlinear convex programming.IEEE Transactions on Neural Networks, 15(3):613–621, 2004. 2 Gasse, M., Chételat, D., Ferroni, N., Charlin, L., and Lodi, A. Exact combinatorial optimization with graph convolu- tional neural networks.NeurIPS, 2019. 1, 2, 44 Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., a...

2004
[7]

The logic of graph neural networks

3, 14, 15 Grohe, M. The logic of graph neural networks. InSympo- sium on Logic in Computer Science, 2021. 14 Hamilton, W., Ying, Z., and Leskovec, J. Inductive repre- sentation learning on large graphs.NeurIPS, 2017. 2 Han, Q., Li, C., Lin, Z., Chen, C., Deng, Q., Ge, D., Liu, H., and Ye, Y . A low-rank admm splitting approach for semidefinite programming...

work page arXiv 2021
[8]

Weisfeiler and Leman go sparse: Towards higher-order graph embeddings

1, 2, 3, 14, 29 Morris, C., Rattan, G., and Mutzel, P. Weisfeiler and Leman go sparse: Towards higher-order graph embeddings. In NeurIPS, 2020. 1, 3, 6, 38 Morris, C., Lipman, Y ., Maron, H., Rieck, B., Kriege, N. M., Grohe, M., Fey, M., and Borgwardt, K. Weisfeiler and Leman go machine learning: The story so far.arXiv preprint arXiv:2112.09992, 2021. 3 1...

work page arXiv 2020
[9]

At- tending to graph transformers.arXiv preprint, 2023

3 Müller, L., Galkin, M., Morris, C., and Rampášek, L. At- tending to graph transformers.arXiv preprint, 2023. 3 Müller, L., Kusuma, D., Bonet, B., and Morris, C. Towards principled graph transformers.NeurIPS, 2024. 3, 6, 42 Nikseresht, A. and Nazemi, A. A novel neural network for solving semidefinite programming problems with some applications.Journal of...

work page arXiv 2023
[10]

parallel

3 Qian, C., Chételat, D., and Morris, C. Exploring the power of graph neural networks in solving linear optimization problems. InAISTATS, 2024. 1, 2 Rampášek, L., Galkin, M., Dwivedi, V . P., Luu, A. T., Wolf, G., and Beaini, D. Recipe for a general, powerful, scal- able graph transformer.NeurIPS, 2022. 3 Rontsis, N., Goulart, P., and Nakatsukasa, Y . Eff...

work page arXiv 2024
[11]

Here, the diagonal entries are distinguished (a̸=f), allowing the VC-2-WL algorithm to distinguish the solutions

In contrast, VC-2-WL converges to   a b c b e d c d f  , where{a,b,c,···}are from an alphabet to represent colors. Here, the diagonal entries are distinguished (a̸=f), allowing the VC-2-WL algorithm to distinguish the solutions. Therefore,VC-2-WL⊏VC-WL. Similarly,VC-2-FWL⊏VC-WLcan be proven in exactly the same way with the same example; we omit the pr...

1992
[12]

We define the functionf in Fact 2 asf(c∞ k ) =y t k

Primal Update.In the primal update step, we must evaluate the term A∗(yt) = ∑m k=1yt kAk. We define the functionf in Fact 2 asf(c∞ k ) =y t k. This function is well-defined because, by the inductive hypothesis,yt k is refined by the color c∞ k , i.e.,c∞ k =c∞ l =⇒yt k =y t l . Applying Fact 2 with thisf, we obtain: v∞ ij =v∞ pq =⇒ ∑ k∈[m] Ak,ijyt k = ∑ k∈...
[13]

Since both Xt+1 and Xt are refined by v∞, so is Wt+1

Dual Update.In the dual update step, we evaluate the term inside the operatorA: Wt+1 :=X t+1 +θt ( Xt+1−Xt) . Since both Xt+1 and Xt are refined by v∞, so is Wt+1. We define the functiong in Fact 3 asg ( v∞ ij ) = ( Wt+1) ij. This is well-defined because the value of ( Wt+1) ij is refined by the colorv∞ ij . Applying Fact 3 with thisg, we obtain: c∞ k =c∞...

2017
[14]

For every variable node with index (i,j) , we constructn nodes with index (i,u,j) foru∈[n]

We further construct a node setVtri. For every variable node with index (i,j) , we constructn nodes with index (i,u,j) foru∈[n]. These are initialized with colorsc0 iuj =αtri({{Ciu,Cuj}}). Since the initialization functionsα(·)have disjoint ranges, the color classes of three node sets remain disjoint throughout the execution. The total number of nodes isN...

2019
[15]

Now, scale the constraint vector toαbwithα= 2ϵ/δ

Letδ=|X∗ 11−X∗ 33|>0. Now, scale the constraint vector toαbwithα= 2ϵ/δ. By Lemma C.24, the new optimal solution isαX∗, and the gap between the entries becomes|αX∗ 11−αX∗ 33|=αδ= 2ϵ. Since the scaling of b preserves the structural symmetry of the graph, the unrolling trees for variable nodes v11 and v33 remain isomorphic, the VC-MPNN must still predict ide...

1995
[16]

Input symmetry: Since Xij =X ji, operations that distinguish rows from columns become redundant, e.g., summing over rows yields the same vector as summing over columns
[17]

For instance, broadcasting a vector to rows must be paired with broadcasting to columns

Output symmetry: To guarantee the output is symmetric, the corresponding basis operations must share weights. For instance, broadcasting a vector to rows must be paired with broadcasting to columns. Formally, let1∈Rn be the all-ones vector,Ibe the identity matrix,d=diag(X)∈R n be the vector of diagonal entries, and s=X1∈Rn be the vector of row sums. We id...

2024