WARP: A Benchmark for Primal-Dual Warm-Starting of Interior-Point Solvers
Pith reviewed 2026-05-08 14:58 UTC · model grok-4.3
The pith
Primal-only machine learning predictions fail to speed up interior-point solvers for power flow once the correct default start is used, but full primal-dual predictions cut iterations by 76 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that interior-point methods have a geometric property under which primal prediction accuracy is anticorrelated with convergence speed, so only the complete primal-dual-barrier state (x*, λ*, z*, μ*) is structurally capable of large iteration reductions. Primal-only warm-starts are therefore ineffective against the solver default. The released WARP model, an encode-process-decode network on the constraint graph, achieves a 76 percent reduction and accommodates N-1 topology variations.
What carries the argument
The full interior-point state consisting of primal solution, dual multipliers, slack variables, and barrier parameter, predicted by a topology-conditioned encode-process-decode interaction network on the heterogeneous constraint graph.
If this is right
- Primal-only warm-start methods cannot reduce interior-point iterations below the solver's default midpoint start in AC-OPF problems.
- Only predictors that also supply dual and barrier information can reach the observed 85 percent iteration reduction shown by oracles.
- Evaluation protocols for warm-start research must adopt the solver default rather than flat starts as the reference point.
- Graph-based models can deliver warm-starts that adapt to N-1 contingency topology changes without retraining.
Where Pith is reading between the lines
- The observed anticorrelation between primal accuracy and convergence speed may reflect a general property of barrier methods rather than a quirk of AC-OPF.
- Benchmark corrections of the kind introduced here are likely needed wherever machine learning is applied to warm-starting of interior-point or other path-following solvers.
- The same full-state prediction approach could be tested on other interior-point implementations beyond IPOPT to check whether the 76 percent reduction generalizes.
Load-bearing premise
That the variable-bound midpoint is the solver's actual default starting point and remains near-optimal for log-barrier centrality across the tested AC-OPF instances and IPOPT configuration.
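The centrality part of this premise can be illustrated on box bounds alone. The following is a minimal sketch, assuming the only constraints are l ≤ x ≤ u (the real AC-OPF problem also has equality and nonlinear constraints, which are ignored here). Under that assumption the log-barrier term is minimised exactly at the midpoint. The helper names and the toy bounds are illustrative and do not come from the paper.

```python
import math

def box_barrier(x, lower, upper):
    """Log-barrier value for box bounds: -sum(log(x - l) + log(u - x))."""
    return -sum(
        math.log(xi - lo) + math.log(hi - xi)
        for xi, lo, hi in zip(x, lower, upper)
    )

def midpoint(lower, upper):
    """Variable-bound midpoint (l + u) / 2, elementwise."""
    return [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]

# Toy bounds (hypothetical values, chosen only for illustration).
lower = [0.0, -2.0]
upper = [4.0, 1.0]

x_mid = midpoint(lower, upper)   # centre of the box
x_off = [3.5, 0.5]               # strictly feasible, but closer to the bounds

barrier_mid = box_barrier(x_mid, lower, upper)
barrier_off = box_barrier(x_off, lower, upper)

# The midpoint has the smaller barrier value, i.e. it is better centred.
print(barrier_mid, barrier_off)
```

Whether this carries over to the full problem, where nonlinear constraints also shape the central path, is exactly what the premise leaves open.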
What would settle it
A controlled experiment in which a primal-only warm-start method produces fewer iterations than the midpoint baseline on a fresh set of AC-OPF cases run with the same IPOPT settings would falsify the central claim.
Original abstract
Solving AC Optimal Power Flow (AC-OPF) is of central importance in electricity market operations, where interior-point methods (IPMs) such as IPOPT are the standard solvers. A growing body of work uses machine learning to predict primal warm-start iterates, reporting iteration reductions of 30-46%. We show that these reported gains rest on an inappropriate evaluation baseline: prior methods benchmark against the flat start $V_m = 1, V_a = 0$, whereas the solver's actual default, the variable-bound midpoint $(l+u)/2$, is near-optimal for log-barrier centrality. Against this corrected baseline, no primal-only warm-start method reduces solver iterations. We trace the failure to a geometric property of interior-point methods: primal prediction accuracy is anticorrelated with convergence speed, and providing the ground-truth optimal solution $x^*$ without dual variables causes the solver to diverge. Oracle experiments establish that the complete primal-dual-barrier state $(x^*, \lambda^*, z^*, \mu^*)$ reduces IPOPT iterations from 23 to 3, an 85% reduction that is structurally inaccessible to primal-only methods. To enable rigorous evaluation of warm-start methods on this task, we release a benchmark suite comprising dual-labeled AC-OPF datasets with IPOPT-extracted solutions, a corrected evaluation protocol, and WARP, a topology-conditioned encode-process-decode interaction network that predicts the full interior-point state $(\hat{x}, \hat{\lambda}, \hat{z}, \hat{\mu})$ on the heterogeneous constraint graph. WARP achieves a 76% reduction in IPOPT iterations while natively accommodating N-1 contingency topology variations without retraining.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that prior machine learning warm-start methods for interior-point solvers on AC Optimal Power Flow (AC-OPF) problems have used an inappropriate flat-start baseline (V_m=1, V_a=0) instead of the solver's default variable-bound midpoint (l+u)/2, which is near-optimal for log-barrier centrality. Against this corrected baseline, no primal-only warm-start reduces iterations, and providing only the primal optimum x* can cause divergence due to an anticorrelation between primal accuracy and convergence speed. Oracle experiments show that the full primal-dual-barrier state reduces IPOPT iterations from 23 to 3 (85% reduction). The authors release dual-labeled AC-OPF datasets, a corrected evaluation protocol, and WARP, a topology-conditioned encode-process-decode graph network that predicts the full state (x, λ, z, μ) and achieves a 76% iteration reduction while handling N-1 contingencies without retraining.
Significance. If the empirical findings hold, the work would correct a methodological flaw in ML-for-optimization research on warm-starting, establish that dual and barrier predictions are structurally necessary for IPM acceleration, and supply a reusable benchmark with dual-labeled data that enables rigorous comparison. The oracle results and topology-handling capability of WARP are particularly notable strengths that could influence solver design beyond AC-OPF.
major comments (3)
- [§4 and §5.1] §4 (Evaluation Protocol) and §5.1 (Baseline Comparison): The central claim that no primal-only method reduces iterations rests on (l+u)/2 being both IPOPT's actual default initialization and near-optimal for centrality. This equivalence is asserted but not directly verified against IPOPT source, options (e.g., warm_start_init_point), or bound heuristics; if the solver's internal start differs, the dismissal of prior primal-only methods and the necessity of full-state prediction do not follow.
- [§5.2] §5.2 (Anticorrelation Analysis): The reported anticorrelation between primal prediction accuracy and solver convergence speed is observed only on the tested AC-OPF instances under the midpoint baseline; without statistical tests (e.g., correlation coefficients or cross-instance validation) or experiments on other problem classes, this geometric property cannot yet be treated as general.
- [§6.3] §6.3 (Oracle Experiments): The reduction from 23 to 3 iterations when supplying the complete (x*, λ*, z*, μ*) state is load-bearing for the argument that primal-only methods are structurally limited, yet the exact IPOPT configuration, barrier update schedule, and handling of μ* are not specified, making reproduction and generalization difficult.
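The statistical check requested for §5.2 could be as simple as a Pearson coefficient between per-instance primal error and iteration count. The sketch below uses plain Python; the numbers are made-up placeholders, not results from the paper, and `pearson_r` is an illustrative helper. A negative coefficient here means that lower primal error goes with more iterations, which is the direction the paper reports.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Placeholder data (hypothetical): primal error of the warm start and the
# resulting solver iteration count, one pair per test instance.
primal_error = [0.50, 0.30, 0.20, 0.10, 0.05]
iterations = [20, 22, 25, 27, 31]

r = pearson_r(primal_error, iterations)
print(f"Pearson r = {r:.3f}")
```

A p-value would additionally need a t-distribution, for example via `scipy.stats.pearsonr`, which is left out to keep the sketch dependency-free.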
minor comments (2)
- [Figures/Tables] Figure 2 and Table 1: Axis labels and captions should explicitly state the IPOPT version, tolerance settings, and whether iteration counts include the initial factorization.
- [Notation] Notation: The symbols for the barrier parameter and dual variables are introduced clearly but should be summarized in a single table for quick reference when comparing WARP predictions to ground truth.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment below.
- Referee: [§4 and §5.1] §4 (Evaluation Protocol) and §5.1 (Baseline Comparison): The central claim that no primal-only method reduces iterations rests on (l+u)/2 being both IPOPT's actual default initialization and near-optimal for centrality. This equivalence is asserted but not directly verified against IPOPT source, options (e.g., warm_start_init_point), or bound heuristics; if the solver's internal start differs, the dismissal of prior primal-only methods and the necessity of full-state prediction do not follow.
  Authors: We appreciate this observation. Upon re-examination of the IPOPT source code (version 3.14.4), the default initialization in the IpIpoptApplication class indeed uses the midpoint of the variable bounds when no initial point is provided and warm_start_init_point is set to 'no'. We will add this verification, including relevant code excerpts and option settings, to Section 4 in the revised manuscript to substantiate the baseline choice. This does not alter our conclusions but strengthens the presentation.
  Revision: yes
- Referee: [§5.2] §5.2 (Anticorrelation Analysis): The reported anticorrelation between primal prediction accuracy and solver convergence speed is observed only on the tested AC-OPF instances under the midpoint baseline; without statistical tests (e.g., correlation coefficients or cross-instance validation) or experiments on other problem classes, this geometric property cannot yet be treated as general.
  Authors: The referee is correct that we have not included formal statistical tests in the current version. In the revision, we will compute and report Pearson correlation coefficients with p-values for the anticorrelation between primal error and iteration count across all test cases. We will also include a short theoretical explanation linking this to the log-barrier centrality condition. While the paper focuses on AC-OPF and does not claim generality to all IPM problems, we will clarify this scope and note that similar behavior has been observed in related literature on IPMs. No experiments on other classes are added as they fall outside the paper's scope.
  Revision: partial
- Referee: [§6.3] §6.3 (Oracle Experiments): The reduction from 23 to 3 iterations when supplying the complete (x*, λ*, z*, μ*) state is load-bearing for the argument that primal-only methods are structurally limited, yet the exact IPOPT configuration, barrier update schedule, and handling of μ* are not specified, making reproduction and generalization difficult.
  Authors: We agree that additional details are necessary for reproducibility. In the revised manuscript and the accompanying code repository, we will provide the complete IPOPT configuration used for the oracle experiments, including the barrier parameter update strategy (mu_strategy = 'adaptive'), initial mu value, tolerance settings, and how the predicted μ* is incorporated (via the mu_init option). A reproduction script will be added to the benchmark suite.
  Revision: yes
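The options named in this response correspond to documented IPOPT settings, so the promised configuration might look roughly like the dictionary below. This is a sketch under stated assumptions: the numeric values are placeholders rather than the paper's settings, and `oracle_warm_start_options` is a hypothetical helper, not part of the released benchmark.

```python
def oracle_warm_start_options(mu_star, tol=1e-4):
    """Build an IPOPT options dict for a primal-dual warm start.

    mu_star: barrier parameter taken from a converged or predicted state.
    tol: convergence tolerance (placeholder; the oracle setting is not given).
    """
    if mu_star <= 0:
        raise ValueError("the barrier parameter must be positive")
    return {
        # Ask IPOPT to use the supplied primal and dual values.
        "warm_start_init_point": "yes",
        # Barrier update strategy named in the rebuttal.
        "mu_strategy": "adaptive",
        # Initial barrier value. IPOPT's documentation describes mu_init as
        # relevant to the monotone strategy, so its effect under "adaptive"
        # is worth verifying before relying on it.
        "mu_init": float(mu_star),
        "tol": tol,
    }

options = oracle_warm_start_options(mu_star=1e-6)
for name, value in options.items():
    print(name, value)
```

With cyipopt, each entry would typically be passed through `Problem.add_option(name, value)`, and the primal and dual iterates through `Problem.solve(x0, lagrange=..., zl=..., zu=...)`; both should be checked against the installed cyipopt version.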
Circularity Check
No significant circularity; claims rest on external empirical benchmarks
The paper's argument chain consists of empirical comparisons: prior primal-only warm-starts are tested against the variable-bound midpoint baseline using IPOPT runs on AC-OPF instances, oracle experiments measure iteration reductions from the full primal-dual state, and WARP (a trained encode-process-decode network) is evaluated on held-out instances for a 76% reduction. No step reduces by construction to fitted parameters, self-citations, or ansatzes; performance metrics derive from independent solver executions rather than internal redefinitions or renamings. The model training uses data but the central claims (baseline correction, anticorrelation observation, and WARP gains) are falsifiable against external runs and do not collapse to the inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- WARP network weights
axioms (1)
- domain assumption: The variable-bound midpoint (l+u)/2 is the solver's actual default start and near-optimal for log-barrier centrality.
Reference graph
Works this paper leans on
- [1] Load the OPFDataset HeteroData graph containing bus, generator, load, and branch data.
- [2] Construct the cyipopt NLP problem with exact Hessian and sparse Jacobian structure.
- [3] Initialise IPOPT at the midpoint $(l+u)/2$ with default dual initialisation.
- [4] Run IPOPT to convergence (tolerance $10^{-4}$).
- [5] Extract the full converged state: $D_i = (x_i^*, \lambda_i^*, z_{l,i}^*, z_{u,i}^*, \mu_i^*, f(x_i^*))$.
- [6] Save as a PyTorch tensor file: data/duals/case118/{split}/duals_{idx:06d}.pt.
  E.2 Convergence statistics
  Table 18: Dual label extraction statistics for case118.
  Split       Instances  Converged  Rate  Mean time (s)  Total time (h)
  Train       5,000      5,000      100%  2.5            3.5
  Validation  500        500        100%  2.5            0.35
  Test        50         50         100%  2.5            0.035
  E.3 Dual variable distributions
  The extracted d...
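The record extracted in step [5] and the file layout in step [6] can be sketched as follows. The field names, `DualLabel`, `label_path`, and `total_hours` are hypothetical and chosen for illustration; the actual files are PyTorch tensors, and saving with `torch.save` is omitted so the sketch stays dependency-free. The last helper only checks that the totals in Table 18 are consistent with instances times mean time.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DualLabel:
    """One converged interior-point state D_i (field names are illustrative)."""
    x: List[float]         # primal solution x*
    lam: List[float]       # constraint multipliers lambda*
    z_lower: List[float]   # lower-bound multipliers z_l*
    z_upper: List[float]   # upper-bound multipliers z_u*
    mu: float              # final barrier parameter mu*
    objective: float       # objective value f(x*)

def label_path(split: str, idx: int, case: str = "case118") -> str:
    """File path following the pattern quoted in step [6]."""
    return f"data/duals/{case}/{split}/duals_{idx:06d}.pt"

def total_hours(instances: int, mean_seconds: float) -> float:
    """Total extraction time in hours from instance count and mean time."""
    return instances * mean_seconds / 3600.0

path = label_path("train", 42)
train_hours = total_hours(5000, 2.5)   # about 3.47 h, reported as 3.5 h

print(path)
print(round(train_hours, 2))
```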
- [7] Direct concatenation: $(P^d_i, Q^d_i)$ are appended to the feature vector of the bus node at which the load is connected, and to the feature vectors of all generators connected to that bus.
- [8] Global load skip: the sum of all loads $\sum_i (P^d_i, Q^d_i)$ is passed through a small MLP and concatenated to the generator decoder input, providing a global demand signal. Without load injection, the model outputs near-constant predictions (pred std ∼0.01–0.20 versus true std ∼0.6–1.0, correlation ≈0 for all variables), as the static graph features carry n...
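The two load-injection paths described above can be sketched on plain Python containers. This is an illustration under assumptions, not the released implementation: demand is summed per bus, buses without load receive (0.0, 0.0), the MLP applied to the global sum is omitted, and `inject_loads` and the toy network are hypothetical.

```python
def inject_loads(bus_feats, gen_feats, gen_bus, loads):
    """Append load demand to bus and generator features.

    bus_feats: dict bus_id -> list of static features
    gen_feats: dict gen_id -> list of static features
    gen_bus:   dict gen_id -> bus_id the generator is connected to
    loads:     list of (bus_id, p_d, q_d) tuples
    """
    # Aggregate demand per bus; buses without load get zeros (assumption).
    demand = {bus: [0.0, 0.0] for bus in bus_feats}
    for bus, p_d, q_d in loads:
        demand[bus][0] += p_d
        demand[bus][1] += q_d

    # Direct concatenation onto bus nodes and onto generators at that bus.
    new_bus = {b: feats + demand[b] for b, feats in bus_feats.items()}
    new_gen = {g: feats + demand[gen_bus[g]] for g, feats in gen_feats.items()}

    # Input to the global load skip: total (P_d, Q_d) over all loads.
    total = [sum(p for _, p, _ in loads), sum(q for _, _, q in loads)]
    return new_bus, new_gen, total

# Toy three-bus network (hypothetical values).
bus_feats = {0: [1.0], 1: [1.0], 2: [1.0]}
gen_feats = {"g0": [0.5], "g1": [0.7]}
gen_bus = {"g0": 0, "g1": 2}
loads = [(0, 0.25, 0.5), (1, 0.5, 0.25)]

new_bus, new_gen, total = inject_loads(bus_feats, gen_feats, gen_bus, loads)
print(new_bus, new_gen, total)
```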
- [9] Edge updates broke the 1.0 loss floor. The original node-only GNN plateaued at val loss 1.0 regardless of training configuration. Adding edge updates (Exp E) reduced loss to 0.45, the first time any GNN variant dropped below 1.0. This suggests that edge features carry information critical for dual prediction that node-only message passing cannot capture.
- [10] Loss strategies provided modest iteration gains. Binding-mask loss and two-stage decoding each independently reduced iterations from 7.0 to 6.7, but neither reduced validation loss substantially. The binding mask helps the model allocate capacity to the sparse but critical binding multipliers; two-stage decoding conditions dual prediction on predicted primals.
- [11] Removing node residuals was the decisive change. Val loss dropped from 0.45 to 0.09 (5×), and IPOPT iterations from 7.0 to 5.4, from a single architectural modification. This is the largest single-modification gain in the entire ablation.
- [12] Further refinements hit a ceiling at 5.3. Per-node bias (1,268 additional parameters) and two-stage decoding each independently reached 5.3 iterations. Combining all three (best_combo, 500 epochs) did not push below 5.4, indicating an architectural ceiling for this model family on case118.
- [13] Wider models do not help. $H=256$ (∼25M params) achieved 6.6 iterations, worse than $H=128$. The additional capacity introduces optimisation difficulty without improving representational quality at this problem scale.
- [14] Physics loss was counterproductive. Adding an AC power balance violation loss (Exp E2) increased val loss from 0.45 to 1.10 and worsened iterations from 7.0 to 7.2. The physics loss conflicts with the per-variable normalisation: the power balance residual operates in physical units, creating a scale mismatch with the normalised MSE.
  H Independent CANOS ...
- [15] The noise prediction task is harder than direct regression. The diffusion model must learn $\epsilon(x_t, t)$ at every noise level, a strictly harder mapping than direct $x_0$ prediction.
- [16] DDIM sampling introduces cumulative error. Each of the 50 denoising steps contributes a small approximation error that compounds.
- [17] The KKT scoring proxy is approximate. A full KKT residual computation (requiring Jacobian evaluation) would be more accurate but also more expensive.
- [18] Case118 is effectively unimodal. Each load scenario maps to a single well-separated optimum. Multi-sample diversity provides no benefit when the solution mapping is deterministic.
  5. $K=5$ is worse than $K=1$. The scoring function may select atypical samples with low complementarity proxy but poor overall KKT satisfaction, suggesting the proxy metric is not...
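The best-of-K selection discussed in these fragments can be illustrated with a simple scoring rule. The excerpt does not define the complementarity proxy, so the form used here is an assumption: the mean absolute deviation of each bound multiplier times its slack from the barrier parameter μ, which is zero when the perturbed complementarity conditions hold exactly. `complementarity_proxy`, `select_best`, and the toy samples are hypothetical.

```python
def complementarity_proxy(x, z_lower, z_upper, lower, upper, mu):
    """Mean |z * slack - mu| over lower and upper bounds (assumed form)."""
    terms = []
    for xi, zl, zu, lo, hi in zip(x, z_lower, z_upper, lower, upper):
        terms.append(abs(zl * (xi - lo) - mu))   # lower-bound complementarity
        terms.append(abs(zu * (hi - xi) - mu))   # upper-bound complementarity
    return sum(terms) / len(terms)

def select_best(samples, lower, upper, mu):
    """Return the index of the sample with the smallest proxy, plus all scores."""
    scores = [
        complementarity_proxy(s["x"], s["z_lower"], s["z_upper"], lower, upper, mu)
        for s in samples
    ]
    best = min(range(len(samples)), key=scores.__getitem__)
    return best, scores

# Toy one-variable problem with K = 2 candidate samples (hypothetical values).
lower, upper, mu = [0.0], [2.0], 0.5
samples = [
    {"x": [1.0], "z_lower": [0.5], "z_upper": [0.5]},
    {"x": [0.5], "z_lower": [0.5], "z_upper": [0.5]},
]

best, scores = select_best(samples, lower, upper, mu)
print(best, scores)
```

Because the score looks only at complementarity, a sample can win while having large stationarity or feasibility residuals, which is one way the behaviour described for $K=5$ could arise.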
- [19] developed gauge-map projections for problems with linear constraints, while Liang et al
- [20] proposed homeomorphic projections for non-convex feasible regions. More recently, Chen et al. [2024] trained networks to predict feasible dual solutions, recovering associated primals via the stationarity condition. Our objective differs from these approaches: we seek to reduce solver iterations while retaining the feasibility and optimality guarantees of...
- [21] introduced the idea of learning optimiser update rules. Sambharya et al. [2024] learned warm-starts for fixed-point splitting methods on QPs by differentiating through unrolled solver iterations. Briden et al. [2024] proposed Lagrangian-informed losses for warm-starting trajectory optimisation under an SQP solver. Graph neural networks for physical simul...
- [22] provided an open-source reimplementation with physics-informed branch flow derivations. Liu et al. [2022] developed topology-aware GNNs with physics-based feasibility regularisation, demonstrating adaptivity to topological perturbations. We adopt the same architectural family but extend it to predict the full primal-dual-barrier state, a task that these pr...