pith. sign in

arxiv: 2511.06323 · v3 · submitted 2025-11-09 · 🧮 math.OC

Krylov Subspace Acceleration for First-Order Splitting Methods in Convex Quadratic Programming

Pith reviewed 2026-05-17 23:55 UTC · model grok-4.3

classification 🧮 math.OC
keywords Krylov subspace accelerationfirst-order methodsconvex quadratic programmingAnderson accelerationmodel predictive controloptimal controlsplitting methods
0
0 comments X

The pith

A Krylov subspace acceleration scheme for first-order methods on convex quadratic programs outperforms Anderson acceleration in iteration count and often in computation time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an acceleration scheme for first-order splitting methods applied to convex quadratic programs, built around a Krylov subspace construction. It draws motivation from the fact that these methods typically produce piecewise-affine operators, allowing the scheme to minimize residuals over a growing subspace without the mixing-coefficient ill-conditioning that affects Anderson acceleration in slow-convergence regions. Numerical tests on model predictive control and statistical learning problems confirm fewer iterations across the board, with additional gains in wall-clock time for optimal control instances and high-accuracy targets.

Core claim

The central claim is that a Krylov subspace acceleration, constructed from the history of residuals generated by a first-order method on a convex QP, produces a new iterate by solving a small least-squares problem over that subspace; this approach remains numerically stable where Anderson acceleration breaks down and delivers lower iteration counts on standard QP collections from optimal control and machine learning.

What carries the argument

The Krylov subspace acceleration scheme, which assembles a subspace from successive residuals of the first-order iteration and finds the minimal-residual combination within that subspace, analogous to GMRES but adapted to the piecewise-affine operators that arise from splitting methods on QPs.

If this is right

  • Lower iteration counts for first-order splitting methods on convex QPs from standard benchmark sets.
  • Particularly large savings in iteration count and runtime for optimal control problems and high-accuracy solves.
  • Avoidance of the numerical instability that Anderson acceleration exhibits during slow convergence phases.
  • Direct applicability to the splitting methods already used in model predictive control and statistical learning solvers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subspace construction could be applied to first-order methods on other problems whose iteration operators are piecewise affine.
  • The method opens the possibility of deriving convergence-rate bounds that exploit the finite-dimensional Krylov property rather than mixing weights.
  • Embedding the accelerator inside existing QP solvers could shorten solution times in embedded or real-time optimal control loops.
  • Scaling tests on larger QPs might show whether the small dense least-squares step remains negligible relative to the overall cost.

Load-bearing premise

That first-order methods applied to convex quadratic programs typically consist of piecewise-affine operators.

What would settle it

Running the proposed Krylov scheme and Anderson acceleration side-by-side on the same collection of MPC and statistical-learning QPs and finding that the Krylov method does not produce strictly fewer iterations on average, particularly for optimal-control instances solved to high accuracy.

Figures

Figures reproduced from arXiv: 2511.06323 by Gabriel Berk Pereira, Paul J. Goulart.

Figure 1
Figure 1. Figure 1: Histogram of ratio between SpMV (P x, Ax, and A⊤y) times with complex/real 64-bit floating point vectors. CSC matrices P and A taken from 108 mpc and sslsq problems with m, n ≤ 100 000. (3 samples with ratio smaller than 1 removed.) [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Histogram of ratio between time to solve linear systems in-place [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Relative time performance profile for tolerance [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 6
Figure 6. Figure 6: Relative time performance profile for tolerance [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 5
Figure 5. Figure 5: Relative time performance profile for tolerance [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
read the original abstract

We propose an acceleration scheme for first-order methods (FOMs) for convex quadratic programs (QPs) that is analogous to Anderson acceleration and the Generalized Minimal Residual algorithm for linear systems. We motivate our proposed method from the observation that FOMs applied to QPs typically consist of piecewise-affine operators. We describe our Krylov subspace acceleration scheme, contrasting it with existing Anderson acceleration schemes and showing that it largely avoids the latter's well-known ill-conditioning issues in regions of slow convergence. We demonstrate the performance of our scheme relative to Anderson acceleration using standard collections of problems from model predictive control and statistical learning applications. We show that our method is faster than Anderson acceleration across the board in terms of iteration count, and in many cases in computation time, particularly for optimal control and for problems solved to high accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Krylov subspace acceleration scheme for first-order splitting methods on convex quadratic programs. Motivated by the piecewise-affine nature of the operators arising from proximal or splitting steps, the method builds a subspace from successive residuals and solves a small least-squares problem at each iteration, analogous to GMRES. It is contrasted with Anderson acceleration and claimed to avoid progressive ill-conditioning in slow-convergence regimes. Numerical tests on standard MPC and statistical learning collections report fewer iterations than Anderson acceleration across the board and lower wall-clock time in many cases, especially for optimal control problems and high-accuracy solves.

Significance. If the conditioning and stability properties hold under active-set changes, the approach could provide a practical, more robust acceleration technique for first-order QP solvers in control and learning applications. The explicit use of the piecewise-affine structure to motivate the subspace construction is a distinctive angle that, if supported, would strengthen the contribution beyond standard empirical acceleration studies.

major comments (2)
  1. [§3] §3 (Krylov subspace construction and motivation): The claim that the piecewise-affine character of the operators ensures the Krylov least-squares problem remains better-conditioned than Anderson's mixing matrix when the active affine piece changes near the fixed point is not accompanied by analysis or targeted numerical diagnostics. This assumption is load-bearing for the central performance claim of stable high-accuracy acceleration.
  2. [Numerical experiments] Numerical experiments section: The manuscript asserts superiority 'across the board' in iteration count and 'in many cases' in computation time, yet supplies no details on experimental setup, problem dimensions, stopping tolerances, hardware, number of instances, or statistical measures. This prevents verification of the headline empirical results.
minor comments (2)
  1. The description of the QR factorization step for the small least-squares problem would benefit from an explicit algorithm listing or pseudocode box for reproducibility.
  2. A few sentences clarifying the precise relation between the proposed residual-based subspace and classical GMRES (e.g., whether restarts or deflation are used) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We appreciate the recognition of the potential practical value of the proposed acceleration scheme and address each major comment below, indicating planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§3] §3 (Krylov subspace construction and motivation): The claim that the piecewise-affine character of the operators ensures the Krylov least-squares problem remains better-conditioned than Anderson's mixing matrix when the active affine piece changes near the fixed point is not accompanied by analysis or targeted numerical diagnostics. This assumption is load-bearing for the central performance claim of stable high-accuracy acceleration.

    Authors: We agree that the manuscript would benefit from additional support for the conditioning claim. The motivation in §3 rests on the piecewise-affine structure of the proximal/splitting operators for convex QPs, which implies that successive residuals reflect changes in the active affine piece; the Krylov least-squares problem then operates on a subspace spanned by these residuals rather than a fixed-size mixing matrix, thereby avoiding the progressive ill-conditioning observed in Anderson acceleration during slow convergence. While a full theoretical analysis of conditioning under arbitrary active-set changes lies beyond the present scope, we will add a short discussion clarifying this structural distinction together with targeted numerical diagnostics (e.g., condition-number histories of the least-squares matrices) in the revised §3. revision: partial

  2. Referee: [Numerical experiments] Numerical experiments section: The manuscript asserts superiority 'across the board' in iteration count and 'in many cases' in computation time, yet supplies no details on experimental setup, problem dimensions, stopping tolerances, hardware, number of instances, or statistical measures. This prevents verification of the headline empirical results.

    Authors: We apologize for the omission. The numerical section will be expanded to include a complete description of the experimental setup: problem dimensions drawn from the standard MPC and statistical-learning collections, stopping tolerances, hardware platform, number of instances per collection, and statistical measures (means, medians, and standard deviations of iteration counts and runtimes). These additions will enable independent verification of the reported performance advantages. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation rests on external piecewise-affine observation and independent benchmarks

full rationale

The paper motivates the Krylov scheme from the standard observation that first-order methods on convex QPs yield piecewise-affine operators, then constructs an acceleration analogous to GMRES that solves a small least-squares problem on successive residuals. Performance is demonstrated via direct comparison on external benchmark collections (model predictive control and statistical learning problems) rather than on quantities defined from the method's own fitted parameters. No step reduces by construction to a self-definition, a fitted input renamed as prediction, or a load-bearing self-citation chain; the central claim of improved iteration count and stability therefore remains independent of the inputs it is evaluated against.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that first-order methods on QPs produce piecewise-affine operators and on the empirical performance observed on standard test sets. No free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption First-order methods applied to convex quadratic programs typically consist of piecewise-affine operators.
    This observation is stated as the motivation for the Krylov subspace construction and for the claimed avoidance of ill-conditioning.

pith-pipeline@v0.9.0 · 5435 in / 1305 out tokens · 42766 ms · 2026-05-17T23:55:02.659811+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Anderson Acceleration for Linearly Converging SQP-Type Methods

    math.OC 2026-04 unverdicted novelty 5.0

    Anderson acceleration speeds up local convergence of linearly converging SQP-type methods, with a heuristic for distant points, shown via optimal control examples in acados.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper

  1. [1]

    Boyd and L

    S. Boyd and L. Vandenberghe,Convex Optimization. Cambridge: Cambridge University Press, 2004

  2. [2]

    A Splitting Method for Optimal Control,

    B. O’Donoghue, G. Stathopoulos, and S. Boyd, “A Splitting Method for Optimal Control,”IEEE Transactions on Control Systems Technol- ogy, vol. 21, no. 6, pp. 2432–2442, 2013

  3. [3]

    Towards Proper Assessment of QP Algorithms for Embedded Model Predictive Control,

    D. Kouzoupis, A. Zanelli, H. Peyrl, and H. J. Ferreau, “Towards Proper Assessment of QP Algorithms for Embedded Model Predictive Control,” in2015 European Control Conference (ECC), 2015, pp. 2609–2616

  4. [4]

    Support-Vector Networks,

    C. Cortes and V . Vapnik, “Support-Vector Networks,”Machine Learn- ing, vol. 20, no. 3, pp. 273–297, Sep. 1995

  5. [5]

    Regression Shrinkage and Selection Via the Lasso,

    R. Tibshirani, “Regression Shrinkage and Selection Via the Lasso,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, Jan. 1996

  6. [6]

    Robust Estimation of a Location Parameter,

    P. J. Huber, “Robust Estimation of a Location Parameter,” inBreak- throughs in Statistics: Methodology and Distribution, S. Kotz and N. L. Johnson, Eds. New York, NY: Springer New York, 1992, pp. 492–518. 8

  7. [7]

    Real-Time Convex Optimization in Signal Processing,

    J. Mattingley and S. Boyd, “Real-Time Convex Optimization in Signal Processing,”IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 50– 61, May 2010

  8. [8]

    Multi-Period Trading via Convex Optimization,

    S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nystrup, and J. Speth, “Multi-Period Trading via Convex Optimization,”Foun- dations and Trends® in Optimization, vol. 3, no. 1, pp. 1–76, 2017

  9. [9]

    Nocedal and S

    J. Nocedal and S. J. Wright,Numerical Optimization, 2nd ed., ser. Springer Series in Operations Research and Financial Engineering. New York, NY: Springer, 2006

  10. [10]

    Numerical Experience with Lower Bounds for MIQP Branch-And-Bound,

    R. Fletcher and S. Leyffer, “Numerical Experience with Lower Bounds for MIQP Branch-And-Bound,”SIAM Journal on Optimization, vol. 8, no. 2, pp. 604–616, May 1998

  11. [11]

    S. J. Wright,Primal-Dual Interior-Point Methods, ser. Other Titles in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104), 1997, no. 54

  12. [12]

    Interior-point methods,

    F. A. Potra and S. J. Wright, “Interior-point methods,”Numerical Analysis 2000. Vol. IV: Optimization and Nonlinear Equations, vol. 124, no. 1, pp. 281–302, Dec. 2000

  13. [13]

    Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,

    S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,”Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011

  14. [14]

    Proximal Algorithms,

    N. Parikh and S. Boyd, “Proximal Algorithms,”Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014

  15. [15]

    OSQP: An operator splitting solver for quadratic programs,

    B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,”Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, Dec. 2020

  16. [16]

    Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems,

    E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson, “Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems,”IEEE Transactions on Automatic Control, vol. 60, no. 3, pp. 644–658, 2015

  17. [17]

    Acceleration Methods,

    A. d’Aspremont, D. Scieur, and A. Taylor, “Acceleration Methods,” Foundations and Trends in Optimization, vol. 5, no. 1-2, pp. 1–245, 2021

  18. [18]

    Nesterov,Lectures on Convex Optimization, ser

    Y . Nesterov,Lectures on Convex Optimization, ser. Springer Optimiza- tion and Its Applications. Cham: Springer International Publishing, 2018, vol. 137

  19. [19]

    A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,

    A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,”SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009

  20. [20]

    GMRES-Accelerated ADMM for Quadratic Objectives,

    R. Y . Zhang and J. K. White, “GMRES-Accelerated ADMM for Quadratic Objectives,”SIAM Journal on Optimization, vol. 28, no. 4, pp. 3025–3056, 2018

  21. [21]

    Iterative Procedures for Nonlinear Integral Equa- tions,

    D. G. Anderson, “Iterative Procedures for Nonlinear Integral Equa- tions,”Journal of the ACM, vol. 12, no. 4, pp. 547–560, Oct. 1965

  22. [22]

    Operator Splitting for a Homogeneous Embedding of the Linear Complementarity Problem,

    B. O’Donoghue, “Operator Splitting for a Homogeneous Embedding of the Linear Complementarity Problem,”SIAM Journal on Optimiza- tion, vol. 31, no. 3, pp. 1999–2023, 2021

  23. [23]

    COSMO: A Conic Operator Splitting Method for Convex Conic Problems,

    M. Garstka, M. Cannon, and P. Goulart, “COSMO: A Conic Operator Splitting Method for Convex Conic Problems,”Journal of Optimiza- tion Theory and Applications, vol. 190, no. 3, pp. 779–810, Sep. 2021

  24. [24]

    Safeguarded Anderson acceleration for parametric nonexpan- sive operators,

    ——, “Safeguarded Anderson acceleration for parametric nonexpan- sive operators,” in2022 European Control Conference (ECC), 2022, pp. 435–440

  25. [25]

    SCS: Splitting conic solver, version 3.2.8,

    B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd, “SCS: Splitting conic solver, version 3.2.8,” Nov. 2023

  26. [26]

    GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems,

    Y . Saad and M. H. Schultz, “GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems,”SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 3, pp. 856– 869, Jul. 1986

  27. [27]

    Anderson Acceleration for Fixed-Point Iterations,

    H. F. Walker and P. Ni, “Anderson Acceleration for Fixed-Point Iterations,”SIAM Journal on Numerical Analysis, vol. 49, no. 4, pp. 1715–1735, 2011

  28. [28]

    L. N. Trefethen and D. Bau, III,Numerical Linear Algebra. Philadel- phia, PA: Society for Industrial and Applied Mathematics, 1997

  29. [29]

    H. H. Bauschke and P. L. Combettes,Convex Analysis and Monotone Operator Theory in Hilbert Spaces, ser. CMS Books in Mathematics. Cham: Springer International Publishing, 2017

  30. [30]

    E. K. Ryu and W. Yin,Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators. Cambridge: Cambridge Uni- versity Press, 2022

  31. [31]

    Verification of first-order methods for parametric quadratic optimization,

    V . Ranjan and B. Stellato, “Verification of first-order methods for parametric quadratic optimization,”Mathematical Programming, Jul. 2025

  32. [32]

    Rate of Convergence Analysis of Decom- position Methods Based on the Proximal Method of Multipliers for Convex Minimization,

    R. Shefi and M. Teboulle, “Rate of Convergence Analysis of Decom- position Methods Based on the Proximal Method of Multipliers for Convex Minimization,”SIAM Journal on Optimization, vol. 24, no. 1, pp. 269–297, Jan. 2014

  33. [33]

    Degenerate Preconditioned Proximal Point Algorithms,

    K. Bredies, E. Chenchene, D. A. Lorenz, and E. Naldi, “Degenerate Preconditioned Proximal Point Algorithms,”SIAM Journal on Opti- mization, vol. 32, no. 3, pp. 2376–2401, Sep. 2022

  34. [34]

    An introduction to continuous optimiza- tion for imaging,

    A. Chambolle and T. Pock, “An introduction to continuous optimiza- tion for imaging,”Acta Numerica, vol. 25, pp. 161–319, 2016

  35. [35]

    Acceleration of Primal–Dual Methods by Preconditioning and Simple Subproblem Procedures,

    Y . Liu, Y . Xu, and W. Yin, “Acceleration of Primal–Dual Methods by Preconditioning and Simple Subproblem Procedures,”Journal of Scientific Computing, vol. 86, no. 2, p. 21, Jan. 2021

  36. [36]

    Monotone Operators and the Proximal Point Algorithm,

    R. T. Rockafellar, “Monotone Operators and the Proximal Point Algorithm,”SIAM Journal on Control and Optimization, vol. 14, no. 5, pp. 877–898, Aug. 1976

  37. [37]

    Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations,

    J. Zhang, B. O’Donoghue, and S. Boyd, “Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations,”SIAM Journal on Optimization, vol. 30, no. 4, pp. 3170–3197, 2020

  38. [38]

    Local Linear Convergence of the Alternating Direction Method of Multipliers on Quadratic or Linear Programs,

    D. Boley, “Local Linear Convergence of the Alternating Direction Method of Multipliers on Quadratic or Linear Programs,”SIAM Journal on Optimization, vol. 23, no. 4, pp. 2183–2207, Jan. 2013

  39. [39]

    Safeguarded Learned Convex Optimization,

    H. Heaton, X. Chen, Z. Wang, and W. Yin, “Safeguarded Learned Convex Optimization,”Proceedings of the AAAI Conference on Arti- ficial Intelligence, vol. 37, no. 6, pp. 7848–7855, Jun. 2023

  40. [40]

    T. A. Davis,Direct Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, 2006

  41. [41]

    Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms,

    S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, “Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms,” inProceedings of the 2007 ACM/IEEE Con- ference on Supercomputing, ser. Sc ’07. New York, NY , USA: Association for Computing Machinery, 2007

  42. [42]

    Roofline: An Insightful Visual Performance Model for Multicore Architectures,

    S. Williams, A. Waterman, and D. Patterson, “Roofline: An Insightful Visual Performance Model for Multicore Architectures,”Communica- tions of the ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009

  43. [43]

    Clarabel: An interior-point solver for conic programs with quadratic objectives,

    P. J. Goulart and Y . Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives,” May 2024

  44. [44]

    Theory of Inexact Krylov Subspace Methods and Applications to Scientific Computing,

    V . Simoncini and D. B. Szyld, “Theory of Inexact Krylov Subspace Methods and Applications to Scientific Computing,”SIAM Journal on Scientific Computing, vol. 25, no. 2, pp. 454–477, Jan. 2003

  45. [45]

    A Practical and Optimal First-Order Method for Large-Scale Convex Quadratic Programming,

    H. Lu and J. Yang, “A Practical and Optimal First-Order Method for Large-Scale Convex Quadratic Programming,”Mathematical Pro- gramming, Jul. 2025

  46. [46]

    Benchmarking optimization software with performance profiles,

    E. D. Dolan and J. J. Mor ´e, “Benchmarking optimization software with performance profiles,”Mathematical Programming, vol. 91, no. 2, pp. 201–213, Jan. 2002

  47. [47]

    The University of Florida Sparse Matrix Collection,

    T. A. Davis and Y . Hu, “The University of Florida Sparse Matrix Collection,”ACM Trans. Math. Softw., vol. 38, no. 1, Dec. 2011