Krylov Subspace Acceleration for First-Order Splitting Methods in Convex Quadratic Programming
Pith reviewed 2026-05-17 23:55 UTC · model grok-4.3
The pith
A Krylov subspace acceleration scheme for first-order methods on convex quadratic programs outperforms Anderson acceleration in iteration count and often in computation time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Krylov subspace acceleration, constructed from the history of residuals generated by a first-order method on a convex QP, produces a new iterate by solving a small least-squares problem over that subspace; this approach remains numerically stable where Anderson acceleration breaks down and delivers lower iteration counts on standard QP collections from optimal control and machine learning.
What carries the argument
The Krylov subspace acceleration scheme, which assembles a subspace from successive residuals of the first-order iteration and finds the minimal-residual combination within that subspace, analogous to GMRES but adapted to the piecewise-affine operators that arise from splitting methods on QPs.
If this is right
- Lower iteration counts for first-order splitting methods on convex QPs from standard benchmark sets.
- Particularly large savings in iteration count and runtime for optimal control problems and high-accuracy solves.
- Avoidance of the numerical instability that Anderson acceleration exhibits during slow convergence phases.
- Direct applicability to the splitting methods already used in model predictive control and statistical learning solvers.
Where Pith is reading between the lines
- The same subspace construction could be applied to first-order methods on other problems whose iteration operators are piecewise affine.
- The method opens the possibility of deriving convergence-rate bounds that exploit the finite-dimensional Krylov property rather than mixing weights.
- Embedding the accelerator inside existing QP solvers could shorten solution times in embedded or real-time optimal control loops.
- Scaling tests on larger QPs might show whether the small dense least-squares step remains negligible relative to the overall cost.
Load-bearing premise
That first-order methods applied to convex quadratic programs typically consist of piecewise-affine operators.
What would settle it
Running the proposed Krylov scheme and Anderson acceleration side-by-side on the same collection of MPC and statistical-learning QPs and finding that the Krylov method does not produce strictly fewer iterations on average, particularly for optimal-control instances solved to high accuracy.
Figures
read the original abstract
We propose an acceleration scheme for first-order methods (FOMs) for convex quadratic programs (QPs) that is analogous to Anderson acceleration and the Generalized Minimal Residual algorithm for linear systems. We motivate our proposed method from the observation that FOMs applied to QPs typically consist of piecewise-affine operators. We describe our Krylov subspace acceleration scheme, contrasting it with existing Anderson acceleration schemes and showing that it largely avoids the latter's well-known ill-conditioning issues in regions of slow convergence. We demonstrate the performance of our scheme relative to Anderson acceleration using standard collections of problems from model predictive control and statistical learning applications. We show that our method is faster than Anderson acceleration across the board in terms of iteration count, and in many cases in computation time, particularly for optimal control and for problems solved to high accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Krylov subspace acceleration scheme for first-order splitting methods on convex quadratic programs. Motivated by the piecewise-affine nature of the operators arising from proximal or splitting steps, the method builds a subspace from successive residuals and solves a small least-squares problem at each iteration, analogous to GMRES. It is contrasted with Anderson acceleration and claimed to avoid progressive ill-conditioning in slow-convergence regimes. Numerical tests on standard MPC and statistical learning collections report fewer iterations than Anderson acceleration across the board and lower wall-clock time in many cases, especially for optimal control problems and high-accuracy solves.
Significance. If the conditioning and stability properties hold under active-set changes, the approach could provide a practical, more robust acceleration technique for first-order QP solvers in control and learning applications. The explicit use of the piecewise-affine structure to motivate the subspace construction is a distinctive angle that, if supported, would strengthen the contribution beyond standard empirical acceleration studies.
major comments (2)
- [§3] §3 (Krylov subspace construction and motivation): The claim that the piecewise-affine character of the operators ensures the Krylov least-squares problem remains better-conditioned than Anderson's mixing matrix when the active affine piece changes near the fixed point is not accompanied by analysis or targeted numerical diagnostics. This assumption is load-bearing for the central performance claim of stable high-accuracy acceleration.
- [Numerical experiments] Numerical experiments section: The manuscript asserts superiority 'across the board' in iteration count and 'in many cases' in computation time, yet supplies no details on experimental setup, problem dimensions, stopping tolerances, hardware, number of instances, or statistical measures. This prevents verification of the headline empirical results.
minor comments (2)
- The description of the QR factorization step for the small least-squares problem would benefit from an explicit algorithm listing or pseudocode box for reproducibility.
- A few sentences clarifying the precise relation between the proposed residual-based subspace and classical GMRES (e.g., whether restarts or deflation are used) would improve readability.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We appreciate the recognition of the potential practical value of the proposed acceleration scheme and address each major comment below, indicating planned revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [§3] §3 (Krylov subspace construction and motivation): The claim that the piecewise-affine character of the operators ensures the Krylov least-squares problem remains better-conditioned than Anderson's mixing matrix when the active affine piece changes near the fixed point is not accompanied by analysis or targeted numerical diagnostics. This assumption is load-bearing for the central performance claim of stable high-accuracy acceleration.
Authors: We agree that the manuscript would benefit from additional support for the conditioning claim. The motivation in §3 rests on the piecewise-affine structure of the proximal/splitting operators for convex QPs, which implies that successive residuals reflect changes in the active affine piece; the Krylov least-squares problem then operates on a subspace spanned by these residuals rather than a fixed-size mixing matrix, thereby avoiding the progressive ill-conditioning observed in Anderson acceleration during slow convergence. While a full theoretical analysis of conditioning under arbitrary active-set changes lies beyond the present scope, we will add a short discussion clarifying this structural distinction together with targeted numerical diagnostics (e.g., condition-number histories of the least-squares matrices) in the revised §3. revision: partial
-
Referee: [Numerical experiments] Numerical experiments section: The manuscript asserts superiority 'across the board' in iteration count and 'in many cases' in computation time, yet supplies no details on experimental setup, problem dimensions, stopping tolerances, hardware, number of instances, or statistical measures. This prevents verification of the headline empirical results.
Authors: We apologize for the omission. The numerical section will be expanded to include a complete description of the experimental setup: problem dimensions drawn from the standard MPC and statistical-learning collections, stopping tolerances, hardware platform, number of instances per collection, and statistical measures (means, medians, and standard deviations of iteration counts and runtimes). These additions will enable independent verification of the reported performance advantages. revision: yes
Circularity Check
No circularity: derivation rests on external piecewise-affine observation and independent benchmarks
full rationale
The paper motivates the Krylov scheme from the standard observation that first-order methods on convex QPs yield piecewise-affine operators, then constructs an acceleration analogous to GMRES that solves a small least-squares problem on successive residuals. Performance is demonstrated via direct comparison on external benchmark collections (model predictive control and statistical learning problems) rather than on quantities defined from the method's own fitted parameters. No step reduces by construction to a self-definition, a fitted input renamed as prediction, or a load-bearing self-citation chain; the central claim of improved iteration count and stability therefore remains independent of the inputs it is evaluated against.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption First-order methods applied to convex quadratic programs typically consist of piecewise-affine operators.
Forward citations
Cited by 1 Pith paper
-
Anderson Acceleration for Linearly Converging SQP-Type Methods
Anderson acceleration speeds up local convergence of linearly converging SQP-type methods, with a heuristic for distant points, shown via optimal control examples in acados.
Reference graph
Works this paper leans on
-
[1]
S. Boyd and L. Vandenberghe,Convex Optimization. Cambridge: Cambridge University Press, 2004
work page 2004
-
[2]
A Splitting Method for Optimal Control,
B. O’Donoghue, G. Stathopoulos, and S. Boyd, “A Splitting Method for Optimal Control,”IEEE Transactions on Control Systems Technol- ogy, vol. 21, no. 6, pp. 2432–2442, 2013
work page 2013
-
[3]
Towards Proper Assessment of QP Algorithms for Embedded Model Predictive Control,
D. Kouzoupis, A. Zanelli, H. Peyrl, and H. J. Ferreau, “Towards Proper Assessment of QP Algorithms for Embedded Model Predictive Control,” in2015 European Control Conference (ECC), 2015, pp. 2609–2616
work page 2015
-
[4]
C. Cortes and V . Vapnik, “Support-Vector Networks,”Machine Learn- ing, vol. 20, no. 3, pp. 273–297, Sep. 1995
work page 1995
-
[5]
Regression Shrinkage and Selection Via the Lasso,
R. Tibshirani, “Regression Shrinkage and Selection Via the Lasso,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, Jan. 1996
work page 1996
-
[6]
Robust Estimation of a Location Parameter,
P. J. Huber, “Robust Estimation of a Location Parameter,” inBreak- throughs in Statistics: Methodology and Distribution, S. Kotz and N. L. Johnson, Eds. New York, NY: Springer New York, 1992, pp. 492–518. 8
work page 1992
-
[7]
Real-Time Convex Optimization in Signal Processing,
J. Mattingley and S. Boyd, “Real-Time Convex Optimization in Signal Processing,”IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 50– 61, May 2010
work page 2010
-
[8]
Multi-Period Trading via Convex Optimization,
S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nystrup, and J. Speth, “Multi-Period Trading via Convex Optimization,”Foun- dations and Trends® in Optimization, vol. 3, no. 1, pp. 1–76, 2017
work page 2017
-
[9]
J. Nocedal and S. J. Wright,Numerical Optimization, 2nd ed., ser. Springer Series in Operations Research and Financial Engineering. New York, NY: Springer, 2006
work page 2006
-
[10]
Numerical Experience with Lower Bounds for MIQP Branch-And-Bound,
R. Fletcher and S. Leyffer, “Numerical Experience with Lower Bounds for MIQP Branch-And-Bound,”SIAM Journal on Optimization, vol. 8, no. 2, pp. 604–616, May 1998
work page 1998
-
[11]
S. J. Wright,Primal-Dual Interior-Point Methods, ser. Other Titles in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104), 1997, no. 54
work page 1997
-
[12]
F. A. Potra and S. J. Wright, “Interior-point methods,”Numerical Analysis 2000. Vol. IV: Optimization and Nonlinear Equations, vol. 124, no. 1, pp. 281–302, Dec. 2000
work page 2000
-
[13]
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,”Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011
work page 2011
-
[14]
N. Parikh and S. Boyd, “Proximal Algorithms,”Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014
work page 2014
-
[15]
OSQP: An operator splitting solver for quadratic programs,
B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,”Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, Dec. 2020
work page 2020
-
[16]
E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson, “Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems,”IEEE Transactions on Automatic Control, vol. 60, no. 3, pp. 644–658, 2015
work page 2015
-
[17]
A. d’Aspremont, D. Scieur, and A. Taylor, “Acceleration Methods,” Foundations and Trends in Optimization, vol. 5, no. 1-2, pp. 1–245, 2021
work page 2021
-
[18]
Nesterov,Lectures on Convex Optimization, ser
Y . Nesterov,Lectures on Convex Optimization, ser. Springer Optimiza- tion and Its Applications. Cham: Springer International Publishing, 2018, vol. 137
work page 2018
-
[19]
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,
A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,”SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009
work page 2009
-
[20]
GMRES-Accelerated ADMM for Quadratic Objectives,
R. Y . Zhang and J. K. White, “GMRES-Accelerated ADMM for Quadratic Objectives,”SIAM Journal on Optimization, vol. 28, no. 4, pp. 3025–3056, 2018
work page 2018
-
[21]
Iterative Procedures for Nonlinear Integral Equa- tions,
D. G. Anderson, “Iterative Procedures for Nonlinear Integral Equa- tions,”Journal of the ACM, vol. 12, no. 4, pp. 547–560, Oct. 1965
work page 1965
-
[22]
Operator Splitting for a Homogeneous Embedding of the Linear Complementarity Problem,
B. O’Donoghue, “Operator Splitting for a Homogeneous Embedding of the Linear Complementarity Problem,”SIAM Journal on Optimiza- tion, vol. 31, no. 3, pp. 1999–2023, 2021
work page 1999
-
[23]
COSMO: A Conic Operator Splitting Method for Convex Conic Problems,
M. Garstka, M. Cannon, and P. Goulart, “COSMO: A Conic Operator Splitting Method for Convex Conic Problems,”Journal of Optimiza- tion Theory and Applications, vol. 190, no. 3, pp. 779–810, Sep. 2021
work page 2021
-
[24]
Safeguarded Anderson acceleration for parametric nonexpan- sive operators,
——, “Safeguarded Anderson acceleration for parametric nonexpan- sive operators,” in2022 European Control Conference (ECC), 2022, pp. 435–440
work page 2022
-
[25]
SCS: Splitting conic solver, version 3.2.8,
B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd, “SCS: Splitting conic solver, version 3.2.8,” Nov. 2023
work page 2023
-
[26]
GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems,
Y . Saad and M. H. Schultz, “GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems,”SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 3, pp. 856– 869, Jul. 1986
work page 1986
-
[27]
Anderson Acceleration for Fixed-Point Iterations,
H. F. Walker and P. Ni, “Anderson Acceleration for Fixed-Point Iterations,”SIAM Journal on Numerical Analysis, vol. 49, no. 4, pp. 1715–1735, 2011
work page 2011
-
[28]
L. N. Trefethen and D. Bau, III,Numerical Linear Algebra. Philadel- phia, PA: Society for Industrial and Applied Mathematics, 1997
work page 1997
-
[29]
H. H. Bauschke and P. L. Combettes,Convex Analysis and Monotone Operator Theory in Hilbert Spaces, ser. CMS Books in Mathematics. Cham: Springer International Publishing, 2017
work page 2017
-
[30]
E. K. Ryu and W. Yin,Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators. Cambridge: Cambridge Uni- versity Press, 2022
work page 2022
-
[31]
Verification of first-order methods for parametric quadratic optimization,
V . Ranjan and B. Stellato, “Verification of first-order methods for parametric quadratic optimization,”Mathematical Programming, Jul. 2025
work page 2025
-
[32]
R. Shefi and M. Teboulle, “Rate of Convergence Analysis of Decom- position Methods Based on the Proximal Method of Multipliers for Convex Minimization,”SIAM Journal on Optimization, vol. 24, no. 1, pp. 269–297, Jan. 2014
work page 2014
-
[33]
Degenerate Preconditioned Proximal Point Algorithms,
K. Bredies, E. Chenchene, D. A. Lorenz, and E. Naldi, “Degenerate Preconditioned Proximal Point Algorithms,”SIAM Journal on Opti- mization, vol. 32, no. 3, pp. 2376–2401, Sep. 2022
work page 2022
-
[34]
An introduction to continuous optimiza- tion for imaging,
A. Chambolle and T. Pock, “An introduction to continuous optimiza- tion for imaging,”Acta Numerica, vol. 25, pp. 161–319, 2016
work page 2016
-
[35]
Acceleration of Primal–Dual Methods by Preconditioning and Simple Subproblem Procedures,
Y . Liu, Y . Xu, and W. Yin, “Acceleration of Primal–Dual Methods by Preconditioning and Simple Subproblem Procedures,”Journal of Scientific Computing, vol. 86, no. 2, p. 21, Jan. 2021
work page 2021
-
[36]
Monotone Operators and the Proximal Point Algorithm,
R. T. Rockafellar, “Monotone Operators and the Proximal Point Algorithm,”SIAM Journal on Control and Optimization, vol. 14, no. 5, pp. 877–898, Aug. 1976
work page 1976
-
[37]
Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations,
J. Zhang, B. O’Donoghue, and S. Boyd, “Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations,”SIAM Journal on Optimization, vol. 30, no. 4, pp. 3170–3197, 2020
work page 2020
-
[38]
D. Boley, “Local Linear Convergence of the Alternating Direction Method of Multipliers on Quadratic or Linear Programs,”SIAM Journal on Optimization, vol. 23, no. 4, pp. 2183–2207, Jan. 2013
work page 2013
-
[39]
Safeguarded Learned Convex Optimization,
H. Heaton, X. Chen, Z. Wang, and W. Yin, “Safeguarded Learned Convex Optimization,”Proceedings of the AAAI Conference on Arti- ficial Intelligence, vol. 37, no. 6, pp. 7848–7855, Jun. 2023
work page 2023
-
[40]
T. A. Davis,Direct Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, 2006
work page 2006
-
[41]
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms,
S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, “Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms,” inProceedings of the 2007 ACM/IEEE Con- ference on Supercomputing, ser. Sc ’07. New York, NY , USA: Association for Computing Machinery, 2007
work page 2007
-
[42]
Roofline: An Insightful Visual Performance Model for Multicore Architectures,
S. Williams, A. Waterman, and D. Patterson, “Roofline: An Insightful Visual Performance Model for Multicore Architectures,”Communica- tions of the ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009
work page 2009
-
[43]
Clarabel: An interior-point solver for conic programs with quadratic objectives,
P. J. Goulart and Y . Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives,” May 2024
work page 2024
-
[44]
Theory of Inexact Krylov Subspace Methods and Applications to Scientific Computing,
V . Simoncini and D. B. Szyld, “Theory of Inexact Krylov Subspace Methods and Applications to Scientific Computing,”SIAM Journal on Scientific Computing, vol. 25, no. 2, pp. 454–477, Jan. 2003
work page 2003
-
[45]
A Practical and Optimal First-Order Method for Large-Scale Convex Quadratic Programming,
H. Lu and J. Yang, “A Practical and Optimal First-Order Method for Large-Scale Convex Quadratic Programming,”Mathematical Pro- gramming, Jul. 2025
work page 2025
-
[46]
Benchmarking optimization software with performance profiles,
E. D. Dolan and J. J. Mor ´e, “Benchmarking optimization software with performance profiles,”Mathematical Programming, vol. 91, no. 2, pp. 201–213, Jan. 2002
work page 2002
-
[47]
The University of Florida Sparse Matrix Collection,
T. A. Davis and Y . Hu, “The University of Florida Sparse Matrix Collection,”ACM Trans. Math. Softw., vol. 38, no. 1, Dec. 2011
work page 2011
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.