Presolving for GPU-Accelerated First-Order LP Solvers

Daniel Cederberg; Stephen Boyd

arxiv: 2604.23951 · v1 · submitted 2026-04-27 · 🧮 math.OC

Presolving for GPU-Accelerated First-Order LP Solvers

Daniel Cederberg , Stephen Boyd This is my paper

Pith reviewed 2026-05-08 02:55 UTC · model grok-4.3

classification 🧮 math.OC

keywords linear programmingpresolvingGPU accelerationfirst-order methodsoptimization solversproblem reduction

0 comments

The pith

A small set of inexpensive presolve rules matches most of the size reduction from Gurobi's commercial presolver and produces net speedups for GPU-accelerated LP solvers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether traditional LP presolving, which prioritizes maximum problem reduction even when costly, still fits when solvers run on fast GPUs. It presents a collection of basic presolve rules that together recover most of the size reduction achieved by Gurobi's full presolver while using far less time. When these rules are applied before the cuPDLPx solver, the total pipeline time drops substantially, even though presolving can occupy a noticeable share of runtime. The work supplies an open-source C implementation of the presolver so others can test and adopt the same lightweight approach.

Core claim

We identify a set of relatively simple presolve rules and show that a carefully engineered collection of these captures most of the reduction achieved by Gurobi's commercial state-of-the-art presolver, at a fraction of the cost. Moreover, we demonstrate that such lightweight presolving can yield substantial end-to-end speedups for the GPU-accelerated solver cuPDLPx, despite presolving sometimes constituting a significant fraction of the total runtime.

What carries the argument

The lightweight presolver built from a carefully engineered collection of simple presolve rules, released as the open-source PSLP library.

If this is right

Lightweight presolving continues to improve end-to-end performance for GPU-accelerated first-order LP solvers even when presolve time is non-negligible.
The same rules remain useful as GPU speed increases further.
The open-source PSLP implementation can be integrated into other solvers to obtain comparable gains.
Presolve rules should be selected and engineered with attention to their computational cost relative to the subsequent solve time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same cost-aware selection of rules could be applied to first-order solvers for quadratic or conic programs where sequential preprocessing competes with parallel solves.
Automatically choosing which subset of rules to apply based on quick problem statistics might reduce overhead further on particular instance classes.
The results point toward hardware-specific presolving strategies that trade off reduction depth against preprocessing latency rather than maximizing reduction alone.

Load-bearing premise

The benchmark instances used represent the linear programs that arise in practice.

What would settle it

Running the lightweight presolver and cuPDLPx on a new collection of industrial LPs and finding either far smaller size reductions than Gurobi or no net reduction in total solve time.

Figures

Figures reproduced from arXiv: 2604.23951 by Daniel Cederberg, Stephen Boyd.

**Figure 1.** Figure 1: Comparison of PSLP and Gurobi’s presolver on Mittelmann’s LP collection. Each bin shows the ratio of the nonzero entries in the reduced problem to that in the original problem. Above each bin, we show the number of nonzero entries in the original problem, and the presolve times of PSLP and Gurobi. The problems are sorted by the number of nonzero entries in the original problem. 14 view at source ↗

**Figure 2.** Figure 2: Comparison of PSLP and Gurobi’s presolver on the 50 largest MIPLIB relaxations. 15 view at source ↗

**Figure 3.** Figure 3: The speedup (i.e., the ratio of the total solve time without presolve to the total solve time with presolve) of enabling PSLP for cuPDLPx for each problem instance. 18 view at source ↗

read the original abstract

Recent research has focused on developing GPU-accelerated first-order solvers for linear programming (LP). This line of work, however, has largely overlooked the role of presolving, and thus prior results do not fully reflect the speedups achievable through GPU acceleration in a realistic end-to-end solver pipeline. At the same time, LP presolving has traditionally been developed for CPU-based solvers, where presolve time rarely dominates the total runtime and the emphasis has been on maximizing the reduction in problem size, even at the expense of costly presolve rules. Given the high performance of modern GPU-accelerated solvers and the inherently sequential nature of presolving, it is unclear whether this traditional approach to presolving remains appropriate. In this paper we revisit LP presolving from the perspective of GPU-accelerated first-order LP solvers. We identify a set of relatively simple presolve rules and show that a carefully engineered collection of these captures most of the reduction achieved by Gurobi's commercial state-of-the-art presolver, at a fraction of the cost. Moreover, we demonstrate that such lightweight presolving can yield substantial end-to-end speedups for the GPU-accelerated solver cuPDLPx, despite presolving sometimes constituting a significant fraction of the total runtime. These results suggest that lightweight presolving may remain beneficial as GPU performance continues to scale, while the sequential nature of presolving presumably does not. We accompany this paper with an open-source C implementation of an LP presolver, called PSLP (Presolver for Linear Programs). PSLP is battle-tested and has been adopted by the community, with integrations in cuPDLPx, cuOpt (NVIDIA's optimization library), and HPR-LP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A small set of cheap presolve rules gets most of Gurobi's size reduction for GPU first-order LP solvers and produces real end-to-end speedups on cuPDLPx.

read the letter

The main point is that the authors picked a narrow collection of simple, sequential presolve rules and showed they recover most of the variable and constraint reduction that Gurobi's full presolver delivers, while staying fast enough to help rather than hurt GPU-accelerated first-order methods. They also ship a clean open-source C implementation called PSLP that is already in use inside cuPDLPx and NVIDIA's cuOpt library. That combination of targeted rules plus working code is the concrete advance over prior work that treated presolve as an afterthought for these solvers.

Referee Report

1 major / 2 minor

Summary. The paper proposes a collection of relatively simple LP presolve rules and shows that a carefully engineered subset captures most of the problem-size reduction achieved by Gurobi's commercial presolver while incurring far lower computational cost. It further demonstrates that applying this lightweight presolving produces substantial end-to-end speedups for the GPU first-order solver cuPDLPx, even when presolve time becomes a non-negligible fraction of total runtime. The work is accompanied by an open-source C implementation (PSLP) that has already been integrated into cuPDLPx, cuOpt, and HPR-LP.

Significance. If the reported reduction ratios and speedups hold under broader testing, the results would be important for the practical deployment of GPU-accelerated first-order LP solvers. The paper correctly identifies that traditional CPU-centric presolving assumptions may not transfer directly to high-throughput GPU regimes and supplies concrete evidence that low-cost rules remain beneficial. The open-source release and documented community adoption constitute a clear strength.

major comments (1)

[§5] §5 (Experimental Evaluation): The central claims that the engineered rules 'capture most of the reduction achieved by Gurobi' and 'yield substantial end-to-end speedups' rest on comparisons performed on a fixed collection of benchmark instances. No diversity metrics (condition numbers, sparsity patterns, problem origins, or application domains) are reported for these instances. This omission is load-bearing because the skeptic's concern about representativeness directly affects the generalizability of both the reduction-ratio and speedup results.

minor comments (2)

[Abstract] Abstract: the phrase 'battle-tested' is used without supporting detail; a short paragraph in the implementation section describing the validation suite would improve clarity.
[§3] §3 (Presolve Rules): some rule descriptions reuse similar variable names across different reductions; a consistent notation table would aid readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and positive evaluation of our work. We address the single major comment below and will revise the manuscript accordingly to improve the presentation of our experimental results.

read point-by-point responses

Referee: [§5] §5 (Experimental Evaluation): The central claims that the engineered rules 'capture most of the reduction achieved by Gurobi' and 'yield substantial end-to-end speedups' rest on comparisons performed on a fixed collection of benchmark instances. No diversity metrics (condition numbers, sparsity patterns, problem origins, or application domains) are reported for these instances. This omission is load-bearing because the skeptic's concern about representativeness directly affects the generalizability of both the reduction-ratio and speedup results.

Authors: We agree that providing explicit diversity metrics would enhance the reader's ability to assess the generalizability of our findings. Our test set consists of instances drawn from standard benchmark collections including Netlib, MIPLIB, and other publicly available LP problems used in the literature for presolve and solver evaluations. These collections are known to encompass a range of problem sizes, structures, and origins. Nevertheless, to directly address the referee's concern, in the revised manuscript we will add a dedicated paragraph or table in Section 5 that reports summary statistics on key characteristics such as the distribution of problem dimensions, average sparsity (nonzeros per row/column), and available metadata on problem origins and domains. Where feasible, we will also include estimates of condition numbers or other numerical properties. This addition will not alter the core results but will provide the requested context for evaluating representativeness. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical rules and benchmarks are self-contained

full rationale

The paper identifies a collection of simple presolve rules and evaluates their reduction and speedups via direct empirical comparison to Gurobi on benchmark instances plus runtime tests on cuPDLPx. No derivations, predictions, or uniqueness claims reduce to fitted parameters, self-definitions, or self-citation chains; all load-bearing statements rest on external measurements and an open-source implementation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is algorithmic and empirical; the abstract introduces no new free parameters, mathematical axioms, or postulated entities. The presolve rules are standard techniques chosen for low cost.

pith-pipeline@v0.9.0 · 5605 in / 1135 out tokens · 76360 ms · 2026-05-08T02:55:06.496101+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

[1]

Applegate, M

[ADH+25] D. Applegate, M. D´ ıaz, O. Hinder, H. Lu, M. Lubin, B. O’Donoghue, and W. Schudy. PDLP: A Practical First-Order Method for Large-Scale Linear Pro- gramming.arXiv preprint arXiv:2501.07018,

work page arXiv
[2]

[DCB13] A

Available at https://cowles.yale.edu/sites/default/files/2022-09/m13-all.pdf. [DCB13] A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP Solver for Embedded Systems. In2013 European Control Conference (ECC), pages 3071–3076,

work page 2022
[3]

arXiv preprint arXiv:2405.12762 (2 024)

[GC24] P. Goulart and Y. Chen. Clarabel: An Interior-Point Solver for Conic Programs with Quadratic Objectives.arXiv preprint arXiv:2405.12762,

work page arXiv
[4]

Gleixner, G

[GHG+21] A. Gleixner, G. Hendel, G. Gamrath, T. Achterberg, M. Bastubbe, T. Berthold, P. Christophel, K. Jarck, T. Koch, J. Linderoth, M. L¨ ubbecke, H. Mittelmann, D. Ozyurt, T. Ralphs, D. Salvagnin, and Y. Shinano. MIPLIB 2017: Data-Driven Compilation of the 6th Mixed-Integer Programming Library.Mathematical Pro- gramming Computation, 13(3):443–490,

work page 2017
[5]

[LPY25] H. Lu, Z. Peng, and J. Yang. cuPDLPx: A Further Enhanced GPU-Based First-Order Solver for Linear Programming.arXiv preprint arXiv:2507.14051,

work page arXiv
[6]

[LYH+23] H. Lu, J. Yang, H. Hu, Q. Huangfu, J. Liu, T. Liu, Y. Ye, C. Zhang, and D. Ge. cuPDLP-C: A Strengthened Implementation of cuPDLP for Linear Programming by C Language.arXiv preprint arXiv:2312.14832,

work page arXiv
[7]

[MS03] C

Accessed: 2025-11-06. [MS03] C. M´ esz´ aros and U. Suhl. Advanced Preprocessing Techniques for Linear and Quadratic Programming.OR Spectrum, 25(4):575–595,

work page 2025

[1] [1]

Applegate, M

[ADH+25] D. Applegate, M. D´ ıaz, O. Hinder, H. Lu, M. Lubin, B. O’Donoghue, and W. Schudy. PDLP: A Practical First-Order Method for Large-Scale Linear Pro- gramming.arXiv preprint arXiv:2501.07018,

work page arXiv

[2] [2]

[DCB13] A

Available at https://cowles.yale.edu/sites/default/files/2022-09/m13-all.pdf. [DCB13] A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP Solver for Embedded Systems. In2013 European Control Conference (ECC), pages 3071–3076,

work page 2022

[3] [3]

arXiv preprint arXiv:2405.12762 (2 024)

[GC24] P. Goulart and Y. Chen. Clarabel: An Interior-Point Solver for Conic Programs with Quadratic Objectives.arXiv preprint arXiv:2405.12762,

work page arXiv

[4] [4]

Gleixner, G

[GHG+21] A. Gleixner, G. Hendel, G. Gamrath, T. Achterberg, M. Bastubbe, T. Berthold, P. Christophel, K. Jarck, T. Koch, J. Linderoth, M. L¨ ubbecke, H. Mittelmann, D. Ozyurt, T. Ralphs, D. Salvagnin, and Y. Shinano. MIPLIB 2017: Data-Driven Compilation of the 6th Mixed-Integer Programming Library.Mathematical Pro- gramming Computation, 13(3):443–490,

work page 2017

[5] [5]

[LPY25] H. Lu, Z. Peng, and J. Yang. cuPDLPx: A Further Enhanced GPU-Based First-Order Solver for Linear Programming.arXiv preprint arXiv:2507.14051,

work page arXiv

[6] [6]

[LYH+23] H. Lu, J. Yang, H. Hu, Q. Huangfu, J. Liu, T. Liu, Y. Ye, C. Zhang, and D. Ge. cuPDLP-C: A Strengthened Implementation of cuPDLP for Linear Programming by C Language.arXiv preprint arXiv:2312.14832,

work page arXiv

[7] [7]

[MS03] C

Accessed: 2025-11-06. [MS03] C. M´ esz´ aros and U. Suhl. Advanced Preprocessing Techniques for Linear and Quadratic Programming.OR Spectrum, 25(4):575–595,

work page 2025