pith. sign in

arxiv: 2605.17362 · v1 · pith:PDBV5L7Wnew · submitted 2026-05-17 · 💻 cs.LG

Learning Fill-in Reduction Ordering via Graph Policy Optimization for Sparse Matrices

Pith reviewed 2026-05-20 14:26 UTC · model grok-4.3

classification 💻 cs.LG
keywords sparse matrix reorderingfill-in reductiongraph neural networksreinforcement learningpolicy optimizationsymbolic factorizationmemory reductionSuiteSparse collection
0
0 comments X

The pith

A graph policy optimization method learns matrix reorderings that cut fill-ins by modeling both global sparsity patterns and local symbolic feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to find permutations of sparse matrices that minimize the extra nonzeros, called fill-ins, created during factorization, thereby lowering memory and runtime costs in linear solvers. It frames this NP-complete ordering task as a reinforcement learning problem solved via graph policy optimization. Both the policy and value networks rely on a multi-hop graph neural network to encode global fill-in information from the matrix sparsity pattern. The policy additionally queries symbolic factorization at each step to obtain exact local fill-in counts, and an adaptive saturation function aligns this local signal with the value network to stabilize training. On the SuiteSparse collection the approach reports average gains of 29.3 fewer fill-ins and 31.3 lower peak memory relative to prior heuristics and reinforcement-learning baselines.

Core claim

The central claim is that a graph policy optimization framework, whose policy and value networks both employ a multi-hop graph neural backbone to embed global fill-in structure while the policy further extracts local step-wise fill-in counts via symbolic factorization and aligns them to the value network through an adaptive saturation function, produces reorderings that reduce fill-ins and memory usage more effectively than existing methods.

What carries the argument

Graph policy optimization that combines a multi-hop graph neural backbone for global embedding with direct symbolic-factorization feedback for local fill-in counts, aligned to the value network by an adaptive saturation function.

If this is right

  • The resulting permutations directly lower the memory footprint and arithmetic cost of sparse LU or Cholesky factorization.
  • Training stability improves because local exact fill-in signals are scaled and fed back to the value network.
  • Both global sparsity patterns and step-level exact counts are used, avoiding the limitations of purely pattern-based or purely feedback-based prior approaches.
  • The learned policy generalizes across matrices in the SuiteSparse collection without hand-crafted heuristic rules.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same global-plus-local graph reinforcement-learning pattern could be tested on related combinatorial ordering tasks such as bandwidth minimization or nested dissection.
  • Domain-specific matrices from finite-element or circuit applications might show even larger gains if the symbolic factorization step is tuned to those structures.
  • If the graph neural backbone scales, the trained policy could be applied to matrices larger than those seen during training without retraining from scratch.

Load-bearing premise

The multi-hop graph neural networks can capture enough global fill-in information and the adaptive saturation function can successfully align local symbolic feedback to the value network for stable training.

What would settle it

On a fresh collection of sparse matrices drawn from the same distribution as SuiteSparse, the method produces no statistically significant reduction in fill-ins or peak memory compared with the strongest existing heuristic.

Figures

Figures reproduced from arXiv: 2605.17362 by Huiyuan Li, Shuzi Niu, Tao Yuan, Wenjia Wu, Ziwei Li.

Figure 3
Figure 3. Figure 3: The policy network takes the elimination graph [PITH_FULL_IMAGE:figures/full_fig_p002_3.png] view at source ↗
read the original abstract

Matrix reordering in large sparse solvers seeks a permutation that minimizes factorization fill-in to reduce memory and computation. Because the minimum fill-in ordering problem is NP-complete and fill-in is implicit in the sparsity pattern, graph-theoretic heuristics are used. Existing reinforcement learning methods either ignore sparsity patterns--missing the global fill-in--or lack local exact fill-in feedback. We propose a graph policy optimization method, modeling fill-ins from global and local views: both the policy and value networks use a multi-hop graph neural backbone to embed global fill-in; the policy further interacts with symbolic factorization over graphs to extract local, step-level fill-ins, and the resulting feedback is aligned with the value network via an adaptive saturation function to improve convergence. On the SuiteSparse Matrix Collection, our method achieves mean reductions of 29.3 in fill-ins and 31.3 in peak memory usage over state-of-the-art baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes a graph policy optimization method for learning fill-in reduction orderings in sparse matrices to minimize factorization fill-in. It employs a multi-hop graph neural network backbone in both policy and value networks to capture global fill-in information from sparsity patterns, augments the policy with local step-level fill-in feedback extracted via symbolic factorization over graphs, and aligns this feedback with the value network using an adaptive saturation function for improved convergence. Empirical evaluation on the SuiteSparse Matrix Collection reports mean reductions of 29.3 fill-ins and 31.3 in peak memory usage relative to state-of-the-art baselines.

Significance. If the reported performance gains prove robust and consistent, the work would offer a meaningful contribution by addressing gaps in prior RL approaches for this NP-complete problem through the combination of global graph embeddings and local exact symbolic feedback. This could improve practical sparse solvers in scientific computing, though the significance hinges on the reliability of the aggregate empirical results.

major comments (1)
  1. [Experimental results] Experimental results section: the central claim of mean reductions of 29.3 fill-ins and 31.3 peak memory usage is presented as aggregate figures without reported N (number of matrices), standard deviations, medians, or per-matrix/category breakdowns. This directly undermines assessment of whether the improvements are representative or potentially skewed by a small subset of matrices, as the headline numbers are load-bearing for the paper's empirical contribution.
minor comments (1)
  1. [Abstract] The abstract would benefit from explicitly naming the specific state-of-the-art baselines compared against and the total number of SuiteSparse matrices used in the evaluation for improved clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the single major comment below and will incorporate revisions to strengthen the empirical presentation.

read point-by-point responses
  1. Referee: [Experimental results] Experimental results section: the central claim of mean reductions of 29.3 fill-ins and 31.3 peak memory usage is presented as aggregate figures without reported N (number of matrices), standard deviations, medians, or per-matrix/category breakdowns. This directly undermines assessment of whether the improvements are representative or potentially skewed by a small subset of matrices, as the headline numbers are load-bearing for the paper's empirical contribution.

    Authors: We agree that the current presentation of aggregate means alone limits the ability to assess consistency and potential skew. The experiments were conducted on a subset of matrices drawn from the SuiteSparse Matrix Collection, yet the manuscript reports only the headline mean reductions without accompanying statistics. In the revised version we will add the exact number of matrices evaluated, standard deviations of the reported reductions, median values, and per-matrix or category-wise breakdowns (e.g., by matrix size or application domain). These additions will be placed in the experimental results section and, where space permits, summarized in the abstract and conclusion to make the robustness of the gains transparent. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results on external public benchmark

full rationale

The paper proposes a graph policy optimization method for fill-in reduction ordering and reports mean performance improvements (29.3 fill-ins, 31.3 peak memory) on the public SuiteSparse Matrix Collection against external SOTA baselines. These are direct empirical comparisons on an independent dataset collection. No derivation chain reduces a claimed result to a fitted parameter, self-citation, or self-referential definition by construction. The method description (multi-hop GNN backbone, symbolic factorization feedback, adaptive saturation) is architectural and does not create a closed loop where outputs are forced by inputs in the manner of the enumerated circularity patterns. This is a standard non-circular empirical ML paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard background facts about the NP-completeness of minimum fill-in ordering and the utility of graph models for sparsity patterns; no new free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • standard math The minimum fill-in ordering problem is NP-complete and therefore requires heuristics
    Stated as motivation in the abstract for using graph-theoretic and learning-based approximations.
  • domain assumption Graph neural networks can capture global fill-in patterns from the sparsity graph
    Implicit in the choice of multi-hop GNN backbone for the policy and value networks.

pith-pipeline@v0.9.0 · 5689 in / 1405 out tokens · 52701 ms · 2026-05-20T14:26:21.346755+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  1. [1]

    Many large-scale scien- tific simulations—such as structural engineering and computational fluid dynamics—lead to large sparse linear systemsAx=b

    INTRODUCTION Matrix reordering is a key step that can substantially reduce memory usage when solving sparse linear systems. Many large-scale scien- tific simulations—such as structural engineering and computational fluid dynamics—lead to large sparse linear systemsAx=b. A ma- trixA∈R n×n is sparse when only a small fraction of its entries are nonzero [1],...

  2. [2]

    Learning Fill-in Reduction Ordering via Graph Policy Optimization for Sparse Matrices

    METHOD 2.1. Fill-in Generation Given a sparse matrixA, fill-ins arise from matrix factorization, most commonly LU factorization [12, 13, 14]. It is expressed as A=LUand derived from the Gaussian elimination process.Land Uare the lower and upper triangular matrices respectively andLhas a unit diagonalI. For simplicity, we mainly consider the symmetric arXi...

  3. [3]

    Experimental Settings As training data, we generate 10,000 symmetric matrices via Delau- nay triangulation [10, 9, 8] with sizesn∈[60,200]

    EXPERIMENTS 3.1. Experimental Settings As training data, we generate 10,000 symmetric matrices via Delau- nay triangulation [10, 9, 8] with sizesn∈[60,200]. For evaluation, we randomly sample from each category of the SuiteSparse Ma- trix Collection to form a 332-matrix test set spanning five domains: Structural Problem (SP), 2D/3D Problem (2D/3D), Circui...

  4. [4]

    CONCLUSION Sparse matrix reordering seeks an elimination ordering that mini- mizes fill-ins during matrix factorization. To avoid costly numeri- cal factorization and leverage graph structures, we analyze the cor- respondence between symbolic and numerical factorization and es- tablish that fill-in depends on the original graph and the elimina- tion order...

  5. [5]

    2021YFB0300203

    ACKNOWLEDGMENTS This research was supported by the National Key R&D Program of China under Grant No. 2021YFB0300203

  6. [6]

    Wilkinson and C

    J.H. Wilkinson and C. Reinsch,Handbook for Automatic Computation: Volume II: Linear Algebra, vol. 186 ofDie Grundlehren der mathematischen Wissenschaften, in Einzel- darstellungen mit besonderer Ber ¨ucksichtigung der Anwen- dungsgebiete, Springer Berlin Heidelberg, Berlin, Heidelberg, 1971

  7. [7]

    Al- gorithmic aspects of vertex elimination,

    Donald J. Rose, Robert E. Tarjan, and George S. Lueker, “Al- gorithmic aspects of vertex elimination,”SIAM Journal on Computing, vol. 5, no. 2, pp. 266–283, 1976

  8. [8]

    Computing the minimum fill-in is np- complete,

    M. Yannakakis, “Computing the minimum fill-in is np- complete,”SIAM Journal on Algebraic Discrete Methods, vol. 2, no. 1, pp. 77–79, 1981

  9. [9]

    A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations,

    D.J. Rose, “A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations,”Graph theory and computing, pp. 183–217, 1972

  10. [10]

    An approximate minimum degree ordering algorithm,

    Patrick R. Amestoy, Timothy A. Davis, and Iain S. Duff, “An approximate minimum degree ordering algorithm,”SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 4, pp. 886–905, 1996

  11. [11]

    A column approximate minimum degree order- ing algorithm,

    Timothy A. Davis, John R. Gilbert, Stefan I. Larimore, and Es- mond G. Ng, “A column approximate minimum degree order- ing algorithm,”ACM Transactions on Mathematical Software, vol. 30, no. 3, pp. 377–380, 2004

  12. [12]

    Nested dissection of a regular finite element mesh,

    J. Alan George, “Nested dissection of a regular finite element mesh,”SIAM Journal on Numerical Analysis, vol. 10, no. 2, pp. 345–363, 1973

  13. [13]

    Graph partitioning and sparse matrix ordering using reinforcement learning and graph neural networks,

    Alice Gatti, Zhihui Hu, Tess Smidt, and Pieter Ghysels, “Graph partitioning and sparse matrix ordering using reinforcement learning and graph neural networks,” inJournal of Machine Learning Research. 2022, JMLR.org

  14. [14]

    Alpha elimination: Us- ing deep reinforcement learning to reduce fill-in during sparse matrix decomposition,

    Arpan Dasgupta and Pawan Kumar, “Alpha elimination: Us- ing deep reinforcement learning to reduce fill-in during sparse matrix decomposition,” inJoint European Conference on Ma- chine Learning and Knowledge Discovery in Databases. 2023, Springer

  15. [15]

    Learning algebraic multigrid using graph neural net- works,

    Ilay Luz, Meirav Galun, Haggai Maron, Ronen Basri, and Irad Yavneh, “Learning algebraic multigrid using graph neural net- works,” inProceedings of the 37th International Conference on Machine Learning. 2020, vol. 119, pp. 6489–6499, PMLR

  16. [16]

    The university of florida sparse matrix collection,

    Timothy A. Davis and Yifan Hu, “The university of florida sparse matrix collection,”ACM Transactions on Mathematical Software (TOMS), vol. 38, no. 1, pp. 1–25, 2011

  17. [17]

    Liu,Computer Solution of Large Sparse Positive Definite, Prentice-Hall, 1981

    Alan George and Joseph W.H. Liu,Computer Solution of Large Sparse Positive Definite, Prentice-Hall, 1981

  18. [18]

    Duff, Albert M

    Iain S. Duff, Albert M. Erisman, and John K. Reid,Direct Methods for Sparse Matrices, Clarendon Press, Oxford, 1986

  19. [19]

    Davis, Sivasankaran Rajamanickam, and Wis- sam M

    Timothy A. Davis, Sivasankaran Rajamanickam, and Wis- sam M. Sid-Lakhdar,A Survey of Direct Methods for Sparse Linear Systems, vol. 25 ofActa Numerica, Cambridge Univer- sity Press, 2016

  20. [20]

    The role of elimination trees in sparse factoriza- tion,

    J.W.H. Liu, “The role of elimination trees in sparse factoriza- tion,”SIAM Journal on Matrix Analysis and Applications, vol. 11, no. 1, pp. 134–172, 1990

  21. [21]

    Collec- tive influence algorithm to find influencers via optimal perco- lation in massively large social media,

    F. Morone, B. Min, L. Bo, R. Mari, and H. A. Makse, “Collec- tive influence algorithm to find influencers via optimal perco- lation in massively large social media,”Scientific Reports, vol. 6, 2016

  22. [22]

    Mixhop: Higher-order graph con- volutional architectures via sparsified neighborhood mixing,

    Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, and Aram Galstyan, “Mixhop: Higher-order graph con- volutional architectures via sparsified neighborhood mixing,” inProceedings of the 36th International Conference on Ma- chine Learning, Kamalika Chaudhuri and Ruslan Salakhutdi- nov, Ed...

  23. [23]

    Metis, a software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reduced orderings of sparse matrices,

    George Karypis and Vipin Kumar, “Metis, a software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reduced orderings of sparse matrices,” Tech. Rep., University of Minnesota, Minneapolis, MN, USA, 1998

  24. [24]

    An overview of superlu: Algorithms, implementa- tion, and user interface,

    X. S. Li, “An overview of superlu: Algorithms, implementa- tion, and user interface,”ACM Transactions on Mathematical Software, vol. 31, no. 3, pp. 302–325, 2005

  25. [25]

    Inductive representa- tion learning on large graphs,

    W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representa- tion learning on large graphs,” inAdvances in Neural Informa- tion Processing Systems, 2017, pp. 1024–1034