pith. sign in

arxiv: 1907.05309 · v1 · pith:LUQZJTIInew · submitted 2019-07-11 · 💻 cs.DS

State-of-The-Art Sparse Direct Solvers

Pith reviewed 2026-05-24 22:53 UTC · model grok-4.3

classification 💻 cs.DS
keywords sparse direct solverspreprocessingfill-in reductioncircuit simulationBLAS kernelsparallel factorizationcombinatorial algorithms
0
0 comments X

The pith

Preprocessing with combinatorial algorithms and dense BLAS kernels has significantly advanced sparse direct solvers for circuit simulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews modern sparse elimination methods that rely on a preprocessing phase using combinatorial algorithms. These steps improve diagonal dominance, reduce fill-in during factorization, and increase opportunities for parallel execution. The methods also identify dense submatrices that can then be processed efficiently with multithreaded level-3 BLAS routines. The authors illustrate these advances through examples drawn from circuit simulation problems and conclude that the combination of techniques has produced substantial performance gains in recent years.

Core claim

Modern sparse direct solvers rely on a preprocessing phase based on combinatorial algorithms which improve diagonal dominance, reduce fill-in, and improve concurrency to allow for parallel treatment, while detecting dense submatrices that are handled by dense matrix kernels based on multithreaded level-3 BLAS; these combined improvements have advanced direct solution methods significantly for problems arising from circuit simulation.

What carries the argument

The preprocessing phase based on combinatorial algorithms that improves diagonal dominance, reduces fill-in, and enables parallelism, together with the detection and handling of dense submatrices via multithreaded level-3 BLAS kernels.

If this is right

  • Larger circuit simulation problems become solvable within practical time and memory limits.
  • Parallel execution on multi-core processors improves scalability of the factorization phase.
  • Hybrid dense-sparse workflows become standard for handling matrices with localized dense blocks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same preprocessing ideas could transfer to other application areas that generate large sparse matrices, such as finite-element structural analysis.
  • Hardware-specific tuning of the BLAS kernels might yield further gains on accelerators or GPUs.
  • Integration with iterative refinement or mixed-precision arithmetic could extend the approach to even larger systems.

Load-bearing premise

The chosen preprocessing algorithms and dense kernel strategies represent the current state of the art and have delivered significant advances in practice.

What would settle it

Direct timing comparisons on the same circuit simulation matrices showing that the reviewed preprocessing and kernel combinations produce no measurable reduction in solve time or memory use compared with earlier direct solvers.

read the original abstract

In this chapter we will give an insight into modern sparse elimination methods. These are driven by a preprocessing phase based on combinatorial algorithms which improve diagonal dominance, reduce fill-in, and improve concurrency to allow for parallel treatment. Moreover, these methods detect dense submatrices which can be handled by dense matrix kernels based on multithreaded level-3 BLAS. We will demonstrate for problems arising from circuit simulation, how the improvements in recent years have advanced direct solution methods significantly.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a review chapter describing modern sparse direct solvers. It focuses on a preprocessing phase using combinatorial algorithms to improve diagonal dominance, reduce fill-in, and enhance concurrency for parallelism, along with detection of dense submatrices handled via multithreaded level-3 BLAS kernels. The authors state they will demonstrate significant recent advances in direct methods for circuit-simulation problems.

Significance. If the review provides an accurate synthesis of recent preprocessing and kernel techniques with clear links to performance gains on circuit matrices, it could serve as a useful reference for the sparse linear algebra community. However, as a descriptive survey without new experiments or parameter-free derivations, its significance is limited to curation and exposition rather than advancing the state of the art.

major comments (1)
  1. [Abstract] Abstract: the central claim that 'the improvements in recent years have advanced direct solution methods significantly' for circuit-simulation problems is supported only by the authors' selection of preprocessing algorithms and cited performance numbers; no new controlled benchmarks (controlling for ordering, hardware, or baseline solvers) are described, leaving the magnitude of the advance unverifiable from the manuscript.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our review chapter. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'the improvements in recent years have advanced direct solution methods significantly' for circuit-simulation problems is supported only by the authors' selection of preprocessing algorithms and cited performance numbers; no new controlled benchmarks (controlling for ordering, hardware, or baseline solvers) are described, leaving the magnitude of the advance unverifiable from the manuscript.

    Authors: The manuscript is explicitly a review chapter whose purpose is to synthesize and curate recent literature on preprocessing techniques for sparse direct solvers, with emphasis on circuit-simulation matrices. The abstract claim is therefore grounded in the performance results reported in the cited primary sources rather than in any new experiments conducted by the authors. Because the work is not a research contribution presenting original benchmarks, we did not include new controlled experiments. We can, however, revise the abstract to make the review nature of the claim more explicit and to add a sentence directing readers to the specific cited papers that contain the original benchmark data. revision: partial

Circularity Check

0 steps flagged

No circularity; descriptive review without derivations, predictions, or self-referential reductions

full rationale

The paper is a review chapter describing existing preprocessing algorithms (diagonal dominance, fill-in reduction, concurrency) and dense BLAS kernels for sparse direct solvers, with a focus on circuit simulation matrices. No equations, fitted parameters, uniqueness theorems, or new predictions appear in the provided text or abstract. The claim of 'significant advances' rests on curation of prior methods rather than any derivation that reduces to the paper's own inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. This is the expected outcome for a survey paper whose content is self-contained as description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; no new free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5608 in / 877 out tokens · 17665 ms · 2026-05-24T22:53:41.563995+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

  1. [1]

    A. Aho, J. Hopcroft, and J. Ullman.Data structures and algorithms. Addison-Wesley, 1983

  2. [2]

    Anapproximateminimumdegreeorderingalgorithm

    P.Amestoy,T.A.Davis,andI.S.Duff. Anapproximateminimumdegreeorderingalgorithm. SIAM J. Matrix Anal. Appl., 17(4):886–905, 1996

  3. [3]

    McKenney, S

    E.Anderson,Z.Bai,C.Bischof,J.Demmel,J.Dongarra,J.D.Croz,A.Greenbaum,S.Ham- marling, A. McKenney, S. Ostrouchov, and D. Sorensen.LAPACK Users’ Guide, Second Edition. SIAM Publications, 1995

  4. [4]

    M.Benzi,J.Haws,andM.Tuma.Preconditioninghighlyindefiniteandnonsymmetricmatrices. SIAM J. Sci. Comput., 22(4):1333–1353, 2000. 28 M. Bollhöfer, O. Schenk, R. Janalík, Steve Hamm, and K. Gullapalli

  5. [5]

    C. Berge. Two theorems in graph theory. InProceedings of National Academy of Science, pages 842–844, USA, 1957

  6. [6]

    J. D. Booth, N. D. Ellingwoodb, and S. R. Heidi K. Thornquist. Basker: Parallel sparse lu factorizationutilizinghierarchicalparallelismanddatalayouts. ParallelComputing,68:17–31, 2017

  7. [7]

    Gpu-acceleratedsparselufactorizationforcircuitsim- ulation with performance modeling.IEEE TrAnsactions on Parallel and Distributed Systems, 26:786–795, 2015

    X.Chen,L.Ren,Y.Wang,andH.Yang. Gpu-acceleratedsparselufactorizationforcircuitsim- ulation with performance modeling.IEEE TrAnsactions on Parallel and Distributed Systems, 26:786–795, 2015

  8. [8]

    X. Chen, Y. Wang, and H. Yang. Nicslu: An adaptive sparse matrix solver for parallel circuit simulation. IEEETransactionsonComputer-AidedDesignofIntegratedCircuitsandSystems , 32:261–274, 2013

  9. [9]

    Chevalier and F

    C. Chevalier and F. Pellegrini. PT-SCOTCH: a tool for efficient parallel graph ordering. Parallel Comput., 34(6–8):318–331, 2008

  10. [10]

    ACM, 1969

    E.CuthillandJ.McKee.Reducingthebandwidthofsparsesymmetricmatrices.In Proceedings of the 24th national conference of the ACM. ACM, 1969

  11. [11]

    Davis and K

    T. Davis and K. Stanley. Sparse LU factorization of circuit simulation matrices. InNumerical Aspects of Circuit and Device Modeling Workshop, June 2004

  12. [12]

    T. A. Davis.Direct Methods for Sparse Linear Systems. SIAM Publications, 2006

  13. [13]

    ACM SIGNUM Newslett., 20:2–18, 1985

    D.DodsonandJ.G.Lewis.Issuesrelatingtoextensionofthebasiclinearalgebrasubprograms. ACM SIGNUM Newslett., 20:2–18, 1985

  14. [14]

    J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. Issues relating to extension of the basic linear algebra subprograms.ACM SIGNUM Newslett., 20:2–18, 1985

  15. [15]

    I. S. Duff, A. Erisman, and J. Reid.Direct Methods for Sparse Matrices. Oxford University Press, 1986

  16. [16]

    I. S. Duff and J. Koster. The design and use of algorithms for permuting large entries to the diagonal of sparse matrices.SIAM J. Matrix Anal. Appl., 20:889–901, 1999

  17. [17]

    I. S. Duff and S. Pralet. Strategies for scaling and pivoting for sparse symmetric indefinite problems. Technical Report TR/PA/04/59, CERFACS, Toulouse, France, 2004

  18. [18]

    Alinear-timeheuristicforimprovingnetworkpartitions

    C.M.FiducciaandR.M.Mattheyses. Alinear-timeheuristicforimprovingnetworkpartitions. InProceedings of the 19th Design Automation Conference, pages 175âĂ“–181. IEEE, 1997

  19. [19]

    M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory.Czecheslovak Mathematical Journal, 25(100):619–633, 1975

  20. [20]

    George and J

    A. George and J. W. H. Liu. The evolution of the minimum degree ordering algorithm.SIAM Review, 31:1–19, 1989

  21. [21]

    J. A. George and J. W. Liu.Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Englewood Cliffs, NJ, USA, 1981

  22. [22]

    J. A. George and E. Ng. An implementation of Gaussian elimination with partial pivoting for sparse systems.SIAM J. Sci. Statist. Comput., 6(2):390–409, 1985

  23. [23]

    Predictingstructureinsparsematrixcomputations

    J.R.Gilbert. Predictingstructureinsparsematrixcomputations. SIAMJ.MatrixAnal.Appl. , 15(1):162–79, 1994

  24. [24]

    Gupta and L

    A. Gupta and L. Ying. On algorithms for finding maximum matchings in bipartite graphs. Technical Report RC 21576 (97320), IBM T. J. Watson Research Center, Yorktown Heights, NY, October 25, 1999

  25. [25]

    Karypis and V

    G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998

  26. [26]

    Karypis, K

    G. Karypis, K. Schloegel, and V. Kumar.ParMeTis: Parallel Graph Partitioning and Sparse Matrix Ordering Library, Version 2.0. University of Minnesota, Dept. of Computer Science, September 1999

  27. [27]

    Kernighan and S

    B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs.The Bell System Technical Journal, 29(2):291–307, 1970

  28. [28]

    OnparallelpushâĂ“relabelbasedalgorithmsforbipartite maximum matching.Parallel Comput., 40(7):289–308, 2014

    J.Langguth,A.Azad,andF.Manne. OnparallelpushâĂ“relabelbasedalgorithmsforbipartite maximum matching.Parallel Comput., 40(7):289–308, 2014

  29. [29]

    Langguth, M

    J. Langguth, M. M. A. Patwary, and F. Manne. Parallel algorithms for bipartite matching problems on distributed memory computers.Parallel Comput., 37(12):820–845, 2011. State-of-The-Art Sparse Direct Solvers 29

  30. [30]

    LaSalle and G

    D. LaSalle and G. Karypis. Multi–threaded graph partitioning. Technical report, Department of Computer Science & Engineering, University of Minnesota, Minneapolis, 2013

  31. [31]

    W.-K. Lee, R. Achar, and M. S. Nakhla. Dynamic gpu parallel sparse lu factorization for fast circuit simulation.IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1–12, 2018

  32. [32]

    X. S. Li. An overview of SuperLU: Algorithms, implementation, and user interface.ACM Trans. Math. Softw., 31(3):302–325, 2005

  33. [33]

    J. W. Liu. The role of elimination trees in sparse factorization.SIAM J. Matrix Anal. Appl., 11(1):134–172, 1990

  34. [34]

    J. W. Liu and A. Sherman. Comparative analysis of the Cuthill–McKee and the reverse Cuthill–McKee ordering algorithms for sparse matrices.SIAM J. Numer. Anal., 13:198–213, 1976

  35. [35]

    H. M. Markowitz. The elimination form of the inverse and its application to linear program- ming. Management Science, 3:255–269, April 1957

  36. [36]

    SPICE2:acomputerprogramtosimulatesemiconductorcircuits

    L.W.Nagel. SPICE2:acomputerprogramtosimulatesemiconductorcircuits. Memorandum No. ERL-M520, University of California, Berkeley, California, May 1975

  37. [37]

    Olschowka and A

    M. Olschowka and A. Neumaier. A new pivoting strategy for Gaussian elimination.Linear Algebra and its Applications, 240:131–151, 1996

  38. [38]

    Pellegrini, J

    F. Pellegrini, J. Roman, and P. Amestoy. Hybridizing nested dissection and halo approximate minimum degree for efficient sparse matrix ordering.Concurrency: Practice and Experience, 12:69–84, 2000

  39. [39]

    Rewienski

    M. Rewienski. A perspective on fast-spice simulation technology. Springer, Dordrecht, 2011

  40. [40]

    Agraph-theoreticstudyofthenumericalsolutionofsparsepositivedefinitesystems of linear equations

    D.J.Rose. Agraph-theoreticstudyofthenumericalsolutionofsparsepositivedefinitesystems of linear equations. In R. C. Read, editor,Graph Theory and Computing. Academic Press, 1972

  41. [41]

    O. Schenk. Scalable parallel sparse LU factorization methods on shared memory multipro- cessors. PhD thesis, ETH Zurich, 2000. Diss. Technische Wissenschaften ETH Zurich, Nr. 13515, 2000

  42. [42]

    Schenk and K

    O. Schenk and K. Gärtner. Solving unsymmetric sparse systems of linear equations with PARDISO. Journal of Future Generation Computer Systems, 20(3):475–487, 2004

  43. [43]

    R. E. Tarjan. Data structures and network algorithms. InCBMS–NSF Regional Conference Series in Applied Mathematics, volume 44, 1983

  44. [44]

    X.Zhao,L.Han,andZ.Feng. Aperformance-guidedgraphsparsificationapproachtoscalable and robust spice-accurate integrated circuit simulations.IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, 34:1639–1651, 2015