State-of-The-Art Sparse Direct Solvers
Pith reviewed 2026-05-24 22:53 UTC · model grok-4.3
The pith
Preprocessing with combinatorial algorithms and dense BLAS kernels has significantly advanced sparse direct solvers for circuit simulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Modern sparse direct solvers rely on a preprocessing phase based on combinatorial algorithms which improve diagonal dominance, reduce fill-in, and improve concurrency to allow for parallel treatment, while detecting dense submatrices that are handled by dense matrix kernels based on multithreaded level-3 BLAS; these combined improvements have advanced direct solution methods significantly for problems arising from circuit simulation.
What carries the argument
The preprocessing phase based on combinatorial algorithms that improves diagonal dominance, reduces fill-in, and enables parallelism, together with the detection and handling of dense submatrices via multithreaded level-3 BLAS kernels.
If this is right
- Larger circuit simulation problems become solvable within practical time and memory limits.
- Parallel execution on multi-core processors improves scalability of the factorization phase.
- Hybrid dense-sparse workflows become standard for handling matrices with localized dense blocks.
Where Pith is reading between the lines
- The same preprocessing ideas could transfer to other application areas that generate large sparse matrices, such as finite-element structural analysis.
- Hardware-specific tuning of the BLAS kernels might yield further gains on accelerators or GPUs.
- Integration with iterative refinement or mixed-precision arithmetic could extend the approach to even larger systems.
Load-bearing premise
The chosen preprocessing algorithms and dense kernel strategies represent the current state of the art and have delivered significant advances in practice.
What would settle it
Direct timing comparisons on the same circuit simulation matrices showing that the reviewed preprocessing and kernel combinations produce no measurable reduction in solve time or memory use compared with earlier direct solvers.
read the original abstract
In this chapter we will give an insight into modern sparse elimination methods. These are driven by a preprocessing phase based on combinatorial algorithms which improve diagonal dominance, reduce fill-in, and improve concurrency to allow for parallel treatment. Moreover, these methods detect dense submatrices which can be handled by dense matrix kernels based on multithreaded level-3 BLAS. We will demonstrate for problems arising from circuit simulation, how the improvements in recent years have advanced direct solution methods significantly.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a review chapter describing modern sparse direct solvers. It focuses on a preprocessing phase using combinatorial algorithms to improve diagonal dominance, reduce fill-in, and enhance concurrency for parallelism, along with detection of dense submatrices handled via multithreaded level-3 BLAS kernels. The authors state they will demonstrate significant recent advances in direct methods for circuit-simulation problems.
Significance. If the review provides an accurate synthesis of recent preprocessing and kernel techniques with clear links to performance gains on circuit matrices, it could serve as a useful reference for the sparse linear algebra community. However, as a descriptive survey without new experiments or parameter-free derivations, its significance is limited to curation and exposition rather than advancing the state of the art.
major comments (1)
- [Abstract] Abstract: the central claim that 'the improvements in recent years have advanced direct solution methods significantly' for circuit-simulation problems is supported only by the authors' selection of preprocessing algorithms and cited performance numbers; no new controlled benchmarks (controlling for ordering, hardware, or baseline solvers) are described, leaving the magnitude of the advance unverifiable from the manuscript.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our review chapter. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that 'the improvements in recent years have advanced direct solution methods significantly' for circuit-simulation problems is supported only by the authors' selection of preprocessing algorithms and cited performance numbers; no new controlled benchmarks (controlling for ordering, hardware, or baseline solvers) are described, leaving the magnitude of the advance unverifiable from the manuscript.
Authors: The manuscript is explicitly a review chapter whose purpose is to synthesize and curate recent literature on preprocessing techniques for sparse direct solvers, with emphasis on circuit-simulation matrices. The abstract claim is therefore grounded in the performance results reported in the cited primary sources rather than in any new experiments conducted by the authors. Because the work is not a research contribution presenting original benchmarks, we did not include new controlled experiments. We can, however, revise the abstract to make the review nature of the claim more explicit and to add a sentence directing readers to the specific cited papers that contain the original benchmark data. revision: partial
Circularity Check
No circularity; descriptive review without derivations, predictions, or self-referential reductions
full rationale
The paper is a review chapter describing existing preprocessing algorithms (diagonal dominance, fill-in reduction, concurrency) and dense BLAS kernels for sparse direct solvers, with a focus on circuit simulation matrices. No equations, fitted parameters, uniqueness theorems, or new predictions appear in the provided text or abstract. The claim of 'significant advances' rests on curation of prior methods rather than any derivation that reduces to the paper's own inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. This is the expected outcome for a survey paper whose content is self-contained as description.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
A. Aho, J. Hopcroft, and J. Ullman.Data structures and algorithms. Addison-Wesley, 1983
work page 1983
-
[2]
Anapproximateminimumdegreeorderingalgorithm
P.Amestoy,T.A.Davis,andI.S.Duff. Anapproximateminimumdegreeorderingalgorithm. SIAM J. Matrix Anal. Appl., 17(4):886–905, 1996
work page 1996
-
[3]
E.Anderson,Z.Bai,C.Bischof,J.Demmel,J.Dongarra,J.D.Croz,A.Greenbaum,S.Ham- marling, A. McKenney, S. Ostrouchov, and D. Sorensen.LAPACK Users’ Guide, Second Edition. SIAM Publications, 1995
work page 1995
-
[4]
M.Benzi,J.Haws,andM.Tuma.Preconditioninghighlyindefiniteandnonsymmetricmatrices. SIAM J. Sci. Comput., 22(4):1333–1353, 2000. 28 M. Bollhöfer, O. Schenk, R. Janalík, Steve Hamm, and K. Gullapalli
work page 2000
-
[5]
C. Berge. Two theorems in graph theory. InProceedings of National Academy of Science, pages 842–844, USA, 1957
work page 1957
-
[6]
J. D. Booth, N. D. Ellingwoodb, and S. R. Heidi K. Thornquist. Basker: Parallel sparse lu factorizationutilizinghierarchicalparallelismanddatalayouts. ParallelComputing,68:17–31, 2017
work page 2017
-
[7]
X.Chen,L.Ren,Y.Wang,andH.Yang. Gpu-acceleratedsparselufactorizationforcircuitsim- ulation with performance modeling.IEEE TrAnsactions on Parallel and Distributed Systems, 26:786–795, 2015
work page 2015
-
[8]
X. Chen, Y. Wang, and H. Yang. Nicslu: An adaptive sparse matrix solver for parallel circuit simulation. IEEETransactionsonComputer-AidedDesignofIntegratedCircuitsandSystems , 32:261–274, 2013
work page 2013
-
[9]
C. Chevalier and F. Pellegrini. PT-SCOTCH: a tool for efficient parallel graph ordering. Parallel Comput., 34(6–8):318–331, 2008
work page 2008
- [10]
-
[11]
T. Davis and K. Stanley. Sparse LU factorization of circuit simulation matrices. InNumerical Aspects of Circuit and Device Modeling Workshop, June 2004
work page 2004
-
[12]
T. A. Davis.Direct Methods for Sparse Linear Systems. SIAM Publications, 2006
work page 2006
-
[13]
ACM SIGNUM Newslett., 20:2–18, 1985
D.DodsonandJ.G.Lewis.Issuesrelatingtoextensionofthebasiclinearalgebrasubprograms. ACM SIGNUM Newslett., 20:2–18, 1985
work page 1985
-
[14]
J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. Issues relating to extension of the basic linear algebra subprograms.ACM SIGNUM Newslett., 20:2–18, 1985
work page 1985
-
[15]
I. S. Duff, A. Erisman, and J. Reid.Direct Methods for Sparse Matrices. Oxford University Press, 1986
work page 1986
-
[16]
I. S. Duff and J. Koster. The design and use of algorithms for permuting large entries to the diagonal of sparse matrices.SIAM J. Matrix Anal. Appl., 20:889–901, 1999
work page 1999
-
[17]
I. S. Duff and S. Pralet. Strategies for scaling and pivoting for sparse symmetric indefinite problems. Technical Report TR/PA/04/59, CERFACS, Toulouse, France, 2004
work page 2004
-
[18]
Alinear-timeheuristicforimprovingnetworkpartitions
C.M.FiducciaandR.M.Mattheyses. Alinear-timeheuristicforimprovingnetworkpartitions. InProceedings of the 19th Design Automation Conference, pages 175âĂ“–181. IEEE, 1997
work page 1997
-
[19]
M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory.Czecheslovak Mathematical Journal, 25(100):619–633, 1975
work page 1975
-
[20]
A. George and J. W. H. Liu. The evolution of the minimum degree ordering algorithm.SIAM Review, 31:1–19, 1989
work page 1989
-
[21]
J. A. George and J. W. Liu.Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Englewood Cliffs, NJ, USA, 1981
work page 1981
-
[22]
J. A. George and E. Ng. An implementation of Gaussian elimination with partial pivoting for sparse systems.SIAM J. Sci. Statist. Comput., 6(2):390–409, 1985
work page 1985
-
[23]
Predictingstructureinsparsematrixcomputations
J.R.Gilbert. Predictingstructureinsparsematrixcomputations. SIAMJ.MatrixAnal.Appl. , 15(1):162–79, 1994
work page 1994
-
[24]
A. Gupta and L. Ying. On algorithms for finding maximum matchings in bipartite graphs. Technical Report RC 21576 (97320), IBM T. J. Watson Research Center, Yorktown Heights, NY, October 25, 1999
work page 1999
-
[25]
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998
work page 1998
-
[26]
G. Karypis, K. Schloegel, and V. Kumar.ParMeTis: Parallel Graph Partitioning and Sparse Matrix Ordering Library, Version 2.0. University of Minnesota, Dept. of Computer Science, September 1999
work page 1999
-
[27]
B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs.The Bell System Technical Journal, 29(2):291–307, 1970
work page 1970
-
[28]
J.Langguth,A.Azad,andF.Manne. OnparallelpushâĂ“relabelbasedalgorithmsforbipartite maximum matching.Parallel Comput., 40(7):289–308, 2014
work page 2014
-
[29]
J. Langguth, M. M. A. Patwary, and F. Manne. Parallel algorithms for bipartite matching problems on distributed memory computers.Parallel Comput., 37(12):820–845, 2011. State-of-The-Art Sparse Direct Solvers 29
work page 2011
-
[30]
D. LaSalle and G. Karypis. Multi–threaded graph partitioning. Technical report, Department of Computer Science & Engineering, University of Minnesota, Minneapolis, 2013
work page 2013
-
[31]
W.-K. Lee, R. Achar, and M. S. Nakhla. Dynamic gpu parallel sparse lu factorization for fast circuit simulation.IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1–12, 2018
work page 2018
-
[32]
X. S. Li. An overview of SuperLU: Algorithms, implementation, and user interface.ACM Trans. Math. Softw., 31(3):302–325, 2005
work page 2005
-
[33]
J. W. Liu. The role of elimination trees in sparse factorization.SIAM J. Matrix Anal. Appl., 11(1):134–172, 1990
work page 1990
-
[34]
J. W. Liu and A. Sherman. Comparative analysis of the Cuthill–McKee and the reverse Cuthill–McKee ordering algorithms for sparse matrices.SIAM J. Numer. Anal., 13:198–213, 1976
work page 1976
-
[35]
H. M. Markowitz. The elimination form of the inverse and its application to linear program- ming. Management Science, 3:255–269, April 1957
work page 1957
-
[36]
SPICE2:acomputerprogramtosimulatesemiconductorcircuits
L.W.Nagel. SPICE2:acomputerprogramtosimulatesemiconductorcircuits. Memorandum No. ERL-M520, University of California, Berkeley, California, May 1975
work page 1975
-
[37]
M. Olschowka and A. Neumaier. A new pivoting strategy for Gaussian elimination.Linear Algebra and its Applications, 240:131–151, 1996
work page 1996
-
[38]
F. Pellegrini, J. Roman, and P. Amestoy. Hybridizing nested dissection and halo approximate minimum degree for efficient sparse matrix ordering.Concurrency: Practice and Experience, 12:69–84, 2000
work page 2000
- [39]
-
[40]
Agraph-theoreticstudyofthenumericalsolutionofsparsepositivedefinitesystems of linear equations
D.J.Rose. Agraph-theoreticstudyofthenumericalsolutionofsparsepositivedefinitesystems of linear equations. In R. C. Read, editor,Graph Theory and Computing. Academic Press, 1972
work page 1972
-
[41]
O. Schenk. Scalable parallel sparse LU factorization methods on shared memory multipro- cessors. PhD thesis, ETH Zurich, 2000. Diss. Technische Wissenschaften ETH Zurich, Nr. 13515, 2000
work page 2000
-
[42]
O. Schenk and K. Gärtner. Solving unsymmetric sparse systems of linear equations with PARDISO. Journal of Future Generation Computer Systems, 20(3):475–487, 2004
work page 2004
-
[43]
R. E. Tarjan. Data structures and network algorithms. InCBMS–NSF Regional Conference Series in Applied Mathematics, volume 44, 1983
work page 1983
-
[44]
X.Zhao,L.Han,andZ.Feng. Aperformance-guidedgraphsparsificationapproachtoscalable and robust spice-accurate integrated circuit simulations.IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, 34:1639–1651, 2015
work page 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.