
arxiv: 2602.14289 · v2 · pith:LQTMVLV6new · submitted 2026-02-15 · 💻 cs.MS · cs.DC · cs.NA · math.NA

Parallel Sparse and Data-Sparse Factorization-based Linear Solvers

Pith reviewed 2026-05-25 07:02 UTC · model grok-4.3

classification 💻 cs.MS · cs.DC · cs.NA · math.NA
keywords sparse direct solvers · parallel linear solvers · low-rank compression · communication reduction · hierarchical matrices · factorization · scalable solvers · heterogeneous computing

The pith

Direct solvers remain essential for robustly solving large-scale linear systems, and advances in parallel communication reduction and low-rank compression are what keep them scalable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews recent advances in sparse direct solvers for large-scale, ill-conditioned algebraic equations needed in multiphysics simulations, machine learning, and data science. It examines progress along two axes: reducing communication and latency costs in task- and data-parallel settings, and lowering computational complexity through low-rank and hierarchical matrix compression. Direct solvers are highlighted for their robustness and accuracy as key building blocks in scalable solver toolchains. The review covers algorithmic principles along with parallelization challenges and practices for achieving high speed and reliability on modern heterogeneous machines.

Core claim

Because of their robustness and accuracy, direct solvers are crucial components in building a scalable solver toolchain, and the key recent advances worth highlighting are techniques for communication reduction and low-rank compression in parallel sparse direct solvers.

What carries the argument

Sparse direct solvers that combine task- and data-parallel communication reduction with low-rank and hierarchical matrix algebra compression to handle factorization of large ill-conditioned systems.

If this is right

  • Direct solvers can solve ill-conditioned and indefinite equations more efficiently in parallel environments.
  • Scalable solver toolchains become feasible for applications in multiphysics, machine learning, and data science.
  • High speed and reliability are delivered on heterogeneous parallel machines through targeted parallelization practices.
  • Computational complexity drops via low-rank approximations while factorization accuracy stays within a prescribed tolerance.
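The low-rank point in the list above can be illustrated with a minimal sketch. It compresses one off-diagonal block of a kernel matrix with a generic greedy cross approximation (full pivoting) in plain Python. This is not the algorithm of the paper or of any solver package it reviews; the function names, the kernel `1 / (y - x)`, the cluster positions, and the tolerance are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names, kernel, and tolerance are assumptions,
# not taken from the paper or from any solver package.

def cross_approximation(block, tol=1e-8):
    """Greedy cross approximation with full pivoting.

    Returns lists us, vs with block[i][j] ~= sum_k us[k][i] * vs[k][j].
    Stops once the largest residual entry is <= tol * max|block|.
    """
    m, n = len(block), len(block[0])
    residual = [row[:] for row in block]
    scale = max(abs(x) for row in block for x in row)
    us, vs = [], []
    if scale == 0.0:
        return us, vs
    for _ in range(min(m, n)):
        # Full pivoting: take the largest remaining residual entry.
        pi, pj = max(
            ((i, j) for i in range(m) for j in range(n)),
            key=lambda ij: abs(residual[ij[0]][ij[1]]),
        )
        pivot = residual[pi][pj]
        if abs(pivot) <= tol * scale:
            break
        u = [residual[i][pj] / pivot for i in range(m)]  # scaled pivot column
        v = residual[pi][:]                              # pivot row
        for i in range(m):
            for j in range(n):
                residual[i][j] -= u[i] * v[j]            # rank-1 update
        us.append(u)
        vs.append(v)
    return us, vs


def max_error(block, us, vs):
    """Largest entrywise difference between block and its low-rank form."""
    return max(
        abs(block[i][j] - sum(u[i] * v[j] for u, v in zip(us, vs)))
        for i in range(len(block))
        for j in range(len(block[0]))
    )


# Two well-separated point clusters: sources in [0, 1), targets in [3, 4).
n = 32
xs = [i / n for i in range(n)]
ys = [3.0 + j / n for j in range(n)]
block = [[1.0 / (y - x) for y in ys] for x in xs]

us, vs = cross_approximation(block, tol=1e-8)
rank = len(us)
print("rank:", rank)
print("stored numbers, low-rank vs dense:", rank * 2 * n, "vs", n * n)
print("max entrywise error:", max_error(block, us, vs))
```

The stopping test makes the trade-off explicit: the block is stored as `rank * (m + n)` numbers instead of `m * n`, and the entrywise error is bounded by the chosen tolerance rather than being zero.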

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These advances may expand the range of problems where direct solvers are preferred over iterative alternatives due to guaranteed robustness.
  • Implementation details from the review could guide development of hybrid solver libraries that mix direct and other methods.
  • The techniques suggest potential extensions to time-dependent or nonlinear problems where repeated factorizations occur.

Load-bearing premise

The reviewed techniques in communication reduction and low-rank compression constitute the key recent advances worth highlighting for parallel sparse direct solvers.

What would settle it

Demonstration on large ill-conditioned systems that communication reduction or low-rank compression techniques fail to improve scalability, accuracy, or reliability of direct solvers compared to prior methods.

Figures

Figures reproduced from arXiv: 2602.14289 by Xiaoye Sherry Li, Yang Liu.

Figure 1: Sketch of the right-looking and multifrontal algorithms. We define
Figure 2: Illustration of level set in a lower triangular SpTRSV,
Figure 3: The view of the logical 3D process grid and an example of 18 processes arranged as a 3x3x2 process grid.
Figure 4: Two-level etree partition and the matrix view of the submatrix mapping to four 2D process grids.
Figure 5: Asymptotic per process communication volumes given in Tables 1 and 2, with different
Figure 6: Illustration of several types of hierarchical matrices (HODLR,
Figure 7: [44, figs. 1 and 4] (a) Block partitioning of a hierarchical matrix for a 3D problem of size
Figure 8: [44, fig. 5] The time of the CPU and GPU implementations of the construction algorithm [44] for the kernel and
Figure 9: [43, figs. 1 and 4-12] Illustration of the factorization algorithm of [43] on the
Figure 10: [43, figs. 13 and 16] The factorization time, memory and solve time of the shared-memory parallel
Figure 11: For a hierarchical matrix, the most common parallel layout is perhaps the 1D block layout of
Figure 11: (Top) Parallel process layouts for distributing a hierarchical matrix on 8 MPI processes: (a) 1D block layout
Figure 12: Strong scaling (on distributed-memory systems) of various phases of several hierarchical matrix algorithms:
Figure 13: [60, fig. 5] The assembly tree of rank-structured multifrontal solvers using different compression algorithms.
Figure 14: [60, figs. 5 and 8] Characteristics of the factorization phase including flops, CPU time and memory for a
Figure 15: (a) [60, fig. 6] Strong scaling of the parallel HSS-multifrontal solver of STRUMPACK. (b) [59, fig. 1]
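The level-set idea named in the Figure 2 caption can be sketched in a few lines. This is a generic textbook construction, assuming a lower triangular matrix stored row by row as `(column, value)` pairs with a nonzero diagonal; it is not the implementation described in the paper, and the names `level_sets` and `solve_by_levels` are hypothetical. Rows in the same level depend only on rows in earlier levels, so each level could be processed in parallel.

```python
# Illustrative sketch only: data layout and function names are assumptions.

def level_sets(rows):
    """Group the rows of a sparse lower triangular matrix into level sets.

    rows[i] is a list of (j, value) pairs with j <= i, diagonal included.
    A row's level is one more than the deepest row it depends on.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j, _ in rows[i] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    sets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lvl in enumerate(level):
        sets[lvl].append(i)
    return sets


def solve_by_levels(rows, b):
    """Solve L x = b one level at a time (forward substitution)."""
    x = [0.0] * len(b)
    for rows_in_level in level_sets(rows):
        # Rows within one level are mutually independent; a parallel
        # implementation would distribute this inner loop.
        for i in rows_in_level:
            total = b[i]
            diag = None
            for j, value in rows[i]:
                if j == i:
                    diag = value
                else:
                    total -= value * x[j]
            x[i] = total / diag
    return x


# A 5x5 example: rows 0 and 1 have no dependencies, rows 2 and 3 depend
# on them, and row 4 depends on rows 2 and 3.
rows = [
    [(0, 2.0)],
    [(1, 1.0)],
    [(0, 1.0), (2, 4.0)],
    [(1, 3.0), (3, 1.0)],
    [(2, 1.0), (3, 2.0), (4, 5.0)],
]
b = [2.0, 1.0, 5.0, 4.0, 8.0]

print(level_sets(rows))          # [[0, 1], [2, 3], [4]]
print(solve_by_levels(rows, b))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

The number of levels bounds the number of synchronization steps, which is why the depth of this dependency structure matters for latency in a parallel triangular solve.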
Original abstract

Efficient solutions of large-scale, ill-conditioned and indefinite algebraic equations are ubiquitously needed in numerous computational fields, including multiphysics simulations, machine learning, and data science. Because of their robustness and accuracy, direct solvers are crucial components in building a scalable solver toolchain. In this chapter, we will review recent advances of sparse direct solvers along two axes: 1) reducing communication and latency costs in both task- and data-parallel settings, and 2) reducing computational complexity via low-rank and other compression techniques such as hierarchical matrix algebra. In addition to algorithmic principles, we also illustrate the key parallelization challenges and best practices to deliver high speed and reliability on modern heterogeneous parallel machines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. This manuscript is a survey chapter reviewing recent advances in sparse direct solvers for large-scale, ill-conditioned, and indefinite linear systems arising in multiphysics simulations, machine learning, and data science. It organizes the review along two axes—reducing communication and latency costs in task- and data-parallel settings, and reducing computational complexity via low-rank and hierarchical matrix compression techniques—while also covering parallelization challenges and best practices for heterogeneous machines. The central observation is that direct solvers remain crucial for robustness and accuracy in scalable solver toolchains.

Significance. If the coverage is balanced and up-to-date, the chapter could serve as a useful reference for practitioners needing robust direct methods. No new theorems, algorithms, empirical results, machine-checked proofs, or reproducible code are presented; significance therefore rests entirely on the quality and representativeness of the literature synthesis rather than on any novel technical contribution.

minor comments (2)
  1. The abstract states that the two axes constitute 'the key recent advances' but provides no explicit selection criteria or discussion of scope limitations; a short paragraph in the introduction justifying the focus relative to other directions (e.g., hybrid direct-iterative methods) would improve transparency.
  2. Because the work is a survey rather than a research article, the absence of any tables summarizing complexity, communication volume, or software availability for the reviewed packages is a missed opportunity for clarity; adding such a summary table would aid readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept. The assessment accurately captures the manuscript's scope as a literature synthesis on communication reduction and data-sparse techniques in sparse direct solvers.

Circularity Check

0 steps flagged

No significant circularity in survey paper

full rationale

This is a survey chapter reviewing existing advances in sparse direct solvers along axes of communication reduction and low-rank compression. No new theorems, derivations, predictions, fitted parameters, or empirical results are asserted. The strongest claim (direct solvers' robustness) is a standard observation, and the weakest assumption (editorial scope of reviewed techniques) is not a falsifiable premise internal to any derivation chain. No load-bearing steps reduce to self-definition, fitted inputs, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper; no new free parameters, axioms, or invented entities are introduced by the authors.

pith-pipeline@v0.9.0 · 5644 in / 879 out tokens · 32603 ms · 2026-05-25T07:02:27.318358+00:00 · methodology



Reference graph

Works this paper leans on

231 extracted references · 231 canonical work pages · 6 internal anchors

  [1] Abdelfattah, A., Beams, N., Carson, R., Ghysels, P., Kolev, T., Stitt, T., Vargas, A., Tomov, S., Dongarra, J.: MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures. The International Journal of High Performance Computing Applications 38(5), 468–490 (2024). doi:10.1177/10943420241261960

  [2] Abdelfattah, A., Ghysels, P., Boukaram, W., Tomov, S., Li, X.S., Dongarra, J.: Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers. In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–14. IEEE (2022). doi:10.1109/SC41404.2022.00031

  [3] Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. J. Phys.: Conf. Ser. 180(1) (2009)

  [4] AHMED. https://www.wr.uni-bayreuth.de/en/software/ahmed/index.html

  [5] Al-Harthi, N., Alomairy, R., Akbudak, K., Chen, R., Ltaief, H., Bagci, H., Keyes, D.: Solving acoustic boundary integral equations using high performance tile low-rank LU factorization. In: International Conference on High Performance Computing, pp. 209–229. Springer (2020)

  [6] Alger, N.V.: Data-scalable Hessian preconditioning for distributed parameter PDE-constrained inverse problems. The University of Texas at Austin (2019)

  [7] Aliaga, J.I., Carratalá-Sáez, R., Kriemann, R., Quintana-Ortí, E.S.: Task-parallel LU factorization of hierarchical matrices using OmpSs. In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1148–1157. IEEE (2017)

  [8] Ambartsumyan, I., Boukaram, W., Bui-Thanh, T., Ghattas, O., Keyes, D., Stadler, G., Turkiyyah, G., Zampini, S.: Hierarchical matrix approximations of Hessians arising in inverse problems governed by PDEs. SIAM Journal on Scientific Computing 42(5), A3397–A3426 (2020)

  [9] Ambikasaran, S., Darve, E.: An $O(N \log N)$ fast direct solver for partial hierarchically semi-separable matrices: with application to radial basis function interpolation. Journal of Scientific Computing 57(3), 477–501 (2013)

  [10] Ambikasaran, S., Darve, E.: The inverse fast multipole method. arXiv preprint arXiv:1407.1572 (2014)

  [11] Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D.W., O’Neil, M.: Fast direct methods for Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 252–265 (2015)

  [12] Ambikasaran, S., O’Neil, M., Singh, K.R.: Fast symmetric factorization of hierarchical matrices with applications. arXiv preprint arXiv:1405.0223 (2014)

  [13] Amestoy, P., Ashcraft, C., Boiteau, O., Buttari, A., L'Excellent, J.Y., Weisbecker, C.: Improving multifrontal methods by means of block low-rank representations. SIAM J. Sci. Comput. 37(3), A1451–A1474 (2015)

  [14] Amestoy, P., Buttari, A., L'Excellent, J.Y., Mary, T.: On the complexity of the block low-rank multifrontal factorization. SIAM Journal on Scientific Computing 39(4), A1710–A1740 (2017)

  [15] Amestoy, P.R., Buttari, A., L'Excellent, J.Y., Mary, T.: Performance and scalability of the block low-rank multifrontal factorization on multicore architectures. ACM Trans. Math. Softw. 45(1) (2019). doi:10.1145/3242094

  [16] Amestoy, P.R., Buttari, A., L'Excellent, J.Y., Mary, T.A.: Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel block low-rank format. SIAM J. Sci. Comput. 41(3), A1414–A1442 (2019). doi:10.1137/18M1182760

  [17] Amestoy, P.R., Duff, I.S., L'Excellent, J.Y., Li, X.S.: Analysis and comparison of two general sparse solvers for distributed memory computers. ACM Trans. Math. Softw. 27(4), 388–421 (2001). doi:10.1145/504210.504212. URL https://doi.org/10.1145/504210.504212

  [18] Amestoy, P.R., Duff, I.S., L'Excellent, J.Y., Koster, J.: MUMPS: a general purpose distributed memory sparse solver. In: International Workshop on Applied Parallel Computing, pp. 121–130. Springer (2000)

  [19] Amestoy, P.R., Duff, I.S., L'Excellent, J.Y., Koster, J.: A fully asynchronous multi-frontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Anal. Appl. 23, 15–41 (2001). doi:10.1137/S0895479899358194

  [20] Aminfar, A., Ambikasaran, S., Darve, E.: A fast block low-rank dense solver with applications to finite-element matrices. J. Comput. Phys. 304, 170–188 (2016)

  [21] Anderson, E., Saad, Y.: Solving sparse triangular linear systems on parallel computers. Int. J. High Speed Comput. 1(1), 73–95 (1989). doi:10.1142/S0129053389000056. URL https://doi.org/10.1142/S0129053389000056

  [22] Angleitner, N., Faustmann, M., Melenk, J.M.: $\mathcal{H}$-inverses for RBF interpolation. Advances in Computational Mathematics 49, 1–46 (2021). URL https://api.semanticscholar.org/CorpusID:237540970

  [23] Anzt, H., Cojean, T., Flegar, G., Göbel, F., Grützmacher, T., Nayak, P., Ribizel, T., Tsai, Y.M., Quintana-Ortí, E.S.: Ginkgo: a modern linear operator algebra framework for high performance computing. ACM Transactions on Mathematical Software 48(1), 2:1–2:33 (2022). doi:10.1145/3480935. URL https://doi.org/10.1145/3480935

  [24] Ashcraft, C., Buttari, A., Mary, T.: Block low-rank matrices with shared bases: potential and limitations of the BLR$^2$ format. SIAM Journal on Matrix Analysis and Applications 42(2), 990–1010 (2021)

  [25] ASKIT. https://padas.oden.utexas.edu/libaskit

  [26] Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience 23(2), 187–198 (2011). doi:10.1002/cpe.1631. URL https://inria.hal.science/inria-00550877

  [27] Ballard, G., Carson, E., Demmel, J., Hoemmen, M., Knight, N., Schwartz, O.: Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica 23, 1–155 (2014). doi:10.1017/S0962492914000038

  [28] Ballard, G., Kolda, T.G.: Tensor decompositions for data science. Cambridge University Press (2025)

  [29] Barnett, A., Wu, B., Veerapaneni, S.: Spectrally accurate quadratures for evaluation of layer potentials close to the boundary for the 2D Stokes and Laplace equations. SIAM Journal on Scientific Computing 37(4), B519–B542 (2015)

  [30] Bebendorf, M.: Approximation of boundary element matrices. Numerische Mathematik 86(4), 565–589 (2000). doi:10.1007/PL00005410

  [31] Bebendorf, M., Grzhibovskis, R.: Accelerating Galerkin BEM for linear elasticity using adaptive cross approximation. Mathematical Methods in the Applied Sciences 29(14), 1721–1747 (2006). doi:10.1002/mma.759

  [32] Bebendorf, M., Hackbusch, W.: Existence of $\mathcal{H}$-matrix approximants to the inverse FE-matrix of elliptic operators with $L^\infty$-coefficients. Numerische Mathematik 95(1), 1–28 (2003)

  [33] Bebendorf, M., Venn, R.: Constructing nested bases approximations from the entries of non-local operators. Numerische Mathematik 121(4), 609–635 (2012)

  [34] Belli, R., Hoefler, T.: Notified access: Extending remote memory access programming models for producer-consumer synchronization. In: IEEE International Parallel and Distributed Processing Symposium, pp. 871–881. IEEE (2015)

  [35] Bendoraityte, J., Börm, S.: Distributed $\mathcal{H}^2$-matrices for non-local operators. Comput. Vis. Sci. 11, 237–249 (2008)

  [36] Bhatia, N., Vandana: Survey of nearest neighbor techniques (2010). URL https://arxiv.org/abs/1007.0085

  [37] Birdsall, C.K., Langdon, A.B.: Plasma physics via computer simulation. CRC Press (2018)

  [38] Blackford, L.S., Choi, J., D'Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK users’ guide. SIAM, Philadelphia (1997). http://www.netlib.org/scalapack

  [39] Börm, S., Grasedyck, L., Hackbusch, W.: Introduction to hierarchical matrices with applications. Eng. Anal. Bound. Elem. 27, 405–422 (2003)

  [40] Börm, S., Reimer, K.: Efficient arithmetic operations for rank-structured matrices based on hierarchical low-rank updates. Computing and Visualization in Science 16, 247–258 (2013). doi:10.1007/s00791-014-0236-7

  [41] Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J.: DAGuE: A generic distributed DAG engine for High Performance Computing. Parallel Comput. 38(1–2), 37–51 (2012). doi:10.1016/j.parco.2011.10.003. URL https://icl.utk.edu/parsec/

  [42] Boukaram, W., Hong, Y., Liu, Y., Shi, T., Li, X.S.: Batched sparse direct solver design and evaluation in SuperLU_DIST. The International Journal of High Performance Computing Applications 38(6), 585–598 (2024). doi:10.1177/10943420241268200

  [43] Boukaram, W., Keyes, D., Li, S., Liu, Y., Turkiyyah, G.: Linear complexity $\mathcal{H}^2$ direct solver for fine-grained parallel architectures. arXiv preprint arXiv:2509.11152 (2025)

  [44] Boukaram, W., Liu, Y., Ghysels, P., Li, X.S.: Adaptive Sketching Based Construction of H2 Matrices on GPUs. In: The 26th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2025). IEEE (2025). Best Paper Award

  [45] Boukaram, W.H., Turkiyyah, G., Keyes, D.: Hierarchical matrix operations on GPUs: Matrix-vector multiplication and compression. ACM Transactions on Mathematical Software 45(1), 1–28 (2019). doi:10.1145/3232850

  [46] Bradley, A.M.: A Hybrid Multithreaded Direct Sparse Triangular Solver, pp. 13–22. doi:10.1137/1.9781611974690.ch2

  [47] Börm, S.: Directional $\mathcal{H}^2$-matrix compression for high-frequency problems. Numer. Linear Algebra Appl. 24(6), e2112 (2017). doi:10.1002/nla.2112

  [48] Cambier, L., Chen, C., Boman, E.G., Rajamanickam, S., Tuminaro, R.S., Darve, E.: An algebraic sparsified nested dissection algorithm using low-rank approximations. SIAM Journal on Matrix Analysis and Applications 41(2), 715–746 (2020)

  [49] Candès, E., Demanet, L., Ying, L.: A fast butterfly algorithm for the computation of Fourier integral operators. SIAM Multiscale Model. Simul. 7(4), 1727–1750 (2009)

  [50] Cao, Q., Pei, Y., Herault, T., Akbudak, K., Mikhalev, A., Bosilca, G., Ltaief, H., Keyes, D., Dongarra, J.: Performance analysis of tile low-rank Cholesky factorization using PARSEC instrumentation tools. In: 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools), pp. 25–32. IEEE (2019)

  [51] Chadwick, J.N., Bindel, D.S.: An efficient solver for sparse linear systems based on rank-structured Cholesky factorization. arXiv preprint arXiv:1507.05593 (2015)

  [52] Chandrasekaran, S., Dewilde, P., Gu, M., Lyons, W., Pals, T.: A fast solver for HSS representations via sparse matrices. SIAM Journal on Matrix Analysis and Applications 29(1), 67–81 (2007)

  [53] Chandrasekaran, S., Dewilde, P., Gu, M., Somasunderam, N.: On the numerical rank of the off-diagonal blocks of Schur complements of discretized elliptic PDEs. SIAM Journal on Matrix Anal. Appl. 31, 2261–2290 (2010). doi:10.1137/090775932

  [54] Chandrasekaran, S., Gu, M., Pals, T.: A fast ULV decomposition solver for hierarchically semiseparable representations. SIAM Journal on Matrix Analysis and Applications 28(3), 603–622 (2006)

  [55] Chávez, G., Liu, Y., Ghysels, P., Li, X.S., Rebrova, E.: Scalable and memory-efficient kernel ridge regression. In: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 956–965. IEEE (2020)

  [56] Chen, C., Martinsson, P.G.: Solving linear systems on a GPU with hierarchically off-diagonal low-rank approximations. In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–15. IEEE (2022)

  [57] Chenhan, D.Y., March, W.B., Biros, G.: An $n \log n$ parallel fast direct solver for kernel matrices. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 886–896. IEEE (2017)

  [58] Chenhan, D.Y., March, W.B., Xiao, B., Biros, G.: INV-ASKIT: a parallel fast direct solver for kernel matrices. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 161–171. IEEE (2016)

  [59] Claus, L., Ghysels, P., Boukaram, W.H., Li, X.S.: A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression

  [60] Claus, L., Ghysels, P., Liu, Y., Nhan, T.A., Thirumalaisamy, R., Bhalla, A.P.S., Li, S.: Sparse approximate multifrontal factorization with composite compression methods. ACM Transactions on Mathematical Software 49(3), 1–28 (2023)

  [61] Corona, E., Martinsson, P.G., Zorin, D.: An $O(N)$ direct solver for integral equations on the plane. Applied and Computational Harmonic Analysis 38(2), 284–317 (2015)

  [62] Corona, E., Rahimian, A., Zorin, D.: A tensor-train accelerated solver for integral equations in complex geometries. J. Comput. Phys. 334, 145–169 (2017)

  [63] Coulier, P., Pouransari, H., Darve, E.: The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems. SIAM Journal on Scientific Computing 39(3), A761–A796 (2017)

  [64] Davis, T.A.: Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern multifrontal method. ACM Trans. Math. Softw. 30(2), 196–199 (2004)

  [65] Di, S., Cappello, F.: Fast error-bounded lossy HPC data compression with SZ. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 730–739. IEEE (2016)

  [66] DINFMM. https://github.com/Tianyu-Liang/DINFMM/tree/main

  [67] Ding, N., Liu, Y., Williams, S., Li, X.S.: A message-driven, multi-GPU parallel sparse triangular solver. In: Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), pp. 147–159. doi:10.1137/1.9781611976830.14

  [68] Ding, N., Williams, S., Liu, Y., Li, X.S.: Leveraging one-sided communication for sparse triangular solvers. In: Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing (PP), pp. 93–105. doi:10.1137/1.9781611976137.9

  [69] Duff, I.S.: MA57—a code for the solution of sparse symmetric definite and indefinite systems. ACM Trans. Math. Softw. 30(2), 118–144 (2004). doi:10.1145/992200.992202. URL https://doi.org/10.1145/992200.992202

  [70] Engquist, B., Ying, L.: Fast directional multilevel algorithms for oscillatory kernels. SIAM J. Sci. Comput. 29(4), 1710–1737 (2007)

  [71] Engquist, B., Zhao, H.: Approximate separability of the Green’s function of the Helmholtz equation in the high frequency limit. Communications on Pure and Applied Mathematics 71(11), 2220–2274 (2018)

  [72] Faustmann, M., Melenk, J., Praetorius, D.: Existence of $\mathcal{H}$-matrix approximants to the inverses of BEM matrices: The simple-layer operator. Mathematics of Computation 85(297), 119–152 (2016)

  [73] Faustmann, M., Melenk, J.M., Praetorius, D.: A new proof for existence of H-matrix approximants to the inverse of FEM matrices: the Dirichlet problem for the Laplacian. In: Spectral and High Order Methods for Partial Differential Equations-ICOSAHOM 2012: Selected papers from the ICOSAHOM conference, June 25-29, 2012, Gammarth, Tunisia, pp. 249–259. Spring...

  [74] Faustmann, M., Melenk, J.M., Praetorius, D.: Existence of $\mathcal{H}$-matrix approximants to the inverse of BEM matrices: the hyper-singular integral operator. IMA Journal of Numerical Analysis 37, 1211–1244 (2015). URL https://api.semanticscholar.org/CorpusID:116945940

  [75] Faustmann, M., Melenk, J.M., Praetorius, D.: $\mathcal{H}$-matrix approximability of the inverses of FEM matrices. Numer. Math. 131(4), 615–642 (2015). doi:10.1007/s00211-015-0706-9. URL https://doi.org/10.1007/s00211-015-0706-9

  [76] Faverge, M., Pichon, G., Ramet, P., Roman, J.: On the use of H-matrix arithmetic in PaStiX: a preliminary study. In: Workshop on Fast Direct Solvers. Toulouse, France (2015). URL https://inria.hal.science/hal-01187882

  [77] Feliu-Fabà, J., Ho, K.L., Ying, L.: Recursively preconditioned hierarchical interpolative factorization for elliptic partial differential equations. Communications in Mathematical Sciences 18(1), 91–108 (2020)

  [78] Feliu-Fabà, J., Ying, L.: Approximate inversion of discrete Fourier integral operators. J. Comput. Phys. 446, 110654 (2021)

  [79] Feng, Y., Xiao, J., Gu, M.: Flip-flop spectrum-revealing QR factorization and its applications to singular value decomposition. Electron. Trans. Numer. Anal. 51, 469–494 (2019). doi:10.1553/etna_vol51s469

  [80] Fu, X., Zhang, B., Wang, T., Li, W., Lu, Y., Yi, E., Zhao, J., Geng, X., Li, F., Zhang, J., Jin, Z., Liu, W.: PanguLU: a scalable regular two-dimensional block-cyclic sparse direct solver on distributed heterogeneous systems. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’23. Associati...

Showing first 80 references.