pith. sign in

arxiv: 2309.14085 · v3 · submitted 2023-09-25 · 🧮 math.NA · cs.NA· math-ph· math.MP

New Algebraic Fast Algorithms for N-body Problems in Two and Three Dimensions

Pith reviewed 2026-05-24 06:38 UTC · model grok-4.3

classification 🧮 math.NA cs.NAmath-phmath.MP
keywords hierarchical matricesH2 matricesN-body problemsmatrix-vector productweak admissibilityalgebraic low-rank approximationACAnested bases
0
0 comments X

The pith

Weak admissibility in higher dimensions enables new nested and semi-nested algebraic hierarchical matrix algorithms that match standard H2 performance for fast N-body matrix-vector products.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents two algebraic multilevel algorithms for fast matrix-vector multiplication on N-body kernel matrices in two and three dimensions: the fully nested efficient H²_* and the semi-nested (H² + H)_*. Both rely on a weak admissibility condition that treats far-field and vertex-sharing clusters as admissible, allowing algebraic low-rank approximations via ACA or NCA to build nested bases in a black-box manner. The methods are applied to discretized integral equations and radial basis function interpolation, and numerical tests in the same C++ environment show they require memory and time comparable to the NCA-based standard H² matrix algorithm while improving on non-nested H matrix approaches. A sympathetic reader would care because the nested structure reduces storage and arithmetic relative to earlier hierarchical formats without sacrificing algebraic generality.

Core claim

By adopting the weak admissibility condition in which admissible clusters comprise both far-field and vertex-sharing pairs, the efficient H²_* algorithm constructs a fully nested hierarchical representation and the (H² + H)_* algorithm constructs a semi-nested representation; both are built entirely from algebraic compressions such as ACA and NCA, yielding fast matrix-vector products whose measured memory and runtime are competitive with the NCA-based standard H² matrix algorithm on the tested kernels.

What carries the argument

The weak admissibility condition (admissible clusters are far-field and vertex-sharing), which enables the construction of nested and semi-nested bases via purely algebraic low-rank approximations.

If this is right

  • The nested bases reduce the cost of repeated matrix-vector products relative to non-nested H matrices while preserving algebraic black-box applicability.
  • The same weak-admissibility construction yields fast iterative solvers for dense linear systems arising from boundary integral equations.
  • The algorithms extend directly to radial basis function interpolation without kernel-specific analytic expansions.
  • Comparative timings performed in identical C++ implementations establish that both new variants achieve memory and runtime within a small factor of the standard H2 reference across multiple kernels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The semi-nested variant may trade a modest increase in storage for simpler basis construction when full nesting is unnecessary.
  • Because the method is purely algebraic, it could be applied to kernels that lack closed-form low-rank expansions, provided the weak-admissibility blocks remain numerically low-rank.
  • The vertex-sharing part of the admissibility condition may allow coarser partitions in three dimensions than strong admissibility, potentially improving parallel scalability.

Load-bearing premise

The low-rank approximations obtained under the weak admissibility condition remain effective enough to produce nested bases whose storage and arithmetic costs stay competitive with standard H2 methods.

What would settle it

For a 3D kernel on a mesh with N greater than 10^5, measure whether the ranks required by vertex-sharing admissible blocks grow faster than those of a standard strong-admissibility H2 construction, causing total memory or MVP time to exceed the NCA-based H2 reference by more than a small constant factor.

read the original abstract

We present two new algebraic multilevel hierarchical matrix algorithms to perform fast matrix-vector product (MVP) for $N$-body problems in $d$ dimensions, namely efficient $\mathcal{H}^2_{*}$ (fully nested algorithm, i.e., $\mathcal{H}^2$ matrix-like algorithm) and $(\mathcal{H}^2 + \mathcal{H})_{*}$ (semi-nested algorithm, i.e., cross of $\mathcal{H}^2$ and $\mathcal{H}$ matrix-like algorithms). The efficient $\mathcal{H}^2_{*}$ and $(\mathcal{H}^2 + \mathcal{H})_{*}$ hierarchical representations are based on our recently introduced weak admissibility condition in higher dimensions, where the admissible clusters are the far-field and the vertex-sharing clusters. Due to the use of nested form of the bases, the proposed hierarchical matrix algorithms are more efficient than the non-nested algorithms ($\mathcal{H}$ matrix algorithms). We rely on purely algebraic low-rank approximation techniques (e.g., ACA and NCA) and develop both algorithms in a black-box fashion. Another noteworthy contribution of this article is that we perform a comparative study of the proposed algorithms with different algebraic (NCA or ACA-based compression) fast MVP algorithms in $2$D and $3$D. The fast algorithms are tested on various kernel matrices and applied to get fast iterative solutions of a dense linear system arising from the discretized integral equations and radial basis function interpolation. Notably, all the algorithms are developed in a similar fashion in $\texttt{C++}$ and tested within the same environment, allowing for meaningful comparisons. The numerical results demonstrate that the proposed algorithms are competitive to the NCA-based standard $\mathcal{H}^2$ matrix algorithm with respect to the memory and time. The C++ implementation of the proposed algorithms is available at https://github.com/riteshkhan/H2weak/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces two new algebraic multilevel hierarchical matrix algorithms, H²_* (fully nested) and (H² + H)_* (semi-nested), for fast matrix-vector products in N-body problems in 2D and 3D. These rely on a weak admissibility condition (far-field plus vertex-sharing clusters) and algebraic low-rank approximations via ACA/NCA in a black-box fashion. The nested bases are claimed to yield efficiency gains over non-nested H-matrices, with numerical tests on kernel matrices demonstrating competitiveness to the standard NCA-based H² algorithm in memory and time; the methods are applied to integral equations and RBF interpolation, with all implementations in the same C++ environment and public code released.

Significance. If the competitiveness holds, the work supplies practical algebraic alternatives to standard H² methods that exploit nesting under weak admissibility while enabling direct comparisons via a shared code base. The public C++ implementation on GitHub is a clear strength for reproducibility and verification of the reported timings and memory figures.

major comments (2)
  1. [Numerical experiments section] Numerical experiments section: the central claim of competitiveness in 3D rests on vertex-sharing blocks admitting effective low-rank approximations that preserve nesting efficiency; without explicit reporting of average or maximum ranks (or rank growth with cluster size) for these blocks on the tested 3D kernels (e.g., 1/|x-y|), it is impossible to confirm that the extra admissible interactions do not inflate storage or MVP cost enough to offset the reported gains.
  2. [Algorithm description] Algorithm description (likely §3–4): the construction of the nested bases under the weak admissibility condition must be shown to remain O(N) or O(N log N) when vertex-sharing clusters are included; the current algebraic development does not address whether the transfer matrices or basis sizes grow with the additional admissible blocks in 3D.
minor comments (1)
  1. A table summarizing the kernels, dimensions, and observed ranks for each method would improve the clarity of the comparative study.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We address the two major comments point by point below, proposing revisions to strengthen the presentation of our results on the weak-admissibility-based algebraic H² methods.

read point-by-point responses
  1. Referee: [Numerical experiments section] Numerical experiments section: the central claim of competitiveness in 3D rests on vertex-sharing blocks admitting effective low-rank approximations that preserve nesting efficiency; without explicit reporting of average or maximum ranks (or rank growth with cluster size) for these blocks on the tested 3D kernels (e.g., 1/|x-y|), it is impossible to confirm that the extra admissible interactions do not inflate storage or MVP cost enough to offset the reported gains.

    Authors: We agree that explicit reporting of ranks for the vertex-sharing admissible blocks would strengthen the evidence for the competitiveness claim in 3D. In the revised manuscript we will add a new table (or subsection) in the numerical experiments section that reports the observed average and maximum ranks, as well as rank growth trends with cluster size, separately for far-field and vertex-sharing blocks on the tested 3D kernels including 1/|x-y|. This addition will allow readers to directly assess whether the extra admissible interactions inflate storage or MVP cost. revision: yes

  2. Referee: [Algorithm description] Algorithm description (likely §3–4): the construction of the nested bases under the weak admissibility condition must be shown to remain O(N) or O(N log N) when vertex-sharing clusters are included; the current algebraic development does not address whether the transfer matrices or basis sizes grow with the additional admissible blocks in 3D.

    Authors: The nested bases and transfer matrices are constructed algebraically via the same bottom-up procedure used in standard H² methods (ACA/NCA on admissible blocks), now applied uniformly to both far-field and vertex-sharing blocks under the weak admissibility condition. Because the hierarchical partitioning and nesting structure are unchanged, the number of admissible blocks per cluster remains bounded by the geometry of the cluster tree, so the asymptotic cost of basis construction stays O(N log N). We will add a short clarifying paragraph in §3–4 that makes this reasoning explicit and notes that the empirical 3D timings already confirm the expected scaling; a full formal proof is beyond the scope of the present algebraic black-box development but follows the same counting arguments as classical H² analysis. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper develops new algebraic H²_* and (H² + H)_* algorithms from first principles using ACA/NCA low-rank approximations and a weak admissibility condition defined in prior independent work by the same authors. The central claims rest on direct numerical comparisons of memory and runtime against the standard NCA-based H² algorithm, all implemented and timed in the same C++ environment with public code. No derivation step reduces by construction to its inputs, no parameter is fitted and then relabeled as a prediction, and no uniqueness theorem or ansatz is smuggled via self-citation; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the weak admissibility condition enables low-rank structure in higher dimensions for the kernels considered, along with standard algebraic compression techniques.

axioms (1)
  • domain assumption Weak admissibility condition holds for admissible clusters (far-field and vertex-sharing) in 2D and 3D for the tested kernels.
    Invoked as the basis for the new hierarchical representations in the abstract.

pith-pipeline@v0.9.0 · 5882 in / 1182 out tokens · 25653 ms · 2026-05-24T06:38:19.420488+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

  1. [1]

    Ambikasaran and E

    S. Ambikasaran and E. Dar ve , An O (N log N ) fast direct solver for partial hierarchically semi-separable matrices: With application to radial basis function interpolation, Journal of Scientific Computing, 57 (2013), pp. 477–501

  2. [2]

    Ambikasaran, K

    S. Ambikasaran, K. R. Singh, and S. S. Sankaran , HODLRlib: A library for hierarchical matrices, Journal of Open Source Software, 4 (2019), p. 1167. 36 RITESH KHAN, SIVARAM AMBIKASARAN

  3. [3]

    Barnes and P

    J. Barnes and P. Hut , A hierarchicalO(N log N ) force-calculation algorithm, nature, 324 (1986), pp. 446–449

  4. [4]

    Bebendorf, Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, Springer, 2008

    M. Bebendorf, Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, Springer, 2008

  5. [5]

    Bebendorf and S

    M. Bebendorf and S. Rjasanow , Adaptive low-rank approximation of collocation matrices, Computing, 70 (2003), pp. 1–24

  6. [6]

    Bebendorf and R

    M. Bebendorf and R. Venn , Constructing nested bases approximations from the entries of non-local operators, Nu- merische Mathematik, 121 (2012), pp. 609–635

  7. [7]

    Börm, Construction of data-sparseH2-matrices by hierarchical compression, SIAM Journal on Scientific Computing, 31 (2009), pp

    S. Börm, Construction of data-sparseH2-matrices by hierarchical compression, SIAM Journal on Scientific Computing, 31 (2009), pp. 1820–1839

  8. [8]

    S. Börm, L. Grasedyck, and W. Hackbusch , Hierarchical matrices, Lecture notes, 21 (2003), p. 2003

  9. [9]

    S. Börm, L. Grasedyck, and W. Hackbusch , Introduction to hierarchical matrices with applications, Engineering analysis with boundary elements, 27 (2003), pp. 405–422

  10. [10]

    Bozzini, L

    M. Bozzini, L. Lenarduzzi, M. Rossini, and R. Schaback , Interpolation with variably scaled kernels, IMA Journal of Numerical Analysis, 35 (2015), pp. 199–219

  11. [11]

    Chandrasekaran, P

    S. Chandrasekaran, P. Dewilde, M. Gu, W. Lyons, and T. Pals , A fast solver for HSS representations via sparse matrices, SIAM Journal on Matrix Analysis and Applications, 29 (2007), pp. 67–81

  12. [12]

    Chandrasekaran and I

    S. Chandrasekaran and I. C. Ipsen , On rank-revealing factorisations, SIAM Journal on Matrix Analysis and Appli- cations, 15 (1994), pp. 592–622

  13. [13]

    Fong and E

    W. Fong and E. Dar ve , The black-box fast multipole method, Journal of Computational Physics, 228 (2009), pp. 8712– 8725

  14. [14]

    Gillman, P

    A. Gillman, P. M. Young, and P.-G. Martinsson , A direct solver withO(N ) complexity for integral equations on one-dimensional domains, Frontiers of Mathematics in China, 7 (2012), pp. 217–247

  15. [15]

    Grasedyck and W

    L. Grasedyck and W. Hackbusch , Construction and arithmetics of h-matrices, Computing, 70 (2003), pp. 295–334

  16. [16]

    Gray and A

    A. Gray and A. Moore , N-body’problems in statistical learning, Advances in neural information processing systems, 13 (2000)

  17. [17]

    Greengard and V

    L. Greengard and V. Rokhlin , A fast algorithm for particle simulations, Journal of computational physics, 73 (1987), pp. 325–348

  18. [18]

    Greengard and V

    L. Greengard and V. Rokhlin , A new version of the fast multipole method for the laplace equation in three dimensions, Acta numerica, 6 (1997), pp. 229–269

  19. [19]

    Gujjula and S

    V. Gujjula and S. Ambikasaran , A new nested cross approximation, arXiv preprint arXiv:2203.14832, (2022)

  20. [20]

    Hackbusch, Hierarchical Matrices: Algorithms and Analysis, vol

    W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, vol. 49, Springer, 12 2015, https://doi.org/10.1007/ 978-3-662-47324-5

  21. [21]

    Hackbusch, B

    W. Hackbusch, B. N. Khoromskij, and R. Kriemann , Hierarchical matrices based on a weak admissibility criterion, Computing, 73 (2004), pp. 207–243

  22. [22]

    Halko, P.-G

    N. Halko, P.-G. Martinsson, and J. A. Tropp , Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM review, 53 (2011), pp. 217–288

  23. [23]

    K. L. Ho and L. Ying , Hierarchical interpolative factorization for elliptic operators: integral equations, arXiv preprint arXiv:1307.2666, (2013)

  24. [24]

    V. A. Kandappan, V. Gujjula, and S. Ambikasaran , HODLR2D: A new class of hierarchical matrices, SIAM Journal on Scientific Computing, 45 (2023), pp. A2382–A2408

  25. [25]

    R. Khan, V. A. Kandappan, and S. Ambikasaran , HODLRdD: A new black-box fast algorithm forN-body problems in d-dimensions with guaranteed error bounds: Applications to integral equations and support vector machines, Journal of Computational Physics, 501 (2024), p. 112786, https://doi.org/10.1016/j.jcp.2024.112786

  26. [26]

    Malhotra and G

    D. Malhotra and G. Biros , Pvfmm: A parallel kernel independent fmm for particle and volume potentials, Commu- nications in Computational Physics, 18 (2015), pp. 808–830

  27. [27]

    Massei, L

    S. Massei, L. Robol, and D. Kressner , Hierarchical adaptive low-rank format with applications to discretized partial differential equations, Numerical Linear Algebra with Applications, 29 (2022), p. e2448

  28. [28]

    Pan, On the existence and computation of rank-revealing lu factorizations, Linear Algebra and its Applications, 316 (2000), pp

    C.-T. Pan, On the existence and computation of rank-revealing lu factorizations, Linear Algebra and its Applications, 316 (2000), pp. 199–222

  29. [29]

    C. E. Rasmussen, Gaussian Processes in Machine Learning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 63– 71, https://doi.org/10.1007/978-3-540-28650-9_4

  30. [30]

    Saad and M

    Y. Saad and M. H. Schultz ,Gmres: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on scientific and statistical computing, 7 (1986), pp. 856–869

  31. [31]

    A. K. Saibaba, S. Ambikasaran, J. Y. Li, P. K. Kitanidis, and E. F. Dar ve , Application of hierarchical matrices to linear inverse problems in geostatistics, Oil & Gas Science and Technology–Revue d’IFP Energies nouvelles, 67 (2012), pp. 857–875

  32. [32]

    J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li , Fast algorithms for hierarchically semiseparable matrices, Numer- ical Linear Algebra with Applications, 17 (2010), pp. 953–976

  33. [33]

    C. Yang, R. Duraisw ami, N. A. Gumerov, and L. Da vis , Improved fast gauss transform and efficient kernel density estimation, in Computer Vision, IEEE International Conference on, vol. 2, IEEE Computer Society, 2003, pp. 464–464

  34. [34]

    L. Ying, G. Biros, and D. Zorin , A kernel-independent adaptive fast multipole algorithm in two and three dimensions, Journal of Computational Physics, 196 (2004), pp. 591–626

  35. [35]

    Yokota, H

    R. Yokota, H. Ibeid, and D. Keyes , Fast multipole method as a matrix-free hierarchical low-rank approximation, in International Workshop on Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing, Springer, 2015, pp. 267–286

  36. [36]

    Y. Zhao, D. Jiao, and J. Mao , Fast nested cross approximation algorithm for solving large-scale electromagnetic problems, IEEE Transactions on Microwave Theory and Techniques, 67 (2019), pp. 3271–3283. NEW ALGEBRAIC FAST ALGORITHMS FORN-BODY PROBLEMS IN TWO AND THREE DIMENSIONS 37 Appendix A. Purely algebraicH2√ d(b) algorithm [36, 19]. In this section, ...

  37. [37]

    (Appendix A.1)

    Initialization of H2√ d(b) representation. (Appendix A.1)

  38. [38]

    (Appendix A.2) A.1

    Calculation of the potential (MVP). (Appendix A.2) A.1. Initialization of H2√ d(b) representation.In the B2T pivot selection, the pivots of a cluster at a parent level are obtained from the pivots at its child level. Therefore, one needs to traverse the2d uniform tree from bottom to top direction (starting at the leaf level) to find pivots of all clusters...

  39. [41]

    L2L ˜UXcX , X ∈ parent (Xc) / L2P (UX ). A.2. Calculation of the potential (MVP).The procedure to compute the potential is as follows:

  40. [42]

    Upward traversal: • Particles to multipole(P2M) at leaf levelκ : For all leaf clustersX, calculate v(κ) X = V ∗ X q(κ) X 38 RITESH KHAN, SIVARAM AMBIKASARAN • Multipole to multipole(M2M) at non-leaf level: For all non-leafX clusters, calculate v(l) X = X Xc∈child(X) ˜V ∗ XX c v(l+1) Xc , κ − 1 ≥ l ≥ 2

  41. [43]

    Transverse traversal: • Multipole to local(M2L) at all levels and for all clusters: For all clusterX, calculate u(l) X = X Y ∈IL√ d(X) TX,Y v(l) Y , 2 ≤ l ≤ κ

  42. [44]

    Downward traversal: • Local to local(L2L) at non-leaf level: For all non-leaf clustersX, calculate u(l+1) Xc := u(l+1) Xc + ˜UXcX u(l) X , 2 ≤ l ≤ κ − 1 and X ∈ parent (Xc) • Local to particles(L2P) at leaf levelκ : For all leaf clustersX, calculate ϕ(κ) X = UX u(κ) X Near-field potential and the total potential at leaf level.For each leaf clusterX, we ad...

  43. [45]

    (Appendix B.1)

    Initialization of H2√ d(t) representation. (Appendix B.1)

  44. [46]

    (Appendix B.2) B.1

    Calculation of the potential (MVP). (Appendix B.2) B.1. Initialization of H2√ d(t) representation. In the T2B pivot selection, the pivots of a cluster at a child level are obtained from its own index set and the pivots at its parent level. Therefore, one needs to traverse the 2d uniform tree from top to bottom direction to find pivots of all clusters. The...

  45. [48]

    M2L (TX,Y ) , Y ∈ IL √ d (X)

  46. [49]

    L2L ˜UXcX , X ∈ parent (Xc) / L2P (UX ). B.2. Calculation of the potential (MVP).We perform the potential (MVP) calculation by follow- ing upward, transverse and downward tree traversal, which is similar to Appendix A.2. We refer the reader to Algorithm1 of [36] for more details. Appendix C. H2 ∗(t): T2B NCA on ourweak admissibility condition. This sectio...

  47. [50]

    (Appendix C.1)

    Initialization of H2 ∗(t) representation. (Appendix C.1)

  48. [51]

    (Appendix C.2) C.1

    Calculation of the potential (MVP). (Appendix C.2) C.1. Initialization of H2 ∗(t) representation. In the T2B pivot selection, the pivots of a cluster at a child level are obtained from its own index set and the pivots at its parent level. Therefore, one needs to traverse the 2d uniform tree from top to bottom direction to find pivots of all clusters. The ...

  49. [52]

    P2M (VX ) / M2M ˜VXX c , Xc ∈ child (X)

  50. [53]

    M2L (TX,Y ) , Y ∈ IL ∗ (X)

  51. [54]

    L2L ˜UXcX , X ∈ parent (Xc) / L2P (UX ). C.2. Calculation of the potential (MVP).The procedure to compute the potential is as follows:

  52. [55]

    Upward traversal: • Particles to multipole(P2M) at leaf levelκ : For all leaf clustersX, calculate v(κ) X = V ∗ X q(κ) X • Multipole to multipole(M2M) at non-leaf level: For all non-leafX clusters, calculate v(l) X = X Xc∈child(X) ˜V ∗ XX c v(l+1) Xc , κ − 1 ≥ l ≥ 1

  53. [56]

    Transverse traversal: • Multipole to local(M2L) at all levels and for all clusters: For all clusterX, calculate u(l) X = X Y ∈IL∗(X) TX,Y v(l) Y , 1 ≤ l ≤ κ

  54. [57]

    Downward traversal: • Local to local(L2L) at non-leaf level: For all non-leaf clustersX, calculate u(l+1) Xc := u(l+1) Xc + ˜UXcX u(l) X , 1 ≤ l ≤ κ − 1 and X ∈ parent (Xc) • Local to particles(L2P) at leaf levelκ : For all leaf clustersX, calculate ϕ(κ) X = UX u(κ) X Near-field potential and the total potential at leaf level.For each leaf clusterX, we ad...