New Algebraic Fast Algorithms for N-body Problems in Two and Three Dimensions
Pith reviewed 2026-05-24 06:38 UTC · model grok-4.3
The pith
Weak admissibility in higher dimensions enables new nested and semi-nested algebraic hierarchical matrix algorithms that match standard H2 performance for fast N-body matrix-vector products.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By adopting the weak admissibility condition in which admissible clusters comprise both far-field and vertex-sharing pairs, the efficient H²_* algorithm constructs a fully nested hierarchical representation and the (H² + H)_* algorithm constructs a semi-nested representation; both are built entirely from algebraic compressions such as ACA and NCA, yielding fast matrix-vector products whose measured memory and runtime are competitive with the NCA-based standard H² matrix algorithm on the tested kernels.
What carries the argument
The weak admissibility condition (admissible clusters are far-field and vertex-sharing), which enables the construction of nested and semi-nested bases via purely algebraic low-rank approximations.
If this is right
- The nested bases reduce the cost of repeated matrix-vector products relative to non-nested H matrices while preserving algebraic black-box applicability.
- The same weak-admissibility construction yields fast iterative solvers for dense linear systems arising from boundary integral equations.
- The algorithms extend directly to radial basis function interpolation without kernel-specific analytic expansions.
- Comparative timings performed in identical C++ implementations establish that both new variants achieve memory and runtime within a small factor of the standard H2 reference across multiple kernels.
Where Pith is reading between the lines
- The semi-nested variant may trade a modest increase in storage for simpler basis construction when full nesting is unnecessary.
- Because the method is purely algebraic, it could be applied to kernels that lack closed-form low-rank expansions, provided the weak-admissibility blocks remain numerically low-rank.
- The vertex-sharing part of the admissibility condition may allow coarser partitions in three dimensions than strong admissibility, potentially improving parallel scalability.
Load-bearing premise
The low-rank approximations obtained under the weak admissibility condition remain effective enough to produce nested bases whose storage and arithmetic costs stay competitive with standard H2 methods.
What would settle it
For a 3D kernel on a mesh with N greater than 10^5, measure whether the ranks required by vertex-sharing admissible blocks grow faster than those of a standard strong-admissibility H2 construction, causing total memory or MVP time to exceed the NCA-based H2 reference by more than a small constant factor.
read the original abstract
We present two new algebraic multilevel hierarchical matrix algorithms to perform fast matrix-vector product (MVP) for $N$-body problems in $d$ dimensions, namely efficient $\mathcal{H}^2_{*}$ (fully nested algorithm, i.e., $\mathcal{H}^2$ matrix-like algorithm) and $(\mathcal{H}^2 + \mathcal{H})_{*}$ (semi-nested algorithm, i.e., cross of $\mathcal{H}^2$ and $\mathcal{H}$ matrix-like algorithms). The efficient $\mathcal{H}^2_{*}$ and $(\mathcal{H}^2 + \mathcal{H})_{*}$ hierarchical representations are based on our recently introduced weak admissibility condition in higher dimensions, where the admissible clusters are the far-field and the vertex-sharing clusters. Due to the use of nested form of the bases, the proposed hierarchical matrix algorithms are more efficient than the non-nested algorithms ($\mathcal{H}$ matrix algorithms). We rely on purely algebraic low-rank approximation techniques (e.g., ACA and NCA) and develop both algorithms in a black-box fashion. Another noteworthy contribution of this article is that we perform a comparative study of the proposed algorithms with different algebraic (NCA or ACA-based compression) fast MVP algorithms in $2$D and $3$D. The fast algorithms are tested on various kernel matrices and applied to get fast iterative solutions of a dense linear system arising from the discretized integral equations and radial basis function interpolation. Notably, all the algorithms are developed in a similar fashion in $\texttt{C++}$ and tested within the same environment, allowing for meaningful comparisons. The numerical results demonstrate that the proposed algorithms are competitive to the NCA-based standard $\mathcal{H}^2$ matrix algorithm with respect to the memory and time. The C++ implementation of the proposed algorithms is available at https://github.com/riteshkhan/H2weak/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces two new algebraic multilevel hierarchical matrix algorithms, H²_* (fully nested) and (H² + H)_* (semi-nested), for fast matrix-vector products in N-body problems in 2D and 3D. These rely on a weak admissibility condition (far-field plus vertex-sharing clusters) and algebraic low-rank approximations via ACA/NCA in a black-box fashion. The nested bases are claimed to yield efficiency gains over non-nested H-matrices, with numerical tests on kernel matrices demonstrating competitiveness to the standard NCA-based H² algorithm in memory and time; the methods are applied to integral equations and RBF interpolation, with all implementations in the same C++ environment and public code released.
Significance. If the competitiveness holds, the work supplies practical algebraic alternatives to standard H² methods that exploit nesting under weak admissibility while enabling direct comparisons via a shared code base. The public C++ implementation on GitHub is a clear strength for reproducibility and verification of the reported timings and memory figures.
major comments (2)
- [Numerical experiments section] Numerical experiments section: the central claim of competitiveness in 3D rests on vertex-sharing blocks admitting effective low-rank approximations that preserve nesting efficiency; without explicit reporting of average or maximum ranks (or rank growth with cluster size) for these blocks on the tested 3D kernels (e.g., 1/|x-y|), it is impossible to confirm that the extra admissible interactions do not inflate storage or MVP cost enough to offset the reported gains.
- [Algorithm description] Algorithm description (likely §3–4): the construction of the nested bases under the weak admissibility condition must be shown to remain O(N) or O(N log N) when vertex-sharing clusters are included; the current algebraic development does not address whether the transfer matrices or basis sizes grow with the additional admissible blocks in 3D.
minor comments (1)
- A table summarizing the kernels, dimensions, and observed ranks for each method would improve the clarity of the comparative study.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address the two major comments point by point below, proposing revisions to strengthen the presentation of our results on the weak-admissibility-based algebraic H² methods.
read point-by-point responses
-
Referee: [Numerical experiments section] Numerical experiments section: the central claim of competitiveness in 3D rests on vertex-sharing blocks admitting effective low-rank approximations that preserve nesting efficiency; without explicit reporting of average or maximum ranks (or rank growth with cluster size) for these blocks on the tested 3D kernels (e.g., 1/|x-y|), it is impossible to confirm that the extra admissible interactions do not inflate storage or MVP cost enough to offset the reported gains.
Authors: We agree that explicit reporting of ranks for the vertex-sharing admissible blocks would strengthen the evidence for the competitiveness claim in 3D. In the revised manuscript we will add a new table (or subsection) in the numerical experiments section that reports the observed average and maximum ranks, as well as rank growth trends with cluster size, separately for far-field and vertex-sharing blocks on the tested 3D kernels including 1/|x-y|. This addition will allow readers to directly assess whether the extra admissible interactions inflate storage or MVP cost. revision: yes
-
Referee: [Algorithm description] Algorithm description (likely §3–4): the construction of the nested bases under the weak admissibility condition must be shown to remain O(N) or O(N log N) when vertex-sharing clusters are included; the current algebraic development does not address whether the transfer matrices or basis sizes grow with the additional admissible blocks in 3D.
Authors: The nested bases and transfer matrices are constructed algebraically via the same bottom-up procedure used in standard H² methods (ACA/NCA on admissible blocks), now applied uniformly to both far-field and vertex-sharing blocks under the weak admissibility condition. Because the hierarchical partitioning and nesting structure are unchanged, the number of admissible blocks per cluster remains bounded by the geometry of the cluster tree, so the asymptotic cost of basis construction stays O(N log N). We will add a short clarifying paragraph in §3–4 that makes this reasoning explicit and notes that the empirical 3D timings already confirm the expected scaling; a full formal proof is beyond the scope of the present algebraic black-box development but follows the same counting arguments as classical H² analysis. revision: partial
Circularity Check
No significant circularity detected
full rationale
The paper develops new algebraic H²_* and (H² + H)_* algorithms from first principles using ACA/NCA low-rank approximations and a weak admissibility condition defined in prior independent work by the same authors. The central claims rest on direct numerical comparisons of memory and runtime against the standard NCA-based H² algorithm, all implemented and timed in the same C++ environment with public code. No derivation step reduces by construction to its inputs, no parameter is fitted and then relabeled as a prediction, and no uniqueness theorem or ansatz is smuggled via self-citation; the work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Weak admissibility condition holds for admissible clusters (far-field and vertex-sharing) in 2D and 3D for the tested kernels.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
weak admissibility condition in higher dimensions, where the admissible clusters are the far-field and the vertex-sharing clusters... HODLRdD... H2_*(b+t)... (H2+H)_*
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
NCA... ACA... algebraic low-rank approximation... black-box fashion
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
S. Ambikasaran and E. Dar ve , An O (N log N ) fast direct solver for partial hierarchically semi-separable matrices: With application to radial basis function interpolation, Journal of Scientific Computing, 57 (2013), pp. 477–501
work page 2013
-
[2]
S. Ambikasaran, K. R. Singh, and S. S. Sankaran , HODLRlib: A library for hierarchical matrices, Journal of Open Source Software, 4 (2019), p. 1167. 36 RITESH KHAN, SIVARAM AMBIKASARAN
work page 2019
-
[3]
J. Barnes and P. Hut , A hierarchicalO(N log N ) force-calculation algorithm, nature, 324 (1986), pp. 446–449
work page 1986
-
[4]
M. Bebendorf, Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, Springer, 2008
work page 2008
-
[5]
M. Bebendorf and S. Rjasanow , Adaptive low-rank approximation of collocation matrices, Computing, 70 (2003), pp. 1–24
work page 2003
-
[6]
M. Bebendorf and R. Venn , Constructing nested bases approximations from the entries of non-local operators, Nu- merische Mathematik, 121 (2012), pp. 609–635
work page 2012
-
[7]
S. Börm, Construction of data-sparseH2-matrices by hierarchical compression, SIAM Journal on Scientific Computing, 31 (2009), pp. 1820–1839
work page 2009
-
[8]
S. Börm, L. Grasedyck, and W. Hackbusch , Hierarchical matrices, Lecture notes, 21 (2003), p. 2003
work page 2003
-
[9]
S. Börm, L. Grasedyck, and W. Hackbusch , Introduction to hierarchical matrices with applications, Engineering analysis with boundary elements, 27 (2003), pp. 405–422
work page 2003
-
[10]
M. Bozzini, L. Lenarduzzi, M. Rossini, and R. Schaback , Interpolation with variably scaled kernels, IMA Journal of Numerical Analysis, 35 (2015), pp. 199–219
work page 2015
-
[11]
S. Chandrasekaran, P. Dewilde, M. Gu, W. Lyons, and T. Pals , A fast solver for HSS representations via sparse matrices, SIAM Journal on Matrix Analysis and Applications, 29 (2007), pp. 67–81
work page 2007
-
[12]
S. Chandrasekaran and I. C. Ipsen , On rank-revealing factorisations, SIAM Journal on Matrix Analysis and Appli- cations, 15 (1994), pp. 592–622
work page 1994
-
[13]
W. Fong and E. Dar ve , The black-box fast multipole method, Journal of Computational Physics, 228 (2009), pp. 8712– 8725
work page 2009
-
[14]
A. Gillman, P. M. Young, and P.-G. Martinsson , A direct solver withO(N ) complexity for integral equations on one-dimensional domains, Frontiers of Mathematics in China, 7 (2012), pp. 217–247
work page 2012
-
[15]
L. Grasedyck and W. Hackbusch , Construction and arithmetics of h-matrices, Computing, 70 (2003), pp. 295–334
work page 2003
-
[16]
A. Gray and A. Moore , N-body’problems in statistical learning, Advances in neural information processing systems, 13 (2000)
work page 2000
-
[17]
L. Greengard and V. Rokhlin , A fast algorithm for particle simulations, Journal of computational physics, 73 (1987), pp. 325–348
work page 1987
-
[18]
L. Greengard and V. Rokhlin , A new version of the fast multipole method for the laplace equation in three dimensions, Acta numerica, 6 (1997), pp. 229–269
work page 1997
-
[19]
V. Gujjula and S. Ambikasaran , A new nested cross approximation, arXiv preprint arXiv:2203.14832, (2022)
-
[20]
Hackbusch, Hierarchical Matrices: Algorithms and Analysis, vol
W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, vol. 49, Springer, 12 2015, https://doi.org/10.1007/ 978-3-662-47324-5
work page 2015
-
[21]
W. Hackbusch, B. N. Khoromskij, and R. Kriemann , Hierarchical matrices based on a weak admissibility criterion, Computing, 73 (2004), pp. 207–243
work page 2004
-
[22]
N. Halko, P.-G. Martinsson, and J. A. Tropp , Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM review, 53 (2011), pp. 217–288
work page 2011
-
[23]
K. L. Ho and L. Ying , Hierarchical interpolative factorization for elliptic operators: integral equations, arXiv preprint arXiv:1307.2666, (2013)
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[24]
V. A. Kandappan, V. Gujjula, and S. Ambikasaran , HODLR2D: A new class of hierarchical matrices, SIAM Journal on Scientific Computing, 45 (2023), pp. A2382–A2408
work page 2023
-
[25]
R. Khan, V. A. Kandappan, and S. Ambikasaran , HODLRdD: A new black-box fast algorithm forN-body problems in d-dimensions with guaranteed error bounds: Applications to integral equations and support vector machines, Journal of Computational Physics, 501 (2024), p. 112786, https://doi.org/10.1016/j.jcp.2024.112786
-
[26]
D. Malhotra and G. Biros , Pvfmm: A parallel kernel independent fmm for particle and volume potentials, Commu- nications in Computational Physics, 18 (2015), pp. 808–830
work page 2015
- [27]
-
[28]
C.-T. Pan, On the existence and computation of rank-revealing lu factorizations, Linear Algebra and its Applications, 316 (2000), pp. 199–222
work page 2000
-
[29]
C. E. Rasmussen, Gaussian Processes in Machine Learning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 63– 71, https://doi.org/10.1007/978-3-540-28650-9_4
-
[30]
Y. Saad and M. H. Schultz ,Gmres: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on scientific and statistical computing, 7 (1986), pp. 856–869
work page 1986
-
[31]
A. K. Saibaba, S. Ambikasaran, J. Y. Li, P. K. Kitanidis, and E. F. Dar ve , Application of hierarchical matrices to linear inverse problems in geostatistics, Oil & Gas Science and Technology–Revue d’IFP Energies nouvelles, 67 (2012), pp. 857–875
work page 2012
-
[32]
J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li , Fast algorithms for hierarchically semiseparable matrices, Numer- ical Linear Algebra with Applications, 17 (2010), pp. 953–976
work page 2010
-
[33]
C. Yang, R. Duraisw ami, N. A. Gumerov, and L. Da vis , Improved fast gauss transform and efficient kernel density estimation, in Computer Vision, IEEE International Conference on, vol. 2, IEEE Computer Society, 2003, pp. 464–464
work page 2003
-
[34]
L. Ying, G. Biros, and D. Zorin , A kernel-independent adaptive fast multipole algorithm in two and three dimensions, Journal of Computational Physics, 196 (2004), pp. 591–626
work page 2004
- [35]
-
[36]
Y. Zhao, D. Jiao, and J. Mao , Fast nested cross approximation algorithm for solving large-scale electromagnetic problems, IEEE Transactions on Microwave Theory and Techniques, 67 (2019), pp. 3271–3283. NEW ALGEBRAIC FAST ALGORITHMS FORN-BODY PROBLEMS IN TWO AND THREE DIMENSIONS 37 Appendix A. Purely algebraicH2√ d(b) algorithm [36, 19]. In this section, ...
work page 2019
- [37]
-
[38]
Calculation of the potential (MVP). (Appendix A.2) A.1. Initialization of H2√ d(b) representation.In the B2T pivot selection, the pivots of a cluster at a parent level are obtained from the pivots at its child level. Therefore, one needs to traverse the2d uniform tree from bottom to top direction (starting at the leaf level) to find pivots of all clusters...
-
[41]
L2L ˜UXcX , X ∈ parent (Xc) / L2P (UX ). A.2. Calculation of the potential (MVP).The procedure to compute the potential is as follows:
-
[42]
Upward traversal: • Particles to multipole(P2M) at leaf levelκ : For all leaf clustersX, calculate v(κ) X = V ∗ X q(κ) X 38 RITESH KHAN, SIVARAM AMBIKASARAN • Multipole to multipole(M2M) at non-leaf level: For all non-leafX clusters, calculate v(l) X = X Xc∈child(X) ˜V ∗ XX c v(l+1) Xc , κ − 1 ≥ l ≥ 2
-
[43]
Transverse traversal: • Multipole to local(M2L) at all levels and for all clusters: For all clusterX, calculate u(l) X = X Y ∈IL√ d(X) TX,Y v(l) Y , 2 ≤ l ≤ κ
-
[44]
Downward traversal: • Local to local(L2L) at non-leaf level: For all non-leaf clustersX, calculate u(l+1) Xc := u(l+1) Xc + ˜UXcX u(l) X , 2 ≤ l ≤ κ − 1 and X ∈ parent (Xc) • Local to particles(L2P) at leaf levelκ : For all leaf clustersX, calculate ϕ(κ) X = UX u(κ) X Near-field potential and the total potential at leaf level.For each leaf clusterX, we ad...
- [45]
-
[46]
Calculation of the potential (MVP). (Appendix B.2) B.1. Initialization of H2√ d(t) representation. In the T2B pivot selection, the pivots of a cluster at a child level are obtained from its own index set and the pivots at its parent level. Therefore, one needs to traverse the 2d uniform tree from top to bottom direction to find pivots of all clusters. The...
-
[48]
M2L (TX,Y ) , Y ∈ IL √ d (X)
-
[49]
L2L ˜UXcX , X ∈ parent (Xc) / L2P (UX ). B.2. Calculation of the potential (MVP).We perform the potential (MVP) calculation by follow- ing upward, transverse and downward tree traversal, which is similar to Appendix A.2. We refer the reader to Algorithm1 of [36] for more details. Appendix C. H2 ∗(t): T2B NCA on ourweak admissibility condition. This sectio...
- [50]
-
[51]
Calculation of the potential (MVP). (Appendix C.2) C.1. Initialization of H2 ∗(t) representation. In the T2B pivot selection, the pivots of a cluster at a child level are obtained from its own index set and the pivots at its parent level. Therefore, one needs to traverse the 2d uniform tree from top to bottom direction to find pivots of all clusters. The ...
-
[52]
P2M (VX ) / M2M ˜VXX c , Xc ∈ child (X)
-
[53]
M2L (TX,Y ) , Y ∈ IL ∗ (X)
-
[54]
L2L ˜UXcX , X ∈ parent (Xc) / L2P (UX ). C.2. Calculation of the potential (MVP).The procedure to compute the potential is as follows:
-
[55]
Upward traversal: • Particles to multipole(P2M) at leaf levelκ : For all leaf clustersX, calculate v(κ) X = V ∗ X q(κ) X • Multipole to multipole(M2M) at non-leaf level: For all non-leafX clusters, calculate v(l) X = X Xc∈child(X) ˜V ∗ XX c v(l+1) Xc , κ − 1 ≥ l ≥ 1
-
[56]
Transverse traversal: • Multipole to local(M2L) at all levels and for all clusters: For all clusterX, calculate u(l) X = X Y ∈IL∗(X) TX,Y v(l) Y , 1 ≤ l ≤ κ
-
[57]
Downward traversal: • Local to local(L2L) at non-leaf level: For all non-leaf clustersX, calculate u(l+1) Xc := u(l+1) Xc + ˜UXcX u(l) X , 1 ≤ l ≤ κ − 1 and X ∈ parent (Xc) • Local to particles(L2P) at leaf levelκ : For all leaf clustersX, calculate ϕ(κ) X = UX u(κ) X Near-field potential and the total potential at leaf level.For each leaf clusterX, we ad...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.