Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method

Bérenger Bramas (INRIA Bordeaux - Sud-Ouest); Eric Darve; Matthias Messner (INRIA Bordeaux - Sud-Ouest); Olivier Coulaud (INRIA Bordeaux - Sud-Ouest)

arxiv: 1210.7292 · v2 · pith:CVQFMIICnew · submitted 2012-10-27 · 💻 cs.NA · cs.CE · cs.MS · math.NA

keywords kernels · method · operator · been · chebyshev · fast · functions · interpolation

A fast multipole method (FMM) for asymptotically smooth kernel functions (1/r, 1/r^4, Gauss and Stokes kernels, radial basis functions, etc.) based on a Chebyshev interpolation scheme was introduced in [Fong et al., 2009]. The method was extended to oscillatory kernels (e.g., the Helmholtz kernel) in [Messner et al., 2012]. Besides its generality, this FMM is attractive because it is easy to implement and achieves high performance through intensive use of highly optimized BLAS libraries. However, its bottlenecks are the precomputation of the multipole-to-local (M2L) operator and its higher number of floating point operations (flops) compared to other FMM formulations. Here, we present several optimizations for that operator, which is known to be the costliest FMM operator. The most efficient ones not only reduce the precomputation time by a factor of up to 340 but also speed up the matrix-vector product. We conclude with comparisons and numerical validations of all presented optimizations.
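To make the role of the M2L operator concrete, the following is a minimal one-dimensional sketch of a Chebyshev interpolation based far-field evaluation for the 1/r kernel. It is not the paper's implementation and includes none of the optimizations the abstract refers to. The function names (`cheb_nodes`, `interp_matrix`, `kernel`), the cluster geometry, and the interpolation order are assumptions chosen for illustration only. The dense matrix `m2l` is simply the kernel evaluated at the Chebyshev nodes of a target cluster and a well-separated source cluster.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def cheb_nodes(order, center, half_width):
    """Chebyshev nodes of the given order, mapped to [center - hw, center + hw]."""
    k = np.arange(1, order + 1)
    return center + half_width * np.cos((2 * k - 1) * np.pi / (2 * order))


def interp_matrix(points, order, center, half_width):
    """S[i, m] = 1/n + (2/n) * sum_{k=1}^{n-1} T_k(x_i) T_k(node_m), with n = order."""
    x = np.clip((points - center) / half_width, -1.0, 1.0)  # map to [-1, 1]
    nodes = cheb_nodes(order, 0.0, 1.0)
    t_x = cheb.chebvander(x, order - 1)      # columns T_0 .. T_{n-1} at the points
    t_n = cheb.chebvander(nodes, order - 1)  # columns T_0 .. T_{n-1} at the nodes
    # The k = 0 term is counted with weight 2/n in the product, so subtract 1/n.
    return (2.0 / order) * (t_x @ t_n.T) - 1.0 / order


def kernel(x, y):
    """Dense 1/r kernel matrix between two 1D point sets (assumed disjoint)."""
    return 1.0 / np.abs(x[:, None] - y[None, :])


# Hypothetical test setup: two well-separated clusters on the real line.
rng = np.random.default_rng(0)
order = 8
sources = rng.uniform(-1.0, 1.0, 200)  # source cluster: center 0, half-width 1
targets = rng.uniform(3.0, 5.0, 150)   # target cluster: center 4, half-width 1
charges = rng.standard_normal(200)

# P2M: anterpolate the charges onto the source cluster's Chebyshev nodes.
multipole = interp_matrix(sources, order, 0.0, 1.0).T @ charges

# M2L: kernel evaluated between target nodes and source nodes (order x order).
m2l = kernel(cheb_nodes(order, 4.0, 1.0), cheb_nodes(order, 0.0, 1.0))
local = m2l @ multipole

# L2P: interpolate the local expansion at the target points.
approx = interp_matrix(targets, order, 4.0, 1.0) @ local

exact = kernel(targets, sources) @ charges
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.2e}")
```

In three dimensions the same construction uses a tensor grid of nodes, so each M2L matrix has order^3 rows and columns, and one such matrix is needed for every relative position of interacting clusters. This is why both the precomputation and the application of that operator are natural targets for optimization.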

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Simple Communication Scheme for Distributed Fast Multipole Methods
cs.DC 2026-04 unverdicted novelty 4.0

A simple MPI-based scheme for distributed uniform-tree FMMs achieves weak scaling to 3.2e10 points on 512 nodes while preserving shared-memory optimizations.