Efficient implementation of the overlap operator on multi-GPUs
Lattice QCD calculations were among the first applications to show the potential of GPUs in high-performance computing. Our interest is in finding ways to use GPUs effectively for lattice calculations with the overlap operator. The large memory footprint of these codes requires the use of multiple GPUs in parallel. In this paper we describe the methods we used to implement this operator efficiently. We run our codes on both a GPU cluster and a CPU cluster with similar interconnects. We find that, to match the performance of the GPU cluster, the CPU cluster requires 20 to 30 times as many CPU cores as there are GPUs.
Forward citations
Cited by 1 Pith paper
- Diagonal Kenney-Laub Rational Approximation to the Overlap Operator using Wilson and Brillouin Kernel
The diagonal Kenney-Laub rational approximation to the overlap operator, using Wilson and Brillouin kernels, shows enhanced chiral symmetry preservation and efficiency over Chebyshev polynomials on quenched lattices.