Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization
The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.
Forward citations
Cited by 2 Pith papers
- Charmonium radiative transitions to dileptons from lattice QCD: The case of $h_c \to \eta_c \ell^+\ell^-$ and $\chi_{c1} \to J/\psi\,\ell^+\ell^-$
  First fully dynamical lattice QCD yields Γ(h_c → η_c e⁺e⁻) = 5.45(19) keV (3σ above BESIII) and Γ(χ_c1 → J/ψ e⁺e⁻) = 2.869(90) keV, with continuum-extrapolated results and q² distributions.
- Scalar and Tensor Form Factors for $\Lambda \rightarrow p\ell \bar{\nu}_\ell$ from Lattice QCD
  Lattice QCD yields the scalar and tensor form factors for Λ→pℓν̄ℓ as functions of q², providing a model-independent input to constrain non-standard charged-current interactions via the predicted R^{μe} ratio compared ...