A massively parallel adaptive fast-multipole method on heterogeneous architectures
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: unverdicted (2)
Representative citing papers
Charm++ techniques enable efficient overdecomposition on multi-vendor GPGPU distributed systems.
Citing papers explorer
- A performance portable fast Ewald summation for Stokes flow
Portable Ewald summation algorithms for Stokes flow achieve ~8M particles/sec on an H200 GPU, with a novel P2G kernel providing a 16x speedup and good multi-GPU scaling.