A full 3D discontinuous Galerkin finite-element (DG-FE) unstructured-mesh ocean model is ported to multi-GPU systems with memory and solver optimizations, delivering single-GPU performance equivalent to ~1500 CPU cores and a 50x speedup on 4xA100 nodes while scaling to 1024 GPUs.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
An efficient multi-GPU implementation for the Discontinuous Galerkin ocean model SLIM