LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.
3 Pith papers cite this work.
fields: cs.DC (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
  LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.
- On the Limits of Performance Portability in Directive-Based GPU Programming
  OpenMP port of gPLUTO achieves comparable performance to OpenACC on NVIDIA but is 3x slower at application level and up to 10x at kernel level on AMD MI250X, driven by strided memory accesses, latency bounds, and C++ abstraction overheads.
- On Similarity of Computational Kernels in our Codes and Proxies
  New hardware-usage-based similarity metrics can identify matching computational kernels between proxy applications and performance suites on both CPU and GPU systems.