Refining HPCToolkit for application performance analysis at exascale
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.DC (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
An accelerated hpcanalysis framework ingests performance data from 100,000 MPI ranks in 9.69 seconds, delivers up to 314x GPU speedup, maps network congestion on Aurora, and uses a new tri-dimensional model to identify 32.28% potential speedup in a GAMESS workload on Frontier.
citing papers explorer
- LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
  LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.
- Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
  An accelerated hpcanalysis framework ingests performance data from 100,000 MPI ranks in 9.69 seconds, delivers up to 314x GPU speedup, maps network congestion on Aurora, and uses a new tri-dimensional model to identify 32.28% potential speedup in a GAMESS workload on Frontier.