Title resolution pending

2020 · arXiv 9587.2019

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing

cs.DC · 2026-04-21 · unverdicted · novelty 6.0

LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.

On the Limits of Performance Portability in Directive-Based GPU Programming

cs.DC · 2026-06-10 · unverdicted · novelty 5.0 · 2 refs

OpenMP port of gPLUTO achieves comparable performance to OpenACC on NVIDIA but is 3x slower at application level and up to 10x at kernel level on AMD MI250X, driven by strided memory accesses, latency bounds, and C++ abstraction overheads.

On Similarity of Computational Kernels in our Codes and Proxies

cs.DC · 2026-05-07 · unverdicted · novelty 5.0

New hardware-usage-based similarity metrics can identify matching computational kernels between proxy applications and performance suites on both CPU and GPU systems.

citing papers explorer

LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing

cs.DC · 2026-04-21 · unverdicted · none · ref 8

LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.

On the Limits of Performance Portability in Directive-Based GPU Programming

cs.DC · 2026-06-10 · unverdicted · none · ref 10 · 2 links

OpenMP port of gPLUTO achieves comparable performance to OpenACC on NVIDIA but is 3x slower at application level and up to 10x at kernel level on AMD MI250X, driven by strided memory accesses, latency bounds, and C++ abstraction overheads.

On Similarity of Computational Kernels in our Codes and Proxies

cs.DC · 2026-05-07 · unverdicted · none · ref 5

New hardware-usage-based similarity metrics can identify matching computational kernels between proxy applications and performance suites on both CPU and GPU systems.
