Analyzing the Performance of Applications at Exascale

2025 · arXiv 1145.373041

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 2

citation-polarity summary

use method 2

representative citing papers

JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA

cs.DC · 2026-04-07 · unverdicted · novelty 7.0

JZ-Tree introduces a flattened Morton plane-based tree hierarchy enabling collaborative dual-tree walks that deliver more than 10x faster exact k-NN search and FoF clustering on GPUs for N greater than 10 million particles, with multi-GPU scaling.

LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing

cs.DC · 2026-04-21 · unverdicted · novelty 6.0

LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.

Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

cs.DC · 2026-05-05 · unverdicted · novelty 5.0

An accelerated hpcanalysis framework ingests performance data from 100,000 MPI ranks in 9.69 seconds, delivers up to 314x GPU speedup, maps network congestion on Aurora, and uses a new tri-dimensional model to identify 32.28% potential speedup in a GAMESS workload on Frontier.

citing papers explorer

Showing 3 of 3 citing papers.

JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA · cs.DC · 2026-04-07 · unverdicted · none · ref 23
LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing · cs.DC · 2026-04-21 · unverdicted · none · ref 28
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics · cs.DC · 2026-05-05 · unverdicted · none · ref 9
