Analyzing the Performance of Applications at Exascale
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
fields: cs.DC (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: method (2)
polarities: use method (2)
representative citing papers
LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.
An accelerated hpcanalysis framework ingests performance data from 100,000 MPI ranks in 9.69 seconds, delivers up to 314x GPU speedup, maps network congestion on Aurora, and uses a new tri-dimensional model to identify 32.28% potential speedup in a GAMESS workload on Frontier.
citing papers explorer
- JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA
JZ-Tree introduces a flattened Morton plane-based tree hierarchy enabling collaborative dual-tree walks that deliver more than 10x faster exact k-NN search and FoF clustering on GPUs for N greater than 10 million particles, with multi-GPU scaling.
- LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.
- Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
An accelerated hpcanalysis framework ingests performance data from 100,000 MPI ranks in 9.69 seconds, delivers up to 314x GPU speedup, maps network congestion on Aurora, and uses a new tri-dimensional model to identify 32.28% potential speedup in a GAMESS workload on Frontier.