Locality-aware CTA clustering for modern GPUs

2017 · arXiv 7697.303770

2 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning

cs.PF · 2026-04-11 · unverdicted · novelty 6.0 · cited as ref. 20

WaveTune introduces a wave-aware bilinear latency predictor and wave-structured sparse sampling to enable fast runtime auto-tuning of GPU kernels. It achieves up to a 1.83x kernel speedup and a 1.33x reduction in time-to-first-token (TTFT), with much lower tuning overhead.

PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction

cs.PF · 2026-01-21 · unverdicted · novelty 6.0 · cited as ref. 28

PipeWeave predicts GPU kernel performance with 6.1% average error, and end-to-end inference performance with 8.5% error, by feeding analytical pipeline features into machine-learning models. Across 11 GPUs, it cuts the error of prior methods by 4-7x.

