GCStack+GCScaler: Fast and accurate GPU performance analyses using fine-grained stall cycle accounting and interval analysis
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Years: 2026 2
Verdicts: UNVERDICTED 2
Roles: background 1
Polarities: background 1
Representative citing papers:
LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.
Citing papers explorer
- COMPOSE: Static Timing-driven Composable Reconfigurable Architecture for Accelerating Recurrence-Bound Loops
  COMPOSE is a timing-driven composable CGRA architecture that fuses cross-iteration operations and defers registration to deliver 1.6x performance and 2.9x EDP gains over prior CGRA designs for recurrence-bound loops.
- LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
  LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.