CUTLASS: CUDA Templates for Linear Algebra Subroutines

NVIDIA · 2021

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

cs.PL · 2025-10-09 · conditional · novelty 6.0

Neptune introduces dependency-breaking fusion with algebraic corrections for reduction sequences. It generates FlashAttention-like kernels from plain attention code and reports a 1.35x average speedup across ten benchmarks and four GPU architectures.

citing papers explorer

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs cs.PL · 2025-10-09 · conditional · none · ref 27