NVIDIA RTX 6000 Ada-generation Graphics Card. https://www.nvidia.com/en-us/design-visualization/rtx-6000/, 2024
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.PL (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
Neptune introduces dependency-breaking fusion with algebraic corrections for reduction sequences, generating FlashAttention-like kernels from plain attention code with 1.35x average speedup across ten benchmarks and four GPU architectures.