Fine-grained fusion and adaptive scheduling in SSMs deliver up to 4.8x speedup and 10x lower on-chip memory, enabling a fusion-aware accelerator with 1.78x higher performance than MARCA at equal area.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.AR 2verdicts
UNVERDICTED 2representative citing papers
Mamba-3 architectural changes optimized for hyperscale GPUs cause 28% higher edge latency at 880M parameters and 48% at 15M parameters compared to earlier versions.
citing papers explorer
-
Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration
Fine-grained fusion and adaptive scheduling in SSMs deliver up to 4.8x speedup and 10x lower on-chip memory, enabling a fusion-aware accelerator with 1.78x higher performance than MARCA at equal area.
-
The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency
Mamba-3 architectural changes optimized for hyperscale GPUs cause 28% higher edge latency at 880M parameters and 48% at 15M parameters compared to earlier versions.