Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration

· 2025 · cs.AR · arXiv 2504.17333

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

State Space Models (SSMs) offer a promising alternative to transformers for long-sequence processing. However, their efficiency remains hindered by memory-bound operations, particularly in the prefill stage. While MARCA, a recent first effort to accelerate SSMs through a dedicated hardware accelerator, achieves great speedup over high-end GPUs, an analysis into the broader accelerator design space is lacking. This work systematically analyzes SSM acceleration opportunities both from the scheduling perspective through fine-grained operator fusion and the hardware perspective through design space exploration, using an extended version of the Stream modeling framework. Our results demonstrate that the improved data locality stemming from our optimized fusion and scheduling strategy enables a speedup of up to 4.8x over unfused execution, while our adaptive memory-aware fusion approach reduces on-chip memory requirements by an order of magnitude without sacrificing performance. We further explore accelerator design trade-offs, showing that a fusion-aware hardware architecture can achieve 1.78x higher performance than the state-of-the-art MARCA accelerator, within the same area budget. These results establish operator fusion as a key enabler for next-generation SSM accelerators.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models

cs.AR · 2026-04-04 · unverdicted · novelty 6.0

Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.

citing papers explorer

Showing 1 of 1 citing paper.

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models cs.AR · 2026-04-04 · unverdicted · none · ref 22 · internal anchor
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.

Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer