
arxiv: 2408.01129 · v8 · submitted 2024-08-02 · 💻 cs.LG · cs.AI

A Survey of Mamba

classification 💻 cs.LG cs.AI
keywords models, mamba, architecture, review, studies, applications, data, deep

As one of the most representative deep learning (DL) techniques, the Transformer architecture has empowered numerous advanced models, especially large language models (LLMs) with billions of parameters, and has become a cornerstone of deep learning. Despite these achievements, Transformers still face inherent limitations, particularly slow inference caused by the quadratic computational complexity of attention in sequence length. Recently, a novel architecture named Mamba, inspired by classical state space models (SSMs), has emerged as a promising alternative for building foundation models, delivering modeling abilities comparable to Transformers while preserving near-linear scalability with respect to sequence length. This has sparked a growing number of studies exploring Mamba's potential across diverse domains. Given this rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models and offers a comprehensive understanding of this emerging architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: advancements in Mamba-based models, techniques for adapting Mamba to diverse data, and applications where Mamba can excel. Specifically, we first review the foundational knowledge of representative deep learning models and the details of Mamba-1 and Mamba-2 as preliminaries. Then, to showcase the significance of Mamba for AI, we comprehensively review related studies focusing on architecture design, data adaptability, and applications of Mamba models. Finally, we discuss current limitations and explore promising research directions to provide deeper insights for future investigations.
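
The scaling contrast drawn in the abstract can be illustrated with a minimal sketch. It is not taken from the survey, and it is not Mamba's selective scan: it assumes a scalar state and fixed, hypothetical parameters `a`, `b`, and `c`, whereas Mamba makes its parameters input-dependent. The function names `ssm_scan` and `attention_score_count` are illustrative. The sketch shows only why a state space recurrence does one constant-cost update per token, while attention compares every token with every other token.

```python
def ssm_scan(xs, a, b, c):
    """Run a fixed-parameter scalar linear recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Assumption: a, b, c are constants chosen for illustration. Mamba's
    selective mechanism would make them depend on the input instead.
    """
    h = 0.0          # hidden state, starts at zero
    ys = []
    for x in xs:     # one constant-cost update per token: O(n) overall
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_score_count(n):
    """Number of pairwise scores in full self-attention over n tokens."""
    return n * n     # every token attends to every token: O(n^2)


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
    for n in (1_000, 2_000, 4_000):
        # Doubling n doubles the scan steps but quadruples the score count.
        print(n, "scan steps:", n, "attention scores:", attention_score_count(n))
```

Doubling the sequence length doubles the number of recurrence steps but quadruples the number of attention scores, which is the gap the abstract refers to.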



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA

    cs.CV 2025-08 unverdicted novelty 7.0

    mKG-RAG constructs multimodal KGs via MLLM-driven extraction and vision-text matching then applies dual-stage query-aware retrieval to achieve new state-of-the-art results on knowledge-based VQA.

  2. DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis

    cs.LG 2026-01 unverdicted novelty 6.0

    DeMa is a dual-path delay-aware Mamba architecture that decomposes MTS into intra-series temporal and inter-series variate paths to achieve SOTA performance with linear complexity on forecasting, imputation, anomaly d...

  3. Predicting one-year clinical instability and mortality in heart failure patients using sequence modeling

    cs.LG 2025-11 unverdicted novelty 4.0

    Sequence models on EHR data from a Swedish heart failure cohort achieve AUPRCs of 0.555 to 0.854 for one-year instability and mortality predictions and support four care pathways.

  4. When control meets large language models: From words to dynamics

    eess.SY 2026-02 unverdicted novelty 3.0

    The paper proposes a bidirectional continuum between LLMs and control systems, covering LLM-assisted controller design, control-based LLM steering, and state-space modeling of LLMs.

  5. Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State-Space Architectures from S4 to Mamba

    cs.LG 2025-03 unverdicted

    A survey tracing the evolution of state-space models like S4 and Mamba, their efficiency trade-offs, and applications in NLP, vision, and other domains.