Mamba: Linear-time sequence modeling with selective state spaces

Albert Gu, Tri Dao

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

A two-stage contrastive teacher-student framework learns and then projects latent dynamics onto port-Hamiltonian submanifolds from partial observations.

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.

On the Architectural Complexity of Neural Networks

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

A framework quantifies DNN complexity via tensor operations, links 40 years of breakthroughs to complexity increases, and releases a dataset of 3000+ unexplored high-complexity architectures.

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

TOA augments attention with learnable sequence-space operators and stochastic regularization to enable signed temporal mixing, yielding gains on forecasting and related benchmarks when added to PatchTST and iTransformer.

Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI

cs.LG · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

Rhamba uses region-aware masking strategies and hybrid Attention-Mamba models pretrained on ABIDE fMRI data to achieve top AUROC on schizophrenia and ADHD classification tasks while outperforming prior methods.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations

cs.AR · 2026-05-12 · unverdicted · novelty 5.0

BMRUs enable analog recurrent neural network hardware via discrete outputs that suppress noise 20-fold, with one-to-one parameter-to-circuit mapping and linear power scaling for recurrence.

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

Mamba: Linear-time sequence modeling with selective state spaces

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer