Blackmamba: Mixture of experts for state-space models.arXiv preprint arXiv:2402.01771

Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge · 2024 · arXiv 2402.01771

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Hidden State Poisoning Attacks against Mamba-based Language Models

cs.CL · 2026-01-05 · unverdicted · novelty 7.0

Short input phrases can irreversibly overwrite hidden states in Mamba models, impairing information retrieval on a new benchmark while leaving pure Transformer models unaffected.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

A Survey of Mamba

cs.LG · 2024-08-02 · unverdicted · novelty 2.0

The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.

citing papers explorer

Showing 4 of 4 citing papers.

Hidden State Poisoning Attacks against Mamba-based Language Models cs.CL · 2026-01-05 · unverdicted · none · ref 1
Short input phrases can irreversibly overwrite hidden states in Mamba models, impairing information retrieval on a new benchmark while leaving pure Transformer models unaffected.
ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 21
ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
A Survey on Efficient Inference for Large Language Models cs.CL · 2024-04-22 · accept · none · ref 107
The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.
A Survey of Mamba cs.LG · 2024-08-02 · unverdicted · none · ref 5
The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.

Blackmamba: Mixture of experts for state-space models.arXiv preprint arXiv:2402.01771

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer