BlackMamba: Mixture of experts for state-space models. arXiv preprint arXiv:2402.01771

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 3

citation-polarity summary: background 3

years: 2026 (2), 2024 (2)

representative citing papers

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

A Survey of Mamba

cs.LG · 2024-08-02 · unverdicted · novelty 2.0

The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.

citing papers explorer

Showing 4 of 4 citing papers.

  • Hidden State Poisoning Attacks against Mamba-based Language Models · cs.CL · 2026-01-05 · unverdicted · none · ref 1

    Short input phrases can irreversibly overwrite hidden states in Mamba models, impairing information retrieval on a new benchmark while leaving pure Transformer models unaffected.

  • ZAYA1-8B Technical Report · cs.AI · 2026-05-06 · unverdicted · none · ref 21

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  • A Survey on Efficient Inference for Large Language Models · cs.CL · 2024-04-22 · accept · none · ref 107

    The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

  • A Survey of Mamba · cs.LG · 2024-08-02 · unverdicted · none · ref 5

    The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.