Quamba2: Robust and efficient post-training quantization for selective state space models. arXiv preprint arXiv:2503.22879, 2025.
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
Ternary Mamba-2 1.3B models reach 48.1% zero-shot accuracy via QAT from pretrained checkpoints in 102M tokens, close to Bi-Mamba, with 3.61x compression.
- MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency
MOSAIC uses an Integer Linear Program scheduler for expert placement and prompt assignment plus adaptive aggregation to achieve 1.7-2.3x end-to-end speedup on 4-GPU MoA workloads while keeping accuracy within 0.1pp.