Spike-temporal latent representation for energy-efficient event-to-video reconstruction
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
MAR integrates SSMs and sparsification with new ATMN neurons and SBDS distillation to produce efficient LLMs that match dense-model performance at substantially lower inference energy.
SpikingMamba distills Mamba into an SNN LLM achieving 4.76x energy savings with a 4.78% zero-shot accuracy gap that narrows to 2.23% after RL.
citing papers explorer
- Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
Ternary Mamba-2 1.3B models reach 48.1% zero-shot accuracy via QAT from pretrained checkpoints in 102M tokens, close to Bi-Mamba, with 3.61x compression.
- MAR: Efficient Large Language Models via Module-aware Architecture Refinement
MAR integrates SSMs and sparsification with new ATMN neurons and SBDS distillation to produce efficient LLMs that match dense-model performance at substantially lower inference energy.
- SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba
SpikingMamba distills Mamba into an SNN LLM achieving 4.76x energy savings with a 4.78% zero-shot accuracy gap that narrows to 2.23% after RL.