The sequence length is fixed at 2048 tokens, and the embedding layer remains frozen throughout training

We apply a linear warm-up for the first 1% of steps, followed by cosine annealing · 2048 · arXiv 3849.1128
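The schedule described above (linear warm-up over the first 1% of steps, then cosine annealing) can be sketched as follows. This is only an illustrative sketch: the function name `lr_at_step`, the peak learning rate, the floor `min_lr`, and the choice to anneal to that floor by the final step are assumptions, not values stated in the source.

```python
import math


def lr_at_step(step, total_steps, peak_lr=1e-3, warmup_frac=0.01, min_lr=0.0):
    """Learning rate at a 0-indexed step (hypothetical helper).

    peak_lr and min_lr are assumed values; only the 1% warm-up fraction
    and the cosine shape come from the description.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear warm-up: ramp up to peak_lr by the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine annealing over the remaining steps, from peak_lr down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `total_steps=10_000`, the warm-up covers steps 0 through 99, and the rate then decays from the peak toward `min_lr` along a half cosine.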

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

cs.NE · 2025-10-06 · unverdicted · novelty 5.0

SpikingMamba distills Mamba into an SNN LLM achieving 4.76x energy savings with a 4.78% zero-shot accuracy gap that narrows to 2.23% after RL.


SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba · cs.NE · 2025-10-06 · unverdicted · none · ref 29