Effective distillation to hybrid xLSTM architectures, 2026

Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied, Anamaria-Roberta Hartl, David Stap, Pieter-Jan Hoedt, Maximilian Beck, Sebastian Böck, Günter Klambauer, Sepp Hochreiter · 2026 · arXiv 2603.15590

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Pretraining Recurrent Networks without Recurrence

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

SMT reduces RNN training to supervised learning on memory transitions (m_t, x_{t+1}) to m_{t+1} obtained from a Transformer encoder, enabling time-parallel training with O(1) gradient paths.

On Subquadratic Architectures: From Applications to Principles

cs.LG · 2026-06-10 · unverdicted · novelty 4.0

xLSTM outperforms Mamba-2 and Gated DeltaNet on tasks with complex dependencies because its gating scheme enables more flexible and stable state tracking and memory accumulation.

citing papers explorer

Pretraining Recurrent Networks without Recurrence · cs.LG · 2026-06-04 · unverdicted · none · ref 43
On Subquadratic Architectures: From Applications to Principles · cs.LG · 2026-06-10 · unverdicted · none · ref 47
