SMT reduces RNN training to supervised learning on memory transitions (m_t, x_{t+1}) to m_{t+1} obtained from a Transformer encoder, enabling time-parallel training with O(1) gradient paths.
Parallelizing non-linear sequential models over the sequence length
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 3years
2026 3verdicts
UNVERDICTED 3roles
background 2representative citing papers
LBI enables tractable parallel backpropagation by reducing inter-region adjoint computation to low-dimensional r x r Jacobians while preserving exact gradients under a bounded-interface model.
SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.
citing papers explorer
-
Pretraining Recurrent Networks without Recurrence
SMT reduces RNN training to supervised learning on memory transitions (m_t, x_{t+1}) to m_{t+1} obtained from a Transformer encoder, enabling time-parallel training with O(1) gradient paths.
-
LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces
LBI enables tractable parallel backpropagation by reducing inter-region adjoint computation to low-dimensional r x r Jacobians while preserving exact gradients under a bounded-interface model.
-
State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning
SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.