pith. sign in

hub

Longhorn: State Space Models are Amortized Online Learners

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

hub tools

citation-role summary

background 2 baseline 1 dataset 1

citation-polarity summary

representative citing papers

Short window attention enables long-term memorization

cs.LG · 2025-09-29 · unverdicted · novelty 6.0

Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.

Titans: Learning to Memorize at Test Time

cs.LG · 2024-12-31 · unverdicted · novelty 6.0

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

Kaczmarz Linear Attention

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.

TTT3R: 3D Reconstruction as Test-Time Training

cs.CV · 2025-09-30 · unverdicted · novelty 5.0

TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.

Gated Delta Networks: Improving Mamba2 with Delta Rule

cs.CL · 2024-12-09 · unverdicted · novelty 5.0

Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.

citing papers explorer

Showing 11 of 11 citing papers.