hub

arXiv preprint arXiv:2501.12352 , year=

Ke Alexander Wang, Jiaxin Shi, Emily B Fox · 2025 · arXiv 2501.12352

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 3 support 1

representative citing papers

Test-Time Training with KV Binding Is Secretly Linear Attention

cs.LG · 2026-02-24 · conditional · novelty 8.0

Test-time training with KV binding reduces to learned linear attention.

Sparse Attention as Compact Kernel Regression

cs.LG · 2026-01-30 · unverdicted · novelty 8.0

Sparse attention arises from compact kernel regression, with Epanechnikov and similar kernels mapping to normalized ReLU, sparsemax, and alpha-entmax attention.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training

cs.CV · 2026-03-04 · unverdicted · novelty 7.0

ZipMap achieves linear-time bidirectional 3D reconstruction by zipping image collections into a compact stateful representation via test-time training layers.

OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

OSDN adds online diagonal preconditioning to the Delta Rule, preserving chunkwise parallelism while proving super-geometric convergence and delivering 32-39% recall gains at 340M-1.3B scales.

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.

Fast Spatial Memory with Elastic Test-Time Training

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.

In-Place Test-Time Training

cs.LG · 2026-04-07 · conditional · novelty 6.0

In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression

cs.LG · 2025-11-26 · unverdicted · novelty 6.0

Gated KalmaNet uses exact Kalman gain computation with adaptive gating and Chebyshev iteration to improve SSM performance on long-context tasks over prior approximations like DeltaNet.

Kaczmarz Linear Attention

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

TTT3R: 3D Reconstruction as Test-Time Training

cs.CV · 2025-09-30 · unverdicted · novelty 5.0

TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.

StateX: Enhancing RNN Recall via Post-training State Expansion

cs.CL · 2025-09-26 · unverdicted · novelty 5.0

StateX post-trains RNNs to expand recurrent state size, improving recall and in-context learning with negligible parameter growth.

Distributed Associative Memory via Online Convex Optimization

cs.LG · 2025-09-26 · unverdicted · novelty 5.0

A distributed online convex optimization protocol for associative memory achieves sublinear regret guarantees and outperforms baselines in experiments.

citing papers explorer

Showing 14 of 14 citing papers.

Test-Time Training with KV Binding Is Secretly Linear Attention cs.LG · 2026-02-24 · conditional · none · ref 20
Test-time training with KV binding reduces to learned linear attention.
Sparse Attention as Compact Kernel Regression cs.LG · 2026-01-30 · unverdicted · none · ref 18
Sparse attention arises from compact kernel regression, with Epanechnikov and similar kernels mapping to normalized ReLU, sparsemax, and alpha-entmax attention.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 56
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training cs.CV · 2026-03-04 · unverdicted · none · ref 70
ZipMap achieves linear-time bidirectional 3D reconstruction by zipping image collections into a compact stateful representation via test-time training layers.
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention cs.LG · 2026-05-13 · unverdicted · none · ref 61
OSDN adds online diagonal preconditioning to the Delta Rule, preserving chunkwise parallelism while proving super-geometric convergence and delivering 32-39% recall gains at 340M-1.3B scales.
Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction cs.CV · 2026-04-09 · unverdicted · none · ref 79
Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
Fast Spatial Memory with Elastic Test-Time Training cs.CV · 2026-04-08 · unverdicted · none · ref 51
Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.
In-Place Test-Time Training cs.LG · 2026-04-07 · conditional · none · ref 55
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression cs.LG · 2025-11-26 · unverdicted · none · ref 61
Gated KalmaNet uses exact Kalman gain computation with adaptive gating and Chebyshev iteration to improve SSM performance on long-context tasks over prior approximations like DeltaNet.
Kaczmarz Linear Attention cs.LG · 2026-05-09 · unverdicted · none · ref 44
Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · none · ref 84
MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.
TTT3R: 3D Reconstruction as Test-Time Training cs.CV · 2025-09-30 · unverdicted · none · ref 88
TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.
StateX: Enhancing RNN Recall via Post-training State Expansion cs.CL · 2025-09-26 · unverdicted · none · ref 21
StateX post-trains RNNs to expand recurrent state size, improving recall and in-context learning with negligible parameter growth.
Distributed Associative Memory via Online Convex Optimization cs.LG · 2025-09-26 · unverdicted · none · ref 11
A distributed online convex optimization protocol for associative memory achieves sublinear regret guarantees and outperforms baselines in experiments.

arXiv preprint arXiv:2501.12352 , year=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer