TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
Learning to (learn at test time)
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
TextGrad performs automatic differentiation for compound AI systems by backpropagating natural-language feedback from LLMs to optimize variables ranging from code to molecular structures.
A closed-form scalar frame-level gate α_t derived from internal feature changes extends effective memory in recurrent 3D reconstruction and improves accuracy on long sequences up to 4541 frames.
citing papers explorer
-
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
-
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
-
Learning to Discover at Test Time
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
-
TextGrad: Automatic "Differentiation" via Text
TextGrad performs automatic differentiation for compound AI systems by backpropagating natural-language feedback from LLMs to optimize variables ranging from code to molecular structures.
-
Rethinking the State Update Gate for Long-Sequence Recurrent 3D Reconstruction
A closed-form scalar frame-level gate α_t derived from internal feature changes extends effective memory in recurrent 3D reconstruction and improves accuracy on long sequences up to 4541 frames.