VLA stabilizes linear attention by solving regularized least-squares updates with unit-length writes, yielding Jacobian spectral norm exactly 1 and 109x smaller state norms while improving multi-query recall accuracy over standard linear attention and DeltaNet.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2years
2026 2representative citing papers
RK4 at 80 function evaluations matches Euler at 200 in sliced Wasserstein quality for flow matching sampling, with the adaptive solver concentrating steps near t=1 due to stiffening velocity fields.
citing papers explorer
-
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
VLA stabilizes linear attention by solving regularized least-squares updates with unit-length writes, yielding Jacobian spectral norm exactly 1 and 109x smaller state norms while improving multi-query recall accuracy over standard linear attention and DeltaNet.
-
From Euler to Dormand-Prince: ODE Solvers for Flow Matching Generative Models
RK4 at 80 function evaluations matches Euler at 200 in sliced Wasserstein quality for flow matching sampling, with the adaptive solver concentrating steps near t=1 due to stiffening velocity fields.