pith. machine review for the scientific record.

arxiv: 2512.12602 · v4 · submitted 2025-12-14 · 💻 cs.LG

Recognition: unknown

Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics

Authors on Pith no claims yet
classification 💻 cs.LG
keywords attention, delta-rule, linear, exact, update, dynamics, efla, flow
0 comments
read the original abstract

In this paper, we introduce Exact Flow Linear Attention (EFLA), an exact-flow formulation of delta-rule linear attention. We show that the delta-rule update can be interpreted as an explicit Euler discretization of an underlying continuous-time system. EFLA replaces this first-order update with the exact closed-form flow. By exploiting the rank-1 structure of the dynamics matrix, both the matrix exponential and the input integral collapse to a simple update that preserves delta-rule linear attention's algebraic structure, parameter count, linear-time complexity, and chunkwise parallelism. This attention mechanism removes the Euler discretization error of the delta-rule dynamics without introducing additional parameters. Experiments on robustness tests, language modeling benchmarks, and the MAD synthetic benchmark show that EFLA improves stability under corrupted and high-energy inputs, reduces perplexity, and achieves stronger downstream performance compared to SSM and Euler-style baselines. These results establish exact-flow integration as a principled and scalable update mechanism for delta-rule linear attention.
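The abstract states the mechanism only at a high level, so the following is a minimal NumPy sketch under stated assumptions: the common delta-rule parameterization S_t = S_{t-1} + beta_t (v_t - S_{t-1} k_t) k_t^T, read as one explicit Euler step of dS/dt = beta (v - S k) k^T, with a unit time step per token. The function names and this exact parameterization are illustrative assumptions, not the paper's definitions; the point is only how a rank-1 dynamics matrix lets the matrix exponential and the input integral collapse to a single scalar coefficient.

```python
import numpy as np

def euler_delta_rule_step(S, k, v, beta):
    """One delta-rule update, i.e. an explicit Euler step of dS/dt = beta * (v - S k) k^T."""
    return S + beta * np.outer(v - S @ k, k)

def exact_flow_step(S, k, v, beta):
    """Exact solution of the same rank-1 linear ODE over a unit time step.

    Because the dynamics matrix -beta * k k^T is rank-1, both the matrix
    exponential and the input integral reduce to a scalar rescaling of the
    Euler step: beta is replaced by (1 - exp(-beta * ||k||^2)) / ||k||^2.
    """
    knorm2 = float(k @ k)
    c = (1.0 - np.exp(-beta * knorm2)) / knorm2
    return S + c * np.outer(v - S @ k, k)

# Toy usage: one key/value write into a d_v x d_k state matrix.
rng = np.random.default_rng(0)
d_k, d_v = 4, 4
S = np.zeros((d_v, d_k))
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)          # normalized key, so ||k||^2 = 1
v = rng.normal(size=d_v)
beta = 0.8

S_euler = euler_delta_rule_step(S, k, v, beta)
S_exact = exact_flow_step(S, k, v, beta)
print(np.linalg.norm(S_euler - S_exact))  # per-step discretization gap
```

Under these assumptions the exact step differs from the Euler step only in its effective step size, and because 1 - exp(-beta * ||k||^2) lies in (0, 1), the state moves toward v along k without overshooting for any beta, which is consistent with the stability behavior described in the abstract.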

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. δ-mem: Efficient Online Memory for Large Language Models

    cs.AI 2026-05 unverdicted novelty 6.0

    δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-...

  2. Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

    cs.LG 2026-04 unverdicted novelty 6.0

    Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...

  3. MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

    cs.LG 2026-05 unverdicted novelty 5.0

    MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.