⟨kl, xt⟩ dL dqt +⟨k l, dL dqt ⟩xt + 2a ||Ht||F ⟨xt, dL dqt ⟩Htkl # (50) Or equivalently, collecting the terms that are linear in the gradient: dL dkl =− X t≥l t−1Y i=l γi !

First-order methods for solving Hξ=q converge at most at a rate Ra := √κ−1√κ+1, we see ωi converges at an even faster rate · 2048 · arXiv 2372.0036

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

read on arXiv browse 1 citing papers

representative citing papers

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression

cs.LG · 2025-11-26 · unverdicted · novelty 6.0

Gated KalmaNet uses exact Kalman gain computation with adaptive gating and Chebyshev iteration to improve SSM performance on long-context tasks over prior approximations like DeltaNet.

citing papers explorer

Showing 1 of 1 citing paper.

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression cs.LG · 2025-11-26 · unverdicted · none · ref 77
Gated KalmaNet uses exact Kalman gain computation with adaptive gating and Chebyshev iteration to improve SSM performance on long-context tasks over prior approximations like DeltaNet.

⟨kl, xt⟩ dL dqt +⟨k l, dL dqt ⟩xt + 2a ||Ht||F ⟨xt, dL dqt ⟩Htkl # (50) Or equivalently, collecting the terms that are linear in the gradient: dL dkl =− X t≥l t−1Y i=l γi !

fields

years

verdicts

representative citing papers

citing papers explorer