Parallelizing linear transformers with the delta rule over sequence length

· 2024

2 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

FG²-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

FG²-GDN replaces the scalar beta in the delta update with a channel-wise vector and decouples key/value scaling to improve recall over prior GDN and KDA models.

A Cellular Doctrine of Morality: Intrinsic Active Precision and the Mind-Reality Overload Dilemma

cs.AI · 2026-05-02 · unverdicted · novelty 3.0

AI systems incorporating active precision from pyramidal neurons may reduce information overload by evaluating evidence coherence before attention, rather than by maximizing rewards.

citing papers explorer


FG²-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control · cs.LG · 2026-04-21 · unverdicted · none · ref 6
A Cellular Doctrine of Morality: Intrinsic Active Precision and the Mind-Reality Overload Dilemma · cs.AI · 2026-05-02 · unverdicted · none · ref 36
