hub

Transformers are rnns: Fast autoregressive transformers with linear attention

Noveen Sachdeva, Ziran Wang, Kyungtae Han, Rohit Gupta, Julian McAuley · 2020 · 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC) · DOI 10.1109/itsc55140.2022.9922275

11 Pith papers cite this work, alongside 2 external citations. Polarity classification is still indexing.

11 Pith papers citing it

2 external citations · external index

open at publisher browse 11 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

A meshfree exterior calculus for generalizable and data-efficient learning of physics from point clouds

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

MEEC equips point clouds with a discrete exterior calculus that satisfies exact conservation and is differentiable in point positions, allowing a single trained kernel to produce compatible physics on unseen geometries and parameters.

Rotation Equivariant Mamba for Vision Tasks

cs.CV · 2026-03-10 · unverdicted · novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.

Mamba-VGGT: Persistent Long-Sequence Video Geometry Grounded Transformer via External Sliding Window Mamba Memory

cs.CV · 2026-05-17 · unverdicted · novelty 7.0

Mamba-VGGT introduces a Sliding Window Mamba memory module and Zero-Init Spatial Memory Injector to enable persistent long-range geometric reasoning in VGGT for extended video sequences.

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Stream-CQSA uses CQS-based decomposition to stream exact attention computations for billion-token sequences on limited-memory hardware.

A Hormone-inspired Emotion Layer for Transformer language models (HELT)

cs.NE · 2026-04-13 · unverdicted · novelty 7.0

HormoneT5 augments T5 with a hormone-inspired block that predicts six continuous emotion values and uses them to modulate responses, reporting over 85% per-hormone accuracy and human preference for emotional quality.

Training Transformers for KV Cache Compressibility

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.

Linear-Time Global Visual Modeling without Explicit Attention

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.

UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces

cs.NE · 2026-04-30 · unverdicted · novelty 6.0

UniBCI is a unified pretrained model for invasive neural spike data that uses CST tokenization, IAA attention, and self-supervised masked reconstruction to achieve SOTA downstream performance with better generalization and efficiency.

Axiomatizing Neural Networks via Pursuit of Subspaces

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Authors introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic geometric framework that unifies explanations for representation, computation, and generalization in shallow and deep neural networks.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.

Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

cs.LG · 2025-10-30 · unverdicted · novelty 5.0

Nirvana adds a task-aware memory trigger and updater to specialized generalist models, achieving strong general benchmark results, lowest perplexity in biomedicine/finance/law, and improved MRI reconstruction fidelity.

citing papers explorer

Showing 11 of 11 citing papers.

A meshfree exterior calculus for generalizable and data-efficient learning of physics from point clouds cs.LG · 2026-05-08 · unverdicted · none · ref 69
MEEC equips point clouds with a discrete exterior calculus that satisfies exact conservation and is differentiable in point positions, allowing a single trained kernel to produce compatible physics on unseen geometries and parameters.
Rotation Equivariant Mamba for Vision Tasks cs.CV · 2026-03-10 · unverdicted · none · ref 36
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
Mamba-VGGT: Persistent Long-Sequence Video Geometry Grounded Transformer via External Sliding Window Mamba Memory cs.CV · 2026-05-17 · unverdicted · none · ref 14
Mamba-VGGT introduces a Sliding Window Mamba memory module and Zero-Init Spatial Memory Injector to enable persistent long-range geometric reasoning in VGGT for extended video sequences.
Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling cs.LG · 2026-04-22 · unverdicted · none · ref 14
Stream-CQSA uses CQS-based decomposition to stream exact attention computations for billion-token sequences on limited-memory hardware.
A Hormone-inspired Emotion Layer for Transformer language models (HELT) cs.NE · 2026-04-13 · unverdicted · none · ref 37
HormoneT5 augments T5 with a hormone-inspired block that predicts six continuous emotion values and uses them to modulate responses, reporting over 85% per-hormone accuracy and human preference for emotional quality.
Training Transformers for KV Cache Compressibility cs.LG · 2026-05-07 · unverdicted · none · ref 29 · 2 links
Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.
Linear-Time Global Visual Modeling without Explicit Attention cs.CV · 2026-05-03 · unverdicted · none · ref 19
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces cs.NE · 2026-04-30 · unverdicted · none · ref 25
UniBCI is a unified pretrained model for invasive neural spike data that uses CST tokenization, IAA attention, and self-supervised masked reconstruction to achieve SOTA downstream performance with better generalization and efficiency.
Axiomatizing Neural Networks via Pursuit of Subspaces cs.LG · 2026-05-19 · unverdicted · none · ref 36
Authors introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic geometric framework that unifies explanations for representation, computation, and generalization in shallow and deep neural networks.
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer cs.CV · 2026-05-14 · unverdicted · none · ref 69
SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.
Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism cs.LG · 2025-10-30 · unverdicted · none · ref 17
Nirvana adds a task-aware memory trigger and updater to specialized generalist models, achieving strong general benchmark results, lowest perplexity in biomedicine/finance/law, and improved MRI reconstruction fidelity.

Transformers are rnns: Fast autoregressive transformers with linear attention

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer