Star-transformer
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 8
roles: dataset 1
polarities: use dataset 1
citing papers explorer
- Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics
  Exact Flow Linear Attention derives a closed-form exact update for delta-rule linear attention from continuous-time dynamics, removing Euler discretization error while preserving linear complexity and structure.
- BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
  BlossomRec is a sparse attention mechanism that uses two distinct block-level patterns for long-term and short-term interests, fused by a gated output, to reduce computation in sequential recommendation Transformers.
- Kimi Linear: An Expressive, Efficient Attention Architecture
  Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.
- MoBA: Mixture of Block Attention for Long-Context LLMs
  MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
  DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.
- Graph Star Net for Generalized Multi-Task Learning
  GraphStar is a new GNN that adds star nodes and relay attention to achieve non-local representations for node, graph, and link tasks, claiming 2-5% gains over prior SOTA on benchmarks.
- TabEmb: Joint Semantic-Structure Embedding for Table Annotation
  TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
- Agglomerative Attention
  Presents agglomerative attention, a linear-complexity attention model that achieves comparable performance to full attention on language modeling tasks.