super hub Mixed citations

Attention is all you need. Advances in neural information processing systems, 30

Mixed citation behavior. Most common role is background (52%).

138 Pith papers citing it
Background 52% of classified citations

citation-role summary

background 17 · method 12 · baseline 2 · dataset 1 · other 1

claims ledger

  • background: … and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy). 1 Introduction. Transformer models [82] have emerged as the most widely used architecture in applications such a…
  • method: … would differ in the network structure (i.e., $G \neq G'$). Then, there exists $(f, g)$ such that the population trajectories $(\{x_i(t+k)\}, G(t+k))$ and $(\{x'_i(t+k)\}, G'(t+k))$ diverge for all $k > 0$. Single-task agentic systems either treat observations as independent and identically distributed (i.i.d.) [93] or the dependencies are modeled globally through a full attention mechanism [95]. Neither captures the topology-constrained local observability that characterizes real social systems. In a MASS, $G$ is an ir…

representative citing papers

Any-Dimensional Invariant Universality

cs.LG · 2026-05-22 · unverdicted · novelty 8.0

A systematic approach maps any-dimensional invariant functions to a unique function on an infinite-dimensional limit space admitting a topology with compact sets where universality holds, with examples of non-universal architectures and fixes.

Rotation Equivariant Mamba for Vision Tasks

cs.CV · 2026-03-10 · unverdicted · novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.

Dynamic Chunking for Diffusion Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

Can Graphs Help Vision SSMs See Better?

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.

DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

cs.LG · 2026-05-11 · unverdicted · novelty 7.0 · 3 refs

DeepLévy learns mixtures of Lévy stable distributions for heavy-tailed time series forecasting by minimizing discrepancies between empirical and parametric characteristic functions, outperforming prior methods on tail risk metrics under extreme volatility.

TIDES: Implicit Time-Awareness in Selective State Space Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.

citing papers explorer

Showing 50 of 138 citing papers.