Attention is a smoothed cubic spline

Zehua Lai, Lek-Heng Lim, Yucong Liu · 2024 · arXiv 2408.09624

4 Pith papers cite this work.

representative citing papers

Copositive Matrices with Ordered Off-Diagonal Entries

math.OC · 2026-05-15 · unverdicted · novelty 7.0

Copositive matrices with nondecreasing off-diagonal entries admit a decomposition into a positive semidefinite (PSD) matrix plus a nonnegative matrix, which implies that a natural relaxation of separable quadratic optimization over the simplex is exact.

Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights

cs.LG · 2025-05-06 · unverdicted · novelty 7.0

Transformers achieve approximation and generalization error bounds for noisy manifold regression that scale with the intrinsic dimension of the task-level manifold.

Algebraic Invariants of Lightning Self-Attention

math.AG · 2026-04-17 · unverdicted · novelty 5.0

Lightning self-attention coefficients are coordinates on an algebraic variety obeying Chow-type, low-rank, Veronese-type, and Sylvester-resultant invariants.

A Mathematical Explanation of Transformers

cs.LG · 2025-10-05 · unverdicted · novelty 5.0

The Transformer is interpreted as a discretization of a structured integro-differential equation in continuous domains for tokens and features, unifying attention, feedforward, and normalization via operator and variational views.

citing papers explorer

Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights · cs.LG · 2025-05-06 · unverdicted · none · ref 8
A Mathematical Explanation of Transformers · cs.LG · 2025-10-05 · unverdicted · none · ref 24
