Transformers with O(sum m^j) blocks and O(d sum m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.
Hernández and E
2 Pith papers cite this work, both from 2025. Polarity classification is still indexing.
Citing papers
- Exact Sequence Interpolation with Transformers
Transformers with O(sum m^j) blocks and O(d sum m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.
- Control and optimization for Neural Partial Differential Equations in Supervised Learning
The authors interpret neural networks as PDEs, propose a dual formulation for parabolic coefficient control with an existence proof for minimizers, and establish existence for an approximated hyperbolic control problem.