Approximation theory of the MLP model in neural networks
3 Pith papers cite this work.
Citing papers
- Exact Sequence Interpolation with Transformers
  Transformers with O(Σ_j m^j) blocks and O(d Σ_j m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.
- Adaptive RBF-KAN: A Comparative Evaluation of Dynamic Shape Parameters in Kolmogorov-Arnold Networks
  Adaptive RBF-KAN extends FastKAN with multiple radial basis kernels and shape-parameter initialization based on leave-one-out cross-validation (LOOCV). Benchmark tests on 2D functions show kernel-specific advantages for smooth, discontinuous, and oscillatory cases.
- Neural Networks With Dense Weights Are Not Universal Approximators
  Dense ReLU networks under natural weight and dimension constraints fail to approximate certain Lipschitz functions, unlike unrestricted networks.