Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks
Pith reviewed 2026-05-20 23:51 UTC · model grok-4.3
The pith
Deep KANs achieve universal approximation on any compact set precisely when they include one fixed non-affine univariate edge function.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep KANs in which all edge functions are either affine or equal to a fixed continuous function σ are dense in C(K) for every compact set K⊂R^n if and only if σ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if σ is nonpolynomial. The full class of affine functions is not required and can be replaced by a finite set; in the nonpolynomial case a fixed family of five affine functions suffices for arbitrary depth. For every continuous non-affine σ there exists a finite affine family A_σ such that deep KANs with edge functions drawn from A_σ ∪ {σ} remain universal. KANs using the spline-based edge parameterization of Liu et al. are universal,
What carries the argument
The KAN architecture whose nodes sum univariate edge functions, together with the precise condition that a single fixed continuous univariate function σ must be non-affine for the resulting networks to be dense in the uniform topology on C(K).
If this is right
- Universality holds for arbitrary depth once a single non-affine σ is included, even when the remaining edges are drawn from a finite affine family of size five in the nonpolynomial case.
- With exactly two hidden layers the non-affine function σ must additionally be nonpolynomial to guarantee density.
- Spline-based KANs with any fixed spline degree and fixed knot sequence are universal approximators in the classical sense.
- Every continuous function on a compact set can be approximated using only the coordinate functions and two fixed univariate functions combined by repeated addition and composition.
Where Pith is reading between the lines
- The non-linearity required for universality can be localized to a single type of edge function, suggesting that most edges in a trained KAN could be kept linear or fixed without sacrificing expressivity.
- The finite affine replacement result opens the possibility of discretizing the linear part of a KAN into a small lookup table while preserving approximation power.
- The superposition implication tightens the classical Kolmogorov-Arnold theorem by showing that two fixed univariate functions, rather than a continuum, suffice when addition and composition are allowed freely.
Load-bearing premise
The KAN architecture is defined with univariate edge functions combined via summation at nodes and density is measured in the uniform norm on C(K); any change in this definition could invalidate the if-and-only-if statements.
What would settle it
Construct a deep KAN using only affine functions plus the fixed affine map σ(x) = x and exhibit a continuous target function on [0,1]^2 whose uniform approximation error remains bounded away from zero; repeat the construction with a non-affine σ such as σ(x) = x^2 and verify that the error can be driven to zero.
read the original abstract
We analyze the universal approximation property of Kolmogorov-Arnold Networks (KANs) in terms of their edge functions. If these functions are all affine, then universality clearly fails. How many non-affine functions are needed, in addition to affine ones, to ensure universality? We show that a single one suffices. More precisely, we prove that deep KANs in which all edge functions are either affine or equal to a fixed continuous function $\sigma$ are dense in $C(K)$ for every compact set $K\subset\mathbb{R}^n$ if and only if $\sigma$ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if $\sigma$ is nonpolynomial. We further show that the full class of affine functions is not required; it can be replaced by a finite set without affecting universality. In particular, in the nonpolynomial case, a fixed family of five affine functions suffices when the depth is arbitrary. More generally, for every continuous non-affine function $\sigma$, there exists a finite affine family $A_\sigma$ such that deep KANs with edge functions in $A_\sigma\cup\{\sigma\}$ remain universal. We also prove that KANs with the spline-based edge parameterization introduced by Liu et al.~\cite{Liu2024} are universal approximators in the classical sense, even when the spline degree and knot sequence are fixed in advance. This paper also has implications for the theory of superpositions of real functions. In particular, we show that every continuous multivariate function can be approximated arbitrarily well using only the coordinate functions and two fixed univariate functions under repeated addition and composition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper establishes necessary and sufficient conditions for the universal approximation property of Kolmogorov-Arnold Networks (KANs). It proves that deep KANs whose edge functions are restricted to the affine class or a single fixed continuous function σ are dense in C(K) for every compact K ⊂ R^n if and only if σ is non-affine. For KANs with exactly two hidden layers the corresponding condition is that σ is non-polynomial. The manuscript further shows that the full affine class can be replaced by a finite family A_σ (of size five when σ is non-polynomial) while preserving universality, and that the spline-based edge parameterization of Liu et al. yields universal approximators even when degree and knots are fixed in advance. The work also derives consequences for the theory of superpositions of real functions.
Significance. If the stated iff statements hold, the paper supplies a sharp characterization of the minimal nonlinearity required for KAN universality. The explicit finite affine family A_σ and the fixed-spline universality result are practically useful and theoretically clean. The reduction to coordinate functions plus two fixed univariate functions under addition and composition strengthens the classical superposition literature. These contributions rest on standard density arguments in C(K) together with explicit constructions that avoid circularity or parameter-fitting artifacts.
major comments (1)
- §4, Theorem 4.3 (two-layer case): the necessity proof that a polynomial σ yields only polynomial networks is clear, but the sufficiency direction for non-polynomial σ relies on a specific nesting argument; an explicit low-dimensional counter-example (e.g., n=1, target x^2) showing that two layers with a polynomial σ cannot approximate it would strengthen the claim.
minor comments (3)
- §2.1: the precise definition of a KAN layer (how univariate edge functions are summed at each node for vector-valued inputs) should be stated with an equation or diagram to avoid ambiguity when comparing to the classical Kolmogorov-Arnold representation.
- The statement that |A_σ|=5 suffices for non-polynomial σ appears in the abstract and §5; a short table or explicit construction for σ(x)=x^3 would make the finite-family result immediately verifiable.
- Reference to Liu et al. (2024) is given, but the precise spline degree and knot sequence used in the universality proof for fixed splines should be restated in the theorem statement for self-contained reading.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the recommendation of minor revision. The single major comment is addressed below.
read point-by-point responses
-
Referee: §4, Theorem 4.3 (two-layer case): the necessity proof that a polynomial σ yields only polynomial networks is clear, but the sufficiency direction for non-polynomial σ relies on a specific nesting argument; an explicit low-dimensional counter-example (e.g., n=1, target x^2) showing that two layers with a polynomial σ cannot approximate it would strengthen the claim.
Authors: We appreciate the suggestion to make the necessity direction more concrete. Our general argument already shows that any polynomial σ produces networks whose outputs are polynomials of bounded degree (determined by the fixed degree of σ and the two-layer nesting depth), which cannot approximate non-polynomial targets. We agree, however, that an explicit low-dimensional illustration would improve readability. In the revised manuscript we will add a short example for n=1 and target x² (or a similar quadratic) demonstrating that, for a polynomial σ, the two-layer architecture cannot recover the target due to the restricted functional form. This addition will appear in §4 alongside the existing proof. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper establishes its iff universality claims for deep KANs via direct proofs from the given architecture definitions (univariate edge functions summed at nodes) and uniform norm on C(K). Necessity when σ is affine follows immediately because the resulting networks are affine maps, which are not dense. Sufficiency constructs explicit finite affine probes from deviation points of a non-affine σ and uses nesting to generate separating nonlinearities; the two-layer case separately requires non-polynomial σ because depth is bounded. The finite family A_σ is constructed explicitly without fitting to target data, and the spline parameterization result is a separate verification against the Liu et al. definition. All steps are self-contained in standard density arguments and do not reduce to self-citations, fitted inputs renamed as predictions, or definitional equivalences.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Edge functions are continuous univariate maps from R to R.
- domain assumption The network topology consists of summations at nodes with univariate functions on edges.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
deep KANs in which all edge functions are either affine or equal to a fixed continuous function σ are dense in C(K) ... iff σ is non-affine (Theorem 5.1)
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery and embed_strictMono unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
KANs ... with exactly two hidden layers, universality holds iff σ is nonpolynomial (Theorem 3.1)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljaˇ ci´ c, T. Y. Hou, M. Tegmark, KAN: Kolmogorov–Arnold Networks,arXiv:2404.19756, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[2]
A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. (Russian), Dokl. Akad. Nauk SSSR114(1957), 953–956
work page 1957
-
[3]
V. I. Arnold, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables. (Russian),Mat. Sb. (N.S.) 48/90(1959), 3–74; English transl. in:Amer. Math. Soc. Transl.(2)28(1963), 61–147. 17
work page 1959
-
[4]
S. Ya. Khavinson,Best approximation by linear superpositions (approximate nomog- raphy), American Mathematical Society, Providence, RI, 1997, 175 pp
work page 1997
-
[5]
V. E. Ismailov,Ridge functions and applications in neural networks, American Math- ematical Society, Providence, RI, 2021, 186 pp
work page 2021
-
[6]
V. E. Ismailov, A three layer neural network can represent any multivariate function, J. Math. Anal. Appl.523(2023), no. 1, Article No. 127096, 8 pp
work page 2023
-
[7]
G. G. Lorentz, Metric entropy, widths, and superpositions of functions,Amer. Math. Monthly69(1962), 469–485
work page 1962
-
[8]
D. A. Sprecher, On the structure of continuous functions of several variables,Trans. Amer. Math. Soc.115(1965), 340–355
work page 1965
-
[9]
A. Ismayilova and V. E. Ismailov, On the Kolmogorov neural networks,Neural Net- works176(2024), Article No. 106333
work page 2024
-
[10]
B. Igelnik and N. Parikh, Kolmogorov’s spline network,IEEE Trans. Neural Netw. 14(2003), no. 4, 725–733
work page 2003
-
[11]
A. Polar and M. Poluektov, A deep machine learning algorithm for construction of the Kolmogorov—Arnold representation,Eng. Appl. Artif. Intell.99(2021), Article No. 104137
work page 2021
-
[12]
M. Poluektov and A. Polar, Construction of the Kolmogorov–Arnold networks using the Newton–Kaczmarz method,Mach. Learn.114(2025), Article No. 185
work page 2025
- [13]
-
[14]
A. Kratsios, B. J. Kim, and T. Furuya, Approximation rates in Besov norms and sample-complexity of Kolmogorov–Arnold networks with residual connections, arXiv:2504.15110, 2025
-
[15]
S. Gleyzer, H. Nguyen, D. P. Ramakrishnan, and E. A. F. Reinhardt, Sinusoidal approximation theorem for Kolmogorov–Arnold networks,Mathematics13(2025), no. 19, Article No. 3157
work page 2025
- [16]
-
[17]
J. D. Toscano, L.-L. Wang, and G. E. Karniadakis, KKANs: K˚ urkov´ a–Kolmogorov– Arnold networks and their learning dynamics,Neural Networks191(2025), Article No. 107831
work page 2025
- [18]
-
[19]
X. Zhang and H. Zhou, Generalization bounds and model complexity for Kolmo- gorov–Arnold networks,The Thirteenth International Conference on Learning Rep- resentations, 2025. 18
work page 2025
-
[20]
S. M. Eshtehardian, M. H. Yassaee, and B. Khalaj, On the convergence of two-layer Kolmogorov–Arnold networks with first-layer training,The Fourteenth International Conference on Learning Representations, 2026
work page 2026
-
[21]
S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, A survey on Kolmogorov–Arnold network,ACM Comput. Surv., 58 (2025), no. 2, Article 55
work page 2025
-
[22]
A Practitioner's Guide to Kolmogorov-Arnold Networks
A. Noorizadegan, S. Wang, L. Ling, J. P. Dominguez-Morales A practitioner’s guide to Kolmogorov–Arnold networks,arXiv:2510.25781, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [23]
-
[24]
Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195
A. Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195
work page 1999
-
[25]
de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp
C. de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp
work page 2001
-
[26]
L. L. Schumaker,Spline functions: basic theory, Third Edition, Cambridge Univer- sity Press, Cambridge, 2007, 582 pp. 19
work page 2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.