Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks

Vugar Ismailov

arxiv: 2604.23765 · v2 · pith:O6OMS6LPnew · submitted 2026-04-26 · 💻 cs.LG · cs.NE· math.FA

Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks

Vugar Ismailov This is my paper

Pith reviewed 2026-05-20 23:51 UTC · model grok-4.3

classification 💻 cs.LG cs.NEmath.FA

keywords Kolmogorov-Arnold networksuniversal approximationnon-affine functionsunivariate edge functionsspline parameterizationfunction superpositionsdensity in C(K)

0 comments

The pith

Deep KANs achieve universal approximation on any compact set precisely when they include one fixed non-affine univariate edge function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes necessary and sufficient conditions on the edge functions of Kolmogorov-Arnold networks for them to approximate any continuous function on a compact domain in the uniform norm. It proves that when all edges are either affine or identical to one fixed continuous univariate function σ, density holds for arbitrary depth if and only if σ is non-affine. For networks limited to exactly two hidden layers the condition strengthens to σ being nonpolynomial. The work further shows that the full family of affine functions can be replaced by a finite collection without losing universality, and that standard spline-based KAN parameterizations remain universal even with fixed degree and knots. These results also imply that every continuous multivariate function can be approximated arbitrarily closely using only the coordinate projections together with two fixed univariate functions under repeated addition and composition.

Core claim

Deep KANs in which all edge functions are either affine or equal to a fixed continuous function σ are dense in C(K) for every compact set K⊂R^n if and only if σ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if σ is nonpolynomial. The full class of affine functions is not required and can be replaced by a finite set; in the nonpolynomial case a fixed family of five affine functions suffices for arbitrary depth. For every continuous non-affine σ there exists a finite affine family A_σ such that deep KANs with edge functions drawn from A_σ ∪ {σ} remain universal. KANs using the spline-based edge parameterization of Liu et al. are universal,

What carries the argument

The KAN architecture whose nodes sum univariate edge functions, together with the precise condition that a single fixed continuous univariate function σ must be non-affine for the resulting networks to be dense in the uniform topology on C(K).

If this is right

Universality holds for arbitrary depth once a single non-affine σ is included, even when the remaining edges are drawn from a finite affine family of size five in the nonpolynomial case.
With exactly two hidden layers the non-affine function σ must additionally be nonpolynomial to guarantee density.
Spline-based KANs with any fixed spline degree and fixed knot sequence are universal approximators in the classical sense.
Every continuous function on a compact set can be approximated using only the coordinate functions and two fixed univariate functions combined by repeated addition and composition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The non-linearity required for universality can be localized to a single type of edge function, suggesting that most edges in a trained KAN could be kept linear or fixed without sacrificing expressivity.
The finite affine replacement result opens the possibility of discretizing the linear part of a KAN into a small lookup table while preserving approximation power.
The superposition implication tightens the classical Kolmogorov-Arnold theorem by showing that two fixed univariate functions, rather than a continuum, suffice when addition and composition are allowed freely.

Load-bearing premise

The KAN architecture is defined with univariate edge functions combined via summation at nodes and density is measured in the uniform norm on C(K); any change in this definition could invalidate the if-and-only-if statements.

What would settle it

Construct a deep KAN using only affine functions plus the fixed affine map σ(x) = x and exhibit a continuous target function on [0,1]^2 whose uniform approximation error remains bounded away from zero; repeat the construction with a non-affine σ such as σ(x) = x^2 and verify that the error can be driven to zero.

read the original abstract

We analyze the universal approximation property of Kolmogorov-Arnold Networks (KANs) in terms of their edge functions. If these functions are all affine, then universality clearly fails. How many non-affine functions are needed, in addition to affine ones, to ensure universality? We show that a single one suffices. More precisely, we prove that deep KANs in which all edge functions are either affine or equal to a fixed continuous function $\sigma$ are dense in $C(K)$ for every compact set $K\subset\mathbb{R}^n$ if and only if $\sigma$ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if $\sigma$ is nonpolynomial. We further show that the full class of affine functions is not required; it can be replaced by a finite set without affecting universality. In particular, in the nonpolynomial case, a fixed family of five affine functions suffices when the depth is arbitrary. More generally, for every continuous non-affine function $\sigma$, there exists a finite affine family $A_\sigma$ such that deep KANs with edge functions in $A_\sigma\cup\{\sigma\}$ remain universal. We also prove that KANs with the spline-based edge parameterization introduced by Liu et al.~\cite{Liu2024} are universal approximators in the classical sense, even when the spline degree and knot sequence are fixed in advance. This paper also has implications for the theory of superpositions of real functions. In particular, we show that every continuous multivariate function can be approximated arbitrarily well using only the coordinate functions and two fixed univariate functions under repeated addition and composition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

KANs hit universality with one non-affine continuous edge function for deep nets or a non-polynomial one for two layers, and a finite affine set works in place of all affines.

read the letter

The core result here is that deep KANs built from affine edge functions plus one fixed continuous σ are dense in C(K) on any compact K if and only if σ is non-affine. For exactly two hidden layers the bar is higher: σ must be non-polynomial. They also replace the entire affine class with a finite family A_σ that still guarantees universality, and they confirm the original spline KANs stay universal even with fixed degree and knots in advance. The superposition implication is a side note but follows directly from the same constructions. What the paper does cleanly is separate the necessity direction, which is immediate once everything collapses to affine maps, from the sufficiency direction that uses non-affinity to produce enough independent nonlinearities through nesting and summation. The explicit bound of five affines in the non-polynomial case is concrete and useful. The arguments sit on standard density results in uniform norm, so they feel grounded rather than invented. One soft spot is that everything stays inside uniform approximation on compact subsets of R^n; that is the right classical setting but it does not automatically extend to L^p or unbounded domains without extra work. The two-layer versus arbitrary-depth distinction is handled sharply and avoids overgeneralizing. Readers who care about exact minimal conditions in approximation theory or who are building theoretical variants of KANs will find the characterizations directly usable. The paper shows clear engagement with the functional-analysis side of the literature and the derivations appear consistent with the stated definitions of the architecture. It deserves a serious referee because the iff statements are sharp and the supporting constructions are explicit enough to check.

Referee Report

1 major / 3 minor

Summary. The paper establishes necessary and sufficient conditions for the universal approximation property of Kolmogorov-Arnold Networks (KANs). It proves that deep KANs whose edge functions are restricted to the affine class or a single fixed continuous function σ are dense in C(K) for every compact K ⊂ R^n if and only if σ is non-affine. For KANs with exactly two hidden layers the corresponding condition is that σ is non-polynomial. The manuscript further shows that the full affine class can be replaced by a finite family A_σ (of size five when σ is non-polynomial) while preserving universality, and that the spline-based edge parameterization of Liu et al. yields universal approximators even when degree and knots are fixed in advance. The work also derives consequences for the theory of superpositions of real functions.

Significance. If the stated iff statements hold, the paper supplies a sharp characterization of the minimal nonlinearity required for KAN universality. The explicit finite affine family A_σ and the fixed-spline universality result are practically useful and theoretically clean. The reduction to coordinate functions plus two fixed univariate functions under addition and composition strengthens the classical superposition literature. These contributions rest on standard density arguments in C(K) together with explicit constructions that avoid circularity or parameter-fitting artifacts.

major comments (1)

§4, Theorem 4.3 (two-layer case): the necessity proof that a polynomial σ yields only polynomial networks is clear, but the sufficiency direction for non-polynomial σ relies on a specific nesting argument; an explicit low-dimensional counter-example (e.g., n=1, target x^2) showing that two layers with a polynomial σ cannot approximate it would strengthen the claim.

minor comments (3)

§2.1: the precise definition of a KAN layer (how univariate edge functions are summed at each node for vector-valued inputs) should be stated with an equation or diagram to avoid ambiguity when comparing to the classical Kolmogorov-Arnold representation.
The statement that |A_σ|=5 suffices for non-polynomial σ appears in the abstract and §5; a short table or explicit construction for σ(x)=x^3 would make the finite-family result immediately verifiable.
Reference to Liu et al. (2024) is given, but the precise spline degree and knot sequence used in the universality proof for fixed splines should be restated in the theorem statement for self-contained reading.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. The single major comment is addressed below.

read point-by-point responses

Referee: §4, Theorem 4.3 (two-layer case): the necessity proof that a polynomial σ yields only polynomial networks is clear, but the sufficiency direction for non-polynomial σ relies on a specific nesting argument; an explicit low-dimensional counter-example (e.g., n=1, target x^2) showing that two layers with a polynomial σ cannot approximate it would strengthen the claim.

Authors: We appreciate the suggestion to make the necessity direction more concrete. Our general argument already shows that any polynomial σ produces networks whose outputs are polynomials of bounded degree (determined by the fixed degree of σ and the two-layer nesting depth), which cannot approximate non-polynomial targets. We agree, however, that an explicit low-dimensional illustration would improve readability. In the revised manuscript we will add a short example for n=1 and target x² (or a similar quadratic) demonstrating that, for a polynomial σ, the two-layer architecture cannot recover the target due to the restricted functional form. This addition will appear in §4 alongside the existing proof. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper establishes its iff universality claims for deep KANs via direct proofs from the given architecture definitions (univariate edge functions summed at nodes) and uniform norm on C(K). Necessity when σ is affine follows immediately because the resulting networks are affine maps, which are not dense. Sufficiency constructs explicit finite affine probes from deviation points of a non-affine σ and uses nesting to generate separating nonlinearities; the two-layer case separately requires non-polynomial σ because depth is bounded. The finite family A_σ is constructed explicitly without fitting to target data, and the spline parameterization result is a separate verification against the Liu et al. definition. All steps are self-contained in standard density arguments and do not reduce to self-citations, fitted inputs renamed as predictions, or definitional equivalences.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard background from real analysis and approximation theory; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Edge functions are continuous univariate maps from R to R.
Required for the networks to map into C(K) and for density statements to make sense.
domain assumption The network topology consists of summations at nodes with univariate functions on edges.
This is the defining structure of KANs used throughout the claims.

pith-pipeline@v0.9.0 · 5837 in / 1489 out tokens · 33605 ms · 2026-05-20T23:51:54.315867+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

deep KANs in which all edge functions are either affine or equal to a fixed continuous function σ are dense in C(K) ... iff σ is non-affine (Theorem 5.1)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery and embed_strictMono unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

KANs ... with exactly two hidden layers, universality holds iff σ is nonpolynomial (Theorem 3.1)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

[1]

Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljaˇ ci´ c, T. Y. Hou, M. Tegmark, KAN: Kolmogorov–Arnold Networks,arXiv:2404.19756, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. (Russian), Dokl. Akad. Nauk SSSR114(1957), 953–956

work page 1957
[3]

V. I. Arnold, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables. (Russian),Mat. Sb. (N.S.) 48/90(1959), 3–74; English transl. in:Amer. Math. Soc. Transl.(2)28(1963), 61–147. 17

work page 1959
[4]

S. Ya. Khavinson,Best approximation by linear superpositions (approximate nomog- raphy), American Mathematical Society, Providence, RI, 1997, 175 pp

work page 1997
[5]

V. E. Ismailov,Ridge functions and applications in neural networks, American Math- ematical Society, Providence, RI, 2021, 186 pp

work page 2021
[6]

V. E. Ismailov, A three layer neural network can represent any multivariate function, J. Math. Anal. Appl.523(2023), no. 1, Article No. 127096, 8 pp

work page 2023
[7]

G. G. Lorentz, Metric entropy, widths, and superpositions of functions,Amer. Math. Monthly69(1962), 469–485

work page 1962
[8]

D. A. Sprecher, On the structure of continuous functions of several variables,Trans. Amer. Math. Soc.115(1965), 340–355

work page 1965
[9]

Ismayilova and V

A. Ismayilova and V. E. Ismailov, On the Kolmogorov neural networks,Neural Net- works176(2024), Article No. 106333

work page 2024
[10]

Igelnik and N

B. Igelnik and N. Parikh, Kolmogorov’s spline network,IEEE Trans. Neural Netw. 14(2003), no. 4, 725–733

work page 2003
[11]

Polar and M

A. Polar and M. Poluektov, A deep machine learning algorithm for construction of the Kolmogorov—Arnold representation,Eng. Appl. Artif. Intell.99(2021), Article No. 104137

work page 2021
[12]

Poluektov and A

M. Poluektov and A. Polar, Construction of the Kolmogorov–Arnold networks using the Newton–Kaczmarz method,Mach. Learn.114(2025), Article No. 185

work page 2025
[13]

W. Liu, E. Chatzi, and Z. Lai, On the rate of convergence of Kolmogorov–Arnold network regression estimators,arXiv:2509.19830, 2025

work page arXiv 2025
[14]

Approximation rates in besov norms and sample- complexity of kolmogorov-arnold networks with residual connections, 2025

A. Kratsios, B. J. Kim, and T. Furuya, Approximation rates in Besov norms and sample-complexity of Kolmogorov–Arnold networks with residual connections, arXiv:2504.15110, 2025

work page arXiv 2025
[15]

Gleyzer, H

S. Gleyzer, H. Nguyen, D. P. Ramakrishnan, and E. A. F. Reinhardt, Sinusoidal approximation theorem for Kolmogorov–Arnold networks,Mathematics13(2025), no. 19, Article No. 3157

work page 2025
[16]

S-T. Chiu, S. W. Cheung, U. Braga-Neto, C. S. Lee, and R. P. Li, Free-RBF-KAN: Kolmogorov-Arnold networks with adaptive radial basis functions for efficient func- tion learning,arXiv:2601.07760, 2026

work page arXiv 2026
[17]

J. D. Toscano, L.-L. Wang, and G. E. Karniadakis, KKANs: K˚ urkov´ a–Kolmogorov– Arnold networks and their learning dynamics,Neural Networks191(2025), Article No. 107831

work page 2025
[18]

Y. Wang, J. W. Siegel, Z. Liu, and T. Y. Hou, On the expressiveness and spectral bias of KANs,arXiv:2410.01803, 2024

work page arXiv 2024
[19]

Zhang and H

X. Zhang and H. Zhou, Generalization bounds and model complexity for Kolmo- gorov–Arnold networks,The Thirteenth International Conference on Learning Rep- resentations, 2025. 18

work page 2025
[20]

S. M. Eshtehardian, M. H. Yassaee, and B. Khalaj, On the convergence of two-layer Kolmogorov–Arnold networks with first-layer training,The Fourteenth International Conference on Learning Representations, 2026

work page 2026
[21]

Somvanshi, S

S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, A survey on Kolmogorov–Arnold network,ACM Comput. Surv., 58 (2025), no. 2, Article 55

work page 2025
[22]

A Practitioner's Guide to Kolmogorov-Arnold Networks

A. Noorizadegan, S. Wang, L. Ling, J. P. Dominguez-Morales A practitioner’s guide to Kolmogorov–Arnold networks,arXiv:2510.25781, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

Leshno, V

M. Leshno, V. Ya. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,Neural Networks,6(1993), 861–867

work page 1993
[24]

Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195

A. Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195

work page 1999
[25]

de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp

C. de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp

work page 2001
[26]

L. L. Schumaker,Spline functions: basic theory, Third Edition, Cambridge Univer- sity Press, Cambridge, 2007, 582 pp. 19

work page 2007

[1] [1]

Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljaˇ ci´ c, T. Y. Hou, M. Tegmark, KAN: Kolmogorov–Arnold Networks,arXiv:2404.19756, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. (Russian), Dokl. Akad. Nauk SSSR114(1957), 953–956

work page 1957

[3] [3]

V. I. Arnold, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables. (Russian),Mat. Sb. (N.S.) 48/90(1959), 3–74; English transl. in:Amer. Math. Soc. Transl.(2)28(1963), 61–147. 17

work page 1959

[4] [4]

S. Ya. Khavinson,Best approximation by linear superpositions (approximate nomog- raphy), American Mathematical Society, Providence, RI, 1997, 175 pp

work page 1997

[5] [5]

V. E. Ismailov,Ridge functions and applications in neural networks, American Math- ematical Society, Providence, RI, 2021, 186 pp

work page 2021

[6] [6]

V. E. Ismailov, A three layer neural network can represent any multivariate function, J. Math. Anal. Appl.523(2023), no. 1, Article No. 127096, 8 pp

work page 2023

[7] [7]

G. G. Lorentz, Metric entropy, widths, and superpositions of functions,Amer. Math. Monthly69(1962), 469–485

work page 1962

[8] [8]

D. A. Sprecher, On the structure of continuous functions of several variables,Trans. Amer. Math. Soc.115(1965), 340–355

work page 1965

[9] [9]

Ismayilova and V

A. Ismayilova and V. E. Ismailov, On the Kolmogorov neural networks,Neural Net- works176(2024), Article No. 106333

work page 2024

[10] [10]

Igelnik and N

B. Igelnik and N. Parikh, Kolmogorov’s spline network,IEEE Trans. Neural Netw. 14(2003), no. 4, 725–733

work page 2003

[11] [11]

Polar and M

A. Polar and M. Poluektov, A deep machine learning algorithm for construction of the Kolmogorov—Arnold representation,Eng. Appl. Artif. Intell.99(2021), Article No. 104137

work page 2021

[12] [12]

Poluektov and A

M. Poluektov and A. Polar, Construction of the Kolmogorov–Arnold networks using the Newton–Kaczmarz method,Mach. Learn.114(2025), Article No. 185

work page 2025

[13] [13]

W. Liu, E. Chatzi, and Z. Lai, On the rate of convergence of Kolmogorov–Arnold network regression estimators,arXiv:2509.19830, 2025

work page arXiv 2025

[14] [14]

Approximation rates in besov norms and sample- complexity of kolmogorov-arnold networks with residual connections, 2025

A. Kratsios, B. J. Kim, and T. Furuya, Approximation rates in Besov norms and sample-complexity of Kolmogorov–Arnold networks with residual connections, arXiv:2504.15110, 2025

work page arXiv 2025

[15] [15]

Gleyzer, H

S. Gleyzer, H. Nguyen, D. P. Ramakrishnan, and E. A. F. Reinhardt, Sinusoidal approximation theorem for Kolmogorov–Arnold networks,Mathematics13(2025), no. 19, Article No. 3157

work page 2025

[16] [16]

S-T. Chiu, S. W. Cheung, U. Braga-Neto, C. S. Lee, and R. P. Li, Free-RBF-KAN: Kolmogorov-Arnold networks with adaptive radial basis functions for efficient func- tion learning,arXiv:2601.07760, 2026

work page arXiv 2026

[17] [17]

J. D. Toscano, L.-L. Wang, and G. E. Karniadakis, KKANs: K˚ urkov´ a–Kolmogorov– Arnold networks and their learning dynamics,Neural Networks191(2025), Article No. 107831

work page 2025

[18] [18]

Y. Wang, J. W. Siegel, Z. Liu, and T. Y. Hou, On the expressiveness and spectral bias of KANs,arXiv:2410.01803, 2024

work page arXiv 2024

[19] [19]

Zhang and H

X. Zhang and H. Zhou, Generalization bounds and model complexity for Kolmo- gorov–Arnold networks,The Thirteenth International Conference on Learning Rep- resentations, 2025. 18

work page 2025

[20] [20]

S. M. Eshtehardian, M. H. Yassaee, and B. Khalaj, On the convergence of two-layer Kolmogorov–Arnold networks with first-layer training,The Fourteenth International Conference on Learning Representations, 2026

work page 2026

[21] [21]

Somvanshi, S

S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, A survey on Kolmogorov–Arnold network,ACM Comput. Surv., 58 (2025), no. 2, Article 55

work page 2025

[22] [22]

A Practitioner's Guide to Kolmogorov-Arnold Networks

A. Noorizadegan, S. Wang, L. Ling, J. P. Dominguez-Morales A practitioner’s guide to Kolmogorov–Arnold networks,arXiv:2510.25781, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[23] [23]

Leshno, V

M. Leshno, V. Ya. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,Neural Networks,6(1993), 861–867

work page 1993

[24] [24]

Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195

A. Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195

work page 1999

[25] [25]

de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp

C. de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp

work page 2001

[26] [26]

L. L. Schumaker,Spline functions: basic theory, Third Edition, Cambridge Univer- sity Press, Cambridge, 2007, 582 pp. 19

work page 2007