pith. sign in

arxiv: 2604.23765 · v2 · pith:O6OMS6LPnew · submitted 2026-04-26 · 💻 cs.LG · cs.NE· math.FA

Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks

Pith reviewed 2026-05-20 23:51 UTC · model grok-4.3

classification 💻 cs.LG cs.NEmath.FA
keywords Kolmogorov-Arnold networksuniversal approximationnon-affine functionsunivariate edge functionsspline parameterizationfunction superpositionsdensity in C(K)
0
0 comments X

The pith

Deep KANs achieve universal approximation on any compact set precisely when they include one fixed non-affine univariate edge function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes necessary and sufficient conditions on the edge functions of Kolmogorov-Arnold networks for them to approximate any continuous function on a compact domain in the uniform norm. It proves that when all edges are either affine or identical to one fixed continuous univariate function σ, density holds for arbitrary depth if and only if σ is non-affine. For networks limited to exactly two hidden layers the condition strengthens to σ being nonpolynomial. The work further shows that the full family of affine functions can be replaced by a finite collection without losing universality, and that standard spline-based KAN parameterizations remain universal even with fixed degree and knots. These results also imply that every continuous multivariate function can be approximated arbitrarily closely using only the coordinate projections together with two fixed univariate functions under repeated addition and composition.

Core claim

Deep KANs in which all edge functions are either affine or equal to a fixed continuous function σ are dense in C(K) for every compact set K⊂R^n if and only if σ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if σ is nonpolynomial. The full class of affine functions is not required and can be replaced by a finite set; in the nonpolynomial case a fixed family of five affine functions suffices for arbitrary depth. For every continuous non-affine σ there exists a finite affine family A_σ such that deep KANs with edge functions drawn from A_σ ∪ {σ} remain universal. KANs using the spline-based edge parameterization of Liu et al. are universal,

What carries the argument

The KAN architecture whose nodes sum univariate edge functions, together with the precise condition that a single fixed continuous univariate function σ must be non-affine for the resulting networks to be dense in the uniform topology on C(K).

If this is right

  • Universality holds for arbitrary depth once a single non-affine σ is included, even when the remaining edges are drawn from a finite affine family of size five in the nonpolynomial case.
  • With exactly two hidden layers the non-affine function σ must additionally be nonpolynomial to guarantee density.
  • Spline-based KANs with any fixed spline degree and fixed knot sequence are universal approximators in the classical sense.
  • Every continuous function on a compact set can be approximated using only the coordinate functions and two fixed univariate functions combined by repeated addition and composition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The non-linearity required for universality can be localized to a single type of edge function, suggesting that most edges in a trained KAN could be kept linear or fixed without sacrificing expressivity.
  • The finite affine replacement result opens the possibility of discretizing the linear part of a KAN into a small lookup table while preserving approximation power.
  • The superposition implication tightens the classical Kolmogorov-Arnold theorem by showing that two fixed univariate functions, rather than a continuum, suffice when addition and composition are allowed freely.

Load-bearing premise

The KAN architecture is defined with univariate edge functions combined via summation at nodes and density is measured in the uniform norm on C(K); any change in this definition could invalidate the if-and-only-if statements.

What would settle it

Construct a deep KAN using only affine functions plus the fixed affine map σ(x) = x and exhibit a continuous target function on [0,1]^2 whose uniform approximation error remains bounded away from zero; repeat the construction with a non-affine σ such as σ(x) = x^2 and verify that the error can be driven to zero.

read the original abstract

We analyze the universal approximation property of Kolmogorov-Arnold Networks (KANs) in terms of their edge functions. If these functions are all affine, then universality clearly fails. How many non-affine functions are needed, in addition to affine ones, to ensure universality? We show that a single one suffices. More precisely, we prove that deep KANs in which all edge functions are either affine or equal to a fixed continuous function $\sigma$ are dense in $C(K)$ for every compact set $K\subset\mathbb{R}^n$ if and only if $\sigma$ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if $\sigma$ is nonpolynomial. We further show that the full class of affine functions is not required; it can be replaced by a finite set without affecting universality. In particular, in the nonpolynomial case, a fixed family of five affine functions suffices when the depth is arbitrary. More generally, for every continuous non-affine function $\sigma$, there exists a finite affine family $A_\sigma$ such that deep KANs with edge functions in $A_\sigma\cup\{\sigma\}$ remain universal. We also prove that KANs with the spline-based edge parameterization introduced by Liu et al.~\cite{Liu2024} are universal approximators in the classical sense, even when the spline degree and knot sequence are fixed in advance. This paper also has implications for the theory of superpositions of real functions. In particular, we show that every continuous multivariate function can be approximated arbitrarily well using only the coordinate functions and two fixed univariate functions under repeated addition and composition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper establishes necessary and sufficient conditions for the universal approximation property of Kolmogorov-Arnold Networks (KANs). It proves that deep KANs whose edge functions are restricted to the affine class or a single fixed continuous function σ are dense in C(K) for every compact K ⊂ R^n if and only if σ is non-affine. For KANs with exactly two hidden layers the corresponding condition is that σ is non-polynomial. The manuscript further shows that the full affine class can be replaced by a finite family A_σ (of size five when σ is non-polynomial) while preserving universality, and that the spline-based edge parameterization of Liu et al. yields universal approximators even when degree and knots are fixed in advance. The work also derives consequences for the theory of superpositions of real functions.

Significance. If the stated iff statements hold, the paper supplies a sharp characterization of the minimal nonlinearity required for KAN universality. The explicit finite affine family A_σ and the fixed-spline universality result are practically useful and theoretically clean. The reduction to coordinate functions plus two fixed univariate functions under addition and composition strengthens the classical superposition literature. These contributions rest on standard density arguments in C(K) together with explicit constructions that avoid circularity or parameter-fitting artifacts.

major comments (1)
  1. §4, Theorem 4.3 (two-layer case): the necessity proof that a polynomial σ yields only polynomial networks is clear, but the sufficiency direction for non-polynomial σ relies on a specific nesting argument; an explicit low-dimensional counter-example (e.g., n=1, target x^2) showing that two layers with a polynomial σ cannot approximate it would strengthen the claim.
minor comments (3)
  1. §2.1: the precise definition of a KAN layer (how univariate edge functions are summed at each node for vector-valued inputs) should be stated with an equation or diagram to avoid ambiguity when comparing to the classical Kolmogorov-Arnold representation.
  2. The statement that |A_σ|=5 suffices for non-polynomial σ appears in the abstract and §5; a short table or explicit construction for σ(x)=x^3 would make the finite-family result immediately verifiable.
  3. Reference to Liu et al. (2024) is given, but the precise spline degree and knot sequence used in the universality proof for fixed splines should be restated in the theorem statement for self-contained reading.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. The single major comment is addressed below.

read point-by-point responses
  1. Referee: §4, Theorem 4.3 (two-layer case): the necessity proof that a polynomial σ yields only polynomial networks is clear, but the sufficiency direction for non-polynomial σ relies on a specific nesting argument; an explicit low-dimensional counter-example (e.g., n=1, target x^2) showing that two layers with a polynomial σ cannot approximate it would strengthen the claim.

    Authors: We appreciate the suggestion to make the necessity direction more concrete. Our general argument already shows that any polynomial σ produces networks whose outputs are polynomials of bounded degree (determined by the fixed degree of σ and the two-layer nesting depth), which cannot approximate non-polynomial targets. We agree, however, that an explicit low-dimensional illustration would improve readability. In the revised manuscript we will add a short example for n=1 and target x² (or a similar quadratic) demonstrating that, for a polynomial σ, the two-layer architecture cannot recover the target due to the restricted functional form. This addition will appear in §4 alongside the existing proof. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper establishes its iff universality claims for deep KANs via direct proofs from the given architecture definitions (univariate edge functions summed at nodes) and uniform norm on C(K). Necessity when σ is affine follows immediately because the resulting networks are affine maps, which are not dense. Sufficiency constructs explicit finite affine probes from deviation points of a non-affine σ and uses nesting to generate separating nonlinearities; the two-layer case separately requires non-polynomial σ because depth is bounded. The finite family A_σ is constructed explicitly without fitting to target data, and the spline parameterization result is a separate verification against the Liu et al. definition. All steps are self-contained in standard density arguments and do not reduce to self-citations, fitted inputs renamed as predictions, or definitional equivalences.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard background from real analysis and approximation theory; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Edge functions are continuous univariate maps from R to R.
    Required for the networks to map into C(K) and for density statements to make sense.
  • domain assumption The network topology consists of summations at nodes with univariate functions on edges.
    This is the defining structure of KANs used throughout the claims.

pith-pipeline@v0.9.0 · 5837 in / 1489 out tokens · 33605 ms · 2026-05-20T23:51:54.315867+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

  1. [1]

    Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljaˇ ci´ c, T. Y. Hou, M. Tegmark, KAN: Kolmogorov–Arnold Networks,arXiv:2404.19756, 2024

  2. [2]

    A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. (Russian), Dokl. Akad. Nauk SSSR114(1957), 953–956

  3. [3]

    V. I. Arnold, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables. (Russian),Mat. Sb. (N.S.) 48/90(1959), 3–74; English transl. in:Amer. Math. Soc. Transl.(2)28(1963), 61–147. 17

  4. [4]

    S. Ya. Khavinson,Best approximation by linear superpositions (approximate nomog- raphy), American Mathematical Society, Providence, RI, 1997, 175 pp

  5. [5]

    V. E. Ismailov,Ridge functions and applications in neural networks, American Math- ematical Society, Providence, RI, 2021, 186 pp

  6. [6]

    V. E. Ismailov, A three layer neural network can represent any multivariate function, J. Math. Anal. Appl.523(2023), no. 1, Article No. 127096, 8 pp

  7. [7]

    G. G. Lorentz, Metric entropy, widths, and superpositions of functions,Amer. Math. Monthly69(1962), 469–485

  8. [8]

    D. A. Sprecher, On the structure of continuous functions of several variables,Trans. Amer. Math. Soc.115(1965), 340–355

  9. [9]

    Ismayilova and V

    A. Ismayilova and V. E. Ismailov, On the Kolmogorov neural networks,Neural Net- works176(2024), Article No. 106333

  10. [10]

    Igelnik and N

    B. Igelnik and N. Parikh, Kolmogorov’s spline network,IEEE Trans. Neural Netw. 14(2003), no. 4, 725–733

  11. [11]

    Polar and M

    A. Polar and M. Poluektov, A deep machine learning algorithm for construction of the Kolmogorov—Arnold representation,Eng. Appl. Artif. Intell.99(2021), Article No. 104137

  12. [12]

    Poluektov and A

    M. Poluektov and A. Polar, Construction of the Kolmogorov–Arnold networks using the Newton–Kaczmarz method,Mach. Learn.114(2025), Article No. 185

  13. [13]

    W. Liu, E. Chatzi, and Z. Lai, On the rate of convergence of Kolmogorov–Arnold network regression estimators,arXiv:2509.19830, 2025

  14. [14]

    Approximation rates in besov norms and sample- complexity of kolmogorov-arnold networks with residual connections, 2025

    A. Kratsios, B. J. Kim, and T. Furuya, Approximation rates in Besov norms and sample-complexity of Kolmogorov–Arnold networks with residual connections, arXiv:2504.15110, 2025

  15. [15]

    Gleyzer, H

    S. Gleyzer, H. Nguyen, D. P. Ramakrishnan, and E. A. F. Reinhardt, Sinusoidal approximation theorem for Kolmogorov–Arnold networks,Mathematics13(2025), no. 19, Article No. 3157

  16. [16]

    S-T. Chiu, S. W. Cheung, U. Braga-Neto, C. S. Lee, and R. P. Li, Free-RBF-KAN: Kolmogorov-Arnold networks with adaptive radial basis functions for efficient func- tion learning,arXiv:2601.07760, 2026

  17. [17]

    J. D. Toscano, L.-L. Wang, and G. E. Karniadakis, KKANs: K˚ urkov´ a–Kolmogorov– Arnold networks and their learning dynamics,Neural Networks191(2025), Article No. 107831

  18. [18]

    Y. Wang, J. W. Siegel, Z. Liu, and T. Y. Hou, On the expressiveness and spectral bias of KANs,arXiv:2410.01803, 2024

  19. [19]

    Zhang and H

    X. Zhang and H. Zhou, Generalization bounds and model complexity for Kolmo- gorov–Arnold networks,The Thirteenth International Conference on Learning Rep- resentations, 2025. 18

  20. [20]

    S. M. Eshtehardian, M. H. Yassaee, and B. Khalaj, On the convergence of two-layer Kolmogorov–Arnold networks with first-layer training,The Fourteenth International Conference on Learning Representations, 2026

  21. [21]

    Somvanshi, S

    S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, A survey on Kolmogorov–Arnold network,ACM Comput. Surv., 58 (2025), no. 2, Article 55

  22. [22]

    A Practitioner's Guide to Kolmogorov-Arnold Networks

    A. Noorizadegan, S. Wang, L. Ling, J. P. Dominguez-Morales A practitioner’s guide to Kolmogorov–Arnold networks,arXiv:2510.25781, 2025

  23. [23]

    Leshno, V

    M. Leshno, V. Ya. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,Neural Networks,6(1993), 861–867

  24. [24]

    Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195

    A. Pinkus, Approximation theory of the MLP model in neural networks,Acta nu- merica8(1999), 143–195

  25. [25]

    de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp

    C. de Boor,A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp

  26. [26]

    L. L. Schumaker,Spline functions: basic theory, Third Edition, Cambridge Univer- sity Press, Cambridge, 2007, 582 pp. 19