Depth Lower Bounds for ReLU Networks with Binary Inputs
Pith reviewed 2026-07-01 07:46 UTC · model grok-4.3
The pith
ReLU networks computing certain explicit functions on binary inputs exactly must satisfy width^d = Omega(2^n) for depth d.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
There exists an explicit family of functions computable exactly by a ReLU network of depth n+1 and constant width, such that any ReLU network of depth d and width w computing the function exactly must satisfy w^d = Omega(2^n). In particular, no network of depth d = o(n/log n) can compute it with width polynomial in n. The lower bound applies under exact infinite-accuracy computation.
What carries the argument
An explicit family of functions on {0,1}^n that a constant-width ReLU network computes exactly at depth n+1 but requires w^d exponential in n at smaller depths.
Load-bearing premise
The lower bound holds only for exact computation with infinite precision, since an exponential-precision truncation of the output is computable by a polynomial-size TC^0 circuit.
What would settle it
Exhibiting a depth-d ReLU network with width w where w^d is o(2^n) that computes one of the functions in the family exactly would falsify the claim.
read the original abstract
We study the role of depth in ReLU networks with discrete (Boolean) inputs and real-valued outputs, complementing two established lines of work. For Boolean inputs, striking depth separation results were proven for $\mathsf{AC}^0$ but with threshold ($\mathsf{TC}^0$) or ReLU gates depth separation is only established for depth two vs. three. On the other hand, for {\em real-valued} functions and ReLU networks, Telgarsky's (2016) constructed a simple one variable class of functions which establishes separation at higher depths. In this paper we are interested to establish an all-depths depth separation for ReLU networks on $\{0,1\}^n$. We do so by exhibiting an explicit family of functions computable exactly by a ReLU network of depth $n+1$ and constant width, such that any ReLU network of depth $d$ and width $w$ computing the function exactly must satisfy $w^d = \Omega(2^n)$; in particular, no network of depth $d = o(n/\log n)$ can compute it with width polynomial in $n$. We note that our lower bound relies on \emph{exact, infinite-accuracy} computation as an exponential precision truncation of the output is computable by a polynomial-size $\mathsf{TC}^0$ circuit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper exhibits an explicit family of functions on {0,1}^n computable exactly by a ReLU network of depth n+1 and constant width, while proving that any exact ReLU network of depth d and width w must satisfy w^d = Omega(2^n). This yields depth separations up to d = o(n/log n) even for polynomial width. The lower bound is stated to require exact infinite-precision real arithmetic; an exponential-precision truncation of the output lies in polynomial-size TC^0.
Significance. If the claimed explicit construction and matching lower bound hold, the result supplies an all-depths separation for ReLU networks on Boolean inputs, extending both the AC^0 depth separations and Telgarsky's one-variable real-valued construction. The explicitness of the upper-bound network and the parameter-free character of the Omega(2^n) lower bound are notable strengths. The authors' own precision caveat correctly situates the result within circuit-complexity distinctions between exact real computation and TC^0.
minor comments (2)
- The abstract and introduction should explicitly reference the section containing the upper-bound construction (depth n+1, constant width) so readers can locate the explicit network immediately.
- Clarify in the statement of the main theorem whether the Omega(2^n) bound is asymptotic in n for fixed d or holds uniformly; the current wording leaves the quantifiers slightly ambiguous.
Simulated Author's Rebuttal
We thank the referee for the positive report, the recognition of the explicit construction and parameter-free lower bound, and the recommendation to accept. We are pleased that the situating of the result with respect to exact real arithmetic versus TC^0 has been noted as a strength.
Circularity Check
No significant circularity
full rationale
The paper establishes an explicit upper-bound construction (depth n+1, constant width) for a family of functions on Boolean inputs and proves a matching lower bound via direct analysis of ReLU network computation under exact real arithmetic. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and the central claim does not rest on self-citations or imported uniqueness theorems; the TC^0 caveat is an explicit qualification rather than a hidden dependency. The derivation chain is therefore self-contained against external mathematical benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math ReLU networks are defined with standard piecewise-linear activation and exact real arithmetic
- domain assumption Depth-d width-w ReLU networks have expressive power bounded by the product w^d in some measure
Reference graph
Works this paper leans on
-
[1]
Mix , title =
Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =
2002
-
[2]
Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =
Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =
2001
-
[3]
and Tate, Stephen R
Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =
1992
-
[4]
, title =
Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =
-
[5]
6th International Conference on Learning Representations (ICLR) , year =
Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =
-
[6]
2017 , eprint =
Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =
2017
-
[7]
Advances in Neural Information Processing Systems (NeurIPS) , volume =
Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =
2021
-
[8]
International Conference on Learning Representations (ICLR) , year =
Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =
-
[9]
IEEE Transactions on Information Theory , volume =
Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =
2005
-
[10]
Conference on Learning Theory (COLT) , pages =
Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =
-
[11]
SIAM Journal on Computing , volume =
Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =
1997
-
[12]
Conference on Learning Theory (COLT) , pages =
Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =
-
[13]
29th Annual Conference on Learning Theory (COLT) , series =
Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =
-
[14]
Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =
Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =
-
[15]
Almost optimal lower bounds for small depth circuits , booktitle =
H. Almost optimal lower bounds for small depth circuits , booktitle =
-
[16]
An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =
H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =
-
[17]
and Sipser, Michael , title =
Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =
-
[18]
and Williams, Ryan , title =
Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =
2016
-
[19]
, title =
Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =
1997
-
[20]
Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =
1940
-
[21]
Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =
-
[22]
, title =
Barrington, David A. , title =. Journal of Computer and System Sciences , volume =. 1989 , doi =
1989
-
[23]
and Rudich, Steven , title =
Razborov, Alexander A. and Rudich, Steven , title =. Journal of Computer and System Sciences , volume =. 1997 , doi =
1997
-
[24]
Journal of the ACM , volume =
Naor, Moni and Reingold, Omer , title =. Journal of the ACM , volume =. 2004 , doi =
2004
-
[25]
Muroga, Saburo , title =
-
[26]
Majority gates vs
Goldmann, Mikael and H. Majority gates vs. general weighted threshold gates , journal =. 1992 , doi =
1992
-
[27]
SIAM Journal on Computing , volume =
Goldmann, Mikael and Karpinski, Marek , title =. SIAM Journal on Computing , volume =. 1998 , doi =
1998
-
[28]
On the size of weights for threshold gates , journal =
H. On the size of weights for threshold gates , journal =. 1994 , doi =
1994
-
[29]
Journal of the ACM , volume =
Williams, Ryan , title =. Journal of the ACM , volume =. 2014 , doi =
2014
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.