Depth Lower Bounds for ReLU Networks with Binary Inputs

Elchanan Mossel; Neil Krishnan

arxiv: 2606.18540 · v2 · pith:2IA4DL5Anew · submitted 2026-06-16 · 💻 cs.CC

Depth Lower Bounds for ReLU Networks with Binary Inputs

Neil Krishnan , Elchanan Mossel This is my paper

Pith reviewed 2026-07-01 07:46 UTC · model grok-4.3

classification 💻 cs.CC

keywords ReLU networksdepth lower boundsBoolean inputscircuit complexityneural network expressivityexact computation

0 comments

The pith

ReLU networks computing certain explicit functions on binary inputs exactly must satisfy width^d = Omega(2^n) for depth d.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an explicit family of functions on n binary inputs that a ReLU network of depth n+1 and constant width computes exactly. It proves that any exact ReLU network of depth d must have width w satisfying w^d exponential in n, so that depth o(n/log n) forces superpolynomial width. This establishes depth separation at all depths for ReLU networks with Boolean inputs, going beyond known results for threshold circuits or real-valued functions. A reader would care because it shows depth is essential for efficient exact computation in this setting rather than just a matter of width.

Core claim

There exists an explicit family of functions computable exactly by a ReLU network of depth n+1 and constant width, such that any ReLU network of depth d and width w computing the function exactly must satisfy w^d = Omega(2^n). In particular, no network of depth d = o(n/log n) can compute it with width polynomial in n. The lower bound applies under exact infinite-accuracy computation.

What carries the argument

An explicit family of functions on {0,1}^n that a constant-width ReLU network computes exactly at depth n+1 but requires w^d exponential in n at smaller depths.

Load-bearing premise

The lower bound holds only for exact computation with infinite precision, since an exponential-precision truncation of the output is computable by a polynomial-size TC^0 circuit.

What would settle it

Exhibiting a depth-d ReLU network with width w where w^d is o(2^n) that computes one of the functions in the family exactly would falsify the claim.

read the original abstract

We study the role of depth in ReLU networks with discrete (Boolean) inputs and real-valued outputs, complementing two established lines of work. For Boolean inputs, striking depth separation results were proven for $\mathsf{AC}^0$ but with threshold ($\mathsf{TC}^0$) or ReLU gates depth separation is only established for depth two vs. three. On the other hand, for {\em real-valued} functions and ReLU networks, Telgarsky's (2016) constructed a simple one variable class of functions which establishes separation at higher depths. In this paper we are interested to establish an all-depths depth separation for ReLU networks on $\{0,1\}^n$. We do so by exhibiting an explicit family of functions computable exactly by a ReLU network of depth $n+1$ and constant width, such that any ReLU network of depth $d$ and width $w$ computing the function exactly must satisfy $w^d = \Omega(2^n)$; in particular, no network of depth $d = o(n/\log n)$ can compute it with width polynomial in $n$. We note that our lower bound relies on \emph{exact, infinite-accuracy} computation as an exponential precision truncation of the output is computable by a polynomial-size $\mathsf{TC}^0$ circuit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives an explicit all-depths separation for exact ReLU computation on Boolean inputs, with a clean width-depth tradeoff.

read the letter

The core result is a family of functions on {0,1}^n that a ReLU net computes exactly at depth n+1 with constant width, yet any exact depth-d width-w net needs w^d = Omega(2^n). This rules out polynomial width for depth o(n/log n).

What stands out is the extension past the depth-two versus three separations that existed for ReLU on Boolean inputs. The upper-bound construction is explicit and matches the lower bound in the stated regime. The authors also flag the precision caveat up front, which keeps the claim honest.

The main limitation is the exact, infinite-accuracy requirement. The paper notes that an exponential-precision truncation falls into polynomial-size TC^0, so the hardness is tied to the real-valued output model rather than any discrete circuit. That narrows the practical takeaway but does not contradict the stated theorem.

The argument looks internally consistent from the abstract and the stress-test note. No circularity or hidden fitting appears. The citation pattern is standard for the area.

This is for people working on neural network expressivity and circuit complexity. A reader who cares about depth separations in exact real-valued models will find the construction useful. The paper deserves a serious referee; the claim is sharp enough to warrant checking the details.

Referee Report

0 major / 2 minor

Summary. The paper exhibits an explicit family of functions on {0,1}^n computable exactly by a ReLU network of depth n+1 and constant width, while proving that any exact ReLU network of depth d and width w must satisfy w^d = Omega(2^n). This yields depth separations up to d = o(n/log n) even for polynomial width. The lower bound is stated to require exact infinite-precision real arithmetic; an exponential-precision truncation of the output lies in polynomial-size TC^0.

Significance. If the claimed explicit construction and matching lower bound hold, the result supplies an all-depths separation for ReLU networks on Boolean inputs, extending both the AC^0 depth separations and Telgarsky's one-variable real-valued construction. The explicitness of the upper-bound network and the parameter-free character of the Omega(2^n) lower bound are notable strengths. The authors' own precision caveat correctly situates the result within circuit-complexity distinctions between exact real computation and TC^0.

minor comments (2)

The abstract and introduction should explicitly reference the section containing the upper-bound construction (depth n+1, constant width) so readers can locate the explicit network immediately.
Clarify in the statement of the main theorem whether the Omega(2^n) bound is asymptotic in n for fixed d or holds uniformly; the current wording leaves the quantifiers slightly ambiguous.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive report, the recognition of the explicit construction and parameter-free lower bound, and the recommendation to accept. We are pleased that the situating of the result with respect to exact real arithmetic versus TC^0 has been noted as a strength.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper establishes an explicit upper-bound construction (depth n+1, constant width) for a family of functions on Boolean inputs and proves a matching lower bound via direct analysis of ReLU network computation under exact real arithmetic. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and the central claim does not rest on self-citations or imported uniqueness theorems; the TC^0 caveat is an explicit qualification rather than a hidden dependency. The derivation chain is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on standard definitions of ReLU networks, exact real-valued computation, and basic properties from circuit complexity; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)

standard math ReLU networks are defined with standard piecewise-linear activation and exact real arithmetic
Invoked implicitly when stating exact computation by depth n+1 constant-width networks.
domain assumption Depth-d width-w ReLU networks have expressive power bounded by the product w^d in some measure
The Omega(2^n) lower bound is stated directly in terms of this product.

pith-pipeline@v0.9.1-grok · 5766 in / 1422 out tokens · 35320 ms · 2026-07-01T07:46:11.688883+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 1 canonical work pages

[1]

Mix , title =

Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =

2002
[2]

Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

2001
[3]

and Tate, Stephen R

Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =

1992
[4]

, title =

Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =
[5]

6th International Conference on Learning Representations (ICLR) , year =

Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =
[6]

2017 , eprint =

Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =

2017
[7]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =

2021
[8]

International Conference on Learning Representations (ICLR) , year =

Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =
[9]

IEEE Transactions on Information Theory , volume =

Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =

2005
[10]

Conference on Learning Theory (COLT) , pages =

Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =
[11]

SIAM Journal on Computing , volume =

Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997
[12]

Conference on Learning Theory (COLT) , pages =

Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =
[13]

29th Annual Conference on Learning Theory (COLT) , series =

Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =
[14]

Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =
[15]

Almost optimal lower bounds for small depth circuits , booktitle =

H. Almost optimal lower bounds for small depth circuits , booktitle =
[16]

An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =

H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =

work page doi:10.1145/3095799
[17]

and Sipser, Michael , title =

Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =
[18]

and Williams, Ryan , title =

Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =

2016
[19]

, title =

Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997
[20]

Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =

1940
[21]

Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =
[22]

, title =

Barrington, David A. , title =. Journal of Computer and System Sciences , volume =. 1989 , doi =

1989
[23]

and Rudich, Steven , title =

Razborov, Alexander A. and Rudich, Steven , title =. Journal of Computer and System Sciences , volume =. 1997 , doi =

1997
[24]

Journal of the ACM , volume =

Naor, Moni and Reingold, Omer , title =. Journal of the ACM , volume =. 2004 , doi =

2004
[25]

Muroga, Saburo , title =
[26]

Majority gates vs

Goldmann, Mikael and H. Majority gates vs. general weighted threshold gates , journal =. 1992 , doi =

1992
[27]

SIAM Journal on Computing , volume =

Goldmann, Mikael and Karpinski, Marek , title =. SIAM Journal on Computing , volume =. 1998 , doi =

1998
[28]

On the size of weights for threshold gates , journal =

H. On the size of weights for threshold gates , journal =. 1994 , doi =

1994
[29]

Journal of the ACM , volume =

Williams, Ryan , title =. Journal of the ACM , volume =. 2014 , doi =

2014

[1] [1]

Mix , title =

Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =

2002

[2] [2]

Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

2001

[3] [3]

and Tate, Stephen R

Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =

1992

[4] [4]

, title =

Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =

[5] [5]

6th International Conference on Learning Representations (ICLR) , year =

Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =

[6] [6]

2017 , eprint =

Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =

2017

[7] [7]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =

2021

[8] [8]

International Conference on Learning Representations (ICLR) , year =

Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =

[9] [9]

IEEE Transactions on Information Theory , volume =

Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =

2005

[10] [10]

Conference on Learning Theory (COLT) , pages =

Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =

[11] [11]

SIAM Journal on Computing , volume =

Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997

[12] [12]

Conference on Learning Theory (COLT) , pages =

Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =

[13] [13]

29th Annual Conference on Learning Theory (COLT) , series =

Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =

[14] [14]

Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

[15] [15]

Almost optimal lower bounds for small depth circuits , booktitle =

H. Almost optimal lower bounds for small depth circuits , booktitle =

[16] [16]

An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =

H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =

work page doi:10.1145/3095799

[17] [17]

and Sipser, Michael , title =

Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =

[18] [18]

and Williams, Ryan , title =

Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =

2016

[19] [19]

, title =

Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997

[20] [20]

Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =

1940

[21] [21]

Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =

[22] [22]

, title =

Barrington, David A. , title =. Journal of Computer and System Sciences , volume =. 1989 , doi =

1989

[23] [23]

and Rudich, Steven , title =

Razborov, Alexander A. and Rudich, Steven , title =. Journal of Computer and System Sciences , volume =. 1997 , doi =

1997

[24] [24]

Journal of the ACM , volume =

Naor, Moni and Reingold, Omer , title =. Journal of the ACM , volume =. 2004 , doi =

2004

[25] [25]

Muroga, Saburo , title =

[26] [26]

Majority gates vs

Goldmann, Mikael and H. Majority gates vs. general weighted threshold gates , journal =. 1992 , doi =

1992

[27] [27]

SIAM Journal on Computing , volume =

Goldmann, Mikael and Karpinski, Marek , title =. SIAM Journal on Computing , volume =. 1998 , doi =

1998

[28] [28]

On the size of weights for threshold gates , journal =

H. On the size of weights for threshold gates , journal =. 1994 , doi =

1994

[29] [29]

Journal of the ACM , volume =

Williams, Ryan , title =. Journal of the ACM , volume =. 2014 , doi =

2014