Depth Lower Bounds for ReLU Networks with Binary Inputs

Elchanan Mossel; Neil Krishnan

arxiv: 2606.18540 · v1 · pith:2IA4DL5Anew · submitted 2026-06-16 · 💻 cs.CC

Depth Lower Bounds for ReLU Networks with Binary Inputs

Neil Krishnan , Elchanan Mossel This is my paper

Pith reviewed 2026-06-26 21:25 UTC · model grok-4.3

classification 💻 cs.CC

keywords ReLU networksdepth separationlower boundsbinary inputsexact computationTC0 circuitsneural network expressivity

0 comments

The pith

ReLU networks on binary inputs must use depth linear in n or width exponential in n for certain explicit functions under exact computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an explicit family of functions on {0,1}^n inputs that a ReLU network of depth n+1 and constant width computes exactly. It shows any exact ReLU network of depth d and width w for these functions must obey w^d = Ω(2^n), so depth o(n/log n) forces superpolynomial width. This gives depth separations across all depths for ReLU networks with discrete inputs, extending earlier results that were limited to depth two versus three or to real-valued functions. A reader would care because the result isolates depth as a necessary resource for efficient exact computation even when inputs are binary and outputs real-valued.

Core claim

The authors exhibit an explicit family of functions computable exactly by a ReLU network of depth n+1 and constant width, such that any ReLU network of depth d and width w computing the function exactly must satisfy w^d = Ω(2^n); in particular, no network of depth d = o(n/log n) can compute it with width polynomial in n. The lower bound applies only under exact, infinite-accuracy computation.

What carries the argument

An explicit family of functions on {0,1}^n that admit exact computation by a depth n+1 constant-width ReLU network but force w^d exponential in n for any shallower depth d.

If this is right

No polynomial-width ReLU network of depth o(n/log n) computes any function in the family exactly.
Depth separation for ReLU networks on Boolean inputs holds for every depth up to n.
Exact computation on discrete inputs can require high depth even when constant-width networks suffice at depth n+1.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Allowing approximation or finite precision removes the separation, since the truncated outputs are in TC^0.
The result isolates a gap between exact and approximate expressivity that may appear in other activation functions or input domains.
It suggests examining whether similar depth-width tradeoffs arise when the output is required only up to constant or logarithmic precision.

Load-bearing premise

The separation holds only for exact computation with infinite numerical precision; an exponential-precision truncation of the output lies in polynomial-size TC^0.

What would settle it

Constructing a ReLU network of depth d = o(n/log n) and width polynomial in n that computes one of the functions exactly on every input in {0,1}^n would falsify the lower bound.

read the original abstract

We study the role of depth in ReLU networks with discrete (Boolean) inputs and real-valued outputs, complementing two established lines of work. For Boolean inputs, striking depth separation results were proven for $\mathsf{AC}^0$ but with threshold ($\mathsf{TC}^0$) or ReLU gates depth separation is only established for depth two vs. three. On the other hand, for {\em real-valued} functions and ReLU networks, Telgarsky's (2016) constructed a simple one variable class of functions which establishes separation at higher depths. In this paper we are interested to establish an all-depths depth separation for ReLU networks on $\{0,1\}^n$. We do so by exhibiting an explicit family of functions computable exactly by a ReLU network of depth $n+1$ and constant width, such that any ReLU network of depth $d$ and width $w$ computing the function exactly must satisfy $w^d = \Omega(2^n)$; in particular, no network of depth $d = o(n/\log n)$ can compute it with width polynomial in $n$. We note that our lower bound relies on \emph{exact, infinite-accuracy} computation as an exponential precision truncation of the output is computable by a polynomial-size $\mathsf{TC}^0$ circuit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first all-depths separation for exact ReLU networks on Boolean inputs via an explicit family requiring w^d = Omega(2^n), but the result is confined to infinite-precision exact computation.

read the letter

The paper establishes an explicit family of functions on {0,1}^n that a depth n+1 constant-width ReLU network computes exactly, while any depth-d width-w network needs w^d exponential in n. This rules out polynomial width for depths o(n/log n). It is the first such all-depths result for exact ReLU computation on discrete inputs.

The construction extends Telgarsky's real-valued separation to the Boolean case and moves past the depth-2-versus-3 bounds that were previously known for ReLU or TC0 on Boolean inputs. The authors cite the relevant AC0 and TC0 literature and state the width-depth tradeoff cleanly.

They also do the right thing by flagging the central limitation up front: the lower bound holds only for exact, infinite-accuracy computation. They explicitly note that an exponential-precision truncation of the output lies in polynomial-size TC0, so the separation vanishes under approximation or finite precision.

That acknowledged restriction is the main soft spot. Real networks always operate with finite bits and some error tolerance, so the result does not directly constrain practical models. The abstract alone does not let me verify the details of the construction or the error analysis in the proof, but nothing in the stated claim looks circular or dependent on fitted parameters.

The work is aimed at researchers in neural network circuit complexity. Anyone following depth-separation questions for ReLU will find the explicit family and the all-depths extension useful. It deserves a serious referee because it resolves an open question in the exact setting with a concrete construction, even though the scope is narrow.

Referee Report

0 major / 2 minor

Summary. The manuscript constructs an explicit family of functions on {0,1}^n that are exactly computable by a constant-width ReLU network of depth n+1, yet any exact-computing ReLU network of depth d and width w must satisfy w^d = Ω(2^n). This rules out polynomial-width networks for depths d = o(n/log n). The lower bound is stated to hold only under exact, infinite-precision arithmetic; an exponential-precision truncation of the output lies in polynomial-size TC^0.

Significance. If the construction and lower-bound argument hold, the result supplies an all-depths depth separation for ReLU networks on Boolean inputs, extending beyond the depth-2-vs-3 separations previously known for threshold or ReLU gates and complementing Telgarsky’s real-valued examples. The explicitness of the function family and the explicit qualification regarding exact versus approximate computation are strengths that make the claim directly testable.

minor comments (2)

Abstract: the function family is described only at a high level; a one-sentence description of the concrete function (or its recursive definition) would improve readability without lengthening the abstract.
The manuscript should include a short remark (perhaps in the introduction) contrasting the exact-computation setting with the finite-precision regime already known to be in TC^0, to make the scope of the separation immediately clear to readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our results, the recognition of the explicit construction and the careful qualification regarding exact versus approximate computation, and the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper exhibits an explicit function family computable by depth n+1 ReLU networks and proves a matching lower bound w^d = Ω(2^n) via direct construction against external classes (TC^0). The derivation relies on exact infinite-precision computation, with the finite-precision caveat stated explicitly in the abstract; no equations reduce to fitted parameters, self-definitions, or load-bearing self-citations. The central claim is a self-contained proof construction independent of the authors' prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Result rests on standard properties of ReLU networks and circuit classes; no free parameters, new entities, or ad-hoc axioms visible in abstract.

axioms (2)

standard math ReLU networks compute piecewise linear functions via linear transformations and max(0,x) activations
Foundational definition of the model class under study.
domain assumption Exact computation requires infinite precision arithmetic
Explicitly invoked to obtain the separation; truncation moves the function into TC0.

pith-pipeline@v0.9.1-grok · 5766 in / 1310 out tokens · 28534 ms · 2026-06-26T21:25:48.543311+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 1 canonical work pages

[1]

Mix , title =

Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =

2002
[2]

Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

2001
[3]

and Tate, Stephen R

Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =

1992
[4]

, title =

Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =
[5]

6th International Conference on Learning Representations (ICLR) , year =

Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =
[6]

2017 , eprint =

Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =

2017
[7]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =

2021
[8]

International Conference on Learning Representations (ICLR) , year =

Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =
[9]

IEEE Transactions on Information Theory , volume =

Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =

2005
[10]

Conference on Learning Theory (COLT) , pages =

Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =
[11]

SIAM Journal on Computing , volume =

Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997
[12]

Conference on Learning Theory (COLT) , pages =

Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =
[13]

29th Annual Conference on Learning Theory (COLT) , series =

Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =
[14]

Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =
[15]

Almost optimal lower bounds for small depth circuits , booktitle =

H. Almost optimal lower bounds for small depth circuits , booktitle =
[16]

An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =

H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =

work page doi:10.1145/3095799
[17]

and Sipser, Michael , title =

Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =
[18]

and Williams, Ryan , title =

Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =

2016
[19]

, title =

Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997
[20]

Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =

1940
[21]

Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =

[1] [1]

Mix , title =

Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =

2002

[2] [2]

Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

2001

[3] [3]

and Tate, Stephen R

Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =

1992

[4] [4]

, title =

Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =

[5] [5]

6th International Conference on Learning Representations (ICLR) , year =

Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =

[6] [6]

2017 , eprint =

Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =

2017

[7] [7]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =

2021

[8] [8]

International Conference on Learning Representations (ICLR) , year =

Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =

[9] [9]

IEEE Transactions on Information Theory , volume =

Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =

2005

[10] [10]

Conference on Learning Theory (COLT) , pages =

Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =

[11] [11]

SIAM Journal on Computing , volume =

Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997

[12] [12]

Conference on Learning Theory (COLT) , pages =

Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =

[13] [13]

29th Annual Conference on Learning Theory (COLT) , series =

Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =

[14] [14]

Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

[15] [15]

Almost optimal lower bounds for small depth circuits , booktitle =

H. Almost optimal lower bounds for small depth circuits , booktitle =

[16] [16]

An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =

H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =

work page doi:10.1145/3095799

[17] [17]

and Sipser, Michael , title =

Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =

[18] [18]

and Williams, Ryan , title =

Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =

2016

[19] [19]

, title =

Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =

1997

[20] [20]

Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =

1940

[21] [21]

Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =