Depth Lower Bounds for ReLU Networks with Binary Inputs
Pith reviewed 2026-06-26 21:25 UTC · model grok-4.3
The pith
ReLU networks on binary inputs must use depth linear in n or width exponential in n for certain explicit functions under exact computation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors exhibit an explicit family of functions computable exactly by a ReLU network of depth n+1 and constant width, such that any ReLU network of depth d and width w computing the function exactly must satisfy w^d = Ω(2^n); in particular, no network of depth d = o(n/log n) can compute it with width polynomial in n. The lower bound applies only under exact, infinite-accuracy computation.
What carries the argument
An explicit family of functions on {0,1}^n that admit exact computation by a depth n+1 constant-width ReLU network but force w^d exponential in n for any shallower depth d.
If this is right
- No polynomial-width ReLU network of depth o(n/log n) computes any function in the family exactly.
- Depth separation for ReLU networks on Boolean inputs holds for every depth up to n.
- Exact computation on discrete inputs can require high depth even when constant-width networks suffice at depth n+1.
Where Pith is reading between the lines
- Allowing approximation or finite precision removes the separation, since the truncated outputs are in TC^0.
- The result isolates a gap between exact and approximate expressivity that may appear in other activation functions or input domains.
- It suggests examining whether similar depth-width tradeoffs arise when the output is required only up to constant or logarithmic precision.
Load-bearing premise
The separation holds only for exact computation with infinite numerical precision; an exponential-precision truncation of the output lies in polynomial-size TC^0.
What would settle it
Constructing a ReLU network of depth d = o(n/log n) and width polynomial in n that computes one of the functions exactly on every input in {0,1}^n would falsify the lower bound.
read the original abstract
We study the role of depth in ReLU networks with discrete (Boolean) inputs and real-valued outputs, complementing two established lines of work. For Boolean inputs, striking depth separation results were proven for $\mathsf{AC}^0$ but with threshold ($\mathsf{TC}^0$) or ReLU gates depth separation is only established for depth two vs. three. On the other hand, for {\em real-valued} functions and ReLU networks, Telgarsky's (2016) constructed a simple one variable class of functions which establishes separation at higher depths. In this paper we are interested to establish an all-depths depth separation for ReLU networks on $\{0,1\}^n$. We do so by exhibiting an explicit family of functions computable exactly by a ReLU network of depth $n+1$ and constant width, such that any ReLU network of depth $d$ and width $w$ computing the function exactly must satisfy $w^d = \Omega(2^n)$; in particular, no network of depth $d = o(n/\log n)$ can compute it with width polynomial in $n$. We note that our lower bound relies on \emph{exact, infinite-accuracy} computation as an exponential precision truncation of the output is computable by a polynomial-size $\mathsf{TC}^0$ circuit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript constructs an explicit family of functions on {0,1}^n that are exactly computable by a constant-width ReLU network of depth n+1, yet any exact-computing ReLU network of depth d and width w must satisfy w^d = Ω(2^n). This rules out polynomial-width networks for depths d = o(n/log n). The lower bound is stated to hold only under exact, infinite-precision arithmetic; an exponential-precision truncation of the output lies in polynomial-size TC^0.
Significance. If the construction and lower-bound argument hold, the result supplies an all-depths depth separation for ReLU networks on Boolean inputs, extending beyond the depth-2-vs-3 separations previously known for threshold or ReLU gates and complementing Telgarsky’s real-valued examples. The explicitness of the function family and the explicit qualification regarding exact versus approximate computation are strengths that make the claim directly testable.
minor comments (2)
- Abstract: the function family is described only at a high level; a one-sentence description of the concrete function (or its recursive definition) would improve readability without lengthening the abstract.
- The manuscript should include a short remark (perhaps in the introduction) contrasting the exact-computation setting with the finite-precision regime already known to be in TC^0, to make the scope of the separation immediately clear to readers.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our results, the recognition of the explicit construction and the careful qualification regarding exact versus approximate computation, and the recommendation for minor revision. No major comments were raised in the report.
Circularity Check
No significant circularity
full rationale
The paper exhibits an explicit function family computable by depth n+1 ReLU networks and proves a matching lower bound w^d = Ω(2^n) via direct construction against external classes (TC^0). The derivation relies on exact infinite-precision computation, with the finite-precision caveat stated explicitly in the abstract; no equations reduce to fitted parameters, self-definitions, or load-bearing self-citations. The central claim is a self-contained proof construction independent of the authors' prior results.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math ReLU networks compute piecewise linear functions via linear transformations and max(0,x) activations
- domain assumption Exact computation requires infinite precision arithmetic
Reference graph
Works this paper leans on
-
[1]
Mix , title =
Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =
2002
-
[2]
Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =
Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =
2001
-
[3]
and Tate, Stephen R
Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =
1992
-
[4]
, title =
Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =
-
[5]
6th International Conference on Learning Representations (ICLR) , year =
Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =
-
[6]
2017 , eprint =
Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =
2017
-
[7]
Advances in Neural Information Processing Systems (NeurIPS) , volume =
Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =
2021
-
[8]
International Conference on Learning Representations (ICLR) , year =
Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =
-
[9]
IEEE Transactions on Information Theory , volume =
Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =
2005
-
[10]
Conference on Learning Theory (COLT) , pages =
Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =
-
[11]
SIAM Journal on Computing , volume =
Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =
1997
-
[12]
Conference on Learning Theory (COLT) , pages =
Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =
-
[13]
29th Annual Conference on Learning Theory (COLT) , series =
Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =
-
[14]
Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =
Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =
-
[15]
Almost optimal lower bounds for small depth circuits , booktitle =
H. Almost optimal lower bounds for small depth circuits , booktitle =
-
[16]
An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =
H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =
-
[17]
and Sipser, Michael , title =
Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =
-
[18]
and Williams, Ryan , title =
Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =
2016
-
[19]
, title =
Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =
1997
-
[20]
Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =
1940
-
[21]
Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.