Depth Lower Bounds for ReLU Networks with Binary Inputs
Pith reviewed 2026-06-26 21:25 UTC · model grok-4.3
The pith
ReLU networks on binary inputs must use depth linear in n or width exponential in n for certain explicit functions under exact computation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors exhibit an explicit family of functions computable exactly by a ReLU network of depth n+1 and constant width, such that any ReLU network of depth d and width w computing the function exactly must satisfy w^d = Ω(2^n); in particular, no network of depth d = o(n/log n) can compute it with width polynomial in n. The lower bound applies only under exact, infinite-accuracy computation.
What carries the argument
An explicit family of functions on {0,1}^n that admit exact computation by a depth n+1 constant-width ReLU network but force w^d exponential in n for any shallower depth d.
If this is right
- No polynomial-width ReLU network of depth o(n/log n) computes any function in the family exactly.
- Depth separation for ReLU networks on Boolean inputs holds for every depth up to n.
- Exact computation on discrete inputs can require high depth even when constant-width networks suffice at depth n+1.
Where Pith is reading between the lines
- Allowing approximation or finite precision removes the separation, since the truncated outputs are in TC^0.
- The result isolates a gap between exact and approximate expressivity that may appear in other activation functions or input domains.
- It suggests examining whether similar depth-width tradeoffs arise when the output is required only up to constant or logarithmic precision.
Load-bearing premise
The separation holds only for exact computation with infinite numerical precision; an exponential-precision truncation of the output lies in polynomial-size TC^0.
What would settle it
Constructing a ReLU network of depth d = o(n/log n) and width polynomial in n that computes one of the functions exactly on every input in {0,1}^n would falsify the lower bound.
read the original abstract
We study the role of depth in ReLU networks with discrete (Boolean) inputs and real-valued outputs, complementing two established lines of work. For Boolean inputs, striking depth separation results were proven for $\mathsf{AC}^0$ but with threshold ($\mathsf{TC}^0$) or ReLU gates depth separation is only established for depth two vs. three. On the other hand, for {\em real-valued} functions and ReLU networks, Telgarsky's (2016) constructed a simple one variable class of functions which establishes separation at higher depths. In this paper we are interested to establish an all-depths depth separation for ReLU networks on $\{0,1\}^n$. We do so by exhibiting an explicit family of functions computable exactly by a ReLU network of depth $n+1$ and constant width, such that any ReLU network of depth $d$ and width $w$ computing the function exactly must satisfy $w^d = \Omega(2^n)$; in particular, no network of depth $d = o(n/\log n)$ can compute it with width polynomial in $n$. We note that our lower bound relies on \emph{exact, infinite-accuracy} computation as an exponential precision truncation of the output is computable by a polynomial-size $\mathsf{TC}^0$ circuit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript constructs an explicit family of functions on {0,1}^n that are exactly computable by a constant-width ReLU network of depth n+1, yet any exact-computing ReLU network of depth d and width w must satisfy w^d = Ω(2^n). This rules out polynomial-width networks for depths d = o(n/log n). The lower bound is stated to hold only under exact, infinite-precision arithmetic; an exponential-precision truncation of the output lies in polynomial-size TC^0.
Significance. If the construction and lower-bound argument hold, the result supplies an all-depths depth separation for ReLU networks on Boolean inputs, extending beyond the depth-2-vs-3 separations previously known for threshold or ReLU gates and complementing Telgarsky’s real-valued examples. The explicitness of the function family and the explicit qualification regarding exact versus approximate computation are strengths that make the claim directly testable.
minor comments (2)
- Abstract: the function family is described only at a high level; a one-sentence description of the concrete function (or its recursive definition) would improve readability without lengthening the abstract.
- The manuscript should include a short remark (perhaps in the introduction) contrasting the exact-computation setting with the finite-precision regime already known to be in TC^0, to make the scope of the separation immediately clear to readers.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our results, the recognition of the explicit construction and the careful qualification regarding exact versus approximate computation, and the recommendation for minor revision. No major comments were raised in the report.
Circularity Check
No significant circularity
full rationale
The paper exhibits an explicit function family computable by depth n+1 ReLU networks and proves a matching lower bound w^d = Ω(2^n) via direct construction against external classes (TC^0). The derivation relies on exact infinite-precision computation, with the finite-precision caveat stated explicitly in the abstract; no equations reduce to fitted parameters, self-definitions, or load-bearing self-citations. The central claim is a self-contained proof construction independent of the authors' prior results.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math ReLU networks compute piecewise linear functions via linear transformations and max(0,x) activations
- domain assumption Exact computation requires infinite precision arithmetic
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.