pith. sign in

arxiv: 2606.18540 · v1 · pith:2IA4DL5Anew · submitted 2026-06-16 · 💻 cs.CC

Depth Lower Bounds for ReLU Networks with Binary Inputs

Pith reviewed 2026-06-26 21:25 UTC · model grok-4.3

classification 💻 cs.CC
keywords ReLU networksdepth separationlower boundsbinary inputsexact computationTC0 circuitsneural network expressivity
0
0 comments X

The pith

ReLU networks on binary inputs must use depth linear in n or width exponential in n for certain explicit functions under exact computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an explicit family of functions on {0,1}^n inputs that a ReLU network of depth n+1 and constant width computes exactly. It shows any exact ReLU network of depth d and width w for these functions must obey w^d = Ω(2^n), so depth o(n/log n) forces superpolynomial width. This gives depth separations across all depths for ReLU networks with discrete inputs, extending earlier results that were limited to depth two versus three or to real-valued functions. A reader would care because the result isolates depth as a necessary resource for efficient exact computation even when inputs are binary and outputs real-valued.

Core claim

The authors exhibit an explicit family of functions computable exactly by a ReLU network of depth n+1 and constant width, such that any ReLU network of depth d and width w computing the function exactly must satisfy w^d = Ω(2^n); in particular, no network of depth d = o(n/log n) can compute it with width polynomial in n. The lower bound applies only under exact, infinite-accuracy computation.

What carries the argument

An explicit family of functions on {0,1}^n that admit exact computation by a depth n+1 constant-width ReLU network but force w^d exponential in n for any shallower depth d.

If this is right

  • No polynomial-width ReLU network of depth o(n/log n) computes any function in the family exactly.
  • Depth separation for ReLU networks on Boolean inputs holds for every depth up to n.
  • Exact computation on discrete inputs can require high depth even when constant-width networks suffice at depth n+1.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Allowing approximation or finite precision removes the separation, since the truncated outputs are in TC^0.
  • The result isolates a gap between exact and approximate expressivity that may appear in other activation functions or input domains.
  • It suggests examining whether similar depth-width tradeoffs arise when the output is required only up to constant or logarithmic precision.

Load-bearing premise

The separation holds only for exact computation with infinite numerical precision; an exponential-precision truncation of the output lies in polynomial-size TC^0.

What would settle it

Constructing a ReLU network of depth d = o(n/log n) and width polynomial in n that computes one of the functions exactly on every input in {0,1}^n would falsify the lower bound.

read the original abstract

We study the role of depth in ReLU networks with discrete (Boolean) inputs and real-valued outputs, complementing two established lines of work. For Boolean inputs, striking depth separation results were proven for $\mathsf{AC}^0$ but with threshold ($\mathsf{TC}^0$) or ReLU gates depth separation is only established for depth two vs. three. On the other hand, for {\em real-valued} functions and ReLU networks, Telgarsky's (2016) constructed a simple one variable class of functions which establishes separation at higher depths. In this paper we are interested to establish an all-depths depth separation for ReLU networks on $\{0,1\}^n$. We do so by exhibiting an explicit family of functions computable exactly by a ReLU network of depth $n+1$ and constant width, such that any ReLU network of depth $d$ and width $w$ computing the function exactly must satisfy $w^d = \Omega(2^n)$; in particular, no network of depth $d = o(n/\log n)$ can compute it with width polynomial in $n$. We note that our lower bound relies on \emph{exact, infinite-accuracy} computation as an exponential precision truncation of the output is computable by a polynomial-size $\mathsf{TC}^0$ circuit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript constructs an explicit family of functions on {0,1}^n that are exactly computable by a constant-width ReLU network of depth n+1, yet any exact-computing ReLU network of depth d and width w must satisfy w^d = Ω(2^n). This rules out polynomial-width networks for depths d = o(n/log n). The lower bound is stated to hold only under exact, infinite-precision arithmetic; an exponential-precision truncation of the output lies in polynomial-size TC^0.

Significance. If the construction and lower-bound argument hold, the result supplies an all-depths depth separation for ReLU networks on Boolean inputs, extending beyond the depth-2-vs-3 separations previously known for threshold or ReLU gates and complementing Telgarsky’s real-valued examples. The explicitness of the function family and the explicit qualification regarding exact versus approximate computation are strengths that make the claim directly testable.

minor comments (2)
  1. Abstract: the function family is described only at a high level; a one-sentence description of the concrete function (or its recursive definition) would improve readability without lengthening the abstract.
  2. The manuscript should include a short remark (perhaps in the introduction) contrasting the exact-computation setting with the finite-precision regime already known to be in TC^0, to make the scope of the separation immediately clear to readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our results, the recognition of the explicit construction and the careful qualification regarding exact versus approximate computation, and the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper exhibits an explicit function family computable by depth n+1 ReLU networks and proves a matching lower bound w^d = Ω(2^n) via direct construction against external classes (TC^0). The derivation relies on exact infinite-precision computation, with the finite-precision caveat stated explicitly in the abstract; no equations reduce to fitted parameters, self-definitions, or load-bearing self-citations. The central claim is a self-contained proof construction independent of the authors' prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Result rests on standard properties of ReLU networks and circuit classes; no free parameters, new entities, or ad-hoc axioms visible in abstract.

axioms (2)
  • standard math ReLU networks compute piecewise linear functions via linear transformations and max(0,x) activations
    Foundational definition of the model class under study.
  • domain assumption Exact computation requires infinite precision arithmetic
    Explicitly invoked to obtain the separation; truncation moves the function into TC0.

pith-pipeline@v0.9.1-grok · 5766 in / 1310 out tokens · 28534 ms · 2026-06-26T21:25:48.543311+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 1 canonical work pages

  1. [1]

    Mix , title =

    Hesse, William and Allender, Eric and Barrington, David A. Mix , title =. Journal of Computer and System Sciences , volume =. 2002 , note =

  2. [2]

    Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

    Hesse, William , title =. Automata, Languages and Programming, 28th International Colloquium (ICALP 2001) , series =

  3. [3]

    and Tate, Stephen R

    Reif, John H. and Tate, Stephen R. , title =. SIAM Journal on Computing , volume =. 1992 , doi =

  4. [4]

    , title =

    Reif, John H. , title =. Proceedings of the 2nd Annual Conference on Structure in Complexity Theory , pages =

  5. [5]

    6th International Conference on Learning Representations (ICLR) , year =

    Arora, Raman and Basu, Amitabh and Mianjy, Poorya and Mukherjee, Anirbit , title =. 6th International Conference on Learning Representations (ICLR) , year =

  6. [6]

    2017 , eprint =

    Mukherjee, Anirbit and Basu, Amitabh , title =. 2017 , eprint =

  7. [7]

    Advances in Neural Information Processing Systems (NeurIPS) , volume =

    Hertrich, Christoph and Basu, Amitabh and Di Summa, Marco and Skutella, Martin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2021 , note =

  8. [8]

    International Conference on Learning Representations (ICLR) , year =

    Haase, Christian and Hertrich, Christoph and Loho, Georg , title =. International Conference on Learning Representations (ICLR) , year =

  9. [9]

    IEEE Transactions on Information Theory , volume =

    Wang, Shuning and Sun, Xusheng , title =. IEEE Transactions on Information Theory , volume =. 2005 , doi =

  10. [10]

    Conference on Learning Theory (COLT) , pages =

    Neyshabur, Behnam and Tomioka, Ryota and Srebro, Nathan , title =. Conference on Learning Theory (COLT) , pages =

  11. [11]

    SIAM Journal on Computing , volume =

    Maass, Wolfgang , title =. SIAM Journal on Computing , volume =. 1997 , doi =

  12. [12]

    Conference on Learning Theory (COLT) , pages =

    Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , pages =

  13. [13]

    29th Annual Conference on Learning Theory (COLT) , series =

    Telgarsky, Matus , title =. 29th Annual Conference on Learning Theory (COLT) , series =

  14. [14]

    Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

    Sipser, Michael , title =. Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC) , pages =

  15. [15]

    Almost optimal lower bounds for small depth circuits , booktitle =

    H. Almost optimal lower bounds for small depth circuits , booktitle =

  16. [16]

    An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =

    H. An Average-Case Depth Hierarchy Theorem for Boolean Circuits , year =. J. ACM , month = aug, articleno =. doi:10.1145/3095799 , abstract =

  17. [17]

    and Sipser, Michael , title =

    Furst, Merrick and Saxe, James B. and Sipser, Michael , title =. Mathematical Systems Theory , volume =

  18. [18]

    and Williams, Ryan , title =

    Kane, Daniel M. and Williams, Ryan , title =. Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC) , pages =. 2016 , note =

  19. [19]

    , title =

    Impagliazzo, Russell and Paturi, Ramamohan and Saks, Michael E. , title =. SIAM Journal on Computing , volume =. 1997 , doi =

  20. [20]

    Besicovitch, A. S. , title =. Journal of the London Mathematical Society , volume =. 1940 , doi =

  21. [21]

    Mordell, L. J. , title =. Pacific Journal of Mathematics , volume =