arxiv: 2507.16079 · v2 · submitted 2025-07-21 · 💻 cs.LG · cs.AI

A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks

Yuta Nakahara , Manabu Kobayashi , Toshiyasu Matsushima This is my paper

Pith reviewed 2026-05-19 03:22 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords ternary neural networkslinear regionsReLU activationnetwork expressivitylower boundswidth and depth scalingregression neural networks

0 comments p. Extension

The pith

Ternary ReLU regression networks achieve the same polynomial-in-width and exponential-in-depth lower bounds on linear regions as general ReLU networks after modest width doublings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes lower bounds on the number of linear regions in ternary neural networks where weights are restricted to -1, 0, and 1. It proves that these bounds scale polynomially with width and exponentially with depth, matching the behavior of unrestricted networks. A reader would care because this result offers a theoretical reason for the strong practical performance of efficient ternary networks in applications like image recognition.

Core claim

The number of linear regions in ternary ReLU regression neural networks increases polynomially with respect to network width and exponentially with respect to depth. It suffices to first double the width, then either square the width or double the depth of ternary networks with alternating ReLU and identity layers to achieve a lower bound comparable to that of general ReLU regression networks. When ReLU is used in every layer, a similar bound is obtained by further doubling the width.

What carries the argument

Lower-bound constructions based on ternary networks with alternating ReLU and identity layers after initial width doubling.

If this is right

If true, ternary networks maintain high expressivity despite parameter restrictions.
Width doubling serves as a simple fix to recover full scaling behavior.
The result applies directly to regression tasks with ReLU activations.
This scaling holds for both alternating layer patterns and all-ReLU configurations with extra width.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The alternating layer pattern may inspire new architectures for quantized networks.
Similar lower bounds could be derived for other discrete weight sets beyond ternary.
Implementing these constructions in small-scale networks would allow empirical verification of the region counts.

Load-bearing premise

The proof depends on carefully constructed networks that use alternating ReLU and identity layers or ReLU in all layers after width adjustments, rather than arbitrary layer configurations.

What would settle it

A counterexample would be a ternary network where the maximum number of linear regions grows slower than the stated polynomial or exponential rates despite following the suggested width and depth adjustments.

Figures

Figures reproduced from arXiv: 2507.16079 by Manabu Kobayashi, Toshiyasu Matsushima, Yuta Nakahara.

**Figure 2.** Figure 2: ReLU Regression NN with number of linear regions equal to [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: (a): A certain edge of a bounded integer coefficient NN. Here, the maximum weight is [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: (a): A bounded integer coefficient NN where weight [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: A ternary ReLU Regression NN with number of linear regions equal to [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: A graph of hj (x) Acknowledgment We thank Professor Suko at Waseda University for providing the motivation for this research. References Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1, 2016. URL https://arxiv.org/abs/1602.02830. Fengfu Li, Bin Liu, Xiaoxing Wang,… view at source ↗

read the original abstract

With the advancement of deep learning, reducing computational complexity and memory consumption has become a critical challenge, and ternary neural networks (NNs) that restrict parameters to $\{-1, 0, +1\}$ have attracted attention as a promising approach. While ternary NNs demonstrate excellent performance in practical applications such as image recognition and natural language processing, their theoretical understanding remains insufficient. In this paper, we theoretically analyze the expressivity of ternary NNs from the perspective of the number of linear regions. Specifically, we evaluate the number of linear regions of ternary regression NNs with Rectified Linear Unit (ReLU) for activation functions and prove that the number of linear regions increases polynomially with respect to network width and exponentially with respect to depth, similar to standard NNs. Moreover, we show that it suffices to first double the width, then either square the width or double the depth of ternary NNs with alternating ReLU and identity layers to achieve a lower bound on the maximum number of linear regions comparable to that of general ReLU regression NNs. When using ReLU in all the layers, a similar bound is obtained by further doubling the width. This provides a theoretical explanation, in some sense, for the practical success of ternary NNs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first explicit lower bound on linear regions for ternary-weight ReLU networks and shows the scaling can match full-precision cases with width-doubling constructions.

read the letter

The main point is that ternary ReLU regression networks with weights restricted to {-1,0,1} can still achieve a number of linear regions that grows polynomially in width and exponentially in depth. The authors supply explicit constructions to prove the lower bound, first doubling width then alternating ReLU and identity layers, or squaring width or doubling depth after that. For all-ReLU layers they add another width doubling. This is new because prior linear-region results focused on unrestricted real weights, and the ternary case had no matching explicit bound before.

Referee Report

2 major / 3 minor

Summary. The manuscript derives lower bounds on the maximum number of linear regions realizable by ternary-weight ({-1,0,1}) ReLU regression networks. It proves that this number grows polynomially in width and exponentially in depth, and supplies explicit constructions showing that first doubling the width and then either squaring the width or doubling the depth (in networks that alternate ReLU and identity layers) suffices to match the scaling known for unrestricted ReLU networks; an extra width doubling yields an analogous bound when every layer uses ReLU.

Significance. If the constructions and counting arguments hold, the result supplies a concrete theoretical explanation for the practical success of ternary networks: their expressivity, measured by linear-region count, is not materially impaired by the weight restriction once modest width or depth overhead is allowed. The explicit, constructive nature of the argument is a strength; it directly exhibits families of ternary networks that attain the stated scaling.

major comments (2)

[Abstract and concluding theorem] The comparability claim in the abstract and the final theorem statement should be made quantitative: the precise polynomial degree in width and the base of the exponential in depth must be stated explicitly and compared side-by-side with the best-known lower bounds for general ReLU regression networks (e.g., the constructions of Montúfar et al. or subsequent refinements). Without this, it is unclear whether the ternary bound is asymptotically tight or merely order-of-magnitude similar.
[Construction for all-ReLU layers] §4 (or the section containing the all-ReLU construction): the additional width-doubling step for networks that place ReLU in every layer must be shown to preserve the ternary weight constraint while still producing the claimed multiplicative growth in region count; the current sketch leaves open whether the extra neurons can be assigned weights in {-1,0,1} without collapsing regions.

minor comments (3)

[Preliminaries / Construction] Clarify the precise definition of an 'identity layer' in the alternating construction and confirm that its weights remain restricted to {-1,0,1}.
[Main theorem statement] The dependence of the bound on input dimension should be stated explicitly; linear-region counts for regression networks are typically sensitive to this parameter.
[Results] A short table or corollary summarizing the exact scaling (width polynomial degree, depth exponential base) for each construction variant would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive suggestions for minor revision. The comments correctly identify opportunities to strengthen the quantitative presentation of our bounds and the explicitness of one construction. We address both points below and will incorporate the necessary clarifications.

read point-by-point responses

Referee: [Abstract and concluding theorem] The comparability claim in the abstract and the final theorem statement should be made quantitative: the precise polynomial degree in width and the base of the exponential in depth must be stated explicitly and compared side-by-side with the best-known lower bounds for general ReLU regression networks (e.g., the constructions of Montúfar et al. or subsequent refinements). Without this, it is unclear whether the ternary bound is asymptotically tight or merely order-of-magnitude similar.

Authors: We agree that an explicit quantitative comparison will improve clarity. In the revision we will restate the abstract and the main theorem to specify the exact polynomial degree in width and the exponential base in depth achieved by the ternary constructions. We will add a direct side-by-side comparison (in text or a short table) with the lower bounds of Montúfar et al. and later refinements, showing that the asymptotic scaling is identical once the modest width overhead is accounted for. revision: yes
Referee: [Construction for all-ReLU layers] §4 (or the section containing the all-ReLU construction): the additional width-doubling step for networks that place ReLU in every layer must be shown to preserve the ternary weight constraint while still producing the claimed multiplicative growth in region count; the current sketch leaves open whether the extra neurons can be assigned weights in {-1,0,1} without collapsing regions.

Authors: We will expand the relevant section to give an explicit weight assignment in {-1,0,1} for the additional neurons introduced by the extra width doubling. The revision will include a short lemma verifying that these weights realize the desired identity-like behavior and that the linear-region count multiplies exactly as claimed, with no unintended collapse of regions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; lower bound via explicit constructions

full rationale

The paper establishes a lower bound on the maximum number of linear regions by providing explicit network constructions (alternating ReLU/identity layers after width doubling, or all-ReLU after further doubling) that achieve polynomial-in-width and exponential-in-depth scaling under the ternary weight constraint. This is a direct combinatorial argument witnessed by the constructed families, with no reduction of the claimed bound to a fitted parameter, self-referential definition, or load-bearing self-citation. The derivation chain relies on counting regions in the constructed architectures rather than assuming the result to prove it, making the central claim self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The proof rests on standard combinatorial counting of hyperplane arrangements induced by ReLU activations and on the assumption that weights are restricted to the discrete set {-1,0,+1}. No free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The number of linear regions can be bounded from below by counting the distinct activation patterns realizable by networks whose weights lie in {-1,0,+1}.
This counting assumption is invoked when the authors state that the number of regions grows polynomially in width and exponentially in depth.

pith-pipeline@v0.9.0 · 5760 in / 1401 out tokens · 28241 ms · 2026-05-19T03:22:09.740096+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 4 internal anchors

[1]

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1, 2016. URL https://arxiv.org/abs/1602.02830

work page internal anchor Pith review Pith/arXiv arXiv 2016
[2]

Ternary weight networks, 2022

Fengfu Li, Bin Liu, Xiaoxing Wang, Bo Zhang, and Junchi Yan. Ternary weight networks, 2022. URL https://arxiv.org/abs/1605.04711

work page arXiv 2022
[3]

Xnor-net: Imagenet classification using binary convolutional neural networks

Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision -- ECCV 2016, pages 525--542, Cham, 2016. Springer International Publishing. ISBN 978-3-319-46493-0

work page 2016
[4]

Reactnet: Towards precise binary neural network with generalized activation functions

Zechun Liu, Zhiqiang Shen, Marios Savvides, and Kwang-Ting Cheng. Reactnet: Towards precise binary neural network with generalized activation functions. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision -- ECCV 2020, pages 143--159, Cham, 2020. Springer International Publishing. ISBN 978-3-030-58568-6

work page 2020
[5]

B inary BERT : Pushing the limit of BERT quantization

Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jin Jin, Xin Jiang, Qun Liu, Michael Lyu, and Irwin King. B inary BERT : Pushing the limit of BERT quantization. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference...

work page doi:10.18653/v1/2021.acl-long.334 2021
[6]

BitNet: Scaling 1-bit Transformers for Large Language Models

Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, and Furu Wei. Bitnet: Scaling 1-bit transformers for large language models, 2023. URL https://arxiv.org/abs/2310.11453

work page internal anchor Pith review Pith/arXiv arXiv 2023
[7]

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, and Furu Wei. The era of 1-bit llms: All large language models are in 1.58 bits, 2024. URL https://arxiv.org/abs/2402.17764

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

Binary deep neural networks for speech recognition

Xu Xiang, Yanmin Qian, and Kai Yu. Binary deep neural networks for speech recognition. In Interspeech 2017, pages 533--537, 2017. doi:10.21437/Interspeech.2017-1343

work page doi:10.21437/interspeech.2017-1343 2017
[9]

On the number of linear regions of deep neural networks

Guido Mont\' u far, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper_files/paper/...

work page 2014
[10]

On the number of response regions of deep feed forward networks with piece-wise linear activations

Razvan Pascanu, Guido Montufar, and Yoshua Bengio. On the number of response regions of deep feed forward networks with piece-wise linear activations, 2014. URL https://arxiv.org/abs/1312.6098

work page internal anchor Pith review Pith/arXiv arXiv 2014
[11]

Bounding and counting linear regions of deep neural networks

Thiago Serra, Christian Tjandraatmadja, and Srikumar Ramalingam. Bounding and counting linear regions of deep neural networks. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4558--4566. PMLR, 10--15 Jul 2018. URL https://proceedi...

work page 2018
[12]

Complexity of linear regions in deep networks

Boris Hanin and David Rolnick. Complexity of linear regions in deep networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2596--2604. PMLR, 09--15 Jun 2019. URL https://proceedings.mlr.press/v97/hanin19a.html

work page 2019
[13]

Theoretical analysis of the advantage of deepening neural networks

Yasushi Esaki, Yuta Nakahara, and Toshiyasu Matsushima. Theoretical analysis of the advantage of deepening neural networks. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 479--484, 2020. doi:10.1109/ICMLA51294.2020.00081

work page doi:10.1109/icmla51294.2020.00081 2020