A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks
Pith reviewed 2026-05-19 03:22 UTC · model grok-4.3
The pith
Ternary ReLU regression networks achieve the same polynomial-in-width and exponential-in-depth lower bounds on linear regions as general ReLU networks after modest width doublings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The number of linear regions in ternary ReLU regression neural networks increases polynomially with respect to network width and exponentially with respect to depth. It suffices to first double the width, then either square the width or double the depth of ternary networks with alternating ReLU and identity layers to achieve a lower bound comparable to that of general ReLU regression networks. When ReLU is used in every layer, a similar bound is obtained by further doubling the width.
What carries the argument
Lower-bound constructions based on ternary networks with alternating ReLU and identity layers after initial width doubling.
If this is right
- If true, ternary networks maintain high expressivity despite parameter restrictions.
- Width doubling serves as a simple fix to recover full scaling behavior.
- The result applies directly to regression tasks with ReLU activations.
- This scaling holds for both alternating layer patterns and all-ReLU configurations with extra width.
Where Pith is reading between the lines
- The alternating layer pattern may inspire new architectures for quantized networks.
- Similar lower bounds could be derived for other discrete weight sets beyond ternary.
- Implementing these constructions in small-scale networks would allow empirical verification of the region counts.
Load-bearing premise
The proof depends on carefully constructed networks that use alternating ReLU and identity layers or ReLU in all layers after width adjustments, rather than arbitrary layer configurations.
What would settle it
A counterexample would be a ternary network where the maximum number of linear regions grows slower than the stated polynomial or exponential rates despite following the suggested width and depth adjustments.
Figures
read the original abstract
With the advancement of deep learning, reducing computational complexity and memory consumption has become a critical challenge, and ternary neural networks (NNs) that restrict parameters to $\{-1, 0, +1\}$ have attracted attention as a promising approach. While ternary NNs demonstrate excellent performance in practical applications such as image recognition and natural language processing, their theoretical understanding remains insufficient. In this paper, we theoretically analyze the expressivity of ternary NNs from the perspective of the number of linear regions. Specifically, we evaluate the number of linear regions of ternary regression NNs with Rectified Linear Unit (ReLU) for activation functions and prove that the number of linear regions increases polynomially with respect to network width and exponentially with respect to depth, similar to standard NNs. Moreover, we show that it suffices to first double the width, then either square the width or double the depth of ternary NNs with alternating ReLU and identity layers to achieve a lower bound on the maximum number of linear regions comparable to that of general ReLU regression NNs. When using ReLU in all the layers, a similar bound is obtained by further doubling the width. This provides a theoretical explanation, in some sense, for the practical success of ternary NNs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives lower bounds on the maximum number of linear regions realizable by ternary-weight ({-1,0,1}) ReLU regression networks. It proves that this number grows polynomially in width and exponentially in depth, and supplies explicit constructions showing that first doubling the width and then either squaring the width or doubling the depth (in networks that alternate ReLU and identity layers) suffices to match the scaling known for unrestricted ReLU networks; an extra width doubling yields an analogous bound when every layer uses ReLU.
Significance. If the constructions and counting arguments hold, the result supplies a concrete theoretical explanation for the practical success of ternary networks: their expressivity, measured by linear-region count, is not materially impaired by the weight restriction once modest width or depth overhead is allowed. The explicit, constructive nature of the argument is a strength; it directly exhibits families of ternary networks that attain the stated scaling.
major comments (2)
- [Abstract and concluding theorem] The comparability claim in the abstract and the final theorem statement should be made quantitative: the precise polynomial degree in width and the base of the exponential in depth must be stated explicitly and compared side-by-side with the best-known lower bounds for general ReLU regression networks (e.g., the constructions of Montúfar et al. or subsequent refinements). Without this, it is unclear whether the ternary bound is asymptotically tight or merely order-of-magnitude similar.
- [Construction for all-ReLU layers] §4 (or the section containing the all-ReLU construction): the additional width-doubling step for networks that place ReLU in every layer must be shown to preserve the ternary weight constraint while still producing the claimed multiplicative growth in region count; the current sketch leaves open whether the extra neurons can be assigned weights in {-1,0,1} without collapsing regions.
minor comments (3)
- [Preliminaries / Construction] Clarify the precise definition of an 'identity layer' in the alternating construction and confirm that its weights remain restricted to {-1,0,1}.
- [Main theorem statement] The dependence of the bound on input dimension should be stated explicitly; linear-region counts for regression networks are typically sensitive to this parameter.
- [Results] A short table or corollary summarizing the exact scaling (width polynomial degree, depth exponential base) for each construction variant would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive suggestions for minor revision. The comments correctly identify opportunities to strengthen the quantitative presentation of our bounds and the explicitness of one construction. We address both points below and will incorporate the necessary clarifications.
read point-by-point responses
-
Referee: [Abstract and concluding theorem] The comparability claim in the abstract and the final theorem statement should be made quantitative: the precise polynomial degree in width and the base of the exponential in depth must be stated explicitly and compared side-by-side with the best-known lower bounds for general ReLU regression networks (e.g., the constructions of Montúfar et al. or subsequent refinements). Without this, it is unclear whether the ternary bound is asymptotically tight or merely order-of-magnitude similar.
Authors: We agree that an explicit quantitative comparison will improve clarity. In the revision we will restate the abstract and the main theorem to specify the exact polynomial degree in width and the exponential base in depth achieved by the ternary constructions. We will add a direct side-by-side comparison (in text or a short table) with the lower bounds of Montúfar et al. and later refinements, showing that the asymptotic scaling is identical once the modest width overhead is accounted for. revision: yes
-
Referee: [Construction for all-ReLU layers] §4 (or the section containing the all-ReLU construction): the additional width-doubling step for networks that place ReLU in every layer must be shown to preserve the ternary weight constraint while still producing the claimed multiplicative growth in region count; the current sketch leaves open whether the extra neurons can be assigned weights in {-1,0,1} without collapsing regions.
Authors: We will expand the relevant section to give an explicit weight assignment in {-1,0,1} for the additional neurons introduced by the extra width doubling. The revision will include a short lemma verifying that these weights realize the desired identity-like behavior and that the linear-region count multiplies exactly as claimed, with no unintended collapse of regions. revision: yes
Circularity Check
No significant circularity; lower bound via explicit constructions
full rationale
The paper establishes a lower bound on the maximum number of linear regions by providing explicit network constructions (alternating ReLU/identity layers after width doubling, or all-ReLU after further doubling) that achieve polynomial-in-width and exponential-in-depth scaling under the ternary weight constraint. This is a direct combinatorial argument witnessed by the constructed families, with no reduction of the claimed bound to a fitted parameter, self-referential definition, or load-bearing self-citation. The derivation chain relies on counting regions in the constructed architectures rather than assuming the result to prove it, making the central claim self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The number of linear regions can be bounded from below by counting the distinct activation patterns realizable by networks whose weights lie in {-1,0,+1}.
Reference graph
Works this paper leans on
-
[1]
Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1, 2016. URL https://arxiv.org/abs/1602.02830
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[2]
Fengfu Li, Bin Liu, Xiaoxing Wang, Bo Zhang, and Junchi Yan. Ternary weight networks, 2022. URL https://arxiv.org/abs/1605.04711
-
[3]
Xnor-net: Imagenet classification using binary convolutional neural networks
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision -- ECCV 2016, pages 525--542, Cham, 2016. Springer International Publishing. ISBN 978-3-319-46493-0
work page 2016
-
[4]
Reactnet: Towards precise binary neural network with generalized activation functions
Zechun Liu, Zhiqiang Shen, Marios Savvides, and Kwang-Ting Cheng. Reactnet: Towards precise binary neural network with generalized activation functions. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision -- ECCV 2020, pages 143--159, Cham, 2020. Springer International Publishing. ISBN 978-3-030-58568-6
work page 2020
-
[5]
B inary BERT : Pushing the limit of BERT quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jin Jin, Xin Jiang, Qun Liu, Michael Lyu, and Irwin King. B inary BERT : Pushing the limit of BERT quantization. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference...
-
[6]
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, and Furu Wei. Bitnet: Scaling 1-bit transformers for large language models, 2023. URL https://arxiv.org/abs/2310.11453
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[7]
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, and Furu Wei. The era of 1-bit llms: All large language models are in 1.58 bits, 2024. URL https://arxiv.org/abs/2402.17764
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[8]
Binary deep neural networks for speech recognition
Xu Xiang, Yanmin Qian, and Kai Yu. Binary deep neural networks for speech recognition. In Interspeech 2017, pages 533--537, 2017. doi:10.21437/Interspeech.2017-1343
-
[9]
On the number of linear regions of deep neural networks
Guido Mont\' u far, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper_files/paper/...
work page 2014
-
[10]
On the number of response regions of deep feed forward networks with piece-wise linear activations
Razvan Pascanu, Guido Montufar, and Yoshua Bengio. On the number of response regions of deep feed forward networks with piece-wise linear activations, 2014. URL https://arxiv.org/abs/1312.6098
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[11]
Bounding and counting linear regions of deep neural networks
Thiago Serra, Christian Tjandraatmadja, and Srikumar Ramalingam. Bounding and counting linear regions of deep neural networks. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4558--4566. PMLR, 10--15 Jul 2018. URL https://proceedi...
work page 2018
-
[12]
Complexity of linear regions in deep networks
Boris Hanin and David Rolnick. Complexity of linear regions in deep networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2596--2604. PMLR, 09--15 Jun 2019. URL https://proceedings.mlr.press/v97/hanin19a.html
work page 2019
-
[13]
Theoretical analysis of the advantage of deepening neural networks
Yasushi Esaki, Yuta Nakahara, and Toshiyasu Matsushima. Theoretical analysis of the advantage of deepening neural networks. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 479--484, 2020. doi:10.1109/ICMLA51294.2020.00081
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.