arxiv: 2605.04946 · v2 · submitted 2026-05-06 · 💻 cs.LG · stat.ML

Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks

Xuan Qi, Yi Wei, Fanqi Yu, Furao Shen, Vittorio Murino, Cigdem Beyan

Pith reviewed 2026-05-13 06:44 UTC · model grok-4.3

keywords: batch normalization, piecewise-affine networks, switching hyperplanes, affine regions, local partition, ReLU, training-time normalization

The pith

Batch normalization during training increases expected local partition refinement in piecewise-affine networks by recentering switching hyperplanes on the batch centroid.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how batch normalization affects the actual function computed by continuous piecewise-affine networks, rather than just training dynamics. It models the division of input space into affine regions separated by switching hyperplanes and shows that BN ties the positions of these hyperplanes to the centroid and statistics of each training batch. Under sufficient conditions on batch statistics and layer maps, this leads to more refined partitions locally, with the refinement carrying forward through layers when prior maps act as affine embeddings. A reader would care because it supplies a geometric explanation for why BN changes the expressivity and behavior of networks at the function level.

Core claim

Conditioned on a mini-batch, BN defines for each neuron a reference hyperplane through the batch centroid, with breakpoint-switching hyperplanes as parallel translates whose offsets are in batch-standardized coordinates and independent of the raw bias. This yields an exact criterion for hyperplane intersection with local windows and a local region-density functional. Under explicit sufficient conditions, BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and the mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding.
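
The bias independence stated above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a single neuron with hypothetical weights `w`, a synthetic two-dimensional batch, and training-mode standardization without the learned affine part. The helper names `dot` and `standardized_preacts` are illustrative.

```python
import math
import random

def dot(w, u):
    return sum(wi * ui for wi, ui in zip(w, u))

def standardized_preacts(w, b, batch, eps=1e-5):
    """Training-mode BN without the learned affine part, for one neuron."""
    z = [dot(w, u) + b for u in batch]
    mean = sum(z) / len(z)
    var = sum((zi - mean) ** 2 for zi in z) / len(z)  # biased batch variance
    scale = math.sqrt(var + eps)
    return [(zi - mean) / scale for zi in z], mean, scale

random.seed(0)
batch = [[random.gauss(0.0, 1.0), random.gauss(2.0, 0.5)] for _ in range(32)]
w = [0.7, -1.3]  # hypothetical neuron weights (assumption)

# Two very different raw biases give the same standardized pre-activations.
z_hat_small_bias, _, _ = standardized_preacts(w, 0.0, batch)
z_hat_large_bias, mean, scale = standardized_preacts(w, 5.0, batch)
max_gap = max(abs(p - q) for p, q in zip(z_hat_small_bias, z_hat_large_bias))

# The batch centroid has standardized pre-activation 0, so it lies on the
# reference hyperplane {u : <w, u> = <w, u_bar>}.
centroid = [sum(u[k] for u in batch) / len(batch) for k in range(len(w))]
centroid_value = (dot(w, centroid) + 5.0 - mean) / scale

print(f"max gap between bias settings: {max_gap:.2e}")
print(f"standardized value at centroid: {centroid_value:.2e}")
```

Both printed quantities should be zero up to floating-point rounding: the raw bias cancels when the batch mean is subtracted, and the centroid maps to the standardized origin.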

What carries the argument

Batch-conditional reference hyperplanes through the batch centroid that determine offsets for switching hyperplanes independent of bias, enabling a local region-density functional based on affine-region counts.

Load-bearing premise

The network is continuous piecewise-affine and the upstream representation maps satisfy the affine-embedding condition along with the stated conditions on batch statistics.
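
One way to read the affine-embedding condition is as an injectivity test on the map restricted to a parent region. The sketch below assumes a plain single ReLU layer with a fixed on/off pattern, where the map is u -> D(Wu + b) with D the diagonal 0/1 pattern matrix; it is injective exactly when the active rows of W have full column rank. The helpers `matrix_rank` and `is_affine_embedding` are hypothetical names, and the paper's own condition may be stated differently.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting (small dense matrices)."""
    m = [list(map(float, row)) for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[rank])]
        rank += 1
    return rank

def is_affine_embedding(weights, pattern):
    """True if u -> D (W u + b) is injective for the given on/off pattern."""
    active_rows = [w for w, on in zip(weights, pattern) if on]
    input_dim = len(weights[0])
    return bool(active_rows) and matrix_rank(active_rows) == input_dim

# Hypothetical layer: three neurons on a two-dimensional input (assumption).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
two_active = is_affine_embedding(W, (True, True, False))
one_active = is_affine_embedding(W, (True, False, False))
```

Here `two_active` is true because two independent rows span the input space, while `one_active` is false because a single active neuron collapses one input direction. With NumPy available, `numpy.linalg.matrix_rank` would replace the hand-rolled helper.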

What would settle it

A direct count of affine regions intersecting a local window in a trained ReLU network showing no increase in refinement when BN is used compared to the no-BN case under the same batch conditions.
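
A rough version of such a count can be sketched by sampling. The code below is an illustration under stated assumptions, not the exact counting procedure the paper refers to: it treats a single ReLU layer, identifies regions with distinct activation patterns, and samples a local $\ell_\infty$ window uniformly, so the result is only a lower bound on the true count. The function names and the two-neuron layer are hypothetical.

```python
import random

def activation_pattern(weights, biases, u):
    """On/off pattern of a single ReLU layer at input u."""
    return tuple(
        sum(wi * ui for wi, ui in zip(w, u)) + b > 0.0
        for w, b in zip(weights, biases)
    )

def sampled_region_count(weights, biases, center, radius, n_samples=5000, seed=0):
    """Count distinct activation patterns seen in an l_inf window (a lower bound)."""
    rng = random.Random(seed)
    patterns = set()
    for _ in range(n_samples):
        u = [c + rng.uniform(-radius, radius) for c in center]
        patterns.add(activation_pattern(weights, biases, u))
    return len(patterns)

weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical two-neuron layer (assumption)
center, radius = [0.0, 0.0], 1.0

# Hyperplanes x = 0 and y = 0 pass through the window center.
count_centered = sampled_region_count(weights, [0.0, 0.0], center, radius)
# Hyperplanes x = 5 and y = 5 miss the window entirely.
count_far = sampled_region_count(weights, [-5.0, -5.0], center, radius)
```

In this toy setup the centered hyperplanes split the window into four pieces and the distant ones leave it as a single piece, which is the kind of contrast a BN versus no-BN comparison would have to show or fail to show.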

Figures

Figures reproduced from arXiv: 2605.04946 by Cigdem Beyan, Fanqi Yu, Furao Shen, Vittorio Murino, Xuan Qi, Yi Wei.

**Figure 1.** The three two-dimensional datasets used in the local-region experiments: Gaus…

**Figure 2.** Training dynamics of exact local region counts in single-layer networks. We plot …

**Figure 3.** Representative single-layer partition visualization on a two-dimensional task. The …

**Figure 4.** Bias-decoupling diagnostic under fixed reference batches. We plot layerwise Pear…

**Figure 5.** Explicit bias-shift invariance under fixed reference batches. After applying …

**Figure 6.** Training-time batch-conditional hyperplanes under a fixed reference batch. The …

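**Figure 7.** Exact $\ell_\infty$ window-cut criterion under fixed reference batches. The normalized-offset test matches explicit hyperplane–box intersection checks on both datasets.

… instantaneous mini-batch statistics. We compute the inference-mode normalized offsets

$$\Delta_{\ell,j} = \frac{\left|w_{\ell,j}^{\top}\bar{u}_{\ell} + b_{\ell,j}\right|}{\|w_{\ell,j}\|_1}, \qquad \Delta^{\mathrm{BN,run}}_{\ell,j} = \frac{\left|w_{\ell,j}^{\top}\bar{u}_{\ell} + b_{\ell,j} - \bar{\mu}_{\ell,j} + \alpha_{\ell,j}\sqrt{\bar{v}_{\ell,j} + \varepsilon}\right|}{\|w_{\ell,j}\|_1}, \qquad \alpha_{\ell,j} := \beta_{\ell,j}/\gamma_{\ell,j}. \tag{67}$$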

**Figure 8.** Training-conditional offset diagnostics across trials and checkpoints.

**Figure 9.** Centroid-to-hyperplane Euclidean distance distributions in representation space.

**Figure 10.** Inference-mode layerwise window-cut rates evaluated at radii selected by a fixed …

**Figure 11.** Representative input-space partitions for deep MLPs at epoch 100 across three …

**Figure 12.** Assumption check for the multilayer construction inside sampled parent regions.

**Figure 13.** Empirical CDFs of normalized offsets on three real datasets. In every layer and …

**Figure 14.** Affine-region partitions on matched two-dimensional slices for BN and non-BN …

**Figure 15.** Decision-boundary evolution under matched BN and non-BN training on Two …

**Figure 16.** Validation accuracy over training epochs on Two Moons and Gaussian Quantiles …

Original abstract

Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.
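
The window-cut criterion mentioned in the abstract has a standard closed form that can be sketched in a few lines. Assuming a hyperplane {u : <w, u> = c} and an $\ell_\infty$ window of radius r around a point x, the hyperplane meets the window exactly when |<w, x> - c| <= r * ||w||_1, since the $\ell_1$ norm is dual to $\ell_\infty$. The sketch below compares that normalized-offset test against an explicit corner check on random instances. The function names and the random setup are illustrative, and the paper's own notation for the offset may differ.

```python
import itertools
import random

def cuts_window_offset(w, c, x, r):
    """Normalized-offset test: |<w, x> - c| <= r * ||w||_1."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - c
    l1_norm = sum(abs(wi) for wi in w)
    return abs(residual) <= r * l1_norm

def cuts_window_corners(w, c, x, r):
    """Explicit check: <w, u> - c changes sign (or vanishes) over the box corners."""
    values = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(x)):
        corner = [xi + s * r for xi, s in zip(x, signs)]
        values.append(sum(wi * ui for wi, ui in zip(w, corner)) - c)
    return min(values) <= 0.0 <= max(values)

random.seed(1)
trials = 2000
agree = 0
for _ in range(trials):
    w = [random.gauss(0.0, 1.0) for _ in range(3)]
    c = random.gauss(0.0, 1.0)
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    r = random.uniform(0.05, 1.0)
    agree += cuts_window_offset(w, c, x, r) == cuts_window_corners(w, c, x, r)

print(f"tests agree on {agree} of {trials} random instances")
```

The corner check costs 2^d evaluations in dimension d, while the offset test is linear in d, which is one reason a closed-form criterion is useful for layerwise diagnostics.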

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a clean geometric account of how training-time BN recenters switching hyperplanes around the batch centroid in CPA networks and derives an exact intersection criterion plus a local density functional, but the depth-transfer claim hinges on an unquantified affine-embedding condition.

This paper shows that, conditioned on a mini-batch, BN turns each neuron's reference hyperplane into one passing through the batch centroid, with breakpoint hyperplanes as parallel translates whose offsets depend only on standardized coordinates. From that it extracts an exact criterion for when a hyperplane crosses a local ℓ∞ window and builds a region-density functional that counts affine pieces. Under the stated sufficient conditions this produces higher expected local refinement for ReLU and other piecewise-affine layers, and the effect is claimed to propagate locally through depth when the upstream map is an affine embedding on the parent region.

That geometric picture is new relative to the literature the abstract cites and gives a function-level view of BN as batch-conditional recentering rather than just an optimization trick. The single-layer derivation looks direct and avoids circularity; it rests on straightforward manipulation of hyperplane offsets once standardization is applied.

The main soft spot is that the multi-layer transfer still requires the upstream representation to act as an affine embedding on the relevant parent regions, yet the paper supplies no bounds, sampling checks, or counterexamples showing how often this holds once earlier BN layers are present. If the embedding fails on a positive-measure set, the depth claim reduces to the single-layer case. The sufficient conditions on batch statistics are explicit, which is good, but without visible derivation steps or verification that they are non-vacuous the central claim stays plausible rather than demonstrated.

This is for people working on the geometry of deep networks and the functional side of normalization. It is worth sending to a serious referee because the new criterion and functional are concrete enough to be checked or extended, even if the depth part needs more evidence.

Referee Report

2 major / 2 minor

Summary. The paper analyzes training-time batch normalization in continuous piecewise-affine (CPA) networks via the geometry of switching hyperplanes and induced affine-region partitions. Conditioned on a mini-batch, BN is shown to define reference hyperplanes through the batch centroid with breakpoint offsets expressed in batch-standardized coordinates and independent of raw bias; this yields a criterion for hyperplane intersection with local windows and a local region-density functional. Under explicit sufficient conditions the analysis claims BN increases expected local partition refinement for ReLU and general CPA networks, with the mechanism transferring locally through depth inside parent affine regions where the upstream representation map is an affine embedding. The results are positioned as a function-level geometric account of BN as a batch-conditional recentering mechanism.

Significance. If the central claims hold, the work supplies a precise geometric mechanism linking BN to increased local expressivity through partition refinement, distinct from its usual optimization or regularization interpretations. The explicit sufficient conditions and the depth-transfer result under affine-embedding assumptions could help explain empirical depth-dependent effects of BN and guide architecture or initialization choices. The absence of machine-checked proofs or reproducible code is noted, but the direct manipulation of hyperplane offsets in standardized coordinates is a clear strength.

major comments (2)

[Abstract] Abstract and the derivation of the sufficient conditions: the claim that BN increases expected local partition refinement is stated to hold under explicit sufficient conditions, yet no derivation steps, error bounds, or verification that the conditions are non-vacuous appear in the provided text. This leaves the central quantitative claim unsupported by visible evidence and requires a self-contained proof or counterexample check before the result can be accepted.
[Abstract] Depth-transfer claim (stated in abstract): the local transfer of refinement through depth is conditioned on the upstream representation map being an affine embedding on the relevant parent region. No prevalence bounds, sampling statistics, or robustness checks are supplied showing how often this injectivity-plus-affine-structure condition holds once BN is inserted at earlier layers; violation on a positive-measure set would reduce the multi-layer claim to the single-layer case.

minor comments (2)

Notation for the local region-density functional and the exact affine-region counts should be introduced with an explicit equation number and a small illustrative diagram showing a 2-D example of hyperplane offsets before and after standardization.
The manuscript should clarify whether the sufficient conditions on batch statistics are assumed to hold with high probability under standard data assumptions or are treated as deterministic given the mini-batch.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate the revisions we will incorporate.

Referee: [Abstract] Abstract and the derivation of the sufficient conditions: the claim that BN increases expected local partition refinement is stated to hold under explicit sufficient conditions, yet no derivation steps, error bounds, or verification that the conditions are non-vacuous appear in the provided text. This leaves the central quantitative claim unsupported by visible evidence and requires a self-contained proof or counterexample check before the result can be accepted.

Authors: The derivation begins from the batch-conditional reference hyperplane through the centroid and proceeds by expressing switching offsets in standardized coordinates, yielding an exact intersection criterion with local windows. This leads to the local region-density functional whose expectation is compared with and without BN under the stated conditions on batch moments and hyperplane geometry. The result is deterministic (hence exact) given those conditions, so error bounds are not required. We will expand the presentation with an explicit step-by-step derivation subsection and a low-dimensional verification example confirming the conditions hold with positive probability for standard batch statistics. This addresses the request for self-contained evidence without altering the claims. revision: yes
Referee: [Abstract] Depth-transfer claim (stated in abstract): the local transfer of refinement through depth is conditioned on the upstream representation map being an affine embedding on the relevant parent region. No prevalence bounds, sampling statistics, or robustness checks are supplied showing how often this injectivity-plus-affine-structure condition holds once BN is inserted at earlier layers; violation on a positive-measure set would reduce the multi-layer claim to the single-layer case.

Authors: The transfer result is deliberately stated as local and conditional on the upstream map being an affine embedding within each parent region; this is the minimal assumption needed to preserve the piecewise-affine structure and region-counting under composition. We do not supply prevalence statistics because the manuscript emphasizes the geometric mechanism rather than its measure-theoretic frequency. In revision we will add a short discussion noting that, for generic weights in ReLU networks, the non-embedding set has measure zero, together with a brief numerical illustration on small networks. This strengthens the presentation while preserving the conditional character of the claim. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation uses direct hyperplane geometry under stated sufficient conditions

The paper derives its geometric claims by explicit manipulation of switching hyperplanes in batch-standardized coordinates, defining reference hyperplanes through the batch centroid and expressing offsets independently of raw bias. The increase in expected local partition refinement and its depth-transfer are shown only under explicit sufficient conditions on batch statistics and the upstream map being an affine embedding; these conditions are stated as assumptions rather than derived from the result itself. No fitted parameters are renamed as predictions, no self-citations are load-bearing for the central claims, and no ansatz or uniqueness theorem is smuggled in. The derivation is self-contained against the stated assumptions and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Analysis rests on the assumption that the network is continuous piecewise-affine and that batch statistics induce well-defined hyperplane offsets; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption The network realizes a continuous piecewise-affine function.
Stated in the opening sentence of the abstract as the setting for the geometric analysis.
domain assumption Mini-batch statistics are well-defined and the batch centroid exists.
Implicit in the claim that BN defines a reference hyperplane through the batch centroid.

pith-pipeline@v0.9.0 · 5501 in / 1396 out tokens · 67392 ms · 2026-05-13T06:44:37.109509+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

Under explicit sufficient conditions, BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Lemma 3 (Exact breakpoint-switching hyperplane under standard BN) ... $H^{\mathrm{BN}}_a = \{u : \langle w_j, u\rangle = \langle w_j, \bar{u}\rangle + \delta_a \sqrt{v_j + \varepsilon}\}$
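
The form of the hyperplane quoted from Lemma 3 can be recovered in a few lines. The following is a sketch under assumptions that the excerpt does not spell out: the batch is $u_1, \dots, u_n$ with centroid $\bar{u}$, $b_j$ is the raw bias, $v_j$ is the biased batch variance of the pre-activation, and $\delta_a$ denotes the position of breakpoint $a$ in standardized coordinates. The symbols $n$, $z_j$, $\mu_j$, and $\hat{z}_j$ are introduced here for illustration.

```latex
\begin{align*}
z_j(u) &= \langle w_j, u\rangle + b_j,
\qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} z_j(u_i) = \langle w_j, \bar{u}\rangle + b_j, \\
v_j &= \frac{1}{n}\sum_{i=1}^{n} \bigl(z_j(u_i) - \mu_j\bigr)^2
     = \frac{1}{n}\sum_{i=1}^{n} \langle w_j, u_i - \bar{u}\rangle^2, \\
\hat{z}_j(u) &= \frac{z_j(u) - \mu_j}{\sqrt{v_j + \varepsilon}}
     = \frac{\langle w_j, u - \bar{u}\rangle}{\sqrt{v_j + \varepsilon}}, \\
\hat{z}_j(u) = \delta_a
 &\iff \langle w_j, u\rangle = \langle w_j, \bar{u}\rangle + \delta_a\sqrt{v_j + \varepsilon}.
\end{align*}
```

The bias $b_j$ cancels in both $\hat{z}_j$ and $v_j$, which is why the offset depends only on batch-standardized quantities. As a further assumption, for a ReLU applied after a learned affine map $\gamma_j \hat{z}_j + \beta_j$ with $\gamma_j \neq 0$, the single breakpoint would sit at $\delta = -\beta_j/\gamma_j$.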

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
