Fast generalization error bound of deep learning without scale invariance of activation functions
Pith reviewed 2026-05-24 16:13 UTC · model grok-4.3
The pith
Deep neural networks achieve fast generalization bounds without scale-invariant activation functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper derives a tight generalization error bound without assuming scale invariance of the activation functions. The bound is essentially the same as the one obtained under the scale invariance assumption, which shows that invariance is not essential for the fast rate of convergence in this analysis framework.
What carries the argument
The generalization error analysis framework that produces tight bounds, applied directly to deep networks whose activations lack scale invariance.
If this is right
- The fast convergence rate applies to networks that use common non-invariant activations such as sigmoid, tanh, and ELU.
- The analysis framework extends to deep learning models with a wider range of activation functions.
- Scale invariance is not required to reach the improved rate in this setting.
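To make the distinction in the list above concrete, the sketch below numerically checks positive homogeneity, assuming that "scale invariance" means σ(ax) = aσ(x) for every a > 0. That definition is an assumption here, since this page does not state it. The helper name `is_scale_invariant`, the test points, and the tolerance are illustrative choices, not anything taken from the paper.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elu(x, alpha=1.0):
    # ELU with the usual alpha parameter; alpha=1.0 is an illustrative default.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def is_scale_invariant(f, xs, scales, tol=1e-9):
    """Check f(a * x) == a * f(x) on a finite grid (hypothetical helper).

    Passing this check on a grid is only evidence, not a proof; failing
    it at a single point is a genuine counterexample.
    """
    return all(abs(f(a * x) - a * f(x)) <= tol for x in xs for a in scales)

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]   # sample inputs
scales = [0.5, 2.0, 10.0]          # positive scaling factors a

activations = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh, "elu": elu}
results = {name: is_scale_invariant(f, xs, scales) for name, f in activations.items()}

for name, ok in results.items():
    print(f"{name}: scale invariant on grid = {ok}")
```

On this grid only ReLU passes. The sigmoid already fails at x = 0, where σ(0) = 0.5 but aσ(0) = 0.5a.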
Where Pith is reading between the lines
- The finding opens the possibility that other restrictive assumptions in generalization analyses could also be relaxed while preserving the fast rate.
- Empirical checks of the predicted rate on networks with non-invariant activations would provide a direct test of the bound.
- The result suggests that activation choice may not be a limiting factor for theoretical fast rates in this framework.
Load-bearing premise
The existing framework for analyzing generalization error remains valid and can be applied without change to activation functions that lack scale invariance.
What would settle it
A demonstration, empirical or theoretical, that deep networks with sigmoid or hyperbolic tangent activations converge no faster than $O(1/\sqrt{n})$ in a setting where the bound predicts a faster rate would falsify the claim.
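One way such an empirical check could be set up is to fit the convergence exponent from measured errors at several sample sizes. The sketch below assumes the error behaves like C · n^(-r) and estimates r as the negated least-squares slope of log(error) against log(n). The function name `fit_rate_exponent` is hypothetical, and the two error curves are synthetic exact power laws used only to show that the fit recovers a known exponent. They are not results from the paper.

```python
import math

def fit_rate_exponent(ns, errors):
    """Estimate r in error ~ C * n**(-r) via least squares in log-log space."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of log(error) vs log(n) is -r, so negate it.
    return -sxy / sxx

ns = [100, 1_000, 10_000, 100_000]

# Synthetic curves (assumptions for illustration only):
slow_errors = [n ** -0.5 for n in ns]   # the O(1/sqrt(n)) rate
fast_errors = [n ** -0.8 for n in ns]   # 0.8 is an arbitrary "faster" exponent

slow_rate = fit_rate_exponent(ns, slow_errors)
fast_rate = fit_rate_exponent(ns, fast_errors)
print(f"fitted exponent (slow curve): {slow_rate:.3f}")
print(f"fitted exponent (fast curve): {fast_rate:.3f}")
```

With real measurements the fitted exponent would be noisy, and because the paper's result is an upper bound, an observed exponent near 0.5 would count against the claim only in a setting where the bound guarantees something faster.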
Original abstract
In theoretical analysis of deep learning, discovering which features of deep learning lead to good performance is an important task. In this paper, using the framework for analyzing the generalization error developed in Suzuki (2018), we derive a fast learning rate for deep neural networks with more general activation functions. In Suzuki (2018), assuming the scale invariance of activation functions, the tight generalization error bound of deep learning was derived. They mention that the scale invariance of the activation function is essential to derive tight error bounds. Whereas the rectified linear unit (ReLU; Nair and Hinton, 2010) satisfies the scale invariance, the other famous activation functions including the sigmoid and the hyperbolic tangent functions, and the exponential linear unit (ELU; Clevert et al., 2016) does not satisfy this condition. The existing analysis indicates a possibility that a deep learning with the non scale invariant activations may have a slower convergence rate of $O(1/\sqrt{n})$ when one with the scale invariant activations can reach a rate faster than $O(1/\sqrt{n})$. In this paper, without the scale invariance of activation functions, we derive the tight generalization error bound which is essentially the same as that of Suzuki (2018). From this result, at least in the framework of Suzuki (2018), it is shown that the scale invariance of the activation functions is not essential to get the fast rate of convergence. Simultaneously, it is also shown that the theoretical framework proposed by Suzuki (2018) can be widely applied for analysis of deep learning with general activation functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends the generalization error analysis framework of Suzuki (2018) to deep neural networks with activation functions that lack scale invariance (e.g., sigmoid, tanh, ELU). It derives a tight generalization error bound that is essentially the same as in Suzuki (2018), concluding that scale invariance is not essential for the fast rate of convergence within this framework and that the Suzuki framework applies more broadly to general activations.
Significance. If the derivation holds, the result broadens the applicability of fast-rate bounds to activation functions commonly used in practice, removing a potential restriction from Suzuki (2018) and confirming the framework's versatility. This addresses a gap between theoretical assumptions and empirical deep learning.
minor comments (1)
- [Abstract] The sentence stating that 'the other famous activation functions including the sigmoid and the hyperbolic tangent functions, and the exponential linear unit (ELU; Clevert et al., 2016) does not satisfy this condition' contains a subject-verb agreement error: the subject 'functions' is plural, so the verb should be 'do not'.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. We are glad that the extension of the Suzuki (2018) framework to non-scale-invariant activations is viewed as addressing a gap between theory and practice.
Circularity Check
No significant circularity
full rationale
The paper explicitly extends the independent Suzuki (2018) framework by removing its scale-invariance assumption on activations and re-derives an equivalent generalization bound. This is a standard modification of prior external work rather than any self-definitional loop, fitted-input prediction, or load-bearing self-citation. No equations or steps in the abstract reduce the claimed result to its own inputs by construction; the derivation is presented as building on an externally developed analysis that remains valid after the modification.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The generalization error analysis framework from Suzuki (2018) applies without the scale invariance assumption on activations.