Beyond Discreteness: Sample Complexity Analysis of Straight-Through Estimator for 1-bit Quantization

Halyun Jeong; Jack Xin; Penghang Yin

arxiv: 2505.18113 · v2 · pith:VAVBRZQBnew · submitted 2025-05-23 · 💻 cs.LG · math.OC

Beyond Discreteness: Sample Complexity Analysis of Straight-Through Estimator for 1-bit Quantization

Halyun Jeong , Jack Xin , Penghang Yin This is my paper

classification 💻 cs.LG math.OC

keywords samplecomplexitydataneuralquantizationtraininganalysesanalysis

0 comments

read the original abstract

Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted heuristic, allowing backpropagation through discrete operations by introducing biased yet valid surrogate gradients. However, its theoretical properties remain largely unexplored, with few existing analyses focus on the generalization error by assuming an infinite amount of training data. In contrast, this work presents the first sample complexity analysis of STE in the context of neural network quantization. Our theoretical results highlight the critical role of sample size in the success of STE, a key insight absent from existing studies. Specifically, by analyzing the quantization-aware training of a two-layer neural network with binary weights and activations, we derive the sample complexity bounds in terms of the data dimensionality that guarantee the convergence of STE-based optimization to the global minimum for both ergodic and non-ergodic analyses. Moreover, in the presence of label noises, we prove an intriguing recurrence property of STE-gradient method, where the iterate repeatedly escape from and return to the optimal binary weights. Finally, we empirically demonstrate that STE fails for general non-Gaussian data but its effectiveness can be restored through normalization, underscoring its practical importance in effective quantization.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Training Non-Differentiable Networks via Optimal Transport
cs.LG 2026-05 unverdicted novelty 8.0

PolyStep optimizes non-differentiable networks via forward-only polytope evaluations and optimal-transport barycentric updates, reaching 93.4% accuracy on hard-LIF spiking networks while outperforming gradient-free baselines.