Learning low-precision neural networks without Straight-Through Estimator(STE)

Matthew Mattina; Zhi-Gang Liu


arxiv 1903.01061 v2 pith:KD7TAXU7 submitted 2019-03-04 cs.LG stat.ML


Zhi-Gang Liu, Matthew Mattina

classification cs.LG stat.ML

keywords alpha, full-precision, low-precision, weight, affine, alpha-blending, bits, combination


abstract

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). AB avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight $w_q$ and the corresponding full-precision weight $w$, with non-trainable scalar coefficients $\alpha$ and $1-\alpha$. During training, $\alpha$ is gradually increased from 0 to 1; the gradient updates to the weights flow through the full-precision term, $(1-\alpha)w$, of the affine combination; and the model is converted progressively from full precision to low precision. To evaluate the method, a 1-bit BinaryNet on the CIFAR10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet_50 v1/2 on the ImageNet dataset are trained using the alpha-blending approach, and the evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82% and 2.93% respectively compared to the results of STE-based quantization.
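The blending step described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the sign quantizer `quantize_sign`, the linear ramp in `alpha_schedule`, and the toy least-squares problem in `train_toy` are all hypothetical choices made here, since the abstract says only that $\alpha$ is "gradually increased from 0 to 1". What the sketch does take from the abstract is the affine combination $(1-\alpha)w + \alpha w_q$ and the fact that the gradient reaches $w$ only through the $(1-\alpha)w$ term.

```python
import numpy as np


def quantize_sign(w):
    """Hypothetical 1-bit quantizer: maps each weight to -1 or +1."""
    return np.where(w >= 0.0, 1.0, -1.0)


def alpha_schedule(step, total_steps):
    """Assumed linear ramp of alpha from 0 to 1 (the abstract only says 'gradually')."""
    return min(1.0, step / float(total_steps))


def blend(w, alpha, quantizer=quantize_sign):
    """Affine combination (1 - alpha) * w + alpha * w_q used in place of w_q."""
    return (1.0 - alpha) * w + alpha * quantizer(w)


def grad_wrt_w(grad_blended, alpha):
    """Gradient w.r.t. w, treating w_q as a constant: only (1 - alpha) * w contributes."""
    return (1.0 - alpha) * grad_blended


def train_toy(steps=200, lr=0.1, seed=0):
    """Toy least-squares fit (an assumption for illustration, not from the paper)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, 4))
    y = x @ np.array([1.0, -1.0, 1.0, -1.0])
    w = rng.normal(scale=0.1, size=4)
    for step in range(steps + 1):
        alpha = alpha_schedule(step, steps)
        w_ab = blend(w, alpha)                   # blended weight used in the loss
        residual = x @ w_ab - y
        grad_blended = x.T @ residual / len(x)   # dL/dw_ab for L = 0.5 * mean(residual**2)
        w = w - lr * grad_wrt_w(grad_blended, alpha)
    # At alpha = 1 the blended weight is exactly the quantized weight.
    return w, blend(w, 1.0)
```

Calling `train_toy()` returns the final full-precision weights together with their fully blended (here, 1-bit) counterpart. Note that at $\alpha = 1$ the factor $(1-\alpha)$ is zero, so the full-precision weights stop receiving updates once the conversion to low precision is complete.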


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Layerwise Progressive Freezing: A Training Scaffold for Depth-Scalable Binary Networks
cs.LG 2026-06 unverdicted novelty 7.0

StoMPP progressively binarizes BNN layers layerwise from input to output via stochastic masks, delivering depth-scalable accuracy gains in a fully STE-free regime by controlling activation-induced gradient blockades.

Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
cs.CV 2024-08 unverdicted novelty 7.0

CoRa reclaims quantization residuals in pre-trained ConvNets by searching low-rank adapter architectures instead of weights, matching SOTA accuracy on ImageNet in 3-4 bit settings with under 250 iterations on 1600 images.