Neural gradients are near- lognormal: improved quantized and sparse training

Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry · 2006 · arXiv 2006.08173

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

HRM-Text: Efficient Pretraining Beyond Scaling

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.

Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI

cs.CV · 2026-04-27 · unverdicted · novelty 4.0

Deployment-aligned low-precision NAS recovers about two-thirds of the accuracy drop from post-training quantization, achieving 0.826 mIoU on-device for a 95k-parameter model on Intel Movidius Myriad X without added complexity.

citing papers explorer

Showing 2 of 2 citing papers.

HRM-Text: Efficient Pretraining Beyond Scaling cs.CL · 2026-05-20 · unverdicted · none · ref 76
A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.
Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI cs.CV · 2026-04-27 · unverdicted · none · ref 38
Deployment-aligned low-precision NAS recovers about two-thirds of the accuracy drop from post-training quantization, achieving 0.826 mIoU on-device for a 95k-parameter model on Intel Movidius Myriad X without added complexity.

Neural gradients are near- lognormal: improved quantized and sparse training

fields

years

verdicts

representative citing papers

citing papers explorer