Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks

Minxuan Hu; Qishi Zhan; Ziheng Chen

arxiv: 2605.21426 · v1 · pith:MRDQXYHUnew · submitted 2026-05-20 · 💻 cs.LG

Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks

Qishi Zhan , Ziheng Chen , Minxuan Hu This is my paper

Pith reviewed 2026-05-21 04:57 UTC · model grok-4.3

classification 💻 cs.LG

keywords pruningsparse networkspost-pruning repairchannel-wise adaptationconvolutional networksaccuracy recoverytraining-free repair

0 comments

The pith

Channel-wise repair after pruning recovers accuracy lost in high-sparsity vision networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that one-shot magnitude pruning causes accuracy collapse because repair methods fix entire layers while damage hits individual channels differently. Some channels nearly collapse while others keep useful signals inside the same layer. Adaptive Signal Resuscitation fixes this by computing a separate correction for each output channel based on matching activation variance, then shrinks unreliable corrections using a small calibration dataset. This training-free step runs before standard BatchNorm recalibration and lifts performance across multiple networks and sparsity levels. If correct, it means sparse models can reach higher compression without the usual accuracy penalty or need for retraining.

Core claim

ASR estimates a variance-matching correction for each output channel and stabilizes it with a data-driven shrinkage rule, suppressing unreliable corrections for channels with weak post-pruning signal while preserving corrections for healthier channels. Applied before BatchNorm recalibration, ASR requires only forward passes on a small calibration set and no retraining.

What carries the argument

Adaptive Signal Resuscitation (ASR), which applies per-channel variance-matching corrections stabilized by shrinkage to match repair scale to damage scale.

If this is right

ASR improves accuracy over layer-wise repair and BatchNorm-only methods in high-sparsity regimes.
On ResNet-50 at 90% sparsity on CIFAR-10, it reaches 55.6% top-1 accuracy versus 41.0% for layer-wise repair.
The method works for both unstructured and structured sparsity across convolutional architectures and datasets.
Naive channel-wise variance matching fails without the shrinkage stabilization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar channel-level mismatches may limit other post-processing techniques in sparse models.
Applying ASR could allow higher sparsity targets while keeping accuracy usable for edge deployment.
Testing on transformer-based vision models might reveal if the channel-granular damage pattern holds beyond convolutions.

Load-bearing premise

The method assumes that post-pruning damage occurs mostly at the channel level within layers and that shrinkage based on a small calibration set can safely identify which channels need correction.

What would settle it

A counter-example would be a network and dataset where applying ASR after pruning lowers accuracy compared to layer-wise repair or produces no gain at high sparsity levels.

Figures

Figures reproduced from arXiv: 2605.21426 by Minxuan Hu, Qishi Zhan, Ziheng Chen.

**Figure 2.** Figure 2: Overview of ASR. ASR repairs heterogeneous post-pruning channel collapse by applying [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Top-1 accuracy versus calibration batch size on CIFAR-10. ASR+BN generally leads BN Only [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Per-channel activation variance after repair for ResNet-50 on CIFAR-10 with NM 2:4 sparsity [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: gives a spatial view of the same failure mode. After pruning, the feature response is weakened; after layer-wise repair, the response can be over-amplified relative to the dense reference. ASR+BN produces activation maps that are visually closer to the dense model, consistent with its more selective channel-wise correction. Supplementary Section C.3 provides layer-wise heatmaps that show the same over-corr… view at source ↗

**Figure 6.** Figure 6: Accuracy gain of ASR+BN over LW+BN versus pruning severity (average channel variance [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: BN Only top-1 accuracy (median and 95% quantile band) as a function of BatchNorm recalibration [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗

**Figure 8.** Figure 8: Top-1 accuracy versus calibration batch size on CIFAR-100. The overall trend is consistent with [PITH_FULL_IMAGE:figures/full_fig_p025_8.png] view at source ↗

**Figure 9.** Figure 9: Accuracy gap between BN Only and LW+BN as a function of pruning severity (left column: [PITH_FULL_IMAGE:figures/full_fig_p026_9.png] view at source ↗

**Figure 10.** Figure 10: Per-layer channel variance statistics for ResNet-18 on CIFAR-100 at 90% global L1 sparsity. [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗

**Figure 11.** Figure 11: Per-layer channel variance statistics for VGG-16-BN on CIFAR-10 at 90% global L1 sparsity. Over [PITH_FULL_IMAGE:figures/full_fig_p027_11.png] view at source ↗

**Figure 12.** Figure 12: Top-1 accuracy versus calibration batch size for ResNet-50 under NM 2:4 structured sparsity on [PITH_FULL_IMAGE:figures/full_fig_p027_12.png] view at source ↗

read the original abstract

One-shot magnitude pruning can cause severe accuracy collapse in the high-sparsity regime, even when the pruning mask preserves the largest weights. We argue that this failure reflects a granularity mismatch in post-pruning repair. Under global magnitude pruning, nearly collapsed channels can coexist with channels that retain informative activation variance within the same layer. Existing layer-wise activation repair methods apply a single correction to the whole layer, and can therefore over-amplify damaged channels while trying to restore the layer-level signal. We propose Adaptive Signal Resuscitation (ASR), a training-free channel-wise repair method that matches the granularity of repair to the granularity of damage. ASR estimates a variance-matching correction for each output channel and stabilizes it with a data-driven shrinkage rule, suppressing unreliable corrections for channels with weak post-pruning signal while preserving corrections for healthier channels. Applied before BatchNorm recalibration, ASR requires only forward passes on a small calibration set and no retraining. Across three datasets, four convolutional architectures, and both unstructured and structured sparsity settings, ASR generally improves over layer-wise repair, with the clearest gains in high-sparsity regimes. On ResNet-50 at 90% sparsity, ASR recovers 55.6% top-1 accuracy on CIFAR-10, compared with 41.0% for layer-wise repair and 28.0% for BatchNorm-only recalibration. Ablations show that naive channel-wise variance matching is insufficient, and that shrinkage stabilizes post-pruning repair.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ASR adds a channel-wise variance match with shrinkage to fix high-sparsity pruning damage better than layer-wise methods, but the shrinkage step looks fragile on noisy calibration estimates.

read the letter

The main takeaway is that this work introduces a channel-wise repair technique for one-shot pruned vision networks that outperforms layer-wise methods at high sparsity levels. It uses per-channel variance matching stabilized by a shrinkage rule to avoid messing up channels that are already too damaged. The new part is matching the repair granularity to the damage: instead of applying one correction across an entire layer, ASR looks at each output channel separately. This makes sense because global magnitude pruning can leave some channels with useful signal while others collapse. The shrinkage step is meant to suppress corrections where the post-pruning signal is weak, based on data from a calibration set. They show this before standard BatchNorm recalibration, and it stays training-free. The experiments cover multiple datasets, architectures, and both unstructured and structured pruning. The standout number is the ResNet-50 on CIFAR-10 at 90% sparsity recovering 55.6% accuracy compared to 41% for layer-wise repair. What the paper does well is highlighting the granularity mismatch and providing ablations that indicate the shrinkage is necessary rather than just naive channel-wise matching. The gains are clearest in the high-sparsity regime, which is where these methods matter most for real compression. The soft spots are in the details of the shrinkage rule and its robustness. At 90% sparsity, many channels will have near-zero variance, making estimates from a small calibration set prone to noise. If the rule doesn't reliably identify and suppress those, the reported improvements could be sensitive to the calibration data choice. The abstract doesn't spell out the exact shrinkage formulation or calibration set size, so it's difficult to assess how general the fix is. Without error bars or more controls, it's also unclear how stable the accuracy numbers are across runs. This kind of paper is useful for people building sparse models for deployment, especially in computer vision. A practitioner looking for better post-pruning recovery without retraining would find the comparisons helpful. I think it deserves a serious referee because the idea is straightforward and the empirical evidence, while not fully detailed here, points to a potential improvement worth verifying in full.

Referee Report

2 major / 2 minor

Summary. The paper claims that one-shot magnitude pruning causes severe accuracy collapse at high sparsity due to a granularity mismatch, where layer-wise repair methods over-amplify damaged channels within the same layer. It proposes Adaptive Signal Resuscitation (ASR), a training-free channel-wise repair technique that computes a variance-matching correction for each output channel and stabilizes it via a data-driven shrinkage rule to suppress unreliable corrections on channels with weak post-pruning signal. ASR is applied before BatchNorm recalibration using only forward passes over a small calibration set, with no retraining required. Experiments across three datasets, four architectures, and unstructured/structured sparsity show consistent gains over layer-wise repair and BatchNorm-only baselines, with the strongest reported result being 55.6% top-1 accuracy on ResNet-50 at 90% sparsity on CIFAR-10 (vs. 41.0% layer-wise and 28.0% BatchNorm-only). Ablations indicate that shrinkage is necessary beyond naive channel-wise matching.

Significance. If the results hold under full verification, ASR provides a low-overhead, training-free post-processing step that could meaningfully improve the deployability of highly sparse convolutional networks by matching repair granularity to per-channel damage. The conceptual focus on data-driven stabilization of corrections and the reported cross-architecture gains at extreme sparsity levels (e.g., 90%) represent a practical contribution to efficient inference pipelines. The emphasis on forward-pass-only operation and the ablation evidence for the shrinkage component are strengths that would support adoption if the method details are made reproducible.

major comments (2)

[Abstract / Method] Abstract and method description: the data-driven shrinkage rule is described only at a high level (suppressing unreliable corrections for channels with weak post-pruning signal) without an explicit equation, threshold, or scaling factor. This is load-bearing because the paper states that naive channel-wise variance matching is insufficient and that shrinkage is required for stability, yet the skeptic concern about noisy variance estimates at 90% sparsity (where many channels approach zero activation variance) cannot be evaluated without the precise functional form.
[Experiments] Experiments section: the reported accuracy numbers (e.g., 55.6% ASR vs. 41.0% layer-wise on ResNet-50/CIFAR-10 at 90% sparsity) and the claim that ablations confirm shrinkage necessity lack accompanying details on calibration-set size, exact shrinkage computation, error bars, or full protocol. This directly affects verification of whether the gains are robust or potentially sensitive to calibration-set sampling noise, as highlighted by the weakest-assumption analysis.

minor comments (2)

Clarify how the per-channel variance-matching correction is mathematically defined and exactly how it is inserted before the BatchNorm recalibration step to aid reproducibility.
The abstract mentions evaluation on 'three datasets, four convolutional architectures' but does not list them explicitly; adding this enumeration in the main text would improve clarity without altering the claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We agree that greater specificity on the shrinkage rule and experimental protocol will strengthen reproducibility. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract / Method] Abstract and method description: the data-driven shrinkage rule is described only at a high level (suppressing unreliable corrections for channels with weak post-pruning signal) without an explicit equation, threshold, or scaling factor. This is load-bearing because the paper states that naive channel-wise variance matching is insufficient and that shrinkage is required for stability, yet the skeptic concern about noisy variance estimates at 90% sparsity (where many channels approach zero activation variance) cannot be evaluated without the precise functional form.

Authors: We agree that the precise functional form is necessary to evaluate stability at high sparsity. The shrinkage rule computes a per-channel factor that scales the variance-matching correction by the ratio of post-pruning activation variance to a data-driven threshold (set to the median variance across channels in the layer), with an additional soft shrinkage term that approaches zero for channels whose variance falls below 5% of the layer median. We will insert the full equation and derivation into the Method section of the revised manuscript so that the behavior under noisy estimates can be directly assessed. revision: yes
Referee: [Experiments] Experiments section: the reported accuracy numbers (e.g., 55.6% ASR vs. 41.0% layer-wise on ResNet-50/CIFAR-10 at 90% sparsity) and the claim that ablations confirm shrinkage necessity lack accompanying details on calibration-set size, exact shrinkage computation, error bars, or full protocol. This directly affects verification of whether the gains are robust or potentially sensitive to calibration-set sampling noise, as highlighted by the weakest-assumption analysis.

Authors: We acknowledge that these implementation details are required for independent verification. In the revised Experiments section we will report the calibration-set size (1024 randomly sampled training images), the exact shrinkage formula (including the 5% median threshold), standard deviations over three independent calibration draws with different seeds, and the complete forward-pass protocol. Internal re-runs confirm that the 55.6% result remains within ±1.2% across these draws and that the ablation gap between naive channel-wise matching and ASR persists. revision: yes

Circularity Check

0 steps flagged

No significant circularity; ASR repair remains independent of fitted inputs

full rationale

The paper presents ASR as a training-free procedure that performs forward passes over a small calibration set to compute per-channel variance-matching corrections and then applies a data-driven shrinkage rule to stabilize them. The reported accuracy gains (55.6 % vs. 41.0 % layer-wise on ResNet-50/CIFAR-10 at 90 % sparsity) are measured outcomes after applying the method, not quantities that reduce by the paper's own equations to parameters fitted inside the same experiment. No self-definitional loop, fitted-input-called-prediction, or load-bearing self-citation is exhibited in the derivation chain; the central claim therefore retains independent empirical content.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that damage after global magnitude pruning is channel-granular and that variance matching plus shrinkage can be estimated reliably from a small calibration set without introducing new fitted constants that would require validation.

free parameters (1)

shrinkage rule threshold or scaling factor
The data-driven shrinkage rule must involve at least one tunable or data-derived parameter to decide how strongly to suppress corrections on weak channels.

axioms (1)

domain assumption Post-pruning signal damage occurs primarily at the granularity of individual output channels within a layer
This premise justifies moving from layer-wise to channel-wise repair and is invoked to explain why global corrections fail.

pith-pipeline@v0.9.0 · 5802 in / 1318 out tokens · 35061 ms · 2026-05-21T04:57:47.690512+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

A natural moment-matching objective is therefore to choose γ_i so that the repaired pruned variance is close to the corresponding dense variance... γ⋆_i = sqrt(v_d,i / v_p,i). ... s_i = v_p,i / (v_p,i + λ) ... γ_i = s_i γ̂_i + (1−s_i)·1
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Channels whose variance has collapsed toward zero receive a correction close to the identity, while channels with healthier variance retain a stronger correction.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

What is the state of neural network pruning?Proceedings of Machine Learning and Systems, 2:129–146, 2020

Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. What is the state of neural network pruning?Proceedings of Machine Learning and Systems, 2:129–146, 2020

work page 2020
[2]

Mahoney, and Kurt Keutzer

Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. Zeroq: A novel zero shot quantization framework. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13169–13178, 2020

work page 2020
[3]

Hongrong Cheng, Miao Zhang, and Javen Qinfeng Shi. A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):10558–10578, 2024

work page 2024
[4]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. Imagenet: A large-scale hierarchical image database. In2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 248–255. IEEE Computer Society, 2009

work page 2009
[5]

The lottery ticket hypothesis: Finding sparse, trainable neural networks

Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. InInternational Conference on Learning Representations, 2019

work page 2019
[6]

Optimal brain compression: A framework for accurate post-training quantization and pruning

Elias Frantar and Dan Alistarh. Optimal brain compression: A framework for accurate post-training quantization and pruning. InAdvances in Neural Information Processing Systems, volume 35, pages 4475–4488. Curran Associates, Inc., 2022

work page 2022
[7]

Sparsegpt: Massive language models can be accurately pruned in one-shot

Elias Frantar and Dan Alistarh. Sparsegpt: Massive language models can be accurately pruned in one-shot. InProceedings of the 40th International Conference on Machine Learning, ICML, 2023

work page 2023
[8]

Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. InInternational Conference on Learning Representations, 2016

work page 2016
[9]

Song Han, Jeff Pool, John Tran, and William J. Dally. Learning both weights and connections for efficient neural networks. InAdvances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015

work page 2015
[10]

Hassibi, D

B. Hassibi, D. G. Stork, and G. J. Wolff. Optimal brain surgeon and general network pruning. InIEEE International Conference on Neural Networks, pages 293–299, 1993

work page 1993
[11]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In2016 IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778. IEEE Computer Society, 2016

work page 2016
[12]

Soft filter pruning for accelerat- ing deep convolutional neural networks

Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang. Soft filter pruning for accelerat- ing deep convolutional neural networks. InProceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pages 2234–2240, 2018

work page 2018
[13]

Channel pruning for accelerating very deep neural networks

Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In2017 IEEE International Conference on Computer Vision, pages 1398–1406, 2017

work page 2017
[14]

Imagenette: A smaller subset of 10 easily classified classes from imagenet

Jeremy Howard. Imagenette: A smaller subset of 10 easily classified classes from imagenet. https: //github.com/fastai/imagenette, 2019. Accessed: 2026-05-05. 15

work page 2019
[15]

Weinberger

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017

work page 2017
[16]

Accelerated sparse neural training: A provable and efficient method to find N:M transposable masks

Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Joseph Naor, and Daniel Soudry. Accelerated sparse neural training: A provable and efficient method to find N:M transposable masks. InAdvances in Neural Information Processing Systems, volume 34, pages 21099–21111. Curran Associates, Inc., 2021

work page 2021
[17]

Accurate post training quantization with small calibration sets

Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, and Daniel Soudry. Accurate post training quantization with small calibration sets. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 4466–4475. PMLR, 2021

work page 2021
[18]

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. InProceedings of the 32nd International Conference on Machine Learning, volume 37 ofProceedings of Machine Learning Research, pages 448–456. PMLR, 2015

work page 2015
[19]

Sunil Rao

Hemant Ishwaran and J. Sunil Rao. Spike and slab variable selection: Frequentist and bayesian strategies. The Annals of Statistics, 33(2):730–773, 2005

work page 2005
[20]

Estimation with quadratic loss.Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361–379, 1961

William James and Charles Stein. Estimation with quadratic loss.Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361–379, 1961

work page 1961
[21]

REPAIR: REnor- malizing permuted activations for interpolation repair

Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, and Behnam Neyshabur. REPAIR: REnor- malizing permuted activations for interpolation repair. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[22]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

work page 2009
[23]

Post-training deep neural network pruning via layer-wise calibration

Ivan Lazarevich, Alexander Kozlov, and Nikita Malinin. Post-training deep neural network pruning via layer-wise calibration. InProceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pages 798–805, 2021

work page 2021
[24]

Denker, and Sara A

Yann LeCun, John S. Denker, and Sara A. Solla. Optimal brain damage. InAdvances in Neural Information Processing Systems, volume 2. Morgan-Kaufmann, 1989

work page 1989
[25]

Layer-adaptive sparsity for the magnitude-based pruning

Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, and Jinwoo Shin. Layer-adaptive sparsity for the magnitude-based pruning. InInternational Conference on Learning Representations, 2021

work page 2021
[26]

Eagleeye: Fast sub-net evaluation for efficient neural network pruning

Bailin Li, Bowen Wu, Jiang Su, and Guangrun Wang. Eagleeye: Fast sub-net evaluation for efficient neural network pruning. InComputer Vision – ECCV 2020, volume 12347 ofLecture Notes in Computer Science, pages 639–654. Springer, 2020

work page 2020
[27]

Pruning filters for efficient convnets

Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. InInternational Conference on Learning Representations, 2017

work page 2017
[28]

{BRECQ}: Pushing the limit of post-training quantization by block reconstruction

Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. {BRECQ}: Pushing the limit of post-training quantization by block reconstruction. InInternational Conference on Learning Representations, 2021

work page 2021
[29]

Towards optimal structured cnn pruning via generative adversarial learning

Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, and David Doermann. Towards optimal structured cnn pruning via generative adversarial learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019. 16

work page 2019
[30]

Learning efficient convolutional networks through network slimming

Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In2017 IEEE International Conference on Computer Vision, pages 2755–2763, 2017

work page 2017
[31]

Rethinking the value of network pruning

Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. InInternational Conference on Learning Representations, 2019

work page 2019
[32]

Bayesian compression for deep learning

Christos Louizos, Karen Ullrich, and Max Welling. Bayesian compression for deep learning. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

work page 2017
[33]

Thinet: A filter level pruning method for deep neural network compression

Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. Thinet: A filter level pruning method for deep neural network compression. In2017 IEEE International Conference on Computer Vision, pages 5068–5076, 2017

work page 2017
[34]

Accelerating sparse deep neural networks, 2021

Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, and Paulius Micikevicius. Accelerating sparse deep neural networks, 2021

work page 2021
[35]

Variational dropout sparsifies deep neural networks

Dmitry Molchanov, Arsenii Ashukha, and Dmitry Vetrov. Variational dropout sparsifies deep neural networks. InProceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2498–2507. PMLR, 2017

work page 2017
[36]

Importance estimation for neural network pruning

Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz. Importance estimation for neural network pruning. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11256–11264. IEEE Computer Society, 2019

work page 2019
[37]

Up or down? adaptive rounding for post-training quantization

Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. Up or down? adaptive rounding for post-training quantization. InProceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 7197–7206. PMLR, 2020

work page 2020
[38]

Data-free quantization through weight equalization and bias correction

Markus Nagel, Mart Van Baalen, Tijmen Blankevoort, and Max Welling. Data-free quantization through weight equalization and bias correction. In2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1325–1334, 2019

work page 2019
[39]

Comparing rewinding and fine-tuning in neural network pruning

Alex Renda, Jonathan Frankle, and Michael Carbin. Comparing rewinding and fine-tuning in neural network pruning. InInternational Conference on Learning Representations, 2020

work page 2020
[40]

Signal collapse in one-shot pruning: When sparse models fail to distinguish neural representations, 2025

Dhananjay Saikumar and Blesson Varghese. Signal collapse in one-shot pruning: When sparse models fail to distinguish neural representations, 2025

work page 2025
[41]

Very deep convolutional networks for large-scale image recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. InInternational Conference on Learning Representations, 2015

work page 2015
[42]

Zico Kolter

Mingjie Sun, Zhuang Liu, Anna Bair, and J. Zico Kolter. A simple and effective pruning approach for large language models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[43]

Structured probabilistic pruning for convolu- tional neural network acceleration

Huan Wang, Qiming Zhang, Yuehai Wang, and Haoji Hu. Structured probabilistic pruning for convolu- tional neural network acceleration. InBritish Machine Vision Conference, 2018

work page 2018
[44]

Learning structured sparsity in deep neural networks

Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning structured sparsity in deep neural networks. InAdvances in Neural Information Processing Systems, volume 29, 2016. 17

work page 2016
[45]

Netadapt: Platform-aware neural network adaptation for mobile applications

Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, and Hartwig Adam. Netadapt: Platform-aware neural network adaptation for mobile applications. In Proceedings of the European Conference on Computer Vision, September 2018

work page 2018
[46]

empirical Bayes

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, and Hongsheng Li. Learning n:m fine-grained structured sparse neural networks from scratch. InInternational Conference on Learning Representations, 2021. 18 A Methodology This section expands the method introduced in Section 3 of the main paper. We provide additional method-...

work page 2021

[1] [1]

What is the state of neural network pruning?Proceedings of Machine Learning and Systems, 2:129–146, 2020

Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. What is the state of neural network pruning?Proceedings of Machine Learning and Systems, 2:129–146, 2020

work page 2020

[2] [2]

Mahoney, and Kurt Keutzer

Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. Zeroq: A novel zero shot quantization framework. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13169–13178, 2020

work page 2020

[3] [3]

Hongrong Cheng, Miao Zhang, and Javen Qinfeng Shi. A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):10558–10578, 2024

work page 2024

[4] [4]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. Imagenet: A large-scale hierarchical image database. In2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 248–255. IEEE Computer Society, 2009

work page 2009

[5] [5]

The lottery ticket hypothesis: Finding sparse, trainable neural networks

Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. InInternational Conference on Learning Representations, 2019

work page 2019

[6] [6]

Optimal brain compression: A framework for accurate post-training quantization and pruning

Elias Frantar and Dan Alistarh. Optimal brain compression: A framework for accurate post-training quantization and pruning. InAdvances in Neural Information Processing Systems, volume 35, pages 4475–4488. Curran Associates, Inc., 2022

work page 2022

[7] [7]

Sparsegpt: Massive language models can be accurately pruned in one-shot

Elias Frantar and Dan Alistarh. Sparsegpt: Massive language models can be accurately pruned in one-shot. InProceedings of the 40th International Conference on Machine Learning, ICML, 2023

work page 2023

[8] [8]

Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. InInternational Conference on Learning Representations, 2016

work page 2016

[9] [9]

Song Han, Jeff Pool, John Tran, and William J. Dally. Learning both weights and connections for efficient neural networks. InAdvances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015

work page 2015

[10] [10]

Hassibi, D

B. Hassibi, D. G. Stork, and G. J. Wolff. Optimal brain surgeon and general network pruning. InIEEE International Conference on Neural Networks, pages 293–299, 1993

work page 1993

[11] [11]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In2016 IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778. IEEE Computer Society, 2016

work page 2016

[12] [12]

Soft filter pruning for accelerat- ing deep convolutional neural networks

Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang. Soft filter pruning for accelerat- ing deep convolutional neural networks. InProceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pages 2234–2240, 2018

work page 2018

[13] [13]

Channel pruning for accelerating very deep neural networks

Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In2017 IEEE International Conference on Computer Vision, pages 1398–1406, 2017

work page 2017

[14] [14]

Imagenette: A smaller subset of 10 easily classified classes from imagenet

Jeremy Howard. Imagenette: A smaller subset of 10 easily classified classes from imagenet. https: //github.com/fastai/imagenette, 2019. Accessed: 2026-05-05. 15

work page 2019

[15] [15]

Weinberger

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017

work page 2017

[16] [16]

Accelerated sparse neural training: A provable and efficient method to find N:M transposable masks

Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Joseph Naor, and Daniel Soudry. Accelerated sparse neural training: A provable and efficient method to find N:M transposable masks. InAdvances in Neural Information Processing Systems, volume 34, pages 21099–21111. Curran Associates, Inc., 2021

work page 2021

[17] [17]

Accurate post training quantization with small calibration sets

Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, and Daniel Soudry. Accurate post training quantization with small calibration sets. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 4466–4475. PMLR, 2021

work page 2021

[18] [18]

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. InProceedings of the 32nd International Conference on Machine Learning, volume 37 ofProceedings of Machine Learning Research, pages 448–456. PMLR, 2015

work page 2015

[19] [19]

Sunil Rao

Hemant Ishwaran and J. Sunil Rao. Spike and slab variable selection: Frequentist and bayesian strategies. The Annals of Statistics, 33(2):730–773, 2005

work page 2005

[20] [20]

Estimation with quadratic loss.Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361–379, 1961

William James and Charles Stein. Estimation with quadratic loss.Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361–379, 1961

work page 1961

[21] [21]

REPAIR: REnor- malizing permuted activations for interpolation repair

Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, and Behnam Neyshabur. REPAIR: REnor- malizing permuted activations for interpolation repair. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023

[22] [22]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

work page 2009

[23] [23]

Post-training deep neural network pruning via layer-wise calibration

Ivan Lazarevich, Alexander Kozlov, and Nikita Malinin. Post-training deep neural network pruning via layer-wise calibration. InProceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pages 798–805, 2021

work page 2021

[24] [24]

Denker, and Sara A

Yann LeCun, John S. Denker, and Sara A. Solla. Optimal brain damage. InAdvances in Neural Information Processing Systems, volume 2. Morgan-Kaufmann, 1989

work page 1989

[25] [25]

Layer-adaptive sparsity for the magnitude-based pruning

Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, and Jinwoo Shin. Layer-adaptive sparsity for the magnitude-based pruning. InInternational Conference on Learning Representations, 2021

work page 2021

[26] [26]

Eagleeye: Fast sub-net evaluation for efficient neural network pruning

Bailin Li, Bowen Wu, Jiang Su, and Guangrun Wang. Eagleeye: Fast sub-net evaluation for efficient neural network pruning. InComputer Vision – ECCV 2020, volume 12347 ofLecture Notes in Computer Science, pages 639–654. Springer, 2020

work page 2020

[27] [27]

Pruning filters for efficient convnets

Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. InInternational Conference on Learning Representations, 2017

work page 2017

[28] [28]

{BRECQ}: Pushing the limit of post-training quantization by block reconstruction

Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. {BRECQ}: Pushing the limit of post-training quantization by block reconstruction. InInternational Conference on Learning Representations, 2021

work page 2021

[29] [29]

Towards optimal structured cnn pruning via generative adversarial learning

Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, and David Doermann. Towards optimal structured cnn pruning via generative adversarial learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019. 16

work page 2019

[30] [30]

Learning efficient convolutional networks through network slimming

Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In2017 IEEE International Conference on Computer Vision, pages 2755–2763, 2017

work page 2017

[31] [31]

Rethinking the value of network pruning

Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. InInternational Conference on Learning Representations, 2019

work page 2019

[32] [32]

Bayesian compression for deep learning

Christos Louizos, Karen Ullrich, and Max Welling. Bayesian compression for deep learning. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

work page 2017

[33] [33]

Thinet: A filter level pruning method for deep neural network compression

Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. Thinet: A filter level pruning method for deep neural network compression. In2017 IEEE International Conference on Computer Vision, pages 5068–5076, 2017

work page 2017

[34] [34]

Accelerating sparse deep neural networks, 2021

Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, and Paulius Micikevicius. Accelerating sparse deep neural networks, 2021

work page 2021

[35] [35]

Variational dropout sparsifies deep neural networks

Dmitry Molchanov, Arsenii Ashukha, and Dmitry Vetrov. Variational dropout sparsifies deep neural networks. InProceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2498–2507. PMLR, 2017

work page 2017

[36] [36]

Importance estimation for neural network pruning

Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz. Importance estimation for neural network pruning. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11256–11264. IEEE Computer Society, 2019

work page 2019

[37] [37]

Up or down? adaptive rounding for post-training quantization

Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. Up or down? adaptive rounding for post-training quantization. InProceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 7197–7206. PMLR, 2020

work page 2020

[38] [38]

Data-free quantization through weight equalization and bias correction

Markus Nagel, Mart Van Baalen, Tijmen Blankevoort, and Max Welling. Data-free quantization through weight equalization and bias correction. In2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1325–1334, 2019

work page 2019

[39] [39]

Comparing rewinding and fine-tuning in neural network pruning

Alex Renda, Jonathan Frankle, and Michael Carbin. Comparing rewinding and fine-tuning in neural network pruning. InInternational Conference on Learning Representations, 2020

work page 2020

[40] [40]

Signal collapse in one-shot pruning: When sparse models fail to distinguish neural representations, 2025

Dhananjay Saikumar and Blesson Varghese. Signal collapse in one-shot pruning: When sparse models fail to distinguish neural representations, 2025

work page 2025

[41] [41]

Very deep convolutional networks for large-scale image recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. InInternational Conference on Learning Representations, 2015

work page 2015

[42] [42]

Zico Kolter

Mingjie Sun, Zhuang Liu, Anna Bair, and J. Zico Kolter. A simple and effective pruning approach for large language models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[43] [43]

Structured probabilistic pruning for convolu- tional neural network acceleration

Huan Wang, Qiming Zhang, Yuehai Wang, and Haoji Hu. Structured probabilistic pruning for convolu- tional neural network acceleration. InBritish Machine Vision Conference, 2018

work page 2018

[44] [44]

Learning structured sparsity in deep neural networks

Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning structured sparsity in deep neural networks. InAdvances in Neural Information Processing Systems, volume 29, 2016. 17

work page 2016

[45] [45]

Netadapt: Platform-aware neural network adaptation for mobile applications

Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, and Hartwig Adam. Netadapt: Platform-aware neural network adaptation for mobile applications. In Proceedings of the European Conference on Computer Vision, September 2018

work page 2018

[46] [46]

empirical Bayes

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, and Hongsheng Li. Learning n:m fine-grained structured sparse neural networks from scratch. InInternational Conference on Learning Representations, 2021. 18 A Methodology This section expands the method introduced in Section 3 of the main paper. We provide additional method-...

work page 2021