arxiv: 2510.09696 · v3 · submitted 2025-10-09 · 💻 cs.LG · cs.AI

Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression

Lorenzo Nikiforos , Luciano Prono , Charalampos Antoniadis , Fabio Pareschi , Riccardo Rovatti , Gianluca Setti This is my paper

Pith reviewed 2026-05-18 08:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords model compressionfine-tuningpruningquantizationiterative compressiondeep neural networksaffine combinationaccuracy preservation

0 comments p. Extension

The pith

A framework gradually blends outputs from original and compressed neural networks during fine-tuning to stabilize iterative model compression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes VCON as a way to compress deep neural networks more reliably by avoiding abrupt switches to a smaller version. Instead of replacing the original model outright, the approach runs both the full and compressed networks at the same time. Their outputs are mixed with weights that shift steadily from the original toward the compressed model. This slow handoff is intended to keep training stable and limit accuracy loss across different compression methods and tasks. A sympathetic reader would care because current compression techniques often force difficult trade-offs between size and performance that this blending might ease.

Core claim

VCON executes both the original uncompressed model and the compressed model in parallel during fine-tuning; the contribution of the original model is progressively reduced while that of the compressed model is gradually increased via an affine combination, improving stability and mitigating accuracy degradation. In most settings this yields accuracy gains exceeding 1% over baselines, with some configurations showing improvements above 15%.

What carries the argument

The affine combination of outputs from the original and compressed models, which gradually shifts emphasis from the uncompressed network to the compressed one throughout fine-tuning.

If this is right

The same blending process works with pruning, quantization, and low-rank decomposition without needing separate iterative schedules for each.
Accuracy improves by more than 1 percent over post-shot and iterative baselines on computer vision and natural language processing benchmarks in most tested cases.
Some compression configurations reach accuracy gains above 15 percent when the gradual transition is used.
Training remains more stable because the network never experiences an instantaneous jump from full to compressed behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The parallel-execution idea could extend to other sudden model changes such as architecture search or domain adaptation where abrupt shifts cause instability.
Resource costs during the transition phase might be reduced by running the two models only on selected layers or batches rather than the full network.
The blending schedule itself could be made data-driven instead of fixed, potentially removing the need for manual tuning across new datasets.

Load-bearing premise

An affine combination of outputs from the original and compressed models during fine-tuning will reliably allow the network to adapt smoothly without introducing new instabilities or requiring technique-specific tuning of the blending schedule.

What would settle it

An experiment applying VCON to a standard compression task and finding no reduction in accuracy degradation or stability improvement compared with direct replacement or existing iterative baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2510.09696 by Charalampos Antoniadis, Fabio Pareschi, Gianluca Setti, Lorenzo Nikiforos, Luciano Prono, Riccardo Rovatti.

**Figure 2.** Figure 2: Illustration of block-wise VCON: the first two blocks [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Visual intuition of the VCON approach: when a model parameter [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

read the original abstract

The increasing scale of Deep Neural Networks (DNNs) introduces the need for compression techniques such as pruning, quantization, and low-rank decomposition. While these methods are very effective at reducing memory, computation, and energy consumption, they may introduce severe accuracy degradation, which is often mitigated by using iterative, gradual compression. However, different compression techniques require distinct iterative approaches, and some result in unstable, discontinuous model fine-tuning. We introduce Vanishing Contributions (VCON), a unified framework for the smooth, iterative transition of DNNs into a compressed form. Rather than replacing the original network directly with its compressed version, VCON executes both in parallel during fine-tuning. The contribution of the original (uncompressed) model is progressively reduced, while that of the compressed model is gradually increased. This affine combination allows the network to slowly adapt, improving stability and mitigating accuracy degradation. We evaluate VCON on computer vision and natural language processing benchmarks, using multiple compression strategies. In most settings, our framework improves accuracy over post-shot and iterative baselines. Typical gains exceed 1%, while some configuration exhibits improvements above 15%. VCON is thus compatible with existing compression techniques and consistently improves performance across diverse tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces Vanishing Contributions (VCON), a unified framework for smooth iterative compression of deep neural networks. Rather than directly substituting the original model with a compressed version (via pruning, quantization, or low-rank decomposition), VCON runs both models in parallel during fine-tuning and applies an affine combination to progressively reduce the original model's contribution while increasing the compressed model's. This is claimed to improve training stability and yield accuracy gains over post-shot and iterative baselines, with typical improvements exceeding 1% and some configurations above 15% on vision and NLP benchmarks.

Significance. If the central claims hold under rigorous validation, VCON could offer a practical, general-purpose technique for mitigating accuracy loss during compression transitions, reducing the need for bespoke iterative fine-tuning schedules per compression method. This would be valuable for efficient ML deployment, particularly if the approach proves robust across diverse architectures and compression types without introducing new instabilities.

major comments (3)

[Abstract] Abstract: the description of the affine combination ((1-α)·f_orig + α·f_comp) does not specify the combination point (logits vs. intermediate features) or the functional form of the ramp schedule α(t). Without these, it is impossible to verify whether the same blending rule remains stable and technique-agnostic when the compressed model changes structure or precision, which is load-bearing for the unified-framework claim.
[Experimental section] Experimental section (as referenced in the abstract's performance claims): the abstract asserts 'consistent accuracy improvements across vision and language benchmarks' and 'gains exceed 1%' (some >15%) but supplies no dataset sizes, number of runs, error bars, or statistical tests. This prevents verification of whether the reported gains are reliable or could be artifacts of single-run evaluation.
[Method description] Method description: the central claim presupposes that a single, technique-agnostic schedule for α exists that avoids accuracy cliffs or new instabilities. No ablation on schedule sensitivity or per-technique retuning is referenced, which directly tests the unification premise.

minor comments (1)

[Abstract] Abstract: the sentence 'some configuration exhibits improvements above 15%' should identify the specific compression method, dataset, and configuration for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We provide point-by-point responses to the major comments below, indicating revisions where we have updated the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the description of the affine combination ((1-α)·f_orig + α·f_comp) does not specify the combination point (logits vs. intermediate features) or the functional form of the ramp schedule α(t). Without these, it is impossible to verify whether the same blending rule remains stable and technique-agnostic when the compressed model changes structure or precision, which is load-bearing for the unified-framework claim.

Authors: We agree with this observation and have revised the abstract to specify that the affine combination is performed on the logits and that α(t) is implemented as a linear ramp schedule. These details, along with the justification for their choice to maintain stability and technique-agnosticism, are elaborated in Section 3 of the paper. revision: yes
Referee: [Experimental section] Experimental section (as referenced in the abstract's performance claims): the abstract asserts 'consistent accuracy improvements across vision and language benchmarks' and 'gains exceed 1%' (some >15%) but supplies no dataset sizes, number of runs, error bars, or statistical tests. This prevents verification of whether the reported gains are reliable or could be artifacts of single-run evaluation.

Authors: The full paper in Section 4 includes dataset sizes, results from multiple runs with error bars, and statistical comparisons. To improve the abstract, we have added a sentence noting that the gains are consistent across multiple runs with reported standard deviations. This addresses the concern while maintaining abstract brevity. revision: yes
Referee: [Method description] Method description: the central claim presupposes that a single, technique-agnostic schedule for α exists that avoids accuracy cliffs or new instabilities. No ablation on schedule sensitivity or per-technique retuning is referenced, which directly tests the unification premise.

Authors: We have added an ablation study on different schedules for α in the revised manuscript, demonstrating that the linear schedule is robust and does not require per-technique retuning for the compression methods tested. This is now referenced in the method section to strengthen the unification claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity; VCON defined independently and evaluated externally

full rationale

The paper introduces VCON by defining an affine combination of original and compressed model outputs during joint fine-tuning, with the original contribution progressively reduced via a ramped parameter. This construction is presented as a new technique and then tested on external CV and NLP benchmarks across pruning, quantization, and low-rank methods. No equation or claim reduces the performance gains to a self-definition, a fitted input relabeled as prediction, or a load-bearing self-citation chain. The derivation remains self-contained because the method is specified first and its benefits are measured against independent baselines rather than derived tautologically from its own parameters.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that gradual output blending during fine-tuning produces stable adaptation; the blending schedule itself functions as an implicit free parameter whose specific form is not detailed in the abstract.

free parameters (1)

contribution schedule
The rate and functional form governing how quickly the original model's weight decreases and the compressed model's weight increases must be chosen or tuned for each task and compression method.

axioms (1)

domain assumption Affine combination of model outputs during fine-tuning permits stable adaptation across compression techniques
Invoked when the paper states that the parallel execution and gradual shift mitigates accuracy degradation.

pith-pipeline@v0.9.0 · 5759 in / 1308 out tokens · 41582 ms · 2026-05-18T08:35:27.510378+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 14 internal anchors

[1]

Deep learning,

Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, no. 7553, pp. 436–444, May 2015. doi:10.1038/nature14539 NIKIFOROSet al.: V ANISHING CONTRIBUTIONS: A UNIFIED APPROACH TO SMOOTHLY TRANSITION NEURAL MODELS INTO COMPRESSED FORM 9 TABLE VI TRAINING CONFIGURATIONS AND HYPERPARAMETERS Configuration ViT-T/16, ViT-S/16, ViT-B/16 BERT, dist...

work page doi:10.1038/nature14539 2015
[2]

A survey on deploying mobile deep learning applications: A systemic and technical perspective,

Y . Wang, J. Wang, W. Zhang, Y . Zhan, S. Guo, Q. Zheng, and X. Wang, “A survey on deploying mobile deep learning applications: A systemic and technical perspective,”Digital Communications and Networks, vol. 8, no. 1, pp. 1–17, Feb. 2022. doi:10.1016/j.dcan.2021.06.001

work page doi:10.1016/j.dcan.2021.06.001 2022
[3]

A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommenda- tions,

H. Cheng, M. Zhang, and J. Q. Shi, “A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommenda- tions,” Aug. 2024. doi:10.48550/arXiv.2308.06767

work page doi:10.48550/arxiv.2308.06767 2024
[4]

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers,

Z. Allen-Zhu, Y . Li, and Y . Liang, “Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers,” in Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., 2019

work page 2019
[5]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

S. Han, H. Mao, and W. J. Dally, “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding,” Feb. 2016. doi:10.48550/arXiv.1510.00149

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1510.00149 2016
[6]

Methods for Pruning Deep Neu- ral Networks,

S. Vadera and S. Ameen, “Methods for Pruning Deep Neu- ral Networks,”IEEE Access, vol. 10, pp. 63 280–63 300, 2022. doi:10.1109/ACCESS.2022.3182659

work page doi:10.1109/access.2022.3182659 2022
[7]

A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions,

R. Mishra, H. P. Gupta, and T. Dutta, “A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions,” Oct

work page
[8]

doi:10.48550/arXiv.2010.03954

work page doi:10.48550/arxiv.2010.03954 2010
[9]

A Survey on Methods and Theories of Quantized Neural Networks

Y . Guo, “A Survey on Methods and Theories of Quantized Neural Networks,” Dec. 2018. doi:10.48550/arXiv.1808.04752

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1808.04752 2018
[10]

[2112.06126] Neural Network Quantization for Efficient Inference: A Survey,

“[2112.06126] Neural Network Quantization for Efficient Inference: A Survey,” https://arxiv.org/abs/2112.06126

work page arXiv
[11]

A Multiply-And-Max/Min Neuron Paradigm for Aggressively Prunable Deep Neural Networks,

L. Prono, P. Bich, C. Boretti, M. Mangia, F. Pareschi, R. Rovatti, and G. Setti, “A Multiply-And-Max/Min Neuron Paradigm for Aggressively Prunable Deep Neural Networks,”IEEE Transactions on Neural Net- works and Learning Systems, vol. 36, no. 8, pp. 14 414–14 427, Aug

work page
[12]

doi:10.1109/TNNLS.2025.3527644

work page doi:10.1109/tnnls.2025.3527644 2025
[13]

Pruning neural networks without any data by iteratively conserving synaptic flow,

H. Tanaka, D. Kunin, D. L. Yamins, and S. Ganguli, “Pruning neural networks without any data by iteratively conserving synaptic flow,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 6377–6389

work page 2020
[14]

Adaptive Iterative Pruning for Accelerating Deep Neural Networks,

Y . Gordienko, Y . Kochura, V . Taran, N. Gordienko, A. Bugaiov, and S. Stirenko, “Adaptive Iterative Pruning for Accelerating Deep Neural Networks,” in2019 XIth International Scientific and Practical Confer- ence on Electronics and Information Technologies (ELIT), Sep. 2019, pp. 173–178. doi:10.1109/ELIT.2019.8892346

work page doi:10.1109/elit.2019.8892346 2019
[15]

DropNet: Reducing Neural Network Com- plexity via Iterative Pruning,

C. M. J. Tan and M. Motani, “DropNet: Reducing Neural Network Com- plexity via Iterative Pruning,” inProceedings of the 37th International Conference on Machine Learning. PMLR, Nov. 2020, pp. 9356–9366

work page 2020
[16]

Progressive Channel-Shrinking Network,

J. Pan, S. Yang, L. G. Foo, Q. Ke, H. Rahmani, Z. Fan, and J. Liu, “Progressive Channel-Shrinking Network,”IEEE Transactions on Mul- timedia, vol. 26, pp. 2016–2026, 2024. doi:10.1109/TMM.2023.3291197

work page doi:10.1109/tmm.2023.3291197 2016
[17]

Towards Higher Ranks via Adversarial Weight Pruning,

Y . Tian, H. Chen, T. Guo, C. Xu, and Y . Wang, “Towards Higher Ranks via Adversarial Weight Pruning,” Nov. 2023. doi:10.48550/arXiv.2311.17493

work page doi:10.48550/arxiv.2311.17493 2023
[18]

Embedding Com- pression with Isotropic Iterative Quantization,

S. Liao, J. Chen, Y . Wang, Q. Qiu, and B. Yuan, “Embedding Com- pression with Isotropic Iterative Quantization,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, pp. 8336–8343, Apr. 2020. doi:10.1609/aaai.v34i05.6350

work page doi:10.1609/aaai.v34i05.6350 2020
[19]

Gradient-Aware In- cremental Network Quantization,

J. Meng, Z. Qu, W. Zhou, S. Hu, and B. Ye, “Gradient-Aware In- cremental Network Quantization,” inNetwork and Parallel Computing, X. Chen, G. Min, D. Guo, X. Xie, and L. Pu, Eds. Singapore: Springer Nature, 2025, pp. 430–441. doi:10.1007/978-981-96-2864-3 34

work page doi:10.1007/978-981-96-2864-3 2025
[20]

Iterative Low-Rank Approximation for CNN Com- pression,

M. Kholiavchenko, “Iterative Low-Rank Approximation for CNN Com- pression,” Nov. 2019. doi:10.48550/arXiv.1803.08995

work page doi:10.48550/arxiv.1803.08995 2019
[21]

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

Y . He, P. Liu, Z. Wang, Z. Hu, and Y . Yang, “Filter Pruning via Geo- metric Median for Deep Convolutional Neural Networks Acceleration,” Jul. 2019. doi:10.48550/arXiv.1811.00250

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1811.00250 2019
[22]

NISP: Pruning Networks using Neuron Importance Score Propagation

R. Yu, A. Li, C.-F. Chen, J.-H. Lai, V . I. Morariu, X. Han, M. Gao, C.-Y . Lin, and L. S. Davis, “NISP: Pruning Networks using Neuron Impor- tance Score Propagation,” Mar. 2018. doi:10.48550/arXiv.1711.05908

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1711.05908 2018
[23]

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

J.-H. Luo, J. Wu, and W. Lin, “ThiNet: A Filter Level Prun- ing Method for Deep Neural Network Compression,” Jul. 2017. doi:10.48550/arXiv.1707.06342

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.06342 2017
[24]

Pruning Filters for Efficient ConvNets

H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning Fil- ters for Efficient ConvNets,” Mar. 2017. doi:10.48550/arXiv.1608.08710

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1608.08710 2017
[25]

A Simple and Effective Pruning Approach for Large Language Models

M. Sun, Z. Liu, A. Bair, and J. Z. Kolter, “A Simple and Ef- fective Pruning Approach for Large Language Models,” May 2024. doi:10.48550/arXiv.2306.11695

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.11695 2024
[26]

Group Fisher Pruning for Practical Network Compression,

L. Liu, S. Zhang, Z. Kuang, A. Zhou, J.-H. Xue, X. Wang, Y . Chen, W. Yang, Q. Liao, and W. Zhang, “Group Fisher Pruning for Practical Network Compression,” Aug. 2021. doi:10.48550/arXiv.2108.00708

work page doi:10.48550/arxiv.2108.00708 2021
[27]

Manifold Regularized Dynamic Network Pruning,

Y . Tang, Y . Wang, Y . Xu, Y . Deng, C. Xu, D. Tao, and C. Xu, “Manifold Regularized Dynamic Network Pruning,” Mar. 2021. doi:10.48550/arXiv.2103.05861

work page doi:10.48550/arxiv.2103.05861 2021
[28]

Sparsegpt: Massive language models can be accurately pruned in one-shot, 2023 b

E. Frantar and D. Alistarh, “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot,” Mar. 2023. doi:10.48550/arXiv.2301.00774

work page doi:10.48550/arxiv.2301.00774 2023
[29]

Predicting Parameters in Deep Learning

M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. de Freitas, “Predicting Parameters in Deep Learning,” Oct. 2014. doi:10.48550/arXiv.1306.0543

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1306.0543 2014
[30]

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

V . Lebedev, Y . Ganin, M. Rakhuba, I. Oseledets, and V . Lempitsky, “Speeding-up Convolutional Neural Networks Using Fine-tuned CP- Decomposition,” Apr. 2015. doi:10.48550/arXiv.1412.6553

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1412.6553 2015
[31]

Constrained Optimization Based Low-Rank Ap- proximation of Deep Neural Networks,

C. Li and C. J. R. Shi, “Constrained Optimization Based Low-Rank Ap- proximation of Deep Neural Networks,” inProceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 732–747

work page 2018
[32]

Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets,

T. N. Sainath, B. Kingsbury, V . Sindhwani, E. Arisoy, and B. Ramabhad- ran, “Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets,” in2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 6655–6659. doi:10.1109/ICASSP.2013.6638949

work page doi:10.1109/icassp.2013.6638949 2013
[33]

Accelerating Very Deep Convo- lutional Networks for Classification and Detection,

X. Zhang, J. Zou, K. He, and J. Sun, “Accelerating Very Deep Convo- lutional Networks for Classification and Detection,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 1943– 1955, Oct. 2016. doi:10.1109/TPAMI.2015.2502579

work page doi:10.1109/tpami.2015.2502579 1943
[34]

Tensorizing Neural Networks

A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, “Tensorizing Neural Networks,” Dec. 2015. doi:10.48550/arXiv.1509.06569

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1509.06569 2015
[35]

AdaBin: Improving Bi- nary Neural Networks with Adaptive Binary Sets,

Z. Tu, X. Chen, P. Ren, and Y . Wang, “AdaBin: Improving Bi- nary Neural Networks with Adaptive Binary Sets,” Oct. 2022. doi:10.48550/arXiv.2208.08084

work page doi:10.48550/arxiv.2208.08084 2022
[36]

BiPer: Binary Neural Networks using a Periodic Function,

E. Vargas, C. V . Correa, C. Hinojosa, and H. Arguello, “BiPer: Binary Neural Networks using a Periodic Function,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5684–5693

work page 2024
[37]

TernaryLLM: Ternarized Large Language Model,

T. Chen, Z. Li, W. Xu, Z. Zhu, D. Li, L. Tian, E. Barsoum, P. Wang, and J. Cheng, “TernaryLLM: Ternarized Large Language Model,” Jun

work page
[38]

doi:10.48550/arXiv.2406.07177

work page doi:10.48550/arxiv.2406.07177
[39]

TerViT: An Efficient Ternary Vision Transformer,

S. Xu, Y . Li, T. Ma, B. Zeng, B. Zhang, P. Gao, and J. Lv, “TerViT: An Efficient Ternary Vision Transformer,” Jan. 2022. doi:10.48550/arXiv.2201.08050

work page doi:10.48550/arxiv.2201.08050 2022
[40]

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,

M. Rastegari, V . Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,” NIKIFOROSet al.: V ANISHING CONTRIBUTIONS: A UNIFIED APPROACH TO SMOOTHLY TRANSITION NEURAL MODELS INTO COMPRESSED FORM 10 inComputer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer Inte...

work page doi:10.1007/978-3-319-46493-0 2016
[41]

Low-bit Quantization of Neural Networks for Efficient Inference

Y . Choukroun, E. Kravchik, F. Yang, and P. Kisilev, “Low-bit Quan- tization of Neural Networks for Efficient Inference,” Mar. 2019. doi:10.48550/arXiv.1902.06822

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1902.06822 2019
[42]

Only Train Once: A One-Shot Neural Network Training And Pruning Framework,

T. Chen, B. Ji, T. Ding, B. Fang, G. Wang, Z. Zhu, L. Liang, Y . Shi, S. Yi, and X. Tu, “Only Train Once: A One-Shot Neural Network Training And Pruning Framework,” inAdvances in Neural Information Processing Systems, vol. 34. Curran Associates, Inc., 2021, pp. 19 637–19 651

work page 2021
[43]

SLiM: One- shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression,

M. Mozaffari, A. Yazdanbakhsh, and M. M. Dehnavi, “SLiM: One- shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression,” Aug. 2025. doi:10.48550/arXiv.2410.09615

work page doi:10.48550/arxiv.2410.09615 2025
[44]

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization,

P. Hu, X. Peng, H. Zhu, M. M. S. Aly, and J. Lin, “OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization,” May 2022. doi:10.48550/arXiv.2205.11141

work page doi:10.48550/arxiv.2205.11141 2022
[45]

Iterative clustering pruning for convolutional neural networks,

J. Chang, Y . Lu, P. Xue, Y . Xu, and Z. Wei, “Iterative clustering pruning for convolutional neural networks,”Knowledge-Based Systems, vol. 265, p. 110386, Apr. 2023. doi:10.1016/j.knosys.2023.110386

work page doi:10.1016/j.knosys.2023.110386 2023
[46]

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM

S. Ye, X. Feng, T. Zhang, X. Ma, S. Lin, Z. Li, K. Xu, W. Wen, S. Liu, J. Tang, M. Fardad, X. Lin, Y . Liu, and Y . Wang, “Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quanti- zation Rates using ADMM,” Mar. 2019. doi:10.48550/arXiv.1903.09769

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1903.09769 2019
[47]

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

“[1308.3432] Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation,” https://arxiv.org/abs/1308.3432

work page internal anchor Pith review Pith/arXiv arXiv
[48]

Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks,

M. Huh, B. Cheung, P. Agrawal, and P. Isola, “Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks,” May 2023. doi:10.48550/arXiv.2305.08842

work page doi:10.48550/arxiv.2305.08842 2023
[49]

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training,

Y . Hu, J. Zhu, and J. Chen, “S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training,” Dec. 2024. doi:10.48550/arXiv.2409.09099

work page doi:10.48550/arxiv.2409.09099 2024
[50]

Learning Multiple Layers of Features from Tiny Im- ages,

A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Im- ages,” 2009

work page 2009
[51]

, author Dong, W

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848

work page doi:10.1109/cvpr.2009.5206848 2009
[52]

Transforming Question Answering Datasets Into Natural Language Inference Datasets

D. Demszky, K. Guu, and P. Liang, “Transforming Question Answer- ing Datasets Into Natural Language Inference Datasets,” Sep. 2018. doi:10.48550/arXiv.1809.02922

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1809.02922 2018
[53]

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference,

A. Williams, N. Nangia, and S. Bowman, “A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference,” inProceedings of the 2018 Conference of the North American Chapter of the Asso- ciation for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), M. Walker, H. Ji, and A. Stent, Eds. New Orleans, Louisiana: Ass...

work page doi:10.18653/v1/n18-1101 2018