hub

Pruning Convolutional Neural Networks for Resource Efficient Inference

URL https://mistral · 2016 · cs.LG · arXiv 1611.06440

15 Pith papers cite this work. Polarity classification is still indexing.

abstract

We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation - a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102), relying only on first-order gradient information. We also show that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the large-scale ImageNet dataset to emphasize the flexibility of our approach.
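
The first-order Taylor criterion named in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so the scoring rule here is an assumption: each feature map is scored by the absolute value of the mean of activation times gradient over its spatial positions, averaged over examples, and the lowest-scoring map is the next greedy pruning candidate. The function names `taylor_scores` and `least_important` are hypothetical, and the fine-tuning step that the abstract interleaves with pruning is omitted.

```python
from typing import List, Sequence

# Nested lists indexed as [example][channel][flattened spatial position].
Tensor3 = Sequence[Sequence[Sequence[float]]]


def taylor_scores(activations: Tensor3, gradients: Tensor3) -> List[float]:
    """Score each feature map (channel) with an assumed first-order Taylor rule.

    For every example, take the mean over spatial positions of
    activation * d(cost)/d(activation), take its absolute value,
    then average that value over all examples.
    """
    n_examples = len(activations)
    n_channels = len(activations[0])
    scores = [0.0] * n_channels
    for acts_n, grads_n in zip(activations, gradients):
        for c in range(n_channels):
            a, g = acts_n[c], grads_n[c]
            mean_prod = sum(ai * gi for ai, gi in zip(a, g)) / len(a)
            scores[c] += abs(mean_prod)
    return [s / n_examples for s in scores]


def least_important(scores: Sequence[float]) -> int:
    """Return the index of the lowest-scoring feature map (greedy candidate)."""
    return min(range(len(scores)), key=lambda c: scores[c])


# Toy data: 2 examples, 3 channels, 2 spatial positions each (made-up values).
acts = [
    [[1.0, 2.0], [0.0, 0.0], [1.0, 1.0]],
    [[1.0, 0.0], [3.0, 1.0], [2.0, 2.0]],
]
grads = [
    [[0.5, -0.5], [1.0, 1.0], [0.1, 0.1]],
    [[1.0, 1.0], [0.0, 0.0], [-0.1, -0.1]],
]

scores = taylor_scores(acts, grads)
candidate = least_important(scores)
```

In this toy data the second channel has either zero activation or zero gradient in every example, so its score is zero and it is selected first. A weight-norm criterion, one of the baselines the abstract mentions, would rank channels from the kernel weights alone and would not see this.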

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

TPV: Parameter Perturbations Through the Lens of Test Prediction Variance

stat.ML · 2025-12-11 · unverdicted · novelty 7.0

TPV measures first-order sensitivity of model outputs to parameter perturbations, unifies robustness analysis under one lens, proves train-to-test convergence in overparameterized limits, and enables label-free pruning and model selection applications.

Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning

cs.LG · 2024-04-09 · unverdicted · novelty 7.0

A hybrid sparse Byzantine attack using network pruning insights and slow accumulation bypasses eight state-of-the-art defenses in federated learning simulations.

NetTailor: Tuning the Architecture, Not Just the Weights

cs.CV · 2019-06-29 · unverdicted · novelty 7.0

NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for simpler tasks.

MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other methods on image translation benchmarks.

Post-Training Pruning for Diffusion Transformers

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

DiT-Pruning introduces an energy-based saliency metric balancing weights and activations plus clustering-aware granularity for post-training pruning of DiTs, showing near-zero CLIP score degradation at 50% sparsity on FLUX.1-dev.

CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.

DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.

Deep-OFDM: Neural Modulation for High Mobility

cs.IT · 2025-06-21 · unverdicted · novelty 6.0

A CNN modulator jointly trained with a neural receiver spreads information across local time-frequency neighborhoods in OFDM, breaking QAM rotational symmetry to support sparse or zero pilots under high Doppler.

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

cs.LG · 2023-06-24 · unverdicted · novelty 6.0

H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.

COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning

cs.CV · 2019-06-25 · unverdicted · novelty 6.0

COP prunes CNN filters using correlation-based importance with global normalization and dual regularization on parameter quantity and FLOPs to enable customized compression.

ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing

cs.CL · 2026-05-09 · conditional · novelty 6.0

ReST-KV formulates KV eviction as layer-wise output reconstruction optimization with spatial-temporal smoothing, outperforming baselines by 2.58% on LongBench and 15.2% on RULER while cutting decoding latency by 10.61x at 128k context.

Efficient Remote Sensing Instance Segmentation with Linear-Time State Space Distilled Visual Foundation Models

cs.CV · 2026-06-24 · unverdicted · novelty 5.0

RS4D distills ViT knowledge into SSM backbones for remote sensing instance segmentation, delivering 8x fewer parameters and 9x fewer FLOPs than ViT methods while matching or exceeding accuracy on SSDD, WHU, and NWPU datasets.

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

LoRA-Over injects auxiliary parameters into low-rank adapters during training and decomposes them back into standard LoRA at inference, with static or dynamic scheduling to allocate extra capacity where needed, yielding better generalization than vanilla LoRA on GLUE, MT-Bench, GSM8K and HumanEval.

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

cs.LG · 2023-09-29 · unverdicted · novelty 5.0

Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.

Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference

cs.LG · 2026-04-10 · unverdicted · novelty 5.0

SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimodal edge models.

citing papers explorer

Showing 15 of 15 citing papers.

TPV: Parameter Perturbations Through the Lens of Test Prediction Variance stat.ML · 2025-12-11 · unverdicted · none · ref 2 · internal anchor
Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning cs.LG · 2024-04-09 · unverdicted · none · ref 75 · internal anchor
NetTailor: Tuning the Architecture, Not Just the Weights cs.CV · 2019-06-29 · unverdicted · none · ref 43 · internal anchor
MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation cs.CL · 2026-04-18 · unverdicted · none · ref 16
Post-Training Pruning for Diffusion Transformers cs.CV · 2026-07-01 · unverdicted · none · ref 10 · internal anchor
CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry cs.LG · 2026-06-25 · unverdicted · none · ref 24 · internal anchor
DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation cs.LG · 2026-05-30 · unverdicted · none · ref 102 · internal anchor
Deep-OFDM: Neural Modulation for High Mobility cs.IT · 2025-06-21 · unverdicted · none · ref 27 · internal anchor
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models cs.LG · 2023-06-24 · unverdicted · none · ref 59 · internal anchor
COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning cs.CV · 2019-06-25 · unverdicted · none · ref 14 · internal anchor
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing cs.CL · 2026-05-09 · conditional · none · ref 16
Efficient Remote Sensing Instance Segmentation with Linear-Time State Space Distilled Visual Foundation Models cs.CV · 2026-06-24 · unverdicted · none · ref 65 · internal anchor
Strategic Over-Parameterization for Generalizable Low-Rank Adaptation cs.LG · 2026-05-15 · unverdicted · none · ref 21 · internal anchor
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs cs.LG · 2023-09-29 · unverdicted · none · ref 39 · internal anchor
Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference cs.LG · 2026-04-10 · unverdicted · none · ref 28
