SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer

classification cs.CV cs.AI

keywords accuracy squeezenet smaller dnns less model advantages alexnet-level

Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet
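
The size figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: it assumes an uncompressed AlexNet model of roughly 240 MB (a figure not stated in this abstract) and that a 50x parameter reduction translates directly into a 50x smaller file at the same numeric precision. The helper name `reduced_size_mb` is hypothetical.

```python
# Assumption (not stated in the abstract): AlexNet's uncompressed model
# is about 240 MB when stored with 32-bit weights.
ALEXNET_MB = 240.0


def reduced_size_mb(baseline_mb: float, reduction_factor: float) -> float:
    """Return the size of a model `reduction_factor` times smaller than the baseline."""
    return baseline_mb / reduction_factor


# 50x fewer parameters, assuming the same bits per parameter as the baseline.
squeezenet_mb = reduced_size_mb(ALEXNET_MB, 50)

# 510x smaller overall once model compression is applied.
compressed_mb = reduced_size_mb(ALEXNET_MB, 510)

print(f"SqueezeNet, uncompressed: ~{squeezenet_mb:.1f} MB")
print(f"SqueezeNet, compressed:   ~{compressed_mb:.2f} MB")
```

Under these assumptions the compressed size comes out below the 0.5 MB threshold quoted in the title and abstract.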

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
cs.CR 2026-04 unverdicted novelty 7.0
Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
cs.CV 2017-04 accept novelty 7.0
MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.

DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
cs.LG 2026-05 unverdicted novelty 6.0
DiBA factors weight matrices into diagonal-binary-diagonal-binary-diagonal form to cut matrix-vector multiplies from mn to m+k+n operations and improves accuracy on DistilBERT and audio transformer tasks after replacement.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
cs.CR 2026-05 conditional novelty 6.0
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

Homodyne Photonic Tensor Processor exceeds 1,000-TOPS
cs.ET 2026-04 unverdicted novelty 6.0
A homodyne photonic tensor processor using TFLN transmitters and Si/SiN circuits demonstrates 1,000-6,000 TOPS throughput with 6-7 bit accuracy at up to 120 Gbaud/s clock rates.

Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition
cs.AR 2026-04 unverdicted novelty 6.0
A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.

StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
cs.CV 2026-04 unverdicted novelty 6.0
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.

CNN-ViT Fusion with Adaptive Attention Gate for Brain Tumor MRI Classification: A Hybrid Deep Learning Model
cs.CV 2026-04 unverdicted novelty 5.0
Hybrid CNN-ViT with adaptive attention gate achieves 97.6% accuracy on brain tumor MRI classification, outperforming baselines.

YOLOv4: Optimal Speed and Accuracy of Object Detection
cs.CV 2020-04 unverdicted novelty 5.0
YOLOv4 achieves 43.5% AP (65.7% AP50) on MS COCO at ~65 FPS on Tesla V100 by integrating WRC, CSP, CmBN, SAT, Mish activation, Mosaic augmentation, DropBlock, and CIoU loss.

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
cs.CV 2026-05 unverdicted novelty 4.0
A comprehensive survey of edge deep learning in computer vision and medical diagnostics that presents a novel categorization of hardware platforms by performance and usage scenarios.

Are Data Augmentation and Segmentation Always Necessary? Insights from COVID-19 X-Rays and a Methodology Thereof
cs.CV 2026-04 unverdicted novelty 3.0
Lung segmentation is necessary for reliable COVID-19 X-ray classification while excessive data augmentation leads to overfitting, with the proposed SDL-COVID method reaching 95.21% precision and low false negatives.

Vision-Based Lane Following and Traffic Sign Recognition for Resource-Constrained Autonomous Vehicles
cs.CV 2026-04 conditional novelty 3.0
A threshold-based lane detector with perspective warp and histogram curvature plus EfficientNet-B0 achieves 3.16% max lane offset RMSE and 90% on-device sign accuracy while running real-time on resource-limited hardware.

2D Pre-Training for 3D Pose Estimation
cs.CV 2026-04 unverdicted novelty 3.0
2D pre-training for 3D human pose estimation yields lower error and higher efficiency than 3D-only training, reaching MPJPE below 64.5 mm on standard benchmarks.

A Transfer Learning Evaluation of Deep Neural Networks for Image Classification
cs.CV 2026-05 unverdicted novelty 2.0
Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics.