SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet
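The abstract's size claims rest on simple arithmetic. Uncompressed model size is roughly the parameter count times the bytes stored per weight, so a 50x cut in parameters gives about a 50x cut in size at the same precision. Storing fewer bits per weight shrinks the model further. The sketch below makes several simplifying assumptions. It assumes dense storage with no metadata or sparse-index overhead. It uses a baseline of 60 million parameters, the figure commonly quoted for AlexNet, and derives the small model's count from the 50x ratio, so neither number is a measured SqueezeNet figure. The 8-bit case is purely illustrative: the abstract's 510x result relies on model compression techniques that this sketch does not model.

```python
def model_size_mb(num_params: int, bits_per_weight: int = 32) -> float:
    """Dense storage size in megabytes (10**6 bytes), ignoring metadata and sparse indices."""
    return num_params * bits_per_weight / 8 / 1e6


def size_ratio(baseline_mb: float, small_mb: float) -> float:
    """How many times smaller `small_mb` is than `baseline_mb`."""
    return baseline_mb / small_mb


# Illustrative counts: a ~60M-parameter baseline and a model with 50x fewer parameters.
baseline_params = 60_000_000
small_params = baseline_params // 50  # 1_200_000

baseline_fp32 = model_size_mb(baseline_params)                  # 32-bit floats
small_fp32 = model_size_mb(small_params)                        # same precision, 50x fewer weights
small_8bit = model_size_mb(small_params, bits_per_weight=8)     # hypothetical 8-bit storage

print(f"baseline fp32: {baseline_fp32:.1f} MB")
print(f"small fp32:    {small_fp32:.1f} MB ({size_ratio(baseline_fp32, small_fp32):.0f}x smaller)")
print(f"small 8-bit:   {small_8bit:.1f} MB ({size_ratio(baseline_fp32, small_8bit):.0f}x smaller)")
```

At equal precision, the size ratio simply equals the parameter ratio. Every further halving of bits per weight doubles the overall reduction, which is why compression compounds with a smaller architecture.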
Forward citations
Cited by 38 Pith papers
-
Fast and Lightweight Backdoor Detection via Head Random Probing
HTell detects backdoors by random probing of the model head, reporting 99.03% true positive rate and 2.11% false positive rate at 12.69 ms per model on a benchmark of over 6700 models.
-
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
-
Beyond Assumptions: Measuring Federated Learning over Real 5G Networks
Real 5G testbed experiments show consistent stragglers in 70% of federated learning trials due to communication delays, challenging common wireless FL assumptions.
-
The Indirect Convolution Algorithm
The Indirect Convolution algorithm avoids im2col by using an indirection buffer, reducing memory overhead proportionally to input channels and outperforming GEMM-based methods by up to 62% for convolutions requiring t...
-
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
EfficientNet scales network depth, width, and resolution uniformly via a compound coefficient to deliver state-of-the-art accuracy and efficiency on image classification.
-
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.
-
Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot
Hybrid JEMs at intermediate generative-discriminative balance maximize human alignment on perceptual similarity, gloss, uncertainty, robustness, cue conflict, and feature attribution benchmarks.
-
AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems
AutoMCU uses feasibility-first LLM multi-agent coordination to automate MCU-constrained neural network design, delivering competitive accuracy on CIFAR-10/100 in 1-2 hours versus hundreds of GPU hours for prior HW-NAS...
-
DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
DiBA factors weight matrices into diagonal-binary-diagonal-binary-diagonal form to cut matrix-vector multiplies from mn to m+k+n operations and improves accuracy on DistilBERT and audio transformer tasks after replacement.
-
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
-
Homodyne Photonic Tensor Processor exceeds 1,000-TOPS
A homodyne photonic tensor processor using TFLN transmitters and Si/SiN circuits demonstrates 1,000-6,000 TOPS throughput with 6-7 bit accuracy at symbol rates of up to 120 Gbaud.
-
Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition
A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.
-
StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
-
IGen: Scalable Data Generation for Robot Learning from Open-World Images
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
-
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
Proposes an ACH module with differentiable sampling and softsign normalization for efficient feature expansion, integrated via NAS into Hadaptive-Net, which the authors claim achieves SOTA accuracy/speed trade-offs on image classification.
-
Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
NoNN partitions a teacher model into disjoint compressed students via network science for distributed IoT inference, matching teacher accuracy with far lower per-device memory and communication.
-
A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
A unified IR plus ML-based scheduling for CNN inference on multi-vendor integrated GPUs matches or exceeds vendor libraries (up to 1.62x) on image models while supporting more models.
-
Learning Objectness from Sonar Images for Class-Independent Object Detection
A fully convolutional network regresses objectness from sonar images to achieve 96% recall using only 100 proposals per image, outperforming EdgeBoxes and Selective Search in efficiency.
-
CNN-ViT Fusion with Adaptive Attention Gate for Brain Tumor MRI Classification: A Hybrid Deep Learning Model
Hybrid CNN-ViT with adaptive attention gate achieves 97.6% accuracy on brain tumor MRI classification, outperforming baselines.
-
Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems
XaaS decouples explanation generation from model inference via a distributed cache, verification protocol, and adaptive engine, achieving 38% lower latency in three edge-AI use cases.
-
YOLOv4: Optimal Speed and Accuracy of Object Detection
YOLOv4 achieves 43.5% AP (65.7% AP50) on MS COCO at ~65 FPS on Tesla V100 by integrating WRC, CSP, CmBN, SAT, Mish activation, Mosaic augmentation, DropBlock, and CIoU loss.
-
Neuron ranking -- an informed way to condense convolutional neural networks architecture
Shapley value and variational importance switch methods produce consistent rankings of filter importance in CNNs, enabling compression and interpretability.
-
One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance Tiers
Proposes a Tolerance Tiers architecture for MLaaS that lets consumers select accuracy-latency trade-offs, and shows it outperforming single-version deployment on ASR and vision workloads.
-
SwiftNet: Using Graph Propagation as Meta-knowledge to Search Highly Representative Neural Architectures
GRAM meta-graph search plus structure pruning yields SwiftNet models with 2.15x higher accuracy density than MobileNet-V2 and 26x lower search cost than FBNet on ImageNet edge constraints.
-
Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
A comprehensive survey of edge deep learning in computer vision and medical diagnostics that presents a novel categorization of hardware platforms by performance and usage scenarios.
-
Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems
Experiments show that shifted-ReLU layers can replace batch-normalization in single-bit-weight wide residual networks on CIFAR-10/100 and ImageNet without consistent accuracy penalty.
-
What does it mean to understand a neural network?
Simple training code produces complex neural networks, suggesting that brain learning rules may be easier to understand than mature brain properties and that neuroscience should shift focus accordingly.
-
FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks
FusionAccel is a scalable, runtime-reconfigurable RTL CNN inference accelerator implemented and verified on Xilinx Spartan-6 FPGA with results identical to Caffe-CPU and designed for ASIC migration.
-
Are Data Augmentation and Segmentation Always Necessary? Insights from COVID-19 X-Rays and a Methodology Thereof
Lung segmentation is necessary for reliable COVID-19 X-ray classification while excessive data augmentation leads to overfitting, with the proposed SDL-COVID method reaching 95.21% precision and low false negatives.
-
Vision-Based Lane Following and Traffic Sign Recognition for Resource-Constrained Autonomous Vehicles
A threshold-based lane detector with perspective warp and histogram curvature plus EfficientNet-B0 achieves 3.16% max lane offset RMSE and 90% on-device sign accuracy while running real-time on resource-limited hardware.
-
2D Pre-Training for 3D Pose Estimation
2D pre-training for 3D human pose estimation yields lower error and higher efficiency than 3D-only training, reaching MPJPE below 64.5 mm on standard benchmarks.
-
Slim-CNN: A Light-Weight CNN for Face Attribute Prediction
Slim-Net uses stacked Slim Modules of depthwise separable convolutions to predict face attributes on CelebA at 91.24% accuracy with at least 25 times fewer parameters than comparable models.
-
Sense Smarter, Think Better: Edge Perception for Next-Generation Networks
A structured survey of edge perception that integrates sensing modalities, edge AI, task-driven designs, and open challenges for 6G networks.
-
A Transfer Learning Evaluation of Deep Neural Networks for Image Classification
Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics.
-
Analysis of Hyperparameter Optimization Effects on Lightweight Deep Models for Real-Time Image Classification
Hyperparameter tuning on seven lightweight models trained on a 90k-image ImageNet subset yields 1.5-3.5% top-1 accuracy gains, with RepVGG-A2 and MobileNetV3-L achieving sub-5ms latency and over 9800 FPS on GPU.
-
NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)
A literature survey of NeRF and neural field methods from 2020-2025, organized by architecture and application taxonomies with benchmarks and dataset overviews, covering both pre- and post-Gaussian Splatting periods.
-
A Comparison of Super-Resolution and Nearest Neighbors Interpolation Applied to Object Detection on Satellite Data
Nearest-neighbor interpolation matches multi-scale deep super-resolution performance for vehicle detection on 4x-upscaled xView satellite imagery, with a 0.0002 AP difference.
-
Deep Learning in the Automotive Industry: Recent Advances and Application Examples
An overview of deep learning applications and challenges in the automotive industry, covering ADAS, automated driving, virtual sensing, and data-driven development.