Going Deeper with Convolutions

Andrew Rabinovich; Christian Szegedy; Dragomir Anguelov; Dumitru Erhan; Pierre Sermanet; Scott Reed; Vincent Vanhoucke; Wei Liu; Yangqing Jia

arxiv: 1409.4842 · v1 · pith:XROIYAJVnew · submitted 2014-09-17 · 💻 cs.CV


classification 💻 cs.CV

keywords: network · architecture · classification · deep · detection · ilsvrc · quality · achieved


We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22-layer deep network, the quality of which is assessed in the context of classification and detection.
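The abstract does not spell out how width can grow while the computational budget stays fixed. As a rough illustration only, the sketch below counts multiply-accumulates for a hypothetical multi-scale module with parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch, once with 1x1 channel reductions in front of the larger filters and once without. The function names, the 28x28x192 input, and all branch widths are assumptions chosen for illustration; none of them are stated on this page.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a k x k convolution, stride 1, 'same' padding."""
    return h * w * c_in * c_out * k * k


def reduced_module_macs(h, w, c_in, n1, r3, n3, r5, n5, pool_proj):
    """Four parallel branches; 1x1 reductions precede the 3x3 and 5x5 filters."""
    branch_1x1 = conv_macs(h, w, c_in, n1, 1)
    branch_3x3 = conv_macs(h, w, c_in, r3, 1) + conv_macs(h, w, r3, n3, 3)
    branch_5x5 = conv_macs(h, w, c_in, r5, 1) + conv_macs(h, w, r5, n5, 5)
    # Pooling itself has no multiplies; only its 1x1 projection is counted.
    branch_pool = conv_macs(h, w, c_in, pool_proj, 1)
    return branch_1x1 + branch_3x3 + branch_5x5 + branch_pool


def naive_module_macs(h, w, c_in, n1, n3, n5):
    """Same branches without reductions; the pooled input passes through as-is."""
    return (conv_macs(h, w, c_in, n1, 1)
            + conv_macs(h, w, c_in, n3, 3)
            + conv_macs(h, w, c_in, n5, 5))


# Hypothetical sizes (assumptions, not taken from this page).
H, W, C_IN = 28, 28, 192
N1, R3, N3, R5, N5, POOL_PROJ = 64, 96, 128, 16, 32, 32

reduced = reduced_module_macs(H, W, C_IN, N1, R3, N3, R5, N5, POOL_PROJ)
naive = naive_module_macs(H, W, C_IN, N1, N3, N5)

# Branch outputs are concatenated along the channel axis.
reduced_out_channels = N1 + N3 + N5 + POOL_PROJ
naive_out_channels = N1 + N3 + N5 + C_IN  # unprojected pooling keeps all inputs

print(f"reduced: {reduced / 1e6:.1f}M MACs, {reduced_out_channels} output channels")
print(f"naive:   {naive / 1e6:.1f}M MACs, {naive_out_channels} output channels")
```

Under these assumed sizes the reduced variant needs well under half the multiply-accumulates of the naive one and emits fewer channels to the next stage, which is one way the saved budget could be spent on additional depth or width.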


Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AGAN: Towards Automated Design of Generative Adversarial Networks
cs.LG · 2019-06 · unverdicted · novelty 8.0

AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
cs.LG · 2015-02 · conditional · novelty 8.0

Batch Normalization normalizes layer inputs per mini-batch to reduce internal covariate shift, allowing higher learning rates, less careful initialization, and faster convergence in deep networks.

Conditional Generative Adversarial Nets
cs.LG · 2014-11 · accept · novelty 8.0

Conditional GANs generate samples matching a given condition by supplying the condition to both generator and discriminator.

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
cs.CR · 2026-04 · unverdicted · novelty 7.0

Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

Polarized Target Nuclear Magnetic Resonance Measurements with Deep Neural Networks
physics.ins-det · 2026-03 · unverdicted · novelty 7.0

Deep neural networks reduce fitting uncertainties in CW-NMR polarization measurements for dynamically polarized targets.

Mixed Precision Training
cs.AI · 2017-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
cs.CV · 2015-10 · conditional · novelty 7.0

A pruning-quantization-Huffman pipeline compresses deep neural networks 35-49x without accuracy loss.

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
cs.CV · 2015-06 · accept · novelty 7.0

LSUN dataset of one million images per category across 30 classes is constructed via iterative human-in-the-loop deep learning labeling.

Separable Convolutional LSTMs for Faster Video Segmentation
cs.CV · 2019-07 · unverdicted · novelty 6.0

Separable convLSTMs cut parameters and FLOPs in video segmentation, delivering up to 15% faster GPU inference with similar or slightly lower accuracy.

Pre-localization of Massive Black Hole Binaries in the Millihertz Band
gr-qc · 2026-04 · unverdicted · novelty 5.0

A neural spline flow pipeline performs amortized inference on millihertz MBHB signals, delivering ~20 deg² pre-merger sky localizations in ~1 minute while matching PTMCMC sky modes and parameter uncertainties.

Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan
cs.CV · 2026-04 · conditional · novelty 5.0

A new wildlife-specific hazy image dataset and IncepDehazeGan model that reports state-of-the-art dehazing metrics and more than doubles downstream animal detection performance.

One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance Tiers
cs.LG · 2019-06 · unverdicted · novelty 5.0

Proposes Tolerance Tiers architecture for MLaaS to let consumers select accuracy-latency trade-offs, shown to outperform single-version deployment on ASR and vision workloads.

IncepDeHazeGAN: Novel Satellite Image Dehazing
cs.CV · 2026-04 · unverdicted · novelty 4.0

IncepDeHazeGAN is a GAN with Inception blocks and multi-layer feature fusion that claims state-of-the-art single-image dehazing performance on satellite datasets.

Deep-Learning-Based Aerial Image Classification for Emergency Response Applications Using Unmanned Aerial Vehicles
cs.CV · 2019-06 · unverdicted · novelty 4.0

Introduces AIDER database and a lightweight CNN achieving ~3x higher performance on embedded platforms with <2% accuracy drop for aerial disaster scene classification.

Measuring the Transferability of Adversarial Examples
cs.LG · 2019-07 · unverdicted · novelty 3.0

Empirical measurement of adversarial example transferability between VGG and Inception model classes with methodological refinements to attack strength selection, perturbation clipping, and evaluation via SSIM.

Vision Language Models versus Machine Learning Models Performance on Polyp Detection and Classification in Colonoscopy Images
eess.IV · 2025-03 · unverdicted · novelty 2.0

Empirical benchmark of 11 models on polyp detection and classification in colonoscopy images shows ResNet50 highest, BiomedCLIP and GPT-4 moderate on detection, and general VLMs weak on classification.