Very Deep Convolutional Networks for Large-Scale Image Recognition

Andrew Zisserman; Karen Simonyan

arXiv: 1409.1556 · v6 · submitted 2014-09-04 · cs.CV

classification: cs.CV

keywords: depth, convolutional, deep, image, large-scale, networks, recognition, representations

Abstract

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
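
The abstract's emphasis on very small (3x3) filters can be illustrated with a short calculation. What follows is a minimal sketch, not code from the authors' released models. It assumes stride-1 convolutions without dilation, ignores biases, and uses the same number of input and output channels in every layer. The function names and the channel count `C = 64` are hypothetical choices made for illustration only.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field (in input pixels) of `num_layers` stacked
    stride-1, undilated convolutions with a square `kernel`."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) of `num_layers` conv layers, each
    mapping `channels` input channels to `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


C = 64  # hypothetical channel count, chosen only for illustration

# Three stacked 3x3 layers cover the same input window as one 7x7 layer.
print("receptive field of three 3x3 layers:", stacked_receptive_field(3))

# The stack needs fewer weights than the single large filter.
print("weights, three 3x3 layers:", conv_weights(3, C, num_layers=3))
print("weights, one 7x7 layer:   ", conv_weights(7, C))
```

Under these assumptions, a stack of three 3x3 layers sees a 7x7 window while using 27·C² weights rather than 49·C², and it applies a non-linearity after each layer. This is one way to read the trade-off that makes greater depth affordable.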

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rotation Equivariant Mamba for Vision Tasks
cs.CV 2026-03 unverdicted novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-e...
Density estimation using Real NVP
cs.LG 2016-05 accept novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
U-Net: Convolutional Networks for Biomedical Image Segmentation
cs.CV 2015-05 accept novelty 8.0

A u-shaped fully-convolutional encoder-decoder with skip connections trained with elastic-deformation augmentation produces accurate biomedical image segmentations from very small training sets.
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space
cs.CV 2026-05 unverdicted novelty 7.0

MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure f...
Implicit Bias of Mirror Flow in Homogeneous Neural Networks: Sparse and Dense Feature Learning
cs.LG 2026-05 unverdicted novelty 7.0

Mirror flow reaches max-margin solutions in homogeneous neural networks where the mirror map choice controls whether learned features are sparse or dense while convergence can be exponentially slow.
Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction
eess.IV 2026-05 unverdicted novelty 7.0

Next-acceleration-scale autoregressive prediction in discrete latent space with on-policy privileged information distillation yields improved MRI reconstructions from sparse measurements on the fastMRI benchmark.
How to Evaluate and Refine your CAM
cs.CV 2026-05 unverdicted novelty 7.0

Introduces synthetic ground-truth dataset for CAM evaluation, proposes ARCC composite metric, and RefineCAM method that aggregates layers for higher-resolution maps outperforming baselines.
KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models
cs.CV 2026-05 unverdicted novelty 7.0

KamonBench is a grammar-based dataset of 20,000 synthetic Japanese crests with multi-format annotations that enables direct evaluation of factor recovery beyond caption accuracy in vision-language models.
KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models
cs.CV 2026-05 unverdicted novelty 7.0

KamonBench is a grammar-generated synthetic dataset of compositional kamon crests with explicit factor annotations to evaluate factor recovery in vision-language models.
From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation
cs.CR 2026-05 unverdicted novelty 7.0

SubPopMark embeds verifiable subpopulation biases into distilled datasets via CVM and USTM optimization stages, allowing provenance inference through comparison of model output signatures against a reference behavior bank.
From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation
cs.CR 2026-05 unverdicted novelty 7.0

SubPopMark protects distilled datasets by injecting verifiable subpopulation biases that create distinguishable model behaviors for copyright tracing without using backdoors.
Human face perception reflects inverse-generative and naturalistic discriminative objectives
q-bio.NC 2026-05 unverdicted novelty 7.0

Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.
Cross-Modal-Domain Generalization Through Semantically Aligned Discrete Representations
cs.CV 2026-05 unverdicted novelty 7.0

CoDAAR aligns modality-specific codebooks at the index level using Discrete Temporal Alignment and Cascading Semantic Alignment to achieve cross-modal generalization while preserving unique structures, reporting state...
Cross-Modal-Domain Generalization Through Semantically Aligned Discrete Representations
cs.CV 2026-05 unverdicted novelty 7.0

CoDAAR creates a unified discrete representation space for multimodal sequences by aligning modality-specific codebooks through index-level semantic consensus, enabling both specificity and cross-modal generalization.
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
cs.CV 2026-05 unverdicted novelty 7.0

TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models
cs.LG 2026-05 unverdicted novelty 7.0

Concept-based abductive and contrastive explanations find minimal high-level concepts that causally determine vision model outcomes on individual images or groups sharing a specified behavior.
Empirical Evidence for Simply Connected Decision Regions in Image Classifiers
cs.CV 2026-05 unverdicted novelty 7.0

Empirical tests with quad-mesh filling indicate that decision regions in modern image classifiers are simply connected.
Retain-Neutral Surrogates for Min-Max Unlearning
cs.LG 2026-05 unverdicted novelty 7.0

ROSU derives a closed-form retain-neutral perturbation for min-max unlearning that bounds retain damage via curvature and improves performance when gradients are aligned.
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

D-OPSD formulates supervised fine-tuning of step-distilled diffusion models as on-policy self-distillation by minimizing distribution differences between a text-only student and a multimodal teacher on the student's o...
DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

DMGD achieves better performance than fine-tuned SOTA methods in dataset distillation on ImageNet subsets by using semantic matching through conditional likelihood optimization and OT-based distribution matching in a ...
Heterogeneous Model Fusion for Privacy-Aware Multi-Camera Surveillance via Synthetic Domain Adaptation
cs.CV 2026-05 unverdicted novelty 7.0

HeroCrystal uses single-image diffusion synthesis, probabilistic federated Faster R-CNN with contrastive debiasing, and inconsistent-category integration to reach 33.4% mAP in privacy-preserving multi-camera object detection.
Dual-branch Robust Unlearnable Examples
cs.CV 2026-05 unverdicted novelty 7.0

DUNE creates robust unlearnable examples through dual-branch spatial-color perturbation optimization and ensemble strategies, achieving lower average test accuracies of 14.95% to 50.82% than 12 prior methods against 7...
Hierarchical Spatio-Channel Clustering for Efficient Model Compression in Medical Image Analysis
cs.CV 2026-04 unverdicted novelty 7.0

A spatio-channel clustering framework for CNN compression reduces FLOPs by 81% and raises brain tumor MRI classification accuracy from 87.76% to 89.80% compared with global SVD and Tucker baselines.
KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition
cs.CV 2026-04 unverdicted novelty 7.0

KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.
Different Strokes for Different Folks: Writer Identification for Historical Arabic Manuscripts
cs.CV 2026-04 unverdicted novelty 7.0

CNN models with attention reach 99.05% top-1 accuracy on line-level splits and 78.61% on page-disjoint splits for writer identification after expanding the labeled portion of the Muharaf historical Arabic manuscript dataset.
Causal Disentanglement for Full-Reference Image Quality Assessment
cs.CV 2026-04 unverdicted novelty 7.0

Causal disentanglement decouples content and degradation representations via intervention on latents and a content-masking module to predict quality scores from degradation features, achieving strong benchmark perform...
MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures
cs.CV 2026-04 unverdicted novelty 7.0

MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.
Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms
cs.LG 2026-04 unverdicted novelty 7.0

Unlearnable examples fail under pretraining-finetuning due to semantic filtering by frozen layers, but Shallow Semantic Camouflage restores effectiveness by confining perturbations to semantically valid subspaces.
Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification
cs.CV 2026-04 unverdicted novelty 7.0

FogFool creates fog-based adversarial perturbations using Perlin noise optimization to achieve high black-box transferability (83.74% TASR) and robustness to defenses in remote sensing classification.
VidTAG: Temporally Aligned Video to GPS Geolocalization with Denoising Sequence Prediction at a Global Scale
cs.CV 2026-04 unverdicted novelty 7.0

VidTAG achieves fine-grained global video-to-GPS geolocalization via temporal frame alignment and denoising sequence refinement, reporting 20% gains at 1 km over GeoCLIP and 25% on CityGuessr68k.
Ghosts of eruptions past: Searching for historical Galactic supernovae using variable thermal dust echoes and machine learning
astro-ph.HE 2026-04 unverdicted novelty 7.0

An all-sky NEOWISE-based search using difference imaging and a CNN classifier trained on Cas A echoes detects no other historical Galactic supernova dust echoes at WISE sensitivity and delivers a catalog of 20477 Cas ...
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
cs.CR 2026-03 unverdicted novelty 7.0

SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.
Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment
cs.AI 2026-03 unverdicted novelty 7.0

A human-centered OOD spectrum based on perceptual difficulty shows vision-language models align best with human errors across regimes, with CNNs stronger on near-OOD and ViTs on far-OOD.
Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network
cs.CV 2026-03 unverdicted novelty 7.0

PSG-UIENet fuses Retinex physics with CLIP-derived text semantics and a new multimodal dataset to enhance underwater images, claiming better results than fifteen prior methods.
FedBCD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning
cs.LG 2026-03 unverdicted novelty 7.0

FedBCGD reduces communication in federated learning by a factor of 1/N through block-wise parameter updates with accelerated convergence guarantees.
Two-stage Convolutional Neural Network for pseudo six-dimensional phase space reconstruction
hep-ex 2026-03 unverdicted novelty 7.0

A two-stage CNN reconstructs pseudo 6D phase space from 16 x-y images taken at varying rotation angles in the KEK-ATF injector.
Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH)
cs.CV 2026-02 unverdicted novelty 7.0

Releases the DAPWH dataset of 3556 wasp images including 1739 COCO-annotated examples to enable AI models for identifying Ichneumonoidea and associated families.
The Weight of a Bit: EMFI Sensitivity Analysis of Embedded Deep Learning Models
cs.CR 2026-02 unverdicted novelty 7.0

Floating-point weight formats in embedded neural networks suffer near-total accuracy loss from a single electromagnetic fault injection, while 8-bit integer formats retain substantially higher accuracy on the same hardware.
A Case for Hypergraphs to Model and Map SNNs on Neuromorphic Hardware
cs.AR 2026-01 conditional novelty 7.0

Hypergraph modeling of SNNs improves neuron-to-core mapping on neuromorphic hardware by exploiting hyperedge overlap and locality for better partitioning and placement than graph-based methods.
Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting
cs.CV 2026-01 unverdicted novelty 7.0

Flow-guided advection in 3D Gaussian Splatting transfers 2D artistic motion into 3D geometry to produce structure-aware stylization.
LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization
cs.GR 2026-01 unverdicted novelty 7.0

LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
Fusion2Print: Deep Flash-Non-Flash Fusion for Contactless Fingerprint Matching
cs.CV 2026-01 unverdicted novelty 7.0

Fusion2Print fuses flash-non-flash contactless fingerprints via attention-based networks and U-Net enhancement to reach AUC 0.999 and EER 1.12% with cross-domain compatibility.
Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems
cs.CV 2026-01 unverdicted novelty 7.0

The paper delivers the first comprehensive review and unified taxonomy of agentic AI in remote sensing, covering single-agent copilots, multi-agent systems, planning mechanisms, benchmarks, and a roadmap while noting ...
It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
cs.CV 2025-12 unverdicted novelty 7.0

Noise optimization during sampling recovers diversity in mode-collapsed diffusion models while preserving output fidelity.
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents
q-bio.QM 2025-10 unverdicted novelty 7.0

GenCellAgent deploys a planner-executor-evaluator LLM agent loop to automatically select, adapt, and refine segmentation tools for diverse cellular microscopy images, matching or exceeding specialist performance on 4,...
A fast machine learning tool to predict the composition of astronomical ices from infrared absorption spectra
astro-ph.GA 2025-09 unverdicted novelty 7.0

Neural-network model trained on lab ice spectra predicts fractional composition of H2O, CO, CO2, CH3OH, NH3, and CH4 from 2.5-10 micron IR absorption with typical 3% error and was validated on two JWST background-star...
Prospects for Deep-Learning-Based Mass Reconstruction of Ultra-High-Energy Cosmic Rays using Simulated Air-Shower Profiles
astro-ph.HE 2025-08 conditional novelty 7.0

A CNN predicts ln A from longitudinal shower profiles with bias under 0.4, resolution 1-1.5, and proton-iron merit factor 2.19, outperforming simpler ML models on shape parameters and remaining robust to hadronic mode...
Partitioning for Intrinsic Model Inversion Resistance in Collaborative Inference
cs.IT 2025-06 conditional novelty 7.0

The authors identify a Golden Partition Zone based on an intra-class variance shift in entropy bounds that enables intrinsic model inversion resistance when partitioning neural networks for collaborative inference.
Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark
cs.CV 2025-04 unverdicted novelty 7.0

Presents the ev-CIVIL dataset and benchmark showing that event-based cameras can support real-time detection of cracks and spalling in civil infrastructure under challenging lighting.
Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces
cs.LG 2025-01 unverdicted novelty 7.0

ReLU CNNs achieve (m+1)-th order L_p approximation rates for Korobov functions with mixed derivatives of order m+1 via approximate sparse grid basis representations, improving on classical second-order rates.
V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?
cs.CV 2024-08 unverdicted novelty 7.0

V-RoAst applies zero-shot VLMs (Gemini-1.5-flash, GPT-4o-mini) to iRAP road safety attribute classification on a new ThaiRAP image dataset and compares them to CNN baselines, finding better generalization to unseen cl...
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
cs.CV 2024-06 unverdicted novelty 7.0

MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
cs.CV 2024-01 conditional novelty 7.0

Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.
Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples
cs.NE 2022-09 unverdicted novelty 7.0

MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained S...
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
cs.LG 2022-08 conditional novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
Neural Operator: Graph Kernel Network for Partial Differential Equations
cs.LG 2020-03 unverdicted novelty 7.0

Graph Kernel Networks learn PDE solution operators that generalize across discretization methods and grid resolutions using graph-based kernel integration.
A Simple Framework for Contrastive Learning of Visual Representations
cs.LG 2020-02 accept novelty 7.0

SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.
MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks
cs.CV 2019-07 unverdicted novelty 7.0

Releases MVB, a multi-view baggage re-identification dataset with 4519 identities and 22660 images, plus a merged Siamese network baseline evaluated on it.
Switchable Normalization for Learning-to-Normalize Deep Representation
cs.CV 2019-07 unverdicted novelty 7.0

Switchable Normalization learns per-layer weights to combine channel, layer, and minibatch normalizers, claiming robustness to batch size and better results than fixed normalizers on ImageNet, COCO, CityScapes, ADE20K...
DaiMoN: A Decentralized Artificial Intelligence Model Network
cs.LG 2019-07 unverdicted novelty 7.0

DaiMoN introduces a decentralized ledger-based network for collaborative ML model improvement with label-hidden proof-of-improvement enabled by a novel learnable Distance Embedding for Labels (DEL) function.