Mixed citations

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna · 2015 · cs.CV · arXiv 1512.00567

Mixed citation behavior. Most common role is background (62%).

21 Pith papers citing it

abstract

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains on various benchmarks. Although increased model size and computational cost tend to translate into immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim to use the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error on the validation set (3.6% error on the test set) and 17.3% top-1 error on the validation set.
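The abstract attributes much of the efficiency gain to "suitably factorized convolutions". As a rough illustration, the sketch below counts weights for two factorizations of the kind the full paper studies: replacing a 5×5 convolution with two stacked 3×3 convolutions, and replacing an n×n convolution with a 1×n followed by an n×1 convolution. It assumes stride 1, equal input and output channel counts, and no bias terms; the function names are hypothetical helpers, not code from the paper.

```python
def conv_weights(kh: int, kw: int, c_in: int, c_out: int) -> int:
    """Number of weights in a kh x kw convolution (bias terms ignored)."""
    return kh * kw * c_in * c_out


def conv_mult_adds(kh: int, kw: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiply-adds for an h x w output map: every weight is used once per output position."""
    return conv_weights(kh, kw, c_in, c_out) * h * w


def two_3x3(c: int) -> int:
    """Weights of two stacked 3x3 convolutions that keep c channels throughout."""
    return 2 * conv_weights(3, 3, c, c)


def asymmetric(n: int, c: int) -> int:
    """Weights of a 1 x n convolution followed by an n x 1 convolution (c channels)."""
    return conv_weights(1, n, c, c) + conv_weights(n, 1, c, c)


if __name__ == "__main__":
    c = 64  # assumed channel count, chosen only for illustration
    full_5x5 = conv_weights(5, 5, c, c)  # 25 * 64 * 64 = 102400
    fact_5x5 = two_3x3(c)                # 18 * 64 * 64 = 73728
    print(f"5x5: {full_5x5}, two 3x3: {fact_5x5}, saving {1 - fact_5x5 / full_5x5:.0%}")

    full_7x7 = conv_weights(7, 7, c, c)  # 49 * 64 * 64 = 200704
    fact_7x7 = asymmetric(7, c)          # 14 * 64 * 64 = 57344
    print(f"7x7: {full_7x7}, 1x7 + 7x1: {fact_7x7}, saving {1 - fact_7x7 / full_7x7:.0%}")
```

Under these assumptions the multiply-add savings match the weight savings (about 28% and 71% here), because `conv_mult_adds` simply scales the weight count by the output map size. Real Inception modules also change intermediate channel counts, which this sketch ignores.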

citation-role summary

background: 5 · method: 3

citation-polarity summary

background: 5 · use method: 3

representative citing papers

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.

Categorical Reparameterization with Gumbel-Softmax

stat.ML · 2016-11-03 · unverdicted · novelty 8.0

Gumbel-Softmax provides a continuous relaxation of categorical sampling that anneals to discrete samples for gradient-based optimization.
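The Gumbel-Softmax idea can be made concrete with a short sketch: add Gumbel(0, 1) noise to the logits, then apply a softmax with temperature τ; as τ shrinks, the output approaches a one-hot sample. The noise is passed in explicitly so the effect of the temperature can be checked deterministically. The function names are illustrative assumptions, not the paper's code.

```python
import math
import random
from typing import List, Sequence


def sample_gumbel(n: int, rng: random.Random) -> List[float]:
    """Draw n Gumbel(0, 1) samples via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    out = []
    for _ in range(n):
        u = max(rng.random(), 1e-12)  # random() lies in [0, 1); clamp to avoid log(0)
        out.append(-math.log(-math.log(u)))
    return out


def gumbel_softmax(logits: Sequence[float], gumbels: Sequence[float], tau: float) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + gumbels) / tau)."""
    z = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 2.0, 0.5]
    g = sample_gumbel(len(logits), rng)
    for tau in (5.0, 1.0, 0.1):  # decreasing temperature, as in annealing
        print(tau, [round(p, 3) for p in gumbel_softmax(logits, g, tau)])
```

Because the same noise vector is reused for every τ, the argmax never changes; only how close the output is to one-hot does. Annealing τ during training trades gradient variance against bias toward discrete samples.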

Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

KL regularization aligning model predictions with empirical transition patterns improves macro-F1 by 9-42% in next dialogue act prediction on German counselling data and transfers to other datasets.

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

cs.CR · 2026-04-19 · unverdicted · novelty 7.0

Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

Physics-informed, Generative Adversarial Design of Funicular Shells

cs.CE · 2026-04-17 · unverdicted · novelty 7.0

A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

cs.CV · 2017-04-17 · accept · novelty 7.0

MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.
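The MobileNets trade-off can be quantified with the cost model that, to our understanding, the MobileNets paper uses: a standard D_K × D_K convolution from M to N channels on a D_F × D_F map costs D_K·D_K·M·N·D_F·D_F multiply-adds, while a depthwise separable one costs D_K·D_K·M·D_F·D_F + M·N·D_F·D_F, with the width multiplier α scaling the channel counts and the resolution multiplier ρ scaling D_F. The sketch below only evaluates those expressions; the function names and the example layer shape are illustrative assumptions.

```python
def standard_conv_cost(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-adds of a dk x dk convolution, m -> n channels, df x df output map."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk: int, m: int, n: int, df: int,
                        alpha: float = 1.0, rho: float = 1.0) -> float:
    """Depthwise (dk x dk per channel) plus pointwise (1 x 1, m -> n) cost,
    with width multiplier alpha and resolution multiplier rho."""
    m_a, n_a, df_r = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m_a * df_r * df_r
    pointwise = m_a * n_a * df_r * df_r
    return depthwise + pointwise


if __name__ == "__main__":
    dk, m, n, df = 3, 512, 512, 14  # an assumed mid-network layer shape
    std = standard_conv_cost(dk, m, n, df)
    sep = separable_conv_cost(dk, m, n, df)
    print(f"standard: {std}, separable: {sep:.0f}, ratio: {sep / std:.3f}")
    # With alpha = rho = 1 the ratio reduces to 1/n + 1/dk**2 (about 0.113 here).
    print(f"alpha=0.5: {separable_conv_cost(dk, m, n, df, alpha=0.5):.0f}")
```

Halving α roughly quarters the dominant pointwise term, which is why the width multiplier is such an effective knob for latency.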

MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery

q-bio.NC · 2026-05-16 · unverdicted · novelty 6.0

MIRAGE achieves state-of-the-art mental image reconstruction from fMRI on the NSD-Imagery benchmark by using a linear backbone with multi-modal text and image features fed to a diffusion model.

Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

A hybrid CNN-transformer model with multi-task learning achieves 91.3% WBC classification accuracy and 0.72 Pearson correlation for CD16 expression regression from label-free DPC images, augmented by LLM-generated summaries.

Separable Convolutional LSTMs for Faster Video Segmentation

cs.CV · 2019-07-16 · unverdicted · novelty 6.0

Separable convLSTMs cut parameters and FLOPs in video segmentation, delivering up to 15% faster GPU inference with similar or slightly lower accuracy.

Demystifying MMD GANs

stat.ML · 2018-01-04 · accept · novelty 6.0

MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.
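The Kernel Inception Distance is, as we understand it, the unbiased estimate of squared MMD between Inception features of real and generated images under the cubic polynomial kernel k(x, y) = (x·y / d + 1)³, where d is the feature dimension. The sketch below assumes the features have already been extracted (small lists of floats stand in for Inception activations) and computes a single estimate; practical implementations often average it over several random subsets. Function names are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel (x . y / d + 1) ** 3, with d the feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def mmd2_unbiased(xs: List[Vector], ys: List[Vector]) -> float:
    """Unbiased squared-MMD estimate between two feature sets (KID-style)."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    # Within-set sums exclude the diagonal (i == j), which removes the bias.
    k_xx = sum(poly_kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(poly_kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # The cross term averages over all m * n pairs.
    k_xy = sum(poly_kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


if __name__ == "__main__":
    real = [[0.0, 0.0], [0.0, 1.0]]  # stand-ins for Inception features
    fake = [[3.0, 3.0], [3.0, 4.0]]
    print(mmd2_unbiased(real, fake))  # prints 1499.5625
```

Unlike FID, this estimator is unbiased, so it can be slightly negative when the two feature distributions are very close.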

From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

LRP on EEG transformers reveals Clever Hans artifacts in motor imagery tasks and a recurring central electrode cluster as a candidate sensorimotor signature of arousal.

CASCADE: Context-Aware Relaxation for Speculative Image Decoding

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to-image models without quality loss.

From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments

cs.CV · 2026-05-03 · unverdicted · novelty 5.0 · 2 refs

Gaussian and related cropping strategies for point cloud subclouds improve 3D neural network performance over spherical cropping on large outdoor scenes.

Affine Disentangled GAN for Interpretable and Robust AV Perception

cs.CV · 2019-07-06 · unverdicted · novelty 5.0

ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST within 30 degrees rotation and over 90% under FGSM and PGD attacks while generating rotation and scaling factors.

Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

cs.CV · 2026-05-06 · unverdicted · novelty 5.0

An inpainting auxiliary task improves the clustering of embeddings for identifying individual zebrafish from their skin patterns.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Introduces the Transformer, a sequence transduction architecture based solely on attention mechanisms that dispenses with recurrence and convolutions.

Incremental Semantic Mapping with Unsupervised On-line Learning

cs.RO · 2019-07-09 · unverdicted · novelty 4.0

An incremental semantic mapping system for robots that uses self-organizing maps (SOMs) for topological mapping and unsupervised on-line place categorization.

Automatic Colon Polyp Detection using Region based Deep CNN and Post Learning Approaches

cs.CV · 2019-06-27 · unverdicted · novelty 4.0

Region-based deep CNN with transfer learning and post-learning methods achieves better polyp detection performance than prior systems on large colonoscopy image and video databases.

Measuring the Transferability of Adversarial Examples

cs.LG · 2019-07-14 · unverdicted · novelty 3.0

Empirical measurement of adversarial example transferability between VGG and Inception model classes with methodological refinements to attack strength selection, perturbation clipping, and evaluation via SSIM.

Diabetic Retinopathy Classification using Downscaling Algorithms and Deep Learning

cs.CV · 2026-05-12 · unverdicted · novelty 3.0

A multi-channel Inception V3 model with custom downscaling preprocessing outperforms prior methods on accuracy, sensitivity, and specificity when trained on the combined Kaggle and IDRiD diabetic retinopathy datasets.

citing papers explorer

Showing 21 of 21 citing papers.

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views cs.CV · 2026-05-08 · unverdicted · none · ref 41
Categorical Reparameterization with Gumbel-Softmax stat.ML · 2016-11-03 · unverdicted · none · ref 10
Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations cs.CL · 2026-04-20 · unverdicted · none · ref 18
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading cs.CR · 2026-04-19 · unverdicted · none · ref 37
Physics-informed, Generative Adversarial Design of Funicular Shells cs.CE · 2026-04-17 · unverdicted · none · ref 42
Diffusion Models Beat GANs on Image Synthesis cs.LG · 2021-05-11 · accept · none · ref 62
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications cs.CV · 2017-04-17 · accept · none · ref 31
MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery q-bio.NC · 2026-05-16 · unverdicted · none · ref 48
Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning cs.CV · 2026-05-14 · unverdicted · none · ref 21
Separable Convolutional LSTMs for Faster Video Segmentation cs.CV · 2019-07-16 · unverdicted · none · ref 21
Demystifying MMD GANs stat.ML · 2018-01-04 · accept · none · ref 56
From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP cs.AI · 2026-05-12 · unverdicted · none · ref 61
CASCADE: Context-Aware Relaxation for Speculative Image Decoding cs.CV · 2026-05-08 · unverdicted · none · ref 40
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments cs.CV · 2026-05-03 · unverdicted · none · ref 76 · 2 links
Affine Disentangled GAN for Interpretable and Robust AV Perception cs.CV · 2019-07-06 · unverdicted · none · ref 27
Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification cs.CV · 2026-05-06 · unverdicted · none · ref 43
Attention Is All You Need cs.CL · 2017-06-12 · unverdicted · none · ref 36
Incremental Semantic Mapping with Unsupervised On-line Learning cs.RO · 2019-07-09 · unverdicted · none · ref 16
Automatic Colon Polyp Detection using Region based Deep CNN and Post Learning Approaches cs.CV · 2019-06-27 · unverdicted · none · ref 37
Measuring the Transferability of Adversarial Examples cs.LG · 2019-07-14 · unverdicted · none · ref 20
Diabetic Retinopathy Classification using Downscaling Algorithms and Deep Learning cs.CV · 2026-05-12 · unverdicted · none · ref 20
