Xception: Deep Learning with Depthwise Separable Convolutions
We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
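To make the abstract's definition concrete, the sketch below counts the weights in a regular convolution and in a depthwise separable convolution (a depthwise convolution followed by a pointwise 1x1 convolution). The function names, the omission of bias terms, and the example layer sizes (3x3 kernel, 128 input channels, 256 output channels) are illustrative assumptions, not values taken from the paper.

```python
# Weight counts for one convolutional layer, ignoring bias terms (an assumption).
# Function names and the example sizes are hypothetical, chosen for illustration.

def regular_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Regular convolution: each output channel has a k x k filter per input channel."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Depthwise separable convolution: depthwise step followed by pointwise step."""
    # Depthwise: one k x k spatial filter per input channel, no channel mixing.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise: a 1 x 1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 128, 256
    regular = regular_conv_params(k, c_in, c_out)
    separable = separable_conv_params(k, c_in, c_out)
    print(f"regular:   {regular}")
    print(f"separable: {separable}")
    print(f"ratio:     {regular / separable:.2f}")
```

For a single layer of this size the separable form needs far fewer weights. The abstract's claim that Xception and Inception V3 have the same number of parameters concerns the whole architectures, not a layer-by-layer comparison like this one.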
Forward citations
Cited by 13 Pith papers
-
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.
-
Deepfake Detection Generalization with Diffusion Noise
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
-
On-chip probabilistic inference for charged-particle tracking at the sensor edge
Neural networks integrated into silicon sensor front-end electronics can regress charged-particle hit positions and angles with calibrated uncertainties from single-layer data while satisfying hardware constraints on ...
-
Separable Convolutional LSTMs for Faster Video Segmentation
Separable convLSTMs cut parameters and FLOPs in video segmentation, delivering up to 15% faster GPU inference with similar or slightly lower accuracy.
-
Rethinking Atrous Convolution for Semantic Image Segmentation
DeepLabv3 improves semantic segmentation by capturing multi-scale context with cascaded or parallel atrous convolutions and adding global context to ASPP, achieving better results on PASCAL VOC 2012 without DenseCRF p...
-
EPNAS: Efficient Progressive Neural Architecture Search
EPNAS uses a progressive search policy with REINFORCE performance prediction to search neural architectures in parallel, supporting multiple resource constraints and outperforming ENAS and PNAS on CIFAR-10 and ImageNe...
-
The Ethical Dilemma when (not) Setting up Cost-based Decision Rules in Semantic Segmentation
Defining egoistic and altruistic cost functions for class confusions in semantic segmentation changes precision, recall, and segment-wise error rates relative to standard MAP decisions.
-
Remote Estimation of Free-Flow Speeds
A CNN estimates free-flow speeds from aerial imagery and metadata, performing nearly as well with imagery alone as with road features.
-
Deep Single Image Deraining Via Estimating Transmission and Atmospheric Light in rainy Scenes
A deep network estimates per-image atmospheric light and a transmission map, then recovers a clear image from the atmospheric scattering model, outperforming prior deraining methods.
-
Attention Is All You Need
-
DYMAPIA: A Multi-Domain Framework for Detecting AI-based Video Manipulation
DYMAPIA builds dynamic anomaly masks from Fourier spectra, texture, edges, and optical flow to guide a lightweight DistXCNet classifier, reporting over 99% accuracy and F1 on FF++, Celeb-DF, and VDFD.
-
Measuring the Transferability of Adversarial Examples
Empirical measurement of adversarial example transferability between VGG and Inception model classes with methodological refinements to attack strength selection, perturbation clipping, and evaluation via SSIM.
-
A Comprehensive Comparison of Deep Learning Architectures for COVID-19 Classification on CT & X-ray Imagery
ResNet and VGG models achieve 95-98% average accuracy distinguishing COVID-19 from normal lung images on X-ray and CT datasets using transfer learning from pre-trained networks.