pith. machine review for the scientific record.

arxiv: 1905.11946 · v5 · submitted 2019-05-28 · 💻 cs.LG · cs.CV · stat.ML

Recognition: 3 theorem links

· Lean Theorem

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 12:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords EfficientNet · compound scaling · model scaling · convolutional neural networks · ImageNet accuracy · neural architecture search · accuracy-efficiency tradeoff

The pith

Scaling depth, width, and resolution together with one compound coefficient produces more accurate and efficient convolutional networks than scaling any single dimension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that fixed-resource ConvNets improve more when depth, width, and resolution are increased in a coordinated way rather than one at a time. A single coefficient phi multiplies each dimension by fixed factors found through grid search on a small model. This compound scaling is applied first to existing architectures like MobileNet and ResNet, then to a new baseline network discovered by neural architecture search. The resulting EfficientNet family reaches 84.3 percent top-1 accuracy on ImageNet while using far fewer parameters and less inference time than earlier leaders. The same models also transfer effectively to CIFAR-100, Flowers, and other datasets.

Core claim

A compound scaling method that raises network depth by alpha to the power phi, width by beta to the power phi, and resolution by gamma to the power phi, with alpha, beta, and gamma fixed by grid search on a baseline model, yields a family of networks called EfficientNets. EfficientNet-B7 attains 84.3 percent top-1 accuracy on ImageNet while being 8.4 times smaller and 6.1 times faster at inference than the previous best ConvNet.
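The compound rule in this claim fits in a few lines. A minimal sketch, using the base ratios the paper reports (alpha=1.2, beta=1.1, gamma=1.15); the helper name `compound_scale` is illustrative, not from the paper's code:

```python
# Sketch of the compound scaling rule stated above. alpha, beta, gamma are
# the fixed base ratios found once by grid search on the baseline; phi is
# the single compound coefficient that scales all three dimensions.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution base ratios

def compound_scale(phi: float) -> tuple[float, float, float]:
    """Multipliers for (depth, width, resolution) at compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```

Raising phi by one step enlarges all three dimensions together, which is exactly what single-dimension scaling does not do.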

What carries the argument

Compound scaling coefficient phi that simultaneously enlarges depth, width, and resolution according to fixed ratios determined once on a small network.

If this is right

  • EfficientNet-B7 sets a new accuracy record on ImageNet while being 8.4 times smaller and 6.1 times faster at inference than the prior best ConvNet.
  • The same compound scaling improves both MobileNets and ResNets without changing their architectures.
  • EfficientNets maintain state-of-the-art results on CIFAR-100, Flowers, and three additional transfer datasets while using an order of magnitude fewer parameters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation between finding a good baseline architecture and then scaling it uniformly may apply to architectures other than ConvNets.
  • Systematic scaling rules could reduce the need for repeated architecture search when moving to new hardware constraints.
  • Adaptive versions of the coefficient might further improve performance on tasks with different accuracy versus speed priorities.

Load-bearing premise

The scaling ratios found by grid search on a small baseline network stay near-optimal when the same ratios are used on much larger models and on different datasets.

What would settle it

A larger model trained with scaling ratios different from those found on the small baseline achieves higher ImageNet accuracy than the model produced by the compound coefficient.

read the original abstract

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that carefully balancing network depth, width, and resolution via a single compound scaling coefficient phi yields better accuracy-efficiency tradeoffs than scaling any one dimension independently. The authors first identify fixed scaling ratios (alpha, beta, gamma) by grid search on a small baseline, demonstrate the method on MobileNet and ResNet families, then use neural architecture search to obtain EfficientNet-B0 and scale it uniformly to produce the B1-B7 family. EfficientNet-B7 reaches 84.3% top-1 ImageNet accuracy while being 8.4x smaller and 6.1x faster than prior best ConvNets, with strong transfer results on CIFAR-100, Flowers, and three additional datasets.

Significance. If the empirical results hold, the work is significant because it supplies a simple, reproducible scaling rule that improves upon conventional single-dimension scaling and has been widely adopted as a baseline. The large-scale ImageNet experiments, cross-family validation on MobileNet/ResNet, and transfer-task results provide direct support for the central claim, while the public code release aids reproducibility.

major comments (1)
  1. §3.2: the grid search that fixes alpha=1.2, beta=1.1, gamma=1.15 is performed only on the small baseline with phi in [1,5]; although the paper shows consistent gains when these ratios are applied to larger models, the load-bearing assumption that the ratios remain near-optimal at scale is supported only by the final held-out results rather than by intermediate-scale ablations that would quantify sensitivity to the chosen coefficients.
minor comments (2)
  1. The abstract states that EfficientNets achieve state-of-the-art accuracy on '3 other transfer learning datasets' but does not name them; explicitly listing all five datasets in the abstract would improve clarity.
  2. Eq. (2) and the surrounding text: the FLOPS constraint alpha * beta^2 * gamma^2 ≈ 2 is stated without a short derivation or reference to the underlying FLOPS scaling assumptions; adding one sentence would make the origin of the constant transparent.
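The derivation the second minor comment asks for is short: ConvNet FLOPS scale linearly with depth and quadratically with width and input resolution, so scaling by (alpha^phi, beta^phi, gamma^phi) multiplies FLOPS by (alpha * beta^2 * gamma^2)^phi, and pinning the base product near 2 makes each unit step of phi roughly double the budget. A quick numeric check with the paper's reported ratios (variable names here are illustrative):

```python
# Numeric check of the constraint alpha * beta^2 * gamma^2 ≈ 2 using the
# ratios reported in the paper. FLOPS grow linearly in depth and
# quadratically in width and resolution, so the per-step FLOPS multiplier
# is the base product below, and the multiplier at coefficient phi is its
# phi-th power.
alpha, beta, gamma = 1.2, 1.1, 1.15

base = alpha * beta**2 * gamma**2
print(f"alpha * beta^2 * gamma^2 = {base:.3f}")  # ~1.92, close to the target 2
for phi in (1, 2, 3):
    print(f"phi={phi}: FLOPS multiplier ~{base**phi:.2f} vs target {2**phi}")
```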

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive review and the recommendation to accept. The summary accurately reflects the paper's contributions. We respond to the major comment below.

read point-by-point responses
  1. Referee: [—] §3.2: the grid search that fixes alpha=1.2, beta=1.1, gamma=1.15 is performed only on the small baseline with phi in [1,5]; although the paper shows consistent gains when these ratios are applied to larger models, the load-bearing assumption that the ratios remain near-optimal at scale is supported only by the final held-out results rather than by intermediate-scale ablations that would quantify sensitivity to the chosen coefficients.

    Authors: We appreciate the referee's careful reading of §3.2. The grid search for α=1.2, β=1.1, γ=1.15 was performed on the small baseline for φ ∈ [1,5] because the compound scaling rule is derived from the FLOPs equation, which predicts that the relative ratios among depth, width, and resolution should remain approximately constant across scales. We then directly test this assumption by applying the same fixed ratios to scale MobileNet and ResNet families as well as EfficientNet-B0 up to B7. The resulting models exhibit consistent accuracy-efficiency gains, culminating in EfficientNet-B7's state-of-the-art results. While additional ablations at every intermediate scale would be informative, they are computationally prohibitive; the broad validation across model families and the final held-out performance on large models constitute the most relevant evidence. In the revised manuscript we will add a short clarifying paragraph in §3.2 that explicitly states the theoretical motivation for constant ratios and summarizes the cross-scale validation already present in the experiments. revision: partial
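The search procedure the rebuttal describes can be sketched as a constrained grid search. The grids, tolerance, and `proxy_accuracy` below are stand-ins (the paper selects the ratios by actually training B0 variants at φ=1); only the constraint α·β²·γ² ≈ 2 and the reported optimum (1.2, 1.1, 1.15) come from the paper:

```python
# Illustrative sketch of the small-scale grid search referred to above.
# proxy_accuracy is a placeholder for "train the scaled baseline, measure
# validation accuracy"; it is peaked at the paper's reported optimum purely
# so the sketch is runnable.
import itertools

ALPHA_GRID = (1.0, 1.1, 1.2, 1.3, 1.4)
BETA_GRID = (1.0, 1.05, 1.1, 1.15, 1.2)
GAMMA_GRID = (1.0, 1.05, 1.1, 1.15, 1.2)

def flops_multiplier(a: float, b: float, g: float) -> float:
    # FLOPS grow linearly in depth, quadratically in width and resolution.
    return a * b**2 * g**2

def proxy_accuracy(a: float, b: float, g: float) -> float:
    # Stand-in oracle, maximized at the paper's reported (1.2, 1.1, 1.15).
    return -((a - 1.2) ** 2 + (b - 1.1) ** 2 + (g - 1.15) ** 2)

# Keep only candidates whose per-step FLOPS multiplier is close to 2.
candidates = [
    (a, b, g)
    for a, b, g in itertools.product(ALPHA_GRID, BETA_GRID, GAMMA_GRID)
    if abs(flops_multiplier(a, b, g) - 2.0) < 0.15
]
best = max(candidates, key=lambda abg: proxy_accuracy(*abg))
print("selected (alpha, beta, gamma):", best)  # (1.2, 1.1, 1.15)
```

The referee's sensitivity question amounts to asking how flat the true accuracy surface is around the selected point, which this one-shot search does not measure.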

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper determines scaling coefficients alpha, beta, and gamma via grid search on the small EfficientNet-B0 baseline to satisfy the compound scaling constraint. These fixed ratios are then applied uniformly via a single coefficient phi to produce larger models B1-B7. However, each scaled model is trained independently from scratch and evaluated on ImageNet plus transfer datasets, yielding accuracy and efficiency numbers that constitute external empirical measurements rather than outputs forced by the fitting procedure. No equation reduces to its own inputs by construction, no load-bearing self-citation is invoked for uniqueness, and the central performance claims rest on held-out training runs rather than tautological renaming or prediction of fitted quantities.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the empirical observation that balanced scaling is superior, with the compound coefficient serving as the main tunable parameter.

free parameters (1)
  • compound coefficient phi
    Chosen via grid search on the baseline network to match target resource budgets; different integer values produce the B1-B7 family.
axioms (1)
  • domain assumption There exists a fixed set of scaling ratios for depth, width, and resolution that remains near-optimal across model sizes.
    Invoked when the authors apply the same ratios found on B0 to all larger models.

pith-pipeline@v0.9.0 · 5541 in / 1214 out tokens · 36767 ms · 2026-05-16T12:15:52.068716+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • PhiForcing phi_equation echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient... d=α^φ, w=β^φ, r=γ^φ s.t. α·β²·γ²≈2

  • Cost.FunctionalEquation washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

    cs.CV 2026-04 unverdicted novelty 7.0

    The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray cl...

  2. SMCNet: Supervised Surface Material Classification Using mmWave Radar IQ Signals and Complex-valued CNNs

    eess.SP 2026-04 unverdicted novelty 7.0

    SMCNet applies a complex-valued CNN to mmWave radar IQ data for high-accuracy surface material classification across multiple and unseen sensing distances.

  3. The DeepFake Detection Challenge (DFDC) Dataset

    cs.CV 2020-06 accept novelty 7.0

    The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.

  4. LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

    cs.CV 2026-04 unverdicted novelty 6.0

    LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and p...

  5. Vision Transformers Need Registers

    cs.CV 2023-09 unverdicted novelty 6.0

    Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

  6. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  7. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  8. Sharpness-Aware Minimization for Efficiently Improving Generalization

    cs.LG 2020-10 conditional novelty 6.0

    SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

  9. Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

    cs.CV 2026-05 unverdicted novelty 5.0

    Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.

  10. DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

    cs.LG 2026-05 unverdicted novelty 5.0

    DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts w...

  11. Equinox: Decentralized Scheduling for Hardware-Aware Orbital Intelligence

    cs.DC 2026-04 unverdicted novelty 5.0

    Equinox uses a barrier-function-derived marginal cost to enable value-based adaptive scheduling and neighbor offloading in energy-constrained satellite constellations, yielding 20-31% throughput gains and higher batte...

  12. Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments

    cs.CV 2026-04 unverdicted novelty 5.0

    Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.

  13. Generalizable Deepfake Detection Based on Forgery-aware Layer Masking and Multi-artifact Subspace Decomposition

    cs.CV 2026-01 unverdicted novelty 5.0

    FMSD improves cross-dataset generalization in deepfake detection by using gradient-based layer masking to select forgery-sensitive weights and SVD to split them into preserved semantic and multiple learnable artifact ...

  14. Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

    cs.CV 2026-05 unverdicted novelty 4.0

    A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...

  15. DYMAPIA: A Multi-Domain Framework for Detecting AI-based Video Manipulation

    cs.CV 2026-04 unverdicted novelty 4.0

    DYMAPIA builds dynamic anomaly masks from Fourier spectra, texture, edges, and optical flow to guide a lightweight DistXCNet classifier, reporting over 99% accuracy and F1 on FF++, Celeb-DF, and VDFD.

  16. Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

    cs.CV 2026-04 unverdicted novelty 3.0

    RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.

  17. Real-Time Cellist Postural Evaluation With On-Device Computer Vision

    cs.HC 2026-04 unverdicted novelty 3.0

    Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.

  18. Robust Deepfake Detection, NTIRE 2026 Challenge: Report

    cs.CV 2026-04 unverdicted novelty 2.0

    The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.

  19. Scaling Laws for Neural Language Models

    cs.LG 2020-01

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 19 Pith papers · 7 internal anchors

  1. [1]

    Birdsnap: Large-scale fine-grained visual categorization of birds

    Berg, T., Liu, J., Woo Lee, S., Alexander, M. L., Jacobs, D. W., and Belhumeur, P. N. Birdsnap: Large-scale fine-grained visual categorization of birds. CVPR, pp. 2011-2018, 2014

  2. [2]

    Food-101--mining discriminative components with random forests

    Bossard, L., Guillaumin, M., and Van Gool, L. Food-101--mining discriminative components with random forests. ECCV, pp. 446-461, 2014

  3. [3]

    Proxylessnas: Direct neural architecture search on target task and hardware

    Cai, H., Zhu, L., and Han, S. Proxylessnas: Direct neural architecture search on target task and hardware. ICLR, 2019

  4. [4]

    Xception: Deep learning with depthwise separable convolutions

    Chollet, F. Xception: Deep learning with depthwise separable convolutions. CVPR, 2017

  5. [5]

    Autoaugment: Learning augmentation policies from data

    Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. Autoaugment: Learning augmentation policies from data. CVPR, 2019

  6. [6]

    Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

    Elfwing, S., Uchibe, E., and Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107: 3-11, 2018

  7. [7]

    Squeezenext: Hardware-aware neural network design

    Gholami, A., Kwon, K., Wu, B., Tai, Z., Yue, X., Jin, P., Zhao, S., and Keutzer, K. Squeezenext: Hardware-aware neural network design. ECV Workshop at CVPR'18, 2018

  8. [8]

    Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. ICLR, 2016

  9. [9]

    Deep residual learning for image recognition

    He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. CVPR, pp. 770-778, 2016

  10. [10]

    Mask r-cnn

    He, K., Gkioxari, G., Dollár, P., and Girshick, R. Mask r-cnn. ICCV, pp. 2980-2988, 2017

  11. [11]

    Amc: Automl for model compression and acceleration on mobile devices

    He, Y., Lin, J., Liu, Z., Wang, H., Li, L.-J., and Han, S. Amc: Automl for model compression and acceleration on mobile devices. ECCV, 2018

  12. [12]

    Gaussian Error Linear Units (GELUs)

    Hendrycks, D. and Gimpel, K. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2016

  13. [13]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017

  14. [14]

    Squeeze-and-excitation networks

    Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. CVPR, 2018

  15. [15]

    Huang, G., Sun, Y., Liu, Z., Sedra, D., and Weinberger, K. Q. Deep networks with stochastic depth. ECCV, pp. 646-661, 2016

  16. [16]

    Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. CVPR, 2017

  17. [17]

    Gpipe: Efficient training of giant neural networks using pipeline parallelism

    Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q. V., and Chen, Z. Gpipe: Efficient training of giant neural networks using pipeline parallelism. arXiv preprint arXiv:1808.07233, 2018

  18. [18]

    SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

    Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., and Keutzer, K. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and < 0.5 mb model size. arXiv preprint arXiv:1602.07360, 2016

  19. [19]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift

    Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML, pp. 448-456, 2015

  20. [20]

    Kornblith, S., Shlens, J., and Le, Q. V. Do better imagenet models transfer better? CVPR, 2019

  21. [21]

    Collecting a large-scale dataset of fine-grained cars

    Krause, J., Deng, J., Stark, M., and Fei-Fei, L. Collecting a large-scale dataset of fine-grained cars. Second Workshop on Fine-Grained Visual Categorization, 2013

  22. [22]

    Learning multiple layers of features from tiny images

    Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical Report, 2009

  23. [23]

    Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. NIPS, pp. 1097-1105, 2012

  24. [24]

    Resnet with one-neuron hidden layers is a universal approximator

    Lin, H. and Jegelka, S. Resnet with one-neuron hidden layers is a universal approximator. NeurIPS, pp. 6172-6181, 2018

  25. [25]

    Feature pyramid networks for object detection

    Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. Feature pyramid networks for object detection. CVPR, 2017

  26. [26]

    Progressive neural architecture search

    Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., and Murphy, K. Progressive neural architecture search. ECCV, 2018

  27. [27]

    The expressive power of neural networks: A view from the width

    Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The expressive power of neural networks: A view from the width. NeurIPS, 2018

  28. [28]

    Shufflenet v2: Practical guidelines for efficient cnn architecture design

    Ma, N., Zhang, X., Zheng, H.-T., and Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. ECCV, 2018

  29. [29]

    Exploring the Limits of Weakly Supervised Pretraining

    Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and van der Maaten, L. Exploring the limits of weakly supervised pretraining. arXiv preprint arXiv:1805.00932, 2018

  30. [30]

    Fine-Grained Visual Classification of Aircraft

    Maji, S., Rahtu, E., Kannala, J., Blaschko, M., and Vedaldi, A. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013

  31. [31]

    Domain Adaptive Transfer Learning with Specialist Models

    Ngiam, J., Peng, D., Vasudevan, V., Kornblith, S., Le, Q. V., and Pang, R. Domain adaptive transfer learning with specialist models. arXiv preprint arXiv:1811.07056, 2018

  32. [32]

    Automated flower classification over a large number of classes

    Nilsback, M.-E. and Zisserman, A. Automated flower classification over a large number of classes. ICVGIP, pp. 722-729, 2008

  33. [33]

    Cats and dogs

    Parkhi, O. M., Vedaldi, A., Zisserman, A., and Jawahar, C. Cats and dogs. CVPR, pp. 3498-3505, 2012

  34. [34]

    On the expressive power of deep neural networks

    Raghu, M., Poole, B., Kleinberg, J., Ganguli, S., and Sohl-Dickstein, J. On the expressive power of deep neural networks. ICML, 2017

  35. [35]

    Ramachandran, P., Zoph, B., and Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2018

  36. [36]

    Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. Regularized evolution for image classifier architecture search. AAAI, 2019

  37. [37]

    Imagenet large scale visual recognition challenge

    Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3): 211-252, 2015

  38. [38]

    Mobilenetv2: Inverted residuals and linear bottlenecks

    Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. CVPR, 2018

  39. [39]

    On the expressive power of overlapping architectures of deep learning

    Sharir, O. and Shashua, A. On the expressive power of overlapping architectures of deep learning. ICLR, 2018

  40. [40]

    Dropout: a simple way to prevent neural networks from overfitting

    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1): 1929-1958, 2014

  41. [41]

    Going deeper with convolutions

    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. CVPR, pp. 1-9, 2015

  42. [42]

    Rethinking the inception architecture for computer vision

    Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the inception architecture for computer vision. CVPR, pp. 2818-2826, 2016

  43. [43]

    Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. Inception-v4, inception-resnet and the impact of residual connections on learning. AAAI, 2017

  44. [44]

    Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. MnasNet: Platform-aware neural architecture search for mobile. CVPR, 2019

  45. [45]

    Aggregated residual transformations for deep neural networks

    Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. Aggregated residual transformations for deep neural networks. CVPR, pp. 5987-5995, 2017

  46. [46]

    Netadapt: Platform-aware neural network adaptation for mobile applications

    Yang, T.-J., Howard, A., Chen, B., Zhang, X., Go, A., Sze, V., and Adam, H. Netadapt: Platform-aware neural network adaptation for mobile applications. ECCV, 2018

  47. [47]

    Wide residual networks

    Zagoruyko, S. and Komodakis, N. Wide residual networks. BMVC, 2016

  48. [48]

    Polynet: A pursuit of structural diversity in very deep networks

    Zhang, X., Li, Z., Loy, C. C., and Lin, D. Polynet: A pursuit of structural diversity in very deep networks. CVPR, pp. 3900-3908, 2017

  49. [49]

    Shufflenet: An extremely efficient convolutional neural network for mobile devices

    Zhang, X., Zhou, X., Lin, M., and Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. CVPR, 2018

  50. [50]

    Learning deep features for discriminative localization

    Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. Learning deep features for discriminative localization. CVPR, pp.\ 2921--2929, 2016

  51. [51]

    Neural architecture search with reinforcement learning

    Zoph, B. and Le, Q. V. Neural architecture search with reinforcement learning. ICLR, 2017

  52. [52]

    Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. V. Learning transferable architectures for scalable image recognition. CVPR, 2018