NetTailor: Tuning the Architecture, Not Just the Weights
Pith reviewed 2026-05-25 12:33 UTC · model grok-4.3
The pith
NetTailor reuses pre-trained CNN layers as blocks to build task-specific networks whose size scales with task difficulty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that a transfer procedure can adapt a network's architecture, not merely its weights. It combines pre-trained layers, used as universal blocks, with task-specific layers; trains the assembly to mimic the activations of a strong unconstrained CNN; and applies soft attention over blocks together with complexity regularization. The resulting networks have complexity that automatically matches task difficulty while preserving classification accuracy and cross-task parameter sharing.
What carries the argument
NetTailor assembly of pre-trained universal blocks with task-specific layers, trained under activation mimicking, soft-attention, and complexity regularization.
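To make the interplay of these ingredients concrete, here is a minimal Python sketch of one way attention-weighted block mixing, an activation-mimicking term, and a complexity penalty could be combined into a single loss. It is an illustration under stated assumptions, not the paper's formulation: the function names, the mean-squared-error mimicking term, the expected-cost penalty, and the weights `lam_mimic` and `lam_cost` are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_blocks(block_outputs, logits):
    """Attention-weighted sum of candidate block outputs (each a list of floats)."""
    weights = softmax(logits)
    dim = len(block_outputs[0])
    mixed = [sum(w * out[i] for w, out in zip(weights, block_outputs))
             for i in range(dim)]
    return mixed, weights

def complexity_penalty(weights, block_costs):
    """Assumed penalty: attention-weighted sum of per-block costs."""
    return sum(w * c for w, c in zip(weights, block_costs))

def mimic_loss(student, teacher):
    """Assumed mimicking term: mean squared error between activations."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def joint_loss(cls_loss, student, teacher, weights, block_costs,
               lam_mimic=1.0, lam_cost=0.1):
    """Classification loss plus weighted mimicking and complexity terms."""
    return (cls_loss
            + lam_mimic * mimic_loss(student, teacher)
            + lam_cost * complexity_penalty(weights, block_costs))
```

In a sketch like this, raising `lam_cost` pushes attention toward cheaper blocks, while the classification and mimicking terms push back whenever an expensive block is actually needed for the task.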
If this is right
- Simple tasks receive networks with substantially fewer parameters than complex tasks.
- Classification accuracy stays comparable to training an unconstrained CNN from the same starting point.
- The same pre-trained blocks can be reused across many tasks without duplication.
- A single platform can host more tasks simultaneously because per-task memory and compute scale with difficulty.
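The storage argument behind these points can be checked with simple arithmetic. The sketch below compares the total parameter count of per-task fine-tuning with that of a shared-block scheme. The parameter counts are made-up illustrative numbers, not figures from the paper, and both function names are hypothetical.

```python
def fine_tuning_params(backbone_params, num_tasks):
    """Standard fine-tuning: one full copy of the backbone per task."""
    return backbone_params * num_tasks

def shared_block_params(backbone_params, task_specific_params):
    """Shared blocks are stored once; each task adds only its own layers."""
    return backbone_params + sum(task_specific_params)

# Hypothetical numbers: an 11M-parameter backbone and three tasks whose
# task-specific layers grow with task difficulty.
backbone = 11_000_000
per_task = [200_000, 500_000, 2_000_000]

total_fine_tuned = fine_tuning_params(backbone, len(per_task))
total_shared = shared_block_params(backbone, per_task)
```

Under these assumed numbers the shared scheme stores 13.7M parameters against 33M for three fine-tuned copies, and the gap widens with every task added.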
Where Pith is reading between the lines
- The block-selection process could be extended to decide at inference time which blocks to activate based on input difficulty.
- The same modular reuse might combine with quantization or hardware-aware search to further reduce deployment cost on edge devices.
- If the attention mechanism generalizes, it could serve as a diagnostic for which layers matter most for a given task family.
Load-bearing premise
Training under the combined objectives of classification error, activation mimicking, soft attention, and complexity regularization will select blocks that remain accurate for the target task without separate validation of the resulting architecture.
What would settle it
Applying the procedure to tasks of graded difficulty and finding no reliable reduction in selected blocks or parameters for the simpler tasks relative to standard fine-tuning.
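One possible way to operationalize this test is sketched below in Python: count the blocks each task retains and check whether that count tracks task difficulty. The pruning threshold, the difficulty ranking, and both function names are assumptions made for illustration; the paper's own selection rule may differ.

```python
def retained_blocks(attention_weights, threshold=0.05):
    """Indices of blocks whose attention weight is at or above the threshold."""
    return [i for i, w in enumerate(attention_weights) if w >= threshold]

def size_tracks_difficulty(tasks, threshold=0.05):
    """True if retained-block counts never decrease as tasks get harder.

    `tasks` is a list of (difficulty_rank, attention_weights) pairs, where a
    larger rank means a harder task.
    """
    ordered = sorted(tasks, key=lambda t: t[0])
    counts = [len(retained_blocks(w, threshold)) for _, w in ordered]
    return all(a <= b for a, b in zip(counts, counts[1:]))
```

A result of `False` on tasks of clearly graded difficulty, repeated across runs, would be the kind of evidence that counts against the claim.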
Original abstract
Real-world applications of object recognition often require the solution of multiple tasks in a single platform. Under the standard paradigm of network fine-tuning, an entirely new CNN is learned per task, and the final network size is independent of task complexity. This is wasteful, since simple tasks require smaller networks than more complex tasks, and limits the number of tasks that can be solved simultaneously. To address these problems, we propose a transfer learning procedure, denoted NetTailor, in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new networks. Besides minimizing classification error, the new network is trained to mimic the internal activations of a strong unconstrained CNN, and minimize its complexity by the combination of 1) a soft-attention mechanism over blocks and 2) complexity regularization constraints. In this way, NetTailor can adapt the network architecture, not just its weights, to the target task. Experiments show that networks adapted to simple tasks, such as character or traffic sign recognition, become significantly smaller than those adapted to hard tasks, such as fine-grained recognition. More importantly, due to the modular nature of the procedure, this reduction in network complexity is achieved without compromise of either parameter sharing across tasks, or classification accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes NetTailor, a transfer learning procedure that treats layers of a pre-trained CNN as universal blocks which are combined with small task-specific layers to form task-adapted networks. Training minimizes classification error while also mimicking internal activations of an unconstrained CNN and applying soft-attention over blocks plus complexity regularization, thereby adapting architecture (not just weights) to task complexity. Experiments are claimed to show that networks for simple tasks (character/traffic-sign recognition) become significantly smaller than those for hard tasks (fine-grained recognition) while preserving accuracy and cross-task parameter sharing.
Significance. If the experimental claims hold with proper controls, the method offers a practical route to multi-task deployment on constrained hardware by producing task-dependent network sizes without retraining separate CNNs from scratch or sacrificing modularity. The combination of activation mimicking with attention-based block selection is a distinctive element that could influence subsequent work on modular transfer learning.
major comments (2)
- [Experiments] The central claim that size reduction is task-driven (rather than regularization-driven) requires evidence that the same complexity penalty produces different block selections across tasks of varying difficulty. The experiments section reports smaller architectures for simple tasks but does not include a control in which block selection is performed with task-independent regularization strength or with attention disabled, leaving open the possibility that observed size differences are artifacts of the regularization term alone.
- [Method (joint loss and attention mechanism)] The procedure relies on soft-attention during training to select blocks, yet no post-training enumeration or comparison (e.g., exhaustive search over subsets of the universal blocks at matched complexity) is provided to establish that the learned selection is near-optimal for the observed accuracy. Without this, the claim that the joint loss reliably yields task-appropriate architectures remains unverified.
minor comments (2)
- [Method] Notation for the soft-attention weights and the complexity regularization term should be introduced with explicit equations rather than descriptive text only.
- [Abstract and Experiments] The abstract states experimental support for size reduction without accuracy loss, but quantitative tables or figures are not referenced in the provided summary; ensure all reported accuracies and parameter counts appear with standard deviations or multiple runs.
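As an illustration of the kind of explicit notation the first minor comment asks for, one hypothetical way to write the joint objective is given below. The symbols are assumptions introduced here and are not taken from the paper: $a_k$ is the attention logit of block $k$, $\alpha_k$ its normalized attention weight, $c_k$ its cost, and $\lambda_1$, $\lambda_2$ are trade-off weights.

```latex
\mathcal{L}(\theta, a)
  = \mathcal{L}_{\mathrm{cls}}(\theta, a)
  + \lambda_{1}\,\mathcal{L}_{\mathrm{mimic}}(\theta, a)
  + \lambda_{2}\,\mathcal{R}(a),
\qquad
\alpha_{k} = \frac{\exp(a_{k})}{\sum_{j}\exp(a_{j})},
\qquad
\mathcal{R}(a) = \sum_{k} \alpha_{k}\, c_{k}
```

Written this way, the referee's first major comment amounts to asking whether a fixed $\lambda_{2}$ yields different $\alpha$ across tasks.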
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive suggestions. Below we address each major comment in turn.
- Referee: [Experiments] The central claim that size reduction is task-driven (rather than regularization-driven) requires evidence that the same complexity penalty produces different block selections across tasks of varying difficulty. The experiments section reports smaller architectures for simple tasks but does not include a control in which block selection is performed with task-independent regularization strength or with attention disabled, leaving open the possibility that observed size differences are artifacts of the regularization term alone.
Authors: The regularization strength is indeed task-independent, as stated in the method. The differences arise because the attention mechanism is optimized jointly with the task-specific classification and mimicry losses, which vary with task difficulty. To strengthen this, we will include an additional experiment that disables the attention mechanism (forcing all blocks to be used) and show that the resulting networks do not exhibit the same task-dependent size variation. We will also clarify this in the text. revision: yes
- Referee: [Method (joint loss and attention mechanism)] The procedure relies on soft-attention during training to select blocks, yet no post-training enumeration or comparison (e.g., exhaustive search over subsets of the universal blocks at matched complexity) is provided to establish that the learned selection is near-optimal for the observed accuracy. Without this, the claim that the joint loss reliably yields task-appropriate architectures remains unverified.
Authors: We acknowledge that an exhaustive search would provide stronger evidence of optimality. However, such a search is intractable for the number of blocks considered (exponential complexity). The soft-attention serves as a practical, differentiable proxy trained end-to-end with the joint loss. We will add a paragraph discussing this limitation and note that the method prioritizes practicality and modularity over guaranteed optimality. No revision to the experiments is planned as the requested comparison is not feasible. revision: no
- Verification of near-optimality via exhaustive enumeration of block subsets, which is computationally infeasible.
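The infeasibility argument rests on the fact that a set of n blocks has 2^n subsets. The short Python sketch below makes that growth tangible. The block counts and the per-evaluation time are hypothetical, and the function names are introduced here for illustration only.

```python
def num_block_subsets(num_blocks):
    """Number of distinct subsets of a set of blocks (including the empty set)."""
    return 2 ** num_blocks

def exhaustive_search_hours(num_blocks, seconds_per_eval):
    """Hours needed to evaluate every subset once at a fixed cost per evaluation."""
    return num_block_subsets(num_blocks) * seconds_per_eval / 3600.0

# Hypothetical setting: each candidate architecture takes 60 s to evaluate.
small = exhaustive_search_hours(10, 60.0)   # 1,024 subsets
large = exhaustive_search_hours(30, 60.0)   # about 1.07 billion subsets
```

Even ignoring any retraining per candidate, going from 10 to 30 blocks multiplies the work by roughly a million, which is why a matched-complexity comparison on a small subset of blocks is a more realistic request than full enumeration.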
Circularity Check
No circularity; new training procedure with independent objectives
The paper defines NetTailor as a transfer procedure that combines classification loss with explicit new terms (activation mimicking of an unconstrained CNN, soft-attention over blocks, and complexity regularization). These objectives are introduced as design choices rather than derived from or reduced to quantities already present in the inputs or prior fits. The central experimental claim—that simpler tasks yield smaller adapted networks—is presented as an observed outcome of training, not a prediction forced by construction or by self-citation. No equations, uniqueness theorems, or ansatzes are shown to collapse the result back onto the method's own fitted parameters. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Layers from a pre-trained CNN can serve as universal blocks that combine with task-specific layers for new tasks.
- domain assumption: Mimicking internal activations of a strong unconstrained CNN plus complexity regularization produces accurate yet smaller task-adapted networks.