Context-Aware Multipath Networks

Dumindu Tissera; Kumara Kahatapitiya; Ranga Rodrigo; Rukshan Wijesinghe; Subha Fernando

arXiv: 1907.11519 · v1 · pith:7UP42OJ2new · submitted 2019-07-26 · cs.CV · cs.LG

Pith reviewed 2026-05-24 15:52 UTC · model grok-4.3

keywords: context-aware networks · multi-path networks · data-dependent routing · multi-task learning · image classification · semantic segmentation · neural network generalization

The pith

CAMNet uses data-dependent routing between parallel paths to allocate shared or separate resources according to input context, outperforming equivalent single-path and multi-path networks on classification and pixel-labeling tasks, whether datasets are presented individually, sequentially, or in combination.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural networks often require costly widening, deepening, or separate models to handle variations within a dataset or across multiple datasets. This paper presents Context-Aware Multipath Network (CAMNet), a multi-path architecture whose routing between parallel tensors is learned from the input data itself. The routing decides end-to-end which resources stay common across contexts and which become domain-specific. Experiments across image classification and pixel-labeling tasks show CAMNet exceeds the accuracy of single-path networks, standard multi-path networks, and deeper single-path networks, whether the datasets are presented individually, sequentially, or combined.

Core claim

CAMNet is a multi-path neural network with data-dependent routing between parallel tensors that captures variations within individual datasets and across multiple different datasets both simultaneously and sequentially. The routing mechanism controls information flow end-to-end and determines which resources remain common or become domain-specific, enabling the model to surpass the performance of equivalent single-path, multi-path, and deeper single-path networks on classification and pixel-labeling tasks.

What carries the argument

Data-dependent routing between parallel tensors, which learns to regulate information flow and allocate common versus domain-specific resources without manual task-specific redesign.
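
To make the idea of routing between parallel tensors concrete, here is a minimal NumPy sketch of one soft-routing step. It is an illustration under stated assumptions, not the paper's formulation: the function `route`, the 1x1-convolution "predictions", the pooled-feature gates, and the softmax over output tensors are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route(inputs, pred_weights, gate_weights):
    """Soft-route m parallel input tensors into n parallel output tensors.

    All shapes and parameter names are assumptions made for this sketch.
    inputs:       (m, C, H, W) parallel feature maps for one sample
    pred_weights: (m, n, C, C) 1x1-conv weights, input i's prediction for output j
    gate_weights: (m, C, n)    linear map from pooled features to n gate logits
    """
    # Prediction that each input tensor makes for each output tensor.
    preds = np.einsum("ijdc,ichw->ijdhw", pred_weights, inputs)   # (m, n, C, H, W)
    # Data-dependent gates: pool each input, then softmax over the n outputs.
    pooled = inputs.mean(axis=(2, 3))                              # (m, C)
    logits = np.einsum("ic,icn->in", pooled, gate_weights)         # (m, n)
    gates = softmax(logits, axis=1)
    # Output j is the gate-weighted sum of the predictions made for it.
    outputs = np.einsum("in,inchw->nchw", gates, preds)            # (n, C, H, W)
    return outputs, gates

rng = np.random.default_rng(0)
m, n, C, H, W = 2, 2, 4, 8, 8
x = rng.normal(size=(m, C, H, W))
out, g = route(x, rng.normal(size=(m, n, C, C)), rng.normal(size=(m, C, n)))
```

Because the gates are computed from the input features themselves, two samples from different contexts can send the same input tensor to different output tensors, which is the sense in which the routing is data-dependent. Every step is differentiable, so in a deep-learning framework the gate parameters could be trained end-to-end with the rest of the network.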

If this is right

The same architecture can be trained on single datasets, sequential datasets, or combined datasets without redesign.
Routing decisions emerge from the data rather than from hand-crafted rules or post-training adjustments.
Resource sharing occurs automatically when contexts are compatible and separation occurs when they are not.
The approach applies to both classification and dense prediction tasks without separate heads or branches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the routing generalizes, multi-task and continual-learning setups could reduce reliance on separate models or ensembles.
The mechanism might extend to other input modalities where context varies, such as video or sensor streams.
Training dynamics of the routing gates could be studied to understand when sharing versus separation is preferred.
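
One possible way to study gate behaviour, offered here as an assumption rather than anything the paper reports, is to track the normalized entropy of each input tensor's gate distribution. The helper `gate_entropy` below is hypothetical; it assumes the gates for one layer form an (m, n) matrix whose rows sum to 1, with n of at least 2.

```python
import numpy as np

def gate_entropy(gates, eps=1e-12):
    """Normalized entropy of each row of an (m, n) gate matrix.

    Values near 0 mean an input tensor is routed almost entirely to one
    output tensor; values near 1 mean it is spread evenly over all n.
    """
    gates = np.asarray(gates, dtype=float)
    n = gates.shape[1]
    # eps keeps log() finite when a gate is exactly zero.
    h = -(gates * np.log(gates + eps)).sum(axis=1)
    return h / np.log(n)

# A one-hot row (hard allocation) and a uniform row (even spread).
example = gate_entropy([[1.0, 0.0], [0.5, 0.5]])
```

Logging this quantity per layer over training, and separately per dataset, would give a rough picture of where the network concentrates routing and where it spreads it, under the assumption that low entropy is a reasonable proxy for domain-specific allocation.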

Load-bearing premise

Data-dependent routing between parallel tensors can be learned end-to-end so that it reliably allocates common versus domain-specific resources across datasets without task-specific architectural changes.

What would settle it

A controlled experiment in which CAMNet is trained on the same dataset combinations and sequential schedules as the baselines yet fails to exceed their accuracy on both classification and pixel-labeling metrics would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 1907.11519 by Dumindu Tissera, Kumara Kahatapitiya, Ranga Rodrigo, Rukshan Wijesinghe, Subha Fernando.

**Figure 2.** Operations carried out by a 3-dimensional tensor

**Figure 3.** Constructing layer l + 1 based on the predictions and gates computed by layer l. See Eq. 3. … for a certain context so that each tensor is more likely to be allocated to a single tensor in the subsequent layer

**Figure 5.** Accuracy change when trained on a subsequent…

**Figure 7.** Route Visualization in image-to-image trans…

**Figure 8.** Weights Histograms of forward convolutions af…

Original abstract

Making a single network effectively address diverse contexts (learning the variations within a dataset or multiple datasets) is an intriguing step towards achieving generalized intelligence. Existing approaches of deepening, widening, and assembling networks are not cost-effective in general. In view of this, networks which can allocate resources according to the context of the input and regulate flow of information across the network are effective. In this paper, we present Context-Aware Multipath Network (CAMNet), a multi-path neural network with data-dependent routing between parallel tensors. We show that our model performs as a generalized model capturing variations in individual datasets and multiple different datasets, both simultaneously and sequentially. CAMNet surpasses the performance of classification and pixel-labeling tasks in comparison with the equivalent single-path, multi-path, and deeper single-path networks, considering datasets individually, sequentially, and in combination. The data-dependent routing between tensors in CAMNet enables the model to control the flow of information end-to-end, deciding which resources to be common or domain-specific.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

CAMNet adds learned data-dependent routing across parallel paths to handle single or multiple datasets in CV tasks, but the abstract supplies no numbers, datasets, or ablations to check the performance claims.

The main point is that this paper describes CAMNet, a multi-path network where routing between parallel tensors is learned from the data to decide what stays shared versus domain-specific. It targets the practical issue of making one model handle variations inside a dataset or across several without always deepening or widening a single path. The routing is trained end-to-end, which fits existing lines of work on conditional computation and multi-task nets.

That framing is clear and the motivation is straightforward: deepening and widening are not always efficient, so context-aware allocation makes sense for classification and pixel-labeling. The paper also notes the model can be used on datasets one at a time, in sequence, or combined, which is a reasonable test setup for multi-domain work. The routing idea itself is a legitimate extension rather than a complete reinvention.

On the downside, the abstract states that CAMNet surpasses single-path, multi-path, and deeper single-path baselines but gives no accuracies, no dataset names, no error bars, and no ablation on whether the routing actually learns useful allocations. Without those details the central empirical claim stays unverified. The assumption that end-to-end routing will reliably separate common and specific resources is stated but not shown in the supplied text. No equations appear, so there is no hidden circularity to flag.

This work is aimed at computer-vision researchers who already use or build multi-path or conditional models and want a routing variant for multi-dataset settings. A reader already familiar with the prior multi-path literature would get the most from it. The idea is coherent enough that a serious referee should see the full experiments and implementation details before any decision. I would send it to peer review rather than desk-reject, because the routing mechanism is a plausible next step even if the current write-up needs more evidence to stand on its own.

Referee Report

1 major / 0 minor

Summary. The paper introduces Context-Aware Multipath Network (CAMNet), a multi-path architecture with data-dependent routing between parallel tensors. It claims that this enables the model to capture variations within individual datasets as well as across multiple datasets (both sequentially and in combination), outperforming equivalent single-path, multi-path, and deeper single-path networks on classification and pixel-labeling tasks. The routing is presented as allowing end-to-end control over common versus domain-specific resources.

Significance. If the empirical performance claims hold under rigorous validation, the work could offer a practical route toward more parameter-efficient generalized networks that adapt resource allocation to input context without requiring task-specific redesigns or post-hoc adjustments.

major comments (1)

[Abstract] Abstract: the central empirical claim that 'CAMNet surpasses the performance of classification and pixel-labeling tasks in comparison with the equivalent single-path, multi-path, and deeper single-path networks' is stated without any quantitative results, error bars, dataset names/sizes, or ablation studies. This absence prevents verification of the magnitude or reliability of the reported gains and is load-bearing for the paper's primary contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the feedback. We address the single major comment below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract: the central empirical claim that 'CAMNet surpasses the performance of classification and pixel-labeling tasks in comparison with the equivalent single-path, multi-path, and deeper single-path networks' is stated without any quantitative results, error bars, dataset names/sizes, or ablation studies. This absence prevents verification of the magnitude or reliability of the reported gains and is load-bearing for the paper's primary contribution.

Authors: We agree that the abstract as currently written states the performance claim without supporting quantitative details. The experiments section of the manuscript reports specific results (accuracy deltas, dataset names and sizes, and ablations) that substantiate the claim, but these are not summarized in the abstract. In the revised version we will expand the abstract to include key quantitative results with error bars where available, explicit dataset references, and a brief mention of the ablation studies, while remaining within length limits.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical neural architecture (CAMNet) whose central claims are performance comparisons on classification and segmentation tasks across datasets. No derivation chain, equations, or first-principles results are described in the abstract or reader summary. Claims rest on experimental outcomes rather than any reduction of a 'prediction' to fitted inputs or self-citation. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The architecture is introduced as a design choice whose value is assessed externally via benchmarks, satisfying the condition for a self-contained empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The paper introduces no mathematical axioms or derivations. The central claim rests on the empirical effectiveness of learned routing, which implicitly relies on standard neural network training machinery (gradient descent, backpropagation) and on sufficient training data to learn the routing decisions.

invented entities (1)

data-dependent routing between parallel tensors · no independent evidence
purpose: To decide end-to-end which resources are common or domain-specific across contexts
This is the core new mechanism introduced in the abstract; no independent evidence outside the model performance is provided.
