Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning

Dapeng Tao; Gongfan Fang; Mingli Song; Sihui Luo; Xinchao Wang; Yao Hu

arXiv: 1906.10546 · v1 · pith:GSU27DNLnew · submitted 2019-06-24 · 💻 cs.LG · cs.CV · stat.ML

Pith reviewed 2026-05-25 17:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords knowledge amalgamation · heterogeneous networks · common feature learning · teacher-student learning · model distillation · deep network reuse · multi-task student model

The pith

A student model integrates knowledge from heterogeneous teacher networks by mapping their features into a common space without original annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines reusing multiple pre-trained deep networks of varying architectures, each specialized for different tasks, without access to their training annotations. It introduces a method to transform features from these teachers into one shared space. The student network is trained to imitate all the mapped features at once, allowing it to combine the full knowledge from every teacher. Experiments show the resulting student achieves strong performance and can exceed the teachers on their individual specialized tasks.

Core claim

The central claim is that mapping features from heterogeneous teacher networks into a common space and training a student to imitate them all produces a lightweight multitalented model that amalgamates the intact knowledge from all teachers without any human annotations.

What carries the argument

The common feature learning scheme that transforms teacher features into a shared space for simultaneous student imitation.
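As a rough illustration of that scheme, the sketch below combines the two losses named in the Figure 2 caption on this page: an alignment term between student and teacher features in the common space, and a reconstruction term for teacher features mapped back to their original space. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear adapters, the squared Euclidean distance, the function name `common_feature_loss`, and the `recon_weight` parameter.

```python
# Hypothetical sketch, not the authors' implementation. Linear maps and
# squared Euclidean distances stand in for whatever adapters and distance
# measure the paper actually uses.
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]  # row-major: one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Apply a linear map to a feature vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def common_feature_loss(
    student_feat: Vector,
    student_adapter: Matrix,
    teacher_feats: Sequence[Vector],
    encoders: Sequence[Matrix],
    decoders: Sequence[Matrix],
    recon_weight: float = 1.0,
) -> float:
    """Alignment loss in the common space plus weighted reconstruction loss."""
    s_common = matvec(student_adapter, student_feat)
    align = 0.0
    recon = 0.0
    for feat, enc, dec in zip(teacher_feats, encoders, decoders):
        t_common = matvec(enc, feat)                   # teacher -> common space
        align += sq_dist(s_common, t_common)           # student imitates teacher
        recon += sq_dist(matvec(dec, t_common), feat)  # common -> teacher space
    return align + recon_weight * recon


# Two teachers with different feature widths (3 and 2), common space of width 2.
identity2 = [[1.0, 0.0], [0.0, 1.0]]
encoder3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
decoder3 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
demo_loss = common_feature_loss(
    [1.0, 2.0], identity2,
    [[1.0, 2.0, 3.0], [0.0, 2.0]],
    [encoder3, identity2], [decoder3, identity2],
    recon_weight=0.5,
)
```

The reconstruction term is what penalizes an encoder that discards teacher information: in the demo, the first teacher's third feature cannot be recovered from the two-dimensional common space, so it contributes to the loss even though the alignment term for that teacher is zero.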

If this is right

The student can handle multiple distinct tasks simultaneously in one lightweight network.
No access to original training data or annotations is required for the amalgamation process.
The student can exceed individual teacher performance on the teachers' own tasks.
Heterogeneous pre-trained models can be consolidated without retraining from scratch.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The common space idea could extend to combining models trained on entirely different data modalities.
This approach might serve as an alternative to model ensembles by producing a single efficient network.
Further tests on tasks with greater domain shift could clarify when the common space mapping breaks down.

Load-bearing premise

Mapping features from different teacher architectures into one common space is enough for the student to fully capture and combine their knowledge without labels.

What would settle it

A test case where the student, after training on the common feature mappings, still underperforms the teachers on their specialized tasks despite adequate optimization.

Figures

Figures reproduced from arXiv: 1906.10546 by Dapeng Tao, Gongfan Fang, Mingli Song, Sihui Luo, Xinchao Wang, Yao Hu.

**Figure 1.** Illustration of the proposed heterogeneous knowledge amalgamation approach. The student and the teachers may have different…

**Figure 2.** Illustration of the common feature learning block. Two types of losses are imposed: the first on the distances between the transformed features of the student (target net) and those of the teachers in the common space, and second on the reconstruction errors of the teachers’ features mapping back to the original space.

**Figure 3.** Visualization of the features of the teachers and those…

read the original abstract

An increasing number of well-trained deep networks have been released online by researchers and developers, enabling the community to reuse them in a plug-and-play way without accessing the training annotations. However, due to the large number of network variants, such public-available trained models are often of different architectures, each of which being tailored for a specific task or dataset. In this paper, we study a deep-model reusing task, where we are given as input pre-trained networks of heterogeneous architectures specializing in distinct tasks, as teacher models. We aim to learn a multitalented and light-weight student model that is able to grasp the integrated knowledge from all such heterogeneous-structure teachers, again without accessing any human annotation. To this end, we propose a common feature learning scheme, in which the features of all teachers are transformed into a common space and the student is enforced to imitate them all so as to amalgamate the intact knowledge. We test the proposed approach on a list of benchmarks and demonstrate that the learned student is able to achieve very promising performance, superior to those of the teachers in their specialized tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a workable scheme for turning heterogeneous pre-trained teachers into one label-free student via common-space feature imitation, but the claim of beating each teacher on its own task rests on an unproven assumption that the mapping keeps all task signals intact.

The main contribution is a common feature learning step that projects activations from different-architecture teachers into one shared space, then trains the student to match those projected features. This lets you reuse public models for distinct tasks without touching the original annotations or data. The setup targets a real scenario where you have a bunch of released networks and want a single lighter model that covers all of them. That practical framing is the part worth noting; most distillation work either assumes matching architectures or needs the source data back.

The abstract reports that the resulting student reaches promising numbers and even exceeds the individual teachers on their specialized tasks, which would be the payoff if it holds. The experiments are run on standard benchmarks, so at least the evaluation is on familiar ground.

The soft spot is exactly the one the stress-test note flags. Nothing in the description shows that the common-space mapping is guaranteed to be information-preserving for every teacher. If the alignment objective trades off task-specific discriminative cues, the student cannot recover the full specialized knowledge and therefore cannot outperform the original teachers. Without bounds, ablations on information loss, or controls that vary how different the teachers are, the superiority result looks like it could be sensitive to the particular model choices and datasets. A reader would want to see those checks before treating the outperformance as reliable.

This paper is for people already working on knowledge distillation or model reuse in vision. Someone tracking variants in that subfield would get a clear picture of this particular common-space trick and could decide whether to build on it.

It deserves peer review because the problem is concrete, the method is simple to implement, and the experiments are on real benchmarks; a referee can sort out whether the central claim survives closer inspection of the mapping and the numbers.

Referee Report

2 major / 2 minor

Summary. The paper proposes a common feature learning approach for amalgamating knowledge from multiple heterogeneous pre-trained teacher networks (specialized on distinct tasks) into a single lightweight student model without access to training annotations. Teacher features are mapped into a shared space, the student is trained to imitate the mapped activations, and experiments on benchmarks are reported to show the student achieving superior performance to the individual teachers on their specialized tasks.

Significance. If the empirical results hold under scrutiny, the work offers a practical route to reusing public heterogeneous models for multi-task capability in a label-free setting, which addresses a growing need as more pre-trained networks become available. The experimental demonstration on benchmarks provides concrete evidence of feasibility for the amalgamation task.

major comments (2)

[Method] The central claim that the student recovers 'intact knowledge' from each teacher and exceeds each teacher on its specialized task rests on the sufficiency of the common-space mapping. No analysis, bound, or ablation is supplied showing that task-specific discriminative information is preserved rather than lost or entangled during the mapping (which is optimized only for alignment).
[Experiments] The headline performance claim requires that the student be evaluated on each teacher's original specialized task (with the same test distribution) and that gains are attributable to amalgamation rather than other factors. The reported benchmark results need explicit per-teacher task breakdowns and controls to confirm this.

minor comments (2)

[Method] Notation for the common feature space and the imitation loss could be clarified with an explicit equation relating the mapped teacher features to the student output.
[Abstract] The abstract's phrasing 'very promising performance' is imprecise; quantitative margins over the teachers should be stated directly.
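
For orientation, one possible form of the explicit equation requested in the first minor comment is given below. It is only a sketch assembled from the Figure 2 caption, not the paper's own formulation. All symbols are assumptions: $F_S$ and $F_{T_i}$ are the student's and the $i$-th teacher's features, $g_S$ and $g_{T_i}$ map them into the common space, $h_{T_i}$ maps teacher features back to their original space, $d$ is an unspecified distance in the common space, and $\lambda$ is a hypothetical weight on the reconstruction term.

```latex
\mathcal{L}
  = \sum_{i=1}^{N} d\bigl(g_S(F_S),\; g_{T_i}(F_{T_i})\bigr)
  + \lambda \sum_{i=1}^{N}
    \bigl\lVert h_{T_i}\bigl(g_{T_i}(F_{T_i})\bigr) - F_{T_i} \bigr\rVert_2^2
```

The first sum is the imitation term over all $N$ teachers; the second penalizes common-space mappings from which the teachers' original features cannot be recovered.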

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point by point below, indicating where revisions will be made to strengthen the manuscript.

Referee: [Method] The central claim that the student recovers 'intact knowledge' from each teacher and exceeds each teacher on its specialized task rests on the sufficiency of the common-space mapping. No analysis, bound, or ablation is supplied showing that task-specific discriminative information is preserved rather than lost or entangled during the mapping (which is optimized only for alignment).

Authors: We agree that a formal bound or information-theoretic analysis would strengthen the central claim. The current manuscript relies on the empirical observation that the student, trained to imitate the aligned features, outperforms each teacher on its original task. To directly address the concern, we will add (i) an ablation that replaces the learned common-space mapping with direct feature imitation or random projection and (ii) quantitative measurements of class-separability (e.g., linear-probe accuracy) before and after mapping. These additions will appear in a new subsection of the experiments.

revision: yes

Referee: [Experiments] The headline performance claim requires that the student be evaluated on each teacher's original specialized task (with the same test distribution) and that gains are attributable to amalgamation rather than other factors. The reported benchmark results need explicit per-teacher task breakdowns and controls to confirm this.

Authors: All reported numbers were obtained by evaluating the student on the exact test splits used by each teacher. We will revise the experimental section to present a per-teacher breakdown table that lists, for every teacher, its own accuracy, the student’s accuracy on the same task, and two controls: (a) a student trained only on that teacher and (b) a student trained with a non-amalgamation baseline. This will make the attribution to amalgamation explicit.

revision: yes
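
The separability check promised in the first response (class separability before and after mapping, with random projection as a control) could look roughly like the sketch below. It is an assumption-laden stand-in and not the authors' protocol: a nearest-centroid classifier replaces the linear probe, accuracy is measured on the same samples used to fit the centroids (a real probe would use a held-out split), and the helper names `nearest_centroid_accuracy` and `random_projection` are hypothetical.

```python
# Hypothetical sketch: nearest-centroid accuracy as a cheap proxy for a
# linear probe, plus a Gaussian random projection as the control mapping.
import random
from typing import Dict, List, Sequence

Vector = List[float]


def nearest_centroid_accuracy(features: Sequence[Vector], labels: Sequence[int]) -> float:
    """Fraction of samples whose nearest class centroid matches their label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    correct = 0
    for feat, label in zip(features, labels):
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(feat, centroids[c])),
        )
        correct += int(pred == label)
    return correct / len(labels)


def random_projection(features: Sequence[Vector], out_dim: int, seed: int = 0) -> List[Vector]:
    """Project features with a fixed random Gaussian matrix (control mapping)."""
    rng = random.Random(seed)
    in_dim = len(features[0])
    proj = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, feat)) for row in proj] for feat in features]


# Compare separability of raw teacher features against a random projection.
teacher_features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
teacher_labels = [0, 0, 1, 1]
acc_before = nearest_centroid_accuracy(teacher_features, teacher_labels)
acc_control = nearest_centroid_accuracy(
    random_projection(teacher_features, out_dim=1), teacher_labels
)
```

Running the same measurement on features produced by the learned common-space mapping, and comparing against `acc_before` and `acc_control`, would indicate whether the learned mapping keeps more class structure than an uninformed projection.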

Circularity Check

0 steps flagged

No circularity; method is empirical proposal without derivation chain

The paper proposes a common feature learning scheme to amalgamate knowledge from heterogeneous teachers into a student model. No equations, derivations, or predictions are presented that could reduce to inputs by construction. No self-citations, fitted parameters renamed as predictions, or uniqueness theorems are invoked in the provided text. The central claim rests on empirical results on benchmarks rather than a mathematical chain that is self-referential. This is a standard non-finding for a methods paper without visible formal derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, axioms, or invented entities; the approach is described at a conceptual level without mathematical or implementation specifics.

pith-pipeline@v0.9.0 · 5737 in / 936 out tokens · 20111 ms · 2026-05-25T17:15:31.971390+00:00 · methodology

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 3 internal anchors

[1] [Ben-David et al., 2010] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.

[2] [Deng et al., 2018] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv:1801.07698, 2018.

[3] [Dietterich, 2000] Thomas G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, pages 1–15, Berlin, Heidelberg, 2000. Springer.

[4] [Gong et al., 2016] Mingming Gong, Kun Zhang, Tongliang Liu, Dacheng Tao, Clark Glymour, and Bernhard Schölkopf. Domain adaptation with conditional transferable components. In IEEE Conference on Machine Learning, 2016.

[5] [Gretton et al., 2012] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012.

[6] [Hansen and Peter, 1990] Lars Kai Hansen and Peter Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993–1001, October 1990.

[7] [He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[8] [Hinton et al., 2015] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. · internal anchor

[9] [Huang et al., 2008] Gary B. Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, October 2008.

[10] [Huang et al., 2016] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In Proceedings of the 14th European Conference on Computer Vision, pages 646–661, 2016.

[11] [Kingma and Ba, 2014] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. · internal anchor

[12] [Moschoglou et al., 2017] Stylianos Moschoglou, Athanasios Papaioannou, Christos Sagonas, Jiankang Deng, and Stefanos Zafeiriou. Agedb: The first manually collected, in-the-wild age database. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017.

[13] [Romero et al., 2015] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. In The International Conference on Learning Representations, 2015.

[14] [Sengupta et al., 2016] Soumyadip Sengupta, Jun-Cheng Chen, Carlos Castillo, Vishal M. Patel, Rama Chellappa, and David W. Jacobs. Frontal to profile face verification in the wild. In IEEE Winter Conference on Applications of Computer Vision, pages 1–9, March 2016.

[15] [Shen et al., 2019] Chengchao Shen, Xinchao Wang, Jie Song, Li Sun, and Mingli Song. Amalgamating knowledge towards comprehensive classification. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019.

[16] [Singh et al., 2016] Saurabh Singh, Derek Hoiem, and David Forsyth. Swapout: Learning an ensemble of deep architectures. In Proceedings of the 29th Advances in Neural Information Processing Systems, pages 28–36, 2016.

[17] [Srivastava et al., 2014] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.

[18] [Szegedy et al., 2015] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.

[19] [Wan et al., 2013] Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using dropconnect. In Proceedings of the 30th International Conference on Machine Learning, volume 28, pages 1058–1066, 2013.

[20] [Wang et al., 2011] Xinchao Wang, Zhu Li, and Dacheng Tao. Subspaces indexing model on Grassmann manifold for image search. IEEE Transactions on Image Processing, 20(9):2627–2635, 2011.

[21] [Wang et al., 2018] Hui Wang, Hanbin Zhao, Xi Li, and Xu Tan. Progressive blockwise knowledge distillation for neural network acceleration. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 2769–2775, 2018.

[22] [Ye et al., 2019] Jingwen Ye, Yixin Ji, Xinchao Wang, Kairi Ou, Dapeng Tao, and Mingli Song. Student becoming the master: Knowledge amalgamation for joint scene parsing, depth estimation, and more. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[23] [Yi et al., 2014] Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li. Learning face representation from scratch. arXiv:1411.7923, 2014. · internal anchor

[24] [Yu et al., 2017] Xiyu Yu, Tongliang Liu, Xinchao Wang, and Dacheng Tao. On compressing deep models by low rank and sparse decomposition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 67–76, 2017.

[25] [Zamir et al., 2018] Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. Taskonomy: Disentangling task transfer learning. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3712–3722, June 2018.
