Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem

Cheng zeng; Chongjun Wang; Hao Cheng; Lei Zhang; Yi Zhang

arxiv: 1907.11857 · v1 · pith:RKGQCF6Cnew · submitted 2019-07-27 · 💻 cs.LG · cs.MM· stat.ML

Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem

Yi Zhang , Cheng zeng , Hao Cheng , Chongjun Wang , Lei Zhang This is my paper

Pith reviewed 2026-05-24 14:57 UTC · model grok-4.3

classification 💻 cs.LG cs.MMstat.ML

keywords multi-modal multi-label learninginstance-oriented selectionclassifier chainspartial modalitiesmodality quality variation

0 comments

The pith

A new algorithm for multi-modal multi-label classification selects different modality subsets for each instance rather than using every available modality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an instance-oriented Multi-modal Classifier Chains algorithm for problems where objects carry multiple data modalities and multiple semantic labels. Because modality quality varies across instances and some channels add noise rather than signal, the method learns to choose which modalities to activate on a per-instance basis during testing. Experiments on a real-world herbs collection and two public datasets indicate that predictions made with these partial, instance-tailored modality sets can outperform those made with the complete modality collection. The work therefore questions the default assumption that incorporating every available modality always improves label prediction.

Core claim

The MCC algorithm chains modality-aware classifiers so that each test instance activates only a learned subset of its available modalities; the resulting predictions are shown to be at least as accurate, and often more accurate, than those obtained by feeding every modality into the same chain structure.

What carries the argument

Multi-modal Classifier Chains (MCC), an instance-specific modality-selection rule embedded inside a classifier-chain architecture that decides per instance which modalities to retain for label prediction.

If this is right

Prediction remains possible when some modalities are missing or too costly to acquire at test time.
Different instances can be served by different modality combinations without retraining the entire model.
The approach applies directly to domains such as herb identification where data channels are collected inconsistently.
Training must include an explicit mechanism to discover which modality combinations are useful for which instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same per-instance selection logic could be grafted onto other multi-modal architectures beyond classifier chains.
Computational cost at inference time drops when low-quality modalities are routinely skipped.
If modality inconsistency across instances is small, the learned rule will simply default to the full set.
The method invites comparison with per-instance feature selection techniques already used in single-modality settings.

Load-bearing premise

Modality quality varies enough across instances that a selection rule learned from training data will reliably choose subsets that improve accuracy over the full set.

What would settle it

On the same three datasets, an ablation that forces every test instance to use all modalities produces strictly higher accuracy than the selective MCC version.

read the original abstract

With the emergence of diverse data collection techniques, objects in real applications can be represented as multi-modal features. What's more, objects may have multiple semantic meanings. Multi-modal and Multi-label (MMML) problem becomes a universal phenomenon. The quality of data collected from different channels are inconsistent and some of them may not benefit for prediction. In real life, not all the modalities are needed for prediction. As a result, we propose a novel instance-oriented Multi-modal Classifier Chains (MCC) algorithm for MMML problem, which can make convince prediction with partial modalities. MCC extracts different modalities for different instances in the testing phase. Extensive experiments are performed on one real-world herbs dataset and two public datasets to validate our proposed algorithm, which reveals that it may be better to extract many instead of all of the modalities at hand.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MCC proposes instance-specific modality selection inside classifier chains for MMML and reports gains over the all-modalities baseline on three datasets.

read the letter

The main point is that the authors introduce MCC, an instance-oriented version of classifier chains that picks different modality subsets for each test example instead of always using the full set. They argue this can be better than using every available modality and back it with experiments on one herbs dataset plus two public ones. The framing around inconsistent modality quality is straightforward and matches a common practical issue in multi-modal data. The experiments are presented as validation that partial selection helps, which is the core empirical claim. The paper does a reasonable job stating the motivation and running the tests on real data. The soft spot is that the abstract supplies no equations, pseudocode, or description of how the instance-level selector is trained or decided at test time, so it is impossible to judge whether the reported improvements come from the new mechanism or from other implementation choices. The key assumption—that modality usefulness varies enough across instances for a learnable rule to beat the all-modalities case—is stated clearly but its strength rests entirely on those experiments whose protocol is not visible here. This work is aimed at researchers already working on multi-modal multi-label problems who need ideas for handling uneven modality quality. A reader in that narrow area could get value from the experimental setup if the full method section is solid. It deserves a serious referee to check the selection procedure and baseline comparisons rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a novel instance-oriented Multi-modal Classifier Chains (MCC) algorithm for the Multi-modal Multi-label (MMML) problem. The central claim is that modality quality is inconsistent across instances, so that selecting different (partial) modalities per instance during testing can produce better predictions than using all available modalities. The approach is said to be validated by experiments on one real-world herbs dataset and two public datasets.

Significance. If the selection mechanism can be shown to reliably outperform the all-modalities baseline, the result would address a practical issue in multi-modal data where some channels add noise rather than signal. Instance-specific modality selection could improve both accuracy and efficiency in MMML tasks. The current manuscript supplies no technical description of the selection rule or quantitative results, so significance cannot yet be assessed.

major comments (2)

[Abstract] Abstract: the description of MCC is limited to the sentence that it 'extracts different modalities for different instances in the testing phase.' No formulation, pseudocode, training procedure, or decision rule for modality selection is supplied, so the central algorithmic claim cannot be evaluated for correctness or novelty.
[Abstract] Abstract (experiments paragraph): the claim that 'extensive experiments ... validate our proposed algorithm' is unsupported because no datasets, metrics, baselines, hyper-parameters, or numerical results are reported. Without these, it is impossible to determine whether the reported gains actually support the instance-oriented selection hypothesis.

minor comments (2)

[Abstract] Abstract: 'make convince prediction' is presumably a typographical error for 'make confident prediction.'
[Abstract] Abstract: the acronym MMML is introduced without an explicit definition, although its expansion can be inferred from context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments. We address each major comment below and will revise the abstract accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the description of MCC is limited to the sentence that it 'extracts different modalities for different instances in the testing phase.' No formulation, pseudocode, training procedure, or decision rule for modality selection is supplied, so the central algorithmic claim cannot be evaluated for correctness or novelty.

Authors: The abstract is concise by design. The full manuscript (Sections 2-3) supplies the MCC formulation, training procedure, pseudocode, and the instance-specific decision rule that selects modalities according to per-instance quality estimates. We will expand the abstract with a one-sentence description of the selection mechanism. revision: yes
Referee: [Abstract] Abstract (experiments paragraph): the claim that 'extensive experiments ... validate our proposed algorithm' is unsupported because no datasets, metrics, baselines, hyper-parameters, or numerical results are reported. Without these, it is impossible to determine whether the reported gains actually support the instance-oriented selection hypothesis.

Authors: The abstract summarizes the validation. The manuscript reports the herbs dataset plus two public datasets, the metrics, baselines, and quantitative results showing partial-modality gains. We will revise the abstract to state the datasets and main numerical findings. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The provided text (abstract plus placeholder for full manuscript) contains no equations, derivation steps, fitted parameters, or self-citations that reduce any claimed result to its own inputs by construction. The central claim is an empirical observation that instance-specific modality selection can outperform using all modalities, validated by experiments on three datasets; this is an independent experimental assertion rather than a self-referential derivation. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, parameters, or modeling assumptions; ledger left empty.

pith-pipeline@v0.9.0 · 5681 in / 927 out tokens · 22102 ms · 2026-05-24T14:57:23.633350+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 3 internal anchors

[1]

INTRODUCTION In many natural scenarios, objects might be complicated with multi-modal features and have multiple semantic meanings simultaneously. For one thing, data is collected from diverse channels and exhibits heterogeneous properties: each of these domains present different views of the same object, where each modal- ity can have its own individual ...

work page
[2]

Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem

RELA TED WORK In this section, we brieﬂy present state-of-the-art methods in multi-modal and multi-label [4] ﬁelds. As for modality ex- traction in multi-modal learning, it is closely related to fea- arXiv:1907.11857v1 [cs.LG] 27 Jul 2019 ture extraction [5]. Therefore, we brieﬂy review some related work on these two aspects in this section. Multi-label l...

work page internal anchor Pith review Pith/arXiv arXiv 1907
[3]

Feature selection algo- rithms do not alter the original representation of the variables, but merely select part of them

proposed regularized multilinear regression and selection for automatically selecting a set of features while optimizing prediction for high-dimensional data. Feature selection algo- rithms do not alter the original representation of the variables, but merely select part of them. Most of existing multi-label feature selection algorithms either boil down t...

work page
[4]

An overview of our MCC algorithm is shown in Fig.1

METHODOLOGY This section ﬁrst summarizes some formal symbols and def- initions used throughout this paper, and then introduces the formulation of the proposed MCC model. An overview of our MCC algorithm is shown in Fig.1. instance M MCC instance 1 instance 2 instance 3 Training Phase Testing Phase instance 1 instance 2 instance 3 modal 1modal 2modal Plabe...

work page
[5]

Niter represents maximum number of iterations

Nb denotes batch size of training phase. Niter represents maximum number of iterations. Cth represents the thresh- old of cost. Ath represents the threshold of accuracy of the predicted label. ˆct i denotes the sum of extraction cost and at i denotes accuracy of current predicted label

work page
[6]

Dataset Description We manually collect one real-world Herbs dataset and adapt two publicly available datasets including Emotions [20] and Scene [6]

EXPERIMENT 4.1. Dataset Description We manually collect one real-world Herbs dataset and adapt two publicly available datasets including Emotions [20] and Scene [6]. As for Herbs, there are 5 modalities with ex- plicit modal partitions: channel tropism, symptom, function, dosage and ﬂavor. As for Emotions and Scene, we divide the features into different m...

work page
[7]

However, the quality of modalities extracted from Table 2

CONCLUSION Complex objects, i.e., the articles, the images, etc can al- ways be represented with multi-modal and multi-label infor- mation. However, the quality of modalities extracted from Table 2. Comparison results (mean±std).↑/↓ indicates that the larger/smaller the better of a criterion. The best perfor- mance on each dataset is bolded. Algorithm Eva...

work page
[8]

2016YFB1001102), the National Natural Science Foundation of China (Grant No

ACKNOWLEDGEMENT This paper is supported by the National Key Research and De- velopment Program of China (Grant No. 2016YFB1001102), the National Natural Science Foundation of China (Grant No. 61876080), the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing Univer- sity

work page
[9]

College student scholarships and subsidies granting: A multi-modal multi-label ap- proach,

Han-Jia Ye, De-Chuan Zhan, Xiaolin Li, Zhen-Chuan Huang, and Yuan Jiang, “College student scholarships and subsidies granting: A multi-modal multi-label ap- proach,” in Data Mining (ICDM), 2016 IEEE 16th In- ternational Conference on. IEEE, 2016, pp. 559–568

work page 2016
[10]

Framewise phoneme classiﬁcation with bidirectional lstm and other neural network architectures,

Alex Graves and J ¨urgen Schmidhuber, “Framewise phoneme classiﬁcation with bidirectional lstm and other neural network architectures,”Neural Networks, vol. 18, no. 5-6, pp. 602–610, 2005

work page 2005
[11]

A novel connectionist sys- tem for improved unconstrained handwriting recogni- tion,

R Bertolami, H Bunke, S Fernandez, A Graves, M Li- wicki, and J Schmidhuber, “A novel connectionist sys- tem for improved unconstrained handwriting recogni- tion,” IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, vol. 31, no. 5, 2009

work page 2009
[12]

A review on multi-label learning algorithms,

Min-Ling Zhang and Zhi-Hua Zhou, “A review on multi-label learning algorithms,” IEEE transactions on knowledge and data engineering , vol. 26, no. 8, pp. 1819–1837, 2014

work page 2014
[13]

A re- view of feature selection techniques in bioinformatics,

Yvan Saeys, I ˜naki Inza, and Pedro Larra ˜naga, “A re- view of feature selection techniques in bioinformatics,” bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007

work page 2007
[14]

Learning multi-label scene clas- siﬁcation,

Matthew R Boutell, Jiebo Luo, Xipeng Shen, and Christopher M Brown, “Learning multi-label scene clas- siﬁcation,” Pattern recognition, vol. 37, no. 9, pp. 1757– 1771, 2004

work page 2004
[15]

Classiﬁer chains for multi-label classiﬁ- cation,

Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank, “Classiﬁer chains for multi-label classiﬁ- cation,” Machine learning, vol. 85, no. 3, pp. 333, 2011

work page 2011
[16]

Entropy chain multi-label classiﬁers for tradi- tional medicine diagnosing parkinson’s disease,

Yue Peng, Ming Fang, Chongjun Wang, and Junyuan Xie, “Entropy chain multi-label classiﬁers for tradi- tional medicine diagnosing parkinson’s disease,” in Bioinformatics and Biomedicine (BIBM), 2015 IEEE In- ternational Conference on. IEEE, 2015, pp. 856–862

work page 2015
[17]

Multi-label learning by exploiting la- bel correlations for tcm diagnosing parkinson’s disease,

Yue Peng, Chi Tang, Gang Chen, Junyuan Xie, and Chongjun Wang, “Multi-label learning by exploiting la- bel correlations for tcm diagnosing parkinson’s disease,” in Bioinformatics and Biomedicine (BIBM), 2017 IEEE International Conference on. IEEE, 2017, pp. 590–594

work page 2017
[18]

Nonlinear di- mensionality reduction by locally linear embedding,

Sam T Roweis and Lawrence K Saul, “Nonlinear di- mensionality reduction by locally linear embedding,” science, vol. 290, no. 5500, pp. 2323–2326, 2000

work page 2000
[19]

Multilinear regression for embedded feature selection with application to fmri analysis.,

Xiaonan Song and Haiping Lu, “Multilinear regression for embedded feature selection with application to fmri analysis.,” in AAAI, 2017, pp. 2562–2568

work page 2017
[20]

Multi- label informed feature selection.,

Ling Jian, Jundong Li, Kai Shu, and Huan Liu, “Multi- label informed feature selection.,” in IJCAI, 2016, pp. 1627–1633

work page 2016
[21]

An lp for sequential learning under bud- gets,

Joseph Wang, Kirill Trapeznikov, and Venkatesh Saligrama, “An lp for sequential learning under bud- gets,” in Artiﬁcial Intelligence and Statistics, 2014, pp. 987–995

work page 2014
[22]

Efﬁcient learning by directed acyclic graph for resource constrained prediction,

Joseph Wang, Kirill Trapeznikov, and Venkatesh Saligrama, “Efﬁcient learning by directed acyclic graph for resource constrained prediction,” in Advances in Neural Information Processing Systems , 2015, pp. 2152–2160

work page 2015
[23]

Instance speciﬁc discriminative modal pursuit: A se- rialized approach,

Yang Yang, De-Chuan Zhan, Ying Fan, and Yuan Jiang, “Instance speciﬁc discriminative modal pursuit: A se- rialized approach,” in Asian Conference on Machine Learning, 2017, pp. 65–80

work page 2017
[24]

Exploration on Generating Traditional Chinese Medicine Prescription from Symptoms with an End-to-End method

Wei Li, Zheng Yang, and Xu Sun, “Exploration on gen- erating traditional chinese medicine prescription from symptoms with an end-to-end method,” arXiv preprint arXiv:1801.09030, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[25]

Leo Breiman, Classiﬁcation and regression trees, Rout- ledge, 2017

work page 2017
[26]

Data classiﬁcation using feature selection and knn ma- chine learning approach,

Shemim Begum, Debasis Chakraborty, and Ram Sarkar, “Data classiﬁcation using feature selection and knn ma- chine learning approach,” inComputational Intelligence and Communication Networks (CICN), 2015 Interna- tional Conference on. IEEE, 2015, pp. 811–814

work page 2015
[27]

ADADELTA: An Adaptive Learning Rate Method

Matthew D Zeiler, “Adadelta: an adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012
[28]

Multi-label classiﬁ- cation of music into emotions.,

Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, and Ioannis P Vlahavas, “Multi-label classiﬁ- cation of music into emotions.,” in ISMIR, 2008, vol. 8, pp. 325–330

work page 2008
[29]

Ml-knn: A lazy learning approach to multi-label learning,

Min-Ling Zhang and Zhi-Hua Zhou, “Ml-knn: A lazy learning approach to multi-label learning,” Pattern recognition, vol. 40, no. 7, pp. 2038–2048, 2007

work page 2038
[30]

Optimizing the f-measure in multi-label classiﬁcation: Plug-in rule approach versus structured loss minimiza- tion,

Krzysztof Dembczynski, Arkadiusz Jachnik, Wojciech Kotlowski, Willem Waegeman, and Eyke H ¨ullermeier, “Optimizing the f-measure in multi-label classiﬁcation: Plug-in rule approach versus structured loss minimiza- tion,” in International Conference on Machine Learn- ing, 2013, pp. 1130–1138

work page 2013

[1] [1]

INTRODUCTION In many natural scenarios, objects might be complicated with multi-modal features and have multiple semantic meanings simultaneously. For one thing, data is collected from diverse channels and exhibits heterogeneous properties: each of these domains present different views of the same object, where each modal- ity can have its own individual ...

work page

[2] [2]

Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem

RELA TED WORK In this section, we brieﬂy present state-of-the-art methods in multi-modal and multi-label [4] ﬁelds. As for modality ex- traction in multi-modal learning, it is closely related to fea- arXiv:1907.11857v1 [cs.LG] 27 Jul 2019 ture extraction [5]. Therefore, we brieﬂy review some related work on these two aspects in this section. Multi-label l...

work page internal anchor Pith review Pith/arXiv arXiv 1907

[3] [3]

Feature selection algo- rithms do not alter the original representation of the variables, but merely select part of them

proposed regularized multilinear regression and selection for automatically selecting a set of features while optimizing prediction for high-dimensional data. Feature selection algo- rithms do not alter the original representation of the variables, but merely select part of them. Most of existing multi-label feature selection algorithms either boil down t...

work page

[4] [4]

An overview of our MCC algorithm is shown in Fig.1

METHODOLOGY This section ﬁrst summarizes some formal symbols and def- initions used throughout this paper, and then introduces the formulation of the proposed MCC model. An overview of our MCC algorithm is shown in Fig.1. instance M MCC instance 1 instance 2 instance 3 Training Phase Testing Phase instance 1 instance 2 instance 3 modal 1modal 2modal Plabe...

work page

[5] [5]

Niter represents maximum number of iterations

Nb denotes batch size of training phase. Niter represents maximum number of iterations. Cth represents the thresh- old of cost. Ath represents the threshold of accuracy of the predicted label. ˆct i denotes the sum of extraction cost and at i denotes accuracy of current predicted label

work page

[6] [6]

Dataset Description We manually collect one real-world Herbs dataset and adapt two publicly available datasets including Emotions [20] and Scene [6]

EXPERIMENT 4.1. Dataset Description We manually collect one real-world Herbs dataset and adapt two publicly available datasets including Emotions [20] and Scene [6]. As for Herbs, there are 5 modalities with ex- plicit modal partitions: channel tropism, symptom, function, dosage and ﬂavor. As for Emotions and Scene, we divide the features into different m...

work page

[7] [7]

However, the quality of modalities extracted from Table 2

CONCLUSION Complex objects, i.e., the articles, the images, etc can al- ways be represented with multi-modal and multi-label infor- mation. However, the quality of modalities extracted from Table 2. Comparison results (mean±std).↑/↓ indicates that the larger/smaller the better of a criterion. The best perfor- mance on each dataset is bolded. Algorithm Eva...

work page

[8] [8]

2016YFB1001102), the National Natural Science Foundation of China (Grant No

ACKNOWLEDGEMENT This paper is supported by the National Key Research and De- velopment Program of China (Grant No. 2016YFB1001102), the National Natural Science Foundation of China (Grant No. 61876080), the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing Univer- sity

work page

[9] [9]

College student scholarships and subsidies granting: A multi-modal multi-label ap- proach,

Han-Jia Ye, De-Chuan Zhan, Xiaolin Li, Zhen-Chuan Huang, and Yuan Jiang, “College student scholarships and subsidies granting: A multi-modal multi-label ap- proach,” in Data Mining (ICDM), 2016 IEEE 16th In- ternational Conference on. IEEE, 2016, pp. 559–568

work page 2016

[10] [10]

Framewise phoneme classiﬁcation with bidirectional lstm and other neural network architectures,

Alex Graves and J ¨urgen Schmidhuber, “Framewise phoneme classiﬁcation with bidirectional lstm and other neural network architectures,”Neural Networks, vol. 18, no. 5-6, pp. 602–610, 2005

work page 2005

[11] [11]

A novel connectionist sys- tem for improved unconstrained handwriting recogni- tion,

R Bertolami, H Bunke, S Fernandez, A Graves, M Li- wicki, and J Schmidhuber, “A novel connectionist sys- tem for improved unconstrained handwriting recogni- tion,” IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, vol. 31, no. 5, 2009

work page 2009

[12] [12]

A review on multi-label learning algorithms,

Min-Ling Zhang and Zhi-Hua Zhou, “A review on multi-label learning algorithms,” IEEE transactions on knowledge and data engineering , vol. 26, no. 8, pp. 1819–1837, 2014

work page 2014

[13] [13]

A re- view of feature selection techniques in bioinformatics,

Yvan Saeys, I ˜naki Inza, and Pedro Larra ˜naga, “A re- view of feature selection techniques in bioinformatics,” bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007

work page 2007

[14] [14]

Learning multi-label scene clas- siﬁcation,

Matthew R Boutell, Jiebo Luo, Xipeng Shen, and Christopher M Brown, “Learning multi-label scene clas- siﬁcation,” Pattern recognition, vol. 37, no. 9, pp. 1757– 1771, 2004

work page 2004

[15] [15]

Classiﬁer chains for multi-label classiﬁ- cation,

Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank, “Classiﬁer chains for multi-label classiﬁ- cation,” Machine learning, vol. 85, no. 3, pp. 333, 2011

work page 2011

[16] [16]

Entropy chain multi-label classiﬁers for tradi- tional medicine diagnosing parkinson’s disease,

Yue Peng, Ming Fang, Chongjun Wang, and Junyuan Xie, “Entropy chain multi-label classiﬁers for tradi- tional medicine diagnosing parkinson’s disease,” in Bioinformatics and Biomedicine (BIBM), 2015 IEEE In- ternational Conference on. IEEE, 2015, pp. 856–862

work page 2015

[17] [17]

Multi-label learning by exploiting la- bel correlations for tcm diagnosing parkinson’s disease,

Yue Peng, Chi Tang, Gang Chen, Junyuan Xie, and Chongjun Wang, “Multi-label learning by exploiting la- bel correlations for tcm diagnosing parkinson’s disease,” in Bioinformatics and Biomedicine (BIBM), 2017 IEEE International Conference on. IEEE, 2017, pp. 590–594

work page 2017

[18] [18]

Nonlinear di- mensionality reduction by locally linear embedding,

Sam T Roweis and Lawrence K Saul, “Nonlinear di- mensionality reduction by locally linear embedding,” science, vol. 290, no. 5500, pp. 2323–2326, 2000

work page 2000

[19] [19]

Multilinear regression for embedded feature selection with application to fmri analysis.,

Xiaonan Song and Haiping Lu, “Multilinear regression for embedded feature selection with application to fmri analysis.,” in AAAI, 2017, pp. 2562–2568

work page 2017

[20] [20]

Multi- label informed feature selection.,

Ling Jian, Jundong Li, Kai Shu, and Huan Liu, “Multi- label informed feature selection.,” in IJCAI, 2016, pp. 1627–1633

work page 2016

[21] [21]

An lp for sequential learning under bud- gets,

Joseph Wang, Kirill Trapeznikov, and Venkatesh Saligrama, “An lp for sequential learning under bud- gets,” in Artiﬁcial Intelligence and Statistics, 2014, pp. 987–995

work page 2014

[22] [22]

Efﬁcient learning by directed acyclic graph for resource constrained prediction,

Joseph Wang, Kirill Trapeznikov, and Venkatesh Saligrama, “Efﬁcient learning by directed acyclic graph for resource constrained prediction,” in Advances in Neural Information Processing Systems , 2015, pp. 2152–2160

work page 2015

[23] [23]

Instance speciﬁc discriminative modal pursuit: A se- rialized approach,

Yang Yang, De-Chuan Zhan, Ying Fan, and Yuan Jiang, “Instance speciﬁc discriminative modal pursuit: A se- rialized approach,” in Asian Conference on Machine Learning, 2017, pp. 65–80

work page 2017

[24] [24]

Exploration on Generating Traditional Chinese Medicine Prescription from Symptoms with an End-to-End method

Wei Li, Zheng Yang, and Xu Sun, “Exploration on gen- erating traditional chinese medicine prescription from symptoms with an end-to-end method,” arXiv preprint arXiv:1801.09030, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[25] [25]

Leo Breiman, Classiﬁcation and regression trees, Rout- ledge, 2017

work page 2017

[26] [26]

Data classiﬁcation using feature selection and knn ma- chine learning approach,

Shemim Begum, Debasis Chakraborty, and Ram Sarkar, “Data classiﬁcation using feature selection and knn ma- chine learning approach,” inComputational Intelligence and Communication Networks (CICN), 2015 Interna- tional Conference on. IEEE, 2015, pp. 811–814

work page 2015

[27] [27]

ADADELTA: An Adaptive Learning Rate Method

Matthew D Zeiler, “Adadelta: an adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012

[28] [28]

Multi-label classiﬁ- cation of music into emotions.,

Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, and Ioannis P Vlahavas, “Multi-label classiﬁ- cation of music into emotions.,” in ISMIR, 2008, vol. 8, pp. 325–330

work page 2008

[29] [29]

Ml-knn: A lazy learning approach to multi-label learning,

Min-Ling Zhang and Zhi-Hua Zhou, “Ml-knn: A lazy learning approach to multi-label learning,” Pattern recognition, vol. 40, no. 7, pp. 2038–2048, 2007

work page 2038

[30] [30]

Optimizing the f-measure in multi-label classiﬁcation: Plug-in rule approach versus structured loss minimiza- tion,

Krzysztof Dembczynski, Arkadiusz Jachnik, Wojciech Kotlowski, Willem Waegeman, and Eyke H ¨ullermeier, “Optimizing the f-measure in multi-label classiﬁcation: Plug-in rule approach versus structured loss minimiza- tion,” in International Conference on Machine Learn- ing, 2013, pp. 1130–1138

work page 2013