AMI-Net+: A Novel Multi-Instance Neural Network for Medical Diagnosis from Incomplete and Imbalanced Data

Josiah Poon; Simon Poon; Zeyuan Wang

arxiv: 1907.01734 · v1 · pith:U4DJISVPnew · submitted 2019-07-03 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-25 10:10 UTC · model grok-4.3

keywords: multi-instance learning, medical diagnosis, imbalanced data, incomplete data, focal loss, attention mechanism, neural network

The pith

AMI-Net+ improves diagnosis from incomplete and imbalanced medical data by replacing cross-entropy loss with focal loss and adding self-adaptive instance-level pooling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AMI-Net+ as an extension to AMI-Net for medical diagnosis tasks where patient records are fragmentary and class distributions are extremely skewed. It retains embedding and multi-head attention to model symptom relations, swaps in focal loss to emphasize hard examples, and introduces a new self-adaptive multi-instance pooling step that produces bag-level representations directly from instance features. The authors test the resulting network on two real-world datasets drawn from separate medical domains and report that it exceeds the performance of AMI-Net and other baselines by a considerable margin.

Core claim

AMI-Net+ captures relations among symptoms and between symptoms and the target disease through embedding, multi-head attention, and gated attention-based multi-instance pooling; it substitutes focal loss for cross-entropy loss and adds a novel self-adaptive multi-instance pooling operator that works at the instance level to form each bag representation.

What carries the argument

Self-adaptive multi-instance pooling operator that computes bag representations from instance-level features, paired with focal loss to address extreme class imbalance.
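
To make the focal-loss half of this concrete, here is a minimal sketch of the binary focal loss from Lin et al. [24], FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), next to plain cross-entropy. The defaults `gamma=2.0` and `alpha=0.25` are the values suggested in that reference and are assumptions here; the page does not state which values AMI-Net+ uses, and the function names are illustrative.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for one example; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example (gamma and alpha are assumed defaults)."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
easy_ratio = binary_focal_loss(0.9, 1) / binary_cross_entropy(0.9, 1)
hard_ratio = binary_focal_loss(0.1, 1) / binary_cross_entropy(0.1, 1)
print(easy_ratio, hard_ratio)
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is one way to sanity-check an implementation.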

If this is right

The network produces more reliable diagnoses when trained on fragmentary patient records.
Focal loss combined with adaptive pooling mitigates the effect of extreme class imbalance in medical classification.
Symptom-disease relations are better modeled by the joint use of multi-head attention and gated attention pooling.
Performance gains hold across two distinct medical domains.
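
For readers unfamiliar with gated attention pooling, the sketch below follows the formulation of Ilse et al. [13]: each instance embedding h_k receives a score w^T (tanh(V h_k) * sigmoid(U h_k)), the scores are normalized with a softmax, and the bag representation is the weighted sum of the instances. Whether AMI-Net+ uses exactly this form is an assumption, and the matrices `V`, `U` and vector `w` are hand-picked illustrative values, not learned parameters.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_attention_pool(instances, V, U, w):
    """Return (bag_representation, attention_weights) for one bag."""
    scores = []
    for h in instances:
        t = [math.tanh(a) for a in matvec(V, h)]   # tanh branch
        g = [sigmoid(a) for a in matvec(U, h)]     # sigmoid gate
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    m = max(scores)                                # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[c] for a, h in zip(weights, instances)) for c in range(dim)]
    return bag, weights


# Hypothetical parameters and a bag of three 2-dimensional instance embeddings.
V = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.1], [-0.3, 0.2]]
w = [1.0, -1.0]
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bag, weights = gated_attention_pool(instances, V, U, w)
print(bag, weights)
```

Because the weights depend only on each instance's own embedding, the pooled vector does not change when the instances are reordered, which is the property a bag-level representation needs.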

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same loss-plus-pooling changes could be tested on non-medical multi-instance tasks that also suffer from missing features.
An ablation that isolates the contribution of the self-adaptive pooling step would clarify whether the gain comes mainly from the pooling or from focal loss.
If the method scales to larger numbers of instances per bag, it might apply to longitudinal electronic health records.

Load-bearing premise

The combination of focal loss and the new self-adaptive pooling will improve results on incomplete data without creating new overfitting or selection problems.

What would settle it

On either of the two real-world datasets, measure accuracy, F1, or AUC for AMI-Net+ and find that at least one metric is no higher than the corresponding metric for AMI-Net or the other baselines.
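
The comparison described above needs all three metrics because they behave very differently under class imbalance. The sketch below is a plain-Python illustration on made-up labels (5 positives, 95 negatives); the data and function names are hypothetical and are not results from the paper.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative imbalanced labels and a degenerate model that always predicts "negative".
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
constant_scores = [0.0] * 100

print(accuracy(y_true, always_negative))   # high despite finding no positives
print(f1_score(y_true, always_negative))   # zero
print(auc(y_true, constant_scores))        # chance level
```

A model that never predicts the minority class still reaches 95% accuracy on this toy set, so a comparison between AMI-Net+ and its baselines is more informative on F1 and AUC than on accuracy alone.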

Figures

Figures reproduced from arXiv: 1907.01734 by Josiah Poon, Simon Poon, Zeyuan Wang.

**Figure 1.** The architecture of AMI-Net+. Adjoining text from Section 2.3 (Self-Attention Mechanism): Self-attention was first proposed by Vaswani et al. [23] in the Transformer architecture to capture correlations between words in the source and target sentences for machine translation. Their work demonstrates that self-attention can reveal syntactic and semantic information in text. In recent years, it has been applied in …

**Figure 2.** The architecture of multi-head attention. Adjoining text (Scaled Dot-Product Attention): It takes the query, keys of dimension d_k, and values of dimension d_v as input and computes similarity scores as the dot products between the given query and all keys, divided by a scaling factor √d_k. The scaling factor keeps the gradient in backpropagation from vanishing or becoming extremely small. Then a softmax funct…
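
The scaled dot-product step described in the Figure 2 excerpt can be sketched in a few lines of plain Python: scores are query-key dot products divided by √d_k, a softmax turns them into weights, and the output is the weighted sum of the values, as in Vaswani et al. [23]. The function names and toy inputs are illustrative assumptions, and the learned projection matrices of multi-head attention are omitted.

```python
import math


def softmax(xs):
    m = max(xs)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights); each argument is a list of equal-length vectors."""
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs, all_weights = [], []
    for q in queries:
        # Dot product of the query with every key, divided by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        d_v = len(values[0])
        out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(d_v)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: two identical keys, so the query attends to both values equally.
queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[0.0, 2.0], [4.0, 6.0]]
outputs, attn = scaled_dot_product_attention(queries, keys, values)
print(outputs, attn)
```

Multi-head attention runs several such attention operations in parallel on differently projected copies of the inputs and concatenates the results.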

**Figure 3.** [PITH_FULL_IMAGE:figures/full_fig_p006_3.png]

**Figure 3.** Comparison of different numbers of heads in multi-head attention. [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]

Original abstract

In medical real-world study (RWS), how to fully utilize the fragmentary and scarce information in model training to generate the solid diagnosis results is a challenging task. In this work, we introduce a novel multi-instance neural network, AMI-Net+, to train and predict from the incomplete and extremely imbalanced data. It is more effective than the state-of-art method, AMI-Net. First, we also implement embedding, multi-head attention and gated attention-based multi-instance pooling to capture the relations of symptoms themselves and with the given disease. Besides, we propose var-ious improvements to AMI-Net, that the cross-entropy loss is replaced by focal loss and we propose a novel self-adaptive multi-instance pooling method on instance-level to obtain the bag representation. We validate the performance of AMI-Net+ on two real-world datasets, from two different medical domains. Results show that our approach outperforms other base-line models by a considerable margin.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

AMI-Net+ swaps in focal loss and adds self-adaptive pooling to AMI-Net but never explains how it handles missing values.

The main takeaway is that this is an incremental update to AMI-Net. The authors replace cross-entropy with focal loss to deal with class imbalance and introduce a self-adaptive multi-instance pooling method. They test it on two real medical datasets and report that it beats the baselines by a considerable margin.

In effect, the paper applies standard tools in a medical multi-instance setting. The self-adaptive pooling is new within their framework, and focal loss is a reasonable choice for imbalance. Evaluating on data from two different domains adds some practical value.

The soft spot is the lack of any explanation of how incompleteness is handled. The architecture list does not mention masking, missingness flags, or imputation, so the claimed gains on incomplete data cannot be linked to the new components. The abstract also omits all numbers, error bars, and ablation studies, so the performance claims cannot be verified from what is given. The stress-test concern therefore holds up.

The work is aimed at researchers focused on multi-instance neural networks in healthcare. Someone in that niche might want to see the pooling variant, but without the missing-data details or the actual results it is difficult to assess the contribution. As it stands, it does not show enough rigor or evidence to merit a serious referee's time. I would not send this to peer review without major additions that cover the incompleteness mechanism and provide quantitative support.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces AMI-Net+, an extension of AMI-Net for medical diagnosis from incomplete and imbalanced data. It retains embedding, multi-head attention, and gated attention pooling while adding focal loss (replacing cross-entropy) and a novel self-adaptive multi-instance pooling method at the instance level; the authors claim this yields superior performance over baselines on two real-world datasets from different medical domains.

Significance. If the performance claims are substantiated with quantitative results and the handling of missing data is explicitly described, the work could offer a practical advance for multi-instance learning on fragmentary medical records. The combination of focal loss for class imbalance and adaptive pooling is a reasonable direction, but the current lack of metrics, ablations, and missing-value mechanisms prevents assessment of whether these additions deliver the claimed gains.

major comments (3)

[Abstract] The central claim that AMI-Net+ trains effectively from incomplete data is unsupported because no mechanism (masking, imputation, missingness indicators, or bag-level handling of absent instances) is described in the architecture or training procedure. Without this, performance gains on the two datasets cannot be attributed to the proposed changes rather than unstated preprocessing.
[Abstract] The assertion that the approach 'outperforms other baseline models by a considerable margin' supplies no numerical results, error bars, statistical tests, or ablation studies, rendering the performance claim impossible to evaluate.
[Abstract] The assumption that focal loss plus self-adaptive instance-level pooling will reliably improve results on incomplete data is presented without any quantitative support or analysis of potential overfitting or selection artifacts introduced by these components.

minor comments (2)

[Abstract] 'var-ious' is a typographical error for 'various'.
[Abstract] 'base-line' should be written as the single word 'baseline'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback focused on the abstract. We will revise the abstract to improve clarity on data handling and to include quantitative support for the performance claims. Point-by-point responses follow.

Referee: [Abstract] The central claim that AMI-Net+ trains effectively from incomplete data is unsupported because no mechanism (masking, imputation, missingness indicators, or bag-level handling of absent instances) is described in the architecture or training procedure. Without this, performance gains on the two datasets cannot be attributed to the proposed changes rather than unstated preprocessing.

Authors: We agree the abstract should explicitly reference the handling of incomplete data. AMI-Net+ inherits the multi-instance bag representation from AMI-Net, in which absent instances are simply omitted from the bag; the embedding, multi-head attention, and pooling layers operate only on the observed instances without imputation or masking. We will revise the abstract to state this mechanism concisely so that the claim is supported.
revision: yes

Referee: [Abstract] The assertion that the approach 'outperforms other baseline models by a considerable margin' supplies no numerical results, error bars, statistical tests, or ablation studies, rendering the performance claim impossible to evaluate.

Authors: We accept that the abstract would be stronger with concrete numbers. The full manuscript contains tables reporting AUC, F1, and accuracy on both datasets together with comparisons to baselines. We will add the key quantitative improvements (with the reported margins) to the abstract while respecting length limits.
revision: yes

Referee: [Abstract] The assumption that focal loss plus self-adaptive instance-level pooling will reliably improve results on incomplete data is presented without any quantitative support or analysis of potential overfitting or selection artifacts introduced by these components.

Authors: The experimental section of the manuscript already demonstrates the gains from replacing cross-entropy with focal loss and from the new self-adaptive pooling on the two imbalanced medical datasets. Space constraints in the abstract prevent detailed ablation or overfitting analysis, but we will strengthen the abstract sentence to reference the supporting results and will consider adding a short discussion of these components if the revision allows.
revision: partial
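
The first response above describes incompleteness handling as simply omitting absent instances from the bag. A minimal sketch of that idea, assuming a record is a mapping from symptom name to value with `None` marking a missing entry, is shown below; `record_to_bag`, `symptom_to_id`, and the example records are hypothetical and not taken from the paper.

```python
def record_to_bag(record, symptom_to_id):
    """Turn one patient record into a bag of (symptom_id, value) instances.

    Missing entries (None) are dropped rather than imputed or masked, so bags
    from different patients can have different lengths.
    """
    bag = []
    for name, value in record.items():
        if value is None:
            continue  # absent instance: omitted from the bag
        bag.append((symptom_to_id[name], value))
    return bag


# Hypothetical vocabulary and two records with different missing entries.
symptom_to_id = {"fever": 0, "cough": 1, "fatigue": 2}
patient_a = {"fever": 1.0, "cough": None, "fatigue": 0.5}
patient_b = {"fever": None, "cough": 1.0, "fatigue": None}

bag_a = record_to_bag(patient_a, symptom_to_id)
bag_b = record_to_bag(patient_b, symptom_to_id)
print(bag_a)
print(bag_b)
```

Downstream layers then need to accept variable-length bags, which attention followed by permutation-invariant pooling can do without any placeholder values.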

Circularity Check

0 steps flagged

No derivation chain; empirical architecture proposal with no self-referential reductions

The paper describes an empirical neural network extension (embedding + multi-head attention + gated pooling, focal loss, self-adaptive instance pooling) and reports performance on two real-world datasets. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted parameters or prior self-citations. The central claim is an architectural and loss-function change whose validity is assessed externally via held-out data performance, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard neural-network assumptions plus the domain assumption that symptom bags can be treated as multi-instance examples for disease prediction; no new entities are postulated.

free parameters (1)

network hyperparameters and attention weights
All neural-network weights and the self-adaptive pooling parameters are fitted to the training data; exact count and values not reported.

axioms (1)

domain assumption: The multi-instance learning framework is appropriate for representing incomplete patient records as bags of symptoms.
Invoked when the authors frame the medical diagnosis task as a multi-instance problem.

pith-pipeline@v0.9.0 · 5695 in / 1271 out tokens · 36861 ms · 2026-05-25T10:10:41.369595+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "we propose various improvements to AMI-Net, that the cross-entropy loss is replaced by focal loss and we propose a novel self-adaptive multi-instance pooling method on instance-level"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "each patient’s record is viewed as a sentence with a bag of words, i.e., symptoms... multi-instance learning (MIL)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 4 internal anchors

[1] Sun X, Tan J, Tang L, Guo JJ, Li X. Real world evidence: experience and lessons from China. BMJ. 2018 Feb 5;360:j5262.
[2] Schafer JL. Analysis of incomplete multivariate data. Chapman and Hall/CRC; 1997 Aug 1.
[3] Batista GE, Monard MC. A study of K-nearest neighbour as an imputation method. HIS. 2002 Dec 30;87(251-260):48.
[4] Ghahramani Z, Jordan MI. Supervised learning from incomplete data via an EM approach. In Advances in Neural Information Processing Systems 1994 (pp. 120-127).
[5] Grangier D, Melvin I. Feature set embedding for incomplete data. In Advances in Neural Information Processing Systems 2010 (pp. 793-801).
[6] Zhou ZH. A brief introduction to weakly supervised learning. National Science Review. 2017 Aug 25;5(1):44-53.
[8] Zhang Q, Goldman SA. EM-DD: An improved multiple-instance learning technique. In Advances in Neural Information Processing Systems 2002 (pp. 1073-1080).
[9] Andrews S, Tsochantaridis I, Hofmann T. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems 2003 (pp. 577-584).
[10] Zhou ZH, Sun YY, Li YF. Multi-instance learning by treating instances as non-iid samples. In Proceedings of the 26th Annual International Conference on Machine Learning 2009 Jun 14 (pp. 1249-1256). ACM.
[11] Wei XS, Wu J, Zhou ZH. Scalable multi-instance learning. In 2014 IEEE International Conference on Data Mining 2014 Dec (pp. 1037-1042). IEEE.
[12] Zhou ZH, Zhang ML. Neural networks for multi-instance learning. In Proceedings of the International Conference on Intelligent Information Technology, Beijing, China 2002 Aug (pp. 455-459).
[13] Ilse M, Tomczak JM, Welling M. Attention-based deep multiple instance learning. arXiv preprint arXiv:1802.04712. 2018 Feb 13.
[14] Wang X, Yan Y, Tang P, Bai X, Liu W. Revisiting multiple instance neural networks. Pattern Recognition. 2018;74:15-24.
[15] Yan Y, Wang X, Guo X, Fang J, Liu W, Huang J. Deep multi-instance learning with dynamic pooling. In Asian Conference on Machine Learning 2018 Nov (pp. 662-677).
[16] Wu J, Yu Y, Huang C, Yu K. Deep multiple instance learning for image classification and auto-annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015 (pp. 3460-3469).
[17] Xu XS, Xue X, Zhou ZH. Ensemble multi-instance multi-label learning approach for video annotation task. In Proceedings of the 19th ACM International Conference on Multimedia 2011 Nov 28 (pp. 1153-1156). ACM.
[18] Surdeanu M, Tibshirani J, Nallapati R, Manning CD. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning 2012 Jul 12 (pp. 455-465). Association for Computational Linguistics.
[19] Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017 (pp. 3156-3164).
[20] Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016 (pp. 1480-1489).
[21] Feng J, Zhou ZH. Deep MIML network. In Thirty-First AAAI Conference on Artificial Intelligence 2017 Feb 13.
[22] Wang Z, Poon J, Sun S, Poon S. Attention-based multi-instance neural network for medical diagnosis from incomplete and low quality data. arXiv preprint arXiv:1904.04460. 2019 Apr 9.
[23] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In Advances in Neural Information Processing Systems 2017 (pp. 5998-6008).
[24] Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision 2017 (pp. 2980-2988).
[25] Zhou ZH, Zhang ML, Huang SJ, Li YF. Multi-instance multi-label learning. Artificial Intelligence. 2012 Jan 1;176(1):2291-320.
[26] Dietterich TG, Lathrop RH, Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence. 1997 Jan 1;89(1-2):31-71.
[27] Maron O, Lozano-Pérez T. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems 1998 (pp. 570-576).
[28] Ramon J, De Raedt L. Multi instance neural networks. In Proceedings of the ICML-2000 Workshop on Attribute-Value and Relational Learning 2000 (pp. 53-60).
[29] LeCun Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 1990 (pp. 396-404).
[30] Carbonneau MA, Cheplygina V, Granger E, Gagnon G. Multiple instance learning: A survey of problem characteristics and applications. Pattern Recognition. 2018 May 1;77:329-53.
[31] Tan Z, Wang M, Xie J, Chen Y, Shi X. Deep semantic role labeling with self-attention. In Thirty-Second AAAI Conference on Artificial Intelligence 2018 Apr 26.
[32] Verga P, Strubell E, McCallum A. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. arXiv preprint arXiv:1802.10569. 2018 Feb 28.
[33] Lei Ba J, Kiros JR, Hinton GE. Layer normalization. arXiv preprint arXiv:1607.06450. 2016 Jul.
[34] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016 (pp. 770-778).
[35] Dauphin YN, Fan A, Auli M, Grangier D. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 2017 Aug 6 (pp. 933-941). JMLR.org.
[36] Hosmer Jr DW, Lemeshow S, Sturdivant RX. Applied logistic regression. John Wiley & Sons; 2013.
[37] Wang L, editor. Support vector machines: theory and applications. Springer Science & Business Media; 2005 Jun 21.
[38] Ho TK. Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition 1995 Aug 14 (Vol. 1, pp. 278-282). IEEE.
[39] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016 Aug 13 (pp. 785-794). ACM.
