AMI-Net+: A Novel Multi-Instance Neural Network for Medical Diagnosis from Incomplete and Imbalanced Data
Pith reviewed 2026-05-25 10:10 UTC · model grok-4.3
The pith
AMI-Net+ improves diagnosis from incomplete and imbalanced medical data by replacing cross-entropy loss with focal loss and adding self-adaptive instance-level pooling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AMI-Net+ captures relations among symptoms and between symptoms and the target disease through embedding and attention layers; it substitutes focal loss for cross-entropy loss and replaces standard gated attention pooling with a novel self-adaptive multi-instance pooling operator that works at the instance level to form each bag representation.
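For orientation, the gated attention pooling that the new operator is said to replace can be sketched as follows. This is a minimal pure-Python sketch of one common formulation (Ilse, Tomczak and Welling, 2018). That AMI-Net uses exactly this form is an assumption. The names `gated_attention_pool`, `V`, `U` and `w` are illustrative. The self-adaptive operator itself is not specified on this page, so it is not sketched here.

```python
import math

def _matvec(M, v):
    # Multiply an L x D matrix (list of rows) by a length-D vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gated_attention_pool(instances, V, U, w):
    """Pool a bag of K instance vectors into one bag vector.

    instances: list of K feature vectors, each of length D
    V, U:      L x D weight matrices (hypothetical learned parameters)
    w:         length-L weight vector (hypothetical learned parameter)
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        tanh_part = [math.tanh(x) for x in _matvec(V, h)]
        gate_part = [1.0 / (1.0 + math.exp(-x)) for x in _matvec(U, h)]
        # Score = w^T (tanh(V h) * sigmoid(U h)), elementwise product inside.
        scores.append(sum(wl * t * g for wl, t, g in zip(w, tanh_part, gate_part)))
    # Softmax over the instances in the bag (shifted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Bag representation = attention-weighted sum of instance vectors.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn
```

Because the softmax runs over whatever instances are present, the same function handles bags of different sizes, which is what makes this kind of pooling a natural fit for records with varying numbers of observed symptoms.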
What carries the argument
Self-adaptive multi-instance pooling operator that computes bag representations from instance-level features, paired with focal loss to address extreme class imbalance.
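For reference, focal loss as defined by Lin et al. (2017) multiplies the cross-entropy term by (1 - p_t)^gamma, so well-classified examples contribute less to the total loss. The following is a minimal Python sketch for the binary case. The function names are illustrative, and the defaults gamma = 2 and alpha = 0.25 are the commonly cited settings from the focal loss paper, not values reported for AMI-Net+.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for one example; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p          # probability given to the true class
    return -math.log(min(max(p_t, eps), 1.0))

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are assumed defaults, not the paper's settings.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. With gamma = 2, a positive example predicted at 0.9 keeps only 1% of its cross-entropy weight before the alpha factor, while one predicted at 0.1 keeps 81%.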
If this is right
- The network produces more reliable diagnoses when trained on fragmentary patient records.
- Focal loss combined with adaptive pooling mitigates the effect of extreme class imbalance in medical classification.
- Symptom-disease relations are better modeled by the joint use of multi-head attention and gated attention pooling.
- Performance gains hold across two distinct medical domains.
Where Pith is reading between the lines
- The same loss-plus-pooling changes could be tested on non-medical multi-instance tasks that also suffer from missing features.
- An ablation that isolates the contribution of the self-adaptive pooling step would clarify whether the gain comes mainly from the pooling or from focal loss.
- If the method scales to larger numbers of instances per bag, it might apply to longitudinal electronic health records.
Load-bearing premise
The combination of focal loss and the new self-adaptive pooling will improve results on incomplete data without creating new overfitting or selection problems.
What would settle it
On either of the two real-world datasets, measure accuracy, F1, or AUC for AMI-Net+ and find that at least one metric is no higher than the corresponding metric for AMI-Net or the other baselines.
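The check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are hypothetical, the AUC is computed by direct pairwise comparison (suitable for small evaluation sets), and no numbers from the paper are used.

```python
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_not_improved(candidate, baseline):
    """Names of metrics where the candidate is no higher than the baseline."""
    return [name for name in candidate if candidate[name] <= baseline[name]]
```

A non-empty result from `metrics_not_improved`, computed on held-out data for AMI-Net+ against AMI-Net or another baseline, is the outcome that would count against the claim.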
Original abstract
In medical real-world study (RWS), how to fully utilize the fragmentary and scarce information in model training to generate the solid diagnosis results is a challenging task. In this work, we introduce a novel multi-instance neural network, AMI-Net+, to train and predict from the incomplete and extremely imbalanced data. It is more effective than the state-of-art method, AMI-Net. First, we also implement embedding, multi-head attention and gated attention-based multi-instance pooling to capture the relations of symptoms themselves and with the given disease. Besides, we propose var-ious improvements to AMI-Net, that the cross-entropy loss is replaced by focal loss and we propose a novel self-adaptive multi-instance pooling method on instance-level to obtain the bag representation. We validate the performance of AMI-Net+ on two real-world datasets, from two different medical domains. Results show that our approach outperforms other base-line models by a considerable margin.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AMI-Net+, an extension of AMI-Net for medical diagnosis from incomplete and imbalanced data. It retains embedding, multi-head attention, and gated attention pooling while adding focal loss (replacing cross-entropy) and a novel self-adaptive multi-instance pooling method at the instance level; the authors claim this yields superior performance over baselines on two real-world datasets from different medical domains.
Significance. If the performance claims are substantiated with quantitative results and the handling of missing data is explicitly described, the work could offer a practical advance for multi-instance learning on fragmentary medical records. The combination of focal loss for class imbalance and adaptive pooling is a reasonable direction, but the current lack of metrics, ablations, and missing-value mechanisms prevents assessment of whether these additions deliver the claimed gains.
major comments (3)
- [Abstract] Abstract: the central claim that AMI-Net+ trains effectively from incomplete data is unsupported because no mechanism (masking, imputation, missingness indicators, or bag-level handling of absent instances) is described in the architecture or training procedure. Without this, performance gains on the two datasets cannot be attributed to the proposed changes rather than unstated preprocessing.
- [Abstract] Abstract: the assertion that the approach 'outperforms other baseline models by a considerable margin' supplies no numerical results, error bars, statistical tests, or ablation studies, rendering the performance claim impossible to evaluate.
- [Abstract] Abstract: the assumption that focal loss plus self-adaptive instance-level pooling will reliably improve results on incomplete data is presented without any quantitative support or analysis of potential overfitting or selection artifacts introduced by these components.
minor comments (2)
- [Abstract] Abstract: 'var-ious' is a typographical error for 'various'.
- [Abstract] Abstract: 'base-line' should be written as the single word 'baseline'.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback focused on the abstract. We will revise the abstract to improve clarity on data handling and to include quantitative support for the performance claims. Point-by-point responses follow.
- Referee: [Abstract] Abstract: the central claim that AMI-Net+ trains effectively from incomplete data is unsupported because no mechanism (masking, imputation, missingness indicators, or bag-level handling of absent instances) is described in the architecture or training procedure. Without this, performance gains on the two datasets cannot be attributed to the proposed changes rather than unstated preprocessing.
  Authors: We agree the abstract should explicitly reference the handling of incomplete data. AMI-Net+ inherits the multi-instance bag representation from AMI-Net, in which absent instances are simply omitted from the bag; the embedding, multi-head attention, and pooling layers operate only on the observed instances without imputation or masking. We will revise the abstract to state this mechanism concisely so that the claim is supported.
  Revision: yes
- Referee: [Abstract] Abstract: the assertion that the approach 'outperforms other baseline models by a considerable margin' supplies no numerical results, error bars, statistical tests, or ablation studies, rendering the performance claim impossible to evaluate.
  Authors: We accept that the abstract would be stronger with concrete numbers. The full manuscript contains tables reporting AUC, F1, and accuracy on both datasets together with comparisons to baselines. We will add the key quantitative improvements (with the reported margins) to the abstract while respecting length limits.
  Revision: yes
- Referee: [Abstract] Abstract: the assumption that focal loss plus self-adaptive instance-level pooling will reliably improve results on incomplete data is presented without any quantitative support or analysis of potential overfitting or selection artifacts introduced by these components.
  Authors: The experimental section of the manuscript already demonstrates the gains from replacing cross-entropy with focal loss and from the new self-adaptive pooling on the two imbalanced medical datasets. Space constraints in the abstract prevent detailed ablation or overfitting analysis, but we will strengthen the abstract sentence to reference the supporting results and will consider adding a short discussion of these components if the revision allows.
  Revision: partial
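The mechanism described in the first response, where absent instances are omitted from the bag rather than imputed or masked, can be illustrated with a short Python sketch. The record layout, the `vocab` mapping and the function name `record_to_bag` are hypothetical. Treating each observed symptom as one instance identified by an index is also an assumption, since the exact instance encoding is not given on this page.

```python
def record_to_bag(record, vocab):
    """Convert one patient record into a bag of observed symptom indices.

    record: dict mapping symptom name to a value, with None meaning "not recorded"
    vocab:  dict mapping symptom name to an integer index (hypothetical)
    Missing entries are dropped, so bags differ in length across patients.
    """
    return [vocab[name] for name, value in record.items()
            if value is not None and name in vocab]


# Hypothetical example: "cough" was never recorded for this patient.
vocab = {"fever": 0, "cough": 1, "fatigue": 2}
record = {"fever": 1, "cough": None, "fatigue": 1}
bag = record_to_bag(record, vocab)   # only the observed symptoms remain
```

Downstream layers then see only the two observed instances. Nothing stands in for the missing one, which is why the pooling step has to accept bags of any size.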
Circularity Check
No derivation chain; empirical architecture proposal with no self-referential reductions
full rationale
The paper describes an empirical neural network extension (embedding + multi-head attention + gated pooling, focal loss, self-adaptive instance pooling) and reports performance on two real-world datasets. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted parameters or prior self-citations. The central claim is an architectural and loss-function change whose validity is assessed externally via held-out data performance, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
free parameters (1)
- network hyperparameters and attention weights
axioms (1)
- domain assumption: Multi-instance learning framework is appropriate for representing incomplete patient records as bags of symptoms
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we propose various improvements to AMI-Net, that the cross-entropy loss is replaced by focal loss and we propose a novel self-adaptive multi-instance pooling method on instance-level"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "each patient’s record is viewed as a sentence with a bag of words, i.e., symptoms... multi-instance learning (MIL)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.