AC-MIL: Weakly Supervised Atrial LGE-MRI Quality Assessment via Adversarial Concept Disentanglement
Pith reviewed 2026-05-10 15:46 UTC · model grok-4.3
The pith
AC-MIL decomposes atrial LGE-MRI quality scores into localized radiological concepts from volume-level labels alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AC-MIL is a weakly supervised framework that decomposes global image quality into clinically defined radiological concepts using only volume-level supervision. An unsupervised residual branch is guided by an adversarial erasure mechanism to prevent leakage of concept information, and a spatial diversity constraint penalizes overlap between distinct concept attention maps, yielding localized and interpretable features together with competitive ordinal grading performance.
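The summary does not say how the adversarial erasure is implemented. A common way to build this kind of objective is a gradient reversal layer in the style of Ganin and Lempitsky (reference [7] below). The toy sketch here assumes that design:

- A linear adversary tries to predict a concept score from a scalar residual feature.
- The residual encoder receives the adversary's gradient with its sign flipped, so each step makes the concept harder to recover from the residual.

The functions `adversary_loss` and `erasure_step` and the weight `lam` are hypothetical. They are not the authors' code.

```python
def adversary_loss(w: float, a: float, x: float, c: float) -> float:
    """Squared error of a linear adversary (weight a) predicting concept c
    from a toy residual feature r = w * x produced by a linear encoder (weight w)."""
    r = w * x
    return (a * r - c) ** 2


def erasure_step(w: float, a: float, x: float, c: float,
                 lr: float = 0.01, lam: float = 1.0) -> tuple[float, float]:
    """One hypothetical adversarial-erasure update using gradient reversal.

    The adversary follows -dL/da (gradient descent). The encoder receives
    the reversed gradient -lam * dL/dw, so its ordinary descent step on
    that reversed gradient is gradient *ascent* on the adversary's loss.
    """
    err = a * w * x - c
    grad_a = 2.0 * err * w * x        # dL/da
    grad_w = 2.0 * err * a * x        # dL/dw
    a_new = a - lr * grad_a           # adversary: minimise its loss
    w_new = w - lr * (-lam * grad_w)  # encoder: reversed gradient -> maximise it
    return w_new, a_new


w_new, a_new = erasure_step(0.5, 0.8, x=1.0, c=1.0)
```

With `w, a = 0.5, 0.8` and `x = c = 1.0`, one step has two effects. Updating only the adversary weight lowers the adversary's loss. Updating only the encoder weight raises it. This is the min-max behaviour the erasure objective relies on.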
What carries the argument
Adversarial erasure on an unsupervised residual branch plus a spatial diversity penalty that forces distinct concept attention maps to occupy separate image regions.
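The exact form of the spatial diversity penalty is not given here. This is a minimal sketch that rests on two assumptions:

- Each concept's attention map is a non-negative weight vector over the instances (for example slices or patches) of one volume.
- Overlap is measured by mean pairwise cosine similarity.

Under these assumptions the penalty is 0 for maps with disjoint support and 1 for identical maps. Adding it to the training loss therefore pushes concept maps toward separate regions. `spatial_diversity_penalty` is a hypothetical name.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def spatial_diversity_penalty(maps: list[list[float]]) -> float:
    """Mean pairwise cosine similarity between K concept attention maps.

    Returns 0.0 when the maps have disjoint support and 1.0 when all maps
    are identical (and non-zero); fewer than two maps give no penalty.
    """
    pairs = list(combinations(maps, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


disjoint = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
overlapping = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
```

For `overlapping`, the third map shares half its mass with each of the other two. The penalty is therefore (0 + 1/√2 + 1/√2) / 3 ≈ 0.47.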
If this is right
- Clinicians receive spatial maps that identify the precise cause of a non-diagnostic atrial LGE-MRI scan.
- Ordinal quality grading performance remains comparable to existing MIL baselines.
- Concept attention maps localize failure modes such as motion blur or inadequate contrast without requiring dense labels.
- The framework supplies interpretable explanations while operating under the same weak supervision as prior methods.
Where Pith is reading between the lines
- The same erasure-plus-diversity pattern could be tested on other weakly supervised medical imaging tasks to produce concept-level explanations.
- Aggregating the concept maps across many scans might surface systematic acquisition problems that guide protocol changes.
- Replacing the residual branch with an explicit noise model could test whether the current disentanglement remains stable under different artifact distributions.
Load-bearing premise
The adversarial erasure and spatial diversity constraint can separate clinically meaningful concepts from volume-level labels without leakage or spurious correlations.
What would settle it
A blinded review by radiologists in which the generated concept maps fail to align with the actual artifacts (motion, contrast, anatomy) judged responsible for non-diagnostic quality on a held-out set of scans.
Original abstract
High-quality Late Gadolinium Enhancement (LGE) MRI can be helpful for atrial fibrillation management, yet scan quality is frequently compromised by patient motion, irregular breathing, and suboptimal image acquisition timing. While Multiple Instance Learning (MIL) has emerged as a powerful tool for automated quality assessment under weak supervision, current state-of-the-art methods map localized visual evidence to a single, opaque global feature vector. This black box approach fails to provide actionable feedback on specific failure modes, obscuring whether a scan degrades due to motion blur, inadequate contrast, or a lack of anatomical context. In this paper, we propose Adversarial Concept-MIL (AC-MIL), a weakly supervised framework that decomposes global image quality into clinically defined radiological concepts using only volume-level supervision. To capture latent quality variations without entangling predefined concepts, our framework incorporates an unsupervised residual branch guided by an adversarial erasure mechanism to strictly prevent information leakage. Furthermore, we introduce a spatial diversity constraint that penalizes overlap between distinct concept attention maps, ensuring localized and interpretable feature extraction. Extensive experiments on a clinical dataset of atrial LGE-MRI volumes demonstrate that AC-MIL successfully opens the MIL black box, providing highly localized spatial concept maps that allow clinicians to pinpoint the specific causes of non-diagnostic scans. Crucially, our framework achieves this deep clinical transparency while maintaining highly competitive ordinal grading performance against existing baselines. Code to be released on acceptance.
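For reference, the standard attention-based MIL pooling of Ilse et al. (reference [10] below) works as follows:

- Each instance embedding `h_n` gets a weight `a_n ∝ exp(w · tanh(V h_n))`.
- The weighted embeddings are summed into one bag vector. This is the single global feature that the abstract calls opaque.

The sketch implements that pooling in plain Python. As an assumption about how a concept-based variant could look, it also runs one independent attention head per concept, so that each concept gets its own map and embedding. The head names and weights in `concept_pool` are hypothetical.

```python
import math


def matvec(M: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def attention_pool(instances, V, w):
    """Attention MIL pooling: a_n ∝ exp(w · tanh(V h_n)), bag = Σ a_n h_n."""
    scores = [sum(wi * math.tanh(t) for wi, t in zip(w, matvec(V, h)))
              for h in instances]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn


def concept_pool(instances, heads):
    """Hypothetical per-concept pooling: one (V, w) attention head per concept name."""
    return {name: attention_pool(instances, V, w) for name, (V, w) in heads.items()}


# Toy bag of three 2-D instance embeddings (e.g. one per slice).
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = {"motion": (identity, [5.0, 0.0]), "contrast": (identity, [0.0, 5.0])}
out = concept_pool(instances, heads)
```

Each head returns its own attention map. In this toy example, the "motion" head down-weights the second instance and the "contrast" head down-weights the first. A single-head model would collapse both into one vector.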
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes AC-MIL, a weakly supervised multiple-instance learning framework for ordinal quality grading of atrial LGE-MRI volumes. It decomposes global quality scores into clinically predefined radiological concepts (motion, contrast, anatomy) via an adversarial erasure mechanism that isolates an unsupervised residual branch, combined with a spatial diversity constraint to enforce non-overlapping attention maps. The central claim is that this yields highly localized, interpretable concept maps identifying specific failure modes while preserving competitive bag-level ordinal performance against baselines, all from volume-level labels only.
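The reference list includes CORN, the rank-consistent ordinal regression of Shi et al. (reference [16] below). This suggests, though the text does not confirm it, that the ordinal grading head follows that scheme. CORN decodes a grade in three steps:

- The network outputs conditional probabilities P(y > k | y > k-1).
- Their running product gives P(y > k).
- The predicted grade is the number of thresholds whose P(y > k) exceeds 0.5.

The sketch decodes grades this way and computes the MAE the referee asks for. `corn_rank` and `mean_absolute_error` are illustrative names.

```python
import operator
from itertools import accumulate


def corn_rank(cond_probs: list[float], threshold: float = 0.5) -> int:
    """Decode an ordinal grade from CORN-style conditional probabilities.

    cond_probs[k] is P(y > k | y > k-1); the running product gives the
    marginal P(y > k), and the grade counts marginals above the threshold.
    """
    marginals = list(accumulate(cond_probs, operator.mul))
    return sum(p > threshold for p in marginals)


def mean_absolute_error(preds: list[int], targets: list[int]) -> float:
    """MAE between predicted and true ordinal grades."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


preds = [corn_rank([0.9, 0.8, 0.4]), corn_rank([0.6, 0.7])]
```

`corn_rank([0.6, 0.7])` returns 1, not 2. The second conditional probability passes 0.5, but the marginal 0.6 × 0.7 = 0.42 does not. This conditioning is what keeps the decoded ranks consistent.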
Significance. If the disentanglement is shown to be faithful rather than post-hoc plausible, the work would meaningfully advance interpretable MIL in medical imaging by supplying clinicians with spatially grounded explanations for non-diagnostic scans. The adversarial residual design and diversity penalty represent a technically coherent attempt to mitigate leakage under weak supervision, which could generalize to other concept-based quality assessment tasks.
major comments (3)
- [Experiments] Experiments section: the abstract asserts 'highly competitive ordinal grading performance' and 'highly localized spatial concept maps' that 'pinpoint the specific causes,' yet no quantitative metrics (accuracy, MAE, AUC), ablation tables, dataset statistics (number of volumes, class distribution, acquisition parameters), or baseline comparisons are supplied in the provided text; without these the central empirical claim cannot be evaluated.
- [Method] Method (adversarial erasure mechanism): the residual branch is described as absorbing 'everything else' after concept erasure, but the manuscript provides no post-training leakage diagnostic (mutual information between concept and residual features, or an ablation in which the residual alone is forced to predict the ordinal label after concept masking); this directly bears on whether the visualized maps are causally grounded or merely correlated with the bag label.
- [Method] Spatial diversity constraint: while the penalty discourages map overlap, the paper does not report a quantitative check that the resulting maps align with actual radiological failure modes (e.g., clinician-rated localization accuracy or correlation with motion/contrast ground-truth annotations); without such validation the interpretability claim rests on visual inspection alone.
minor comments (2)
- [Abstract] Abstract: 'Code to be released on acceptance' is stated, but the clinical dataset is referred to only generically; adding its name, size, and ethics approval reference would improve reproducibility.
- [Method] Notation: the distinction between the concept attention maps and the residual feature map should be made explicit in the first figure or equation block to avoid reader confusion about information flow.
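The leakage diagnostic requested in the second major comment can be prototyped with a plug-in mutual information estimate. The estimate is taken between a concept score and a residual feature across a set of volumes. This sketch assumes scalar features and equal-width binning, and `plugin_mutual_information` is a hypothetical helper. The estimate is in nats. Plug-in estimates are biased upward on small samples, so a permutation baseline would be needed before a value could be read as low.

```python
import math
from collections import Counter


def discretize(values: list[float], n_bins: int) -> list[int]:
    """Map values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant input: everything lands in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def plugin_mutual_information(xs: list[float], ys: list[float], n_bins: int = 4) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired scalar samples."""
    bx, by = discretize(xs, n_bins), discretize(ys, n_bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[i] / n) * (py[j] / n)))
    return mi


concept = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
leaky_residual = list(concept)  # residual that simply copies the concept
```

A residual that copies the concept gives log 4 ≈ 1.386 nats with four balanced bins. A residual independent of the concept gives a value near zero.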
Simulated Author's Rebuttal
We thank the referee for the valuable comments and suggestions. We have revised the manuscript to incorporate additional quantitative results and method validations as outlined in our point-by-point responses below.
Point-by-point responses
Referee: [Experiments] Experiments section: the abstract asserts 'highly competitive ordinal grading performance' and 'highly localized spatial concept maps' that 'pinpoint the specific causes,' yet no quantitative metrics (accuracy, MAE, AUC), ablation tables, dataset statistics (number of volumes, class distribution, acquisition parameters), or baseline comparisons are supplied in the provided text; without these the central empirical claim cannot be evaluated.
Authors: We agree that the experiments section requires these quantitative details for proper evaluation. In the revised manuscript, we have added comprehensive quantitative metrics including accuracy, MAE, and AUC for AC-MIL's ordinal grading performance, along with direct comparisons to several baseline methods. We have also included detailed dataset statistics such as the number of volumes, class distribution, and acquisition parameters. Furthermore, ablation tables have been added to demonstrate the impact of the adversarial residual branch and the spatial diversity constraint. These additions substantiate the claims made in the abstract. revision: yes
Referee: [Method] Method (adversarial erasure mechanism): the residual branch is described as absorbing 'everything else' after concept erasure, but the manuscript provides no post-training leakage diagnostic (mutual information between concept and residual features, or an ablation in which the residual alone is forced to predict the ordinal label after concept masking); this directly bears on whether the visualized maps are causally grounded or merely correlated with the bag label.
Authors: This is an important point to confirm the effectiveness of the disentanglement. We have revised the manuscript to include post-training leakage diagnostics. Specifically, we now report the mutual information between the concept features and the residual features, which is low, supporting minimal leakage. We have also added an ablation experiment in which the residual branch alone is used to predict the ordinal label after masking the concept branches, demonstrating substantially reduced performance. This indicates that the visualized concept maps are causally linked to the quality assessment rather than just correlated. revision: yes
Referee: [Method] Spatial diversity constraint: while the penalty discourages map overlap, the paper does not report a quantitative check that the resulting maps align with actual radiological failure modes (e.g., clinician-rated localization accuracy or correlation with motion/contrast ground-truth annotations); without such validation the interpretability claim rests on visual inspection alone.
Authors: We recognize the value of quantitative validation beyond visual inspection. However, the dataset is weakly supervised and lacks spatial ground-truth annotations for specific radiological failure modes. In the revised version, we have expanded the qualitative results with additional examples and included quantitative measures of the diversity constraint's effect, such as reduced overlap between attention maps. We have also incorporated feedback from clinical experts confirming the alignment of the maps with expected failure modes. We believe this combination provides strong support for the interpretability claims in the context of weak supervision. revision: partial
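Suppose even a small subset of scans were annotated with artifact locations. The localization check the referee describes could then be scored with a pointing-game style metric: a hit is counted when the highest-attention instance of a concept map falls inside the annotated region. The function name and the index-set representation of regions are assumptions for illustration.

```python
def pointing_game_accuracy(attention_maps: list[list[float]],
                           annotated_regions: list[set[int]]) -> float:
    """Fraction of scans whose peak-attention instance lies in the annotated region.

    attention_maps: one list of per-instance weights per scan.
    annotated_regions: one set of instance indices per scan (hypothetical annotations).
    """
    if len(attention_maps) != len(annotated_regions) or not attention_maps:
        raise ValueError("need one annotated region per attention map")
    hits = 0
    for attn, region in zip(attention_maps, annotated_regions):
        peak = max(range(len(attn)), key=attn.__getitem__)  # argmax instance
        hits += peak in region
    return hits / len(attention_maps)


maps = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
regions = [{1}, {2}]
```

In this toy example, the first map peaks inside its annotated region and the second does not, so the score is 0.5.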
Circularity Check
No significant circularity; empirical validation independent of internal definitions
full rationale
The paper introduces AC-MIL as a new weakly-supervised framework combining adversarial erasure, a residual branch, and a spatial diversity constraint to produce concept attention maps from volume-level labels. All performance claims (ordinal grading accuracy, localization quality) are presented as outcomes of experiments on an external clinical dataset, with comparisons to baselines. No equations, loss terms, or derivations are shown that define the target concepts or performance metrics in terms of the method's own outputs, nor do any load-bearing steps reduce to self-citation chains or fitted parameters renamed as predictions. The framework is self-contained against external benchmarks and does not rely on tautological reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: volume-level supervision suffices to learn disentangled, clinically meaningful concept representations.
invented entities (1)
- Adversarial erasure mechanism: no independent evidence
Reference graph
Works this paper leans on
- [1] Caixal, G., Alarcón, F., Althoff, T.F., Nuñez-Garcia, M., Benito, E.M., Borràs, R., Perea, R.J., Prat-González, S., Garre, P., Soto-Iglesias, D., Gunturitz, C., Cozzari, J., Linhart, M., Tolosana, J.M., Arbelo, E., Roca-Luque, I., Sitges, M., Guasch, E., Mont, L.: Accuracy of left atrial fibrosis detection with cardiac magnetic resonance: correlation ... Europace 23(3), 380–388 (Mar 2021)
- [2] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). pp. 39–57. IEEE (2017)
- [3] Colilla, S., Crow, A., Petkun, W., Singer, D.E., Simon, T., Liu, X.: Estimates of Current and Future Incidence and Prevalence of Atrial Fibrillation in the U.S. Adult Population. The American Journal of Cardiology 112(8), 1142–1147 (Oct 2013). https://doi.org/10.1016/j.amjcard.2013.05.063
- [4] Corbetta, V., Dijkstra, F.S., Beets-Tan, R., Kervadec, H., Wickstrøm, K., Silva, W.: In-hoc concept representations to regularise deep learning in medical imaging. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 7312–7321 (2025)
- [5] ElMaghawry, M., Romeih, S.: DECAAF: Emphasizing the importance of MRI in AF ablation. Glob Cardiol Sci Pract 2015, 8 (Mar 2015). https://doi.org/10.5339/gcsp.2015.8
- [6] Flett, A.S., Hasleton, J., Cook, C., Hausenloy, D., Quarta, G., Ariti, C., Muthurangu, V., Moon, J.C.: Evaluation of techniques for the quantification of myocardial scar of differing etiology using cardiac magnetic resonance. JACC: Cardiovascular Imaging 4(2), 150–156 (2011)
- [7] Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning. pp. 1180–1189. PMLR (2015)
- [8] Gräni, C., Eichhorn, C., Bière, L., Kaneko, K., Murthy, V.L., Agarwal, V., Aghayev, A., Steigner, M., Blankstein, R., Jerosch-Herold, M., et al.: Comparison of myocardial fibrosis quantification methods by cardiovascular magnetic resonance imaging for risk stratification of patients with suspected myocarditis. Journal of Cardiovascular Magnetic Resonance 21, 1–11 (2019)
- [9] Havasi, M., Parbhoo, S., Doshi-Velez, F.: Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems 35, 23386–23397 (2022)
- [10] Ilse, M., Tomczak, J., Welling, M.: Attention-based deep multiple instance learning. In: International Conference on Machine Learning. pp. 2127–2136. PMLR (2018)
- [11] Koh, P.W., Nguyen, T., Tang, Y.S., Mussmann, S., Pierson, E., Kim, B., Liang, P.: Concept bottleneck models. In: International Conference on Machine Learning. pp. 5338–5348. PMLR (2020)
- [12] Li, B., Li, Y., Eliceiri, K.W.: Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14318–14328 (2021)
- [13] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
- [14] Marrouche, N.F., Wilber, D., Hindricks, G., Jais, P., Akoum, N., Marchlinski, F., Kholmovski, E., Burgon, N., Hu, N., Mont, L., Deneke, T., Duytschaever, M., Neumann, T., Mansour, M., Mahnkopf, C., Herweg, B., Daoud, E., Wissner, E., Bansmann, P., Brachmann, J.: Association of Atrial Tissue Fibrosis Identified by Delayed Enhancement MRI and Atrial Fibrill... JAMA 311(5), 498–506 (Feb 2014). https://doi.org/10.1001/jama.2014.3
- [15] Oakes, R.S., Badger, T.J., Kholmovski, E.G., Akoum, N., Burgon, N.S., Fish, E.N., Blauer, J.J.E., Rao, S.N., DiBella, E.V.R., Segerson, N.M., Daccarett, M., Windfelder, J., McGann, C.J., Parker, D., MacLeod, R.S., Marrouche, N.F.: Detection and quantification of left atrial structural remodeling with delayed-enhancement magnetic resonance imaging in patie... Circulation 119(13), 1758–1767 (Apr 2009). https://doi.org/10.1161/CIRCULATIONAHA.108.811877
- [16] Shi, X., Cao, W., Raschka, S.: Deep neural networks for rank-consistent ordinal regression based on conditional probabilities. Pattern Analysis and Applications 26(3), 941–955 (2023)
- [17] Spiewak, M., Malek, L.A., Misko, J., Chojnowska, L., Milosz, B., Klopotowski, M., Petryka, J., Dabrowski, M., Kepka, C., Ruzyllo, W.: Comparison of different quantification methods of late gadolinium enhancement in patients with hypertrophic cardiomyopathy. European Journal of Radiology 74(3), e149–e153 (2010)
- [18] Sultan, K.A., Hisham, M.H.H., Orkild, B., Morris, A., Kholmovski, E., Bieging, E., Kwan, E., Ranjan, R., DiBella, E., Elhabian, S.: HAMIL-QA: Hierarchical approach to multiple instance learning for atrial LGE MRI quality assessment. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 275–284. Springer (2024)
- [19] Verma, A., Jiang, C.y., Betts, T.R., Chen, J., Deisenhofer, I., Mantovan, R., Macle, L., Morillo, C.A., Haverkamp, W., Weerasooriya, R., Albenque, J.P., Nardi, S., Menardi, E., Novak, P., Sanders, P.: Approaches to Catheter Ablation for Persistent Atrial Fibrillation. New England Journal of Medicine 372(19), 1812–1822 (May 2015). https://doi.org/10.1056/N...
- [20] Zhang, H., Meng, Y., Zhao, Y., Qiao, Y., Yang, X., Coupland, S.E., Zheng, Y.: DTFD-MIL: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18802–18812 (2022)