Catching MRI outliers: unsupervised detection and localization of MRI artefacts and clinical anomalies using deep learning
Pith reviewed 2026-06-30 11:56 UTC · model grok-4.3
The pith
A two-stage unsupervised framework detects and localizes anomalies in pelvic and brain MRI by tokenizing slices and scoring deviations from normal token distributions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The two-stage framework, trained solely on reference images from public pelvic and brain datasets, compresses slices into discrete tokens, models the distribution of normal tokens, and estimates anomaly evidence by combining perceptual differences with token-surprisal scores based on negative log-likelihood. On held-out evaluation data the system reaches AUCs of 0.97 (pelvic MRI with synthetic and real anomalies) and 0.81 (brain MRI with clinically annotated abnormalities) while heatmaps align with ground-truth locations, supporting its use as an automated MRI quality-control layer.
What carries the argument
Two-stage tokenization-plus-distribution-modeling pipeline that converts MRI slices to discrete tokens and scores anomalies via combined perceptual difference and negative-log-likelihood surprisal.
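The scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the z-score normalisation against reference statistics, the equal weighting, and the max-pooling into a slice score are all assumptions, since the reviewed abstract does not specify how the two evidence maps are fused. Both maps are assumed to share the same grid resolution.

```python
import math

def token_surprisal(token_probs):
    """Negative log-likelihood (nats) of each token on a 2D token grid."""
    return [[-math.log(max(p, 1e-12)) for p in row] for row in token_probs]

def z_map(grid, ref_mean, ref_std):
    """Standardise a 2D map using statistics estimated on normal reference images."""
    return [[(v - ref_mean) / ref_std for v in row] for row in grid]

def combine_evidence(perceptual_diff, token_probs, diff_ref, nll_ref, weight=0.5):
    """Hypothetical fusion rule: weighted sum of standardised evidence maps.

    diff_ref and nll_ref are (mean, std) pairs from normal data (assumed inputs).
    Returns the heatmap and a slice-level score (here, the heatmap maximum).
    """
    d = z_map(perceptual_diff, *diff_ref)
    s = z_map(token_surprisal(token_probs), *nll_ref)
    heatmap = [
        [weight * dv + (1 - weight) * sv for dv, sv in zip(d_row, s_row)]
        for d_row, s_row in zip(d, s)
    ]
    slice_score = max(v for row in heatmap for v in row)
    return heatmap, slice_score

# Toy 2x2 example: one unlikely token that also shows a large perceptual difference.
normal = combine_evidence([[0.1, 0.1], [0.1, 0.1]],
                          [[0.9, 0.9], [0.9, 0.9]], (0.1, 0.05), (0.1, 0.1))
anomalous = combine_evidence([[0.1, 0.1], [0.1, 0.8]],
                             [[0.9, 0.9], [0.9, 0.01]], (0.1, 0.05), (0.1, 0.1))
```

Standardising against reference statistics, rather than rescaling each slice on its own, keeps slice scores comparable across images, which any fixed decision threshold would need.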
If this is right
- The method supplies both a binary detection flag and a spatial heatmap that can highlight regions likely to compromise downstream AI tasks.
- Unsupervised training on normal images alone removes the need for labeled anomaly examples during model development.
- The same two-stage architecture was applied to both pelvic and brain MRI, with each model trained on reference images from its own anatomical site.
- Transparent visualization of flagged regions supports interpretability for clinical quality-control review.
Where Pith is reading between the lines
- The token-based representation might transfer to other MRI contrasts or body sites if the normal-token distribution can be re-estimated from appropriate reference scans.
- Integration into a radiotherapy planning pipeline could reduce the volume of images requiring manual inspection before AI segmentation or dose calculation.
- Because the method relies on public datasets, its reported performance sets a baseline that future work can compare against when new reference collections become available.
Load-bearing premise
Public reference datasets adequately represent the distribution of normal anatomy encountered in the target radiotherapy workflow, and the synthetic anomalies used for evaluation are representative of real clinical anomalies.
What would settle it
Performance measured on a new set of real clinical pelvic MRI cases drawn directly from the radiotherapy workflow falls substantially below the reported AUC of 0.97.
Original abstract
Artificial intelligence is increasingly integrated into radiotherapy workflows, yet such pipelines remain vulnerable to out-of-distribution image data that may introduce unexpected behavior in clinical tasks. Deep learning-based anomaly detection for pelvic magnetic resonance imaging (MRI) remains largely unexplored, and transparent evaluation of its feasibility for full automation is limited. We developed and evaluated a fully automated, unsupervised anomaly-detection framework for pelvic and brain MRI. A two-stage framework was trained on reference images from public datasets: LUND-PROBE for pelvic MRI, and IXI, fastMRI, and fastMRI+ for brain MRI. In the first stage, MRI slices were compressed into discrete tokens; in the second, the distribution of normal tokens was modeled. Anomaly evidence was estimated by combining perceptual image differences with token-surprisal scores based on negative log-likelihood. Automated detection was evaluated on pelvic MRI with synthetic global and real clinical anomalies, and on brain MRI with clinically annotated fastMRI+ abnormalities. Sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and false-positive behavior in held-out normal cases were assessed. The framework achieved robust detection across hidden evaluation cohorts, with AUCs of 0.97 (95% CI, 0.95-0.98) and 0.81 (95% CI, 0.74-0.87) for pelvic and brain MRI, respectively. Heatmap analysis showed strong spatial agreement between detected anomalies and ground-truth locations, supporting localization accuracy and interpretability. These results support the potential of unsupervised anomaly detection as an automated MRI quality-control layer for radiotherapy workflows, with transparent visualization of image regions likely to compromise downstream AI-based tasks.
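As an illustration of how an AUC with a 95% confidence interval of the kind reported above might be obtained, the sketch below computes the AUC from the Mann-Whitney statistic and a percentile bootstrap interval. The abstract does not state how the intervals were derived, so the bootstrap, the number of resamples, and all function names here are assumptions.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that an anomalous case outscores a normal one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed procedure)."""
    roc_auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 0 = normal, 1 = anomalous.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
ci = bootstrap_auc_ci(labels, scores, n_boot=200)
```

With slice-level data from the same patient, resampling would more plausibly be done per patient than per slice; the sketch ignores that grouping.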
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a two-stage unsupervised anomaly detection framework for pelvic and brain MRI. It trains a token-compression model followed by a token-distribution model exclusively on public datasets (LUND-PROBE for pelvic; IXI, fastMRI, fastMRI+ for brain), then scores anomalies by combining perceptual image differences with negative-log-likelihood surprisal. Automated detection is evaluated on held-out pelvic cases containing synthetic global and real clinical anomalies and on brain cases with clinically annotated fastMRI+ abnormalities, yielding AUCs of 0.97 (95% CI 0.95-0.98) and 0.81 (95% CI 0.74-0.87) respectively, together with spatially localized heatmaps.
Significance. If the performance claims survive domain-matched validation, the work would supply a practical, fully unsupervised and interpretable quality-control layer that could be inserted upstream of downstream radiotherapy AI tasks to flag out-of-distribution MRI inputs.
major comments (3)
- [Abstract/Methods] Abstract and Methods: The central performance claims rest on the untested premise that the cited public reference datasets adequately represent the distribution of normal pelvic and brain anatomy encountered in radiotherapy workflows (different field strengths, coil configurations, positioning, and patient populations). No domain-shift quantification, statistical comparison, or radiotherapy-specific normal reference set is described; mismatch would directly invalidate calibration of the token-surprisal and perceptual-difference scores.
- [Methods] Methods: No information is supplied on the tokenization architecture, the probabilistic model used to capture the normal token distribution, training hyperparameters or procedure, baseline anomaly-detection methods, or the rule used to set decision thresholds. These omissions are load-bearing because they prevent any assessment of whether the reported AUCs are reproducible or whether thresholds were chosen post-hoc on the evaluation set.
- [Evaluation] Evaluation: Pelvic performance is assessed with synthetic global anomalies whose realism relative to actual clinical anomalies is not demonstrated; combined with the domain-shift issue above, this weakens the evidential support for the claim that the framework is ready for radiotherapy quality control.
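One simple form the domain-shift quantification requested in the first major comment could take is a two-sample Kolmogorov-Smirnov statistic comparing anomaly scores of normal images from the public reference set against normal images from the target radiotherapy workflow. This is a hedged sketch of one possible check, not something described in the paper; the function name and the choice of statistic are assumptions.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

# Hypothetical anomaly scores for normal scans from two sources.
public_normals = [0.10, 0.12, 0.15, 0.18, 0.20]
inhouse_normals = [0.30, 0.33, 0.35, 0.40, 0.42]
shift = ks_statistic(public_normals, inhouse_normals)
```

A statistic near 0 would suggest the score distributions match, while a value near 1 would indicate that thresholds calibrated on the public normals are unlikely to transfer.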
minor comments (1)
- The phrase 'hidden evaluation cohorts' is used without an explicit statement of how these cohorts were constructed or how they differ from the training distribution beyond being held out.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate where revisions will be made to improve clarity, reproducibility, and evidential support.
Point-by-point responses
- Referee: [Abstract/Methods] Abstract and Methods: The central performance claims rest on the untested premise that the cited public reference datasets adequately represent the distribution of normal pelvic and brain anatomy encountered in radiotherapy workflows (different field strengths, coil configurations, positioning, and patient populations). No domain-shift quantification, statistical comparison, or radiotherapy-specific normal reference set is described; mismatch would directly invalidate calibration of the token-surprisal and perceptual-difference scores.
  Authors: We acknowledge the importance of domain shift considerations. The public datasets (LUND-PROBE, IXI, fastMRI, fastMRI+) were selected because they provide large-scale, standardized, high-quality acquisitions that are standard benchmarks in MRI research. However, we agree that explicit discussion of potential mismatches with radiotherapy-specific protocols is warranted. In the revised manuscript we will add a new subsection in Methods/Discussion that qualitatively compares acquisition parameters (field strength, coil type, patient positioning) between the training sets and typical radiotherapy MRI, notes this as a limitation, and suggests prospective validation on in-house radiotherapy data. No new quantitative domain-shift experiments are feasible within the current study scope, but the added text will temper the claims accordingly.
  Revision: partial
- Referee: [Methods] Methods: No information is supplied on the tokenization architecture, the probabilistic model used to capture the normal token distribution, training hyperparameters or procedure, baseline anomaly-detection methods, or the rule used to set decision thresholds. These omissions are load-bearing because they prevent any assessment of whether the reported AUCs are reproducible or whether thresholds were chosen post-hoc on the evaluation set.
  Authors: We apologize for the insufficient detail in the submitted Methods section. The full paper contains a high-level description, but we agree it lacks the necessary specifics for reproducibility. In the revision we will expand the Methods section to include: (i) the exact tokenization architecture and its hyperparameters, (ii) the probabilistic model (including its formulation), (iii) the complete training procedure and hyperparameter values, (iv) any baseline methods evaluated, and (v) the precise rule for threshold selection (performed on a held-out validation subset of normal cases, not the test set). These additions will directly address reproducibility concerns.
  Revision: yes
- Referee: [Evaluation] Evaluation: Pelvic performance is assessed with synthetic global anomalies whose realism relative to actual clinical anomalies is not demonstrated; combined with the domain-shift issue above, this weakens the evidential support for the claim that the framework is ready for radiotherapy quality control.
  Authors: We agree that the realism of the synthetic anomalies should be explicitly justified. The synthetic anomalies were constructed to replicate common clinical artifacts (global intensity shifts, localized signal voids, noise patterns) observed in the real clinical anomaly cases within the evaluation cohort. In the revision we will add a supplementary figure and accompanying text that visually and quantitatively compares the synthetic anomalies to the real clinical anomalies present in the pelvic test set. We will also revise the Discussion to frame the framework as a promising quality-control approach whose clinical readiness requires further multi-center validation, rather than claiming immediate deployment readiness.
  Revision: partial
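The threshold rule mentioned in the second response (selection on a held-out validation subset of normal cases rather than the test set) might look like the sketch below. The quantile-based rule, the target false-positive rate, and the function names are assumptions for illustration; the simulated rebuttal does not give the actual rule.

```python
import math

def threshold_from_normals(normal_val_scores, target_fpr=0.05):
    """Pick a threshold so that at most target_fpr of validation normals score above it."""
    s = sorted(normal_val_scores)
    # round() guards against floating-point noise before taking the ceiling.
    k = math.ceil(round((1 - target_fpr) * len(s), 9)) - 1
    return s[min(max(k, 0), len(s) - 1)]

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when cases scoring above the threshold are flagged."""
    tp = sum(1 for y, v in zip(labels, scores) if y == 1 and v > threshold)
    fn = sum(1 for y, v in zip(labels, scores) if y == 1 and v <= threshold)
    tn = sum(1 for y, v in zip(labels, scores) if y == 0 and v <= threshold)
    fp = sum(1 for y, v in zip(labels, scores) if y == 0 and v > threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical validation normals (scores 1..20) and a small labelled test set.
val_normals = list(range(1, 21))
thr = threshold_from_normals(val_normals, target_fpr=0.05)
val_fpr = sum(v > thr for v in val_normals) / len(val_normals)
sens, spec = sensitivity_specificity([0, 0, 1, 1], [1, 25, 30, 5], thr)
```

Because the threshold is fixed before any test case is scored, the test-set sensitivity and specificity are not inflated by post-hoc tuning, which is the concern the referee raises.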
Circularity Check
No circularity: training on external public datasets and evaluation on held-out anomaly cohorts are independent
full rationale
The paper trains an unsupervised token-based model on reference images drawn from independent public datasets (LUND-PROBE, IXI, fastMRI, fastMRI+) and evaluates detection performance on separate hidden cohorts containing synthetic or clinically annotated anomalies. No equations, fitted parameters, or self-citations are described that would make the reported AUCs equivalent to the training inputs by construction. The derivation chain consists of standard unsupervised density modeling followed by out-of-distribution scoring; the evaluation metrics are computed against externally labeled ground truth and therefore remain falsifiable outside the fitted values.