pith.

arxiv: 2502.11638 · v3 · submitted 2025-02-17 · 💻 cs.CV

Safeguarding AI in Medical Imaging: Post-Hoc Out-of-Distribution Detection with Normalizing Flows

Pith reviewed 2026-05-23 03:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords out-of-distribution detection · normalizing flows · medical imaging · post-hoc method · distribution shift · AI safety

The pith

A post-hoc normalizing flow attached to frozen pre-trained models detects out-of-distribution medical images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that normalizing flows can be trained after the fact on the features of an existing medical imaging model to assign likelihood scores that flag out-of-distribution inputs. This matters because clinical AI systems must handle unexpected images without requiring hospitals to retrain or alter approved models. The authors test the approach on their MedOOD dataset of clinically relevant shifts and on MedMNIST, reporting higher AUROC than ViM, MDS, and ReAct. The method leaves the base classifier untouched and only adds a density model on top of its features.

Core claim

The authors demonstrate that a normalizing flow trained post-hoc on the feature representations of a pre-trained model can model the in-distribution and achieve an AUROC of 84.61 percent on the MedOOD dataset while outperforming ViM at 80.65 percent and MDS at 80.87 percent, and reach 93.8 percent AUROC on MedMNIST while surpassing ViM at 88.08 percent and ReAct at 87.05 percent.

What carries the argument

Normalizing flows trained on the feature embeddings extracted by the frozen pre-trained model to estimate input likelihood for OOD scoring.
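The change-of-variables mechanics behind this scoring fit in a few lines. The block below is a minimal, untrained illustration that assumes a RealNVP-style affine coupling design (Dinh et al., ref. [20]) applied to feature vectors. This page does not describe the paper's actual flow architecture, which feature layers it reads, or how it is trained, and the names `AffineCoupling` and `log_likelihood` are hypothetical. A real detector would fit the coupling weights by maximizing the log-likelihood of in-distribution features from the frozen model, then use the negative log-likelihood as the OOD score.

```python
import numpy as np


class AffineCoupling:
    """One RealNVP-style affine coupling layer over feature vectors (untrained).

    The first half of each vector passes through unchanged and predicts a
    log-scale and shift for the second half, so the map is invertible.
    """

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2
        # Hypothetical random weights; a real flow would learn these by
        # maximizing the log-likelihood of in-distribution features.
        self.w1 = rng.normal(0.0, 0.1, (self.d, hidden))
        self.w_s = rng.normal(0.0, 0.1, (hidden, dim - self.d))
        self.w_t = rng.normal(0.0, 0.1, (hidden, dim - self.d))

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.w1)
        # tanh keeps the log-scale bounded for numerical stability
        return np.tanh(h @ self.w_s), h @ self.w_t

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self._scale_shift(x1)
        z = np.concatenate([x1, x2 * np.exp(log_s) + t], axis=1)
        # Block-triangular Jacobian: log|det J| is the sum of the log-scales
        return z, log_s.sum(axis=1)

    def inverse(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        log_s, t = self._scale_shift(z1)
        return np.concatenate([z1, (z2 - t) * np.exp(-log_s)], axis=1)


def log_likelihood(layers, x):
    """Change of variables: log p(x) = log N(f(x); 0, I) + sum of log|det J|."""
    z, total_log_det = x, np.zeros(len(x))
    for layer in layers:
        z, log_det = layer.forward(z)
        total_log_det += log_det
        # Reverse feature order (a volume-preserving permutation) so the
        # next layer transforms the half that was just left unchanged
        z = z[:, ::-1]
    log_base = -0.5 * np.sum(z**2, axis=1) - 0.5 * z.shape[1] * np.log(2 * np.pi)
    return log_base + total_log_det
```

Scoring would then be `ood_score = -log_likelihood(layers, feats)`, where `feats` is a hypothetical (n, d) array of frozen-model features. Because each coupling layer leaves half of its input untouched, the Jacobian is block-triangular and its log-determinant is just the sum of the predicted log-scales; that is what keeps exact likelihoods cheap at inference time.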

If this is right

  • Any existing medical imaging model can receive the OOD detector without weight changes or regulatory re-approval of the core classifier.
  • Clinically relevant distribution shifts become detectable in real time during inference.
  • The same post-hoc attachment works across both the authors' custom MedOOD shifts and the MedMNIST benchmark.
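Attaching the detector at inference time still requires an operating threshold, which AUROC alone does not fix. A minimal sketch, assuming the threshold is set on held-out ID validation scores so that 95% of ID cases pass (the 95%-TPR convention common in OOD benchmarks); the paper's deployment threshold, if it uses one, is not stated on this page. The names `calibrate_threshold` and `flag_ood` are hypothetical, and scores are assumed to follow "higher means more OOD", for example the negative flow log-likelihood.

```python
import numpy as np


def calibrate_threshold(id_val_scores, tpr=0.95):
    """Score at or below which a fraction `tpr` of ID validation cases fall."""
    return float(np.quantile(np.asarray(id_val_scores, dtype=float), tpr))


def flag_ood(scores, threshold):
    """Boolean mask: True where a case would be routed for human review."""
    return np.asarray(scores, dtype=float) > threshold
```

The base classifier is untouched in this arrangement: its output is simply withheld or flagged when the detector fires. Where to set `tpr` trades missed shifts against review burden, which is a clinical decision rather than something the AUROC figures settle.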

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The MedOOD dataset construction process could be reused to create shift-specific test sets for other imaging tasks.
  • Replacing the flow with another density estimator such as a variational autoencoder might yield comparable post-hoc performance.
  • Deployment on streaming hospital data would show whether the reported AUROC gains reduce actual diagnostic errors.

Load-bearing premise

Normalizing flows fitted to the features of the pre-trained model can capture the in-distribution well enough to separate it from clinically relevant out-of-distribution samples.

What would settle it

A collection of clinically shifted medical images on which the flow assigns higher likelihood scores than to in-distribution images, producing an AUROC below that of ViM or MDS.
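For reference, the MDS baseline named in this criterion can be sketched as follows, assuming MDS denotes the Mahalanobis-distance detector of Lee et al. (ref. [9]) in its simplest form: class-conditional Gaussians with a shared covariance, fit on a single feature layer, without the input preprocessing or multi-layer ensembling of the original method. The function names are hypothetical.

```python
import numpy as np


def fit_mds(features, labels):
    """Class means plus a shared (tied) covariance, returned as a precision matrix."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique class labels
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Subtract each sample's own class mean before pooling the covariance
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    return means, np.linalg.pinv(cov)


def mds_score(features, means, precision):
    """OOD score: squared Mahalanobis distance to the closest class mean."""
    diff = np.asarray(features, dtype=float)[:, None, :] - means[None, :, :]  # (n, k, d)
    d2 = np.einsum("nkd,de,nke->nk", diff, precision, diff)  # (n, k)
    return d2.min(axis=1)
```

Because this baseline is itself a restricted Gaussian density over the same kind of features, it is a natural control for the flow: the criterion above amounts to the flow's richer density failing to separate shifted images better than this simpler one.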

Figures

Figures reproduced from arXiv: 2502.11638 by Dariush Lotfi, Kyongtae Ty Bae, Mohamad Koohi-Moghadam, Mohammad-Ali Nikouei Mahani.

Figure 1. Overview of the proposed method. (a) The base model is trained on ID data (e.g., adult MRI) for AI tasks such as classification or segmentation. (b) In real-world scenarios, the base model may encounter OOD samples (e.g., pediatric MRI) that result in high-confidence but inaccurate predictions. (c) Our method integrates seamlessly into clinical workflows without requiring model modifications or retraining. …
Figure 3. Comparison of the OOD detection performance between our method and other post-hoc methods on MedOOD. Metrics were computed using micro-averaging.
Figure 4. OOD, ID and Control ID datasets in MedOOD, alongside log-likelihood histograms and t-SNE visualizations of features from the base model. The t-SNE visualization in feature space illustrates the degree of separation between the samples from the base model's perspective, while the histograms illustrate our model's ability to assign distinct likelihood scores to ID versus OOD samples.
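Figure 3's caption notes that metrics were micro-averaged. With several OOD sets, one common reading is that micro-averaging pools all OOD samples into a single positive class before computing one AUROC, while macro-averaging computes an AUROC per OOD set and takes the unweighted mean. The sketch below assumes that reading (the paper's exact procedure is not given on this page) and computes AUROC as the normalized Mann-Whitney U statistic, with OOD as the positive class and higher scores meaning more OOD. The function names are hypothetical.

```python
import numpy as np


def auroc(scores_id, scores_ood):
    """P(random OOD score > random ID score), ties counting one half."""
    a = np.asarray(scores_ood, dtype=float)[:, None]
    b = np.asarray(scores_id, dtype=float)[None, :]
    return ((a > b).sum() + 0.5 * (a == b).sum()) / (a.size * b.size)


def micro_auroc(scores_id, ood_sets):
    """Pool every OOD set into one positive class, then compute a single AUROC."""
    pooled = np.concatenate([np.asarray(s, dtype=float) for s in ood_sets])
    return auroc(scores_id, pooled)


def macro_auroc(scores_id, ood_sets):
    """Compute one AUROC per OOD set and take the unweighted mean."""
    return float(np.mean([auroc(scores_id, s) for s in ood_sets]))
```

For example, with ID scores [0, 1, 2, 3] and two OOD sets [4, 5, 6, 7, 8, 9] and [1.5], micro-averaging gives 26/28 ≈ 0.929 while macro-averaging gives 0.75: pooling weights each OOD set by its size, so a small, hard shift barely moves the micro-averaged figure. The pairwise comparison uses O(n·m) memory, which is fine for a sketch but not for very large test sets.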
Original abstract

In AI-driven medical imaging, the failure to detect out-of-distribution (OOD) data poses a severe risk to clinical reliability, potentially leading to critical diagnostic errors. Current OOD detection methods often demand impractical retraining or modifications to pre-trained models, hindering their adoption in regulated clinical environments. To address this challenge, we propose a post-hoc normalizing flow-based approach that seamlessly integrates with existing pre-trained models without altering their weights. We evaluate the approach on our in-house-curated MedOOD dataset, designed to capture clinically relevant distribution shifts, and on the MedMNIST benchmark. The proposed method achieves an AUROC of 84.61% on MedOOD, outperforming ViM (80.65%) and MDS (80.87%), and reaches 93.8% AUROC on MedMNIST, surpassing ViM (88.08%) and ReAct (87.05%). This combination of strong performance and post-hoc integration capability makes our approach a practical and effective safeguard for clinical imaging workflows. The model and code to build OOD datasets are publicly accessible at https://github.com/dlotfi/MedOODFlow.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a post-hoc normalizing flow method for out-of-distribution detection in medical imaging. The approach applies normalizing flows to features from a frozen pre-trained model to estimate likelihoods for OOD detection. It is tested on the MedOOD dataset, achieving 84.61% AUROC, and on MedMNIST with 93.8% AUROC, outperforming baselines including ViM, MDS, and ReAct. The method is presented as practical for clinical use due to its post-hoc nature, and code is made available.

Significance. The post-hoc integration with existing models is a notable strength for adoption in regulated medical environments where retraining is impractical. The creation of the MedOOD dataset targeting clinically relevant shifts adds value. Public code release supports reproducibility. If the normalizing flow successfully models the feature distributions as claimed, the approach could serve as an effective safeguard against OOD failures in AI medical imaging systems.

major comments (3)
  1. [§3] The central assumption that post-hoc normalizing flows can reliably model high-dimensional features from medical imaging models to produce accurate OOD likelihoods is load-bearing for the reported AUROCs, yet the manuscript provides no analysis of feature dimensionality, mode coverage, or calibration of the likelihoods.
  2. [Experiments section] Performance comparisons are given without mention of the number of runs, variance, or statistical tests, making it difficult to determine if the improvements (e.g., 84.61% vs 80.65% on MedOOD) are statistically significant.
  3. [§4.1] Details on the training procedure for the normalizing flows, including architecture, hyperparameters, and how the in-distribution data is used, are insufficient to allow reproduction or assessment of whether the flows avoid the known pitfalls in high-dimensional density estimation.
minor comments (2)
  1. [Abstract] The abstract reports the performance numbers but does not summarize the method itself; a brief description of the approach would help readers.
  2. [References] Ensure all baselines like ViM, MDS, ReAct are properly cited with full references.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below, agreeing where revisions are warranted to enhance clarity, reproducibility, and rigor.

  1. Referee: [§3] The central assumption that post-hoc normalizing flows can reliably model high-dimensional features from medical imaging models to produce accurate OOD likelihoods is load-bearing for the reported AUROCs, yet the manuscript provides no analysis of feature dimensionality, mode coverage, or calibration of the likelihoods.

    Authors: We agree that explicit supporting analysis would strengthen the claims. In the revised manuscript, we will add a dedicated subsection in §3 reporting the feature dimensionality from the backbone, quantitative or visual assessments of mode coverage in the learned density, and calibration diagnostics (such as likelihood histograms on ID data) to better substantiate the modeling assumption. revision: yes

  2. Referee: [Experiments section] Performance comparisons are given without mention of the number of runs, variance, or statistical tests, making it difficult to determine if the improvements (e.g., 84.61% vs 80.65% on MedOOD) are statistically significant.

    Authors: The reported figures reflect single runs, consistent with common practice in OOD detection benchmarks given computational costs. To address the concern, the revised experiments section will include results from multiple random seeds with reported means, standard deviations, and appropriate statistical tests (e.g., Wilcoxon signed-rank) for key comparisons. revision: yes

  3. Referee: [§4.1] Details on the training procedure for the normalizing flows, including architecture, hyperparameters, and how the in-distribution data is used, are insufficient to allow reproduction or assessment of whether the flows avoid the known pitfalls in high-dimensional density estimation.

    Authors: We acknowledge the insufficiency for full reproducibility. The revised §4.1 will detail the flow architecture (e.g., coupling layer types and counts, hidden sizes), all hyperparameters (learning rate, epochs, batch size, optimizer), and confirm training uses only in-distribution features from the frozen model. We will also briefly discuss mitigation of high-dimensional pitfalls via the post-hoc feature-space approach. revision: yes
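The Wilcoxon signed-rank test promised in the second response can be computed exactly for a handful of seeds by enumerating sign flips. A minimal sketch, assuming the paired inputs are per-seed AUROCs of two methods evaluated on the same data; the function name `wilcoxon_exact` is hypothetical. Zero differences are dropped and tied magnitudes get average ranks. For larger samples a library routine such as `scipy.stats.wilcoxon` is the practical choice, since enumeration grows as 2^n.

```python
import itertools

import numpy as np


def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired scores.

    a, b: per-seed metrics (e.g. AUROC) of two methods on the same data.
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[d != 0]  # drop zero differences
    n = d.size
    if n == 0:
        return 1.0
    mags = np.abs(d)
    ranks = np.empty(n)
    ranks[np.argsort(mags)] = np.arange(1, n + 1)
    for v in np.unique(mags):  # average ranks across tied magnitudes
        tied = mags == v
        ranks[tied] = ranks[tied].mean()
    observed = abs(np.sum(np.sign(d) * ranks))
    # Under H0 each difference is equally likely to be + or -, so enumerate
    # all 2**n sign patterns (feasible for the few seeds typical here)
    extreme = sum(
        abs(np.dot(signs, ranks)) >= observed - 1e-12
        for signs in itertools.product((-1.0, 1.0), repeat=n)
    )
    return extreme / 2**n
```

One practical consequence: with five seeds and every difference favoring the same method, the smallest attainable two-sided p-value is 2/32 = 0.0625, so at least six seeds (2/64 ≈ 0.031) are needed before this test can reach p < 0.05.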

Circularity Check

0 steps flagged

No circularity: purely empirical post-hoc method with external benchmarks


The paper describes a post-hoc normalizing flow applied to frozen pre-trained model features for OOD detection, evaluated via AUROC on MedOOD and MedMNIST against independent baselines (ViM, MDS, ReAct). No equations, derivations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided text. The central claim rests on reported empirical performance rather than any self-referential construction or uniqueness theorem. This is the standard case of an applied ML method whose validity is tested externally.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5754 in / 1149 out tokens · 40191 ms · 2026-05-23T03:05:01.974737+00:00 · methodology


Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 3 internal anchors

  1. [1]

    AI in health and medicine,

    P. Rajpurkar, E. Chen, O. Banerjee, and E. J. Topol, "AI in health and medicine," Nature Medicine, vol. 28, no. 1, 2022-01-20, doi: 10.1038/s41591-021-01614-0

  2. [2]

    A review of uncertainty quantification in medical image analysis: Probabilistic and non-probabilistic methods,

    L. Huang, S. Ruan, Y. Xing, and M. Feng, "A review of uncertainty quantification in medical image analysis: Probabilistic and non-probabilistic methods," Medical Image Analysis, vol. 97, 2024/10/01, doi: 10.1016/j.media.2024.103223

  3. [3]

    The limits of fair medical imaging AI in real-world generalization,

    Y. Yang, H. Zhang, J. W. Gichoya, D. Katabi, and M. Ghassemi, "The limits of fair medical imaging AI in real-world generalization," Nature Medicine, vol. 30, no. 10, 2024-06-28, doi: 10.1038/s41591-024-03113-4

  4. [4]

    Out of Distribution Detection for Medical Images,

    O. Zhang, J. B. Delbrouck, and D. L. Rubin, "Out of Distribution Detection for Medical Images," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12959 LNCS, pp. 102-111, 2021, doi: 10.1007/978-3-030-87735-4_10

  5. [5]

    A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks,

    D. Hendrycks and K. Gimpel, "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks," in International Conference on Learning Representations, 2017. [Online]. Available: https://openreview.net/forum?id=Hkg4TI9xl

  6. [6]

    Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks,

    S. Liang, Y. Li, and R. Srikant, "Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks," in International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=H1VGkIxRZ

  7. [7]

    Out-of-Distribution Detection based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy,

    J. Zhang et al., "Out-of-Distribution Detection based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy," in The Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=KkazG4lgKL

  8. [8]

    ViM: Out-Of-Distribution with Virtual-logit Matching,

    H. Wang, Z. Li, L. Feng, and W. Zhang, "ViM: Out-Of-Distribution with Virtual-logit Matching," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, vol. 2022-June: IEEE Computer Society, pp. 4911-4920, doi: 10.1109/CVPR52688.2022.00487

  9. [9]

    A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks,

    K. Lee, K. Lee, H. Lee, and J. Shin, "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks," in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, vol. 31: Curran Associates, Inc. [Online]. Available: https://proceeding...

  10. [10]

    Detecting Out-of-Distribution Through the Lens of Neural Collapse,

    L. Liu and Y. Qin, "Detecting Out-of-Distribution Through the Lens of Neural Collapse," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025, vol. in press: IEEE Computer Society, doi: 10.48550/arXiv.2311.01479

  11. [11]

    Detecting Out-of-Distribution Examples with Gram Matrices,

    C. S. Sastry and S. Oore, "Detecting Out-of-Distribution Examples with Gram Matrices," in Proceedings of the 37th International Conference on Machine Learning, H. Daumé III and A. Singh, Eds., 2020, vol. 119: PMLR, pp. 8491-8501. [Online]. Available: https://proceedings.mlr.press/v119/sastry20a.html

  12. [12]

    ReAct: Out-of-distribution Detection With Rectified Activations,

    Y. Sun, C. Guo, and Y. Li, "ReAct: Out-of-distribution Detection With Rectified Activations," in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. W. Vaughan, Eds., 2021, vol. 34: Curran Associates, Inc., pp. 144-157. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2021/f...

  13. [13]

    DICE: Leveraging Sparsification for Out-of-Distribution Detection,

    Y. Sun and Y. Li, "DICE: Leveraging Sparsification for Out-of-Distribution Detection," in Computer Vision – ECCV 2022, Cham, S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, Eds., 2022: Springer Nature Switzerland, pp. 691-708

  14. [14]

    Extremely Simple Activation Shaping for Out-of-Distribution Detection,

    A. Djurisic, N. Bozanic, A. Ashok, and R. Liu, "Extremely Simple Activation Shaping for Out-of-Distribution Detection," in The Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=ndYXTEL6cZz

  15. [15]

    Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement,

    K. Xu, R. Chen, G. Franchi, and A. Yao, "Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement," in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=RDSTjtnqCg

  16. [16]

    Variational Inference with Normalizing Flows,

    D. Rezende and S. Mohamed, "Variational Inference with Normalizing Flows," in Proceedings of the 32nd International Conference on Machine Learning, Lille, France, F. Bach and D. Blei, Eds., 2015, vol. 37: PMLR, pp. 1530-1538. [Online]. Available: https://proceedings.mlr.press/v37/rezende15.html

  17. [17]

    Why Normalizing Flows Fail to Detect Out-of-Distribution Data,

    P. Kirichenko, P. Izmailov, and A. G. Wilson, "Why Normalizing Flows Fail to Detect Out-of-Distribution Data," in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., 2020, vol. 33: Curran Associates, Inc., pp. 20578-20589. [Online]. Available: https://proceedings.neurips.cc/paper_files...

  18. [18]

    Understanding Failures in Out-of-Distribution Detection with Deep Generative Models,

    L. Zhang, M. Goldstein, and R. Ranganath, "Understanding Failures in Out-of-Distribution Detection with Deep Generative Models," in Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, M. Meila and T. Zhang, Eds., 2021, vol. 139: PMLR, pp. 12427-12436. [Online]. Available: https://proceedings.ml...

  19. [19]

    A Geometric Explanation of the Likelihood OOD Detection Paradox,

    H. Kamkari, B. L. Ross, J. C. Cresswell, A. L. Caterini, R. Krishnan, and G. Loaiza-Ganem, "A Geometric Explanation of the Likelihood OOD Detection Paradox," in Proceedings of the 41st International Conference on Machine Learning, Proceedings of Machine Learning Research, R. Salakhutdinov et al., Eds., 2024, vol. 235: PMLR, pp. 22908-22935. [Online]. Availab...

  20. [20]

    Density estimation using Real NVP,

    L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using Real NVP," in International Conference on Learning Representations, 2017. [Online]. Available: https://openreview.net/forum?id=HkpbnH9lx

  21. [21]

    Glow: Generative Flow with Invertible 1x1 Convolutions,

    D. P. Kingma and P. Dhariwal, "Glow: Generative Flow with Invertible 1x1 Convolutions," in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, vol. 31: Curran Associates, Inc. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2018/file/d139d...

  22. [22]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, vol. 2016-December: IEEE Computer Society, pp. 770-778, doi: 10.1109/CVPR.2016.90

  23. [23]

    Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?,

    K. Hara, H. Kataoka, and Y. Satoh, "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018: IEEE Computer Society, pp. 6546-6555, doi: 10.1109/CVPR.2018.00685

  24. [24]

    OpenOOD: benchmarking generalized out-of-distribution detection,

    J. Yang et al., "OpenOOD: benchmarking generalized out-of-distribution detection," in Proceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2024: Curran Associates Inc.

  25. [25]

    OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection,

    J. Zhang et al., "OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection," Journal of Data-centric Machine Learning Research, 2023, doi: 10.48550/arXiv.2306.09301

  26. [26]

    Full-Spectrum Out-of-Distribution Detection,

    J. Yang, K. Zhou, and Z. Liu, "Full-Spectrum Out-of-Distribution Detection," International Journal of Computer Vision, vol. 131, no. 10, 2023-06-13, doi: 10.1007/s11263-023-01811-z

  27. [27]

    MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification,

    J. Yang et al., "MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification," Scientific Data, vol. 10, no. 1, pp. 1-10, 2023, doi: 10.1038/s41597-022-01721-8

  28. [28]

    The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS),

    B. H. Menze et al., "The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)," IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993-2024, 2015, doi: 10.1109/TMI.2014.2377694

  29. [29]

    Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features,

    S. Bakas et al., "Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features," Scientific Data, vol. 4, 2017, doi: 10.1038/SDATA.2017.117

  30. [30]

    Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

    S. Bakas et al., "Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge," arXiv preprint, 2018, doi: 10.48550/arXiv.1811.02629

  31. [31]

    Multi-layer Aggregation as a Key to Feature-Based OOD Detection,

    B. Lambert, F. Forbes, S. Doyle, and M. Dojat, "Multi-layer Aggregation as a Key to Feature-Based OOD Detection," pp. 104-114, 2023, doi: 10.1007/978-3-031-44336-7_11

  32. [32]

    Quantifying and understanding uncertainty in deep-learning-based medical image segmentation,

    B. Lambert, "Quantifying and understanding uncertainty in deep-learning-based medical image segmentation," Université Grenoble Alpes, 2024. [Online]. Available: https://theses.hal.science/tel-04673383

  33. [33]

    Automated brain extraction of multisequence MRI using artificial neural networks,

    F. Isensee et al., "Automated brain extraction of multisequence MRI using artificial neural networks," Human Brain Mapping, vol. 40, no. 17, pp. 4952-4964, 2019, doi: 10.1002/HBM.24750

  34. [34]

    The SRI24 multichannel atlas of normal adult human brain structure,

    T. Rohlfing, N. M. Zahr, E. V. Sullivan, and A. Pfefferbaum, "The SRI24 multichannel atlas of normal adult human brain structure," Human Brain Mapping, vol. 31, no. 5, pp. 798-819, 2010, doi: 10.1002/HBM.20906

  35. [35]

    The ANTsX ecosystem for quantitative biological and medical imaging,

    N. J. Tustison et al., "The ANTsX ecosystem for quantitative biological and medical imaging," Scientific Reports, vol. 11, no. 1, p. 9068, 2021, doi: 10.1038/s41598-021-87564-6

  36. [36]

    The LUMIERE dataset: Longitudinal Glioblastoma MRI with expert RANO evaluation,

    Y. Suter et al., "The LUMIERE dataset: Longitudinal Glioblastoma MRI with expert RANO evaluation," Scientific Data, vol. 9, no. 1, 2022, doi: 10.1038/S41597-022-01881-7

  37. [37]

    The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs),

    A. F. Kazerooni et al., "The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)," arXiv:2305.17033v7, 2023, doi: 10.48550/arXiv.2305.17033

  38. [38]

    The Brain Tumor Segmentation (BraTS) Challenge 2023: Glioma Segmentation in Sub-Saharan Africa Patient Population (BraTS-Africa),

    M. Adewole et al., "The Brain Tumor Segmentation (BraTS) Challenge 2023: Glioma Segmentation in Sub-Saharan Africa Patient Population (BraTS-Africa)," arXiv:2305.19369v1, 2023, doi: 10.48550/arXiv.2305.19369

  39. [39]

    Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

    S. Chilamkurthy et al., "Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans," 2018, doi: 10.48550/arXiv.1803.05854

  40. [40]

    Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge,

    H. J. Kuijf et al., "Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge," IEEE Transactions on Medical Imaging, vol. 38, no. 11, pp. 2556-2568, 2019, doi: 10.1109/TMI.2019.2905770

  41. [41]

    A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms,

    S. L. Liew et al., "A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms," Scientific Data, vol. 9, no. 1, pp. 1-12, 2022, doi: 10.1038/s41597-022-01401-7

  42. [42]

    Simulation of Brain Resection for Cavity Segmentation Using Self-supervised and Semi-supervised Learning,

    F. Pérez-García, R. Rodionov, A. Alim-Marvasti, R. Sparks, J. S. Duncan, and S. Ourselin, "Simulation of Brain Resection for Cavity Segmentation Using Self-supervised and Semi-supervised Learning," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12263 LNCS, pp. 115-...

  43. [43]

    IXI Dataset – Brain Development

    "IXI Dataset – Brain Development." https://brain-development.org/ixi-dataset/ (accessed

  44. [44]

    CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation,

    A. E. Kavur et al., "CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation," Medical Image Analysis, vol. 69, p. 101950, 2021, doi: 10.1016/J.MEDIA.2020.101950

  45. [45]

    Development of Ground Truth Data for Automatic Lumbar Spine MRI Image Segmentation,

    F. Natalia et al., "Development of Ground Truth Data for Automatic Lumbar Spine MRI Image Segmentation," Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, pp. 1449-1454, 2019/1// 2...

  46. [46]

    Efficient Multiple Organ Localization in CT Image Using 3D Region Proposal Network,

    X. Xu, F. Zhou, B. Liu, D. Fu, and X. Bai, "Efficient Multiple Organ Localization in CT Image Using 3D Region Proposal Network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885-1898, 2019, doi: 10.1109/TMI.2019.2894854

  47. [47]

    The Liver Tumor Segmentation Benchmark (LiTS),

    P. Bilic et al., "The Liver Tumor Segmentation Benchmark (LiTS)," Medical Image Analysis, vol. 84, 2023/02/01, doi: 10.1016/j.media.2022.102680

  48. [48]

    ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases,

    X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, vol. 2017-January: IEEE Computer Society, pp. 3462-347...

  49. [49]

    Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,

    D. S. Kermany et al., "Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning," Cell, vol. 172, no. 5, pp. 1122-1131.e9, 2018, doi: 10.1016/J.CELL.2018.02.010

  50. [50]

    Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study,

    J. N. Kather et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLOS Medicine, vol. 16, no. 1, p. e1002730, 2019, doi: 10.1371/JOURNAL.PMED.1002730

  51. [51]

    The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,

    P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, no. 1, pp. 1-9, 2018, doi: 10.1038/sdata.2018.161

  52. [52]

    DeepDRiD: Diabetic Retinopathy—Grading and Image Quality Estimation Challenge,

    R. Liu et al., "DeepDRiD: Diabetic Retinopathy—Grading and Image Quality Estimation Challenge," Patterns, vol. 3, no. 6, p. 100512, 2022, doi: 10.1016/J.PATTER.2022.100512

  53. [53]

    A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,

    A. Acevedo, A. Merino, S. Alférez, Á. Molina, L. Boldú, and J. Rodellar, "A dataset of microscopic peripheral blood cell images for development of automatic recognition systems," Data in Brief, vol. 30, p. 105474, 2020, doi: 10.1016/J.DIB.2020.105474

  54. [54]

    Auto-Encoding Variational Bayes

    D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," 2013/12/20, doi: 10.48550/arXiv.1312.6114

  55. [55]

    Residual Flows for Invertible Generative Modeling,

    R. T. Q. Chen, J. Behrmann, D. K. Duvenaud, and J.-H. Jacobsen, "Residual Flows for Invertible Generative Modeling," in Advances in Neural Information Processing Systems, 2019. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2019/file/5d0d5594d24f0f955548f0fc0ff83d10-Paper.pdf

  56. [56]

    Neural Spline Flows,

    C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, "Neural Spline Flows," in Advances in Neural Information Processing Systems, 2019. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2019/file/7ac71d433f282034e088473244df8c02-Paper.pdf

  57. [57]

    Densely Connected Convolutional Networks,

    G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21-26 July 2017, pp. 2261-2269, doi: 10.1109/CVPR.2017.243

  58. [58]

    Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows,

    Z. Liu et al., "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  59. [59]

    Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality,

    E. Nalisnick, A. Matsukawa, Y. W. Teh, and B. Lakshminarayanan, "Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality," 2019/06/07, doi: 10.48550/arXiv.1906.02994