HiMatch-AD: DINOv3-driven Hierarchical Matching for Training-free Medical Anomaly Detection
Pith reviewed 2026-06-26 11:14 UTC · model grok-4.3
The pith
Dual-branch DINOv3 matching with uncertainty fusion enables training-free medical anomaly detection that outperforms existing methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HiMatch-AD works in three steps, all without task-specific training. First, it retrieves semantically relevant normal references via dual-branch matching that jointly considers global CLS-token similarity and patch-level representations from DINOv3. Second, it generates hierarchical anomaly maps across multiple transformer stages by comparing clustered normal features with query representations. Third, it aggregates the maps through a unified uncertainty-based fusion mechanism that adaptively weights them according to reliability.
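The retrieval step can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (a weighted blend of CLS cosine similarity and mean best-match patch similarity), the mixing weight `alpha`, and the names `retrieve_references` and `patch_similarity` are all assumptions made for illustration, and the toy vectors stand in for real DINOv3 features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def patch_similarity(query_patches, ref_patches):
    """Mean over query patches of the best cosine match among reference patches."""
    best = [max(cosine(q, r) for r in ref_patches) for q in query_patches]
    return sum(best) / len(best)

def retrieve_references(query, bank, k=2, alpha=0.5):
    """Rank normal references by a blend of global and patch-level similarity.

    query and each bank entry are dicts with "cls" (one vector) and
    "patches" (a list of vectors). alpha is a hypothetical mixing weight.
    """
    scored = []
    for idx, ref in enumerate(bank):
        global_sim = cosine(query["cls"], ref["cls"])
        local_sim = patch_similarity(query["patches"], ref["patches"])
        scored.append((alpha * global_sim + (1 - alpha) * local_sim, idx))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy 2-D features standing in for DINOv3 CLS and patch tokens.
query = {"cls": [1.0, 0.0], "patches": [[1.0, 0.0], [0.0, 1.0]]}
bank = [
    {"cls": [0.0, 1.0], "patches": [[0.0, 1.0], [0.0, 1.0]]},
    {"cls": [1.0, 0.1], "patches": [[1.0, 0.0], [0.1, 1.0]]},
    {"cls": [-1.0, 0.0], "patches": [[-1.0, 0.0], [0.0, -1.0]]},
]
top = retrieve_references(query, bank, k=2)
print(top)
```

In this toy bank the second entry agrees with the query on both branches, so it ranks first; the first entry matches only at the patch level and ranks second.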
What carries the argument
Dual-branch global-plus-patch matching on DINOv3 features combined with uncertainty-weighted fusion of multi-stage anomaly maps.
If this is right
- Anomaly detection can scale across institutions and modalities using only one pretrained model and no retraining.
- Performance exceeds both training-based methods and prior DINO-based approaches on brain MRI, liver CT, and retinal OCT datasets.
- Uncertainty-aware fusion reduces the impact of unreliable anomaly responses compared with naive aggregation.
- Hierarchical maps from multiple stages capture deviations at both coarse and fine scales in a single pass.
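The contrast between uncertainty-aware fusion and naive aggregation can be made concrete with a small sketch. The paper's actual uncertainty measure is not given in this summary, so the code assumes a stand-in: the Shannon entropy of each map (treated as a distribution over patches) serves as the uncertainty, and maps are weighted in proportion to `exp(-entropy)`. The function names are hypothetical.

```python
import math

def entropy(flat_map, eps=1e-12):
    """Shannon entropy of a non-negative map normalised to sum to 1."""
    total = sum(flat_map) + eps
    probs = [v / total for v in flat_map]
    return -sum(p * math.log(p + eps) for p in probs)

def fuse_maps(maps):
    """Weighted average of anomaly maps; lower entropy earns a larger weight.

    Entropy as the uncertainty proxy is an assumption of this sketch,
    not the mechanism described in the paper.
    """
    raw = [math.exp(-entropy(m)) for m in maps]
    z = sum(raw)
    weights = [r / z for r in raw]
    fused = [sum(w * m[i] for w, m in zip(weights, maps))
             for i in range(len(maps[0]))]
    return fused, weights

# One sharply localised map and one diffuse, uninformative map.
peaked = [0.0, 0.0, 0.9, 0.1]
diffuse = [0.25, 0.25, 0.25, 0.25]

fused, weights = fuse_maps([peaked, diffuse])
naive = [(a + b) / 2 for a, b in zip(peaked, diffuse)]
print(weights, fused, naive)
```

Under this proxy the diffuse map is down-weighted, so the fused response at the peaked location stays higher than it would under a plain average.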
Where Pith is reading between the lines
- The same matching structure might transfer to anomaly detection in non-medical images if the DINOv3 features remain informative.
- Future gains in foundation models would likely improve the method directly since it relies entirely on pretrained representations.
- Clinical workflows could incorporate the approach for new anomaly types without collecting fresh training sets each time.
Load-bearing premise
Dual-branch global-plus-patch matching on DINOv3 features plus uncertainty-weighted fusion of multi-stage maps will produce reliable anomaly scores without any task-specific optimization or domain adaptation.
What would settle it
Running the method on a medical imaging modality outside the BMAD benchmark, such as ultrasound, and checking whether it still beats simple nearest-neighbor baselines on both detection and localization metrics.
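For reference, a simple nearest-neighbor baseline of the kind mentioned above might look like the following sketch. It assumes patch features have already been extracted, scores each query patch by its Euclidean distance to the closest normal patch, and takes the maximum patch score as the image-level score. These choices and the function names are illustrative assumptions, not a baseline specified by the paper.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nn_patch_scores(query_patches, normal_patches):
    """Localization: distance from each query patch to its nearest normal patch."""
    return [min(l2(q, n) for n in normal_patches) for q in query_patches]

def image_score(patch_scores):
    """Detection: the most anomalous patch decides the image-level score."""
    return max(patch_scores)

# Toy 2-D features; a real test would use features from the new modality.
normal_bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
healthy = [[0.1, 0.0], [0.9, 0.1]]
lesion = [[0.1, 0.0], [3.0, 3.0]]

healthy_score = image_score(nn_patch_scores(healthy, normal_bank))
lesion_score = image_score(nn_patch_scores(lesion, normal_bank))
print(healthy_score, lesion_score)
```

Any method claiming an advantage would need to beat scores of this kind on both the per-patch (localization) and per-image (detection) outputs.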
Original abstract
Anomaly detection is essential for medical image analysis, where pathological regions often appear as rare deviations from normal anatomical structures. While training-based methods have achieved promising performance, they require task-specific optimization and extensive normal data, which limits scalability across modalities and institutions. Training-free approaches offer greater flexibility by leveraging pretrained visual representations, yet existing methods typically rely on simple nearest-neighbor retrieval and naive aggregation strategies, which may fail to capture hierarchical semantics and ignore the reliability of multiple anomaly responses. In this work, we propose HiMatch-AD, a DINOv3-driven hierarchical matching framework for training-free medical anomaly detection. Our method first retrieves semantically relevant normal references via dual-branch matching that jointly considers global CLS-token similarity and patch-level representations. Hierarchical anomaly maps are then generated across multiple transformer stages by comparing clustered normal features with query representations. To robustly aggregate anomaly responses, we introduce a unified uncertainty-based fusion mechanism that adaptively weights maps according to their reliability. The entire framework operates without any task-specific training. Extensive experiments on the BMAD benchmark, including brain MRI, liver CT, and retinal OCT datasets, demonstrate that HiMatch-AD consistently outperforms both training-based and DINO-based state-of-the-art methods, which highlights the effectiveness of multi-level matching and uncertainty-aware fusion for scalable medical anomaly detection.
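The hierarchical map step in the abstract can be sketched under stated assumptions. The abstract does not say how the normal features are clustered or how distances are measured, so the code below assumes plain k-means (Lloyd's algorithm with a deterministic initialisation) and scores each query patch by its Euclidean distance to the nearest centroid, once per stage. The stage names `stage_4` and `stage_12` and all function names are hypothetical.

```python
import math

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, k, iters=10):
    """Lloyd's algorithm, initialised from the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def stage_map(query_patches, centroids):
    """Per-patch anomaly score: distance to the nearest normal centroid."""
    return [min(dist(q, c) for c in centroids) for q in query_patches]

def hierarchical_maps(query_by_stage, normal_by_stage, k=2):
    """Build one anomaly map per transformer stage."""
    return {
        stage: stage_map(query_by_stage[stage], kmeans(normal_by_stage[stage], k))
        for stage in query_by_stage
    }

# Toy patch features for two hypothetical stages.
normal_by_stage = {
    "stage_4": [[0.0, 0.0], [1.0, 1.0], [0.0, 0.2], [1.0, 1.2]],
    "stage_12": [[0.0, 0.0], [2.0, 2.0], [0.2, 0.0], [2.2, 2.0]],
}
query_by_stage = {
    "stage_4": [[0.0, 0.1], [3.0, 3.0]],
    "stage_12": [[0.1, 0.0], [5.0, 5.0]],
}
maps = hierarchical_maps(query_by_stage, normal_by_stage)
print(maps)
```

In the toy data the first query patch sits on a normal centroid at both stages and scores near zero, while the second lies far from every centroid and scores high in both maps.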
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes HiMatch-AD, a training-free medical anomaly detection framework driven by DINOv3 features. It retrieves normal references via dual-branch matching on global CLS tokens and patch representations, generates hierarchical anomaly maps across multiple transformer stages via clustered normal feature comparison, and fuses the maps with an uncertainty-weighted mechanism. The central empirical claim is that this approach consistently outperforms both training-based and DINO-based state-of-the-art methods on the BMAD benchmark spanning brain MRI, liver CT, and retinal OCT datasets.
Significance. If the reported outperformance holds under rigorous verification, the work would establish that pretrained natural-image features can support effective hierarchical matching and uncertainty-aware fusion for anomaly detection across multiple medical modalities without any task-specific training or adaptation. This would address key scalability limitations of training-based methods that require large normal datasets and modality-specific optimization, potentially enabling more flexible deployment across institutions.
major comments (2)
- [Abstract] The abstract asserts outperformance on BMAD but supplies no quantitative metrics, baseline details, statistical tests, or ablation results, so it is impossible to judge whether the data actually support the central claim.
- [Method] The assumption that dual-branch global-plus-patch matching on DINOv3 features plus uncertainty-weighted fusion of multi-stage maps will produce reliable anomaly scores without any task-specific optimization or domain adaptation is load-bearing for the training-free premise. No analysis is provided on how domain shift (intensity statistics, lack of color, modality-specific noise) affects the quality of clustered normal references or the resulting maps for the three BMAD modalities.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, indicating where revisions will be made to the manuscript.
- Referee: [Abstract] The abstract asserts outperformance on BMAD but supplies no quantitative metrics, baseline details, statistical tests, or ablation results, so it is impossible to judge whether the data actually support the central claim.
  Authors: We agree that the abstract would benefit from explicit quantitative support to allow readers to assess the central claim immediately. In the revised version, we will add concise reporting of key metrics (e.g., mean AUC across the three BMAD modalities), the primary baselines compared, and mention of statistical significance testing.
  Revision: yes
- Referee: [Method] The assumption that dual-branch global-plus-patch matching on DINOv3 features plus uncertainty-weighted fusion of multi-stage maps will produce reliable anomaly scores without any task-specific optimization or domain adaptation is load-bearing for the training-free premise. No analysis is provided on how domain shift (intensity statistics, lack of color, modality-specific noise) affects the quality of clustered normal references or the resulting maps for the three BMAD modalities.
  Authors: DINOv3 features have been shown in the literature to transfer across domains without adaptation, and our dual-branch retrieval plus uncertainty fusion are designed to mitigate unreliable matches. We nevertheless recognize that an explicit examination of domain-shift effects would strengthen the training-free claim. We will add a dedicated discussion subsection with qualitative and quantitative observations on how intensity distributions and modality-specific noise influence reference clustering and anomaly-map quality across the BMAD datasets.
  Revision: yes
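The kind of summary metric the authors promise, mean AUC across modalities, can be computed as in the sketch below. AUROC is obtained from the Mann-Whitney pairwise comparison, with ties counted as half. The scores and labels are synthetic toy values chosen only to exercise the code; they are not results from the paper, and the modality keys are hypothetical names.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison (Mann-Whitney U); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic (scores, labels) pairs per modality; NOT values from the paper.
toy = {
    "brain_mri": ([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]),
    "liver_ct": ([0.8, 0.7, 0.3, 0.1], [1, 1, 0, 0]),
    "retinal_oct": ([0.5, 0.5, 0.5, 0.2], [1, 1, 0, 0]),
}
per_modality = {name: auroc(s, y) for name, (s, y) in toy.items()}
mean_auc = sum(per_modality.values()) / len(per_modality)
print(per_modality, mean_auc)
```

Reporting the per-modality values alongside the mean keeps a strong result on one dataset from masking a weak one on another.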
Circularity Check
No significant circularity; method is self-contained via pretrained features and external benchmarks
The paper describes a training-free pipeline that extracts features from a fixed pretrained DINOv3 model, performs dual-branch retrieval and multi-stage map generation, then fuses via uncertainty weighting. All load-bearing steps are defined directly from the external DINOv3 representations and the BMAD test sets; no parameter is fitted to the target data and then re-used as a 'prediction,' no self-citation supplies a uniqueness theorem, and no equation reduces to its own input by construction. The performance claims rest on reported comparisons against external baselines rather than any internal re-definition of quantities.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Pretrained DINOv3 visual features are suitable for medical image anomaly detection without task-specific fine-tuning.