arxiv: 2606.22556 · v1 · pith:LDPITVK2new · submitted 2026-06-21 · 💻 cs.CV

HiMatch-AD: DINOv3-driven Hierarchical Matching for Training-free Medical Anomaly Detection

Pith reviewed 2026-06-26 11:14 UTC · model grok-4.3

classification 💻 cs.CV
keywords medical anomaly detection · training-free · DINOv3 · hierarchical matching · uncertainty fusion · brain MRI · liver CT · retinal OCT

The pith

Dual-branch DINOv3 matching with uncertainty fusion enables training-free medical anomaly detection that outperforms existing methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that medical anomaly detection works effectively without any training by using a pretrained DINOv3 model for hierarchical matching between query images and normal references. It retrieves references through combined global and patch-level comparisons, builds anomaly maps at multiple transformer stages, and fuses those maps with weights based on their estimated reliability. This approach would matter if true because it removes the usual requirements for large amounts of normal data and task-specific optimization, allowing the same system to handle brain MRI, liver CT, and retinal OCT scans. A reader would see this as a route to more flexible deployment across hospitals and imaging devices.

Core claim

HiMatch-AD retrieves semantically relevant normal references via dual-branch matching that jointly considers global CLS-token similarity and patch-level representations from DINOv3, then generates hierarchical anomaly maps across multiple transformer stages by comparing clustered normal features with query representations, and finally aggregates the maps through a unified uncertainty-based fusion mechanism that adaptively weights them according to reliability, all without task-specific training.
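
The retrieval step in the core claim can be illustrated with a minimal sketch. It assumes CLS and patch features have already been extracted from a frozen backbone; the function name `dual_branch_retrieve`, the mixing weight `alpha`, and the "best match per query patch, then average" patch score are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def dual_branch_retrieve(q_cls, q_patches, ref_cls, ref_patches, k=4, alpha=0.5):
    """Rank N normal references for one query (hypothetical scoring rule).

    q_cls: (D,)        q_patches: (P, D)
    ref_cls: (N, D)    ref_patches: (N, P, D)
    Returns the indices of the top-k references and all N scores.
    """
    # Global branch: cosine similarity between CLS tokens.
    global_sim = l2_normalize(ref_cls) @ l2_normalize(q_cls)      # (N,)

    # Patch branch: cosine similarity of every query patch to every
    # reference patch, shape (N, P_query, P_ref).
    sim = np.einsum("pd,nqd->npq", l2_normalize(q_patches), l2_normalize(ref_patches))
    # Assumption: keep the best reference match per query patch, then average.
    patch_sim = sim.max(axis=2).mean(axis=1)                        # (N,)

    # Assumption: a fixed convex combination of the two branches.
    score = alpha * global_sim + (1.0 - alpha) * patch_sim
    top_k = np.argsort(-score)[:k]
    return top_k, score
```

With `alpha=1.0` this reduces to CLS-only retrieval, which makes it easy to check how much the patch branch changes the selected references.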

What carries the argument

Dual-branch global-plus-patch matching on DINOv3 features combined with uncertainty-weighted fusion of multi-stage anomaly maps.

If this is right

  • Anomaly detection can scale across institutions and modalities using only one pretrained model and no retraining.
  • Performance exceeds both training-based methods and prior DINO-based approaches on brain MRI, liver CT, and retinal OCT datasets.
  • Uncertainty-aware fusion reduces the impact of unreliable anomaly responses compared with naive aggregation.
  • Hierarchical maps from multiple stages capture deviations at both coarse and fine scales in a single pass.
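
The contrast between uncertainty-aware fusion and naive aggregation can be made concrete with a small sketch. The summary does not say how reliability is estimated, so the rule below is purely an assumption for illustration: each map's uncertainty is taken to be its mean squared deviation from the per-pixel median of all maps, and the weights are the normalized inverse uncertainties. The name `uncertainty_weighted_fusion` is hypothetical.

```python
import numpy as np

def minmax(m, eps=1e-8):
    """Rescale one anomaly map to roughly [0, 1]."""
    return (m - m.min()) / (m.max() - m.min() + eps)

def uncertainty_weighted_fusion(maps, eps=1e-8):
    """Fuse S anomaly maps of shape (H, W) into a single map.

    Assumption (not from the paper): a map is unreliable to the extent
    that it disagrees with the per-pixel median of all maps.
    Returns the fused (H, W) map and the (S,) weights.
    """
    maps = np.stack([minmax(m) for m in maps])                    # (S, H, W)
    consensus = np.median(maps, axis=0)                           # (H, W)
    uncertainty = ((maps - consensus) ** 2).mean(axis=(1, 2))     # (S,)
    weights = 1.0 / (uncertainty + eps)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, maps, axes=1)                   # (H, W)
    return fused, weights
```

Naive aggregation corresponds to replacing `weights` with a uniform vector, so the two strategies can be compared on the same stack of maps.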

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same matching structure might transfer to anomaly detection in non-medical images if the DINOv3 features remain informative.
  • Future gains in foundation models would likely improve the method directly since it relies entirely on pretrained representations.
  • Clinical workflows could incorporate the approach for new anomaly types without collecting fresh training sets each time.

Load-bearing premise

Dual-branch global-plus-patch matching on DINOv3 features plus uncertainty-weighted fusion of multi-stage maps will produce reliable anomaly scores without any task-specific optimization or domain adaptation.

What would settle it

Running the method on a medical imaging modality outside the BMAD benchmark, such as ultrasound, and checking whether it still beats simple nearest-neighbor baselines on both detection and localization metrics.
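
For reference, a simple nearest-neighbor baseline of the kind mentioned above could look like the following sketch. It is a generic patch-level scheme (cosine distance to the closest normal patch in a memory bank), not a reimplementation of any baseline evaluated in the paper; the function name and the use of the maximum patch score as the image-level score are assumptions.

```python
import numpy as np

def nn_patch_baseline(query_patches, memory_bank, grid_hw):
    """Nearest-neighbor anomaly scoring on patch features.

    query_patches: (P, D) features of one query image
    memory_bank:   (M, D) patch features pooled from normal images
    grid_hw:       (H, W) patch grid with H * W == P
    Returns the (H, W) anomaly map and a scalar image-level score.
    """
    q = query_patches / (np.linalg.norm(query_patches, axis=1, keepdims=True) + 1e-8)
    m = memory_bank / (np.linalg.norm(memory_bank, axis=1, keepdims=True) + 1e-8)
    # Cosine distance from each query patch to its closest normal patch.
    dist = 1.0 - (q @ m.T).max(axis=1)            # (P,)
    amap = dist.reshape(grid_hw)
    # Assumption: the image-level score is the largest patch distance.
    return amap, float(amap.max())
```

The map supports localization metrics and the scalar supports detection metrics, which are the two quantities the proposed test would compare.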

Figures

Figures reproduced from arXiv: 2606.22556 by Jiayu Huo, Jingyuan Hong, Le Zhang, Liyun Chen, Meng Zhou.

Figure 1: Overview of the proposed HiMatch-AD framework. Our HiMatch-AD framework contains two steps: K support images are retrieved through dual-branch matching (left). Then the query and support images are sent to the hierarchical training-free anomaly detection pipeline for anomaly map generation (upper right). The details of A-Map Gen module are shown on the bottom right. Here “Enc.” denotes the encoder, where…
Figure 2: Quantitative comparison of different ViT backbones for feature extraction (left). Performance and inference speed as the number of support images K increases from 1 to 16 (middle). Visualization of anomaly maps generated by our approach, with green contours indicating anomalous regions (right).
… are observed on liver CT (98.92%) and retinal OCT (97.01%), demonstrating the advantage of stronger self-supervis…
Original abstract

Anomaly detection is essential for medical image analysis, where pathological regions often appear as rare deviations from normal anatomical structures. While training-based methods have achieved promising performance, they require task-specific optimization and extensive normal data, which limits scalability across modalities and institutions. Training-free approaches offer greater flexibility by leveraging pretrained visual representations, yet existing methods typically rely on simple nearest-neighbor retrieval and naive aggregation strategies, which may fail to capture hierarchical semantics and ignore the reliability of multiple anomaly responses. In this work, we propose HiMatch-AD, a DINOv3-driven hierarchical matching framework for training-free medical anomaly detection. Our method first retrieves semantically relevant normal references via dual-branch matching that jointly considers global CLS-token similarity and patch-level representations. Hierarchical anomaly maps are then generated across multiple transformer stages by comparing clustered normal features with query representations. To robustly aggregate anomaly responses, we introduce a unified uncertainty-based fusion mechanism that adaptively weights maps according to their reliability. The entire framework operates without any task-specific training. Extensive experiments on the BMAD benchmark, including brain MRI, liver CT, and retinal OCT datasets, demonstrate that HiMatch-AD consistently outperforms both training-based and DINO-based state-of-the-art methods, which highlights the effectiveness of multi-level matching and uncertainty-aware fusion for scalable medical anomaly detection.
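
The abstract's step of comparing clustered normal features with query representations at several transformer stages can be sketched as follows. Everything specific here is an assumption: plain k-means as the clustering method, cosine distance to the nearest centroid as the anomaly score, and the names `kmeans`, `stage_anomaly_map`, and `hierarchical_maps`. The abstract does not specify any of these.

```python
import numpy as np

def unit(v):
    """L2-normalize the rows of a 2-D array."""
    return v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-8)

def kmeans(x, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; requires n_clusters <= len(x)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid: (N, K).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(n_clusters):
            members = x[assign == c]
            if len(members) > 0:          # leave empty clusters where they are
                centroids[c] = members.mean(axis=0)
    return centroids

def stage_anomaly_map(query_feats, support_feats, grid_hw, n_clusters=8):
    """Score each query patch by cosine distance to its nearest normal centroid."""
    centroids = unit(kmeans(unit(support_feats), n_clusters))
    sim = unit(query_feats) @ centroids.T                      # (P, K)
    return (1.0 - sim.max(axis=1)).reshape(grid_hw)

def hierarchical_maps(query_stages, support_stages, grid_hw, n_clusters=8):
    """One anomaly map per transformer stage, stacked as (S, H, W)."""
    return np.stack([
        stage_anomaly_map(q, s, grid_hw, n_clusters)
        for q, s in zip(query_stages, support_stages)
    ])
```

Here `query_stages` and `support_stages` are lists with one `(P, D)` and one `(M, D)` array per stage. Clustering keeps the per-stage comparison cost proportional to the number of centroids rather than to the number of support patches.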

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes HiMatch-AD, a training-free medical anomaly detection framework driven by DINOv3 features. It retrieves normal references via dual-branch matching on global CLS tokens and patch representations, generates hierarchical anomaly maps across multiple transformer stages via clustered normal feature comparison, and fuses the maps with an uncertainty-weighted mechanism. The central empirical claim is that this approach consistently outperforms both training-based and DINO-based state-of-the-art methods on the BMAD benchmark spanning brain MRI, liver CT, and retinal OCT datasets.

Significance. If the reported outperformance holds under rigorous verification, the work would establish that pretrained natural-image features can support effective hierarchical matching and uncertainty-aware fusion for anomaly detection across multiple medical modalities without any task-specific training or adaptation. This would address key scalability limitations of training-based methods that require large normal datasets and modality-specific optimization, potentially enabling more flexible deployment across institutions.

major comments (2)
  1. [Abstract] The abstract asserts outperformance on BMAD but supplies no quantitative metrics, baseline details, statistical tests, or ablation results, so it is impossible to judge whether the data actually support the central claim.
  2. [Method] The assumption that dual-branch global-plus-patch matching on DINOv3 features plus uncertainty-weighted fusion of multi-stage maps will produce reliable anomaly scores without any task-specific optimization or domain adaptation is load-bearing for the training-free premise. No analysis is provided on how domain shift (intensity statistics, lack of color, modality-specific noise) affects the quality of clustered normal references or the resulting maps for the three BMAD modalities.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, indicating where revisions will be made to the manuscript.

  1. Referee: [Abstract] The abstract asserts outperformance on BMAD but supplies no quantitative metrics, baseline details, statistical tests, or ablation results, so it is impossible to judge whether the data actually support the central claim.

    Authors: We agree that the abstract would benefit from explicit quantitative support to allow readers to assess the central claim immediately. In the revised version, we will add concise reporting of key metrics (e.g., mean AUC across the three BMAD modalities), the primary baselines compared, and mention of statistical significance testing. revision: yes

  2. Referee: [Method] The assumption that dual-branch global-plus-patch matching on DINOv3 features plus uncertainty-weighted fusion of multi-stage maps will produce reliable anomaly scores without any task-specific optimization or domain adaptation is load-bearing for the training-free premise. No analysis is provided on how domain shift (intensity statistics, lack of color, modality-specific noise) affects the quality of clustered normal references or the resulting maps for the three BMAD modalities.

    Authors: DINOv3 features have been shown in the literature to transfer across domains without adaptation, and our dual-branch retrieval plus uncertainty fusion are designed to mitigate unreliable matches. We nevertheless recognize that an explicit examination of domain-shift effects would strengthen the training-free claim. We will add a dedicated discussion subsection with qualitative and quantitative observations on how intensity distributions and modality-specific noise influence reference clustering and anomaly-map quality across the BMAD datasets. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is self-contained via pretrained features and external benchmarks

The paper describes a training-free pipeline that extracts features from a fixed pretrained DINOv3 model, performs dual-branch retrieval and multi-stage map generation, then fuses via uncertainty weighting. All load-bearing steps are defined directly from the external DINOv3 representations and the BMAD test sets; no parameter is fitted to the target data and then re-used as a 'prediction,' no self-citation supplies a uniqueness theorem, and no equation reduces to its own input by construction. The performance claims rest on reported comparisons against external baselines rather than any internal re-definition of quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested premise that DINOv3 representations transfer directly to medical anomaly detection and that the proposed matching and fusion steps add value beyond simple nearest-neighbor baselines; no free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)
  • domain assumption: Pretrained DINOv3 visual features are suitable for medical image anomaly detection without task-specific fine-tuning
    The entire framework is built on this transfer assumption.

