Hypergraph-Enhanced Training-Free and Language-Free Few-Shot Anomaly Detection

Dingying Fan; Guohuan Xie; Siqi Li; Xin He; Yun Liu

arxiv: 2605.10628 · v1 · submitted 2026-05-11 · 💻 cs.CV

Pith reviewed 2026-05-12 05:16 UTC · model grok-4.3

keywords: few-shot anomaly detection, hypergraph, training-free, language-free, DINOv3, sparsemax, anomaly detection, visual inspection

The pith

HyperFSAD uses sparsemax-selected hyperedges on DINOv3 features to perform training-free and language-free few-shot anomaly detection that outperforms prior approaches across six datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HyperFSAD, a framework that addresses few-shot anomaly detection without any task-specific training or language prompts by relying solely on pre-trained visual features. It replaces standard nearest-neighbor matching with Sparse Hyper Matching, in which sparsemax selects relevant support patches that are then grouped into hyperedges to form compact representations of normal patterns. These hyperedges suppress noise from background or distractors. The method further combines local patch-level anomaly evidence with global semantic checks through support-aware CLS token matching to produce final image scores. This purely visual pipeline delivers competitive results on industrial and medical benchmarks, showing that hypergraph structures can support robust inference under strict zero-adaptation constraints.

Core claim

HyperFSAD performs inference by first extracting patch features from DINOv3, then applying sparsemax to select the most relevant support patches for aggregation into hyperedges that serve as normal prototypes, and finally computing anomaly scores via Dual-Branch Image Scoring that fuses a patch-grid anomaly map with global deviation measured by support-aware CLS matching, all without optimization or text, to achieve state-of-the-art results on MVTecAD, VisA, MPDD, BTAD, RESC, and BraTS.

What carries the argument

Sparse Hyper Matching, in which sparsemax selects support patches that are aggregated into hyperedges as compact normal evidence, combined with Dual-Branch Image Scoring that merges spatial patch-grid anomaly maps and support-aware CLS token matching.
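
The matching step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sparsemax` and `sparse_hyper_match`, the `scale` temperature, cosine similarity as the matching score, and `1 - cosine` as the patch anomaly score are all hypothetical choices made for the example. Only the sparsemax projection follows the published closed form (Martins and Astudillo [22]).

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of a 1-D score vector onto the probability simplex.

    Returns non-negative weights that sum to 1; low-scoring entries get exactly 0.
    """
    z = np.asarray(z, dtype=np.float64)
    z_sorted = np.sort(z)[::-1]                # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum   # sorted entries that stay non-zero
    k_z = k[in_support][-1]                    # number of non-zero weights
    tau = (cumsum[k_z - 1] - 1.0) / k_z        # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)


def sparse_hyper_match(query, support_feats, scale=10.0):
    """Score one query patch against a bank of normal support patches.

    Assumptions (not from the paper): cosine similarity, a fixed `scale`
    temperature, and 1 - cosine(query, hyperedge) as the anomaly score.
    """
    q = query / np.linalg.norm(query)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    sims = s @ q                               # similarity to every support patch
    w = sparsemax(scale * sims)                # sparse selection weights
    hyperedge = w @ s                          # convex combination of selected patches
    hyperedge = hyperedge / (np.linalg.norm(hyperedge) + 1e-12)
    score = 1.0 - float(hyperedge @ q)         # high when no normal evidence matches
    return score, w
```

For instance, `sparsemax([1.0, 0.5, -1.0])` gives weights `[0.75, 0.25, 0.0]`: the lowest-scoring entry is dropped outright rather than merely down-weighted, which is the property the summary credits with suppressing background patches.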

If this is right

Anomaly detection systems can be deployed on new tasks or domains with no fine-tuning or prompt engineering required.
Background noise and distractors in patch features are suppressed through hyperedge aggregation rather than simple nearest-neighbor selection.
Image-level decisions gain reliability by balancing local spatial evidence with global semantic deviation in a single visual pipeline.
The same framework applies uniformly to both industrial manufacturing defects and medical imaging anomalies.
Labor-intensive creation of text prompts or dataset-specific training loops becomes unnecessary for competitive performance.
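
The third point, balancing local spatial evidence against global semantic deviation, can be illustrated with a small sketch. The paper summary does not give the fusion rule, so everything specific here is an assumption for illustration: the helper name `dual_branch_image_score`, pooling the anomaly map by the mean of its top fraction of values, measuring global deviation as one minus the best cosine match between the query CLS token and the support CLS tokens, and a linear fusion weight `alpha`.

```python
import numpy as np


def dual_branch_image_score(anomaly_map, query_cls, support_cls,
                            top_frac=0.01, alpha=0.5):
    """Fuse a patch-grid anomaly map with a CLS-based global deviation.

    All pooling and fusion choices below are hypothetical, not the paper's.
    """
    # Spatial branch: average the highest-scoring patches of the anomaly map.
    flat = np.sort(np.asarray(anomaly_map, dtype=np.float64).ravel())[::-1]
    k = max(1, int(round(top_frac * flat.size)))
    spatial = float(flat[:k].mean())

    # Global branch: distance from the query CLS token to its closest support CLS token.
    q = query_cls / np.linalg.norm(query_cls)
    s = support_cls / np.linalg.norm(support_cls, axis=1, keepdims=True)
    global_dev = 1.0 - float(np.max(s @ q))

    # Linear fusion of the two branches.
    return alpha * spatial + (1.0 - alpha) * global_dev
```

With a single hot patch in an otherwise flat map and a query CLS token that matches a support image exactly, only the spatial branch contributes; a globally unusual image with a flat map is caught by the CLS branch instead.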

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Replacing DINOv3 with stronger future self-supervised backbones could improve results while keeping the hypergraph and dual-branch logic unchanged.
The hyperedge construction may transfer to other few-shot visual tasks such as classification or segmentation that also rely on patch-level comparisons.
In resource-limited settings the absence of training steps could enable rapid on-site setup of inspection systems.
Higher-order relations captured by hypergraphs might reduce sensitivity to patch-level outliers compared with pairwise matching alone.

Load-bearing premise

That DINOv3 patch features will contain sufficient domain-general information so that sparsemax-selected hyperedges can reliably represent normal patterns without any further adaptation.

What would settle it

A new dataset containing anomalies that depend on semantic relationships or contextual cues absent from DINOv3 patch embeddings, where the method would fail to separate normal from anomalous images despite the hypergraph construction.

Figures

Figures reproduced from arXiv: 2605.10628 by Dingying Fan, Guohuan Xie, Siqi Li, Xin He, Yun Liu.

**Figure 1.** Comparison on MVTecAD [1], VisA [33], MPDD [17], BTAD [24], RESC [14], and BraTS [23] datasets under the 1-shot setting. Left: I-AUROC. Right: P-AUROC.

**Figure 2.** Overview of HyperFSAD. Given K normal support images and a query image, a frozen DINOv3 encoder extracts multi-layer patch tokens and <CLS> tokens to construct a Multi-layer Memory Bank of normal features. For each query patch at layer l, we perform Sparse Hyper Matching: similarities to all support patches in P_l are converted into sparse fusion weights via sparsemax, selecting a few informative neighbors …

**Figure 3.** Average AUROC across different shot settings (left: I-AUROC, right: P…

**Figure 4.** Qualitative comparison of predicted anomaly maps under the 1-shot set…

**Figure 5.** t-SNE visualization of dinov3 vitb16 Layer-10 <CLS> embeddings for normal vs. anomalous samples across eight MVTecAD categories

**Figure 6.** Qualitative comparison of predicted anomaly maps under the 1-shot set…

**Figure 7.** Qualitative comparison of predicted anomaly maps under the 1-shot set…
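
The figures report I-AUROC and P-AUROC, that is, AUROC computed over per-image anomaly scores and over per-pixel anomaly scores respectively. A minimal sketch of the metric follows, assuming binary ground-truth labels (1 for anomalous). The helper name `auroc` is hypothetical, and the pairwise formulation is quadratic in memory, so it suits small illustrations rather than full-resolution pixel maps.

```python
import numpy as np


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random anomalous sample scores higher
    than a random normal one, counting ties as one half.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one anomalous and one normal sample")
    diff = pos[:, None] - neg[None, :]         # every (anomalous, normal) pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


# Image-level use: one score and one label per image.
image_auroc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])

# Pixel-level use: flatten the anomaly map and the ground-truth mask first.
pred_map = np.array([[0.1, 0.9], [0.2, 0.7]])
gt_mask = np.array([[0, 1], [0, 1]])
pixel_auroc = auroc(pred_map.ravel(), gt_mask.ravel())
```

Because AUROC depends only on the ranking of scores, it is insensitive to any monotone rescaling of the anomaly map, which is worth keeping in mind when comparing methods whose scores live on different scales.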

Original abstract

Few-shot anomaly detection (FSAD) has made significant strides, yet existing methods still face critical challenges: (i) dependence on task- or dataset-specific training/fine-tuning, (ii) reliance on language supervision or carefully hand-crafted prompts, and (iii) limited robustness across domains. In this paper, we introduce HyperFSAD, a novel FSAD framework that is training-free, language-free, and robust across domains, offering a powerful solution to these challenges. Built upon DINOv3 and a hypergraph-based inference mechanism, our approach performs inference without any task-specific optimization or text prompts, while remaining competitive. Specifically, we replace sensitive nearest-neighbor / top-n matching with Sparse Hyper Matching: sparsemax first selects the most relevant support patches, which are then aggregated into a hyperedge as compact normal evidence to suppress background noise and distractors. We further introduce Dual-Branch Image Scoring, which fuses spatial anomaly evidence from the patch-grid anomaly map with global semantic deviation captured by support-aware CLS matching, yielding a robust image-level anomaly score in a strictly visual manner. Notably, all components of HyperFSAD are purely visual, eliminating the need for labor-intensive hand-crafted text prompts. Under the stringent training-free and language-free setting, HyperFSAD achieves state-of-the-art performance across six datasets spanning four industrial datasets (MVTecAD, VisA, MPDD, BTAD) and two medical datasets (RESC, BraTS).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HyperFSAD offers a clean training-free, language-free FSAD approach via sparsemax hyperedges on DINOv3, but the SOTA claim rests on unshown numbers.

HyperFSAD gives a training-free, language-free way to do few-shot anomaly detection by using hypergraphs on DINOv3 patch features. The key novelties are Sparse Hyper Matching, which applies sparsemax to pick relevant support patches and aggregate them into hyperedges for cleaner normal evidence, and Dual-Branch Image Scoring, which blends local spatial anomaly maps with global CLS-based semantic deviation. This avoids the usual need for fine-tuning or prompt design, which helps with cross-domain use in industry and medicine. The method stays purely visual and uses established tools like sparsemax without adding learnable parameters. It directly tackles the three challenges listed in the abstract.

The soft spots are mainly around evidence: the abstract claims state-of-the-art results on six datasets but shows no actual scores, ablations, or comparison details. That makes it hard to see how big the improvement is, or whether the hypergraph parts are the main driver rather than the backbone. I'd also want confirmation that all baselines were run under the exact same training-free and language-free rules, and a check for sensitivity to the choice of few-shot support samples.

If the full paper has solid tables and controls, this could be a useful reference for anyone building simple FSAD pipelines. It seems like honest work on a practical problem without overclaiming in the description. The approach doesn't appear to have internal contradictions or hidden fitting. I'd recommend sending it to peer review so the experimental claims get checked properly by referees who can look at the numbers and setups.

Referee Report

2 major / 2 minor

Summary. The paper introduces HyperFSAD, a training-free and language-free few-shot anomaly detection method built on DINOv3 patch features. It replaces nearest-neighbor matching with Sparse Hyper Matching, where sparsemax selects relevant support patches that are aggregated into hyperedges serving as compact normal evidence. A Dual-Branch Image Scoring module then fuses a patch-grid anomaly map (spatial evidence) with support-aware CLS token matching (global semantic deviation) to produce the final image-level score. The central claim is that this purely visual pipeline achieves state-of-the-art performance on six datasets (MVTecAD, VisA, MPDD, BTAD, RESC, BraTS) under strict training-free and language-free constraints.

Significance. If the performance claims are substantiated with rigorous quantitative results, ablations, and fair comparisons, the work would be significant for practical FSAD deployment. It removes the need for task-specific fine-tuning or prompt engineering, which is a common bottleneck in industrial and medical imaging applications where data is scarce and domain shifts are frequent. The parameter-free nature of the hyperedge construction and the dual-branch fusion are particularly attractive strengths.

major comments (2)

[Abstract] Abstract: The assertion of 'state-of-the-art performance' is made without any numerical metrics, baseline comparisons, AUC/F1 scores, or error bars. This is load-bearing for the central claim; the abstract supplies no evidence that allows evaluation of the magnitude or statistical significance of the reported gains.
[Methods] Methods (Sparse Hyper Matching): The description states that sparsemax selects support patches which are then aggregated into a hyperedge, yet no equation or algorithmic detail is supplied showing how the hyperedge embedding is computed or how background suppression is guaranteed. Without this, it is impossible to verify that the method is truly parameter-free or that it avoids the very sensitivity issues it claims to solve.

minor comments (2)

[Abstract] The abstract lists four industrial and two medical datasets but does not name the exact splits or shot counts used; these details should appear in the experimental protocol section for reproducibility.
[Methods] Notation for the dual-branch score fusion (spatial map + CLS matching) is introduced only descriptively; an explicit equation would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with point-by-point responses, indicating where revisions have been made to strengthen the paper.

Referee: [Abstract] Abstract: The assertion of 'state-of-the-art performance' is made without any numerical metrics, baseline comparisons, AUC/F1 scores, or error bars. This is load-bearing for the central claim; the abstract supplies no evidence that allows evaluation of the magnitude or statistical significance of the reported gains.

Authors: We agree that the abstract would be strengthened by including concrete quantitative evidence. In the revised manuscript, we have updated the abstract to report key performance metrics, including average AUC-ROC scores across the six datasets, specific gains over leading baselines, and reference to error bars from repeated runs. This provides immediate substantiation for the state-of-the-art claim while preserving the abstract's brevity.

revision: yes

Referee: [Methods] Methods (Sparse Hyper Matching): The description states that sparsemax selects support patches which are then aggregated into a hyperedge, yet no equation or algorithmic detail is supplied showing how the hyperedge embedding is computed or how background suppression is guaranteed. Without this, it is impossible to verify that the method is truly parameter-free or that it avoids the very sensitivity issues it claims to solve.

Authors: We acknowledge that the original Methods description of Sparse Hyper Matching was insufficiently detailed. We have revised the section to include the explicit equations for hyperedge construction: the sparsemax operator produces selection weights over support patches, which are then used to compute the hyperedge embedding as a convex combination of the selected patch features. Background suppression is achieved through the sparsity property of sparsemax, which assigns exactly zero weight to irrelevant patches. These additions confirm the parameter-free character of the approach and provide the algorithmic rigor needed for verification and reproducibility.

revision: yes
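
The construction the rebuttal describes can be written compactly. The following is a sketch under assumed notation rather than the paper's own equations: q denotes a query patch feature, s_1, ..., s_N the support patch features at the same layer, z the vector of similarities, and h the hyperedge embedding. The definition of sparsemax and its thresholded closed form are standard (Martins and Astudillo [22]); the choice of inner-product similarity is an assumption.

```latex
% Assumed notation: q = query patch, s_j = support patches, z_j = similarity.
z_j = \langle \mathbf{q}, \mathbf{s}_j \rangle, \qquad j = 1, \dots, N

% Sparsemax: Euclidean projection of z onto the probability simplex.
\mathbf{w} = \operatorname{sparsemax}(\mathbf{z})
           = \arg\min_{\mathbf{p} \in \Delta^{N-1}} \lVert \mathbf{p} - \mathbf{z} \rVert_2^2,
\qquad
w_j = \max(z_j - \tau(\mathbf{z}),\, 0)

% Hyperedge embedding: convex combination of the selected support patches.
\mathbf{h} = \sum_{j=1}^{N} w_j \, \mathbf{s}_j,
\qquad w_j \ge 0, \quad \sum_{j=1}^{N} w_j = 1
```

Here τ(z) is the threshold that makes the weights sum to one. Every support patch with z_j ≤ τ(z) receives a weight of exactly zero, so it drops out of the sum for h entirely; this is the sense in which sparsity, rather than a learned parameter, performs the selection.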

Circularity Check

0 steps flagged

No significant circularity

The abstract and method outline rely on an external pre-trained DINOv3 model for patch features, standard sparsemax for hyperedge selection, and a dual-branch scoring rule that fuses patch-grid maps with CLS matching. No equations, fitted parameters, or derivations are shown that reduce the claimed SOTA performance to quantities defined by the same data or by self-citation chains. All components are described as training-free and language-free, drawing on independent external models and fixed operations rather than any self-referential construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that DINOv3 features are sufficiently general for anomaly detection and that the hypergraph aggregation step adds robustness without introducing new fitted parameters. No explicit free parameters are named in the abstract.

axioms (2)

domain assumption: DINOv3 produces patch features that separate normal from anomalous regions across domains without fine-tuning.
The entire pipeline is built on frozen DINOv3 features; this is invoked when the method is described as training-free.
domain assumption: Sparsemax selection followed by hyperedge aggregation suppresses background noise better than standard top-n matching.
This is the justification given for replacing nearest-neighbor matching with the proposed Sparse Hyper Matching.

invented entities (1)

hyperedge as compact normal evidence (no independent evidence)
purpose: Aggregate selected support patches into a single representation that suppresses distractors.
New construct introduced to replace conventional patch matching; no independent falsifiable prediction is provided in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "we replace sensitive nearest-neighbor / top-n matching with Sparse Hyper Matching: sparsemax first selects the most relevant support patches, which are then aggregated into a hyperedge as compact normal evidence"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "Dual-Branch Image Scoring, which fuses spatial anomaly evidence from the patch-grid anomaly map with global semantic deviation captured by support-aware CLS matching"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

[1] Bergmann, P., Fauser, M., Sattlegger, D., Steger, C.: Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 9592–9600 (2019)
[2] Cao, Y., Zhang, J., Frittoli, L., Cheng, Y., Shen, W., Boracchi, G.: Adaclip: Adapting clip with hybrid learnable prompts for zero-shot anomaly detection. In: Eur. Conf. Comput. Vis. pp. 55–72 (2024)
[3] Chen, Q., Luo, H., Yao, H., Luo, W., Qu, Z., Lv, C., Zhang, Z.: Center-aware residual anomaly synthesis for multiclass industrial anomaly detection. IEEE Trans. Ind. Informatics 1 (2025)
[4] Chen, X., Han, Y., Zhang, J.: April-gan: A zero-/few-shot anomaly classification and segmentation method for cvpr 2023 vand workshop challenge tracks 1&2: 1st place on zero-shot ad and 4th place on few-shot ad. arXiv preprint arXiv:2305.17382 (2023)
[5] Fang, Z., Wang, X., Li, H., Liu, J., Hu, Q., Xiao, J.: Fastrecon: Few-shot industrial anomaly detection via fast feature reconstruction. In: Int. Conf. Comput. Vis. pp. 17481–17490 (2023)
[6] Feng, Y., Huang, J., Du, S., Ying, S., Yong, J.H., Li, Y., Ding, G., Ji, R., Gao, Y.: Hyper-yolo: When visual object detection meets hypergraph computation. IEEE transactions on pattern analysis and machine intelligence 47(4), 2388–2401 (2024)
[7] Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: AAAI. pp. 3558–3565 (2019)
[8] Fixelle, J.: Hypergraph vision transformers: Images are more than nodes, more than edges. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 9751–9761 (2025)
[9] Gao, Y., Zhang, Z., Lin, H., Xu, X., Ti, J.R., Utschick, W.: Hypergraph learning: Methods and practices. IEEE TPAMI 44(5), 2548–2566 (2020)
[10] Goodarzi, P., Schütze, A., Schneider, T.: Domain shifts in industrial condition monitoring: a comparative analysis of automated machine learning models. Journal of Sensors and Sensor Systems 14(2), 119–132 (2025)
[11] Gu, Z., Zhu, B., Zhu, G., Chen, Y., Tang, M., Wang, J.: Anomalygpt: Detecting industrial anomalies using large vision-language models. In: AAAI. pp. 1932–1940 (2024)
[12] Guo, B., Lu, D., Szumel, G., Gui, R., Wang, T., Konz, N., Mazurowski, M.A.: The impact of scanner domain shift on deep learning performance in medical imaging: an experimental study. arXiv preprint arXiv:2409.04368 (2024)
[13] Han, Y., Wang, P., Kundu, S., Ding, Y., Wang, Z.: Vision hgnn: An image is more than a graph of nodes. In: Int. Conf. Comput. Vis. pp. 19878–19888 (2023)
[14] Hu, J., Chen, Y., Yi, Z.: Automated segmentation of macular edema in oct using deep neural networks. Medical image analysis 55, 216–227 (2019)
[15] Huang, C., Guan, H., Jiang, A., Zhang, Y., Spratling, M., Wang, Y.F.: Registration based few-shot anomaly detection. In: Eur. Conf. Comput. Vis. pp. 303–319 (2022)
[16] Jeong, J., Zou, Y., Kim, T., Zhang, D., Ravichandran, A., Dabeer, O.: Winclip: Zero-/few-shot anomaly classification and segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 19606–19616 (2023)
[17] Jezek, S., Jonak, M., Burget, R., Dvorak, P., Skotak, M.: Deep learning-based defect detection of metal parts: evaluating current methods in complex conditions. In: 2021 13th International congress on ultra modern telecommunications and control systems and workshops (ICUMT). pp. 66–71. IEEE (2021)
[18] Lei, M., Li, S., Wu, Y., Hu, H., Zhou, Y., Zheng, X., Ding, G., Du, S., Wu, Z., Gao, Y.: Yolov13: Real-time object detection with hypergraph-enhanced adaptive visual perception. arXiv preprint arXiv:2506.17733 (2025)
[19] Li, X., Zhang, Z., Tan, X., Chen, C., Qu, Y., Xie, Y., Ma, L.: Promptad: Learning prompts with only normal samples for few-shot anomaly detection. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 16838–16848 (2024)
[20] Ma, H., Yang, G., Zhao, D., Ji, Y., Zuo, W.: Remp-ad: Retrieval-enhanced multi-modal prompt fusion for few-shot industrial visual anomaly detection. In: Int. Conf. Comput. Vis. pp. 20425–20434 (2025)
[21] Mahapatra, D., Bozorgtabar, B., Ge, Z.: Medical image classification using generalized zero shot learning. In: Int. Conf. Comput. Vis. pp. 3344–3353 (2021)
[22] Martins, A., Astudillo, R.: From softmax to sparsemax: A sparse model of attention and multi-label classification. In: Int. Conf. Mach. Learn. pp. 1614–1623 (2016)
[23] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2014)
[24] Mishra, P., Verk, R., Fornasier, D., Piciarelli, C., Foresti, G., et al.: Vt-adl: A vision transformer network for image anomaly detection and localization. In: IEEE International Symposium on Industrial Electronics. pp. 01–06 (2021)
[25] Peters, B., Niculae, V., Martins, A.F.: Sparse sequence-to-sequence models. In: Assoc. Comput. Linguistics. pp. 1504–1519 (2019)
[26] Qu, Z., Tao, X., Prasad, M., Shen, F., Zhang, Z., Gong, X., Ding, G.: Vcp-clip: A visual context prompting model for zero-shot anomaly segmentation. In: Eur. Conf. Comput. Vis. pp. 301–317 (2024)
[27] Qu, Z., Tao, X., Shen, F., Zhang, Z., Li, T.: Investigating shift equivalence of convolutional neural networks in industrial defect segmentation. IEEE Trans. Instrumentation and Measurement 72, 1–17 (2023)
[28] Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)
[29] Tao, F., Xie, G.S., Zhao, F., Shu, X.: Kernel-aware graph prompt learning for few-shot anomaly detection. In: AAAI. pp. 7347–7355 (2025)
[30] Wang, F., Wu, J., Yang, Z., Song, Y.: Industrial vision inspection using digital twins: bridging cad models and realistic scenarios. Journal of Intelligent Manufacturing 36(7), 4963–4975 (2025)
[31] Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: Clustering, classification, and embedding. In: NIPS. pp. 1601–1608 (2006)
[32] Zhou, Q., Pang, G., Tian, Y., He, S., Chen, J.: Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection. In: Int. Conf. Learn. Represent. pp. 49705–49737 (2024)
[33] Zou, Y., Jeong, J., Pemula, L., Zhang, D., Dabeer, O.: Spot-the-difference self-supervised pre-training for anomaly detection and segmentation. In: Eur. Conf. Comput. Vis. pp. 392–408 (2022)
