AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors

arxiv: 2601.20524 · v2 · submitted 2026-01-28 · 💻 cs.CV

Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj

Pith reviewed 2026-05-16 10:43 UTC · model grok-4.3

classification 💻 cs.CV

keywords zero-shot anomaly detection · vision foundation models · synthetic data generation · anomaly localization · parameter-efficient adaptation · low-rank adapters

The pith

AnomalyVFM converts any vision foundation model into a zero-shot anomaly detector using synthetic data and efficient adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that vision foundation models can be adapted for zero-shot anomaly detection by generating diverse synthetic anomalies in three stages and applying low-rank feature adapters with a confidence-weighted loss. This approach addresses the performance gap between vision-language models and pure vision models in detecting anomalies without in-domain training data. A sympathetic reader would care because it enables anomaly detection in new domains without collecting real anomaly examples, which are often rare or expensive to obtain. With the RADIO backbone, it reaches 94.1% average image-level AUROC on nine datasets, improving on prior methods by 3.3 points. The framework is general and works with various pretrained VFMs.

Core claim

AnomalyVFM is a framework that combines a three-stage synthetic dataset generation scheme with parameter-efficient adaptation using low-rank feature adapters and a confidence-weighted pixel loss to transform pretrained vision foundation models into strong zero-shot anomaly detectors, achieving superior performance on multiple datasets.

What carries the argument

The three-stage synthetic anomaly dataset generation combined with low-rank feature adapters and confidence-weighted pixel loss, which adapts the VFM features for anomaly scoring without full retraining.
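To make the adapter idea concrete, here is a minimal sketch of a LoRA-style low-rank update on a single frozen linear layer, in plain Python. It is an illustration under assumptions, not the paper's implementation: the class name `LowRankAdapter`, the `alpha / rank` scaling, the zero initialisation of `B`, and the choice of layer are all hypothetical and follow the generic LoRA recipe.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LowRankAdapter:
    """Hypothetical LoRA-style adapter: y = W x + (alpha / rank) * B (A x)."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Count only the adapter parameters (A and B)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy usage: an 8x8 identity layer adapted with rank 2.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
adapter = LowRankAdapter(W, rank=2)
x = [float(i) for i in range(8)]
y = adapter(x)
```

In this toy case the adapter trains 32 values instead of the 64 in the full weight matrix. The saving grows with layer width, which is the sense in which such adaptation is parameter-efficient.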

If this is right

Any pretrained VFM can be turned into a zero-shot anomaly detector without domain-specific training images.
Performance on zero-shot anomaly detection improves significantly, reaching 94.1% AUROC on average across diverse datasets.
The method closes the gap with VLM-based approaches by using only vision models.
Adaptation is parameter-efficient, making it practical for large models.
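For readers less familiar with the headline metric, the sketch below computes image-level AUROC as the probability that a randomly chosen anomalous image receives a higher score than a randomly chosen normal one. The function names are hypothetical, and averaging with equal weight per dataset is an assumption about how a cross-dataset mean would be formed, not a detail confirmed by this page.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison; labels are 1 (anomalous) or 0 (normal)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auroc(per_dataset):
    """Unweighted mean of per-dataset AUROCs (assumed averaging scheme)."""
    values = [auroc(scores, labels) for scores, labels in per_dataset.values()]
    return sum(values) / len(values)


# Toy scores, not data from the paper.
example = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of images; it is chosen here for readability rather than speed.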

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such synthetic data methods could extend to other vision tasks where real anomalies are scarce.
Future work might explore combining this with multi-modal models for even better localization.
Testing on more industrial or medical domains could reveal if the synthetic generation generalizes further.

Load-bearing premise

The synthetic anomalies generated in three stages have statistical properties close enough to real anomalies in the nine test domains.

What would settle it

Evaluating AnomalyVFM on a new dataset where real anomalies differ markedly in appearance or distribution from the synthetic ones generated, and finding performance drops below prior methods.

Figures

Figures reproduced from arXiv: 2601.20524 by Danijel Skočaj, Matic Fučka, Vitjan Zavrtanik.

**Figure 2.** Examples of generated anomaly-free images

**Figure 3.** Dataset generation pipeline. The image I is generated using a text-conditioned image generation model. Then, the foreground mask Mfg is extracted and an anomalous region R is sampled from it. Then, the anomalous image Ia is generated by inpainting an anomaly inside R. Finally, features f and fa are extracted from I and Ia, respectively, and then compared and thresholded to obtain M. The extracted featu…

**Figure 4.** Architecture of AnomalyVFM. All additions to the base VFM are colored in

**Figure 5.** Qualitative comparison of the anomaly segmentation masks produced by AnomalyVFM and two other best-performing methods.

**Figure 1.** Failure Cases in Image Generation Process

**Figure 2.** Anomalous Area Distribution in the generated synthetic

**Figure 3.** Model performance and rejection rate in relation to filtering threshold

**Figure 4.** Model performance in comparison to the number of

**Figure 5.** Model performance in comparison to the number of images in the training set.

**Figure 6.** Qualitative examples of anomaly segmentation masks produced by AnomalyVFM. In the first row, the image is shown. In the
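The last step of the dataset generation pipeline caption (features of the clean and inpainted images are compared and thresholded to obtain the mask M) can be sketched as follows. The caption does not say which distance or threshold is used, so the cosine distance, the `tau` value, and the function names here are assumptions chosen purely for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def change_mask(f, fa, tau=0.5):
    """Binary mask over a patch grid: 1 where features of I and Ia differ.

    f and fa are H x W grids (lists of rows) of per-patch feature vectors.
    tau is a hypothetical threshold on the cosine distance.
    """
    return [
        [1 if cosine_distance(p, q) > tau else 0 for p, q in zip(row_f, row_fa)]
        for row_f, row_fa in zip(f, fa)
    ]


# Toy 2x2 grid: only the top-right patch changes after inpainting.
f = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]
fa = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]
M = change_mask(f, fa)
```

Deriving the mask from feature differences rather than from the sampled region R means the label follows what the inpainting model actually changed, which is presumably why the comparison step exists.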

Original abstract

Zero-shot anomaly detection aims to detect and localise abnormal regions in the image without access to any in-domain training images. While recent approaches leverage vision-language models (VLMs), such as CLIP, to transfer high-level concept knowledge, methods based on purely vision foundation models (VFMs), like DINOv2, have lagged behind in performance. We argue that this gap stems from two practical issues: (i) limited diversity in existing auxiliary anomaly detection datasets and (ii) overly shallow VFM adaptation strategies. To address both challenges, we propose AnomalyVFM, a general and effective framework that turns any pretrained VFM into a strong zero-shot anomaly detector. Our approach combines a robust three-stage synthetic dataset generation scheme with a parameter-efficient adaptation mechanism, utilising low-rank feature adapters and a confidence-weighted pixel loss. Together, these components enable modern VFMs to substantially outperform current state-of-the-art methods. More specifically, with RADIO as a backbone, AnomalyVFM achieves an average image-level AUROC of 94.1% across 9 diverse datasets, surpassing previous methods by significant 3.3 percentage points. Project Page: https://maticfuc.github.io/anomaly_vfm/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes AnomalyVFM, a framework that adapts pretrained vision foundation models (VFMs) such as RADIO and DINOv2 into zero-shot anomaly detectors. It combines a three-stage synthetic anomaly dataset generation scheme with parameter-efficient low-rank feature adapters and a confidence-weighted pixel loss. The central empirical claim is that this yields an average image-level AUROC of 94.1% across nine diverse datasets with the RADIO backbone, outperforming prior methods by 3.3 percentage points.

Significance. If the performance claims hold under rigorous validation, the work would be significant for zero-shot anomaly detection. It shows that modern VFMs can close the gap with VLM-based approaches through synthetic data augmentation and efficient adaptation rather than direct parameter fitting to target domains. This could enable more practical deployment in domains with scarce real anomalies, such as industrial inspection.

major comments (3)

[§3, Synthetic Dataset Generation] The three-stage procedure is load-bearing for the zero-shot claim, yet the manuscript provides no quantitative validation (e.g., feature distribution distances, texture statistics, or scale histograms) that the generated anomalies match the visual properties of real anomalies across the nine evaluation domains. Without this, the reported gains may reflect memorization of the generation recipe rather than transferable anomaly cues.
[Results section, Table 1] The 94.1% average AUROC and 3.3pp improvement are presented without statistical significance tests, standard deviations across runs, or explicit train/test split details. This leaves moderate uncertainty about whether the central performance claim is robust.
[§4.2, Ablation Studies] The ablations on adapter rank and loss weighting coefficient do not include controls that vary the synthetic generation parameters, making it impossible to isolate whether gains arise from the adaptation mechanism or from the specific synthetic distribution.

minor comments (2)

[Abstract] The abstract states 'surpassing previous methods by significant 3.3 percentage points' without naming the exact baselines or providing the per-dataset breakdown in the summary.
[§4.1] Notation for the confidence-weighted loss could be clarified with an explicit equation reference to avoid ambiguity in the weighting term.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of rigor in validating the synthetic data, statistical robustness, and ablation design. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [§3, Synthetic Dataset Generation] The three-stage procedure is load-bearing for the zero-shot claim, yet the manuscript provides no quantitative validation (e.g., feature distribution distances, texture statistics, or scale histograms) that the generated anomalies match the visual properties of real anomalies across the nine evaluation domains. Without this, the reported gains may reflect memorization of the generation recipe rather than transferable anomaly cues.

Authors: We agree that explicit quantitative validation of the synthetic anomalies would better support the zero-shot transferability claim. Although the consistent gains across nine diverse, unseen datasets provide indirect evidence against pure memorization, we will add analyses in the revised manuscript, including feature distribution distances (e.g., MMD between VFM embeddings of synthetic vs. real anomalies), texture statistics (e.g., LBP histograms), and scale histograms. These will be reported for representative datasets to demonstrate alignment with real anomaly properties. revision: yes
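As a rough illustration of the feature-distribution check mentioned in this response, the sketch below computes a biased estimate of the squared maximum mean discrepancy (MMD) between two sets of embedding vectors using an RBF kernel. The kernel choice, the `gamma` bandwidth, and the function names are assumptions; the page does not specify how the authors would compute it.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y.

    X and Y are non-empty lists of embedding vectors, for example features of
    synthetic and real anomalies. gamma is a hypothetical bandwidth.
    """
    k_xx = sum(rbf_kernel(a, b, gamma) for a in X for b in X) / (len(X) * len(X))
    k_yy = sum(rbf_kernel(a, b, gamma) for a in Y for b in Y) / (len(Y) * len(Y))
    k_xy = sum(rbf_kernel(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return k_xx + k_yy - 2.0 * k_xy


# Toy embeddings, not data from the paper.
synthetic = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
same = mmd_squared(synthetic, synthetic)
far = mmd_squared(synthetic, shifted)
```

A value near zero indicates that the two samples are hard to tell apart under the chosen kernel, while larger values indicate a distribution gap. In practice the result depends strongly on the bandwidth, so it would need to be reported alongside `gamma`.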

Referee: [Results section, Table 1] The 94.1% average AUROC and 3.3pp improvement are presented without statistical significance tests, standard deviations across runs, or explicit train/test split details. This leaves moderate uncertainty about whether the central performance claim is robust.

Authors: We acknowledge that reporting variability and significance strengthens the central claim. In the revision, we will add standard deviations computed over multiple random seeds (minimum 3 runs), paired statistical significance tests (e.g., t-tests) against prior methods, and explicit clarification of the evaluation protocol: zero-shot adaptation uses only the synthetic dataset with no in-domain real images, while test splits follow the standard benchmarks from prior zero-shot anomaly detection literature. revision: yes
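The statistics promised in this response can be sketched with the Python standard library alone. The snippet reports the mean and sample standard deviation over repeated runs and a paired t statistic between two methods on matched scores. The function names are hypothetical, the numbers are placeholders rather than results from the paper, and converting the statistic to a p-value (which needs the t distribution) is left out.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of repeated-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched score lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder numbers for illustration only; they are NOT results from the paper.
method_a = [0.90, 0.92, 0.94]
method_b = [0.89, 0.90, 0.91]
mean_a, std_a = summarize(method_a)
t_stat, dof = paired_t_statistic(method_a, method_b)
```

With only three pairs the test has two degrees of freedom, so even a large t statistic supports only a weak conclusion; this is one reason a minimum of three seeds is a low bar.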

Referee: [§4.2, Ablation Studies] The ablations on adapter rank and loss weighting coefficient do not include controls that vary the synthetic generation parameters, making it impossible to isolate whether gains arise from the adaptation mechanism or from the specific synthetic distribution.

Authors: We will expand §4.2 with new ablation experiments that systematically vary synthetic generation parameters (e.g., anomaly density, scale ranges, and blending factors in the three-stage pipeline) while holding the adapter and loss fixed. This will help disentangle the contributions of the low-rank feature adapters and confidence-weighted pixel loss from the synthetic data characteristics, with results added as an additional table or figure. revision: yes

Circularity Check

0 steps flagged

No circularity: method uses independent synthetic data and external pretrained backbones

The paper's core derivation trains low-rank adapters on procedurally generated synthetic anomalies (three-stage scheme) and applies them zero-shot to nine real evaluation datasets. No equations or steps reduce by construction to fitted parameters from the target distributions, no self-citation chains justify uniqueness or ansatzes, and no predictions are statistically forced by input fitting. The 94.1% AUROC claim is an external evaluation result, not a tautology. This is the normal self-contained case.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that pretrained vision foundation models already encode anomaly-relevant features and that synthetic anomalies generated in three stages are representative enough to transfer to real test distributions.

free parameters (2)

adapter rank
Low-rank adapters require a chosen rank hyperparameter that is tuned on the synthetic data.
loss weighting coefficient
The confidence-weighted pixel loss introduces at least one scalar that balances certain versus uncertain pixels.
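To show how a single scalar could balance certain against uncertain pixels, here is one possible form of a confidence-weighted pixel loss: per-pixel binary cross-entropy weighted by `lam + (1 - lam) * c`, where `c` is a confidence in the synthetic label. This formula, the scalar `lam`, and the function names are assumptions made for illustration; the page does not give the paper's actual loss.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one pixel with predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def confidence_weighted_loss(pred, target, conf, lam=0.1):
    """Weighted mean BCE over flattened pixels.

    conf holds per-pixel confidences in [0, 1]. lam is a hypothetical floor:
    fully uncertain pixels still contribute with weight lam.
    """
    weights = [lam + (1.0 - lam) * c for c in conf]
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("all pixel weights are zero")
    total = sum(w * bce(p, y) for w, p, y in zip(weights, pred, target))
    return total / total_weight


# Toy pixels: the third label disagrees with a confident prediction.
pred = [0.9, 0.1, 0.9]
target = [1, 0, 0]
full = confidence_weighted_loss(pred, target, conf=[1.0, 1.0, 1.0])
down = confidence_weighted_loss(pred, target, conf=[1.0, 1.0, 0.0])
```

When every confidence is 1 the expression reduces to the plain mean cross-entropy. Marking the third pixel as uncertain shrinks its contribution, which is the behaviour one would want if synthetic mask boundaries are noisy.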

axioms (1)

domain assumption: Pretrained vision foundation models contain transferable features useful for anomaly localization without task-specific fine-tuning.
Invoked when the authors state that modern VFMs have lagged behind VLMs due to adaptation strategy rather than inherent feature quality.

Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 5 internal anchors

[1]

Zero-shot versus many-shot: Unsupervised texture anomaly detection

Toshimichi Aota, Lloyd Teh Tzer Tong, and Takayuki Okatani. Zero-shot versus many-shot: Unsupervised texture anomaly detection. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 5564–5572, 2023. 2, 5

work page 2023
[2]

Efficien- tAD: Accurate Visual Anomaly Detection at Millisecond- Level Latencies

Kilian Batzner, Lars Heckler, and Rebecca K ¨onig. Efficien- tAD: Accurate Visual Anomaly Detection at Millisecond- Level Latencies. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 128– 138, 2024. 2, 8

work page 2024
[3]

Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders

Paul Bergmann, Sindy L ¨owe, Michael Fauser, David Sattleg- ger, and Carsten Steger. Improving Unsupervised defect seg- mentation by applying structural similarity to autoencoders. ArXiv, abs/1807.02011, 2018. 2

work page internal anchor Pith review Pith/arXiv arXiv 2018
[4]

MVTec AD–A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection

Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD–A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9592–9600, 2019. 1, 2, 5, 7

work page 2019
[5]

Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs

Jorge Bernal, F Javier S ´anchez, Gloria Fern ´andez- Esparrach, Debora Gil, Cristina Rodr ´ıguez, and Fernando Vilari˜no. Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physi- cians.Computerized medical imaging and graphics, 43:99– 111, 2015. 5

work page 2015
[6]

Language mod- els are realistic tabular data generators.arXiv preprint arXiv:2210.06280, 2022

Vadim Borisov, Kathrin Seßler, Tobias Leemann, Mar- tin Pawelczyk, and Gjergji Kasneci. Language mod- els are realistic tabular data generators.arXiv preprint arXiv:2210.06280, 2022. 2

work page arXiv 2022
[7]

Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129: 103459, 2021

Jakob Bo ˇziˇc, Domen Tabernik, and Danijel Sko ˇcaj. Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129: 103459, 2021. 5

work page 2021
[8]

Segment Any Anomaly without Training via Hybrid Prompt Regularization.arXiv preprint arXiv:2305.10724, 2023

Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Zongwei Du, Liang Gao, and Weiming Shen. Segment Any Anomaly without Training via Hybrid Prompt Regularization.arXiv preprint arXiv:2305.10724, 2023. 2, 3, 6

work page arXiv 2023
[9]

AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection

Yunkang Cao, Jiangning Zhang, Luca Frittoli, Yuqi Cheng, Weiming Shen, and Giacomo Boracchi. AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection. InEuropean Conference on Computer Vision, pages 55–72. Springer, 2024. 1, 2, 3, 4, 5, 6, 7, 8

work page 2024
[10]

Back on track: Bundle adjustment for dynamic scene re- construction

Weirong Chen, Ganlin Zhang, Felix Wimbauer, Rui Wang, Nikita Araslanov, Andrea Vedaldi, and Daniel Cremers. Back on track: Bundle adjustment for dynamic scene re- construction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4951–4960,

work page
[11]

Clip-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection

Xuhai Chen, Jiangning Zhang, Guanzhong Tian, Haoyang He, Wuhao Zhang, Yabiao Wang, Chengjie Wang, and Yong Liu. Clip-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection. InInternational Joint Conference on Artificial Intelligence, pages 17–33. Springer,

work page
[12]

Noel CF Codella, David Gutman, M Emre Celebi, Brian Helba, Michael A Marchetti, Stephen W Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kit- tler, et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomed- ical imaging (isbi), hosted by the international skin imaging collaboration (i...

work page 2017
[13]

Padim: a patch distribution modeling framework for anomaly detection and localization

Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. Padim: a patch distribution modeling framework for anomaly detection and localization. InInter- national conference on pattern recognition, pages 475–489. Springer, 2021. 2

work page 2021
[14]

Outlier detec- tion by ensembling uncertainty with negative objectness

Anja Deli ´c, Matej Grcic, and Sini ˇsa ˇSegvi´c. Outlier detec- tion by ensembling uncertainty with negative objectness. In 35th British Machine Vision Conference 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024. BMV A, 2024. 1

work page 2024
[15]

Anomaly Detection via Re- verse Distillation from One-Class Embedding

Hanqiu Deng and Xingyu Li. Anomaly Detection via Re- verse Distillation from One-Class Embedding. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9737–9746, 2022. 2, 5

work page 2022
[16]

Few- shot defect image generation via defect-aware feature ma- nipulation.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):571–578, 2023

Yuxuan Duan, Yan Hong, Li Niu, and Liqing Zhang. Few- shot defect image generation via defect-aware feature ma- nipulation.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):571–578, 2023. 2

work page 2023
[17]

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M ¨uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. In Forty-first International Conference on Machine Learning,

work page
[18]

TransFusion–a Transparency-based Diffusion Model for Anomaly Detection

Matic Fu ˇcka, Vitjan Zavrtanik, and Danijel Sko ˇcaj. TransFusion–a Transparency-based Diffusion Model for Anomaly Detection. InEuropean conference on computer vision, pages 91–108. Springer, 2025. 1, 2

work page 2025
[19]

SALAD – Semantics-Aware Logical Anomaly Detection

Matic Fu ˇcka, Vitjan Zavrtanik, and Danijel Skoˇcaj. SALAD – Semantics-Aware Logical Anomaly Detection. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. 2

work page 2025
[20]

Multi- task learning for thyroid nodule segmentation with thyroid region prior

Haifan Gong, Guanqi Chen, Ranran Wang, Xiang Xie, Mingzhi Mao, Yizhou Yu, Fei Chen, and Guanbin Li. Multi- task learning for thyroid nodule segmentation with thyroid region prior. In2021 IEEE 18th international symposium on biomedical imaging (ISBI), pages 257–261. IEEE, 2021. 5

work page 2021
[21]

A. Hamada. Br35h: Brain tumor detection.https: //www.kaggle.com/datasets/ahmedhamada0/ braintumor - detection, 2020. Online; accessed

work page 2020
[22]

The 9 endotect 2020 challenge: evaluation and comparison of clas- sification, segmentation and inference time for endoscopy

Steven A Hicks, Debesh Jha, Vajira Thambawita, P ˚al Halvorsen, Hugo L Hammer, and Michael A Riegler. The 9 endotect 2020 challenge: evaluation and comparison of clas- sification, segmentation and inference time for endoscopy. InInternational Conference on Pattern Recognition, pages 263–274. Springer, 2021. 5

work page 2020
[23]

LoRA: Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InIn- ternational Conference on Learning Representations, 2022. 4, 8

work page 2022
[24]

Anomalyd- iffusion: Few-shot anomaly image generation with diffusion model.Proceedings of the AAAI Conference on Artificial Intelligence, 38(8):8526–8534, 2024

Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, and Chengjie Wang. Anomalyd- iffusion: Few-shot anomaly image generation with diffusion model.Proceedings of the AAAI Conference on Artificial Intelligence, 38(8):8526–8534, 2024. 2

work page 2024
[25]

GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perel- man, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Weli- hinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024. 3

work page internal anchor Pith review Pith/arXiv arXiv 2024
[26]

WinCLIP: Zero- /Few-Shot Anomaly Classification and Segmentation

Jongheon Jeong, Yang Zou, Taewan Kim, Dongqing Zhang, Avinash Ravichandran, and Onkar Dabeer. WinCLIP: Zero- /Few-Shot Anomaly Classification and Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 19606–19616, 2023. 3, 6, 7

work page 2023
[27]

Deep learning-based defect detection of metal parts: evaluating current methods in complex condi- tions

Stepan Jezek, Martin Jonak, Radim Burget, Pavel Dvorak, and Milos Skotak. Deep learning-based defect detection of metal parts: evaluating current methods in complex condi- tions. In2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pages 66–71, 2021. 5

work page 2021
[28]

Kvasir-seg: A segmented polyp dataset

Debesh Jha, Pia H Smedsrud, Michael A Riegler, P ˚al Halvorsen, Thomas De Lange, Dag Johansen, and H˚avard D Johansen. Kvasir-seg: A segmented polyp dataset. InIn- ternational conference on multimedia modeling, pages 451–

work page
[29]

Springer, 2019. 1, 5

work page 2019
[30]

Vi- sual prompt tuning

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Vi- sual prompt tuning. InEuropean Conference on Computer Vision, pages 709–727. Springer, 2022. 8

work page 2022
[31]

Brain tumor detec- tion using mri images.Brain, 3(2):146–150, 2015

Pranita Balaji Kanade and PP Gumaste. Brain tumor detec- tion using mri images.Brain, 3(2):146–150, 2015. 5

work page 2015
[32]

Diffusion Models for Open-Vocabulary Segmen- tation

Laurynas Karazija, Iro Laina, Andrea Vedaldi, and Christian Rupprecht. Diffusion Models for Open-Vocabulary Segmen- tation. InEuropean Conference on Computer Vision, pages 299–317. Springer, 2024. 2

work page 2024
[33]

Repurpos- ing diffusion-based image generators for monocular depth estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Met- zger, Rodrigo Caye Daudt, and Konrad Schindler. Repurpos- ing diffusion-based image generators for monocular depth estimation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9492–9502,

work page
[34]

Multi-task learning using uncertainty to weigh losses for scene geome- try and semantics

Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geome- try and semantics. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491,

work page
[35]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3

work page 2023
[36]

Dataset Enhancement with Instance-Level Augmentations

Orest Kupyn and Christian Rupprecht. Dataset Enhancement with Instance-Level Augmentations. InEuropean Confer- ence on Computer Vision, pages 384–402. Springer, 2024. 2

work page 2024
[37]

Flux.https://github.com/ black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/ black-forest-labs/flux, 2024. 2, 3, 5, 7, 8, 1

work page 2024
[38]

Zero-Shot Anomaly Detection via Batch Normalization.Advances in Neural Information Processing Systems, 36, 2024

Aodong Li, Chen Qiu, Marius Kloft, Padhraic Smyth, Maja Rudolph, and Stephan Mandt. Zero-Shot Anomaly Detection via Batch Normalization.Advances in Neural Information Processing Systems, 36, 2024. 3

work page 2024
[39]

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, and Lizhuang Ma. PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 16838–16848, 2024. 7

work page 2024
[40]

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, and Lizhuang Ma. PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16838–16848, 2024. 1

work page 2024
[41]

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning.Advances in Neural Information Processing Systems, 35:109–123, 2022

Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning.Advances in Neural Information Processing Systems, 35:109–123, 2022. 8

work page 2022
[42]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection. InPro- ceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017. 4

work page 2017
[43]

Can OOD Object Detectors Learn from Founda- tion Models? InEuropean Conference on Computer Vision, pages 213–231

Jiahui Liu, Xin Wen, Shizhen Zhao, Yingxian Chen, and Xi- aojuan Qi. Can OOD Object Detectors Learn from Founda- tion Models? InEuropean Conference on Computer Vision, pages 213–231. Springer, 2024. 2

work page 2024
[44]

Grounding DINO: Marrying dino with grounded pre-training for open-set object detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying dino with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer, 2024. 3

work page 2024
[45]

SimpleNet: A Simple Network for Image Anomaly Detec- tion and Localization

Zhikang Liu, Yiming Zhou, Yuansheng Xu, and Zilei Wang. SimpleNet: A Simple Network for Image Anomaly Detec- tion and Localization. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 20402–20411, 2023. 2

work page 2023
[46]

RePaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11461–11471, 2022. 3

work page 2022
[47]

Exploring intrinsic normal prototypes within a single im- age for universal anomaly detection

Wei Luo, Yunkang Cao, Haiming Yao, Xiaotian Zhang, Jianan Lou, Yuqi Cheng, Weiming Shen, and Wenyong Yu. Exploring intrinsic normal prototypes within a single im- age for universal anomaly detection. InProceedings of the 10 Computer Vision and Pattern Recognition Conference, pages 9974–9983, 2025. 7

work page 2025
[48]

Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip

Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, and S Kevin Zhou. Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip. InProceedings of the Computer Vi- sion and Pattern Recognition Conference, pages 4744–4754,

work page
[49]

VT-ADL: A vision trans- former network for image anomaly detection and localiza- tion

Pankaj Mishra, Riccardo Verk, Daniele Fornasier, Claudio Piciarelli, and Gian Luca Foresti. VT-ADL: A vision trans- former network for image anomaly detection and localiza- tion. In30th IEEE/IES International Symposium on Indus- trial Electronics (ISIE), 2021. 5

work page 2021
[50]

Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Je- gou, Julien Mairal, Patr...

work page 2024
[51]

Inpainting transformer for anomaly detection

Jonathan Pirnay and Keng Chai. Inpainting transformer for anomaly detection. InInternational Conference on Image Analysis and Processing, pages 394–406. Springer, 2022. 2

work page 2022
[52]

Supporting high-level to low-level requirements coverage reviewing with large lan- guage models

Anamaria-Roberta Preda, Christoph Mayr-Dorn, Atif Mashkoor, and Alexander Egyed. Supporting high-level to low-level requirements coverage reviewing with large lan- guage models. InProceedings of the 21st International Con- ference on Mining Software Repositories, pages 242–253,

work page
[53]

Highly Accurate Dichotomous Im- age Segmentation

Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, and Luc Van Gool. Highly Accurate Dichotomous Im- age Segmentation. InEuropean Conference on Computer Vision, pages 38–56. Springer, 2022. 3

work page 2022
[54]

Zhen Qu, Xian Tao, Xinyi Gong, ShiChen Qu, Qiyu Chen, Zhengtao Zhang, Xingang Wang, and Guiguang Ding. Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 30398–30408, 2025.
[55]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning Transferable Visual Models From Natural Language Supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
[56]

Mike Ranzinger, Greg Heinrich, Jan Kautz, and Pavlo Molchanov. AM-RADIO: Agglomerative vision foundation model reduce all domains into one. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12490–12500, 2024.
[57]

Blaž Rolih, Matic Fučka, and Danijel Skočaj. SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection. In International Conference on Pattern Recognition, 2024.
[58]

Blaž Rolih, Matic Fučka, and Danijel Skočaj. No Label Left Behind: A Unified Surface Defect Detection model for all Supervision Regimes. Journal of Intelligent Manufacturing,
[59]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[60]

Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, and Peter Gehler. Towards Total Recall in Industrial Anomaly Detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14318–14328, 2022.
[61]

Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad H Rohban, and Hamid R Rabiee. Multiresolution knowledge distillation for anomaly detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14902–14912, 2021.
[62]

Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. DINOv3. arXiv preprint arXiv:2508.10104, 2025.
[63]

Domen Tabernik, Samo Šela, Jure Skvarč, and Danijel Skočaj. Segmentation-Based Deep-Learning Approach for Surface-Defect Detection. Journal of Intelligent Manufacturing, 2019.
[64]

Nima Tajbakhsh, Suryakanth R Gurudu, and Jianming Liang. Automated polyp detection in colonoscopy videos using shape and context information. IEEE transactions on medical imaging, 35(2):630–644, 2015.
[65]

Fenfang Tao, Guo-Sen Xie, Fang Zhao, and Xiangbo Shu. Kernel-aware graph prompt learning for few-shot anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7347–7355, 2025.
[66]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[67]

Tomáš Vojíř and Jiří Matas. Image-consistent detection of road anomalies as unpredictable patches. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5491–5500, 2023.
[68]

Tomáš Vojíř, Jan Šochman, and Jiří Matas. Pixood: Pixel-level out-of-distribution detection. In European Conference on Computer Vision, pages 93–109. Springer, 2024.
[69]

Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025.
[70]

Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jiangning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, and Lizhuang Ma. Real-IAD: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22883–22892,
[71]

Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUST3R: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024.
[72]

Chenxi Whitehouse, Monojit Choudhury, and Alham Fikri Aji. LLM-powered data augmentation for enhanced cross-lingual performance. In The 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
[73]

Matthias Wieler and Tobias Hahn. Weakly supervised learning for industrial optical inspection. In DAGM symposium in, page 11, 2007.
[74]

Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-Image technical report. arXiv preprint arXiv:2508.02324, 2025.
[75]

Yuxin Wu and Kaiming He. Group normalization. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.
[76]

Minghui Yang, Peng Wu, and Hui Feng. Memseg: A semi-supervised method for image surface defect detection using differences and commonalities. Engineering Applications of Artificial Intelligence, 119:105835, 2023.
[77]

Shuai Yang, Zhifei Chen, Pengguang Chen, Xi Fang, Yixun Liang, Shu Liu, and Yingcong Chen. Defect spectrum: A granular look of large-scale defect datasets with rich semantics. In Computer Vision – ECCV 2024, pages 187–203, Cham, 2024. Springer Nature Switzerland.
[78]

Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, and Woomyoung Park. GPT3Mix: Leveraging large-scale language models for text augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2225–2239, Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics.
[80]

Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DRÆM - a Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8330–8339, 2021.


[1] [1]

Zero-shot versus many-shot: Unsupervised texture anomaly detection

Toshimichi Aota, Lloyd Teh Tzer Tong, and Takayuki Okatani. Zero-shot versus many-shot: Unsupervised texture anomaly detection. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 5564–5572, 2023. 2, 5

work page 2023

[2] [2]

Efficien- tAD: Accurate Visual Anomaly Detection at Millisecond- Level Latencies

Kilian Batzner, Lars Heckler, and Rebecca K ¨onig. Efficien- tAD: Accurate Visual Anomaly Detection at Millisecond- Level Latencies. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 128– 138, 2024. 2, 8

work page 2024

[3] [3]

Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders

Paul Bergmann, Sindy L ¨owe, Michael Fauser, David Sattleg- ger, and Carsten Steger. Improving Unsupervised defect seg- mentation by applying structural similarity to autoencoders. ArXiv, abs/1807.02011, 2018. 2

work page internal anchor Pith review Pith/arXiv arXiv 2018

[4] [4]

MVTec AD–A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection

Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD–A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9592–9600, 2019. 1, 2, 5, 7

work page 2019

[5] [5]

Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs

Jorge Bernal, F Javier S ´anchez, Gloria Fern ´andez- Esparrach, Debora Gil, Cristina Rodr ´ıguez, and Fernando Vilari˜no. Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physi- cians.Computerized medical imaging and graphics, 43:99– 111, 2015. 5

work page 2015

[6] [6]

Language mod- els are realistic tabular data generators.arXiv preprint arXiv:2210.06280, 2022

Vadim Borisov, Kathrin Seßler, Tobias Leemann, Mar- tin Pawelczyk, and Gjergji Kasneci. Language mod- els are realistic tabular data generators.arXiv preprint arXiv:2210.06280, 2022. 2

work page arXiv 2022

[7] [7]

Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129: 103459, 2021

Jakob Bo ˇziˇc, Domen Tabernik, and Danijel Sko ˇcaj. Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129: 103459, 2021. 5

work page 2021

[8] [8]

Segment Any Anomaly without Training via Hybrid Prompt Regularization.arXiv preprint arXiv:2305.10724, 2023

Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Zongwei Du, Liang Gao, and Weiming Shen. Segment Any Anomaly without Training via Hybrid Prompt Regularization.arXiv preprint arXiv:2305.10724, 2023. 2, 3, 6

work page arXiv 2023

[9] [9]

AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection

Yunkang Cao, Jiangning Zhang, Luca Frittoli, Yuqi Cheng, Weiming Shen, and Giacomo Boracchi. AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection. InEuropean Conference on Computer Vision, pages 55–72. Springer, 2024. 1, 2, 3, 4, 5, 6, 7, 8

work page 2024

[10] [10]

Back on track: Bundle adjustment for dynamic scene re- construction

Weirong Chen, Ganlin Zhang, Felix Wimbauer, Rui Wang, Nikita Araslanov, Andrea Vedaldi, and Daniel Cremers. Back on track: Bundle adjustment for dynamic scene re- construction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4951–4960,

work page

[11] [11]

Clip-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection

Xuhai Chen, Jiangning Zhang, Guanzhong Tian, Haoyang He, Wuhao Zhang, Yabiao Wang, Chengjie Wang, and Yong Liu. Clip-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection. InInternational Joint Conference on Artificial Intelligence, pages 17–33. Springer,

work page

[12] [12]

Noel CF Codella, David Gutman, M Emre Celebi, Brian Helba, Michael A Marchetti, Stephen W Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kit- tler, et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomed- ical imaging (isbi), hosted by the international skin imaging collaboration (i...

work page 2017

[13] [13]

Padim: a patch distribution modeling framework for anomaly detection and localization

Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. Padim: a patch distribution modeling framework for anomaly detection and localization. InInter- national conference on pattern recognition, pages 475–489. Springer, 2021. 2

work page 2021

[14] [14]

Outlier detec- tion by ensembling uncertainty with negative objectness

Anja Deli ´c, Matej Grcic, and Sini ˇsa ˇSegvi´c. Outlier detec- tion by ensembling uncertainty with negative objectness. In 35th British Machine Vision Conference 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024. BMV A, 2024. 1

work page 2024

[15] [15]

Anomaly Detection via Re- verse Distillation from One-Class Embedding

Hanqiu Deng and Xingyu Li. Anomaly Detection via Re- verse Distillation from One-Class Embedding. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9737–9746, 2022. 2, 5

work page 2022

[16] [16]

Few- shot defect image generation via defect-aware feature ma- nipulation.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):571–578, 2023

Yuxuan Duan, Yan Hong, Li Niu, and Liqing Zhang. Few- shot defect image generation via defect-aware feature ma- nipulation.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):571–578, 2023. 2

work page 2023

[17] [17]

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M ¨uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. In Forty-first International Conference on Machine Learning,

work page

[18] [18]

TransFusion–a Transparency-based Diffusion Model for Anomaly Detection

Matic Fu ˇcka, Vitjan Zavrtanik, and Danijel Sko ˇcaj. TransFusion–a Transparency-based Diffusion Model for Anomaly Detection. InEuropean conference on computer vision, pages 91–108. Springer, 2025. 1, 2

work page 2025

[19] [19]

SALAD – Semantics-Aware Logical Anomaly Detection

Matic Fu ˇcka, Vitjan Zavrtanik, and Danijel Skoˇcaj. SALAD – Semantics-Aware Logical Anomaly Detection. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. 2

work page 2025

[20] [20]

Multi- task learning for thyroid nodule segmentation with thyroid region prior

Haifan Gong, Guanqi Chen, Ranran Wang, Xiang Xie, Mingzhi Mao, Yizhou Yu, Fei Chen, and Guanbin Li. Multi- task learning for thyroid nodule segmentation with thyroid region prior. In2021 IEEE 18th international symposium on biomedical imaging (ISBI), pages 257–261. IEEE, 2021. 5

work page 2021

[21] [21]

A. Hamada. Br35h: Brain tumor detection.https: //www.kaggle.com/datasets/ahmedhamada0/ braintumor - detection, 2020. Online; accessed

work page 2020

[22] [22]

The 9 endotect 2020 challenge: evaluation and comparison of clas- sification, segmentation and inference time for endoscopy

Steven A Hicks, Debesh Jha, Vajira Thambawita, P ˚al Halvorsen, Hugo L Hammer, and Michael A Riegler. The 9 endotect 2020 challenge: evaluation and comparison of clas- sification, segmentation and inference time for endoscopy. InInternational Conference on Pattern Recognition, pages 263–274. Springer, 2021. 5

work page 2020

[23] [23]

LoRA: Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InIn- ternational Conference on Learning Representations, 2022. 4, 8

work page 2022

[24] [24]

Anomalyd- iffusion: Few-shot anomaly image generation with diffusion model.Proceedings of the AAAI Conference on Artificial Intelligence, 38(8):8526–8534, 2024

Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, and Chengjie Wang. Anomalyd- iffusion: Few-shot anomaly image generation with diffusion model.Proceedings of the AAAI Conference on Artificial Intelligence, 38(8):8526–8534, 2024. 2

work page 2024

[25] [25]

GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perel- man, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Weli- hinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024. 3

work page internal anchor Pith review Pith/arXiv arXiv 2024

[26] [26]

WinCLIP: Zero- /Few-Shot Anomaly Classification and Segmentation

Jongheon Jeong, Yang Zou, Taewan Kim, Dongqing Zhang, Avinash Ravichandran, and Onkar Dabeer. WinCLIP: Zero- /Few-Shot Anomaly Classification and Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 19606–19616, 2023. 3, 6, 7

work page 2023

[27] [27]

Deep learning-based defect detection of metal parts: evaluating current methods in complex condi- tions

Stepan Jezek, Martin Jonak, Radim Burget, Pavel Dvorak, and Milos Skotak. Deep learning-based defect detection of metal parts: evaluating current methods in complex condi- tions. In2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pages 66–71, 2021. 5

work page 2021

[28] [28]

Kvasir-seg: A segmented polyp dataset

Debesh Jha, Pia H Smedsrud, Michael A Riegler, P ˚al Halvorsen, Thomas De Lange, Dag Johansen, and H˚avard D Johansen. Kvasir-seg: A segmented polyp dataset. InIn- ternational conference on multimedia modeling, pages 451–

work page

[29] [29]

Springer, 2019. 1, 5

work page 2019

[30] [30]

Vi- sual prompt tuning

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Vi- sual prompt tuning. InEuropean Conference on Computer Vision, pages 709–727. Springer, 2022. 8

work page 2022

[31] [31]

Brain tumor detec- tion using mri images.Brain, 3(2):146–150, 2015

Pranita Balaji Kanade and PP Gumaste. Brain tumor detec- tion using mri images.Brain, 3(2):146–150, 2015. 5

work page 2015

[32] [32]

Diffusion Models for Open-Vocabulary Segmen- tation

Laurynas Karazija, Iro Laina, Andrea Vedaldi, and Christian Rupprecht. Diffusion Models for Open-Vocabulary Segmen- tation. InEuropean Conference on Computer Vision, pages 299–317. Springer, 2024. 2

work page 2024

[33] [33]

Repurpos- ing diffusion-based image generators for monocular depth estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Met- zger, Rodrigo Caye Daudt, and Konrad Schindler. Repurpos- ing diffusion-based image generators for monocular depth estimation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9492–9502,

work page

[34] [34]

Multi-task learning using uncertainty to weigh losses for scene geome- try and semantics

Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geome- try and semantics. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491,

work page

[35] [35]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3

work page 2023

[36] [36]

Dataset Enhancement with Instance-Level Augmentations

Orest Kupyn and Christian Rupprecht. Dataset Enhancement with Instance-Level Augmentations. InEuropean Confer- ence on Computer Vision, pages 384–402. Springer, 2024. 2

work page 2024

[37] [37]

Flux.https://github.com/ black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/ black-forest-labs/flux, 2024. 2, 3, 5, 7, 8, 1

work page 2024

[38] [38]

Zero-Shot Anomaly Detection via Batch Normalization.Advances in Neural Information Processing Systems, 36, 2024

Aodong Li, Chen Qiu, Marius Kloft, Padhraic Smyth, Maja Rudolph, and Stephan Mandt. Zero-Shot Anomaly Detection via Batch Normalization.Advances in Neural Information Processing Systems, 36, 2024. 3

work page 2024

[39] [39]

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, and Lizhuang Ma. PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 16838–16848, 2024. 7

work page 2024

[40] [40]

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, and Lizhuang Ma. PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16838–16848, 2024. 1

work page 2024

[41] [41]

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning.Advances in Neural Information Processing Systems, 35:109–123, 2022

Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning.Advances in Neural Information Processing Systems, 35:109–123, 2022. 8

work page 2022

[42] [42]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection. InPro- ceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017. 4

work page 2017

[43] [43]

Can OOD Object Detectors Learn from Founda- tion Models? InEuropean Conference on Computer Vision, pages 213–231

Jiahui Liu, Xin Wen, Shizhen Zhao, Yingxian Chen, and Xi- aojuan Qi. Can OOD Object Detectors Learn from Founda- tion Models? InEuropean Conference on Computer Vision, pages 213–231. Springer, 2024. 2

work page 2024

[44] [44]

Grounding DINO: Marrying dino with grounded pre-training for open-set object detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying dino with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer, 2024. 3

work page 2024

[45] [45]

SimpleNet: A Simple Network for Image Anomaly Detec- tion and Localization

Zhikang Liu, Yiming Zhou, Yuansheng Xu, and Zilei Wang. SimpleNet: A Simple Network for Image Anomaly Detec- tion and Localization. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 20402–20411, 2023. 2

work page 2023

[46] [46]

RePaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11461–11471, 2022. 3

work page 2022

[47] [47]

Exploring intrinsic normal prototypes within a single im- age for universal anomaly detection

Wei Luo, Yunkang Cao, Haiming Yao, Xiaotian Zhang, Jianan Lou, Yuqi Cheng, Weiming Shen, and Wenyong Yu. Exploring intrinsic normal prototypes within a single im- age for universal anomaly detection. InProceedings of the 10 Computer Vision and Pattern Recognition Conference, pages 9974–9983, 2025. 7

work page 2025

[48] [48]

Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip

Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, and S Kevin Zhou. Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip. InProceedings of the Computer Vi- sion and Pattern Recognition Conference, pages 4744–4754,

work page

[49] [49]

VT-ADL: A vision trans- former network for image anomaly detection and localiza- tion

Pankaj Mishra, Riccardo Verk, Daniele Fornasier, Claudio Piciarelli, and Gian Luca Foresti. VT-ADL: A vision trans- former network for image anomaly detection and localiza- tion. In30th IEEE/IES International Symposium on Indus- trial Electronics (ISIE), 2021. 5

work page 2021

[50] [50]

Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Je- gou, Julien Mairal, Patr...

work page 2024

[51] [51]

Inpainting transformer for anomaly detection

Jonathan Pirnay and Keng Chai. Inpainting transformer for anomaly detection. InInternational Conference on Image Analysis and Processing, pages 394–406. Springer, 2022. 2

work page 2022

[52] [52]

Supporting high-level to low-level requirements coverage reviewing with large lan- guage models

Anamaria-Roberta Preda, Christoph Mayr-Dorn, Atif Mashkoor, and Alexander Egyed. Supporting high-level to low-level requirements coverage reviewing with large lan- guage models. InProceedings of the 21st International Con- ference on Mining Software Repositories, pages 242–253,

work page

[53] [53]

Highly Accurate Dichotomous Im- age Segmentation

Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, and Luc Van Gool. Highly Accurate Dichotomous Im- age Segmentation. InEuropean Conference on Computer Vision, pages 38–56. Springer, 2022. 3

work page 2022

[54] [54]

Bayesian Prompt Flow Learning for Zero-Shot Anomaly De- tection

Zhen Qu, Xian Tao, Xinyi Gong, ShiChen Qu, Qiyu Chen, Zhengtao Zhang, Xingang Wang, and Guiguang Ding. Bayesian Prompt Flow Learning for Zero-Shot Anomaly De- tection. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 30398–30408, 2025. 1, 5, 6, 7, 8

work page 2025

[55] [55]

Learn- ing Transferable Visual Models From Natural Language Su- pervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing Transferable Visual Models From Natural Language Su- pervision. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021. 1, 3

work page 2021

[56] [56]

AM-RADIO: Agglomerative vision founda- tion model reduce all domains into one

Mike Ranzinger, Greg Heinrich, Jan Kautz, and Pavlo Molchanov. AM-RADIO: Agglomerative vision founda- tion model reduce all domains into one. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12490–12500, 2024. 5

work page 2024

[57] [57]

SuperSim- pleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection

Bla ˇz Rolih, Matic Fu ˇcka, and Danijel Sko ˇcaj. SuperSim- pleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection. InInternational Conference on Pattern Recognition, 2024. 2

work page 2024

[58] [58]

No Label Left Behind: A Unified Surface Defect Detection model for all Supervision Regimes.Journal of Intelligent Manufacturing,

Bla ˇz Rolih, Matic Fuˇcka, and Danijel Skoˇcaj. No Label Left Behind: A Unified Surface Defect Detection model for all Supervision Regimes.Journal of Intelligent Manufacturing,

work page

[59] [59]

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 2

work page 2022

[60] [60]

Towards To- tal Recall in Industrial Anomaly Detection

Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Sch¨olkopf, Thomas Brox, and Peter Gehler. Towards To- tal Recall in Industrial Anomaly Detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14318–14328, 2022. 1, 2, 7

work page 2022

[61] [61]

Multiresolution knowledge distillation for anomaly detection

Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad H Rohban, and Hamid R Ra- biee. Multiresolution knowledge distillation for anomaly detection. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 14902–14912, 2021. 1, 5

work page 2021

[62] [62]

DINOv3

Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 5

work page internal anchor Pith review Pith/arXiv arXiv 2025

[63] [63]

Segmentation-Based Deep-Learning Approach for Surface-Defect Detection.Journal of Intelligent Manufac- turing, 2019

Domen Tabernik, Samo ˇSela, Jure Skvar ˇc, and Danijel Skoˇcaj. Segmentation-Based Deep-Learning Approach for Surface-Defect Detection.Journal of Intelligent Manufac- turing, 2019. 5

work page 2019

[64] [64]

Automated polyp detection in colonoscopy videos using shape and context information.IEEE transactions on medical imaging, 35(2):630–644, 2015

Nima Tajbakhsh, Suryakanth R Gurudu, and Jianming Liang. Automated polyp detection in colonoscopy videos using shape and context information.IEEE transactions on medical imaging, 35(2):630–644, 2015. 5

work page 2015

[65] [65]

Kernel-aware graph prompt learning for few-shot anomaly detection

Fenfang Tao, Guo-Sen Xie, Fang Zhao, and Xiangbo Shu. Kernel-aware graph prompt learning for few-shot anomaly detection. InProceedings of the AAAI Conference on Artifi- cial Intelligence, pages 7347–7355, 2025. 1

work page 2025

[66] [66]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017. 4

work page 2017

[67] [67]

Image-consistent detection of road anomalies as unpredictable patches

Tom ´aˇs V oj´ıˇr and Ji ˇr´ı Matas. Image-consistent detection of road anomalies as unpredictable patches. InProceedings of the IEEE/CVF Winter Conference on Applications of Com- puter Vision, pages 5491–5500, 2023. 1

work page 2023

[68] [68]

Pixood: Pixel- level out-of-distribution detection

Tom ´aˇs V oj´ıˇr, Jan ˇSochman, and Ji ˇr´ı Matas. Pixood: Pixel- level out-of-distribution detection. InEuropean Conference on Computer Vision, pages 93–109. Springer, 2024. 1

work page 2024

[69] [69]

Wan: Open and Advanced Large-Scale Video Generative Models

Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video gen- erative models.arXiv preprint arXiv:2503.20314, 2025. 7, 8, 1

work page internal anchor Pith review Pith/arXiv arXiv 2025

[70] Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jiangning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, and Lizhuang Ma. Real-IAD: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22883–22892,

[71] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUST3R: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024.

[72] Chenxi Whitehouse, Monojit Choudhury, and Alham Fikri Aji. LLM-powered data augmentation for enhanced cross-lingual performance. In The 2023 Conference on Empirical Methods in Natural Language Processing, 2023.

[73] Matthias Wieler and Tobias Hahn. Weakly supervised learning for industrial optical inspection. In DAGM symposium in, page 11, 2007.

[74] Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-image technical report. arXiv preprint arXiv:2508.02324, 2025.

[75] Yuxin Wu and Kaiming He. Group normalization. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.

[76] Minghui Yang, Peng Wu, and Hui Feng. Memseg: A semi-supervised method for image surface defect detection using differences and commonalities. Engineering Applications of Artificial Intelligence, 119:105835, 2023.

[77] Shuai Yang, Zhifei Chen, Pengguang Chen, Xi Fang, Yixun Liang, Shu Liu, and Yingcong Chen. Defect spectrum: A granular look of large-scale defect datasets with rich semantics. In Computer Vision – ECCV 2024, pages 187–203, Cham, 2024. Springer Nature Switzerland.

[78] Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, and Woomyoung Park. GPT3Mix: Leveraging large-scale language models for text augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2225–2239, Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics.

[80] Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DRÆM - a Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8330–8339, 2021.