Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation

Hongkuan Zhou; Lavdim Halilaj; Nadeem Nazer; Steffen Staab; Ylli Sadikaj

arxiv: 2512.09446 · v3 · pith:4DSA6IU4new · submitted 2025-12-10 · 💻 cs.CV

Nadeem Nazer, Hongkuan Zhou, Lavdim Halilaj, Ylli Sadikaj, Steffen Staab

classification 💻 cs.CV

keywords anomaly · under · average · dapo · defect · defect-aware · detection · distribution

Recent vision-language models (VLMs) like CLIP have shown impressive anomaly detection performance under significant distribution shift by utilizing high-level semantic information through text prompts. However, these models often overlook fine-grained defect cues, e.g., hole, cut, or scratch, that are essential for understanding the anomaly's nature. Moreover, the modality gap between images and text can lead to subtle visual evidence being poorly captured in textual descriptions. To address the gap, we enhance the representation of "abnormal" with structured semantics, bridging coarse anomaly signals and fine-grained defect categories. We propose a hybrid prompting mechanism that combines human-readable descriptions of defect types with learnable token embeddings. Building on these ideas, we introduce DAPO, a Defect-aware Prompt Optimization framework for zero-shot multi-type and binary anomaly detection and segmentation under distribution shift. DAPO aligns anomaly-relevant visual features with their corresponding textual semantics by learning hybrid defect-aware prompts that combine fixed textual anchors with trainable token embeddings. We conducted experiments on public benchmarks (MPDD, VisA, MVTec-AD, MAD, and Real-IAD) and an internal dataset. The results suggest that compared to the baseline models, DAPO achieves a 3.6% average improvement in AUROC and average precision metrics at the image level under distribution shift, and a 5.2% average improvement in AUROC and F1 when localizing novel anomaly types under zero-shot settings.
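The hybrid prompt described in the abstract, fixed textual anchors combined with trainable token embeddings, can be illustrated with a small sketch. This is not the authors' implementation: the page gives no architectural details, so everything below is an assumption made for illustration. In particular, `frozen_token_embedding` and `encode_prompt` are hypothetical stand-ins for CLIP's token-embedding table and text encoder, the prompt texts are invented, the learnable context is placed before the anchor tokens by choice, and the temperature of 0.07 is simply a common CLIP-style default.

```python
import hashlib
import math
import random

DIM = 8    # toy embedding width (assumption); real text encoders are far wider
N_CTX = 4  # number of learnable context vectors per prompt (assumption)


def frozen_token_embedding(token: str) -> list[float]:
    """Hypothetical stand-in for a frozen token-embedding lookup.

    The vector is derived deterministically from a hash of the token,
    so the same word always maps to the same fixed vector.
    """
    seed = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def build_hybrid_prompt(learnable_ctx, anchor_text):
    """Trainable context vectors followed by fixed embeddings of the anchor text."""
    anchors = [frozen_token_embedding(t) for t in anchor_text.lower().split()]
    return [list(v) for v in learnable_ctx] + anchors


def encode_prompt(prompt):
    """Toy text encoder: mean-pool the token vectors, then L2-normalize."""
    pooled = [sum(col) / len(prompt) for col in zip(*prompt)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def defect_probabilities(image_feature, prompts, temperature=0.07):
    """Softmax over cosine similarities between one image feature and each prompt."""
    norm = math.sqrt(sum(x * x for x in image_feature)) or 1.0
    img = [x / norm for x in image_feature]
    logits = {
        name: sum(a * b for a, b in zip(img, encode_prompt(p))) / temperature
        for name, p in prompts.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(v - top) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


rng = random.Random(0)

# Human-readable anchors: one "normal" prompt plus one prompt per defect type.
descriptions = {
    "normal": "a flawless object",
    "hole": "an object with a hole",
    "scratch": "an object with a scratch",
}

# Per-class context vectors with small random initial values. In a real
# training loop only these would be updated; the anchors would stay frozen.
learnable = {
    name: [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(N_CTX)]
    for name in descriptions
}

prompts = {
    name: build_hybrid_prompt(learnable[name], text)
    for name, text in descriptions.items()
}

# Random placeholder for an image (or patch) feature from a vision encoder.
image_feature = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
probs = defect_probabilities(image_feature, prompts)
```

Under these assumptions, `probs` holds one probability per prompt. Applying the same scoring to patch-level features instead of a single image-level feature would give a per-defect map for segmentation, though whether DAPO does it this way is not stated on this page.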

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GenAU: Language-Grounded Industrial Anomaly Understanding with Vision-Language Models
cs.CV · 2026-07 · unverdicted · novelty 6.0

GenAU augments a vision-language model with segmentation tokens to unify image-level anomaly detection, pixel-level segmentation, multi-type classification, and language-based defect analysis in a single instruction-f...