UniVAD v2: Unified Visual Anomaly Detection via Support-Conditioned Boundary Construction
Pith reviewed 2026-06-30 06:51 UTC · model grok-4.3
The pith
UniVAD v2 builds episode-specific detection boundaries from small normal and abnormal support sets for unified anomaly detection across domains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UniVAD v2 is a two-sided support-conditioned boundary construction framework that extends the component-patch approach of UniVAD with an Optimal Transport-based Relational Modeling module for support-query allocation, an Adaptive Coordination mechanism for Retrieval and Relational Modeling to fuse evidence, and a Few-Shot Abnormal Reference module that converts optional abnormal examples into boundary-adjustment evidence, yielding improved cross-domain performance on six datasets.
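OTRM is described here only as support-query matching "through transport-style allocation". As a rough, hypothetical illustration of that idea, the sketch below computes an entropic optimal transport plan (Sinkhorn-Knopp scaling) between query patch features and normal support patch features. It then scores each query patch by its plan-weighted matching cost. The uniform marginals, the squared-Euclidean cost, the regularization `eps`, and the helper names `sinkhorn_plan` and `transport_scores` are all assumptions. This is not the authors' OTRM.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sinkhorn_plan(cost, eps=0.1, iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn-Knopp scaling.

    cost is an n_query x n_support matrix (list of lists). Costs are divided
    by their maximum first so that exp(-cost / eps) does not underflow.
    (Assumed setup for illustration; not the paper's formulation.)
    """
    n, m = len(cost), len(cost[0])
    c_max = max(max(row) for row in cost) or 1.0
    K = [[math.exp(-(c / c_max) / eps) for c in row] for row in cost]  # Gibbs kernel
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform query / support marginals
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_scores(query, support, eps=0.1):
    """Per-query-patch score: plan-weighted mean of the raw matching costs."""
    cost = [[sq_dist(q, s) for s in support] for q in query]
    plan = sinkhorn_plan(cost, eps)
    return [
        sum(p * c for p, c in zip(plan[i], cost[i])) / sum(plan[i])
        for i in range(len(query))
    ]


if __name__ == "__main__":
    support = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0]]  # normal support patches
    query = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]    # last patch is off-manifold
    print(transport_scores(query, support))  # the third score dominates
```

Plain nearest-neighbour retrieval matches each query patch independently. Here, by contrast, the column constraints force every support patch to absorb its share of query mass. That is one way a transport formulation can bring global support-query relations into otherwise local matching.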
What carries the argument
Two-sided support-conditioned boundary construction that fuses normal-side transport-style relational modeling with abnormal reference adjustment.
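The page does not spell out how the abnormal side adjusts the boundary. One simple, hypothetical reading of a two-sided boundary starts from the normal-side distance and adds a bounded bonus for closeness to any abnormal reference. In the sketch below, `two_sided_score`, the nearest-neighbour distances, the `1 / (1 + d)` similarity, and the weight `lam` are illustrative assumptions, not the paper's formulation.

```python
import math


def nearest_distance(x, bank):
    """Euclidean distance from feature x to its nearest neighbour in bank."""
    return min(math.dist(x, b) for b in bank)


def two_sided_score(x, normal_bank, abnormal_refs=None, lam=1.0):
    """Hypothetical two-sided anomaly score (higher = more anomalous).

    Normal side: distance to the nearest normal support feature.
    Abnormal side (optional): a similarity in (0, 1] to the nearest abnormal
    reference, scaled by lam, so queries resembling a known defect are pushed
    toward rejection.
    """
    d_normal = nearest_distance(x, normal_bank)
    if not abnormal_refs:
        return d_normal  # normal-only boundary (no abnormal references given)
    d_abnormal = nearest_distance(x, abnormal_refs)
    return d_normal + lam / (1.0 + d_abnormal)


if __name__ == "__main__":
    normal = [[0.0, 0.0], [0.0, 1.0]]
    abnormal = [[3.0, 0.0]]
    # Both queries are 1.5 away from the normal support...
    print(two_sided_score([1.5, 0.0], normal), two_sided_score([-1.5, 0.0], normal))
    # ...but only the first one lies toward the abnormal reference.
    print(two_sided_score([1.5, 0.0], normal, abnormal),
          two_sided_score([-1.5, 0.0], normal, abnormal))
```

The abnormal term lies in (0, lam], so it shifts the normal-side boundary rather than replacing it, and `lam` acts as a tolerance knob. With no abnormal references, the score reduces to the normal-only case.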
If this is right
- Under the 1N-shot protocol, the mean image-level AUC rises from 83.0% (UniVAD) to 84.5%, and reaches 85.7% with one additional abnormal reference (1N+1A-shot).
- On the MVTec-AD Severity Split (MVTec-AD-SS), the method records 96.2% image-level AUC and 96.9% pixel-level AUC.
- The same detector generalizes across industrial, logical, and medical anomaly tasks without retraining.
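The bullets above report image-level AUC (one score per image) and pixel-level AUC (one score per pixel of the anomaly map). In absolute terms, the 1N-shot gain over UniVAD is 84.5 - 83.0 = 1.5 points, and the abnormal reference adds a further 85.7 - 84.5 = 1.2 points. For readers who want to reproduce such numbers, here is a minimal, dependency-free sketch of both metrics via the Mann-Whitney U statistic. It assumes the common convention of pooling all pixels of all test images into a single pixel-level AUC, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (tied scores share average ranks)."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for this block of ties
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative samples")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def image_auc(image_labels, image_scores):
    """Image-level AUC: one binary label and one anomaly score per test image."""
    return roc_auc(image_labels, image_scores)


def pixel_auc(masks, maps):
    """Pixel-level AUC: pool every pixel of every mask/anomaly map, then one AUC."""
    labels = [p for mask in masks for row in mask for p in row]
    scores = [s for amap in maps for row in amap for s in row]
    return roc_auc(labels, scores)


if __name__ == "__main__":
    print(image_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(pixel_auc([[[0, 1], [0, 0]]], [[[0.1, 0.9], [0.2, 0.3]]]))
```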
Where Pith is reading between the lines
- The approach could be tested on streaming scenarios where the support set arrives incrementally rather than all at once.
- If abnormal references prove consistently helpful, the framework suggests a low-cost path to controllable tolerance tuning in deployed systems.
- The reliance on support-set representativeness points to possible gains from active selection of the most informative normal or abnormal examples.
Load-bearing premise
The small support set of normal examples plus optional abnormal references is representative enough of the unseen target category and domain to let the modules form a reliable episode-specific boundary.
What would settle it
A new test set, drawn from a category and domain where the support-set distribution differs markedly from that of the query images, on which the reported AUC gains disappear or reverse.
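A toy version of this falsification test can be run on synthetic features. In the sketch below, a nearest-neighbour distance detector stands in for UniVAD v2. Normal and anomalous query features are Gaussian with per-dimension means 0 and 2. The support set is drawn either from the normal distribution (`shift=0.0`) or from a shifted one. The dimensions, sample sizes, and names such as `shift_stress_test` are assumptions chosen only to make the effect visible.

```python
import math
import random


def knn_score(x, support):
    """Distance to the nearest support feature (higher = more anomalous)."""
    return min(math.dist(x, s) for s in support)


def pairwise_auc(neg_scores, pos_scores):
    """Probability that a random anomaly outscores a random normal (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def gaussian_batch(rng, n, dim, mean):
    """n synthetic feature vectors with i.i.d. N(mean, 1) coordinates."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


def shift_stress_test(shift, dim=8, n_support=4, n_query=50, seed=0):
    """AUC of a nearest-neighbour detector whose support is drawn with
    per-dimension mean `shift`, while normal queries stay at mean 0."""
    rng = random.Random(seed)
    support = gaussian_batch(rng, n_support, dim, shift)
    normals = gaussian_batch(rng, n_query, dim, 0.0)
    anomalies = gaussian_batch(rng, n_query, dim, 2.0)
    return pairwise_auc(
        [knn_score(x, support) for x in normals],
        [knn_score(x, support) for x in anomalies],
    )


if __name__ == "__main__":
    print("matched support:", shift_stress_test(0.0))
    print("shifted support:", shift_stress_test(2.0))
```

When the support is drawn from the anomalous distribution, the nearest-neighbour scores invert and AUC falls well below 0.5. That is the kind of reversal that would undercut the load-bearing premise.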
Original abstract
Unified visual anomaly detection seeks to train a single detector that can be deployed across categories, domains, and application scenarios. In the few-shot transfer regime, the key challenge is to estimate an episode-specific boundary for an unseen target category from a small support set. Existing approaches mainly infer this boundary from normal-side evidence and provide limited abnormal-side evidence for deployment-specific tolerance. Within the normal side, they often struggle to jointly capture local correspondences and global support-query relations, making their boundaries less reliable for unseen anomalies. To address these issues, we propose UniVAD v2, a two-sided support-conditioned boundary construction framework for unified visual anomaly detection. Built on the component-patch divide-and-conquer framework of UniVAD, UniVAD v2 strengthens the normal side with an Optimal Transport-based Relational Modeling module (OTRM), which complements retrieval with support-query matching through transport-style allocation, and an Adaptive Coordination mechanism for Retrieval and Relational Modeling (ACRRM), which estimates episode-conditioned reliabilities to fuse the two sources of evidence. On the abnormal side, a Few-Shot Abnormal Reference module (FAR) converts optional abnormal references into rejection-side evidence for boundary adjustment. Experiments on six datasets spanning industrial, logical, and medical anomaly detection demonstrate strong cross-domain generalization. Under the 1N-shot protocol, UniVAD v2 improves the mean image-level AUC over UniVAD from 83.0% to 84.5%, and further reaches 85.7% in the 1N+1A-shot setting. On the MVTec-AD Severity Split (MVTec-AD-SS), UniVAD v2 achieves 96.2% image-level AUC and 96.9% pixel-level AUC, showing that abnormal references enable controllable boundary customization without retraining.
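The abstract summarizes ACRRM only as estimating "episode-conditioned reliabilities to fuse the two sources of evidence". Here is a minimal sketch of reliability-weighted fusion, under stated assumptions. Each source (hypothetically named `retrieval` and `relational`) provides anomaly scores for the query patches and for held-out normal support patches. Each source is standardized against its own support scores. Its weight is the inverse coefficient of variation of those support scores, normalized to sum to one. This weighting rule is an illustrative stand-in, not the authors' estimator.

```python
import statistics


def standardize(scores, ref):
    """Express scores in units of the reference (normal-support) distribution."""
    mu = statistics.fmean(ref)
    sd = statistics.pstdev(ref) or 1.0  # guard against a zero-spread reference
    return [(s - mu) / sd for s in scores]


def reliability(ref, eps=1e-8):
    """Hypothetical reliability: inverse coefficient of variation of a source's
    scores on held-out normal support patches (tight, stable = more reliable)."""
    return abs(statistics.fmean(ref)) / (statistics.pstdev(ref) + eps)


def fuse(query_scores, support_scores):
    """Reliability-weighted sum of standardized scores from several sources.

    query_scores[k]   -> source k's anomaly scores for the query patches
    support_scores[k] -> source k's scores on held-out normal support patches
    Returns (fused per-patch scores, normalized per-source weights).
    """
    raw = {k: reliability(v) for k, v in support_scores.items()}
    total = sum(raw.values())
    weights = {k: r / total for k, r in raw.items()}
    n = len(next(iter(query_scores.values())))
    fused = [0.0] * n
    for k, w in weights.items():
        z = standardize(query_scores[k], support_scores[k])
        fused = [f + w * x for f, x in zip(fused, z)]
    return fused, weights


if __name__ == "__main__":
    q = {"retrieval": [1.0, 3.0], "relational": [1.0, 4.0]}
    sup = {"retrieval": [1.0, 1.1, 0.9], "relational": [1.0, 2.0, 0.5]}
    print(fuse(q, sup))
```

Both the standardization and the weights are (up to the small `eps`) invariant to rescaling a source's scores. A source therefore cannot dominate the fusion simply by producing larger numbers.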
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes UniVAD v2, a two-sided support-conditioned boundary construction framework for unified visual anomaly detection. Building on UniVAD's component-patch divide-and-conquer approach, it introduces an Optimal Transport-based Relational Modeling module (OTRM) to complement retrieval with transport-style support-query allocation, an Adaptive Coordination mechanism for Retrieval and Relational Modeling (ACRRM) to estimate episode-conditioned reliabilities for fusion, and a Few-Shot Abnormal Reference module (FAR) to convert optional abnormal references into rejection-side evidence. Experiments on six datasets report mean image-level AUC gains from 83.0% (UniVAD) to 84.5% under the 1N-shot protocol and 85.7% under 1N+1A-shot, plus 96.2% image-level and 96.9% pixel-level AUC on MVTec-AD-SS.
Significance. If the gains prove robust to support-set variation, the work offers a concrete mechanism for incorporating limited abnormal-side evidence to customize decision boundaries without retraining, while strengthening normal-side modeling of local correspondences and global relations. This addresses a practical gap in few-shot transfer for anomaly detection across industrial, logical, and medical domains.
major comments (1)
- [Experiments] The headline performance deltas (83.0% → 84.5% under 1N-shot; 85.7% under 1N+1A-shot; 96.2%/96.9% on MVTec-AD-SS) rest on the claim that support sets of one or two examples suffice for OTRM, ACRRM, and FAR to produce stable episode-specific boundaries. The experimental evaluation describes no analysis of support-set sampling variance, no ablation on support selection strategy, and no cross-validation over multiple random supports; this is load-bearing for attributing the observed improvements to the proposed modules rather than to favorable support examples.
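The analysis the referee asks for can be organized so that every method sees the same supports in each draw. That is what makes per-draw comparisons attributable to the method rather than to the sampled examples. The sketch below assumes a hypothetical per-method `evaluate(support) -> AUC` callable on a fixed test split. `support_draw_report`, `paired_draws`, and `win_rate` are illustrative names.

```python
import random
import statistics


def support_draw_report(pool, shots, evaluate, n_draws=5, seed=0):
    """Mean and sample std of AUC over several random support draws.

    pool     -> candidate normal images for one category (any objects)
    shots    -> support size per episode (1 under the 1N-shot protocol)
    evaluate -> hypothetical callable(support) -> AUC on a fixed test split
    Needs n_draws >= 2, since the sample standard deviation is undefined otherwise.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    aucs = [evaluate(rng.sample(pool, shots)) for _ in range(n_draws)]
    return statistics.fmean(aucs), statistics.stdev(aucs), aucs


def paired_draws(pool, shots, methods, n_draws=5, seed=0):
    """Score every method on the same sequence of random supports."""
    rng = random.Random(seed)
    supports = [rng.sample(pool, shots) for _ in range(n_draws)]
    return {name: [fn(s) for s in supports] for name, fn in methods.items()}


def win_rate(aucs_a, aucs_b):
    """Fraction of paired draws in which method A beats method B."""
    return sum(a > b for a, b in zip(aucs_a, aucs_b)) / len(aucs_a)


if __name__ == "__main__":
    pool = list(range(10))  # stand-ins for image ids
    toy = lambda s: 0.8 + 0.01 * s[0]  # toy evaluator, not a real detector
    print(support_draw_report(pool, 1, toy))
```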
minor comments (1)
- The abstract states results on 'six datasets' but does not enumerate them or provide per-dataset breakdowns; a summary table would improve clarity.
Simulated Author's Rebuttal
We thank the referee for highlighting the importance of demonstrating robustness to support-set variation. This is a substantive point that strengthens the attribution of gains to the proposed modules.
Point-by-point responses
-
Referee: [Experiments] The headline performance deltas (83.0% → 84.5% under 1N-shot; 85.7% under 1N+1A-shot; 96.2%/96.9% on MVTec-AD-SS) rest on the claim that support sets of one or two examples suffice for OTRM, ACRRM, and FAR to produce stable episode-specific boundaries. The experimental evaluation describes no analysis of support-set sampling variance, no ablation on support selection strategy, and no cross-validation over multiple random supports; this is load-bearing for attributing the observed improvements to the proposed modules rather than to favorable support examples.
Authors: We agree that the absence of explicit support-set variance analysis limits the strength of the claims. The reported means follow the standard 1N-shot protocol used in prior work, but no standard deviations across random supports or ablations on selection strategy appear in the current manuscript. In the revision we will add: (i) mean and standard deviation of image-level AUC computed over 5 independent random support draws per category on all six datasets, (ii) an ablation comparing random versus k-means-based support selection, and (iii) a brief cross-validation table showing that the relative ordering of methods remains consistent across draws. These additions will directly address the concern that observed gains may stem from favorable support examples. revision: yes
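The rebuttal does not say how k-means-based selection would work. One plausible sketch, stated as an assumption: cluster global embeddings of the candidate normal images with k equal to the number of shots, then take the real image nearest to each centroid as a support. The embeddings, the plain Lloyd's iterations, and the names `kmeans` and `kmeans_support` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of feature vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_support(pool, shots, seed=0):
    """Pick `shots` support indices: the pool item nearest to each centroid."""
    chosen = []
    for centroid in kmeans(pool, shots, seed=seed):
        order = sorted(range(len(pool)), key=lambda i: math.dist(pool[i], centroid))
        chosen.append(next(i for i in order if i not in chosen))  # no duplicates
    return chosen


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings": two tight groups of normal images.
    pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    print(kmeans_support(pool, 2))  # one index from each group
```

With a single shot, k = 1 and the centroid is the pool mean, so this picks the most "typical" normal image. Random selection, by contrast, can land on an atypical one, which is the variance the promised ablation would expose.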
Circularity Check
Minor self-citation to UniVAD base framework; no load-bearing circularity in empirical claims
Full rationale
The paper describes an empirical architecture (OTRM, ACRRM, FAR modules) extending the UniVAD component-patch framework and validates gains via direct experiments on public datasets (MVTec-AD, etc.) under 1N-shot and 1N+1A-shot protocols. No equations, fitted parameters, or derivations are presented that reduce outputs to inputs by construction. The single self-citation to UniVAD supplies the base divide-and-conquer structure but is not invoked as a uniqueness theorem or to justify the reported AUC deltas; those rest on external benchmark results. This yields a normal non-circular finding with only minor self-citation.
Reference graph
Works this paper leans on
- [1] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler, “Towards total recall in industrial anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14318–14328.
- [2] Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “AnomalyGPT: Detecting industrial anomalies using large vision-language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 3, 2024, pp. 1932–1940.
- [3] Z. Gu, B. Zhu, G. Zhu, Y. Chen, W. Ge, M. Tang, and J. Wang, “AnomalyMoE: Towards a language-free generalist model for unified visual anomaly detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 6, 2026, pp. 4348–4356.
- [4] Z. Gu, B. Zhu, G. Zhu, Y. Chen, H. Li, M. Tang, and J. Wang, “FiLo: Zero-shot anomaly detection by fine-grained description and high-quality localization,” in Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 2041–2049.
- [5] Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “FiLo++: Zero-/few-shot anomaly detection by fused fine-grained descriptions and deformable localization,” IEEE Transactions on Circuits and Systems for Video Technology, 2026.
- [6] B. Zhu, Y. Chen, M. Tang, and J. Wang, “Pixel-level contrastive pre-trainer for industrial image representation,” IEEE Transactions on Instrumentation and Measurement, 2024.
- [7] L. Qian, B. Zhu, Y. Chen, M. Tang, and J. Wang, “Quality-aware language-conditioned local auto-regressive anomaly synthesis and detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 18, 2026, pp. 15626–15634.
- [8] J. Bao, H. Sun, H. Deng, Y. He, Z. Zhang, and X. Li, “BMAD: Benchmarks for medical anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4042–4053.
- [9] C. Huang, A. Jiang, J. Feng, Y. Zhang, X. Wang, and Y. Wang, “Adapting visual-language models for generalizable anomaly detection in medical images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11375–11385.
- [10] J. Su, H. Shen, L. Peng, and D. Hu, “Few-shot domain-adaptive anomaly detection for cross-site brain images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1819–1835, 2021.
- [11] M.-I. Georgescu, A. Barbalau, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “Anomaly detection in video via self-supervised and multi-task learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12742–12752.
- [12] Z. Rong, R. Pang, B. Xu, and Y. Zhou, “Dam safety monitoring data anomaly recognition using multiple-point model with local outlier factor,” Automation in Construction, vol. 159, p. 105290, 2024.
- [13] P. Wu, X. Zhou, G. Pang, Y. Sun, J. Liu, P. Wang, and Y. Zhang, “Open-vocabulary video anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 18297–18307.
- [14] T. Defard, A. Setkov, A. Loesch, and R. Audigier, “PaDiM: A patch distribution modeling framework for anomaly detection and localization,” in International Conference on Pattern Recognition. Springer, 2021, pp. 475–489.
- [15] H. Deng and X. Li, “Anomaly detection via reverse distillation from one-class embedding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 9737–9746.
- [16] D. Gudovskiy, S. Ishizaka, and K. Kozuka, “CFLOW-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 98–107.
- [17] Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “UniVAD: A training-free unified model for few-shot visual anomaly detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 15194–15203.
- [18] J. Guo, S. Lu, W. Zhang, F. Chen, H. Li, and H. Liao, “Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 20405–20415.
- [19] S. Wei, J. Jiang, and X. Xu, “UniNet: A contrastive learning-guided unified framework with feature selection for anomaly detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 9994–10003.
- [20] Y. Fu, Z. Liu, and J. Lyu, “Reason and discovery: A new paradigm for open set recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [21] J. Willes, J. Harrison, A. Harakeh, C. Finn, M. Pavone, and S. L. Waslander, “Bayesian embeddings for few-shot open world recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1513–1529, 2022.
- [22] M. Cai, Z. Zhang, G. Wu, T. Chai, and X. Zhu, “Raid: Retrieval-augmented anomaly detection,” arXiv preprint arXiv:2602.19611, 2026.
- [23] Y. Wang, X. Wang, Y. Gong, and J. Xiao, “Normal-abnormal guided generalist anomaly detection,” arXiv preprint arXiv:2510.00495, 2025.
- [24] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “MVTec AD – A comprehensive real-world dataset for unsupervised anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9592–9600.
- [25] Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, “Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,” in European Conference on Computer Vision. Springer, 2022, pp. 392–408.
- [26] P. Bergmann, K. Batzner, M. Fauser, D. Sattlegger, and C. Steger, “Beyond dents and scratches: Logical constraints in unsupervised anomaly detection and localization,” International Journal of Computer Vision, vol. 130, no. 4, pp. 947–969, 2022.
- [27] N. Cohen and Y. Hoshen, “Sub-image anomaly detection with deep pyramid correspondences,” arXiv preprint arXiv:2005.02357, 2020.
- [28] H. Shi, H. Li, Q. Wu, and K. N. Ngan, “Query reconstruction network for referring expression image segmentation,” IEEE Transactions on Multimedia, vol. 23, pp. 995–1007, 2020.
- [29] C. Huang, H. Guan, A. Jiang, Y. Zhang, M. Spratling, and Y.-F. Wang, “Registration based few-shot anomaly detection,” in European Conference on Computer Vision. Springer, 2022, pp. 303–319.
- [30] M. Jaderberg, K. Simonyan, A. Zisserman et al., “Spatial transformer networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
- [31] N. Madan, N.-C. Ristea, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised masked convolutional transformer block for anomaly detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 1, pp. 525–542, 2023.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 15
- [32] B. Zhu, Z. Gu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “Adformer: Generalizable few-shot anomaly detection with dual CNN-transformer architecture,” IEEE Transactions on Instrumentation and Measurement, 2024.
- [33] A. Akbari, M. Awais, S. Fatemifar, and J. Kittler, “Deep order-preserving learning with adaptive optimal transport distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 313–328, 2022.
- [34] J. He, Z. Huang, N. Wang, and Z. Zhang, “Learnable graph matching: A practical paradigm for data association,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 7, pp. 4880–4895, 2024.
- [35] C. Xu, K. Xu, X. Jiang, and T. Sun, “Plovad: Prompting vision-language models for open vocabulary video anomaly detection,” IEEE Transactions on Circuits and Systems for Video Technology, 2025.
- [36] J. Wang, E. Blaser, H. Daneshmand, and S. Zhang, “Transformers can learn temporal difference methods for in-context reinforcement learning,” arXiv preprint arXiv:2405.13861, 2024.
- [37] A. Shahbazi, C. Thrash, Y. Bai, K. Hamm, N. NaderiAlizadeh, and S. Kolouri, “Lotformer: Doubly-stochastic linear attention via low-rank optimal transport,” arXiv preprint arXiv:2509.23436, 2025.
- [38] T. D. Tien, A. T. Nguyen, N. H. Tran, T. D. Huy, S. Duong, C. D. T. Nguyen, and S. Q. Truong, “Revisiting reverse distillation for anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 24511–24520.
- [39] A. Ke, W. Chen, C. Feng, Y. Cao, X. Xie, S. K. Zhou, and L. Feng, “Prototype-based optimal transport for out-of-distribution detection,” arXiv preprint arXiv:2410.07617, 2024.
- [40] J. Liao, X. Xu, Y. Su, R.-C. Tu, Y. Liu, D. Tao, and X. Yang, “Robust distribution alignment for industrial anomaly detection under distribution shift,” arXiv preprint arXiv:2503.14910, 2025.
- [41] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” Advances in Neural Information Processing Systems, vol. 30, 2017.
- [42] C. Villani et al., Optimal Transport: Old and New. Springer, 2009, vol. 338.
- [43] H. Daneshmand, “Provable optimal transport with transformers: The essence of depth and prompt engineering,” arXiv preprint arXiv:2410.19931, 2024.
- [44] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
- [45] J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer, “WinCLIP: Zero-/few-shot anomaly classification and segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19606–19616.
- [46] T. Liu, B. Li, X. Du, B. Jiang, X. Jin, L. Jin, and Z. Zhao, “Component-aware anomaly detection framework for adjustable and logical industrial visual inspection,” Advanced Engineering Informatics, vol. 58, p. 102161, 2023.
- [47] Z. You, L. Cui, Y. Shen, K. Yang, X. Lu, Y. Zheng, and X. Le, “A unified model for multi-class anomaly detection,” Advances in Neural Information Processing Systems, vol. 35, pp. 4571–4584, 2022.
- [48] Z. Wang, Z. Wu, D. Agarwal, and J. Sun, “MedCLIP: Contrastive learning from unpaired medical images and text,” arXiv preprint arXiv:2210.10163, 2022.
- [49] U. Baid, S. Ghodasara, S. Mohan, M. Bilello, E. Calabrese, E. Colak, K. Farahani, J. Kalpathy-Cramer, F. C. Kitamura, S. Pati et al., “The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification,” arXiv preprint arXiv:2107.02314, 2021.
- [50] J. Hu, Y. Chen, and Z. Yi, “Automated segmentation of macular edema in OCT using deep neural networks,” Medical Image Analysis, vol. 55, pp. 216–227, 2019.
- [51] Z. Qu, X. Tao, X. Gong, S. Qu, X. Zhang, X. Wang, F. Shen, Z. Zhang, M. Prasad, and G. Ding, “DictAS: A framework for class-generalizable few-shot anomaly segmentation via dictionary lookup,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 20519–20528.
Zhaopeng Gu received the B.E. degree from Beijing University of Po...