UniVAD v2: Unified Visual Anomaly Detection via Support-Conditioned Boundary Construction

Bingke Zhu; Guibo Zhu; Jinqiao Wang; Ming Tang; Peng Su; Yingying Chen; Zhaopeng Gu; Zhaowen Li

arxiv: 2606.29714 · v1 · pith:HEYY2QS2new · submitted 2026-06-29 · 💻 cs.CV


Zhaopeng Gu, Bingke Zhu, Zhaowen Li, Guibo Zhu, Yingying Chen, Ming Tang, Peng Su, Jinqiao Wang

Pith reviewed 2026-06-30 06:51 UTC · model grok-4.3

classification 💻 cs.CV

keywords unified visual anomaly detection · few-shot transfer · support-conditioned boundary · optimal transport relational modeling · abnormal reference adjustment · cross-domain generalization · episode-specific boundary


The pith

UniVAD v2 builds episode-specific detection boundaries from small normal and abnormal support sets for unified anomaly detection across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets a single detector that works on new categories and domains without retraining, by estimating a reliable decision boundary from a tiny support set of normal examples plus optional abnormal ones. It strengthens the normal side with transport-based matching and reliability-weighted fusion, and uses abnormal references to adjust the rejection side. This matters because it would allow one model to be deployed in industrial, medical, or logical inspection settings where collecting large labeled sets for each new scenario is costly.

Core claim

UniVAD v2 is a two-sided support-conditioned boundary construction framework that extends the component-patch approach of UniVAD with an Optimal Transport-based Relational Modeling module for support-query allocation, an Adaptive Coordination mechanism for Retrieval and Relational Modeling to fuse evidence, and a Few-Shot Abnormal Reference module that converts optional abnormal examples into boundary-adjustment evidence, yielding improved cross-domain performance on six datasets.

What carries the argument

Two-sided support-conditioned boundary construction that fuses normal-side transport-style relational modeling with abnormal reference adjustment.
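The transport-style allocation named above can be illustrated with a minimal sketch. The paper's actual OTRM formulation is not available in this summary, so the block below is only a generic entropic optimal-transport (Sinkhorn) example under stated assumptions: patch features are plain non-zero vectors, the cost is one minus cosine similarity, both marginals are uniform, and the names `cosine_cost`, `sinkhorn_plan`, `patch_scores`, and the smoothing parameter `eps` are hypothetical. It contrasts a retrieval score (distance to the nearest support patch) with a relational score (expected cost under the transport plan).

```python
import math

def cosine_cost(a, b):
    """Cost between two feature vectors: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows toward mass 1/n, then columns toward mass 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def patch_scores(query, support, eps=0.5):
    """Return (retrieval, relational) anomaly scores per query patch."""
    cost = [[cosine_cost(q, s) for s in support] for q in query]
    plan = sinkhorn_plan(cost, eps)
    n = len(query)
    # Retrieval: each query patch independently picks its nearest support patch.
    retrieval = [min(row) for row in cost]
    # Relational: expected cost of the mass a query patch ships under the plan
    # (multiplying by n converts the row mass of about 1/n into a per-patch average).
    relational = [n * sum(p * c for p, c in zip(plan[i], cost[i])) for i in range(n)]
    return retrieval, relational

# Toy episode (illustrative values): the third query patch has no good match.
query = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
support = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2]]
retrieval, relational = patch_scores(query, support)
print("retrieval :", [round(s, 3) for s in retrieval])
print("relational:", [round(s, 3) for s in relational])
```

In this toy setting the third query patch has no close match in the support set, so both scores rank it highest. The relational score also reflects that each support patch can absorb only a limited share of mass, which is one plausible reading of how global allocation could complement nearest-neighbour retrieval.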

If this is right

Under 1N-shot the mean image-level AUC rises from 83.0% to 84.5% and reaches 85.7% with one additional abnormal reference.
On the MVTec-AD Severity Split the method records 96.2% image-level AUC and 96.9% pixel-level AUC.
The same detector generalizes across industrial, logical, and medical anomaly tasks without retraining.
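The image-level and pixel-level AUC figures quoted above are ranking metrics. As a reminder of what is being measured, here is a minimal sketch of ROC AUC as the probability that a randomly chosen abnormal sample receives a higher score than a randomly chosen normal one. The function name `roc_auc` and the toy data are illustrative and not taken from the paper; a real evaluation would typically rely on an established library implementation.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: label 1 = abnormal, 0 = normal.

    Ties between an abnormal and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Image-level AUC uses one score per image; pixel-level AUC applies the same
# function to per-pixel scores and per-pixel ground-truth masks, flattened.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 abnormal/normal pairs are ordered correctly
```

Because AUC depends only on the ordering of scores, a gain of 1.5 points (83.0% to 84.5%) means that slightly more abnormal/normal pairs are ranked correctly, not that a particular threshold has improved.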

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on streaming scenarios where the support set arrives incrementally rather than all at once.
If abnormal references prove consistently helpful, the framework suggests a low-cost path to controllable tolerance tuning in deployed systems.
The reliance on support-set representativeness points to possible gains from active selection of the most informative normal or abnormal examples.

Load-bearing premise

The small support set of normal examples plus optional abnormal references is representative enough of the unseen target category and domain to let the modules form a reliable episode-specific boundary.

What would settle it

A new test set drawn from a category and domain whose support set distribution differs markedly from the query images, where the reported AUC gains disappear or reverse.

Figures

Figures reproduced from arXiv: 2606.29714 by Bingke Zhu, Guibo Zhu, Jinqiao Wang, Ming Tang, Peng Su, Yingying Chen, Zhaopeng Gu, Zhaowen Li.

**Figure 2.** Comparison of boundary-evidence mechanisms in unified anomaly detection. (a) Memory-based retrieval builds memory banks from normal [...]

**Figure 3.** Overview of UniVAD v2 as support-conditioned boundary construction. In stage (1), normal supports, the query image, and optional abnormal [...]

**Figure 4.** Illustration of MVTec-AD-SS. Each column shows one object [...]

**Figure 5.** Qualitative visualization of branch complementarity on diverse anomaly types. Each column shows a test input together with the anomaly [...]

**Figure 7.** Effect of asymmetric support composition. Each subplot com[...]

**Figure 8.** Image-level score separation between normal and abnormal [...]

**Figure 9.** Average branch weights assigned by ACRRM on different [...]

**Figure 10.** Effect of input resolution. Each subplot reports image-level and [...]

Abstract

Unified visual anomaly detection seeks to train a single detector that can be deployed across categories, domains, and application scenarios. In the few-shot transfer regime, the key challenge is to estimate an episode-specific boundary for an unseen target category from a small support set. Existing approaches mainly infer this boundary from normal-side evidence and provide limited abnormal-side evidence for deployment-specific tolerance. Within the normal side, they often struggle to jointly capture local correspondences and global support-query relations, making their boundaries less reliable for unseen anomalies. To address these issues, we propose UniVAD v2, a two-sided support-conditioned boundary construction framework for unified visual anomaly detection. Built on the component-patch divide-and-conquer framework of UniVAD, UniVAD v2 strengthens the normal side with an Optimal Transport-based Relational Modeling module (OTRM), which complements retrieval with support-query matching through transport-style allocation, and an Adaptive Coordination mechanism for Retrieval and Relational Modeling (ACRRM), which estimates episode-conditioned reliabilities to fuse the two sources of evidence. On the abnormal side, a Few-Shot Abnormal Reference module (FAR) converts optional abnormal references into rejection-side evidence for boundary adjustment. Experiments on six datasets spanning industrial, logical, and medical anomaly detection demonstrate strong cross-domain generalization. Under the 1N-shot protocol, UniVAD v2 improves the mean image-level AUC over UniVAD from 83.0% to 84.5%, and further reaches 85.7% in the 1N+1A-shot setting. On the MVTec-AD Severity Split (MVTec-AD-SS), UniVAD v2 achieves 96.2% image-level AUC and 96.9% pixel-level AUC, showing that abnormal references enable controllable boundary customization without retraining.
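The abstract describes two further steps only at a high level: fusing retrieval and relational evidence with episode-conditioned reliabilities, and using optional abnormal references as rejection-side evidence. The sketch below shows one simple way such steps could look. It is an assumption-laden illustration, not the authors' ACRRM or FAR: the softmax weighting, the additive similarity bonus, and the names `fuse_scores`, `adjust_with_abnormal_refs`, `reliability_logits`, and `strength` are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(retrieval, relational, reliability_logits):
    """Convex combination of two per-patch score lists.

    reliability_logits holds two hypothetical per-episode logits, one per
    branch; how a real system would estimate them is not specified here.
    """
    w_ret, w_rel = softmax(reliability_logits)
    return [w_ret * a + w_rel * b for a, b in zip(retrieval, relational)]

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def adjust_with_abnormal_refs(scores, query, abnormal_refs, strength=0.5):
    """Raise scores of query patches that resemble an abnormal reference.

    With no abnormal references the scores are returned unchanged, which
    mirrors the 'optional' role of abnormal evidence in the abstract.
    """
    if not abnormal_refs:
        return list(scores)
    adjusted = []
    for score, patch in zip(scores, query):
        best = max(cosine_similarity(patch, ref) for ref in abnormal_refs)
        adjusted.append(score + strength * max(best, 0.0))
    return adjusted

# Toy usage with illustrative numbers.
fused = fuse_scores([0.0, 1.0], [1.0, 1.0], reliability_logits=[0.0, 0.0])
final = adjust_with_abnormal_refs([0.2, 0.2], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]])
print(fused, final)
```

The design choice worth noting is that the abnormal-side term only moves scores upward for patches similar to a known defect, so the 1N-shot behaviour is recovered exactly when no abnormal reference is supplied.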

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UniVAD v2 layers three modules onto the prior framework for two-sided support boundaries and shows small AUC lifts, but the support-set stability claim lacks direct checks.


UniVAD v2 adds OTRM for optimal-transport relational modeling, ACRRM to adaptively fuse retrieval and relational signals, and FAR to incorporate optional abnormal references into the boundary. These sit on top of the existing component-patch setup and target the few-shot cross-domain case where only one or two normal examples (plus at most one abnormal) are available per episode.

The paper reports concrete gains: mean image-level AUC moves from 83.0% to 84.5% under the 1N-shot protocol and reaches 85.7% once one abnormal reference is added. On the MVTec-AD Severity Split it hits 96.2% image-level and 96.9% pixel-level AUC. Coverage across six datasets that mix industrial, logical, and medical anomalies is a plus, and the two-sided framing (normal plus abnormal evidence) is a logical next step from the first UniVAD paper.

The deltas remain modest, and nothing in the abstract or stress-test description shows variance measured across different random draws of the support set. If the modules are supposed to produce reliable episode-specific boundaries from one or two examples, some quantification of sensitivity to support choice would be needed to separate architecture effects from favorable sampling. The representativeness assumption is load-bearing and currently untested in the reported experiments.

This is incremental work aimed at the anomaly-detection sub-community. Readers already running few-shot transfer experiments on MVTec-style data could extract the three modules and try them, but the paper does not open new theoretical ground. It is solid enough on its own terms to go out for peer review; referees will probably request the missing support-set ablations and full method details.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes UniVAD v2, a two-sided support-conditioned boundary construction framework for unified visual anomaly detection. Building on UniVAD's component-patch divide-and-conquer approach, it introduces an Optimal Transport-based Relational Modeling module (OTRM) to complement retrieval with transport-style support-query allocation, an Adaptive Coordination mechanism for Retrieval and Relational Modeling (ACRRM) to estimate episode-conditioned reliabilities for fusion, and a Few-Shot Abnormal Reference module (FAR) to convert optional abnormal references into rejection-side evidence. Experiments on six datasets report mean image-level AUC gains from 83.0% (UniVAD) to 84.5% under the 1N-shot protocol and 85.7% under 1N+1A-shot, plus 96.2% image-level and 96.9% pixel-level AUC on MVTec-AD-SS.

Significance. If the gains prove robust to support-set variation, the work offers a concrete mechanism for incorporating limited abnormal-side evidence to customize decision boundaries without retraining, while strengthening normal-side modeling of local correspondences and global relations. This addresses a practical gap in few-shot transfer for anomaly detection across industrial, logical, and medical domains.

major comments (1)

[Experiments] The headline performance deltas (83.0% → 84.5% 1N-shot; 85.7% 1N+1A-shot; 96.2/96.9% on MVTec-AD-SS) rest on the claim that 1- or 2-example support sets suffice for OTRM, ACRRM, and FAR to produce stable episode-specific boundaries. No analysis of support-set sampling variance, no ablation on support selection strategy, and no cross-validation over multiple random supports are described in the experimental evaluation; this is load-bearing for attributing the observed improvements to the proposed modules rather than favorable support examples.

minor comments (1)

The abstract states results on 'six datasets' but does not enumerate them or provide per-dataset breakdowns; a summary table would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the importance of demonstrating robustness to support-set variation. This is a substantive point that strengthens the attribution of gains to the proposed modules.


Referee: [Experiments] The headline performance deltas (83.0% → 84.5% 1N-shot; 85.7% 1N+1A-shot; 96.2/96.9% on MVTec-AD-SS) rest on the claim that 1- or 2-example support sets suffice for OTRM, ACRRM, and FAR to produce stable episode-specific boundaries. No analysis of support-set sampling variance, no ablation on support selection strategy, and no cross-validation over multiple random supports are described in the experimental evaluation; this is load-bearing for attributing the observed improvements to the proposed modules rather than favorable support examples.

Authors: We agree that the absence of explicit support-set variance analysis limits the strength of the claims. The reported means follow the standard 1N-shot protocol used in prior work, but no standard deviations across random supports or ablations on selection strategy appear in the current manuscript. In the revision we will add: (i) mean and standard deviation of image-level AUC computed over 5 independent random support draws per category on all six datasets, (ii) an ablation comparing random versus k-means-based support selection, and (iii) a brief cross-validation table showing that the relative ordering of methods remains consistent across draws. These additions will directly address the concern that observed gains may stem from favorable support examples. revision: yes
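The variance analysis promised in point (i) can be sketched as a small evaluation harness. This is an assumed protocol, not the authors' code: `evaluate` is a hypothetical callback that runs a detector with a given support set and returns image-level AUC, and the one-dimensional toy detector below exists only to make the example runnable.

```python
import random
import statistics

def roc_auc(labels, scores):
    """Pairwise ROC AUC (label 1 = abnormal); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_over_support_draws(normal_pool, evaluate, n_draws=5, shots=1, seed=0):
    """Mean and sample standard deviation of AUC over random support draws."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    aucs = [evaluate(rng.sample(normal_pool, shots)) for _ in range(n_draws)]
    return statistics.mean(aucs), statistics.stdev(aucs), aucs

# Toy stand-in for a detector: "images" are scalars, the anomaly score is the
# distance to the mean of the support set. Values are illustrative only.
TEST_X = [0.95, 1.05, 1.5, 0.4, 2.0]
TEST_Y = [0, 0, 0, 1, 1]

def toy_evaluate(support):
    center = sum(support) / len(support)
    return roc_auc(TEST_Y, [abs(x - center) for x in TEST_X])

mean_auc, std_auc, aucs = auc_over_support_draws([0.9, 1.0, 1.1, 1.6], toy_evaluate)
print(f"AUC over {len(aucs)} draws: {mean_auc:.3f} +/- {std_auc:.3f}")
```

Even in this toy case an unrepresentative support example (1.6) lowers AUC relative to a typical one (1.0), which is the kind of sensitivity the referee asks the authors to quantify.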

Circularity Check

0 steps flagged

Minor self-citation to UniVAD base framework; no load-bearing circularity in empirical claims


The paper describes an empirical architecture (OTRM, ACRRM, FAR modules) extending the UniVAD component-patch framework and validates gains via direct experiments on public datasets (MVTec-AD, etc.) under 1N-shot and 1N+1A-shot protocols. No equations, fitted parameters, or derivations are presented that reduce outputs to inputs by construction. The single self-citation to UniVAD supplies the base divide-and-conquer structure but is not invoked as a uniqueness theorem or to justify the reported AUC deltas; those rest on external benchmark results. This yields a normal non-circular finding with only minor self-citation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; all technical details are deferred to the unavailable full text.




[1] [1]

Towards total recall in industrial anomaly detection,

K. Roth, L. Pemula, J. Zepeda, B. Sch ¨olkopf, T. Brox, and P . Gehler, “Towards total recall in industrial anomaly detection,” inPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 14 318–14 328

2022

[2] [2]

Anoma- lygpt: Detecting industrial anomalies using large vision-language models,

Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “Anoma- lygpt: Detecting industrial anomalies using large vision-language models,” inProceedings of the AAAI Conference on Artificial Intelli- gence, vol. 38, no. 3, 2024, pp. 1932–1940

2024

[3] [3]

Anomalymoe: Towards a language-free generalist model for unified visual anomaly detection,

Z. Gu, B. Zhu, G. Zhu, Y. Chen, W. Ge, M. Tang, and J. Wang, “Anomalymoe: Towards a language-free generalist model for unified visual anomaly detection,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 6, 2026, pp. 4348– 4356

2026

[4] [4]

Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization,

Z. Gu, B. Zhu, G. Zhu, Y. Chen, H. Li, M. Tang, and J. Wang, “Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 2041–2049

2024

[5] [5]

Filo++: Zero-/few-shot anomaly detection by fused fine-grained descrip- tions and deformable localization,

Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “Filo++: Zero-/few-shot anomaly detection by fused fine-grained descrip- tions and deformable localization,”IEEE Transactions on Circuits and Systems for Video Technology, 2026

2026

[6] [6]

Pixel-level contrastive pre-trainer for industrial image representation,

B. Zhu, Y. Chen, M. Tang, and J. Wang, “Pixel-level contrastive pre-trainer for industrial image representation,”IEEE Transactions on Instrumentation and Measurement, 2024

2024

[7] [7]

Quality-aware language-conditioned local auto-regressive anomaly synthesis and detection,

L. Qian, B. Zhu, Y. Chen, M. Tang, and J. Wang, “Quality-aware language-conditioned local auto-regressive anomaly synthesis and detection,” inProceedings of the AAAI Conference on Artificial Intelli- gence, vol. 40, no. 18, 2026, pp. 15 626–15 634

2026

[8] [8]

Bmad: Benchmarks for medical anomaly detection,

J. Bao, H. Sun, H. Deng, Y. He, Z. Zhang, and X. Li, “Bmad: Benchmarks for medical anomaly detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4042–4053

2024

[9] [9]

Adapting visual-language models for generalizable anomaly detection in medical images,

C. Huang, A. Jiang, J. Feng, Y. Zhang, X. Wang, and Y. Wang, “Adapting visual-language models for generalizable anomaly detection in medical images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11 375–11 385

2024

[10] [10]

Few-shot domain-adaptive anomaly detection for cross-site brain images,

J. Su, H. Shen, L. Peng, and D. Hu, “Few-shot domain-adaptive anomaly detection for cross-site brain images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1819–1835, 2021

2021

[11] [11]

Anomaly detection in video via self-supervised and multi-task learning,

M.-I. Georgescu, A. Barbalau, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “Anomaly detection in video via self-supervised and multi-task learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12 742–12 752

2021

[12] [12]

Dam safety monitoring data anomaly recognition using multiple-point model with local outlier factor,

Z. Rong, R. Pang, B. Xu, and Y. Zhou, “Dam safety monitoring data anomaly recognition using multiple-point model with local outlier factor,” Automation in Construction, vol. 159, p. 105290, 2024

2024

[13] [13]

Open-vocabulary video anomaly detection,

P. Wu, X. Zhou, G. Pang, Y. Sun, J. Liu, P. Wang, and Y. Zhang, “Open-vocabulary video anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 18 297–18 307

2024

[14] [14]

Padim: a patch distribution modeling framework for anomaly detection and localization,

T. Defard, A. Setkov, A. Loesch, and R. Audigier, “Padim: a patch distribution modeling framework for anomaly detection and localization,” in International Conference on Pattern Recognition. Springer, 2021, pp. 475–489

2021

[15] [15]

Anomaly detection via reverse distillation from one-class embedding,

H. Deng and X. Li, “Anomaly detection via reverse distillation from one-class embedding,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 9737–9746

2022

[16] [16]

Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,

D. Gudovskiy, S. Ishizaka, and K. Kozuka, “Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2022, pp. 98–107

2022

[17] [17]

Univad: A training-free unified model for few-shot visual anomaly detection,

Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “Univad: A training-free unified model for few-shot visual anomaly detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 15 194–15 203

2025

[18] [18]

Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection,

J. Guo, S. Lu, W. Zhang, F. Chen, H. Li, and H. Liao, “Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 20 405–20 415

2025

[19] [19]

Uninet: A contrastive learning-guided unified framework with feature selection for anomaly detection,

S. Wei, J. Jiang, and X. Xu, “Uninet: A contrastive learning-guided unified framework with feature selection for anomaly detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 9994–10 003

2025

[20] [20]

Reason and discovery: A new paradigm for open set recognition,

Y. Fu, Z. Liu, and J. Lyu, “Reason and discovery: A new paradigm for open set recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

2025

[21] [21]

Bayesian embeddings for few-shot open world recognition,

J. Willes, J. Harrison, A. Harakeh, C. Finn, M. Pavone, and S. L. Waslander, “Bayesian embeddings for few-shot open world recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1513–1529, 2022

2022

[22] [22]

Raid: Retrieval-augmented anomaly detection,

M. Cai, Z. Zhang, G. Wu, T. Chai, and X. Zhu, “Raid: Retrieval-augmented anomaly detection,” arXiv preprint arXiv:2602.19611, 2026

2026

[23] [23]

Normal-abnormal guided generalist anomaly detection,

Y. Wang, X. Wang, Y. Gong, and J. Xiao, “Normal-abnormal guided generalist anomaly detection,” arXiv preprint arXiv:2510.00495, 2025

2025

[24] [24]

Mvtec ad – a comprehensive real-world dataset for unsupervised anomaly detection,

P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “Mvtec ad – a comprehensive real-world dataset for unsupervised anomaly detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 9592–9600

2019

[25] [25]

Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,

Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, “Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,” in European Conference on Computer Vision. Springer, 2022, pp. 392–408

2022

[26] [26]

Beyond dents and scratches: Logical constraints in unsupervised anomaly detection and localization,

P. Bergmann, K. Batzner, M. Fauser, D. Sattlegger, and C. Steger, “Beyond dents and scratches: Logical constraints in unsupervised anomaly detection and localization,” International Journal of Computer Vision, vol. 130, no. 4, pp. 947–969, 2022

2022

[27] [27]

Sub-image anomaly detection with deep pyramid correspondences,

N. Cohen and Y. Hoshen, “Sub-image anomaly detection with deep pyramid correspondences,” arXiv preprint arXiv:2005.02357, 2020

2020

[28] [28]

Query reconstruction network for referring expression image segmentation,

H. Shi, H. Li, Q. Wu, and K. N. Ngan, “Query reconstruction network for referring expression image segmentation,” IEEE Transactions on Multimedia, vol. 23, pp. 995–1007, 2020

2020

[29] [29]

Registration based few-shot anomaly detection,

C. Huang, H. Guan, A. Jiang, Y. Zhang, M. Spratling, and Y.-F. Wang, “Registration based few-shot anomaly detection,” in European Conference on Computer Vision. Springer, 2022, pp. 303–319

2022

[30] [30]

Spatial transformer networks,

M. Jaderberg, K. Simonyan, A. Zisserman et al., “Spatial transformer networks,” Advances in neural information processing systems, vol. 28, 2015

2015

[31] [31]

Self-supervised masked convolutional transformer block for anomaly detection,

N. Madan, N.-C. Ristea, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised masked convolutional transformer block for anomaly detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 1, pp. 525–542, 2023

2023

[32] [32]

Adformer: Generalizable few-shot anomaly detection with dual cnn-transformer architecture,

B. Zhu, Z. Gu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “Adformer: Generalizable few-shot anomaly detection with dual cnn-transformer architecture,” IEEE Transactions on Instrumentation and Measurement, 2024

2024

[33] [33]

Deep order-preserving learning with adaptive optimal transport distance,

A. Akbari, M. Awais, S. Fatemifar, and J. Kittler, “Deep order-preserving learning with adaptive optimal transport distance,” IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 1, pp. 313–328, 2022

2022

[34] [34]

Learnable graph matching: A practical paradigm for data association,

J. He, Z. Huang, N. Wang, and Z. Zhang, “Learnable graph matching: A practical paradigm for data association,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 7, pp. 4880–4895, 2024

2024

[35] [35]

Plovad: Prompting vision-language models for open vocabulary video anomaly detection,

C. Xu, K. Xu, X. Jiang, and T. Sun, “Plovad: Prompting vision-language models for open vocabulary video anomaly detection,” IEEE Transactions on Circuits and Systems for Video Technology, 2025

2025

[36] [36]

Transformers can learn temporal difference methods for in-context reinforcement learning,

J. Wang, E. Blaser, H. Daneshmand, and S. Zhang, “Transformers can learn temporal difference methods for in-context reinforcement learning,” arXiv preprint arXiv:2405.13861, 2024

2024

[37] [37]

Lotformer: Doubly-stochastic linear attention via low-rank optimal transport,

A. Shahbazi, C. Thrash, Y. Bai, K. Hamm, N. NaderiAlizadeh, and S. Kolouri, “Lotformer: Doubly-stochastic linear attention via low-rank optimal transport,” arXiv preprint arXiv:2509.23436, 2025

2025

[38] [38]

Revisiting reverse distillation for anomaly detection,

T. D. Tien, A. T. Nguyen, N. H. Tran, T. D. Huy, S. Duong, C. D. T. Nguyen, and S. Q. Truong, “Revisiting reverse distillation for anomaly detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 24 511–24 520

2023

[39] [39]

Prototype-based optimal transport for out-of-distribution detection,

A. Ke, W. Chen, C. Feng, Y. Cao, X. Xie, S. K. Zhou, and L. Feng, “Prototype-based optimal transport for out-of-distribution detection,” arXiv preprint arXiv:2410.07617, 2024

2024

[40] [40]

Robust distribution alignment for industrial anomaly detection under distribution shift,

J. Liao, X. Xu, Y. Su, R.-C. Tu, Y. Liu, D. Tao, and X. Yang, “Robust distribution alignment for industrial anomaly detection under distribution shift,” arXiv preprint arXiv:2503.14910, 2025

2025

[41] [41]

Deep sets,

M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” Advances in neural information processing systems, vol. 30, 2017

2017

[42] [42]

Optimal transport: old and new

C. Villani et al., Optimal transport: old and new. Springer, 2009, vol. 338

2009

[43] [43]

Provable optimal transport with transformers: The essence of depth and prompt engineering,

H. Daneshmand, “Provable optimal transport with transformers: The essence of depth and prompt engineering,” arXiv preprint arXiv:2410.19931, 2024

2024

[44] [44]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017

2017

[45] [45]

Winclip: Zero-/few-shot anomaly classification and segmentation,

J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer, “Winclip: Zero-/few-shot anomaly classification and segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19 606–19 616

2023

[46] [46]

Component-aware anomaly detection framework for adjustable and logical industrial visual inspection,

T. Liu, B. Li, X. Du, B. Jiang, X. Jin, L. Jin, and Z. Zhao, “Component-aware anomaly detection framework for adjustable and logical industrial visual inspection,” Advanced Engineering Informatics, vol. 58, p. 102161, 2023

2023

[47] [47]

A unified model for multi-class anomaly detection,

Z. You, L. Cui, Y. Shen, K. Yang, X. Lu, Y. Zheng, and X. Le, “A unified model for multi-class anomaly detection,” Advances in Neural Information Processing Systems, vol. 35, pp. 4571–4584, 2022

2022

[48] [48]

Medclip: Contrastive learning from unpaired medical images and text,

Z. Wang, Z. Wu, D. Agarwal, and J. Sun, “Medclip: Contrastive learning from unpaired medical images and text,” arXiv preprint arXiv:2210.10163, 2022

2022

[49] [49]

The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

U. Baid, S. Ghodasara, S. Mohan, M. Bilello, E. Calabrese, E. Colak, K. Farahani, J. Kalpathy-Cramer, F. C. Kitamura, S. Pati et al., “The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification,” arXiv preprint arXiv:2107.02314, 2021

2021

[50] [50]

Automated segmentation of macular edema in oct using deep neural networks,

J. Hu, Y. Chen, and Z. Yi, “Automated segmentation of macular edema in oct using deep neural networks,” Medical image analysis, vol. 55, pp. 216–227, 2019

2019

[51] [51]

Dictas: A framework for class-generalizable few-shot anomaly segmentation via dictionary lookup,

Z. Qu, X. Tao, X. Gong, S. Qu, X. Zhang, X. Wang, F. Shen, Z. Zhang, M. Prasad, and G. Ding, “Dictas: A framework for class-generalizable few-shot anomaly segmentation via dictionary lookup,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 20 519–20 528

2025

Zhaopeng Gu received the B.E. degree from Beijing University of Po...