
arxiv: 2604.04632 · v1 · submitted 2026-04-06 · 💻 cs.CV

InCTRLv2: Generalist Residual Models for Few-Shot Anomaly Detection and Segmentation

Pith reviewed 2026-05-10 20:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords: few-shot anomaly detection · generalist anomaly detection · anomaly segmentation · residual learning · dual-branch framework · vision-language priors · discriminative anomaly learning · one-class anomaly learning

The pith

A dual-branch residual model detects anomalies across unseen domains using few normal examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes InCTRLv2 as a generalist few-shot framework for anomaly detection and segmentation that extends in-context residual learning with two complementary branches. One branch uses both normal and abnormal examples to learn semantic spaces for discriminating abnormality and normality, while the other uses only normal data to capture generalized normality patterns. Both branches receive guidance from visual-text semantic priors to support anomaly scoring from dual perspectives. This setup targets the limitation of specialist models that need large domain-specific training sets by enabling a single model to handle diverse domains without retraining. The practical appeal is that collecting extensive labeled data for each new domain remains expensive, and a generalist model would avoid that cost.

Core claim

InCTRLv2 extends prior in-context residual learning with a dual-branch framework. Discriminative Anomaly Score Learning, in the main branch, uses normal and abnormal data to build a semantic-guided space that supports classification from both the abnormality and normality perspectives. One-class Anomaly Score Learning, in an auxiliary branch, uses normal data alone and focuses on normality-deviated semantics. Together the branches supply complementary views on anomalies, both guided by vision-language priors, and the paper reports state-of-the-art results on anomaly detection and segmentation across ten datasets in multiple few-shot settings.

What carries the argument

Dual-branch framework that pairs Discriminative Anomaly Score Learning for normal-abnormal discrimination with One-class Anomaly Score Learning for normality-focused detection, both built on in-context residuals and guided by semantic priors.
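The in-context residual that both branches build on can be illustrated with a minimal sketch. It assumes a common formulation that the text above does not confirm: each query patch is compared with the patches of the few-shot normal examples, and its residual is one minus the highest cosine similarity. The function names, the max-pooling into an image-level score, and the toy feature vectors are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def patch_residuals(query_patches, prompt_patches):
    """Residual of each query patch against its most similar normal prompt patch.

    query_patches:  (Nq, D) patch features of the query image.
    prompt_patches: (Np, D) patch features pooled over the few-shot normal images.
    Returns an (Nq,) array; 0 means an identical normal patch exists.
    """
    q = l2_normalize(np.asarray(query_patches, dtype=float))
    p = l2_normalize(np.asarray(prompt_patches, dtype=float))
    similarity = q @ p.T                      # (Nq, Np) cosine similarities
    return 1.0 - similarity.max(axis=1)       # distance to the nearest normal patch

def image_score(residuals):
    """Assumed image-level score: the largest patch residual."""
    return float(np.max(residuals))

# Toy features (hypothetical): three normal prompt patches in a 4-D feature space.
prompts = np.eye(4)[:3]
normal_query = prompts[[0, 2]]                        # patches seen in the prompts
odd_query = np.vstack([prompts[0], np.eye(4)[3]])     # second patch is orthogonal

normal_map = patch_residuals(normal_query, prompts)
odd_map = patch_residuals(odd_query, prompts)
```

The per-patch array plays the role of a segmentation map, and `image_score` reduces it to a detection score. In the framework described above, learned modules and vision-language priors would operate on top of such residuals rather than using them raw.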

Load-bearing premise

Vision-language priors plus the dual modules of discriminative and one-class score learning will let one residual model generalize to new domains from only a few normal examples without retraining.

What would settle it

Apply the model without any retraining to an anomaly detection dataset drawn from a domain absent from the original ten, then compare its detection and segmentation accuracy against models trained specifically on that new domain.
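The comparison proposed above could be scored as in the following sketch, which computes AUROC directly from anomaly scores and reports the gap between a specialist and the generalist on the same held-out test set. The labels and scores are placeholders invented for illustration, not results from the paper, and the choice of AUROC as the metric is an assumption.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random anomaly outscores a random normal sample.

    labels: 1 for anomalous, 0 for normal. Ties count as half a win.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one normal and one anomalous sample")
    diff = pos[:, None] - neg[None, :]        # every anomaly/normal pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Placeholder data for a held-out domain (hypothetical values).
labels = [0, 0, 1, 1]
generalist_scores = [0.1, 0.4, 0.35, 0.8]    # generalist, no retraining
specialist_scores = [0.1, 0.2, 0.7, 0.9]     # model trained on the new domain

generalist_auroc = auroc(labels, generalist_scores)
specialist_auroc = auroc(labels, specialist_scores)
gap = specialist_auroc - generalist_auroc
```

The same function applies to segmentation if pixel labels and pixel scores are flattened into one-dimensional arrays, although the pairwise comparison becomes memory-hungry for large masks. A small gap on a domain outside the original ten would support the generalization claim; a large one would count against it.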

Original abstract

While recent anomaly detection (AD) methods have made substantial progress in recognizing abnormal patterns within specific domains, most of them are specialist models that are trained on large training samples from a specific target dataset, struggling to generalize to unseen datasets. To address this limitation, the paradigm of Generalist Anomaly Detection (GAD) has emerged in recent years, aiming to learn a single generalist model to detect anomalies across diverse domains without retraining. To this end, this work introduces InCTRLv2, a novel few-shot Generalist Anomaly Detection and Segmentation (GADS) framework that significantly extends our previously proposed GAD model, InCTRL. Building on the idea of learning in-context residuals with few-shot normal examples to detect anomalies as in InCTRL, InCTRLv2 introduces two new, complementary perspectives of anomaly perception under a dual-branch framework. This is accomplished by two novel modules upon InCTRL: i) Discriminative Anomaly Score Learning (DASL) with both normal and abnormal data in the main branch, which learns a semantic-guided abnormality and normality space that supports the classification of query samples from both the abnormality and normality perspectives; and ii) One-class Anomaly Score Learning (OASL) using only the normal data, which learns generalized normality patterns in a semantic space via an auxiliary branch, focusing on detecting anomalies through the lens of normality solely. Both branches are guided by rich visual-text semantic priors encoded by large-scale vision-language models. Together, they offer a dual semantic perspective for AD: one emphasizes normal-abnormal discriminations, while the other emphasizes normality-deviated semantics. Extensive experiments on ten AD datasets demonstrate that InCTRLv2 achieves SotA performance in both anomaly detection and segmentation tasks across various settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces InCTRLv2, a few-shot Generalist Anomaly Detection and Segmentation (GADS) framework extending the prior InCTRL model. It proposes a dual-branch architecture with Discriminative Anomaly Score Learning (DASL) that uses both normal and abnormal data in the main branch to learn a semantic-guided abnormality and normality space, and One-class Anomaly Score Learning (OASL) that uses only normal data in an auxiliary branch to learn generalized normality patterns. Both are guided by vision-language model priors and build on in-context residual learning with few-shot normal examples. The central claim is that this single generalist model achieves state-of-the-art performance in anomaly detection and segmentation across ten diverse AD datasets without retraining.

Significance. If the cross-dataset generalization holds under strict source/target partitioning, the work would advance the emerging GAD paradigm by demonstrating that dual semantic perspectives (discriminative and normality-deviated) combined with VLM priors can enable practical few-shot deployment across domains. The empirical scope across ten datasets and both detection/segmentation tasks is a strength; reproducible code or parameter-free derivations are not mentioned.

major comments (2)
  1. [Experimental Setup and §3.2 (DASL module)] The generalization claim in the abstract and §1 rests on DASL learning an abnormality space from source-domain abnormals that transfers to held-out targets using only few-shot normals at inference. The manuscript does not specify the exact partitioning of the ten datasets (which are sources vs. strictly held-out targets) nor provide an ablation isolating domain shift in abnormality semantics when no target abnormals are observed during training.
  2. [Results section, Table 1] Table 1 (or equivalent results table) reports SOTA on ten datasets, but without explicit confirmation that all baselines are evaluated under identical few-shot generalist constraints (no target-domain retraining or abnormal samples), the cross-dataset superiority cannot be verified as load-bearing evidence for the dual-branch advantage over plain in-context residuals.
minor comments (2)
  1. [Abstract] The abstract claims SOTA results but omits any mention of the specific metrics (e.g., AUROC, AUPRO), baselines, or few-shot shot count, which reduces immediate clarity.
  2. [§3] Notation for the dual-branch outputs (e.g., how DASL and OASL scores are fused) is described in prose but would benefit from an explicit equation in §3.
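For concreteness, the kind of explicit statement requested in minor comment 2 might look like the convex combination below. This is only an illustration of the form such an equation could take: the symbols s_DASL, s_OASL, and the weight α are assumed notation, and nothing in the reviewed text confirms that the paper fuses the two branch scores this way.

```latex
s(x) = \alpha \, s_{\mathrm{DASL}}(x) + (1 - \alpha) \, s_{\mathrm{OASL}}(x),
\qquad \alpha \in [0, 1]
```

Here x is a query image, s_DASL(x) and s_OASL(x) are the anomaly scores from the main and auxiliary branches, and α controls how much weight the discriminative view receives relative to the one-class view.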

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and detailed comments on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and strengthen the presentation of our results.

  1. Referee: [Experimental Setup and §3.2 (DASL module)] The generalization claim in the abstract and §1 rests on DASL learning an abnormality space from source-domain abnormals that transfers to held-out targets using only few-shot normals at inference. The manuscript does not specify the exact partitioning of the ten datasets (which are sources vs. strictly held-out targets) nor provide an ablation isolating domain shift in abnormality semantics when no target abnormals are observed during training.

    Authors: We agree that explicit specification of the source/target partitioning is necessary to substantiate the generalization claims. The current manuscript follows the standard cross-dataset GAD protocol in which the model is trained on source datasets and evaluated on strictly held-out target datasets using only few-shot normal samples at inference; however, the exact splits were not tabulated. In the revision we will add a clear description of the partitioning (including which datasets serve as sources versus targets) to Section 4.1. We will also include an ablation study (in the main paper or supplementary material) that isolates the effect of domain shift on abnormality semantics by comparing performance when source abnormals are available versus when they are withheld. These changes will directly address the concern.

    revision: yes

  2. Referee: [Results section, Table 1] Table 1 (or equivalent results table) reports SOTA on ten datasets, but without explicit confirmation that all baselines are evaluated under identical few-shot generalist constraints (no target-domain retraining or abnormal samples), the cross-dataset superiority cannot be verified as load-bearing evidence for the dual-branch advantage over plain in-context residuals.

    Authors: We acknowledge that an explicit statement confirming identical evaluation constraints for all methods is important for verifying the contribution of the dual-branch design. All baselines reported in Table 1 were re-implemented and evaluated under the same few-shot generalist protocol as InCTRLv2 (no target-domain retraining and no access to target abnormal samples). To remove any ambiguity, we will add a dedicated paragraph in Section 4.2 and a clarifying footnote to Table 1 stating that every method adheres to these constraints. This revision will make the comparison transparent and reinforce that the observed gains stem from the proposed DASL and OASL modules.

    revision: yes
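The constraints discussed in both responses (strictly held-out targets, and only normal few-shot samples from the target at inference) can be written down as checks that any method in the comparison must pass. The sketch below is hypothetical: the dataset names are placeholders, and the `Split`, `make_split`, and `check_prompts` helpers are invented for illustration rather than taken from the paper or its code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Split:
    sources: frozenset   # datasets seen during training
    targets: frozenset   # datasets held out for evaluation only

def make_split(all_datasets, sources):
    """Build a source/target partition in which targets are never trained on."""
    all_set, src = frozenset(all_datasets), frozenset(sources)
    if not src:
        raise ValueError("at least one source dataset is required")
    if src - all_set:
        raise ValueError(f"unknown source datasets: {sorted(src - all_set)}")
    targets = all_set - src
    if not targets:
        raise ValueError("no dataset left to hold out as a target")
    return Split(sources=src, targets=targets)

def check_prompts(split, prompts):
    """Validate inference-time few-shot prompts.

    prompts: iterable of (dataset_name, label) pairs, label 0 = normal.
    Raises ValueError if a prompt is abnormal or comes from a non-target dataset.
    """
    for dataset, label in prompts:
        if dataset not in split.targets:
            raise ValueError(f"prompt from non-target dataset: {dataset}")
        if label != 0:
            raise ValueError("abnormal target samples are not allowed as prompts")

# Placeholder dataset names (hypothetical).
split = make_split(["dataset_A", "dataset_B", "dataset_C"], sources=["dataset_A"])
check_prompts(split, [("dataset_B", 0), ("dataset_B", 0)])   # passes silently
```

Running every baseline through the same checks would be one way to back the statement that all methods in the results table share the protocol.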

Circularity Check

0 steps flagged

No circularity in empirical extension


The paper introduces InCTRLv2 as an empirical extension of prior InCTRL work via two new modules (DASL and OASL) under a dual-branch framework guided by VLM priors. All claims rest on experimental results across ten AD datasets rather than any derivation, equations, or first-principles predictions. The reference to InCTRL is a standard citation for the base in-context residual idea and is not load-bearing for any mathematical reduction; the new dual semantic perspectives are independently motivated and evaluated. No self-definitional steps, fitted inputs renamed as predictions, or ansatz smuggling occur.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based solely on the abstract; no mathematical derivations, free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5623 in / 1191 out tokens · 35056 ms · 2026-05-10T20:22:33.472614+00:00 · methodology


