Benchmark AUC Is Not Deployable Reliability: A Cross-Dataset Audit of Off-the-Shelf Features for Surveillance Video Anomaly Detection

Mohammadreza Rashidi

arxiv: 2606.29506 · v1 · pith:6I4OJJFAnew · submitted 2026-06-28 · 💻 cs.CV · cs.CR

Pith reviewed 2026-06-30 07:26 UTC · model grok-4.3

keywords: video anomaly detection · cross-dataset evaluation · surveillance video · off-the-shelf embeddings · AUC · normality model · false alarm rate · deployment reliability

The pith

A detector trained on one surveillance scene performs at chance on another.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper audits off-the-shelf feature embeddings for video anomaly detection by training normality models on normal frames from one dataset and testing on frames from the same or different datasets. Same-dataset performance reaches an average AUC of 0.704 across four benchmarks, but cross-dataset performance falls to 0.499. This drop occurs for multiple backbones including DINOv2 and CLIP and persists when nearest-neighbour scoring is replaced by Mahalanobis distance. The result implies that commonly reported benchmark numbers describe performance only within a single calibrated camera and scene rather than across varying real deployments.

Core claim

We build an unsupervised normality model from the all-normal training frames of one dataset using frozen off-the-shelf embeddings and a nearest-neighbour distance, then score the test frames of the same and of other datasets. Across four real datasets and four backbones, same-dataset AUC averages 0.704 but cross-dataset AUC averages 0.499. The collapse is reproduced with a PaDiM-style Mahalanobis detector, and the strongest backbone exhibits the largest drop.

What carries the argument

Cross-dataset protocol that trains a nearest-neighbour normality model on one dataset's normal frames and evaluates it on test frames from other datasets.
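
The protocol can be sketched roughly as follows. This is a minimal illustration and not the author's released code: it assumes frame embeddings have already been extracted into arrays, and it uses synthetic Gaussian vectors in place of real backbone features. The names `nn_scores`, `roc_auc`, `toy_scene`, `scene_a` and `scene_b` are hypothetical, and the toy data only demonstrates the mechanics, not the paper's numbers.

```python
import numpy as np

def nn_scores(train_feats, test_feats):
    """Anomaly score: Euclidean distance to the nearest normal training embedding."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        (test_feats ** 2).sum(axis=1, keepdims=True)
        - 2.0 * test_feats @ train_feats.T
        + (train_feats ** 2).sum(axis=1)
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(d2, 0.0).min(axis=1))

def roc_auc(scores, labels):
    """Frame-level ROC-AUC computed from pairwise comparisons (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def toy_scene(rng, centre, anomaly_axis, dim=8):
    """Synthetic stand-in for one dataset: normal-only train split plus labelled test split."""
    train = rng.normal(size=(200, dim)) + centre
    normal = rng.normal(size=(100, dim)) + centre
    anomalous = rng.normal(size=(100, dim)) + centre
    anomalous[:, anomaly_axis] += 8.0  # anomalies are shifted along one axis
    test = np.vstack([normal, anomalous])
    labels = np.r_[np.zeros(100, dtype=int), np.ones(100, dtype=int)]
    return {"train": train, "test": test, "labels": labels}

rng = np.random.default_rng(0)
scenes = {
    "scene_a": toy_scene(rng, np.zeros(8), anomaly_axis=0),
    "scene_b": toy_scene(rng, np.full(8, 5.0), anomaly_axis=1),
}

# Fit on the normal frames of `src`, score the test frames of `dst`.
auc = {
    (src, dst): roc_auc(
        nn_scores(scenes[src]["train"], scenes[dst]["test"]),
        scenes[dst]["labels"],
    )
    for src in scenes
    for dst in scenes
}

for (src, dst), value in auc.items():
    kind = "same" if src == dst else "cross"
    print(f"{src} -> {dst} ({kind}): AUC = {value:.3f}")
```

Entries with `src == dst` correspond to the same-dataset setting, and the remaining entries to the cross-dataset setting.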

If this is right

A detector calibrated on one scene is no better than a coin flip on another scene.
Stronger backbones such as DINOv2 produce the largest cross-dataset drops.
The gap remains essentially unchanged when nearest-neighbour scoring is replaced by Mahalanobis distance.
Even at a favourable operating point the false-alarm rate reaches tens of thousands per hour.
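
For readers unfamiliar with the alternative scoring rule, a frame-level Mahalanobis detector can be sketched as below. This is an assumption-laden illustration: the source only says the detector is "PaDiM-style", so the single Gaussian per dataset, the regulariser `eps`, and the function names `fit_gaussian` and `mahalanobis_scores` are all hypothetical choices, and the features are synthetic.

```python
import numpy as np

def fit_gaussian(train_feats, eps=0.01):
    """Fit a single Gaussian to normal embeddings; eps * I keeps the covariance invertible."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + eps * np.eye(train_feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_scores(test_feats, mu, cov_inv):
    """Anomaly score: Mahalanobis distance of each embedding from the normal Gaussian."""
    diff = test_feats - mu
    # Row-wise quadratic form diff_i^T cov_inv diff_i.
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
normal_train = rng.normal(size=(500, 4))  # synthetic stand-in for normal-frame embeddings
mu, cov_inv = fit_gaussian(normal_train)

probes = np.array([
    [1.0, 0.0, 0.0, 0.0],  # close to the normal cloud
    [5.0, 0.0, 0.0, 0.0],  # far from the normal cloud
])
near_score, far_score = mahalanobis_scores(probes, mu, cov_inv)
print(f"near: {near_score:.2f}, far: {far_score:.2f}")
```

Swapping this scoring function in for a nearest-neighbour distance leaves the rest of a cross-dataset evaluation loop unchanged, which is what makes the comparison between the two detector families straightforward.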

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical surveillance systems may require per-camera or per-scene calibration rather than reliance on a single pre-trained model.
Benchmark suites for anomaly detection would benefit from mandatory cross-scene test splits to better reflect deployment conditions.
The observed generalization failure may stem from the static nature of frame-level embeddings without temporal or domain-adaptation components.
Similar cross-dataset audits could be applied to other unsupervised detection tasks that currently report only in-distribution metrics.

Load-bearing premise

That performance measured by training a normality model on one dataset's normal frames and testing on another dataset's frames is a valid proxy for real-world deployment across different cameras and scenes.

What would settle it

A replication using the same four datasets, same backbones, and same scoring rules that obtains average cross-dataset AUC materially above 0.5 would falsify the reported collapse to chance.
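
The aggregation such a replication would check can be sketched as follows. It assumes that the same-dataset figure is the mean of the diagonal of the train/test AUC matrix and the cross-dataset figure is the mean of the off-diagonal entries; the source does not spell out the averaging, and the paper also averages over backbones. The matrix values are PLACEHOLDERS chosen for illustration, not results from the paper.

```python
from statistics import mean

datasets = ["Ped1", "Ped2", "Avenue", "ShanghaiTech"]

# PLACEHOLDER AUC matrix for one hypothetical backbone:
# rows = dataset used to fit the normality model, columns = dataset scored.
auc_matrix = [
    [0.70, 0.55, 0.45, 0.50],
    [0.45, 0.80, 0.50, 0.55],
    [0.50, 0.45, 0.65, 0.55],
    [0.55, 0.50, 0.45, 0.65],
]

n = len(datasets)
same = [auc_matrix[i][i] for i in range(n)]
cross = [auc_matrix[i][j] for i in range(n) for j in range(n) if i != j]

same_mean = mean(same)
cross_mean = mean(cross)
gap = same_mean - cross_mean

# Train/test pairs whose ranking is worse than chance.
below_chance = [
    (datasets[i], datasets[j])
    for i in range(n)
    for j in range(n)
    if i != j and auc_matrix[i][j] < 0.5
]

print(f"same-dataset mean AUC:  {same_mean:.3f}")
print(f"cross-dataset mean AUC: {cross_mean:.3f}")
print(f"gap:                    {gap:.3f}")
print(f"below-chance pairs:     {len(below_chance)}")
```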

Figures

Figures reproduced from arXiv: 2606.29506 by Mohammadreza Rashidi.

**Figure 1.** The cross-dataset audit protocol. A normality model is calibrated on the normal-only training frames of one dataset (top), then used to …

**Figure 2.** Same-dataset versus cross-dataset frame-level ROC-AUC per backbone. The dashed line is chance. Calibrated performance does not …

**Figure 3.** Frame-level ROC-AUC for every train/test pair and every backbone. Each panel’s bright diagonal (calibrated) collapses to a muted …

**Figure 4.** Same-dataset ROC-AUC against the anomalous-frame fraction of each test set. The positive correlation ( …

**Figure 5.** Same-dataset and cross-dataset AUC for the nearest-neighbour and the Mahalanobis (PaDiM-style) detector. The two detector families …

Original abstract

Automated "suspicious behavior" flagging is a headline promise of AI surveillance, and the field reports high frame-level ROC-AUC on standard video anomaly detection benchmarks. Those numbers are measured by training and testing on the same camera and scene. We audit what happens when that assumption is dropped. We build an unsupervised normality model from the all-normal training frames of one dataset, using frozen off-the-shelf embeddings (CLIP, DINOv2, ResNet-50, EfficientNet-B0) and a nearest-neighbour distance, and score the test frames of the same and of other datasets. Across 4 real datasets (UCSD Ped1, UCSD Ped2, CUHK Avenue, ShanghaiTech) and 4 backbones, same-dataset AUC averages 0.704 but cross-dataset AUC averages 0.499, which is chance: a detector calibrated on one scene is no better than a coin flip on another, and in several pairs it is below chance. The strongest backbone makes this worse, not better: DINOv2 has the best same-dataset AUC (up to 0.901 on Ped2) and the largest cross-dataset drop. The collapse is not an artefact of the scoring rule: replacing the nearest-neighbour detector with a PaDiM-style Mahalanobis detector reproduces it almost exactly (cross-dataset gap 0.202 versus 0.208). Even at a favourable operating point the false-alarm rate is on the order of 31,931 per hour. We conclude that the benchmark numbers quoted for surveillance anomaly detection describe a calibrated laboratory setting and overstate deployable reliability by a wide margin, and we release the code that reproduces every number.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper documents a consistent AUC drop from ~0.7 same-dataset to ~0.5 cross-dataset across backbones and detectors, but the claim that this directly shows benchmark numbers overstate deployable reliability hinges on an untested assumption about per-scene recalibration.

The core finding is straightforward: training a normality model on one dataset's normal frames and testing on another's yields AUC near chance (0.499 on average), while same-dataset AUC averages 0.704, and the gap holds for both nearest-neighbour and Mahalanobis scoring. DINOv2 has the best same-dataset AUC but also the largest cross-dataset collapse. The work is new in its scale (four datasets, four frozen backbones, two detector families, and a full code release), and the numbers are reported consistently enough to be useful for anyone tracking generalization in unsupervised video anomaly detection.

What it does cleanly is quantify the cross-scene failure mode with public data and reproducible steps. The false-alarm rate at the chosen operating point (about 31,931 per hour) is also a concrete number, although it depends on the threshold and frame rate as well as on the score ranking that the AUC summarises.

The soft spot is the leap from these numbers to the deployment conclusion. The paper treats the cross-dataset protocol as the right proxy for real surveillance, but the stress-test point is fair: many systems can collect a modest set of normal frames from the target camera itself. Nothing in the manuscript shows that per-scene normal collection is infeasible or nonstandard, so the claim that benchmark AUC "overstates deployable reliability by a wide margin" rests on that unexamined step. If recalibration on target data is routine, the practical gap shrinks. The below-chance results in some pairs are also worth a closer look for possible label or distribution artifacts.

This is for researchers in video anomaly detection who care about out-of-domain behavior. The empirical audit is solid enough to deserve referee time even if the deployment framing needs tightening.

Referee Report

2 major / 1 minor

Summary. The paper evaluates unsupervised video anomaly detection using frozen off-the-shelf embeddings (CLIP, DINOv2, ResNet-50, EfficientNet-B0) and nearest-neighbor or Mahalanobis scoring on four public datasets (UCSD Ped1/2, CUHK Avenue, ShanghaiTech). It reports average same-dataset frame-level ROC-AUC of 0.704 versus cross-dataset AUC of 0.499 (chance level), with the gap reproduced across backbones and detectors; it concludes that same-scene benchmark numbers overstate deployable reliability for surveillance across cameras and scenes, and releases code for all reported numbers.

Significance. If the cross-dataset protocol is accepted as a valid proxy for deployment without per-scene adaptation, the result identifies a substantial evaluation gap in current VAD benchmarks. Strengths include the multi-dataset, multi-backbone consistency, reproduction of the gap with an alternative (PaDiM-style) detector, and the public code release that enables direct verification of every reported AUC.

major comments (2)

[Abstract] Abstract (paragraph on cross-dataset protocol) and conclusion: the central claim that same-dataset AUCs 'overstate deployable reliability' treats the observed cross-dataset collapse as the relevant deployment regime. This interpretation requires that real-world systems cannot or do not collect a modest set of normal frames from the target camera/scene to build the reference model; the manuscript supplies no citation, argument, or empirical support for the infeasibility of such per-scene collection, which is the load-bearing step linking the reported numbers to the deployment conclusion.
[Abstract] Abstract and methods description: the reported same-dataset average of 0.704 and cross-dataset average of 0.499 are presented as robust, yet the text does not specify the exact train/test splits used for each dataset pair, the precise definition of 'all-normal training frames,' or any controls for scene-specific statistics that might differ systematically between datasets; without these details the numerical gap cannot be fully audited even with the released code.

minor comments (1)

[Abstract] The false-alarm-rate claim of ~31,931 per hour at a favourable operating point should cite the exact threshold and frame rate assumptions used to derive it.
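
The dependence on threshold and frame rate can be made explicit with a small calculation. This is a sketch under stated assumptions: it treats the hourly false-alarm count as the false-positive rate times the number of normal frames scored per hour, and the frame rate of 30 fps and the 10% false-positive rate are hypothetical inputs, since the abstract does not state the values used. The function names are illustrative.

```python
def false_alarms_per_hour(fpr, fps, normal_fraction=1.0):
    """Expected false alarms per hour when every frame is scored independently.

    fpr: false-positive rate at the chosen threshold (0..1)
    fps: frames scored per second (assumed, not taken from the paper)
    normal_fraction: share of frames that are truly normal
    """
    return fpr * fps * 3600 * normal_fraction

def implied_fpr(alarms_per_hour, fps, normal_fraction=1.0):
    """Invert the relation: which false-positive rate yields this hourly count?"""
    return alarms_per_hour / (fps * 3600 * normal_fraction)

# Hypothetical operating point: 10% FPR on a 30 fps stream of normal footage.
example_rate = false_alarms_per_hour(fpr=0.10, fps=30)
print(f"false alarms per hour: {example_rate:.0f}")

# The reported figure of 31,931 per hour maps to different FPRs
# depending on the frame rate one assumes.
for assumed_fps in (10, 25, 30):
    rate = implied_fpr(31931, assumed_fps)
    print(f"assumed {assumed_fps} fps -> implied FPR = {rate:.3f}")
```

Because the implied false-positive rate changes with the assumed frame rate, the hourly count cannot be interpreted without both quantities, which is the point of the referee's request.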

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. Below we respond point-by-point to the major comments and indicate planned revisions.

Referee: [Abstract] Abstract (paragraph on cross-dataset protocol) and conclusion: the central claim that same-dataset AUCs 'overstate deployable reliability' treats the observed cross-dataset collapse as the relevant deployment regime. This interpretation requires that real-world systems cannot or do not collect a modest set of normal frames from the target camera/scene to build the reference model; the manuscript supplies no citation, argument, or empirical support for the infeasibility of such per-scene collection, which is the load-bearing step linking the reported numbers to the deployment conclusion.

Authors: We agree the manuscript does not supply citations or empirical evidence on the feasibility of per-scene normal-frame collection. The cross-dataset protocol is presented as a proxy for deployment to unseen scenes without scene-specific adaptation. We will revise the abstract and conclusion to state this assumption explicitly and add a short discussion paragraph noting that while per-scene collection is possible in controlled settings, many surveillance deployments involve new cameras, changing conditions, or resource constraints where such adaptation is not performed. This clarifies the scope of the claim without overstating it.
Revision: yes

Referee: [Abstract] Abstract and methods description: the reported same-dataset average of 0.704 and cross-dataset average of 0.499 are presented as robust, yet the text does not specify the exact train/test splits used for each dataset pair, the precise definition of 'all-normal training frames,' or any controls for scene-specific statistics that might differ systematically between datasets; without these details the numerical gap cannot be fully audited even with the released code.

Authors: The released code contains the exact splits and frame selections used for every reported number. To improve readability and auditability from the text, we will expand the methods section with a table or explicit list of the train/test splits for each dataset pair, the precise definition of all-normal training frames, and any scene-statistic controls applied. This addresses the concern directly.
Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical cross-dataset audit

The manuscript reports direct computation of frame-level AUC using frozen off-the-shelf embeddings (CLIP, DINOv2, etc.) and two detectors (nearest-neighbour, Mahalanobis) on four public datasets. Same-dataset vs. cross-dataset AUC values are obtained by training on one dataset's normal frames and testing on another's test frames. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the derivation chain; the central numbers are produced by running the described protocol on the data. The paper is self-contained against external benchmarks and contains no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper is an empirical audit that relies on standard computer-vision practices and public datasets rather than new theoretical constructs.

axioms (2)

standard math: ROC-AUC is an appropriate scalar summary for ranking-based anomaly detection performance. Used throughout the abstract to report same- and cross-dataset results.
domain assumption: The four chosen datasets (UCSD Ped1/2, CUHK Avenue, ShanghaiTech) represent meaningfully distinct scenes and camera conditions. Invoked when interpreting cross-dataset AUC collapse as evidence of non-deployability.
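
For reference, the ranking interpretation behind the first axiom can be written out. The notation is introduced here for illustration and is not taken from the paper: $s$ is the anomaly score, $x^{+}$ a randomly drawn anomalous frame, and $x^{-}$ a randomly drawn normal frame.

```latex
\[
\mathrm{AUC}
  \;=\; \Pr\!\big(s(x^{+}) > s(x^{-})\big)
  \;+\; \tfrac{1}{2}\,\Pr\!\big(s(x^{+}) = s(x^{-})\big)
\]
```

Under this reading, an AUC of 0.5 means the score ranks anomalous and normal frames no better than random ordering. A value below 0.5 means normal frames tend to outscore anomalous ones, and negating $s$ would give $1 - \mathrm{AUC}$.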
