AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving
Pith reviewed 2026-05-10 11:48 UTC · model grok-4.3
The pith
Visual anomaly detection transfers effectively to road scenes and supports lightweight edge deployment in autonomous vehicles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By evaluating eight state-of-the-art VAD methods across four backbone architectures on the AnoVox synthetic dataset for autonomous driving, the work establishes that visual anomaly detection transfers effectively to road scenes. The models identify anomalous objects not present during training and generate pixel-level anomaly maps that can guide attention to regions of concern without requiring assumptions about the hazard's nature or form. In particular, Tiny-Dinomaly achieves the best accuracy-efficiency trade-off for edge deployment, matching full-scale localization performance at a fraction of the memory cost.
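To make the anomaly-map machinery concrete, here is a minimal sketch of the general recipe behind distillation-style VAD methods such as Dinomaly: a frozen teacher and a trained student produce matched feature maps, and per-pixel disagreement between them becomes the anomaly map. The layer fusion and max-pixel image score below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def anomaly_map(teacher_feats, student_feats, out_size=(256, 256)):
    """Fuse per-layer feature discrepancies into one pixel-level map.

    teacher_feats / student_feats: lists of (B, C, H, W) tensors taken
    from matched layers of a frozen teacher and a trained student.
    High cosine distance means the student failed to reproduce the
    teacher, which distillation-based VAD reads as anomalous. This is
    a generic sketch of the approach, not the paper's implementation.
    """
    maps = []
    for t, s in zip(teacher_feats, student_feats):
        # 1 - cosine similarity per spatial location -> (B, H, W)
        d = 1.0 - F.cosine_similarity(t, s, dim=1)
        maps.append(F.interpolate(d.unsqueeze(1), size=out_size,
                                  mode="bilinear", align_corners=False))
    amap = torch.stack(maps).mean(0).squeeze(1)      # (B, H, W) map
    image_score = amap.flatten(1).max(dim=1).values  # (B,) per image
    return amap, image_score
```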
What carries the argument
Benchmark evaluation of eight VAD methods on the AnoVox dataset across backbones from large networks to lightweight ones such as MobileNet and DeiT-Tiny, focusing on accuracy, localization, and memory efficiency for road-scene anomaly detection.
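For a rough feel of the memory axis in that comparison, the sketch below counts fp32 parameter footprints for representative backbones via timm. The review names only the MobileNet and DeiT-Tiny families, so the exact model variants chosen here are assumptions for illustration.

```python
import timm

# Parameter-memory comparison across backbone families named in the
# abstract; these timm variant names are illustrative assumptions.
for name in ["mobilenetv3_small_100", "deit_tiny_patch16_224",
             "wide_resnet50_2"]:
    model = timm.create_model(name, pretrained=False)
    n_params = sum(p.numel() for p in model.parameters())
    fp32_mib = n_params * 4 / 2**20  # weights only, fp32, no activations
    print(f"{name:26s} {n_params / 1e6:6.1f}M params ~{fp32_mib:6.1f} MiB")
```

Activation memory and runtime depend on input resolution and method overhead, so parameter counts bound only one side of the efficiency story.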
If this is right
- Autonomous vehicle systems can flag unfamiliar obstacles in real time using pixel-level maps without needing exhaustive training on every possible hazard (a post-processing sketch follows this list).
- Lightweight models enable deployment of anomaly detection on resource-limited vehicle hardware while retaining most of the localization capability of larger versions.
- VAD provides a general mechanism for handling out-of-distribution scenes that standard supervised perception cannot cover.
- Integration of such models could reduce reliance on perfect training data coverage for safe operation in variable road environments.
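As referenced in the first item above, one hypothetical post-processing step (not described in the paper) for turning a pixel-level anomaly map into discrete alert regions is to threshold the map and keep sufficiently large connected components:

```python
import numpy as np
from scipy import ndimage

def alert_regions(anomaly_map, threshold=0.5, min_pixels=64):
    """Turn a pixel-level anomaly map into alert boxes.

    Hypothetical post-processing, not from the paper: threshold the
    map, group anomalous pixels into connected components, and keep
    only components large enough to matter for driver guidance.
    """
    mask = anomaly_map > threshold
    labels, n = ndimage.label(mask)
    boxes = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size >= min_pixels:  # suppress speckle-sized responses
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes  # (x0, y0, x1, y1) regions to highlight
```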
Where Pith is reading between the lines
- Validation on diverse real-road datasets would be needed to confirm whether the synthetic results hold under actual traffic distributions and sensor noise.
- Combining VAD outputs with existing object detectors could create layered safety checks that trigger alerts only on high-confidence anomalies (a fusion sketch follows this list).
- The efficiency gains of tiny models open the possibility of running continuous anomaly monitoring alongside primary perception tasks without additional hardware.
- Extending the benchmark to include temporal consistency across video frames might reveal whether anomaly maps remain stable enough for driver guidance in motion.
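The fusion idea in the second item could be prototyped with a simple suppression rule: raise an alert only where the anomaly map is hot and the primary detector has no confident explanation. The boxes, thresholds, and decision logic here are assumptions for illustration, not anything the paper proposes.

```python
import numpy as np

def layered_alert(anomaly_map, detections, conf_thresh=0.5,
                  alert_thresh=0.8):
    """Hypothetical layered safety check (not from the paper).

    detections: iterable of (x0, y0, x1, y1, confidence) boxes with
    integer pixel coordinates from the primary object detector.
    Regions the detector already explains confidently are masked out;
    an alert fires only if the residual map still scores high.
    """
    covered = np.zeros(anomaly_map.shape, dtype=bool)
    for x0, y0, x1, y1, conf in detections:
        if conf >= conf_thresh:
            covered[y0:y1, x0:x1] = True
    residual = np.where(covered, 0.0, anomaly_map)
    return residual.max() >= alert_thresh, residual
```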
Load-bearing premise
That measured performance on the synthetic AnoVox dataset and its particular anomaly types will generalize to the distribution of real-world anomalies and edge cases encountered by autonomous vehicles on public roads.
What would settle it
A controlled evaluation on real-world autonomous driving video where the models systematically miss common but previously unseen hazards, such as unusual road debris or construction elements absent from AnoVox, would falsify the transfer claim.
Original abstract
The reliability of a machine vision system for autonomous driving depends heavily on its training data distribution. When a vehicle encounters significantly different conditions, such as atypical obstacles, its perceptual capabilities can degrade substantially. Unlike many domains where errors carry limited consequences, failures in autonomous driving translate directly into physical risk for passengers, pedestrians, and other road users. To address this challenge, we explore Visual Anomaly Detection (VAD) as a solution. VAD enables the identification of anomalous objects not present during training, allowing the system to alert the driver when an unfamiliar situation is detected. Crucially, VAD models produce pixel-level anomaly maps that can guide driver attention to specific regions of concern without requiring any prior assumptions about the nature or form of the hazard. We benchmark eight state-of-the-art VAD methods on AnoVox, the largest synthetic dataset for anomaly detection in autonomous driving. In particular, we evaluate performance across four backbone architectures spanning from large networks to lightweight ones such as MobileNet and DeiT-Tiny. Our results demonstrate that VAD transfers effectively to road scenes. Notably, Tiny-Dinomaly achieves the best accuracy-efficiency trade-off for edge deployment, matching full-scale localization performance at a fraction of the memory cost. This study represents a concrete step toward safer, more responsible deployment of autonomous vehicles, ultimately improving protection for passengers, pedestrians, and all road users.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper benchmarks eight state-of-the-art visual anomaly detection (VAD) methods on the synthetic AnoVox dataset for autonomous driving. It evaluates performance across four backbone architectures (large to lightweight, including MobileNet and DeiT-Tiny) and concludes that VAD transfers effectively to road scenes, with Tiny-Dinomaly achieving the best accuracy-efficiency trade-off for edge deployment.
Significance. If the synthetic results generalize, this provides a useful benchmark for VAD in autonomous driving and identifies efficient models for edge devices that could aid safety by detecting out-of-distribution objects. The work contributes concrete steps toward responsible AV deployment, though its broader impact depends on addressing generalization gaps.
Major comments (2)
- Abstract: The central claim that 'VAD transfers effectively to road scenes' and that Tiny-Dinomaly 'matches full-scale localization performance' rests entirely on experiments using the synthetic AnoVox dataset and its generated anomaly types; no real-world driving footage, sensor data, or public-road anomalies are evaluated, leaving the results vulnerable to distribution shift from factors such as lighting variations, motion blur, and sensor noise.
- Abstract and experimental setup: The abstract states results and a preferred model but supplies no information on exact metrics, statistical tests, data splits, or controls for confounding factors; without these details the strength of the empirical claims cannot be verified from the provided summary.
Minor comments (1)
- The abstract would be strengthened by including key quantitative metrics (e.g., AUC or IoU values) to support the stated conclusions about accuracy-efficiency trade-offs.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity and scope that we address point by point below, proposing targeted revisions where appropriate.
Point-by-point responses
- Referee: Abstract: The central claim that 'VAD transfers effectively to road scenes' and that Tiny-Dinomaly 'matches full-scale localization performance' rests entirely on experiments using the synthetic AnoVox dataset and its generated anomaly types; no real-world driving footage, sensor data, or public-road anomalies are evaluated, leaving the results vulnerable to distribution shift from factors such as lighting variations, motion blur, and sensor noise.
  Authors: We agree that the evaluation relies solely on the synthetic AnoVox dataset, which emulates driving scenes but cannot fully replicate real-world variabilities such as sensor noise or uncontrolled lighting. Our claims of effective transfer are therefore scoped to this benchmark setting, where anomaly types are generated to reflect road hazards. To address the concern, we will revise the abstract to qualify the language (e.g., 'transfers effectively to synthetic road scenes') and add a dedicated Limitations section that discusses generalization risks, including the listed factors, along with suggestions for future real-world validation. This maintains the paper's focus on the synthetic benchmark while being transparent about its scope. Revision: partial.
- Referee: Abstract and experimental setup: The abstract states results and a preferred model but supplies no information on exact metrics, statistical tests, data splits, or controls for confounding factors; without these details the strength of the empirical claims cannot be verified from the provided summary.
  Authors: The abstract serves as a high-level overview, with full experimental details provided in the manuscript's Methods and Experiments sections, including metrics (AUROC and AUPRC for both image-level detection and pixel-level localization), the fixed train/test splits on AnoVox, and controls such as standardized preprocessing pipelines and evaluation across multiple backbones. No formal statistical tests were conducted because the benchmarks use deterministic dataset splits and report averaged results over training seeds where applicable. To improve verifiability, we will revise the abstract to include key quantitative results (e.g., specific AUROC values for Tiny-Dinomaly) and a brief reference to the evaluation protocol. This strengthens the abstract without expanding it excessively. Revision: yes.
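For reference, the metrics the authors name here, image- and pixel-level AUROC and AUPRC, can be computed with scikit-learn as in the sketch below. This is a generic sketch of such a protocol, not the paper's evaluation code.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate(image_scores, image_labels, pixel_maps, pixel_masks):
    """Image- and pixel-level AUROC/AUPRC for a VAD benchmark.

    image_scores: (N,) anomaly scores; image_labels: (N,) in {0, 1}
    pixel_maps:   (N, H, W) anomaly maps; pixel_masks: (N, H, W) GT
    """
    return {
        "image_auroc": roc_auc_score(image_labels, image_scores),
        "image_auprc": average_precision_score(image_labels, image_scores),
        "pixel_auroc": roc_auc_score(pixel_masks.ravel(), pixel_maps.ravel()),
        "pixel_auprc": average_precision_score(pixel_masks.ravel(),
                                               pixel_maps.ravel()),
    }
```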
Circularity Check
No significant circularity; empirical benchmarking on an external synthetic dataset
Full rationale
The paper is a pure benchmarking study that evaluates eight VAD methods on the AnoVox dataset and reports measured performance metrics (accuracy, efficiency, memory cost) for models including Tiny-Dinomaly. No derivation chain, equations, fitted parameters, or predictions appear in the provided text. Claims rest on direct experimental outcomes rather than any self-definitional, fitted-input, or self-citation reduction. The evaluation is self-contained against the stated benchmarks and held-out synthetic data.