We Need to Rethink Benchmarking in Anomaly Detection
read the original abstract
Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. In current benchmarks, a trivial algorithm that only checks for extreme values in individual features performs competitively with state-of-the-art deep learning methods, despite failing on simple cases such as anomalies within an annulus of normal points. Moreover, existing benchmarks do not adequately reflect the diversity of anomaly detection applications, making it difficult for practitioners to reliably select algorithms for their applications. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that group applications sharing relevant characteristics, defined through a common taxonomy. Benchmarking within scenarios enables scenario-specific choices for preprocessing, metrics, and model selection, clarifying which advances transfer across similar applications and providing practitioners with reliable guidance for their specific contexts.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection
MacrOData supplies three large, curated benchmark suites totaling 2,446 datasets for tabular outlier detection, complete with standardized splits, metadata, and a public leaderboard.
-
Evaluating Tabular Representation Learning for Network Intrusion Detection
Tabular representation learning for network intrusion detection exhibits strong dataset-model dependency, with supervised methods outperforming unsupervised anomaly detection and limited but possible cross-dataset gen...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.