Randomized PCA Forest for Unsupervised Outlier Detection

Arthur Zimek; Farhad Pakdaman; Moncef Gabbouj; Muhammad Rajabinasab; Peter Schneider-Kamp

arXiv: 2508.12776 · v3 · submitted 2025-08-18 · cs.LG · cs.AI · stat.ML

Muhammad Rajabinasab, Farhad Pakdaman, Moncef Gabbouj, Peter Schneider-Kamp, Arthur Zimek

Pith reviewed 2026-05-18 22:51 UTC · model grok-4.3

classification cs.LG · cs.AI · stat.ML

keywords unsupervised outlier detection · randomized PCA · PCA forest · anomaly detection · ensemble methods · high-dimensional data · machine learning

The pith

A Randomized PCA Forest can detect outliers without supervision by turning its internal structure into an anomaly score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that builds a forest of randomized principal component analysis trees and extracts an outlier score directly from how the forest organizes the data. This approach is motivated by the forest's success in fast nearest-neighbor search and aims to identify anomalies without any labeled examples. Experiments indicate it outperforms several classical and recent detectors on multiple datasets while remaining competitive elsewhere. The work emphasizes that the method requires no extra parameter tuning beyond the forest construction itself. A reader would care because many real applications need reliable anomaly finding in unlabeled high-dimensional data at reasonable computational cost.

Core claim

The central claim is that an outlier score derived from the intrinsic properties of a Randomized PCA Forest reliably flags anomalies. The forest is constructed by repeatedly applying randomized PCA to split the data, and the score reflects aspects such as how isolated a point appears across the collection of trees. This yields a fully unsupervised detector whose performance is evaluated on standard benchmark collections, showing superiority over baseline and state-of-the-art alternatives on several of them.
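The page does not spell out the forest construction, so the following is only a minimal sketch under stated assumptions. Each node estimates a leading principal direction with a couple of power-iteration steps from a random start (a stand-in for whatever randomized PCA routine the paper uses), splits its points at the median projection, and stops at a hypothetical `max_leaf` size. The names `leading_direction`, `build_tree`, `build_forest`, and `leaves` are illustrative and not taken from the paper.

```python
import math
import random


def leading_direction(points, rng, n_iter=2):
    """Roughly estimate the top principal direction of `points`.

    Assumption: a random Gaussian start plus a few power-iteration steps
    stands in for the paper's randomized PCA routine.
    """
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(n_iter):
        # One step of v <- X^T (X v), i.e. multiplication by the scatter matrix.
        proj = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(len(centered)))
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate node (e.g. identical points): keep current v
        v = [x / norm for x in w]
    return mean, v


def build_tree(points, indices, rng, max_leaf=8, depth=0):
    """Recursively split `indices` at the median projection (hypothetical rule)."""
    if len(indices) <= max(1, max_leaf):
        return {"leaf": True, "indices": indices, "depth": depth}
    mean, v = leading_direction([points[i] for i in indices], rng)
    proj = {i: sum((points[i][j] - mean[j]) * v[j] for j in range(len(v)))
            for i in indices}
    ordered = sorted(indices, key=proj.get)
    mid = len(ordered) // 2  # both halves are non-empty because len >= 2
    return {
        "leaf": False,
        "mean": mean,
        "direction": v,
        "threshold": (proj[ordered[mid - 1]] + proj[ordered[mid]]) / 2,
        "left": build_tree(points, ordered[:mid], rng, max_leaf, depth + 1),
        "right": build_tree(points, ordered[mid:], rng, max_leaf, depth + 1),
    }


def build_forest(points, n_trees=10, max_leaf=8, seed=0):
    """Build several trees; diversity comes only from the random starts."""
    rng = random.Random(seed)
    all_indices = list(range(len(points)))
    return [build_tree(points, list(all_indices), rng, max_leaf)
            for _ in range(n_trees)]


def leaves(node):
    """Collect the leaf nodes of one tree, left to right."""
    if node["leaf"]:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])
```

Each leaf records the indices it holds and its depth, which are the kinds of structural quantities a forest-derived score could draw on. Keeping the number of power-iteration steps small is a deliberate choice in this sketch: with many steps every tree would converge to nearly the same direction and the ensemble would lose its diversity.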

What carries the argument

The Randomized PCA Forest itself: the outlier score is derived from the forest's intrinsic structural properties rather than from external distance computations.

If this is right

The method outperforms classical outlier detectors on several benchmark datasets.
It remains competitive with recent state-of-the-art approaches on the remaining datasets.
Computational cost stays low because the forest construction reuses the same randomized PCA splits.
Robustness follows from the ensemble nature of the forest and the intrinsic score definition.
The approach requires no labeled data or additional model fitting beyond building the forest.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same forest structure could be reused for both approximate nearest-neighbor lookup and outlier detection in a single pipeline.
Because the score comes from tree organization, the method may scale more gracefully to very large collections than distance-based alternatives.
Extensions that vary the number of trees or the dimensionality reduction target inside each split could be tested directly on the existing experimental setup.

Load-bearing premise

That an outlier score taken directly from the forest's internal properties will identify anomalies reliably on new datasets without needing further validation or adjustment of the score formula.

What would settle it

Running the method on a fresh collection of high-dimensional datasets and finding that it consistently ranks below isolation-forest or local-outlier-factor baselines on AUC or precision-recall metrics.
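A comparison of this kind reduces to computing a ranking metric per detector and per dataset. As a minimal sketch, the following computes ROC AUC directly from its pairwise definition and ranks detectors by it. The function names, detector names, labels, and scores are all made up for illustration and are not results from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that an outlier outscores an inlier.

    Ties between an outlier and an inlier count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_detectors(labels, scores_by_detector):
    """Return (name, auc) pairs sorted from best to worst AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_detector.items()}
    return sorted(aucs.items(), key=lambda item: item[1], reverse=True)


# Toy labels (1 = outlier) and invented scores, purely for illustration.
labels = [0, 0, 0, 1, 1]
scores = {
    "detector_a": [0.1, 0.2, 0.3, 0.8, 0.9],  # separates the classes perfectly
    "detector_b": [0.1, 0.9, 0.3, 0.8, 0.2],  # confuses several pairs
}
ranking = rank_detectors(labels, scores)
```

The pairwise form is quadratic in the number of points, which is fine for a sketch but would normally be replaced by a rank-based computation or a library routine on real benchmarks.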

Figures

Figures reproduced from arXiv: 2508.12776 by Arthur Zimek, Farhad Pakdaman, Moncef Gabbouj, Muhammad Rajabinasab, Peter Schneider-Kamp.

**Figure 2.** The effect of forest size on the performance of RPCA forest.

**Figure 3.** The performance of RPCA forest using different hyperparameter combinations.

**Figure 4.** The investigation of the amount of explained variance ratio using …

**Figure 5.** The critical difference diagram based on AUC. The methods to the right show a better average ranking across all datasets.

**Figure 6.** Histogram showing the frequency of each method being ranked as the best or second-best across all datasets. (a) shows the models with different …

**Figure 7.** The generalizability analysis of the proposed method. The box plots show the different AUC values observed in the evaluation of the competitors …

**Figure 8.** The comparison of the effect of forest size on the performance of the …

Original abstract

We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Motivated by the performance of Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we develop a novel unsupervised outlier detection method that utilizes RPCA Forest for unsupervised outlier detection by deriving an outlier score from its intrinsic properties. Experimental results showcase the superiority of the proposed approach compared to the classical and state-of-the-art methods in performing the outlier detection task on several datasets while performing competitively on the rest. The extensive analysis of the proposed method reflects its robustness and its computational efficiency, highlighting it as a good choice for unsupervised outlier detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adapts an existing RPCA Forest for outlier scoring but leaves the score formula and its validation thin.

The main thing here is a straightforward repurposing: take the Randomized PCA Forest that was already used for approximate KNN search and derive an outlier score from its internal structure instead. That is the actual new application, and it sits inside the broader family of tree and subspace methods rather than breaking new ground on theory.

The work does show some practical upside in the experiments, with claims of better performance than several classical and recent baselines on multiple datasets while remaining competitive on others, plus notes on speed and robustness that could matter for real pipelines. Efficiency is a reasonable selling point given the randomized projections involved.

The soft spots are more noticeable. The abstract and stress-test note both flag that the outlier score comes from intrinsic properties like leaf stats or residuals, yet no explicit formula, derivation steps, or ablation across score variants appears. Without that, the reported wins risk looking like post-hoc fitting rather than a reliably separating measure. Experiments seem to rest on direct comparisons without statistical significance tests or clear protocol details, which weakens the superiority claim.

The method is probably fine as a heuristic for practitioners who already like forest-based detectors and need something fast on moderate-sized data. A reader focused on applied unsupervised detection might pick up a usable idea or two from the efficiency angle. I would send this to peer review so the authors can supply the missing score definition, add controls, and let referees check whether the gains are stable.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (RPCA) Forest. Motivated by RPCA Forest's performance in approximate KNN search, the authors derive an outlier score directly from the forest's intrinsic properties and evaluate it experimentally against classical and state-of-the-art methods, claiming superiority on several datasets and competitive results on the remainder, along with robustness and computational efficiency.

Significance. If the outlier score is rigorously defined and the empirical comparisons hold under standard statistical scrutiny, the work could contribute a computationally efficient unsupervised outlier detection approach that reuses the structure of randomized PCA forests without requiring separate model fitting. This would be of interest in high-dimensional settings where existing isolation-based or density-based methods scale poorly.

major comments (2)

[Method section (likely §3)] The central claim depends on an outlier score derived from the RPCA Forest's intrinsic properties (e.g., leaf statistics, path lengths, or reconstruction residuals). No explicit mathematical definition, aggregation rule, or derivation appears in the method section; without it, reproducibility and the claim that the score separates anomalies without dataset-specific tuning cannot be assessed.
[Experiments section (likely §4)] Experimental results assert superiority on 'several datasets' but provide no details on the precise outlier score formula used in the reported runs, the full experimental protocol (train/test splits, hyperparameter selection), or statistical significance testing. This undermines the robustness and superiority assertions in §4.

minor comments (2)

[Abstract] The abstract would benefit from naming the specific datasets, performance metrics (e.g., AUC-ROC, precision@K), and number of baselines to give readers an immediate sense of scope.
[Method section] Notation for randomized projection parameters and forest hyperparameters should be introduced once and used consistently; occasional undefined symbols appear in the method description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below and indicate the revisions we will make to strengthen the manuscript.

Referee: [Method section (likely §3)] The central claim depends on an outlier score derived from the RPCA Forest's intrinsic properties (e.g., leaf statistics, path lengths, or reconstruction residuals). No explicit mathematical definition, aggregation rule, or derivation appears in the method section; without it, reproducibility and the claim that the score separates anomalies without dataset-specific tuning cannot be assessed.

Authors: We agree that an explicit mathematical definition is required for reproducibility. In the revised manuscript we will add to Section 3 a precise formulation of the outlier score, specifying its derivation from leaf statistics and reconstruction residuals, the aggregation rule across the forest, and a short justification for its parameter-free separation of anomalies. revision: yes
Referee: [Experiments section (likely §4)] Experimental results assert superiority on 'several datasets' but provide no details on the precise outlier score formula used in the reported runs, the full experimental protocol (train/test splits, hyperparameter selection), or statistical significance testing. This undermines the robustness and superiority assertions in §4.

Authors: We accept that additional experimental details are necessary. The revised version will state the exact outlier score formula used in the reported experiments, describe the full protocol (including train/test splits, hyperparameter ranges and selection method), and include statistical significance tests (e.g., Wilcoxon signed-rank or paired t-tests with p-values) to support the performance claims. revision: yes
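To make the promised significance testing concrete, here is a minimal sketch of an exact two-sided Wilcoxon signed-rank test on paired per-dataset results. It assumes no zero differences and no tied absolute differences, and it is not the authors' protocol; the function name `wilcoxon_signed_rank_exact` and the example numbers are invented for illustration. In practice an established statistics library would be the more likely choice.

```python
def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) where W = min(W+, W-). Assumes no zero differences and
    no ties among absolute differences, so ranks are exactly 1..n.
    """
    if len(a) != len(b):
        raise ValueError("samples must be paired")
    diffs = [x - y for x, y in zip(a, b)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    magnitudes = sorted(abs(d) for d in diffs)
    if len(set(magnitudes)) != len(magnitudes):
        raise ValueError("tied absolute differences are not handled")
    rank = {m: i + 1 for i, m in enumerate(magnitudes)}
    n = len(diffs)
    total = n * (n + 1) // 2
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    w = min(w_plus, total - w_plus)
    # counts[s] = number of subsets of ranks {1..n} whose sum equals s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    # Under the null each sign pattern is equally likely (2**n patterns).
    p = 2 * sum(counts[: w + 1]) / 2 ** n
    return w, min(1.0, p)


# Invented per-dataset AUC values in percent, for illustration only.
method_auc = [91, 85, 78, 88, 95, 70]
baseline_auc = [90, 83, 75, 84, 90, 64]
w_stat, p_value = wilcoxon_signed_rank_exact(method_auc, baseline_auc)
```

With only six paired datasets the smallest attainable two-sided p-value is 2/64, which illustrates why a comparison over few datasets gives limited statistical power.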

Circularity Check

0 steps flagged

No significant circularity detected in derivation of outlier score.

The paper proposes deriving a new outlier score directly from the intrinsic properties of an RPCA Forest, motivated by prior KNN performance but without any quoted equations or steps that reduce the score definition to a fitted parameter, self-referential construction, or load-bearing self-citation chain. No self-definitional loop, fitted-input-as-prediction, or ansatz-smuggled-via-citation is exhibited in the provided abstract or description. The central claim of superiority rests on experimental results rather than a tautological re-derivation, making the method self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard unsupervised learning assumptions that tree-based structures capture density or distance information useful for anomaly scoring; no new entities or fitted parameters are mentioned in the abstract.

axioms (1)

Domain assumption: the Randomized PCA Forest structure encodes sufficient information about local data density to serve as an outlier indicator.
Implicit in the claim that an outlier score can be derived from intrinsic forest properties.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: The final outlier score for a point q is then the mean distance of q within its leaf node weighted by its depth-based probability: $\mathrm{RPCAForestScore}(q) = P(q)\,\mu_{\mathrm{dist}}(q)$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA).
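The score quoted in the first passage above, the product of a depth-based probability P(q) and the mean distance of q within its leaf, can be sketched for a single leaf as follows. This page does not define P(q) or say how scores are combined across trees, so the sketch takes the probability as an input and stops at one leaf. The function name, the Euclidean metric, and the convention that `leaf_points` excludes q itself are assumptions.

```python
import math


def rpca_forest_score(q, leaf_points, depth_probability):
    """Score = P(q) * mu_dist(q) for one leaf, following the quoted passage.

    q                 -- the query point, a sequence of floats
    leaf_points       -- the other points in q's leaf (assumed to exclude q)
    depth_probability -- P(q), supplied by the caller because its definition
                         is not given on this page
    """
    if not leaf_points:
        raise ValueError("leaf must contain at least one other point")
    # Assumption: "mean distance within the leaf" is the mean Euclidean
    # distance from q to the other points in the same leaf.
    mu_dist = sum(math.dist(q, p) for p in leaf_points) / len(leaf_points)
    return depth_probability * mu_dist


# Toy example: distances from (0, 0) are 5 and 1, so the mean is 3.
example_score = rpca_forest_score((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0)], 0.5)
```

Passing the probability in rather than computing it keeps the sketch honest about what the page actually states, and leaves the depth-to-probability mapping open.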

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1] A. Anderberg, J. Bailey, R. J. Campello, M. E. Houle, H. O. Marques, M. Radovanović, and A. Zimek, “Dimensionality-aware outlier detection,” in Proceedings of the 2024 SIAM International Conference on Data Mining (SDM). SIAM, 2024, pp. 652–660.
[2] F. Angiulli and C. Pizzuti, “Fast outlier detection in high dimensional spaces,” in Principles of Data Mining and Knowledge Discovery, T. Elomaa, H. Mannila, and H. Toivonen, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 15–27.
[3] V. Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. Chichester: John Wiley & Sons, 1994.
[4] A. Bhattacharya, S. Varambally, A. Bagchi, and S. Bedathur, “Fast one-class classification using class boundary-preserving random projections,” in KDD, 2021.
[5] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000, pp. 93–104.
[6] G. O. Campos, A. Zimek, J. Sander, R. J. Campello, B. Micenková, E. Schubert, I. Assent, and M. E. Houle, “On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study,” Data Mining and Knowledge Discovery, vol. 30, pp. 891–927, 2016.
[7] P. Caroline Cynthia and S. Thomas George, “An outlier detection approach on credit card fraud detection using machine learning: a comparative analysis on supervised and unsupervised learning,” in Intelligence in Big Data Technologies—Beyond the Hype: Proceedings of ICBDCC 2019. Springer, 2021, pp. 125–135.
[8] T. de Vries, S. Chawla, and M. E. Houle, “Density-preserving projections for large-scale local anomaly detection,” KAIS, vol. 32, no. 1, pp. 25–52, 2012.
[9] A. Delić, M. Grcić, and S. Šegvić, “Outlier detection by ensembling uncertainty with negative objectness,” arXiv preprint arXiv:2402.15374, 2024.
[10] X. Du, J. Chen, J. Yu, S. Li, and Q. Tan, “Generative adversarial nets for unsupervised outlier detection,” Expert Systems with Applications, vol. 236, p. 121161, 2024.
[11] M. M. Garcez Duarte and M. Sakr, “An experimental study of existing tools for outlier detection and cleaning in trajectories,” GeoInformatica, vol. 29, no. 1, pp. 31–51, 2025.
[12] M. Goldstein and S. Uchida, “A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data,” PLOS ONE, vol. 11, no. 4, p. e0152173, Apr. 2016.
[13] V. V. Grubov, S. I. Nazarikov, S. A. Kurkin, N. P. Utyashev, D. A. Andrikov, O. E. Karpov, and A. E. Hramov, “Two-stage approach with combination of outlier detection method and deep learning enhances automatic epileptic seizure detection,” IEEE Access, 2024.
[14] N. Halko, P.-G. Martinsson, and J. A. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM Review, vol. 53, no. 2, pp. 217–288, 2011.
[15] S. Han, X. Hu, H. Huang, M. Jiang, and Y. Zhao, “ADBench: Anomaly detection benchmark,” in NeurIPS, 2022.
[16] S. Hariri, M. C. Kind, and R. J. Brunner, “Extended isolation forest,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1479–1489, 2019.
[17] V. Hautamaki, I. Karkkainen, and P. Franti, “Outlier detection using k-nearest neighbour graph,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 3, 2004, pp. 430–433.
[18] D. M. Hawkins, Identification of Outliers. London: Chapman and Hall, 1980.
[19] M. E. Houle, “Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications,” in Similarity Search and Applications - 10th International Conference, SISAP, 2017, pp. 64–79.
[20] F. Jin, M. Chen, W. Zhang, Y. Yuan, and S. Wang, “Intrusion detection on internet of vehicles via combining log-ratio oversampling, outlier detection and metric learning,” Information Sciences, vol. 579, pp. 814–831, 2021.
[21] E. Kirner, E. Schubert, and A. Zimek, “Good and bad neighborhood approximations for outlier detection ensembles,” in Similarity Search and Applications - 10th International Conference, SISAP, 2017, pp. 173–187.
[22] C. Li, “Asynchronism-based principal component analysis for time series data mining,” Expert Systems with Applications, vol. 41, no. 11, pp. 5182–5190, 2014.
[23] J. Li, J. Li, C. Wang, F. J. Verbeek, T. Schultz, and H. Liu, “Ms2od: outlier detection using minimum spanning tree and medoid selection,” Machine Learning: Science and Technology, vol. 5, no. 1, p. 015025, 2024.
[24] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008, pp. 413–422.
[25] M. G. Madden and A. G. Ryder, “The effect of principal component analysis on machine learning accuracy with high dimensional spectral data,” in Applications and Innovations in Intelligent Systems XIII, Proceedings of AI-2005, Cambridge, UK, December 2005.
[26] Y. A. Malkov and D. A. Yashunin, “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs,” IEEE TPAMI, vol. 42, no. 4, 2020.
[27] Z. Meng, J. Wang, L. Lin, and C. Wu, “Sensitivity analysis with iterative outlier detection for systematic reviews and meta-analyses,” Statistics in Medicine, vol. 43, no. 8, pp. 1549–1563, 2024.
[28] C. B. Okkels, M. Aumüller, and A. Zimek, “On the design of scalable outlier detection methods using approximate nearest neighbor graphs,” in Similarity Search and Applications - 17th International Conference, SISAP, 2024, pp. 170–184.
[29] M. Rajabinasab, A. Lautrup, and A. Zimek, Metrics for Inter-Dataset Similarity with Example Applications in Synthetic Data and Feature Selection Evaluation. Philadelphia, PA, USA: Proceedings of the 2025 SIAM International Conference on Data Mining, 2025, pp. 527–537.
[30] M. Rajabinasab, F. Pakdaman, A. Zimek, and M. Gabbouj, “Randomized PCA forest for approximate k-nearest neighbor search,” Expert Systems with Applications, p. 126254, 2024.
[31] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers from large data sets,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000, pp. 427–438.
[32] P. Röchner, H. O. Marques, R. J. G. B. Campello, and A. Zimek, “Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures,” Data Min. Knowl. Discov., vol. 38, no. 6, pp. 3719–3757, 2024.
[33] E. Schubert, A. Zimek, and H.-P. Kriegel, “Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection,” Data Mining and Knowledge Discovery, vol. 28, pp. 190–237, 2014.
[34] E. Schubert, A. Zimek, and H.-P. Kriegel, “Fast and scalable outlier detection with approximate nearest neighbor ensembles,” in Proc. DASFAA, 2015.
[35] K. Singh and S. Upadhyaya, “Outlier detection: applications and techniques,” International Journal of Computer Science Issues (IJCSI), vol. 9, no. 1, p. 307, 2012.
[36] B. V. S. Vinces, E. Schubert, A. Zimek, and R. L. F. Cordeiro, “A comparative evaluation of clustering-based outlier detection,” Data Min. Knowl. Discov., vol. 39, no. 2, p. 13, 2025.
[37] W. Wang, W. Shangguan, J. Liu, and J. Chen, “Enhanced fault detection for GNSS/INS integration using maximum correntropy filter and local outlier factor,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 2077–2093, 2023.
[38] Y. Wang, S. Parthasarathy, and S. Tatikonda, “Locality sensitive outlier detection: A ranking driven approach,” in Proc. ICDE, 2011.
[39] W. H. Wolberg, O. L. Mangasarian, N. Street, and W. Street, “Breast cancer Wisconsin (diagnostic),” UCI Machine Learning Repository, 1995.
[40] H. Xu, G. Pang, Y. Wang, and Y. Wang, “Deep isolation forest for anomaly detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12591–12604, 2023.
[41] X. Zhang, W. Dou, Q. He, R. Zhou, C. Leckie, R. Kotagiri, and Z. Salcic, “LSHiForest: A generic framework for fast tree isolation based ensemble anomaly analysis,” in Proc. ICDE, 2017.
[42] Y. Zhou, H. Xia, D. Yu, J. Cheng, and J. Li, “Outlier detection method based on high-density iteration,” Information Sciences, vol. 662, p. 120286, 2024.
[43] A. Zimek and P. Filzmoser, “There and back again: Outlier detection between statistical reasoning and data mining algorithms,” WIREs Data Mining Knowl. Discov., vol. 8, no. 6, 2018.

[1] [1]

Dimensionality-aware outlier detec- tion,

A. Anderberg, J. Bailey, R. J. Campello, M. E. Houle, H. O. Marques, M. Radovanovi ´c, and A. Zimek, “Dimensionality-aware outlier detec- tion,” in Proceedings of the 2024 SIAM International Conference on Data Mining (SDM) . SIAM, 2024, pp. 652–660

work page 2024

[2] [2]

Fast outlier detection in high dimensional spaces,

F. Angiulli and C. Pizzuti, “Fast outlier detection in high dimensional spaces,” in Principles of Data Mining and Knowledge Discovery , T. Elomaa, H. Mannila, and H. Toivonen, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 15–27

work page 2002

[3] [3]

Barnett and T

V . Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. Chichester: John Wiley & Sons, 1994

work page 1994

[4] [4]

Fast one- class classification using class boundary-preserving random projections,

A. Bhattacharya, S. Varambally, A. Bagchi, and S. Bedathur, “Fast one- class classification using class boundary-preserving random projections,” in KDD, 2021

work page 2021

[5] [5]

Lof: identifying density-based local outliers,

M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “Lof: identifying density-based local outliers,” in Proceedings of the 2000 ACM SIGMOD international conference on Management of data , 2000, pp. 93–104

work page 2000

[6] [6]

On the evaluation of unsu- pervised outlier detection: measures, datasets, and an empirical study,

G. O. Campos, A. Zimek, J. Sander, R. J. Campello, B. Micenkov ´a, E. Schubert, I. Assent, and M. E. Houle, “On the evaluation of unsu- pervised outlier detection: measures, datasets, and an empirical study,” Data mining and knowledge discovery , vol. 30, pp. 891–927, 2016

work page 2016

[7] [7]

An outlier detection approach on credit card fraud detection using machine learning: a comparative analysis on supervised and unsupervised learning,

P. Caroline Cynthia and S. Thomas George, “An outlier detection approach on credit card fraud detection using machine learning: a comparative analysis on supervised and unsupervised learning,” in Intelligence in Big Data Technologies—Beyond the Hype: Proceedings of ICBDCC 2019 . Springer, 2021, pp. 125–135

work page 2019

[8] [8]

Density-preserving projec- tions for large-scale local anomaly detection,

T. de Vries, S. Chawla, and M. E. Houle, “Density-preserving projec- tions for large-scale local anomaly detection,” KAIS, vol. 32, no. 1, pp. 25–52, 2012

work page 2012

[9] [9]

Outlier detection by ensembling uncertainty with negative objectness,

A. Deli ´c, M. Grci ´c, and S. ˇSegvi´c, “Outlier detection by ensembling uncertainty with negative objectness,” arXiv preprint arXiv:2402.15374, 2024

work page arXiv 2024

[10] [10]

Generative adversarial nets for unsupervised outlier detection,

X. Du, J. Chen, J. Yu, S. Li, and Q. Tan, “Generative adversarial nets for unsupervised outlier detection,” Expert Systems with Applications , vol. 236, p. 121161, 2024

work page 2024

[11] [11]

An experimental study of existing tools for outlier detection and cleaning in trajectories,

M. M. Garcez Duarte and M. Sakr, “An experimental study of existing tools for outlier detection and cleaning in trajectories,” GeoInformatica, vol. 29, no. 1, pp. 31–51, 2025

work page 2025

[12] [12]

A Comparative Evaluation of Unsu- pervised Anomaly Detection Algorithms for Multivariate Data,

M. Goldstein and S. Uchida, “A Comparative Evaluation of Unsu- pervised Anomaly Detection Algorithms for Multivariate Data,” PLOS ONE, vol. 11, no. 4, p. e0152173, apr 2016

work page 2016

[13] [13]

Two-stage approach with combination of outlier detection method and deep learning enhances automatic epileptic seizure detection,

V . V . Grubov, S. I. Nazarikov, S. A. Kurkin, N. P. Utyashev, D. A. Andrikov, O. E. Karpov, and A. E. Hramov, “Two-stage approach with combination of outlier detection method and deep learning enhances automatic epileptic seizure detection,” IEEE Access, 2024

work page 2024

[14] [14]

Finding structure with randomness: Probabilistic algorithms for constructing approximate ma- trix decompositions,

N. Halko, P.-G. Martinsson, and J. A. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate ma- trix decompositions,” SIAM review, vol. 53, no. 2, pp. 217–288, 2011

work page 2011

[15] S. Han, X. Hu, H. Huang, M. Jiang, and Y. Zhao, “ADBench: Anomaly detection benchmark,” in NeurIPS, 2022.

[16] S. Hariri, M. C. Kind, and R. J. Brunner, “Extended isolation forest,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1479–1489, 2019.

[17] V. Hautamaki, I. Karkkainen, and P. Franti, “Outlier detection using k-nearest neighbour graph,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 3, 2004, pp. 430–433.

[18] D. M. Hawkins, Identification of Outliers. London: Chapman and Hall, 1980.

[19] M. E. Houle, “Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications,” in Similarity Search and Applications - 10th International Conference, SISAP, 2017, pp. 64–79.

[20] F. Jin, M. Chen, W. Zhang, Y. Yuan, and S. Wang, “Intrusion detection on internet of vehicles via combining log-ratio oversampling, outlier detection and metric learning,” Information Sciences, vol. 579, pp. 814–831, 2021.

[21] E. Kirner, E. Schubert, and A. Zimek, “Good and bad neighborhood approximations for outlier detection ensembles,” in Similarity Search and Applications - 10th International Conference, SISAP, 2017, pp. 173–187.

[22] C. Li, “Asynchronism-based principal component analysis for time series data mining,” Expert Systems with Applications, vol. 41, no. 11, pp. 5182–5190, 2014.

[23] J. Li, J. Li, C. Wang, F. J. Verbeek, T. Schultz, and H. Liu, “Ms2od: outlier detection using minimum spanning tree and medoid selection,” Machine Learning: Science and Technology, vol. 5, no. 1, p. 015025, 2024.

[24] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008, pp. 413–422.

[25] M. G. Madden and A. G. Ryder, “The effect of principal component analysis on machine learning accuracy with high dimensional spectral data,” in Applications and Innovations in Intelligent Systems XIII, Proceedings of AI-2005, Cambridge, UK, December 2005.

[26] Y. A. Malkov and D. A. Yashunin, “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs,” IEEE TPAMI, vol. 42, no. 4, 2020.

[27] Z. Meng, J. Wang, L. Lin, and C. Wu, “Sensitivity analysis with iterative outlier detection for systematic reviews and meta-analyses,” Statistics in Medicine, vol. 43, no. 8, pp. 1549–1563, 2024.

[28] C. B. Okkels, M. Aumüller, and A. Zimek, “On the design of scalable outlier detection methods using approximate nearest neighbor graphs,” in Similarity Search and Applications - 17th International Conference, SISAP, 2024, pp. 170–184.

[29] M. Rajabinasab, A. Lautrup, and A. Zimek, “Metrics for inter-dataset similarity with example applications in synthetic data and feature selection evaluation,” in Proceedings of the 2025 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 2025, pp. 527–537.

[30] M. Rajabinasab, F. Pakdaman, A. Zimek, and M. Gabbouj, “Randomized PCA forest for approximate k-nearest neighbor search,” Expert Systems with Applications, p. 126254, 2024.

[31] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers from large data sets,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000, pp. 427–438.

[32] P. Röchner, H. O. Marques, R. J. G. B. Campello, and A. Zimek, “Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures,” Data Min. Knowl. Discov., vol. 38, no. 6, pp. 3719–3757, 2024.

[33] E. Schubert, A. Zimek, and H.-P. Kriegel, “Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection,” Data Mining and Knowledge Discovery, vol. 28, pp. 190–237, 2014.

[34] E. Schubert, A. Zimek, and H.-P. Kriegel, “Fast and scalable outlier detection with approximate nearest neighbor ensembles,” in Proc. DASFAA, 2015.

[35] K. Singh and S. Upadhyaya, “Outlier detection: applications and techniques,” International Journal of Computer Science Issues (IJCSI), vol. 9, no. 1, p. 307, 2012.

[36] B. V. S. Vinces, E. Schubert, A. Zimek, and R. L. F. Cordeiro, “A comparative evaluation of clustering-based outlier detection,” Data Min. Knowl. Discov., vol. 39, no. 2, p. 13, 2025.

[37] W. Wang, W. Shangguan, J. Liu, and J. Chen, “Enhanced fault detection for GNSS/INS integration using maximum correntropy filter and local outlier factor,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 2077–2093, 2023.

[38] Y. Wang, S. Parthasarathy, and S. Tatikonda, “Locality sensitive outlier detection: A ranking driven approach,” in Proc. ICDE, 2011.

[39] W. H. Wolberg, O. L. Mangasarian, N. Street, and W. Street, “Breast Cancer Wisconsin (Diagnostic),” UCI Machine Learning Repository, 1995.

[40] H. Xu, G. Pang, Y. Wang, and Y. Wang, “Deep isolation forest for anomaly detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12591–12604, 2023.

[41] X. Zhang, W. Dou, Q. He, R. Zhou, C. Leckie, R. Kotagiri, and Z. Salcic, “LSHiForest: A generic framework for fast tree isolation based ensemble anomaly analysis,” in Proc. ICDE, 2017.

[42] Y. Zhou, H. Xia, D. Yu, J. Cheng, and J. Li, “Outlier detection method based on high-density iteration,” Information Sciences, vol. 662, p. 120286, 2024.

[43] A. Zimek and P. Filzmoser, “There and back again: Outlier detection between statistical reasoning and data mining algorithms,” WIREs Data Mining Knowl. Discov., vol. 8, no. 6, 2018.