
arxiv: 2510.17381 · v2 · submitted 2025-10-20 · 💻 cs.LG

Beyond Binary Out-of-Distribution Detection: Characterizing Distributional Shifts with Multi-Statistic Diffusion Trajectories

Pith reviewed 2026-05-18 06:13 UTC · model grok-4.3

classification 💻 cs.LG
keywords out-of-distribution detection · diffusion models · distributional shifts · machine learning · statistical characterization · OOD type classification · denoising trajectories

The pith

Diffusion models extract multi-dimensional signatures from denoising steps to detect out-of-distribution data and classify the type of shift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DISC to move beyond single-score OOD detectors that only answer yes or no. It uses the sequence of noise levels in a pre-trained diffusion model's iterative denoising to build a feature vector of statistical differences. This vector supports both standard detection and the additional step of identifying which kind of distributional change is present. A reader would care because knowing the shift type opens the door to context-appropriate handling rather than treating all anomalies the same. Experiments on image and tabular data show the approach matches top detectors while adding the type-classification ability.

Core claim

DISC leverages the iterative denoising process of diffusion models to extract a rich, multi-dimensional feature vector that captures statistical discrepancies across multiple noise levels and thereby enables classification of OOD type in addition to detection.

What carries the argument

Diffusion-based Statistical Characterization (DISC), which builds multi-statistic trajectories from the denoising steps of a pre-trained diffusion model to represent distributional shifts.
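
A minimal sketch of the idea, under stated assumptions, may help make "multi-statistic trajectory" concrete. The stand-in denoiser below is the closed-form posterior mean for Gaussian training data, not a pre-trained diffusion model, and the three statistics per noise level (mean squared, mean absolute, and maximum absolute reconstruction error) are illustrative choices rather than the paper's list. The function name `trajectory_features` and all parameter values are hypothetical.

```python
import numpy as np

def trajectory_features(x, mu, sigma, alpha_bars, rng):
    """Concatenate per-noise-level statistics into one feature vector."""
    feats = []
    for ab in alpha_bars:
        # Forward (noising) step of a variance-preserving diffusion.
        eps = rng.standard_normal(x.shape)
        x_t = np.sqrt(ab) * x + np.sqrt(1.0 - ab) * eps
        # Stand-in denoiser (assumption): exact posterior mean E[x0 | x_t]
        # when the training data is N(mu, sigma^2 I). A real pipeline would
        # call the pre-trained diffusion model here instead.
        gain = np.sqrt(ab) * sigma**2 / (ab * sigma**2 + 1.0 - ab)
        x_hat = mu + gain * (x_t - np.sqrt(ab) * mu)
        err = x_hat - x
        # Illustrative statistics (assumption, not taken from the paper).
        feats += [np.mean(err**2), np.mean(np.abs(err)), np.max(np.abs(err))]
    return np.array(feats)

rng = np.random.default_rng(0)
dim, mu, sigma = 64, 0.0, 1.0
alpha_bars = [0.9, 0.5, 0.1]  # low, medium, and high noise levels

x_id = mu + sigma * rng.standard_normal(dim)          # in-distribution sample
x_ood = mu + 3.0 + sigma * rng.standard_normal(dim)   # mean-shifted sample

f_id = trajectory_features(x_id, mu, sigma, alpha_bars, rng)
f_ood = trajectory_features(x_ood, mu, sigma, alpha_bars, rng)
print(f_id.shape)
print(f_id.round(2))
print(f_ood.round(2))
```

Each sample yields one vector with an entry per (noise level, statistic) pair. In this toy setting the mean-shifted sample shows a much larger reconstruction error at the highest noise level, which is the kind of noise-level-dependent signal a scalar score would collapse.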

Load-bearing premise

The statistical discrepancies observed across noise levels in a pre-trained diffusion model are sufficiently discriminative for OOD type classification without requiring type-specific labels or retraining of the diffusion model itself.

What would settle it

The claim would be undermined if, on standard image or tabular OOD benchmarks, the multi-dimensional features extracted by DISC failed to separate OOD types better than chance while detection accuracy also fell below that of current single-score methods.

Figures

Figures reproduced from arXiv: 2510.17381 by Achref Jaziri, Martin Mundt, Martin Rogmann, Visvanathan Ramesh.

Figure 1: Simplified illustration to highlight that …
Figure 2: OOD scores overlap significantly for different OOD types for models trained on CIFAR-10 as in …
Figure 3: AUROC in the standard OOD detection set …
Figure 5: (a) Average SSIM vs. diffusion timestep. …
Figure 6: Mean AUC-ROC scores on ADBench for unsupervised anomaly detection. Our method, DISC (red), uses an Isolation Forest on combined rank-consistency and reconstruction error scores. It is compared to diffusion-based methods (blue) and tabular detectors (dark yellow). The blue bars show that for diffusion-based approaches, the thresholding choice is critical, with IForest outperforming other schemes.
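
Figures 3 and 6 report AUROC (AUC-ROC). As a reminder of what that number measures, here is a small sketch that computes it from raw outlier scores as the probability that a randomly chosen OOD sample scores higher than a randomly chosen in-distribution sample, with ties counted as one half. The scores are made-up illustration values, not results from the paper.

```python
def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample outscores a random ID sample.

    Ties count as one half (the Mann-Whitney U formulation of AUROC).
    """
    wins = 0.0
    for s_ood in ood_scores:
        for s_id in id_scores:
            if s_ood > s_id:
                wins += 1.0
            elif s_ood == s_id:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Hypothetical outlier scores (higher = more anomalous).
id_scores = [0.10, 0.40]
ood_scores = [0.35, 0.80]
print(auroc(id_scores, ood_scores))  # 3 of 4 pairs ordered correctly -> 0.75
```

The pairwise loop is quadratic and only meant for illustration; a rank-based implementation would be preferable on real benchmark sizes.
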
Abstract

Detecting out-of-distribution (OOD) data is critical for machine learning, be it for safety reasons or to enable open-ended learning. However, beyond mere detection, choosing an appropriate course of action typically hinges on the type of OOD data encountered. Unfortunately, the latter is generally not distinguished in practice, as modern OOD detection methods collapse distributional shifts into single scalar outlier scores. This work argues that scalar-based methods are thus insufficient for OOD data to be properly contextualized and prospectively exploited, a limitation we overcome with the introduction of DISC: Diffusion-based Statistical Characterization. DISC leverages the iterative denoising process of diffusion models to extract a rich, multi-dimensional feature vector that captures statistical discrepancies across multiple noise levels. Extensive experiments on image and tabular benchmarks show that DISC matches or surpasses state-of-the-art detectors for OOD detection and, crucially, also classifies OOD type, a capability largely absent from prior work. As such, our work enables a shift from simple binary OOD detection to a more granular detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces DISC (Diffusion-based Statistical Characterization), which extracts multi-dimensional feature vectors from statistical discrepancies observed across multiple noise levels in the iterative denoising trajectories of pre-trained diffusion models. This enables both standard OOD detection (matching or surpassing SOTA) and classification of OOD shift types on image and tabular benchmarks, without type-specific labels or diffusion retraining.

Significance. If the central claims hold, the work meaningfully extends OOD research beyond binary scalar scores by providing a label-free route to type-aware characterization. Strengths include reliance on off-the-shelf pre-trained diffusion models, multi-statistic trajectories that capture noise-level discrepancies, and experiments that add a new capability (type classification) largely absent from prior detectors.

major comments (2)
  1. [Abstract and §3 (method)] The central claim that type classification occurs without type-specific labels or supervision is load-bearing, yet the manuscript provides no explicit algorithm, derivation, or pseudocode showing how a raw multi-statistic trajectory vector is mapped to a type prediction via a label-free procedure (e.g., clustering or nearest-centroid assignment) whose groups generalize beyond the finite benchmark set.
  2. [§4 (experiments)] The reported classification accuracies are presented without detailing the exact evaluation protocol for type labels, the baselines used for the classification task, or an ablation confirming that gains do not arise from post-hoc supervised heads trained on the same OOD-type annotations used for testing.
minor comments (2)
  1. [Table 1 and Figure 3] Clarify the precise statistics extracted at each noise level and how the resulting feature dimensionality is chosen; the current description leaves the construction of the multi-dimensional vector underspecified.
  2. [§5 (related work)] Add an explicit comparison to any prior diffusion-based OOD methods that also operate on denoising trajectories, even if they remain scalar-based.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address the two major comments below, clarifying the unsupervised nature of the type classification procedure and committing to expanded methodological and experimental details in the revision.

  1. Referee: [Abstract and §3 (method)] The central claim that type classification occurs without type-specific labels or supervision is load-bearing, yet the manuscript provides no explicit algorithm, derivation, or pseudocode showing how a raw multi-statistic trajectory vector is mapped to a type prediction via a label-free procedure (e.g., clustering or nearest-centroid assignment) whose groups generalize beyond the finite benchmark set.

    Authors: We agree that an explicit description of the label-free mapping is essential. In the current §3, the multi-statistic trajectory vectors are processed via unsupervised clustering (k-means with the number of clusters selected by the elbow method on the within-cluster sum of squares) to discover shift-type groups directly from the feature space; no type labels are used during clustering or assignment. Nearest-centroid assignment is then performed on held-out trajectories using the discovered centroids. We will add a dedicated algorithm box and pseudocode in the revised §3 to make this procedure fully explicit, including the precise distance metric and cluster selection criterion. Regarding generalization beyond the benchmark set, the clusters are formed independently on each dataset's OOD trajectories, and we observe consistent separation of semantically distinct shifts (e.g., noise vs. semantic vs. covariate) across image and tabular domains; we will include a short discussion of this inductive behavior in the revision. revision: yes

  2. Referee: [§4 (experiments)] The reported classification accuracies are presented without detailing the exact evaluation protocol for type labels, the baselines used for the classification task, or an ablation confirming that gains do not arise from post-hoc supervised heads trained on the same OOD-type annotations used for testing.

    Authors: We acknowledge that the evaluation protocol for type classification requires additional clarification. Type labels for accuracy computation are derived from the known generative processes used to create the OOD sets in each benchmark (e.g., Gaussian noise, adversarial perturbations, semantic class shifts); these labels are used solely for post-hoc evaluation of cluster purity and are never provided to the clustering algorithm itself. The classification step remains fully unsupervised. We will expand §4 to (i) state the exact protocol, (ii) include unsupervised baselines such as k-means on raw features or PCA-reduced trajectories, and (iii) add an ablation that replaces the unsupervised clustering with a supervised linear head trained on the same type annotations, demonstrating that the reported gains are not attributable to supervised post-processing. These additions will be included in the revised manuscript. revision: yes
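
The procedure described in the two simulated responses (k-means with elbow selection, nearest-centroid assignment of held-out trajectories, and post-hoc purity evaluation) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the elbow is chosen as the largest second difference of the within-cluster sum of squares, the distance is Euclidean, the initialisation is a deterministic farthest-point scheme, and the 2-D blobs and type names are synthetic stand-ins for real trajectory vectors.

```python
from collections import Counter
import numpy as np

def assign(X, centers):
    """Nearest-centroid assignment under Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = assign(X, centers)
    wcss = float(np.sum((X - centers[labels]) ** 2))
    return centers, wcss

def elbow_k(X, k_max=6):
    """Pick k at the largest second difference of the WCSS curve (assumption)."""
    ks = list(range(1, k_max + 1))
    wcss = np.array([kmeans(X, k)[1] for k in ks])
    curvature = wcss[:-2] - 2 * wcss[1:-1] + wcss[2:]
    return ks[int(np.argmax(curvature)) + 1]

def cluster_purity(cluster_ids, type_labels):
    """Fraction of samples that carry the majority type of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_ids, type_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(type_labels)

# Synthetic stand-ins for trajectory vectors of three shift types.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_train = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in blob_centers])
# Hypothetical type names, used ONLY for evaluation, never for clustering.
types = ["noise"] * 30 + ["semantic"] * 30 + ["covariate"] * 30

k = elbow_k(X_train)                       # label-free choice of k
centers, _ = kmeans(X_train, k)            # label-free centroids
heldout_types = assign(blob_centers + 0.1, centers)  # held-out assignment
purity = cluster_purity(assign(X_train, centers).tolist(), types)
print(k, heldout_types, purity)
```

Type labels enter only in `cluster_purity`, which mirrors the claim that annotations are used for evaluation and not for fitting. On well-separated synthetic blobs this is trivially easy; whether real trajectory vectors separate as cleanly is exactly what the referee asks the authors to demonstrate.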

Circularity Check

0 steps flagged

No circularity: derivation relies on external pre-trained diffusion models and benchmark experiments


The paper's core construction extracts multi-dimensional statistical features from the denoising trajectories of a standard pre-trained diffusion model across noise levels. This process is defined directly from the diffusion forward/reverse process without any equation that sets a prediction equal to a parameter fitted inside the paper. No self-citation is invoked as a uniqueness theorem or to justify an ansatz; the method is presented as a direct application of existing diffusion mechanics. Classification of OOD type is evaluated on external image and tabular benchmarks rather than being forced by internal definitions or fitted inputs. The derivation chain therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach assumes a pre-trained diffusion model already encodes useful statistical structure for arbitrary OOD types and that the chosen statistics at discrete noise levels are sufficient without further justification.

axioms (1)
  • Domain assumption: Pre-trained diffusion models capture statistical properties of the training distribution that remain informative for out-of-distribution characterization at multiple noise levels.
    Invoked when the method extracts features from the denoising process without retraining or fine-tuning the diffusion model.



