The Signal in the Noise: OOD Detection Through Goodness-of-Fit Testing in Factorised Latent Spaces

Henry Gouk; Jack Geary; Philipp Bomatter

arxiv: 2605.22496 · v1 · pith:NQHQEQAGnew · submitted 2026-05-21 · 💻 cs.LG

The Signal in the Noise: OOD Detection Through Goodness-of-Fit Testing in Factorised Latent Spaces

Philipp Bomatter , Jack Geary , Henry Gouk This is my paper

Pith reviewed 2026-05-22 08:04 UTC · model grok-4.3

classification 💻 cs.LG

keywords OOD detectioncontinuous normalizing flowsgoodness-of-fit testinglatent spacegenerative modelsout-of-distributionlikelihood unreliability

0 comments

The pith

Continuous normalizing flows map out-of-distribution samples to atypical noise under the prior, enabling single-sample detection without relying on likelihood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that out-of-distribution samples are mapped by continuous normalizing flows to noise that is atypical under the prior distribution, an effect not captured by likelihood values alone. This insight leads to the Signal in the Noise method for OOD detection using goodness-of-fit tests on the latent noise. The approach works at the single-sample level and avoids needing any out-of-distribution examples during training or testing. Readers care because it offers better reliability than standard likelihood methods while keeping computation low and allowing control over error rates.

Core claim

The diffeomorphic and mass-preserving properties of continuous normalizing flows cause OOD samples to be mapped to noise samples that are highly atypical under the noise prior in ways not captured by the likelihood. The proposed Signal in the Noise (SITN) method leverages this by performing goodness-of-fit testing in factorised latent spaces for single-sample OOD detection.

What carries the argument

Goodness-of-fit testing applied to the noise samples obtained by inverting a continuous normalizing flow on an input.

If this is right

SITN requires no access to OOD data.
The method adds minimal computational overhead beyond standard likelihood evaluation.
It provides strict control over false positive rates.
Evaluations show no complexity bias toward simpler in-distribution samples.
It performs well on both standard benchmarks and synthetic perturbations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The atypical noise mapping might generalize to other invertible models beyond continuous flows.
This could be extended to multi-sample or batch detection scenarios for even stronger statistical power.
Applications in safety-critical systems could benefit from the controlled false positive rates.

Load-bearing premise

The diffeomorphic and mass-preserving properties of continuous normalizing flows map OOD inputs to atypical noise under the prior independently of likelihood.

What would settle it

A dataset where OOD samples consistently produce noise samples that pass goodness-of-fit tests at rates similar to in-distribution samples would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.22496 by Henry Gouk, Jack Geary, Philipp Bomatter.

**Figure 2.** Figure 2: OOD detection performance across the CIFAR-10-C corruptions in terms of AUROC. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Images with the highest and lowest OOD scores for each method in the CIFAR-10 [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Visualisation of images along with their corresponding noise samples. Images from the [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Ablation showing the OOD detection performance of the individual statistics [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Complexity bias of OOD detection metrics. Scatter plots showing the relationship between [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗

**Figure 7.** Figure 7: Log-Likelihood distributions across the different datasets. Plot titles indicate what dataset [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: Images with the highest and lowest OOD scores for each method in the CIFAR-10 [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

**Figure 9.** Figure 9: Distribution of the ensemble mean and variance of the log-likelihoods across the CIFAR-10 [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

**Figure 10.** Figure 10: Calibration curves showing the actual false positive rate (FPR)—the fraction of in [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

**Figure 11.** Figure 11: Top 150 samples with highest log-likelihood OOD score (lowest log-likelihoods) for the [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗

**Figure 12.** Figure 12: Top 150 samples with highest Typicality OOD score for the CIFAR-10 [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗

**Figure 13.** Figure 13: Top 150 samples with highest DoSE OOD score for the CIFAR-10 [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗

**Figure 14.** Figure 14: Top 150 samples with highest SITN OOD score for the CIFAR-10 [PITH_FULL_IMAGE:figures/full_fig_p025_14.png] view at source ↗

read the original abstract

Deep generative models offer a natural foundation for out-of-distribution (OOD) detection, yet prior work has shown that their assigned likelihoods are notoriously unreliable indicators for in- vs out-of-distribution data. In this paper, we address this problem by leveraging the diffeomorphic and mass-preserving properties of continuous normalising flows. Our analysis shows that OOD samples are mapped to noise samples that are highly atypical under the noise prior in ways not captured by the likelihood. Based on this observation, we propose a new method -- Signal in the Noise (SITN) -- for OOD detection on the single-sample level. SITN requires no access to OOD data, incurs minimal computational overhead, and provides strict control of false positive rates. Comprehensive evaluations through standard benchmarks and synthetic perturbations highlight the method's effectiveness and the absence of the complexity bias inherent to likelihood-based methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SITN reframes OOD detection as a per-dimension goodness-of-fit test on CNF noise variables, which cleanly avoids likelihood bias but rests on an assumption of exact prior matching that finite training usually violates.

read the letter

The main takeaway is that this paper turns the latent noise of a trained continuous normalizing flow into a test statistic for OOD detection. Instead of relying on the overall likelihood, it checks whether each dimension of the base noise looks like it came from the standard normal using CDF transforms and a combined p-value, then flags points whose noise is atypical. This is presented as a single-sample procedure that needs no OOD examples and claims exact false-positive control at a chosen alpha level.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Signal in the Noise (SITN), an OOD detection method that exploits the diffeomorphic and mass-preserving properties of continuous normalizing flows. OOD inputs are mapped to highly atypical points under the base noise prior in a factorised latent space; these atypicalities are detected via per-dimension CDF transforms to uniform p-values followed by a combination statistic (e.g., Fisher), yielding a single-sample test that requires no OOD data, incurs negligible overhead, and claims exact false-positive-rate control at any nominal alpha.

Significance. If the exact FPR guarantee and the separation from likelihood-based complexity bias both hold, the work would supply a theoretically grounded, low-cost alternative to existing generative-model OOD detectors and could influence downstream applications that need calibrated type-I error without access to outlier examples.

major comments (2)

[§3] §3 (SITN construction) and the abstract: the claim of 'strict control of false positive rates' rests on the premise that, after training, the push-forward measure of held-out ID data under the flow is exactly the base prior (standard normal). Because the flow is obtained by maximum-likelihood estimation on finite data, the empirical marginals in the factorised latent space deviate from the prior; consequently the transformed p-values are not exactly uniform and the combined statistic does not possess the nominal null distribution. This directly affects the central guarantee of alpha-level thresholding without OOD data.
[§4] §4 (Empirical evaluation): the reported benchmark results do not include a calibration check that compares the observed type-I error on held-out ID data against the nominal alpha across multiple thresholds. Without such a diagnostic, it is impossible to verify whether the finite-sample mismatch identified above remains negligible in the regimes tested.

minor comments (2)

The precise form of the per-dimension goodness-of-fit transform and the p-value combination rule should be stated explicitly (including any continuity corrections) so that the null distribution can be reproduced.
Figure captions and axis labels in the synthetic-perturbation experiments would benefit from explicit indication of which panels correspond to ID versus OOD samples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We respond to each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [§3] §3 (SITN construction) and the abstract: the claim of 'strict control of false positive rates' rests on the premise that, after training, the push-forward measure of held-out ID data under the flow is exactly the base prior (standard normal). Because the flow is obtained by maximum-likelihood estimation on finite data, the empirical marginals in the factorised latent space deviate from the prior; consequently the transformed p-values are not exactly uniform and the combined statistic does not possess the nominal null distribution. This directly affects the central guarantee of alpha-level thresholding without OOD data.

Authors: We agree that the finite-sample MLE training of the flow means the push-forward of held-out ID data is not exactly the base prior, so the uniformity of the p-values and the exact null distribution of the combined statistic hold only asymptotically or conditional on a perfectly trained model. Our derivation in §3 proceeds under the standard population-level assumption that the flow has recovered the base distribution exactly. In practice the mismatch is small for the dataset sizes and model capacities used, but we acknowledge the referee's point is valid. In the revision we will update the abstract and §3 to state that the FPR control is exact with respect to the learned flow (i.e., if test data were drawn from the model's implied distribution) and add a short paragraph discussing finite-sample deviations and their expected magnitude. revision: partial
Referee: [§4] §4 (Empirical evaluation): the reported benchmark results do not include a calibration check that compares the observed type-I error on held-out ID data against the nominal alpha across multiple thresholds. Without such a diagnostic, it is impossible to verify whether the finite-sample mismatch identified above remains negligible in the regimes tested.

Authors: We agree that an explicit calibration diagnostic is needed to quantify how closely the observed type-I error tracks the nominal alpha. In the revised manuscript we will add, in §4, a calibration plot (or table) that reports the empirical false-positive rate on held-out in-distribution data for several nominal levels (0.01, 0.05, 0.10) across all benchmarks. This will directly address the finite-sample concern raised in the previous comment. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation rests on standard CNF properties without reduction to fitted inputs or self-citations

full rationale

The paper grounds its atypicality claim and strict FPR control directly in the diffeomorphic and mass-preserving properties of continuous normalizing flows, which are standard mathematical facts independent of the present work. No equations or sections reduce the detection statistic to a parameter fitted on the target task, nor does the central argument rely on a load-bearing self-citation chain or imported uniqueness theorem. The method is presented as a direct consequence of the flow's change-of-variables formula applied to goodness-of-fit testing in the latent space, with no renaming of known results or smuggling of ansatzes. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard mathematical properties of continuous normalizing flows and introduces a new statistical testing procedure on the resulting noise; no new entities are postulated.

free parameters (1)

false-positive-rate threshold
Chosen to achieve strict control of false positives; specific operating point may be set per application.

axioms (1)

domain assumption Continuous normalizing flows are diffeomorphic and mass-preserving.
Invoked to justify that OOD samples map to atypical noise not captured by likelihood.

pith-pipeline@v0.9.0 · 5682 in / 1249 out tokens · 32467 ms · 2026-05-22T08:04:57.647979+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We quantify this atypicality at the single-sample level by exploiting the completely factorised nature of the Gaussian noise prior... Anderson-Darling statistic... coefficient of variation of the empirical power spectrum

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 3 internal anchors

[1]

11 Nicholas Baker, Hongjing Lu, Gennady Erlikhman, and Philip J

Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images”. In:Proc of CVPR. 2015, pp. 427–436. DOI:10.1109/CVPR.2015.7298640

work page doi:10.1109/cvpr.2015.7298640 2015
[2]

Neural Ordinary Differential Equations

Tian Qi Chen et al. “Neural Ordinary Differential Equations”. In:Proc. of NeurIPS. 2018, pp. 6572–6583

work page 2018
[3]

Flow Matching for Generative Modeling

Yaron Lipman et al. “Flow Matching for Generative Modeling”. In:Proc. of ICLR. 2023

work page 2023
[4]

Do Deep Generative Models Know What They Don’t Know?

Eric T. Nalisnick et al. “Do Deep Generative Models Know What They Don’t Know?” In: Proc. of ICLR. 2019

work page 2019
[5]

Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models

Joan Serrà et al. “Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models”. In:Proc. of ICLR. 2020

work page 2020
[6]

Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features

Robin Schirrmeister et al. “Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features”. In:Proc. of NeurIPS. 2020

work page 2020
[7]

Why Normalizing Flows Fail to Detect Out-of-Distribution Data

Polina Kirichenko, Pavel Izmailov, and Andrew Gordon Wilson. “Why Normalizing Flows Fail to Detect Out-of-Distribution Data”. In:Proc. of NeurIPS. 2020

work page 2020
[8]

Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality,

Eric Nalisnick et al. “Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality”. In:ArXiv preprintabs/1906.02994 (2019)

work page arXiv 1906
[9]

Likelihood Ratios for Out-of-Distribution Detection

Jie Ren et al. “Likelihood Ratios for Out-of-Distribution Detection”. In:Proc. of NeurIPS. 2019, pp. 14680–14691

work page 2019
[10]

Likelihood Regret: An Out-of-Distribution Detec- tion Score For Variational Auto-encoder

Zhisheng Xiao, Qing Yan, and Yali Amit. “Likelihood Regret: An Out-of-Distribution Detec- tion Score For Variational Auto-encoder”. In:Proc. of NeurIPS. 2020

work page 2020
[11]

Density of States Estimation for Out of Distribution Detection

Warren R. Morningstar et al. “Density of States Estimation for Out of Distribution Detection”. In:Proc. of AISTATS. V ol. 130. Proceedings of Machine Learning Research. 2021, pp. 3232– 3240

work page 2021
[12]

Hierarchical V AEs Know What They Don’t Know

Jakob Drachmann Havtorn et al. “Hierarchical V AEs Know What They Don’t Know”. In:Proc. of ICML. V ol. 139. Proceedings of Machine Learning Research. 2021, pp. 4117–4128

work page 2021
[13]

A Geometric Explanation of the Likelihood OOD Detection Paradox

Hamidreza Kamkari et al. “A Geometric Explanation of the Likelihood OOD Detection Paradox”. In:Proc. of ICML. 2024

work page 2024
[14]

Hopcroft, and Ravindran Kannan.Foundations of data science

Avrim Blum, John E. Hopcroft, and Ravindran Kannan.Foundations of data science. 2020. ISBN: 978-1-108-48506-7 978-1-108-75552-8.DOI:10.1017/9781108755528

work page doi:10.1017/9781108755528 2020
[15]

Understanding Failures in Out-of- Distribution Detection with Deep Generative Models

Lily H. Zhang, Mark Goldstein, and Rajesh Ranganath. “Understanding Failures in Out-of- Distribution Detection with Deep Generative Models”. In:Proc. of ICML. V ol. 139. Proceed- ings of Machine Learning Research. 2021, pp. 12427–12436

work page 2021
[16]

WAIC, but Why? Generative Ensembles for Robust Anomaly Detection

Hyunsun Choi, Eric Jang, and Alexander A. Alemi. “W AIC, but Why? Generative Ensembles for Robust Anomaly Detection”. In:ArXiv preprintabs/1810.01392 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[17]

Revisiting flow generative models for Out-of- distribution detection

Dihong Jiang, Sun Sun, and Yaoliang Yu. “Revisiting flow generative models for Out-of- distribution detection”. In:Proc. of ICLR. 2022

work page 2022
[18]

Out-of-Distribution Detection with a Single Unconditional Diffusion Model

Alvin Heng, Alexandre H. Thiery, and Harold Soh. “Out-of-Distribution Detection with a Single Unconditional Diffusion Model”. In:Proc. of NeurIPS. 2024

work page 2024
[19]

Marcus Hutter.Testing Independence of Exchangeable Random Variables. Oct. 2022.DOI: 10.48550/arXiv.2210.12392

work page doi:10.48550/arxiv.2210.12392 2022
[20]

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Dan Hendrycks and Thomas G. Dietterich. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”. In:Proc. of ICLR. 2019

work page 2019
[21]

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising Diffusion Probabilistic Models”. In: Proc. of NeurIPS. 2020

work page 2020
[22]

Diffusion Models Beat GANs on Image Synthesis

Prafulla Dhariwal and Alexander Quinn Nichol. “Diffusion Models Beat GANs on Image Synthesis”. In:Proc. of NeurIPS. 2021, pp. 8780–8794

work page 2021
[23]

doi: https://doi.org/ 10.1016/0771-050X(80)90013-3

J. R. Dormand and P. J. Prince. “A family of embedded Runge-Kutta formulae”. In:Journal of Computational and Applied Mathematics6.1 (1980), pp. 19–26.ISSN: 0377-0427.DOI: 10.1016/0771-050X(80)90013-3

work page doi:10.1016/0771-050x(80)90013-3 1980
[24]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. “Learning multiple layers of features from tiny images” (2009)

work page 2009
[25]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer et al. “Reading digits in natural images with unsupervised feature learning”. In: NIPS workshop on deep learning and unsupervised feature learning. V ol. 2011. 2. 2011, p. 4. 11

work page 2011
[26]

Deep Learning Face Attributes in the Wild

Ziwei Liu et al. “Deep Learning Face Attributes in the Wild”. In:Proc. of ICCV. 2015, pp. 3730–3738.DOI:10.1109/ICCV.2015.425

work page doi:10.1109/iccv.2015.425 2015
[27]

Rania Briq et al.The Amazing Stability of Flow Matching. Apr. 2026.DOI: 10.48550/arXiv. 2604.16079

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2026
[28]

Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator

Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator”. In:The Annals of Mathematical Statistics(1956), pp. 642–669

work page 1956
[29]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In:Proc. of NeurIPS. 2019, pp. 8024–8035

work page 2019
[30]

Flow Matching Guide and Code

Yaron Lipman et al. “Flow Matching Guide and Code”. In:ArXiv preprintabs/2412.06264 (2024). 12 A Metric Ablations Bright Contrast Defocus Elastic Fog Frost Gauss B Gauss N Glass Impulse JPEG Motion Pixel Saturate Shot Snow Spatter Speckle Zoom SITN AD CV 0.82 0.60 0.73 0.74 0.64 0.88 0.75 0.91 0.95 1.00 0.97 0.90 0.95 0.74 0.93 0.96 0.97 0.96 0.80 0.85 0....

work page internal anchor Pith review Pith/arXiv arXiv 2024

[1] [1]

11 Nicholas Baker, Hongjing Lu, Gennady Erlikhman, and Philip J

Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images”. In:Proc of CVPR. 2015, pp. 427–436. DOI:10.1109/CVPR.2015.7298640

work page doi:10.1109/cvpr.2015.7298640 2015

[2] [2]

Neural Ordinary Differential Equations

Tian Qi Chen et al. “Neural Ordinary Differential Equations”. In:Proc. of NeurIPS. 2018, pp. 6572–6583

work page 2018

[3] [3]

Flow Matching for Generative Modeling

Yaron Lipman et al. “Flow Matching for Generative Modeling”. In:Proc. of ICLR. 2023

work page 2023

[4] [4]

Do Deep Generative Models Know What They Don’t Know?

Eric T. Nalisnick et al. “Do Deep Generative Models Know What They Don’t Know?” In: Proc. of ICLR. 2019

work page 2019

[5] [5]

Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models

Joan Serrà et al. “Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models”. In:Proc. of ICLR. 2020

work page 2020

[6] [6]

Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features

Robin Schirrmeister et al. “Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features”. In:Proc. of NeurIPS. 2020

work page 2020

[7] [7]

Why Normalizing Flows Fail to Detect Out-of-Distribution Data

Polina Kirichenko, Pavel Izmailov, and Andrew Gordon Wilson. “Why Normalizing Flows Fail to Detect Out-of-Distribution Data”. In:Proc. of NeurIPS. 2020

work page 2020

[8] [8]

Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality,

Eric Nalisnick et al. “Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality”. In:ArXiv preprintabs/1906.02994 (2019)

work page arXiv 1906

[9] [9]

Likelihood Ratios for Out-of-Distribution Detection

Jie Ren et al. “Likelihood Ratios for Out-of-Distribution Detection”. In:Proc. of NeurIPS. 2019, pp. 14680–14691

work page 2019

[10] [10]

Likelihood Regret: An Out-of-Distribution Detec- tion Score For Variational Auto-encoder

Zhisheng Xiao, Qing Yan, and Yali Amit. “Likelihood Regret: An Out-of-Distribution Detec- tion Score For Variational Auto-encoder”. In:Proc. of NeurIPS. 2020

work page 2020

[11] [11]

Density of States Estimation for Out of Distribution Detection

Warren R. Morningstar et al. “Density of States Estimation for Out of Distribution Detection”. In:Proc. of AISTATS. V ol. 130. Proceedings of Machine Learning Research. 2021, pp. 3232– 3240

work page 2021

[12] [12]

Hierarchical V AEs Know What They Don’t Know

Jakob Drachmann Havtorn et al. “Hierarchical V AEs Know What They Don’t Know”. In:Proc. of ICML. V ol. 139. Proceedings of Machine Learning Research. 2021, pp. 4117–4128

work page 2021

[13] [13]

A Geometric Explanation of the Likelihood OOD Detection Paradox

Hamidreza Kamkari et al. “A Geometric Explanation of the Likelihood OOD Detection Paradox”. In:Proc. of ICML. 2024

work page 2024

[14] [14]

Hopcroft, and Ravindran Kannan.Foundations of data science

Avrim Blum, John E. Hopcroft, and Ravindran Kannan.Foundations of data science. 2020. ISBN: 978-1-108-48506-7 978-1-108-75552-8.DOI:10.1017/9781108755528

work page doi:10.1017/9781108755528 2020

[15] [15]

Understanding Failures in Out-of- Distribution Detection with Deep Generative Models

Lily H. Zhang, Mark Goldstein, and Rajesh Ranganath. “Understanding Failures in Out-of- Distribution Detection with Deep Generative Models”. In:Proc. of ICML. V ol. 139. Proceed- ings of Machine Learning Research. 2021, pp. 12427–12436

work page 2021

[16] [16]

WAIC, but Why? Generative Ensembles for Robust Anomaly Detection

Hyunsun Choi, Eric Jang, and Alexander A. Alemi. “W AIC, but Why? Generative Ensembles for Robust Anomaly Detection”. In:ArXiv preprintabs/1810.01392 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[17] [17]

Revisiting flow generative models for Out-of- distribution detection

Dihong Jiang, Sun Sun, and Yaoliang Yu. “Revisiting flow generative models for Out-of- distribution detection”. In:Proc. of ICLR. 2022

work page 2022

[18] [18]

Out-of-Distribution Detection with a Single Unconditional Diffusion Model

Alvin Heng, Alexandre H. Thiery, and Harold Soh. “Out-of-Distribution Detection with a Single Unconditional Diffusion Model”. In:Proc. of NeurIPS. 2024

work page 2024

[19] [19]

Marcus Hutter.Testing Independence of Exchangeable Random Variables. Oct. 2022.DOI: 10.48550/arXiv.2210.12392

work page doi:10.48550/arxiv.2210.12392 2022

[20] [20]

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Dan Hendrycks and Thomas G. Dietterich. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”. In:Proc. of ICLR. 2019

work page 2019

[21] [21]

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising Diffusion Probabilistic Models”. In: Proc. of NeurIPS. 2020

work page 2020

[22] [22]

Diffusion Models Beat GANs on Image Synthesis

Prafulla Dhariwal and Alexander Quinn Nichol. “Diffusion Models Beat GANs on Image Synthesis”. In:Proc. of NeurIPS. 2021, pp. 8780–8794

work page 2021

[23] [23]

doi: https://doi.org/ 10.1016/0771-050X(80)90013-3

J. R. Dormand and P. J. Prince. “A family of embedded Runge-Kutta formulae”. In:Journal of Computational and Applied Mathematics6.1 (1980), pp. 19–26.ISSN: 0377-0427.DOI: 10.1016/0771-050X(80)90013-3

work page doi:10.1016/0771-050x(80)90013-3 1980

[24] [24]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. “Learning multiple layers of features from tiny images” (2009)

work page 2009

[25] [25]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer et al. “Reading digits in natural images with unsupervised feature learning”. In: NIPS workshop on deep learning and unsupervised feature learning. V ol. 2011. 2. 2011, p. 4. 11

work page 2011

[26] [26]

Deep Learning Face Attributes in the Wild

Ziwei Liu et al. “Deep Learning Face Attributes in the Wild”. In:Proc. of ICCV. 2015, pp. 3730–3738.DOI:10.1109/ICCV.2015.425

work page doi:10.1109/iccv.2015.425 2015

[27] [27]

Rania Briq et al.The Amazing Stability of Flow Matching. Apr. 2026.DOI: 10.48550/arXiv. 2604.16079

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2026

[28] [28]

Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator

Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator”. In:The Annals of Mathematical Statistics(1956), pp. 642–669

work page 1956

[29] [29]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In:Proc. of NeurIPS. 2019, pp. 8024–8035

work page 2019

[30] [30]

Flow Matching Guide and Code

Yaron Lipman et al. “Flow Matching Guide and Code”. In:ArXiv preprintabs/2412.06264 (2024). 12 A Metric Ablations Bright Contrast Defocus Elastic Fog Frost Gauss B Gauss N Glass Impulse JPEG Motion Pixel Saturate Shot Snow Spatter Speckle Zoom SITN AD CV 0.82 0.60 0.73 0.74 0.64 0.88 0.75 0.91 0.95 1.00 0.97 0.90 0.95 0.74 0.93 0.96 0.97 0.96 0.80 0.85 0....

work page internal anchor Pith review Pith/arXiv arXiv 2024