Out-of-Distribution Detection Using Neural Rendering Generative Models

Anima Anandkumar; Richard G. Baraniuk; Sihui Dai; Tan Nguyen; Yujia Huang

arXiv:1907.04572 · v1 · pith:F2NK4TD2new · submitted 2019-07-10 · 💻 cs.LG · cs.CV · stat.ML


Yujia Huang, Sihui Dai, Tan Nguyen, Richard G. Baraniuk, Anima Anandkumar

Pith reviewed 2026-05-24 23:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords: out-of-distribution detection · neural rendering models · generative models · latent likelihood · CIFAR-10 · SVHN · deep generative models


The pith

Neural rendering models detect out-of-distribution data by assigning lower joint likelihood to latent variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that neural rendering generative models can perform out-of-distribution detection by using the joint likelihood of their latent variables as the primary metric. Unlike standard likelihood or reconstruction-loss methods in flow models and VAEs, this approach correctly assigns lower likelihood to OoD samples that have smaller variance than the training data, such as SVHN images when the model is trained on CIFAR-10. The neural rendering model unifies the two prior approaches because it supplies both a likelihood estimate and layer-wise reconstruction. A sympathetic reader would care because this fixes a documented failure mode that has limited the use of generative models for reliable OoD detection in practice. The metric is reported to remain consistent when tested on additional OoD datasets.
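To see why lower-variance OoD data can fool a plain likelihood score, consider a toy stand-in that is not the paper's model: a single isotropic Gaussian fitted to unit-variance "training" data. Samples drawn from a narrower distribution around the same mean sit closer to the mode, so they receive a higher average log-likelihood. This is the same ordering the abstract reports for flows and VAEs on SVHN versus CIFAR-10. The dimensionality, sample sizes and variances below are arbitrary illustrative choices.

```python
import numpy as np


def gaussian_log_likelihood(x, mean, std):
    """Log-density of each row of x under an isotropic Gaussian N(mean, std^2 I)."""
    dim = x.shape[1]
    squared_dist = np.sum((x - mean) ** 2, axis=1)
    return -0.5 * dim * np.log(2 * np.pi * std**2) - 0.5 * squared_dist / std**2


rng = np.random.default_rng(0)
dim = 64  # illustrative dimensionality, not an image size

# "Training" set with unit variance; the density model is a Gaussian fitted to it.
train = rng.normal(0.0, 1.0, size=(2000, dim))
mean, std = train.mean(), train.std()

# Held-out in-distribution inputs vs. a lower-variance "OoD" set with the same center.
in_dist = rng.normal(0.0, 1.0, size=(500, dim))
ood_low_var = rng.normal(0.0, 0.5, size=(500, dim))

ll_in = gaussian_log_likelihood(in_dist, mean, std).mean()
ll_ood = gaussian_log_likelihood(ood_low_var, mean, std).mean()

# The lower-variance set lies closer to the mode, so it gets the HIGHER likelihood.
print(f"in-distribution mean log-likelihood:   {ll_in:.1f}")
print(f"low-variance OoD mean log-likelihood: {ll_ood:.1f}")
```

With these settings the gap is, in expectation, roughly dim × (1 − 0.5²) / 2 = 24 nats in favor of the OoD set, which is exactly the failure mode the NRM latent-likelihood metric is claimed to avoid.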

Core claim

The neural rendering model unifies likelihood-based and reconstruction-based OoD detection by providing both in one architecture; among the derived metrics, the joint likelihood of the latent variables is the most effective and consistently assigns lower likelihood to OoD data with smaller variance, such as SVHN images after training on CIFAR-10, while also working across other OoD test sets.

What carries the argument

Joint likelihood of latent variables, computed by the neural rendering model to quantify how well an input fits the learned distribution while incorporating per-layer reconstruction.
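A minimal scoring sketch follows, assuming a trained model (hypothetical here) already returns, for each input, one log-probability per layer for the inferred latents z*(ℓ) given the predicted label y*. Summing these log terms is the log of their product, which matches the "product of per-layer conditional likelihoods" description given in the simulated rebuttal. The quantile threshold is an added assumption for turning scores into decisions; the paper's figures report histograms rather than a specific thresholding rule. All function names are illustrative.

```python
import numpy as np


def joint_latent_log_likelihood(per_layer_log_probs):
    """Combine per-layer latent log-probabilities into one joint score per input.

    per_layer_log_probs has shape (num_inputs, num_layers). Entry [i, l] is assumed
    to be log p(z*(l) | higher-layer latents, y*) for input i, produced by some
    trained model (hypothetical). Summing log terms equals multiplying likelihoods.
    """
    return np.sum(np.asarray(per_layer_log_probs, dtype=float), axis=1)


def fit_threshold(in_dist_val_scores, false_positive_rate=0.05):
    """Threshold below which about `false_positive_rate` of in-distribution validation scores fall."""
    return np.quantile(in_dist_val_scores, false_positive_rate)


def flag_ood(scores, threshold):
    """Inputs whose joint latent log-likelihood is below the threshold are flagged as OoD."""
    return np.asarray(scores) < threshold


# Toy usage with made-up numbers: 3 test inputs, 3 layers each.
test_scores = joint_latent_log_likelihood([[-1.0, -2.0, -0.5],
                                           [-0.2, -0.3, -0.5],
                                           [-4.0, -3.0, -2.5]])
val_scores = np.array([-3.0, -2.0, -1.0, -4.0, -2.5])
threshold = fit_threshold(val_scores, false_positive_rate=0.2)
flags = flag_ood(test_scores, threshold)
```

Working in log space avoids numerical underflow: a product of many small per-layer likelihoods quickly rounds to zero in floating point, whereas the sum of their logs stays well behaved.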

If this is right

The joint latent likelihood metric succeeds on smaller-variance OoD data where flow-based and VAE likelihoods fail.
The same metric remains effective when evaluated on OoD datasets other than SVHN.
A single neural rendering model supplies both likelihood estimates and layer-wise reconstruction for OoD scoring.
Existing generative-model OoD methods are limited by their inability to handle variance mismatch between training and test distributions.
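The layer-wise reconstruction side can be sketched as follows, assuming access to each layer's bottom-up activation and the model's top-down reconstruction of it (layer 0 being the pixel level, as in Figure 5). Mean squared error is an assumed choice of loss; the exact reconstruction loss NRM uses is not given in this text, and the function name is hypothetical.

```python
import numpy as np


def per_layer_reconstruction_loss(activations, reconstructions):
    """Mean squared error between each layer's bottom-up activation and its top-down reconstruction.

    activations, reconstructions: lists with one array per layer (index 0 = pixels),
    each of shape (num_inputs, ...). Returns an array of shape (num_inputs, num_layers).
    """
    if len(activations) != len(reconstructions):
        raise ValueError("need one reconstruction per layer")
    losses = []
    for h, h_hat in zip(activations, reconstructions):
        diff = np.asarray(h, dtype=float) - np.asarray(h_hat, dtype=float)
        diff = diff.reshape(diff.shape[0], -1)  # flatten everything except the batch axis
        losses.append(np.mean(diff**2, axis=1))
    return np.stack(losses, axis=1)


# Toy usage: 2 inputs, a 2-unit layer 0 and a 1x1 layer 1 (made-up values).
acts = [np.array([[1.0, 2.0], [0.0, 0.0]]), np.array([[[1.0]], [[3.0]]])]
recs = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[[0.0]], [[3.0]]])]
loss_table = per_layer_reconstruction_loss(acts, recs)
```

Each row of the result gives one input's reconstruction profile across depth, which can be inspected alongside its joint latent log-likelihood.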

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Architectures that embed reconstruction at multiple layers may be inherently better suited for OoD tasks than pure likelihood models.
The consistency across multiple OoD sets suggests the metric captures distribution mismatch rather than dataset-specific features.
The approach could be tested by swapping in other generative backbones that also produce layered reconstructions to check whether the metric transfers.

Load-bearing premise

The trained neural rendering model produces latent likelihood values whose ordering reliably separates in-distribution from out-of-distribution samples without being driven by architecture-specific artifacts or the particular choice of test sets.

What would settle it

An experiment in which the joint latent likelihood is higher, on average, for SVHN images than for CIFAR-10 images when the model is trained on CIFAR-10 would falsify the central claim.
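The falsification test above compares average scores. A sketch of that check appears below, paired with AUROC, a standard threshold-free summary of how well the two score distributions separate. AUROC is an added suggestion rather than a metric confirmed from the paper, and the function names and toy scores are hypothetical.

```python
import numpy as np


def auroc_in_vs_ood(in_scores, ood_scores):
    """Probability that a random in-distribution input outscores a random OoD input (ties count 1/2).

    With "higher score = more in-distribution", this equals the area under the ROC curve.
    """
    in_col = np.asarray(in_scores, dtype=float)[:, None]   # shape (n, 1)
    ood_row = np.asarray(ood_scores, dtype=float)[None, :]  # shape (1, m)
    greater = (in_col > ood_row).mean()  # fraction of all n*m pairs won by in-distribution
    ties = (in_col == ood_row).mean()
    return greater + 0.5 * ties


def claim_is_falsified(in_scores, ood_scores):
    """The central claim fails if OoD inputs receive a higher average joint latent likelihood."""
    return bool(np.mean(ood_scores) > np.mean(in_scores))


# Toy scores (made up): in-distribution vs. OoD joint latent log-likelihoods.
in_scores = [-10.0, -12.0, -11.0, -9.0]
ood_scores = [-15.0, -11.0, -20.0]
auc = auroc_in_vs_ood(in_scores, ood_scores)
falsified = claim_is_falsified(in_scores, ood_scores)
```

A favorable mean comparison alone does not guarantee useful separation, since heavily overlapping histograms can still differ in mean; AUROC near 0.5 would flag that case. The pairwise form uses O(n·m) memory, so for large test sets a rank-based implementation such as scikit-learn's `roc_auc_score` is more practical.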

Figures

Figures reproduced from arXiv:1907.04572 by Anima Anandkumar, Richard G. Baraniuk, Sihui Dai, Tan Nguyen, Yujia Huang.

**Figure 2.** (a) Graphical model of NRM. Start from object category, intermediate rendered images …

**Figure 3.** (a) Top 25 images with highest and lowest likelihood from NRM. Log-likelihood histograms …

**Figure 4.** (a) Histograms of p({z*(ℓ)}_{ℓ=1:L} | y*) of CIFAR-10 and CIFAR-100 using NRM trained on CIFAR-10. We observe that CIFAR-10 shares a large overlap with CIFAR-100. (b) Left: a histogram of p({z*(ℓ)}_{ℓ=1:L} | y*) of CIFAR-10 automobile, CIFAR-10 truck, and CIFAR-100 pickup truck. Right: a histogram of p({z*(ℓ)}_{ℓ=1:L} | y*) for the girl category of CIFAR-100 and CelebA. We see that the distribution of p({z*(ℓ) …

**Figure 5.** Reconstruction loss at different layers. Layer 0 to Layer 9: pixel level to one layer above …

Original abstract

Out-of-distribution (OoD) detection is a natural downstream task for deep generative models, due to their ability to learn the input probability distribution. There are mainly two classes of approaches for OoD detection using deep generative models, viz., based on likelihood measure and the reconstruction loss. However, both approaches are unable to carry out OoD detection effectively, especially when the OoD samples have smaller variance than the training samples. For instance, both flow based and VAE models assign higher likelihood to images from SVHN when trained on CIFAR-10 images. We use a recently proposed generative model known as neural rendering model (NRM) and derive metrics for OoD. We show that NRM unifies both approaches since it provides a likelihood estimate and also carries out reconstruction in each layer of the neural network. Among various measures, we found the joint likelihood of latent variables to be the most effective one for OoD detection. Our results show that when trained on CIFAR-10, lower likelihood (of latent variables) is assigned to SVHN images. Additionally, we show that this metric is consistent across other OoD datasets. To the best of our knowledge, this is the first work to show consistently lower likelihood for OoD data with smaller variance with deep generative models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

NRM joint latent likelihood appears to reverse the usual SVHN-over-CIFAR likelihood ordering for OoD, but the abstract supplies no numbers or implementation details to judge whether it actually works.


The main point is that this paper takes a neural rendering model and shows its joint latent likelihood assigns lower scores to SVHN than to CIFAR-10, which is the direction we want for OoD detection and the opposite of what flows and VAEs produce. They also test consistency on other OoD sets and argue the NRM structure lets them combine likelihood with per-layer reconstruction so they can compare metrics directly. That part is straightforward and addresses a known practical failure mode.

Referee Report

2 major / 1 minor

Summary. The paper claims that Neural Rendering Models (NRM) unify likelihood and reconstruction approaches for OoD detection and that the joint likelihood over latent variables is the most effective metric, consistently assigning lower likelihood to OoD samples with smaller variance (e.g., SVHN vs. CIFAR-10) where standard generative models fail, with consistency shown across additional OoD datasets.

Significance. If the quantitative results and controls hold, the work would be significant for addressing a documented failure mode of likelihood-based OoD detection in deep generative models. The unification via NRM's layered structure is a conceptual strength, and a metric that reliably separates lower-variance OoD data would be a useful advance if shown to be robust rather than architecture- or dataset-specific.

major comments (2)

[Abstract] The central claim that the joint likelihood of latent variables is the most effective OoD metric and 'consistently assigns lower likelihood to OoD data with smaller variance' is unsupported by any reported scores, baseline comparisons (e.g., to VAE or flow likelihoods), statistical tests, or error bars; the data-to-claim link cannot be evaluated from the given text.
[Abstract] The precise definition of the joint likelihood (product of which conditionals? marginals? aggregation across which layers of the NRM?) is not supplied. This is load-bearing for determining whether the reported ordering arises from improved density estimation or from unexamined architecture-specific artifacts or low-level statistics of the chosen test sets.

minor comments (1)

The manuscript should add a dedicated subsection detailing the exact computation of the joint latent likelihood, including any normalization steps across datasets.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript to improve clarity.


Referee: [Abstract] The central claim that the joint likelihood of latent variables is the most effective OoD metric and 'consistently assigns lower likelihood to OoD data with smaller variance' is unsupported by any reported scores, baseline comparisons (e.g., to VAE or flow likelihoods), statistical tests, or error bars; the data-to-claim link cannot be evaluated from the given text.

Authors: We agree the abstract's brevity prevents inclusion of specific numerical scores, error bars, or statistical tests. The main manuscript body contains the quantitative results, including direct comparisons showing NRM joint latent likelihood assigns lower values to SVHN than CIFAR-10 (unlike VAE and flow baselines) with consistency across additional OoD sets. We will revise the abstract to reference these experimental findings and the supporting figures more explicitly. Revision: yes.
Referee: [Abstract] The precise definition of the joint likelihood (product of which conditionals? marginals? aggregation across which layers of the NRM?) is not supplied. This is load-bearing for determining whether the reported ordering arises from improved density estimation or from unexamined architecture-specific artifacts or low-level statistics of the chosen test sets.

Authors: We will revise the abstract to supply the definition: the joint likelihood is the product of the per-layer conditional likelihoods of the latent variables in the NRM's hierarchical rendering structure. This will clarify that the metric aggregates across layers rather than using a single marginal. Revision: yes.
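Written out, the description "product of the per-layer conditional likelihoods" corresponds to the chain rule of probability, evaluated at the inferred latents z*(ℓ) and the predicted label y*. The conditioning set shown is the generic top-down one; which variables each NRM conditional actually depends on is not specified in this text, so treat the exact form as an assumption.

```latex
\log p\big(\{z^{*(\ell)}\}_{\ell=1}^{L} \,\big|\, y^{*}\big)
  = \sum_{\ell=1}^{L} \log p\big(z^{*(\ell)} \,\big|\, z^{*(\ell+1)}, \ldots, z^{*(L)}, y^{*}\big)
```

Here the ℓ = L term conditions on y* alone. If generation were Markov from layer to layer (again an assumption), each term would reduce to log p(z*(ℓ) | z*(ℓ+1), y*).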

Circularity Check

0 steps flagged

No significant circularity; OoD metric is an empirical evaluation of NRM latents


The paper's central claim is an experimental finding that the joint likelihood over NRM latent variables separates CIFAR-10 from SVHN (and other OoD sets) more reliably than likelihood or reconstruction baselines. This ordering is obtained by direct computation from the trained generative model rather than by fitting a parameter to the target OoD labels or by renaming a known result. The NRM itself is referenced as prior work; no load-bearing uniqueness theorem, self-citation chain, or ansatz is invoked to force the reported superiority. The derivation therefore remains self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are identifiable from the provided text. The NRM is described as recently proposed, so its internal modeling assumptions are not detailed here.

pith-pipeline@v0.9.0 · 5775 in / 1117 out tokens · 27686 ms · 2026-05-24T23:33:57.641187+00:00


Reference graph

Works this paper leans on

17 extracted references

[1] An, J. and Cho, S. (2015). Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2:1–18.

[2] Choi, H. and Jang, E. (2018). Generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392.

[3] Friston, K. (2018). Does predictive coding have a future? Nature Neuroscience, 21(8):1019.

[4] Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1321–1330. JMLR.org.

[5] Hendrycks, D., Mazeika, M., and Dietterich, T. G. (2018). Deep anomaly detection with outlier exposure. CoRR, abs/1812.04606.

[6] Ho, N., Nguyen, T., Patel, A. B., Anandkumar, A., Jordan, M. I., and Baraniuk, R. G. (2018). Neural rendering model: Joint generation and prediction for semi-supervised learning. CoRR, abs/1811.02657.

[7] Kingma, D. P. and Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pages 10215–10224.

[8] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, Citeseer.

[9] Li, D., Chen, D., Goh, J., and Ng, S.-k. (2018). Anomaly detection with generative adversarial networks for multivariate time series. arXiv preprint arXiv:1809.04758.

[10] Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV).

[11] Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. (2018). Do deep generative models know what they don't know? arXiv preprint arXiv:1810.09136.

[12] Nalisnick, E. T., Matsukawa, A., Teh, Y. W., Görür, D., and Lakshminarayanan, B. (2019). Hybrid models with deep and invertible features. CoRR, abs/1902.02767.

[13] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning.

[14] Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806.

[15] Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, abs/1311.2901.

[16] Zenati, H., Foo, C. S., Lecouat, B., Manek, G., and Chandrasekhar, V. R. (2018). Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222.

[17] Zhou, C. and Paffenroth, R. C. (2017). Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 665–674. ACM.
