pith. sign in

arxiv: 2402.17888 · v5 · pith:RUASZ4XOnew · submitted 2024-02-27 · 💻 cs.LG · cs.AI

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection

Pith reviewed 2026-05-25 08:33 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords out-of-distribution detectiondensity estimationBregman divergenceConjNormMonte Carlo estimatornorm coefficientexponential family distributions
0
0 comments X

The pith

ConjNorm reframes density estimation for out-of-distribution detection as optimization of a norm coefficient under Bregman divergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many OOD detection approaches rely on scores from logits or distances that may not capture true data density. The paper offers a unified view using Bregman divergence to cover exponential family distributions. It derives ConjNorm, turning density design into finding the right norm coefficient p for the dataset. A Monte Carlo importance sampling method provides an unbiased estimate of the needed partition function. This setup delivers better OOD detection across benchmarks.

Core claim

We propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a ConjNorm method, reframing density function design as a search for the optimal norm coefficient p against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique.

What carries the argument

The conjugation constraint from the Bregman divergence theorem that reframes density function design as a search for the optimal norm coefficient p.

Load-bearing premise

The conjugation constraint from the Bregman divergence theorem allows reframing density function design as a search for the optimal norm coefficient p against the given dataset.

What would settle it

A demonstration that the Monte Carlo estimator is biased on real datasets or that ConjNorm fails to outperform existing methods would falsify the performance claims.

Figures

Figures reproduced from arXiv: 2402.17888 by Bo Peng, Yadan Luo, Yixuan Li, Yonggang Zhang, Zhen Fang.

Figure 1
Figure 1. Figure 1: Illustration of the alignment of GEM score and true density of Gaussian (Left) and Gamma (Right) distributions. Distance-based OOD methods (Lee et al., 2017) target on deriving gθ(z, k) by assessing the proximity of the input to the k-th prototype µk. The selection of appropriate similarity met￾rics is crucial in capturing the intrinsic geomet￾ric data relationships. One of the most repre￾sentative metrics… view at source ↗
Figure 2
Figure 2. Figure 2: Evaluations of different partition function estimation baselines on ImageNet: Left: Mo [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Ablation study using feature extractions from (a) the first, (b) the second, and (c) the last [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Ablation study w.r.t varing sampling ratio [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparisons of varying q when p is fixed at 2.5 (Left) and 3.0 (Right) on CIFAR-100 [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
read the original abstract

Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimate scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient $p$ against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25$\%$ and 28.19$\%$ (FPR95) on CIFAR-100 and ImageNet-1K, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes ConjNorm for post-hoc OOD detection. Grounded in Bregman divergence, it extends considerations to an exponential family and uses a conjugation constraint to reframe density design as a search for the optimal norm coefficient p on the given dataset. An unbiased Monte Carlo importance-sampling estimator is introduced for the partition function to address normalization. Experiments across OOD benchmarks report new SOTA results, with FPR95 gains up to 13.25% on CIFAR-100 and 28.19% on ImageNet-1K over prior best methods.

Significance. If the Bregman theorem and unbiasedness of the estimator hold, the work supplies a unified theoretical lens on density-based OOD scores together with a tractable estimator, and the reported gains would represent a substantial empirical advance over existing logit-, distance-, and density-based detectors.

major comments (3)
  1. [Theoretical framework] The Bregman-divergence theorem and conjugation constraint (theoretical development section): the central claim that this constraint legitimately reframes density estimation as a search over p and yields a valid density (rather than a fitted score) cannot be assessed without the explicit theorem statement, proof, and any assumptions on the exponential family.
  2. [Estimator] Monte Carlo importance-sampling estimator for the partition function (method section): the assertion of unbiasedness is load-bearing for tractability and for attributing performance gains to the framework rather than to post-hoc fitting; the derivation, proposal distribution, and variance behavior in high-dimensional image regimes must be shown explicitly.
  3. [Experiments] Selection of the norm coefficient p (experimental protocol): the method searches for optimal p against the given dataset; it is unclear whether this search is performed solely on ID training data or involves validation/test splits, which would introduce circularity and undermine the claim that scores retain independent grounding.
minor comments (2)
  1. [Introduction/Theory] Notation for the exponential family and the resulting density should be introduced with explicit equations early in the theoretical section to aid readability.
  2. [Experiments] Table captions and axis labels on the main result figures should explicitly state the evaluation metric (FPR95) and the baselines being compared.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The three major comments identify areas where additional explicit detail would strengthen the manuscript. We address each point below and indicate the corresponding revisions.

read point-by-point responses
  1. Referee: [Theoretical framework] The Bregman-divergence theorem and conjugation constraint (theoretical development section): the central claim that this constraint legitimately reframes density estimation as a search over p and yields a valid density (rather than a fitted score) cannot be assessed without the explicit theorem statement, proof, and any assumptions on the exponential family.

    Authors: Theorem 1 in Section 3 states the conjugation constraint and its consequence for reframing the density as a search over the norm coefficient p within the exponential family. The full proof appears in Appendix A, under the assumption that the base measure is positive and the natural parameter space is convex. To improve accessibility we will move a concise statement of the theorem and the key proof steps into the main text of Section 3. revision: yes

  2. Referee: [Estimator] Monte Carlo importance-sampling estimator for the partition function (method section): the assertion of unbiasedness is load-bearing for tractability and for attributing performance gains to the framework rather than to post-hoc fitting; the derivation, proposal distribution, and variance behavior in high-dimensional image regimes must be shown explicitly.

    Authors: Section 4.2 derives the unbiased estimator via importance sampling with the in-distribution empirical measure as the proposal; unbiasedness follows directly from the standard Monte Carlo identity. We will insert the complete derivation, the explicit proposal distribution, and a short analysis of variance scaling with dimension into the main text of Section 4.2, together with additional high-dimensional variance diagnostics in the supplementary material. revision: yes

  3. Referee: [Experiments] Selection of the norm coefficient p (experimental protocol): the method searches for optimal p against the given dataset; it is unclear whether this search is performed solely on ID training data or involves validation/test splits, which would introduce circularity and undermine the claim that scores retain independent grounding.

    Authors: The search for p is performed exclusively on the ID training set (using an internal validation split carved from the training data) and never touches OOD or test data. This protocol is stated in Section 5.1. We will add an explicit sentence clarifying that no OOD or test information is used during p selection. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces its own Bregman divergence theorem and conjugation constraint to reframe density design as a search over norm coefficient p on the given (in-distribution) dataset, followed by an MC importance-sampling estimator for the partition function. This constitutes an explicit modeling choice and fitting procedure whose outputs are then evaluated on separate OOD benchmarks; the derivation chain does not reduce by construction to prior inputs, self-citations, or renamed known results. The central empirical claims rest on independent validation rather than tautological re-use of fitted quantities as predictions. No load-bearing self-citation or self-definitional step is present in the provided text.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; full derivations, assumptions, and experimental protocols unavailable. The ledger therefore records only elements explicitly named in the abstract.

free parameters (1)
  • norm coefficient p
    Described as searched against the given dataset to obtain the optimal value for the density function.
axioms (1)
  • domain assumption Bregman divergence framework extends distribution considerations to an exponential family of distributions
    Invoked as the grounding for the proposed theorem on conjugation constraints.

pith-pipeline@v0.9.0 · 5762 in / 1391 out tokens · 24901 ms · 2026-05-25T08:33:46.042749+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. On the Provable Importance of Gradients for Language-Assisted Image Clustering

    cs.CV 2025-10 unverdicted novelty 6.0

    GradNorm selects positive nouns via gradient magnitudes from cross-entropy loss, with an error bound proving it subsumes prior CLIP methods and delivers SOTA clustering results.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. [1]

    Line: Out-of-distribution detection by leveraging important neurons

    Yong Hyun Ahn, Gyeong-Moon Park, and Seong Tae Kim. Line: Out-of-distribution detection by leveraging important neurons. arXiv preprint arXiv:2303.13995, 2023

  2. [2]

    Building Normalizing Flows with Stochastic Interpolants

    Michael S Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. arXiv preprint arXiv:2209.15571, 2022

  3. [3]

    Exponential families and mixture families of probability distributions

    Shun-ichi Amari. Exponential families and mixture families of probability distributions. In Information Geometry and Its Applications, pp.\ 31--49. Springer, 2016

  4. [4]

    Relative loss bounds for on-line density estimation with the exponential family of distributions

    Katy S Azoury and Manfred K Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine learning, 43: 0 211--246, 2001

  5. [5]

    On the effectiveness of out-of-distribution data in self-supervised long-tail learning

    Jianhong Bai, Zuozhu Liu, Hualiang Wang, Jin Hao, Yang Feng, Huanpeng Chu, and Haoji Hu. On the effectiveness of out-of-distribution data in self-supervised long-tail learning. arXiv preprint arXiv:2306.04934, 2023

  6. [6]

    Density modeling of images using a generalized normalization transformation

    Johannes Ball \'e , Valero Laparra, and Eero P Simoncelli. Density modeling of images using a generalized normalization transformation. arXiv preprint arXiv:1511.06281, 2015

  7. [7]

    Clustering with bregman divergences

    Arindam Banerjee, Srujana Merugu, Inderjit S Dhillon, Joydeep Ghosh, and John Lafferty. Clustering with bregman divergences. Journal of machine learning research, 6 0 (10), 2005

  8. [8]

    Adaptive importance sampling for multilevel monte carlo euler method

    Mohamed Ben Alaya, Kaouther Hajji, and Ahmed Kebaier. Adaptive importance sampling for multilevel monte carlo euler method. Stochastics, 95 0 (2): 0 303--327, 2023

  9. [9]

    The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming

    Lev M Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics, 7 0 (3): 0 200--217, 1967

  10. [10]

    Fundamentals of statistical exponential families: with applications in statistical decision theory

    Lawrence D Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory. Ims, 1986

  11. [11]

    Learning imbalanced datasets with label-distribution-aware margin loss

    Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in neural information processing systems, 32, 2019

  12. [12]

    Adversarial reciprocal points learning for open set recognition

    Guangyao Chen, Peixi Peng, Xiangqian Wang, and Yonghong Tian. Adversarial reciprocal points learning for open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 0 (11): 0 8065--8081, 2021 a

  13. [13]

    Atom: Robustifying out-of-distribution detection using outlier mining

    Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. Atom: Robustifying out-of-distribution detection using outlier mining. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13--17, 2021, Proceedings, Part III 21, pp.\ 430--445. Springer, 2021 b

  14. [14]

    Milestones in autonomous driving and intelligent vehicles: Survey of surveys

    Long Chen, Yuchen Li, Chao Huang, Bai Li, Yang Xing, Daxin Tian, Li Li, Zhongxu Hu, Xiaoxiang Na, Zixuan Li, et al. Milestones in autonomous driving and intelligent vehicles: Survey of surveys. IEEE Transactions on Intelligent Vehicles, 8 0 (2): 0 1046--1056, 2022

  15. [15]

    A tutorial on kernel density estimation and recent advances

    Yen-Chi Chen. A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology, 1 0 (1): 0 161--187, 2017

  16. [16]

    Bregman deviations of generic exponential families

    Sayak Ray Chowdhury, Patrick Saux, Odalric Maillard, and Aditya Gopalan. Bregman deviations of generic exponential families. In The Thirty Sixth Annual Conference on Learning Theory, pp.\ 394--449. PMLR, 2023

  17. [17]

    Describing textures in the wild

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 3606--3613, 2014

  18. [18]

    Density estimation using Real NVP

    Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016

  19. [19]

    Extremely simple activation shaping for out-of-distribution detection

    Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out-of-distribution detection. arXiv preprint arXiv:2209.09858, 2022

  20. [20]

    Vos: Learning what you don't know by virtual outlier synthesis

    Xuefeng Du, Zhaoning Wang, Mu Cai, and Yixuan Li. Vos: Learning what you don't know by virtual outlier synthesis. arXiv preprint arXiv:2202.01197, 2022

  21. [21]

    Is out-of-distribution detection learnable? In NeurIPS, 2022

    Zhen Fang, Yixuan Li, Jie Lu, Jiahua Dong, Bo Han, and Feng Liu. Is out-of-distribution detection learnable? In NeurIPS, 2022

  22. [22]

    Relative expected instantaneous loss bounds

    J \"u rgen Forster and Manfred K Warmuth. Relative expected instantaneous loss bounds. Journal of Computer and System Sciences, 64 0 (1): 0 76--102, 2002

  23. [23]

    A review on speech recognition technique

    Santosh K Gaikwad, Bharti W Gawali, and Pravin Yannawar. A review on speech recognition technique. International Journal of Computer Applications, 10 0 (3): 0 16--24, 2010

  24. [24]

    Made: Masked autoencoder for distribution estimation

    Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. Made: Masked autoencoder for distribution estimation. In International conference on machine learning, pp.\ 881--889. PMLR, 2015

  25. [25]

    Flow-gan: Combining maximum likelihood and adversarial learning in generative models

    Aditya Grover, Manik Dhar, and Stefano Ermon. Flow-gan: Combining maximum likelihood and adversarial learning in generative models. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

  26. [26]

    Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics

    Michael U Gutmann and Aapo Hyv \"a rinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of machine learning research, 13 0 (2), 2012 a

  27. [27]

    Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics

    Michael U Gutmann and Aapo Hyv \"a rinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of machine learning research, 13 0 (2), 2012 b

  28. [28]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 770--778, 2016

  29. [29]

    A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

    Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016

  30. [30]

    Scaling out-of-distribution detection for real-world settings

    Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, and Dawn Song. Scaling out-of-distribution detection for real-world settings. arXiv preprint arXiv:1911.11132, 2019

  31. [31]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 4700--4708, 2017

  32. [32]

    Mos: Towards scaling out-of-distribution detection for large semantic space

    Rui Huang and Yixuan Li. Mos: Towards scaling out-of-distribution detection for large semantic space. 2021 ieee. In CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp, pp.\ 8706--8715, 2021

  33. [33]

    A historical perspective of speech recognition

    Xuedong Huang, James Baker, and Raj Reddy. A historical perspective of speech recognition. Communications of the ACM, 57 0 (1): 0 94--103, 2014

  34. [34]

    Detecting out-of-distribution data through in-distribution class prior

    Xue Jiang, Feng Liu, Zhen Fang, Hong Chen, Tongliang Liu, Feng Zheng, and Bo Han. Detecting out-of-distribution data through in-distribution class prior. 2023

  35. [35]

    Training ood detectors in their natural habitats

    Julian Katz-Samuels, Julia B Nakhleh, Robert Nowak, and Yixuan Li. Training ood detectors in their natural habitats. In International Conference on Machine Learning, pp.\ 10848--10865. PMLR, 2022

  36. [36]

    Supervised contrastive learning

    Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in neural information processing systems, 33: 0 18661--18673, 2020

  37. [37]

    Robust kernel density estimation

    JooSeuk Kim and Clayton D Scott. Robust kernel density estimation. The Journal of Machine Learning Research, 13 0 (1): 0 2529--2565, 2012

  38. [38]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

  39. [39]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012

  40. [40]

    Tiny imagenet visual recognition challenge

    Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge. 2015

  41. [41]

    Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples

    Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. arXiv preprint arXiv:1711.09325, 2017

  42. [42]

    A simple unified framework for detecting out-of-distribution samples and adversarial attacks

    Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in neural information processing systems, 31, 2018

  43. [43]

    Your diffusion model is secretly a zero-shot classifier

    Alexander C Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, and Deepak Pathak. Your diffusion model is secretly a zero-shot classifier. arXiv preprint arXiv:2303.16203, 2023

  44. [44]

    Enhancing the reliability of out-of-distribution image detection in neural networks

    Shiyu Liang, Yixuan Li, and Rayadurgam Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690, 2017

  45. [45]

    Estimating the partition function by discriminance sampling

    Qiang Liu, Jian Peng, Alexander Ihler, and John Fisher III. Estimating the partition function by discriminance sampling. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pp.\ 514--522, 2015

  46. [46]

    Energy-based out-of-distribution detection

    Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. Advances in neural information processing systems, 33: 0 21464--21475, 2020

  47. [47]

    Class-incremental learning: survey and performance evaluation on image classification

    Marc Masana, Xialei Liu, Bart omiej Twardowski, Mikel Menta, Andrew D Bagdanov, and Joost Van De Weijer. Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 0 (5): 0 5513--5533, 2022

  48. [48]

    Poem: Out-of-distribution detection with posterior sampling

    Yifei Ming, Ying Fan, and Yixuan Li. Poem: Out-of-distribution detection with posterior sampling. In International Conference on Machine Learning, pp.\ 15650--15665. PMLR, 2022 a

  49. [49]

    How to exploit hyperspherical embeddings for out-of-distribution detection? arXiv preprint arXiv:2203.04450, 2022 b

    Yifei Ming, Yiyou Sun, Ousmane Dia, and Yixuan Li. How to exploit hyperspherical embeddings for out-of-distribution detection? arXiv preprint arXiv:2203.04450, 2022 b

  50. [50]

    Learning word embeddings efficiently with noise-contrastive estimation

    Andriy Mnih and Koray Kavukcuoglu. Learning word embeddings efficiently with noise-contrastive estimation. Advances in neural information processing systems, 26, 2013

  51. [51]

    Provable guarantees for understanding out-of-distribution detection

    Peyman Morteza and Yixuan Li. Provable guarantees for understanding out-of-distribution detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp.\ 7831--7840, 2022

  52. [52]

    Integral probability metrics and their generating classes of functions

    Alfred M \"u ller. Integral probability metrics and their generating classes of functions. Advances in applied probability, 29 0 (2): 0 429--443, 1997

  53. [53]

    Reading digits in natural images with unsupervised feature learning

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011

  54. [54]

    Masked autoregressive flow for density estimation

    George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation. Advances in neural information processing systems, 30, 2017

  55. [55]

    Pytorch: An imperative style, high-performance deep learning library

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019

  56. [56]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 4195--4205, 2023

  57. [57]

    Variational inference with normalizing flows

    Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International conference on machine learning, pp.\ 1530--1538. PMLR, 2015

  58. [58]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj \"o rn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.\ 10684--10695, 2022

  59. [59]

    Mobilenetv2: Inverted residuals and linear bottlenecks

    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 4510--4520, 2018

  60. [60]

    A survey on approaches of object detection

    Sanjivani Shantaiya, Keshri Verma, and Kamal Mehta. A survey on approaches of object detection. International Journal of Computer Applications, 65 0 (18), 2013

  61. [61]

    Dice: Leveraging sparsification for out-of-distribution detection

    Yiyou Sun and Yixuan Li. Dice: Leveraging sparsification for out-of-distribution detection. In European Conference on Computer Vision, pp.\ 691--708. Springer, 2022

  62. [62]

    React: Out-of-distribution detection with rectified activations

    Yiyou Sun, Chuan Guo, and Yixuan Li. React: Out-of-distribution detection with rectified activations. Advances in Neural Information Processing Systems, 34: 0 144--157, 2021

  63. [63]

    Out-of-distribution detection with deep nearest neighbors

    Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. In International Conference on Machine Learning, pp.\ 20827--20840. PMLR, 2022

  64. [64]

    Csi: Novelty detection via contrastive learning on distributionally shifted instances

    Jihoon Tack, Sangwoo Mo, Jongheon Jeong, and Jinwoo Shin. Csi: Novelty detection via contrastive learning on distributionally shifted instances. Advances in neural information processing systems, 33: 0 11839--11852, 2020

  65. [65]

    Importance sampling: a review

    Surya T Tokdar and Robert E Kass. Importance sampling: a review. Wiley Interdisciplinary Reviews: Computational Statistics, 2 0 (1): 0 54--60, 2010

  66. [66]

    Tsybakov

    Alexandre B. Tsybakov. Introduction to nonparametric estimation. 2008

  67. [67]

    Neural autoregressive distribution estimation

    Benigno Uria, Marc-Alexandre C \^o t \'e , Karol Gregor, Iain Murray, and Hugo Larochelle. Neural autoregressive distribution estimation. The Journal of Machine Learning Research, 17 0 (1): 0 7184--7220, 2016

  68. [68]

    Partial and asymmetric contrastive learning for out-of-distribution detection in long-tailed recognition

    Haotao Wang, Aston Zhang, Yi Zhu, Shuai Zheng, Mu Li, Alex J Smola, and Zhangyang Wang. Partial and asymmetric contrastive learning for out-of-distribution detection in long-tailed recognition. In International Conference on Machine Learning, pp.\ 23446--23458. PMLR, 2022

  69. [69]

    Out-of-distribution detection with implicit outlier transformation

    Qizhou Wang, Junjie Ye, Feng Liu, Quanyu Dai, Marcus Kalander, Tongliang Liu, Jianye Hao, and Bo Han. Out-of-distribution detection with implicit outlier transformation. arXiv preprint arXiv:2303.05033, 2023

  70. [70]

    Mitigating neural network overconfidence with logit normalization

    Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, and Yixuan Li. Mitigating neural network overconfidence with logit normalization. In International Conference on Machine Learning, pp.\ 23631--23644. PMLR, 2022

  71. [71]

    Unsupervised feature learning via non-parametric instance discrimination

    Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 3733--3742, 2018

  72. [72]

    Sun database: Large-scale scene recognition from abbey to zoo

    Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pp.\ 3485--3492. IEEE, 2010 a

  73. [73]

    Sun database: Large-scale scene recognition from abbey to zoo

    Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pp.\ 3485--3492. IEEE, 2010 b

  74. [74]

    TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking

    Pingmei Xu, Krista A Ehinger, Yinda Zhang, Adam Finkelstein, Sanjeev R Kulkarni, and Jianxiong Xiao. Turkergaze: Crowdsourcing saliency with webcam based eye tracking. arXiv preprint arXiv:1504.06755, 2015

  75. [75]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

  76. [76]

    Out-of-distribution detection based on in-distribution data patterns memorization with modern hopfield energy

    Jinsong Zhang, Qiang Fu, Xu Chen, Lun Du, Zelin Li, Gang Wang, Shi Han, Dongmei Zhang, et al. Out-of-distribution detection based on in-distribution data patterns memorization with modern hopfield energy. In The Eleventh International Conference on Learning Representations, 2022

  77. [77]

    Understanding failures in out-of-distribution detection with deep generative models

    Lily Zhang, Mark Goldstein, and Rajesh Ranganath. Understanding failures in out-of-distribution detection with deep generative models. In International Conference on Machine Learning, pp.\ 12427--12436. PMLR, 2021

  78. [78]

    Object detection with deep learning: A review

    Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu. Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems, 30 0 (11): 0 3212--3232, 2019

  79. [79]

    Improving calibration for long-tailed recognition

    Zhisheng Zhong, Jiequan Cui, Shu Liu, and Jiaya Jia. Improving calibration for long-tailed recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.\ 16489--16498, 2021

  80. [80]

    Places: A 10 million image database for scene recognition

    Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE transactions on pattern analysis and machine intelligence, 40 0 (6): 0 1452--1464, 2017

Showing first 80 references.