pith. machine review for the scientific record.

arxiv: 2602.02409 · v3 · submitted 2026-02-02 · 💻 cs.CV

Recognition: 2 Lean theorem links

Catalyst: Out-of-Distribution Detection via Elastic Scaling

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 07:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords out-of-distribution detection · post-hoc method · elastic scaling · pre-pooling feature map · channel-wise statistics · false positive rate · ResNet · CIFAR-10

The pith

Catalyst improves out-of-distribution detection by multiplicatively scaling baseline scores with an input-dependent factor derived from pre-pooling channel-wise statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that standard post-hoc OOD methods discard valuable information by depending solely on output logits or globally averaged features. Catalyst recovers this signal by deriving an input-specific scaling factor gamma from the mean, standard deviation, and maximum activations across channels in the feature map before pooling. This factor then elastically scales any baseline score, increasing the separation between in-distribution and out-of-distribution examples. The method is shown to work with various detectors and yields large reductions in false positive rates on CIFAR-10, CIFAR-100, and ImageNet benchmarks using ResNet architectures. It requires no retraining and is presented as complementary to prior approaches.

Core claim

Catalyst is a post-hoc framework that computes an input-dependent scaling factor (γ) on-the-fly from the raw channel-wise statistics of the pre-pooling feature map and fuses it multiplicatively with existing baseline OOD scores to push the ID and OOD distributions further apart.

What carries the argument

Elastic scaling via an input-dependent factor γ computed from channel-wise mean, standard deviation, and maximum activation of the pre-pooling feature map, which multiplicatively modulates baseline scores.
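The paper's equations are not reproduced on this page, so the exact aggregation and fusion rule below are assumptions: this minimal sketch takes γ as a simple average of the three channel-statistic cues and fuses it by plain multiplication. The names `catalyst_gamma` and `elastic_scale` are illustrative, not the authors' released code.

```python
import numpy as np

def catalyst_gamma(feature_map: np.ndarray) -> float:
    """Illustrative gamma from pre-pooling channel statistics.

    feature_map: (C, H, W) activations from the penultimate block.
    Gamma here is a simple average of the three cues' channel means;
    the paper's exact aggregation may differ.
    """
    per_channel = feature_map.reshape(feature_map.shape[0], -1)  # (C, H*W)
    mu = per_channel.mean(axis=1)     # channel-wise mean
    sigma = per_channel.std(axis=1)   # channel-wise standard deviation
    mx = per_channel.max(axis=1)      # channel-wise maximum activation
    return float((mu.mean() + sigma.mean() + mx.mean()) / 3.0)

def elastic_scale(baseline_score: float, gamma: float) -> float:
    """Multiplicative fusion: an input whose gamma is larger gets its
    baseline score amplified, widening the ID/OOD gap."""
    return gamma * baseline_score
```

The key property is that γ is computed per input at inference time, with no fitted parameters, so it can wrap any existing baseline score.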

If this is right

  • Integrates seamlessly with logit-based methods such as Energy, ReAct, and SCALE.
  • Provides significant boosts to distance-based detectors like KNN.
  • Reduces average false positive rate by 32.87% on CIFAR-10 with ResNet-18.
  • Achieves 27.94% reduction on CIFAR-100 and 22.25% on ImageNet with ResNet-50.
  • Demonstrates that pre-pooling statistics offer complementary signal to GAP and logits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Global average pooling may be discarding per-channel distributional information that is useful for distinguishing OOD inputs.
  • Detectors could be redesigned to use these statistics directly instead of post-hoc scaling.
  • The approach might generalize to other vision architectures or even non-vision domains where intermediate activations are available.
  • It suggests revisiting intermediate layer representations for other safety tasks in neural networks.

Load-bearing premise

The raw channel-wise statistics of the pre-pooling feature map contain a rich complementary signal that is systematically discarded by global average pooling and logit-based scoring.

What would settle it

Running Catalyst on a held-out OOD benchmark with a ResNet backbone and finding that the Catalyst-scaled scores yield a higher false positive rate (FPR) than the unscaled baseline would falsify the improvement claim.
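That test can be sketched with the standard FPR-at-95%-TPR metric. The helper names are hypothetical, and the convention below assumes higher scores mean in-distribution:

```python
import numpy as np

def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """FPR at the threshold that admits 95% of ID samples.

    Convention: higher score = more in-distribution.
    """
    thresh = np.percentile(id_scores, 5)         # keep the top 95% of ID
    return float((ood_scores >= thresh).mean())  # OOD wrongly admitted

def catalyst_helps(id_base, ood_base, id_scaled, ood_scaled) -> bool:
    """The improvement claim is falsified if the scaled scores give a
    higher FPR than the unscaled baseline."""
    return fpr_at_95_tpr(id_scaled, ood_scaled) <= fpr_at_95_tpr(id_base, ood_base)
```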

Figures

Figures reproduced from arXiv: 2602.02409 by Abid Hassan, Nenad Medvidovic, Saad Shafiq, Tuan Ngo.

Figure 1. Information cues from each channel before the penultimate layer of a ResNet-50 trained on ImageNet-1k, evaluated with Texture as the OOD dataset. The x-axis shows channel indices; the y-axis shows cue strength. Left to right: (a) µ(x): mean activation, (b) σ(x): standard deviation, (c) max(x): dominant activation, and (d) H(x): entropy per channel. Existing methods have under-explored these distinctiv…
Figure 2. Illustration of Catalyst's effectiveness. The model is ResNet-50 trained on ImageNet-1k, evaluated on Texture (OOD). Here, γ computed from the channel-maximum statistic (m) is applied multiplicatively to the baseline ReAct. (a) The unscaled score distribution shows more significant overlap than (b) the Catalyst-scaled score distribution.
Figure 3. Sensitivity analysis of the clipping percentile (p) on Catalyst(m) performance. All values are averaged over four OOD test datasets for a ResNet-50 (ImageNet).
Figure 4. Distribution of the scaling factor γ from the penultimate layer of a ResNet-50 trained on ImageNet-1k, evaluated with Texture as the OOD dataset. The scaling factors show clear separation between ID and OOD samples. Left to right: (a) µ(x): mean, (b) σ(x): standard deviation, (c) max(x): max.
Figure 5. Distributions of the scaling factor γ, derived from the penultimate layer of a MobileNet-V2 model trained on ImageNet-1k. The rows (top to bottom) correspond to the OOD datasets SUN, Places365, Texture, and iNaturalist. The columns (left to right) correspond to the statistical cue used to compute γ: (a) mean: µ(x), (b) standard deviation: σ(x), and (c) maximum value: max(x)…
Figure 6. Distribution of the scaling factor γ from the penultimate layer of a ResNet-18 trained on CIFAR-100, evaluated with Places365 as the OOD dataset. The scaling factors show high overlap between ID and OOD samples. Left to right: (a) µ(x): mean, (b) σ(x): standard deviation, (c) max(x): max.
Figure 7. Distribution of the scaling factor γ from the penultimate layer of a DenseNet-101 trained on CIFAR-100, evaluated with Places365 as the OOD dataset. The scaling factors show high overlap between ID and OOD samples. Left to right: (a) µ(x): mean, (b) σ(x): standard deviation, (c) max(x): max.
Figure 8. Distributions of the scaling factor γ computed from the four residual stages. The model was trained on ImageNet-1K (ID) and evaluated against Texture (OOD). (a–c) The γ distributions from the early-to-mid stages (Layer 1 to Layer 3) show significant overlap between ID and OOD samples, rendering them ineffective as a discriminative signal. (d) In sharp contrast, the distribution from the final residua…
Figure 9. Distribution of the scaling factor γ computed using the median statistic. The model is a DenseNet-101 trained on CIFAR-100 (ID), evaluated against the SVHN dataset (OOD). The plot reveals that the OOD distribution is shifted to the right of the ID distribution, indicating that OOD samples produce a higher γ value than ID samples. This contradicts the core assumption of our method, leading to degraded OOD…
Figure 10. Superior OOD separation of γ_entropy as a standalone score on CIFAR-100. The model is a ResNet-18 trained on CIFAR-100 (ID), evaluated against the Texture dataset (OOD). (Left) Significant distribution overlap between ID and OOD using the baseline Energy score. (Right) Dramatically improved separation using the standalone γ_entropy score. This visualization confirms the finding from…
Original abstract

Out-of-distribution (OOD) detection is critical for the safe deployment of deep neural networks. State-of-the-art post-hoc methods typically derive OOD scores from the output logits or penultimate feature vector obtained via global average pooling (GAP). We contend that this exclusive reliance on the logit or feature vector discards a rich, complementary signal: the raw channel-wise statistics of the pre-pooling feature map lost in GAP. In this paper, we introduce Catalyst, a post-hoc framework that exploits these under-explored signals. Catalyst computes an input-dependent scaling factor ($\gamma$) on-the-fly from these raw statistics (e.g., mean, standard deviation, and maximum activation). This $\gamma$ is then fused with the existing baseline score, multiplicatively modulating it -- an $\textit{elastic scaling}$ -- to push the ID and OOD distributions further apart. We demonstrate Catalyst is a generalizable framework: it seamlessly integrates with logit-based methods (e.g., Energy, ReAct, SCALE) and also provides a significant boost to distance-based detectors like KNN. As a result, Catalyst achieves substantial and consistent performance gains, reducing the average False Positive Rate by 32.87 on CIFAR-10 (ResNet-18), 27.94% on CIFAR-100 (ResNet-18), and 22.25% on ImageNet (ResNet-50). Our results highlight the untapped potential of pre-pooling statistics and demonstrate that Catalyst is complementary to existing OOD detection approaches. Our code is available here: https://github.com/bingabid/Catalyst

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Catalyst, a post-hoc OOD detection framework that computes an input-dependent scaling factor γ from raw channel-wise statistics (mean, std, max) of the pre-pooling feature map. This γ is multiplicatively fused with baseline scores (e.g., Energy, ReAct, SCALE, KNN) via elastic scaling to increase separation between in-distribution and out-of-distribution samples. Experiments on CIFAR-10/100 (ResNet-18) and ImageNet (ResNet-50) report average FPR reductions of 32.87%, 27.94%, and 22.25% respectively, with code released for reproducibility.

Significance. If the gains are shown to arise specifically from the pre-pooling channel statistics rather than generic input-dependent modulation, the work would be significant by identifying an under-exploited signal complementary to logits and GAP features. The method is simple, training-free, and generalizes across logit-based and distance-based detectors, offering a practical enhancement for safe deployment of DNNs.

major comments (2)
  1. [Method (γ computation and elastic scaling)] The central claim that raw channel-wise statistics of the pre-pooling feature map supply a rich complementary signal systematically lost by GAP (abstract and method description) is load-bearing but unsupported by controls. No experiments replace the statistic-derived γ with a random draw from the same range or a simple function of the baseline score alone; without these, the reported FPR reductions cannot be attributed to the specific statistics rather than any input-dependent scaling.
  2. [Experiments] §4 (experiments): the reported average FPR reductions (32.87 on CIFAR-10, etc.) are presented without error bars, ablation on the choice of statistics (mean/std/max), or statistical significance tests. This undermines assessment of whether the gains are robust or subject to post-hoc selection across the three datasets and two architectures.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'reducing the average False Positive Rate by 32.87' should specify units (e.g., percentage points) and the exact baseline method for each number to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below. We agree that the suggested controls and statistical analyses would strengthen the paper and have revised the manuscript to include them.

Point-by-point responses
  1. Referee: [Method (γ computation and elastic scaling)] The central claim that raw channel-wise statistics of the pre-pooling feature map supply a rich complementary signal systematically lost by GAP (abstract and method description) is load-bearing but unsupported by controls. No experiments replace the statistic-derived γ with a random draw from the same range or a simple function of the baseline score alone; without these, the reported FPR reductions cannot be attributed to the specific statistics rather than any input-dependent scaling.

    Authors: We agree that explicit controls are required to isolate the contribution of the pre-pooling channel statistics. In the revised manuscript we add two sets of controls: (1) γ is replaced by a random scalar drawn uniformly from the empirical range of observed γ values on the same dataset, and (2) γ is replaced by a simple monotonic function of the baseline score alone (e.g., γ = 1 + 0.1 × baseline). The new results, reported in an expanded Section 4.3 and Table 3, show that only the statistic-derived γ produces the claimed FPR reductions; the random and baseline-only variants yield negligible or negative gains. These additions directly support the central claim without altering the original method. revision: yes
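The two controls described here can be sketched as follows. Synthetic NumPy arrays stand in for real scores, and `random_gamma_control` and `baseline_only_control` are illustrative names, not the authors' revised code:

```python
import numpy as np

def random_gamma_control(scores: np.ndarray, gammas: np.ndarray,
                         rng: np.random.Generator) -> np.ndarray:
    """Control 1: replace the statistic-derived gamma with a uniform
    draw from the empirical gamma range observed on the same data."""
    fake = rng.uniform(gammas.min(), gammas.max(), size=scores.shape)
    return fake * scores

def baseline_only_control(scores: np.ndarray) -> np.ndarray:
    """Control 2: gamma as a simple monotonic function of the baseline
    score alone (gamma = 1 + 0.1 * score), as in the rebuttal."""
    return (1.0 + 0.1 * scores) * scores
```

If either control reproduced the gains, the FPR reductions could not be attributed to the pre-pooling statistics specifically; the rebuttal's claim is that only the statistic-derived γ does.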

  2. Referee: [Experiments] §4 (experiments): the reported average FPR reductions (32.87 on CIFAR-10, etc.) are presented without error bars, ablation on the choice of statistics (mean/std/max), or statistical significance tests. This undermines assessment of whether the gains are robust or subject to post-hoc selection across the three datasets and two architectures.

    Authors: We acknowledge the absence of error bars, statistic ablations, and significance testing. The revised version reruns all experiments over five random seeds and reports mean FPR ± standard deviation. We add a full ablation (new Table 4) that evaluates every subset of {mean, std, max} for computing γ, confirming that the full triplet is optimal. We also include paired t-test p-values (all < 0.01) comparing Catalyst-augmented scores against the corresponding baselines on each dataset/architecture pair. These results appear in Section 4 and the supplementary material. revision: yes
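The statistic ablation described here can be sketched as a loop over every non-empty subset of {mean, std, max}. The averaging aggregation and all function names are assumptions for illustration, not the authors' harness:

```python
import itertools
import numpy as np

# Channel-wise cues over a (C, H, W) pre-pooling feature map.
STATS = {
    "mean": lambda fm: fm.mean(axis=(1, 2)),
    "std":  lambda fm: fm.std(axis=(1, 2)),
    "max":  lambda fm: fm.max(axis=(1, 2)),
}

def gamma_for_subset(feature_map: np.ndarray, subset) -> float:
    """Gamma as the average of the chosen channel statistics.
    The aggregation is illustrative; the paper's rule may differ."""
    cues = [STATS[name](feature_map).mean() for name in subset]
    return float(np.mean(cues))

def ablate(feature_map: np.ndarray) -> dict:
    """Evaluate every non-empty subset of {mean, std, max}, mirroring
    the ablation the rebuttal describes."""
    out = {}
    for r in range(1, len(STATS) + 1):
        for subset in itertools.combinations(STATS, r):
            out[subset] = gamma_for_subset(feature_map, subset)
    return out
```

In a real run each subset's γ would be fused with the baseline score and scored by FPR over seeds, which is what the reported mean ± standard deviation would summarize.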

Circularity Check

0 steps flagged

No circularity: γ is computed directly from input statistics, with no fitting to OOD labels and no load-bearing self-citations

full rationale

The paper defines Catalyst's core mechanism as computing an input-dependent γ on-the-fly from the raw channel-wise statistics (mean, std, max) of the pre-pooling feature map, then multiplicatively scaling an existing baseline score. This computation uses only the current input's own activations and does not fit parameters to OOD labels, baseline scores, or target metrics. No equation reduces the claimed FPR reductions to a fitted parameter by construction, and the text contains no load-bearing self-citations or uniqueness theorems imported from prior author work. The reported gains are presented as empirical results on standard benchmarks rather than as a derived necessity, so the argument is not circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the method relies on standard assumptions of post-hoc OOD detection and computes gamma directly from feature statistics.

pith-pipeline@v0.9.0 · 5593 in / 982 out tokens · 23246 ms · 2026-05-16T07:55:05.133685+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: the paper's claim is directly supported by a theorem in the formal canon.
supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: the paper appears to rely on the theorem as machinery.
contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 3 internal anchors

  1. [1]

    NoiseOut: A Simple Way to Prune Neural Networks

    Mohammad Babaeizadeh, Paris Smaragdis, and Roy H. Campbell. Noiseout: A simple way to prune neural net- works.CoRR, abs/1611.06211, 2016. 8

  2. [2]

    Gradorth: A simple yet efficient out- of-distribution detection with orthogonal projection of gra- dients

    Sima Behpour, Thang Doan, Xin Li, Wenbin He, Liang Gou, and Liu Ren. Gradorth: A simple yet efficient out- of-distribution detection with orthogonal projection of gra- dients. InThirty-seventh Conference on Neural Information Processing Systems, 2023. 6, 8, 4, 13, 14

  3. [3]

    Cimpoi, S

    M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, , and A. Vedaldi. Describing textures in the wild. InProceedings of the IEEE Conf. on Computer Vision and Pattern Recogni- tion, page 3606–3613, 2014. 4, 5, 6, 9, 11, 12

  4. [4]

    Density estimation using real NVP

    Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. InInternational Con- ference on Learning Representations, 2017. 8

  5. [5]

    Extremely simple activation shaping for out- of-distribution detection

    Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out- of-distribution detection. InThe Eleventh International Con- ference on Learning Representations, 2023. 1, 2, 4, 6, 8, 13, 14, 24

  6. [6]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representa- tions, 2021. 8

  7. [7]

    Unknown-aware object detection: Learning what you don’t know from videos in the wild

    Xuefeng Du, Xin Wang, Gabriel Gozum, and Yixuan Li. Unknown-aware object detection: Learning what you don’t know from videos in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 8

  8. [8]

    To- wards unknown-aware learning with virtual outlier synthe- sis

    Xuefeng Du, Zhaoning Wang, Mu Cai, and Sharon Li. To- wards unknown-aware learning with virtual outlier synthe- sis. InInternational Conference on Learning Representa- tions, 2022. 8

  9. [9]

    Can au- tonomous vehicles identify, recover from, and adapt to dis- tribution shifts? InInternational Conference on Machine Learning (ICML), 2020

    Angelos Filos, Panagiotis Tigas, Rowan McAllister, Nicholas Rhinehart, Sergey Levine, and Yarin Gal. Can au- tonomous vehicles identify, recover from, and adapt to dis- tribution shifts? InInternational Conference on Machine Learning (ICML), 2020. 1

  10. [10]

    Dropout as a bayesian approximation: Representing model uncertainty in deep learning

    Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. InProceedings of The 33rd International Confer- ence on Machine Learning, pages 1050–1059, 2016. 8

  11. [11]

    Soumya Suvra Ghosal, Yiyou Sun, and Yixuan Li. How to overcome curse-of-dimensionality for out-of-distribution de- tection? InProceedings of the Thirty-Eighth AAAI Con- ference on Artificial Intelligence and Thirty-Sixth Confer- ence on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intell...

  12. [12]

    Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks

    Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Uday Prabhu, Gowthami Somepalli, Prithvijit Chat- topadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, and Tom Gold- stein. Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks. InThirty- seventh Conference on Ne...

  13. [13]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 5, 8

  14. [14]

    Exploring channel-aware typical features for out-of-distribution detec- tion.Proceedings of the AAAI Conference on Artificial Intel- ligence, 38:12402–12410, 2024

    Rundong He, Yue Yuan, Zhongyi Han, Fan Wang, Wan Su, Yilong Yin, Tongliang Liu, and Yongshun Gong. Exploring channel-aware typical features for out-of-distribution detec- tion.Proceedings of the AAAI Conference on Artificial Intel- ligence, 38:12402–12410, 2024. 14

  15. [15]

    Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the prob- lem

    Matthias Hein, Maksym Andriushchenko, and Julian Bitter- wolf. Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the prob- lem. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 41–50, 2019. 1, 8

  16. [16]

    A baseline for detect- ing misclassified and out-of-distribution examples in neural networks

    Dan Hendrycks and Kevin Gimpel. A baseline for detect- ing misclassified and out-of-distribution examples in neural networks. InInternational Conference on Learning Repre- sentations, 2017. 1, 2, 4, 7, 8, 13, 14, 24

  17. [17]

    Deep anomaly detection with outlier exposure

    Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. InInterna- tional Conference on Learning Representations, 2019. 8

  18. [18]

    Generalized odin: Detecting out-of-distribution im- age without learning from out-of-distribution data

    Yen-Chang Hsu, Yilin Shen, Hongxia Jin, and Zsolt Kira. Generalized odin: Detecting out-of-distribution im- age without learning from out-of-distribution data. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10948–10957, 2020. 8, 13, 14

  19. [19]

    Weinberger

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil- ian Q. Weinberger. Densely connected convolutional net- works. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5

  20. [20]

    Mos: Towards scaling out-of- distribution detection for large semantic space

    Rui Huang and Yixuan Li. Mos: Towards scaling out-of- distribution detection for large semantic space. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8710–8719, 2021. 8

  21. [21]

    On the impor- tance of gradients for detecting distributional shifts in the wild

    Rui Huang, Andrew Geng, and Yixuan Li. On the impor- tance of gradients for detecting distributional shifts in the wild. InAdvances in Neural Information Processing Sys- tems, pages 677–689. Curran Associates, Inc., 2021. 6, 8, 13, 14

  22. [22]

    Stacked generative adversarial networks

    Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, and Serge Belongie. Stacked generative adversarial networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 8

  23. [23]

    Ood-maml: Meta- learning for few-shot out-of-distribution detection and clas- sification

    Taewon Jeong and Heeyoung Kim. Ood-maml: Meta- learning for few-shot out-of-distribution detection and clas- sification. InAdvances in Neural Information Processing Systems, pages 3907–3916, 2020. 8

  24. [24]

    Billion- scale similarity search with GPUs.IEEE Transactions on Big Data, 7(3):535–547, 2019

    Jeff Johnson, Matthijs Douze, and Herv ´e J ´egou. Billion- scale similarity search with GPUs.IEEE Transactions on Big Data, 7(3):535–547, 2019. 4

  25. [25]

    Training OOD detectors in their natural habitats

    Julian Katz-Samuels, Julia B Nakhleh, Robert Nowak, and Yixuan Li. Training OOD detectors in their natural habitats. InProceedings of the 39th International Conference on Ma- chine Learning, pages 10848–10865, 2022. 8

  26. [26]

    Auto-encoding varia- tional bayes, 2014

    Diederik P Kingma and Max Welling. Auto-encoding varia- tional bayes, 2014. 8

  27. [27]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. 4

  28. [28]

    Simple and scalable predictive uncertainty esti- mation using deep ensembles

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty esti- mation using deep ensembles. InAdvances in Neural Infor- mation Processing Systems, 2017. 8

  29. [29]

    Training confidence-calibrated classifiers for detecting out- of-distribution samples

    Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out- of-distribution samples. InInternational Conference on Learning Representations, 2018. 8

  30. [30]

    A simple unified framework for detecting out-of-distribution samples and adversarial attacks

    Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. InAdvances in Neural In- formation Processing Systems, 2018. 2, 6, 8, 4, 13, 14

  31. [31]

    Pruning filters for efficient convnets

    Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. InIn- ternational Conference on Learning Representations, 2017. 8

  32. [32]

    Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the re- liability of out-of-distribution image detection in neural net- works. InInternational Conference on Learning Represen- tations, 2018. 2, 4, 8, 1, 13, 14, 24

  33. [33]

    Mood: Multi-level out-of-distribution detection

    Ziqian Lin, Sreya Dutta Roy, and Yixuan Li. Mood: Multi-level out-of-distribution detection. In2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15308–15318, 2021. 8

  34. [34]

    Fast decision boundary based out-of-distribution detector.ICML Workshop or arXiv preprint, 2023

    Litian Liu and Yao Qin. Fast decision boundary based out-of-distribution detector.ICML Workshop or arXiv preprint, 2023. 6, 8, 13, 14

  35. [35]

    Detecting out-of-distribution through the lens of neural collapse

    Litian Liu and Yao Qin. Detecting out-of-distribution through the lens of neural collapse. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 6, 8, 13

  36. [36]

    Energy-based out-of-distribution detection

    Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. InAdvances in Neural Information Processing Systems, pages 21464– 21475. Curran Associates, Inc., 2020. 1, 2, 4, 5, 7, 8, 13, 14, 24

  37. [37]

    A convnet for the 2020s

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feicht- enhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 11976– 11986, 2022. 8

  38. [38]

    A simple baseline for bayesian uncertainty in deep learning

    Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson. A simple baseline for bayesian uncertainty in deep learning. InAdvances in Neural Information Processing Systems, 2019. 8

  39. [39]

    Predictive uncertainty es- timation via prior networks

    Andrey Malinin and Mark Gales. Predictive uncertainty es- timation via prior networks. InAdvances in Neural Informa- tion Processing Systems, 2018. 8

  40. [40]

    Reverse kl-divergence training of prior networks: Improved uncertainty and adver- sarial robustness

    Andrey Malinin and Mark Gales. Reverse kl-divergence training of prior networks: Improved uncertainty and adver- sarial robustness. InAdvances in Neural Information Pro- cessing Systems, 2019. 8

  41. [41]

    Towards neural net- works that provably know when they don’t know

    Alexander Meinke and Matthias Hein. Towards neural net- works that provably know when they don’t know. InInter- national Conference on Learning Representations, 2020. 8

  42. [42]

    POEM: Out-of- distribution detection with posterior sampling

    Yifei Ming, Ying Fan, and Yixuan Li. POEM: Out-of- distribution detection with posterior sampling. InProceed- ings of the 39th International Conference on Machine Learn- ing, pages 15650–15665, 2022. 8

  43. [43]

    How to exploit hyperspherical embeddings for out-of-distribution detection? InThe Eleventh International Conference on Learning Representations, 2023

    Yifei Ming, Yiyou Sun, Ousmane Dia, and Yixuan Li. How to exploit hyperspherical embeddings for out-of-distribution detection? InThe Eleventh International Conference on Learning Representations, 2023. 14

  44. [44]

    Provable guarantees for understanding out-of-distribution detection

    Peyman Morteza and Yixuan Li. Provable guarantees for understanding out-of-distribution detection. InProceedings of the AAAI conference on Aritificial Intelligence, 2021. 8

  45. [45]

    Do deep generative models know what they don’t know? InInternational Con- ference on Learning Representations, 2019

    Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do deep generative models know what they don’t know? InInternational Con- ference on Learning Representations, 2019. 8

  46. [46]

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bis- sacco, Bo Wu, and Andrew Y . Ng. Reading digits in natural images with unsupervised feature learning. InNIPS Work- shop on Deep Learning and Unsupervised Feature Learning,

  47. [47]

    Dnn modularization via activation-driven training,

    Tuan Ngo, Abid Hassan, Saad Shafiq, and Nenad Medvi- dovic. Dnn modularization via activation-driven training,

  48. [48]

    Deep neural networks are easily fooled: High confidence predictions for unrecognizable images

    Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In2015 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 427– 436, 2015. 1, 8

  49. [49]

    Nearest neighbor guidance for out-of-distribution detection,

    Jaewoo Park, Yoon Gyo Jung, and Andrew Beng Jin Teoh. Nearest neighbor guidance for out-of-distribution detection,

  50. [50]

    Adascale: Adaptive scaling for ood de- tection, 2025

    Sudarshan Regmi. Adascale: Adaptive scaling for ood de- tection, 2025. 6, 17

  51. [51]

    Stochastic backpropagation and approximate inference in deep generative models, 2014

    Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wier- stra. Stochastic backpropagation and approximate inference in deep generative models, 2014. 8

  52. [52]

    A. G. Roy, J. Ren, S. Azizi, A. Loh, V. Natarajan, B. Mustafa, N. Pawlowski, J. Freyberg, Y. Liu, and Z. Beaver. Does your dermatology classifier know what it doesn’t know? Detecting the long-tail of unseen conditions. CoRR, arXiv:2104.03829,

  53. [53]

    Vikash Sehwag, Mung Chiang, and Prateek Mittal. SSD: A unified framework for self-supervised outlier detection. In Proceedings of the International Conference on Learning Representations, 2021. 8, 4

  54. [54]

    Yiyou Sun and Yixuan Li. DICE: Leveraging sparsification for out-of-distribution detection. In Computer Vision – ECCV 2022, pages 691–708. Springer Nature Switzerland,

  55. [55]

    1, 2, 6, 8, 4, 13, 14, 15, 24

  56. [56]

    Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of-distribution detection with rectified activations. In Advances in Neural Information Processing Systems, pages 144–157. Curran Associates, Inc., 2021. 1, 2, 3, 4, 5, 6, 8, 13, 14, 15, 24

  57. [57]

    Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. In Proceedings of the 39th International Conference on Machine Learning, pages 20827–20840, 2022. 1, 2, 4, 6, 8, 13, 14, 24

  58. [58]

    E. Tabak and Cristina Turner. A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66:145–164, 2013. 8

  59. [59]

    Joost Van Amersfoort, Lewis Smith, Yee Whye Teh, and Yarin Gal. Uncertainty estimation using a single deep deterministic neural network. In Proceedings of the 37th International Conference on Machine Learning, pages 9690–9700,

  60. [60]

    Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Koray Kavukcuoglu, Oriol Vinyals, and Alex Graves. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, 2016. 8

  61. [61]

    Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 5, 9

  62. [62]

    Haoran Wang, Weitang Liu, Alex Bocchieri, and Yixuan Li. Can multi-label classification networks know what they don’t know? In Advances in Neural Information Processing Systems, 2021. 8

  63. [63]

    Haoqi Wang, Zhizhong Li, Litong Feng, and Wayne Zhang. ViM: Out-of-distribution with virtual-logit matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 5, 8, 7, 13, 14

  64. [64]

    Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 1

  65. [65]

    Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, and Yixuan Li. Mitigating neural network overconfidence with logit normalization. In Proceedings of the 39th International Conference on Machine Learning, 2022. 8

  66. [66]

    Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16133–16142, 2023. 8

  67. [67]

    Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492, 2010. 5, 9

  68. [68]

    Kai Xu, Rongyu Chen, Gianni Franchi, and Angela Yao. Scaling for training time and post-hoc out-of-distribution detection enhancement. In The Twelfth International Conference on Learning Representations, 2024. 1, 2, 8, 13, 14, 16

  69. [69]

    Pingmei Xu, Krista A. Ehinger, Yinda Zhang, Adam Finkelstein, Sanjeev R. Kulkarni, and Jianxiong Xiao. TurkerGaze: Crowdsourcing saliency with webcam based eye tracking. CoRR, arXiv:1504.06755, 2015. 5, 6, 11, 12

  70. [70]

    Jingkang Yang, Haoqi Wang, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang, and Ziwei Liu. Semantically coherent out-of-distribution detection. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021. 8

  71. [71]

    Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. CoRR, arXiv:1506.03365, 2015. 4, 5, 6, 11, 12

  72. [72]

    Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks, 2013. 7

  73. [73]

    Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Yixuan Li, Ziwei Liu, Yiran Chen, and Hai Li. OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. arXiv preprint arXiv:2306.09301, 2023. 5

  74. [74]

    Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017. 4, 5, 6, 9, 11, 12

  75. [75]

    Yao Zhu, YueFeng Chen, Chuanlong Xie, Xiaodan Li, Rong Zhang, Hui Xue, Xiang Tian, Bolun Zheng, and Yaowu Chen. Boosting out-of-distribution detection with typical features,

  76. [76]

    13, 14

Catalyst: Out-of-Distribution Detection via Elastic Scaling
Supplementary Material

A. Description of Baseline Methods

In resonance with existing work [5, 36, 54, 55], for the reader’s convenience, we summarize in detail a few common techniques for defining OOD scores that measure the degree of ID-ness on the given sample. All the methods derive t...

1. Compute the p-th percentile threshold t of h(x).
2. Let s1 = Σ h(x), the sum of all activation values before pruning.
3. Set all values in h(x) less than t to zero.
4. Let s2 = Σ h(x), the sum after pruning.
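The four steps above can be sketched in a few lines of pure Python. This is an illustrative helper, not the paper's code: the function name, the interpolation rule for the percentile (linear, matching common numerical libraries), and the toy feature vector are all assumptions.

```python
def percentile_prune(h, p):
    """Prune a flattened feature map h(x) at its p-th percentile.

    Returns the pruned activations plus the sums before (s1) and
    after (s2) pruning, as in the four steps described above.
    """
    # Step 1: p-th percentile threshold t of h(x), via linear interpolation.
    xs = sorted(h)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    t = xs[lo] + (k - lo) * (xs[hi] - xs[lo])

    # Step 2: sum of all activation values before pruning.
    s1 = sum(h)

    # Step 3: zero out every activation below the threshold.
    pruned = [0.0 if v < t else v for v in h]

    # Step 4: sum of the surviving activations.
    s2 = sum(pruned)
    return pruned, s1, s2

# Toy usage: a 5-element "feature map" pruned at the 60th percentile.
pruned, s1, s2 = percentile_prune([0.1, 0.5, 2.0, 0.2, 3.0], p=60.0)
```

The ratio s1/s2 (or a function of it) is what such pruning-based detectors typically feed back into the score; the sketch only reproduces the bookkeeping steps listed here.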
