arxiv: 2606.29952 · v1 · pith:52LGOVCDnew · submitted 2026-06-29 · 💻 cs.LG · cs.AI · cs.CV

Exploiting Local Flatness for Efficient Out-of-Distribution Detection

Pith reviewed 2026-06-30 07:40 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords out-of-distribution detection · Hessian curvature · local flatness · post-hoc methods · feature normalization · machine learning robustness

The pith

Out-of-distribution inputs exhibit larger Hessian curvature in feature space than in-distribution data, and the gap widens under stronger distributional shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how the curvature of the loss landscape differs between in-distribution (ID) and out-of-distribution (OOD) inputs in pre-trained networks. It reports that OOD samples produce higher curvature values in the feature Hessian, and that this separation grows as the distributional shift increases. From this observation the authors derive Fold, a detector that combines the feature Hessian with partial feature normalization to flag OOD points without computing full parameter-space curvature. A companion procedure called AutoFold generates pseudo-OOD samples from the model’s own logits to tune the normalization automatically. The resulting method runs at a cost comparable to a single forward pass; on standard OOD benchmarks the authors report a 1.63% gain in average AUROC and a 2.30% reduction in FPR95.

Core claim

OOD inputs exhibit larger Hessian curvature than ID data, with the gap widening under stronger distributional shifts. Fold exploits this discrepancy by computing the feature Hessian and applying partial feature normalization, thereby improving ID-OOD separability while sidestepping the expense of parameter-space curvature estimates. AutoFold supplies a self-supervised calibration step that creates pseudo-OOD examples via ID logit masking, removing the need for external data.
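
The source does not spell out the formula behind partial feature normalization. As a minimal sketch, assuming it means dividing a feature vector by its L2 norm raised to an exponent α in [0, 1], the operation could look like the code below. This form is an assumption suggested by the "no, partial, and full normalization" regimes named in the figure captions, not a formula taken from the paper, and the function names are hypothetical.

```python
import math


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def partial_normalize(feature, alpha):
    """Divide `feature` by ||feature|| ** alpha (assumed form, not the paper's).

    alpha = 0 leaves the vector unchanged, alpha = 1 projects it onto the
    unit sphere, and intermediate values keep part of the magnitude.
    """
    norm = l2_norm(feature)
    if norm == 0.0:
        return list(feature)  # the zero vector has no direction to rescale
    scale = norm ** alpha
    return [x / scale for x in feature]
```

Under this assumed form, the output norm equals the input norm raised to the power 1 − α, so α controls how much magnitude information survives.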

What carries the argument

The feature Hessian combined with partial feature normalization, which quantifies local flatness directly in feature space to separate ID from OOD inputs.
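
To make "curvature in feature space" concrete, here is a sketch under a simplifying assumption: the features feed a linear softmax classifier with weight rows w_k and bias b, and the loss is cross-entropy. In that case the Hessian of the loss with respect to the features is Wᵀ(diag(p) − ppᵀ)W, independent of the label, and its trace is Σ_k p_k‖w_k‖² − ‖Σ_k p_k w_k‖². Whether Fold uses exactly this quantity is not stated in the source; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_hessian_trace(weights, bias, feature):
    """Trace of the cross-entropy Hessian w.r.t. the input of a linear head.

    `weights` is a list of class rows w_k, `bias` a list of class offsets.
    Assumes a linear softmax classifier on top of `feature`.
    """
    logits = [
        sum(w * f for w, f in zip(row, feature)) + b_k
        for row, b_k in zip(weights, bias)
    ]
    probs = softmax(logits)
    # First term: sum_k p_k * ||w_k||^2
    second_moment = sum(
        p_k * sum(w * w for w in row) for p_k, row in zip(probs, weights)
    )
    # Second term: || sum_k p_k * w_k ||^2
    mean_row = [
        sum(p_k * row[j] for p_k, row in zip(probs, weights))
        for j in range(len(feature))
    ]
    return second_moment - sum(m * m for m in mean_row)
```

In this toy setting a confidently classified feature yields a trace near zero (a flat region), while a feature with a near-uniform prediction yields a larger trace, which matches the direction of the gap the paper reports.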

If this is right

  • OOD detection becomes possible with only a forward pass and a single Hessian-vector product in feature space.
  • Partial normalization of the feature Hessian improves separability without requiring full parameter-space computations.
  • AutoFold enables automatic calibration of the normalization parameter α using only the model’s own predictions on ID data.
  • The curvature signal strengthens as distributional shift increases, suggesting graded uncertainty estimates.
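
The first point above mentions a single Hessian-vector product. One standard way to turn Hessian-vector products into a trace estimate is Hutchinson's estimator (see references [25] and [2] in the list below), sketched here with Rademacher probes. Whether Fold uses this exact estimator is an assumption; `matvec` is a hypothetical callable standing in for the Hessian-vector product.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=1, seed=0):
    """Estimate the trace of a symmetric matrix from matrix-vector products.

    `matvec(v)` must return the product of the matrix with the list `v`.
    Each probe v has independent +/-1 entries, so E[v^T A v] = Tr(A).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        probe = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        product = matvec(probe)
        total += sum(v * hv for v, hv in zip(probe, product))
    return total / num_probes
```

With one probe the estimate costs exactly one matrix-vector product; more probes reduce variance at proportional cost. For a diagonal matrix every probe returns the exact trace, since each squared probe entry equals one.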

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same curvature signal could be tested on other tasks that rely on model uncertainty, such as selective classification or active learning.
  • If the feature-Hessian gap generalizes, lightweight curvature checks might replace heavier ensemble or temperature-scaling baselines in resource-constrained settings.
  • The approach invites direct comparison against gradient-norm or logit-based scores on the same architectures to isolate the contribution of curvature.

Load-bearing premise

The observed curvature gap between ID and OOD examples remains consistent enough after partial normalization to serve as a reliable detection signal across models and datasets.

What would settle it

A controlled experiment in which OOD samples produce equal or lower feature-Hessian curvature than ID samples on a standard benchmark would falsify the central observation.
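
Such a check can be phrased as a threshold-free comparison: if per-sample curvature scores are collected for ID and OOD inputs, an AUROC at or below 0.5 would mean OOD curvature is not systematically larger. The sketch below computes AUROC as the probability that a random OOD score exceeds a random ID score; the function name and the quadratic-time loop are illustrative choices, not the paper's evaluation code.

```python
def auroc(id_scores, ood_scores):
    """Probability that a random OOD score exceeds a random ID score.

    Ties count one half. Assumes a higher score means "more OOD",
    as with a curvature-based score.
    """
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood > ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))
```

A value of 1.0 means every OOD sample has higher curvature than every ID sample, 0.5 means no separation, and values below 0.5 mean the ordering is reversed.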

Figures

Figures reproduced from arXiv: 2606.29952 by Dongyeop Lee, Hyunji Jung, Namhoon Lee, Seonghwan Park.

Figure 1: (a) The trace of Hessian, serving as a metric for loss landscape sharpness, progressively increases as the data shifts from ID to near-OOD and far-OOD. (b) The trade-off between detection performance (i.e., standardized AUROC) and computational cost (i.e., standardized computation time). The top-left corner represents the ideal region of high accuracy and low latency. Our proposed FOLD and its automaticall…
Figure 2: Distribution of per-sample Hessian trace for ID and OOD data, encompassing both near- and far-OOD settings. For models trained on CIFAR, we use TIN [31] as the near-OOD dataset and SVHN [42] as the far-OOD dataset, while for models trained on ImageNet, we use NINCO [5] and iNaturalist [61] as the near- and far-OOD datasets, respectively. Across datasets, ID samples are concentrated at small values, whereas …
Figure 3: Setup and inference times of baseline methods on ImageNet-1K, ordered left to right. All measurements are conducted on a single A100 GPU with a batch size of 256. FOLD incurs near-minimal setup and inference overhead. AUTOFOLD, which automatically determines the optimal α, introduces only a marginal increase in setup time and no additional inference cost. As shown in …
Figure 4: Ablation study on the normalization parameter α. On CIFAR-10, larger α values consistently improve AUROC and FPR95 across all FOLD variants, whereas relatively smaller α values yield better performance on ImageNet-200, indicating dataset-dependent sensitivity to α. In contrast, AUTOFOLD automatically identifies a near-optimal α for each dataset, and the resulting performance (dotted line) closely matches …
Figure 5: Scatter plots illustrating the relationship between feature norm and FOLD score under three normalization regimes. Partial normalization enhances ID–OOD separability by increasing the mean score gap and reducing OOD variance, yielding a higher Cohen’s d (1.605 → 1.689) [8]. In contrast, full normalization removes magnitude information, collapsing score variance and markedly reducing discriminability (Coh…
Figure 6: Top-50 absolute eigenvalues of the feature Hessian for ID and OOD samples on ImageNet-200 under no, partial, and full normalization. Full normalization collapses the spectral gap, making the ID and OOD spectra nearly identical. Without normalization, the difference is concentrated in only the top portion of the spectrum. In contrast, partial normalization enlarges the spectral gap across a broader range of…
Figure 7: Empirical spectral densities of the parameter and feature Hessians for ID and OOD data on CIFAR-10 and ImageNet-200. Consistent with the full-parameter Hessian, the feature Hessian demonstrates a monotonic increase in the largest eigenvalue when transitioning from ID to OOD samples. Furthermore, the feature Hessian exhibits a broader spectral distribution than the parameter Hessian, suggesting a more prono…
Figure 8: Parameter-space Hessian ESD of CIFAR-10–trained models across multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green). [Plot residue: x-axis Eigenvalue, y-axis Density (log); panels (a) CIFAR-10 (ID), (b) CIFAR-100 (near OOD), …]
Figure 9: Feature Hessian ESD of CIFAR-10–trained models across multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green).
Figure 10: Parameter-space Hessian ESD of ImageNet-200–trained models across multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green). [Plot residue: x-axis Eigenvalue, y-axis Density (log); panels (a) ImageNet-200 (ID), (b) SSB-Hard (near OOD), …]
Figure 11: Feature Hessian ESD of ImageNet-200–trained models across multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green).
B.2 Curvature Analysis. Building upon the sample-level curvature analysis presented in the main text, we provide additional visualizations of the local loss landscape across multiple datasets. Specifically, we plot the distributions of sample-wise Hessian trace…
Figure 12: Per-sample Hessian trace distribution for a CIFAR-10–trained model evaluated on multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green). [Plot residue: x-axis Hessian Trace, y-axis Density; panels (a) CIFAR-100 (ID), (b) CIFAR-10 (near OOD), …]
Figure 13: Per-sample Hessian trace distribution for a CIFAR-100–trained model evaluated on multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green). [Plot residue: x-axis Hessian Trace, y-axis Density; panels (a) ImageNet-200 (ID), (b) SSB-Hard (near OOD), …]
Figure 14: Per-sample Hessian trace distribution for an ImageNet-200–trained model evaluated on multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green).
Figure 15: Per-sample Hessian trace distribution for an ImageNet-1K–trained model evaluated on multiple OOD datasets. Colors indicate ID (blue), near-OOD (orange), and far-OOD (green).
B.3 OOD Benchmark Results. To provide a comprehensive evaluation of detection performance, we report AUROC and FPR95 across the OOD benchmarks introduced in Section 4. Detailed per-dataset results are presented in …
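
The Figure 5 caption summarizes ID–OOD separability with Cohen's d [8]. As a reminder of what that number measures, here is a sketch of the common pooled-standard-deviation form: the difference of group means divided by the pooled sample standard deviation. Whether the paper uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Effect size: (mean_a - mean_b) / pooled sample standard deviation.

    Uses unbiased sample variances; each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

A larger absolute value means the two score distributions overlap less relative to their spread, which is why both a wider mean gap and a smaller OOD variance raise it.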
Original abstract

Detecting out-of-distribution (OOD) data is crucial for reliable machine learning deployment. Among detection strategies, post-hoc methods are particularly attractive due to their efficiency, as they operate directly on pre-trained networks without requiring retraining. Within this paradigm, one promising direction exploits loss-landscape curvature to estimate model uncertainty; however, such methods incur substantial computational cost and rely on implicit assumptions about how landscape flatness differs between in-distribution (ID) and OOD data. In this work, we provide the first systematic investigation of this curvature discrepancy and show that OOD inputs exhibit larger Hessian curvature than ID data, with the gap widening under stronger distributional shifts. Motivated by these observations, we propose Fold, a lightweight flatness-modulated OOD detector that leverages the feature Hessian and partial feature normalization to improve ID-OOD separability while avoiding costly parameter-space curvature approximations. To optimally adapt this normalization across diverse datasets, we further introduce AutoFold, a self-supervised tuning scheme that synthesizes pseudo-OOD samples via ID logit masking for automatic calibration without requiring external data. Experiments on OOD benchmarks show that Fold outperforms prior methods, improving the average AUROC by 1.63% and reducing FPR95 by 2.30%, while maintaining computational efficiency comparable to a standard forward pass. Supported by theoretical analysis and extensive ablations, Fold provides a principled and practical solution for robust real-world deployment.
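
The abstract reports FPR95 alongside AUROC. A minimal sketch of one common convention follows: treat ID as the positive class, choose the threshold that accepts 95% of ID samples, and report the fraction of OOD samples that are also accepted. The convention, the function name, and the "higher score means more in-distribution" orientation are assumptions; a curvature score that grows on OOD inputs would be negated before being passed in.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted when `tpr` of ID samples are accepted.

    Assumes a higher score means "more in-distribution".
    """
    ranked = sorted(id_scores, reverse=True)
    # Number of ID samples that must be accepted; the small epsilon guards
    # against floating-point error pushing an exact product above an integer.
    keep = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[keep - 1]
    accepted_ood = sum(1 for score in ood_scores if score >= threshold)
    return accepted_ood / len(ood_scores)
```

Lower is better: a value of 0.0 means no OOD sample passes the threshold that retains 95% of ID data.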

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that OOD inputs exhibit larger feature-Hessian curvature than ID data (with the gap widening under stronger shifts), provides the first systematic investigation of this discrepancy, and introduces Fold: a lightweight post-hoc detector that uses the feature Hessian plus partial feature normalization to improve separability while avoiding full parameter-space Hessian costs. It further proposes AutoFold, a self-supervised scheme that generates pseudo-OOD samples via ID logit masking to tune the normalization parameters without external data. Experiments report an average AUROC gain of 1.63% and an average FPR95 reduction of 2.30% over prior methods at forward-pass cost, supported by theoretical analysis and ablations.

Significance. If the curvature discrepancy is shown to be general rather than architecture- or dataset-specific, Fold would supply a practical, efficient addition to the post-hoc OOD toolkit that sidesteps expensive curvature approximations. The self-supervised AutoFold component is a notable strength for real-world applicability. The modest reported gains, however, suggest incremental rather than transformative impact, and the result's value hinges on verification that the observed flatness gap is not an artifact of the tested CNNs or ID/OOD splits.

major comments (2)
  1. [Abstract] Abstract and empirical investigation: the load-bearing claim that OOD inputs exhibit reliably larger feature-Hessian curvature than ID data (widening with shift strength) is presented as a general property enabling Fold; yet the manuscript provides no explicit cross-architecture validation (e.g., on transformers) to address the possibility that the gap is driven by inductive biases of the tested CNNs, which would make the partial-normalization step fit to those biases rather than exploit a fundamental flatness difference.
  2. [AutoFold] AutoFold description: the scheme synthesizes pseudo-OOD via ID logit masking and adapts normalization parameters on these internally generated samples; this construction risks circularity because the detector is calibrated on quantities derived from the ID model itself, and the paper must demonstrate that the resulting separability is not an artifact of this self-generation process (e.g., via ablation removing the masking step).
minor comments (2)
  1. The abstract states 'the first systematic investigation' without referencing prior curvature-based OOD works; a brief related-work paragraph would clarify the precise novelty.
  2. [Experiments] Reported gains (1.63% AUROC, 2.30% FPR95) are averages; the paper should include per-dataset tables with standard deviations across multiple random seeds to allow assessment of statistical reliability.
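
The second minor comment asks for per-dataset means with standard deviations across seeds. A sketch of that aggregation is below, assuming results are stored as a mapping from dataset name to a list of per-seed metric values; the data layout and function name are hypothetical.

```python
import statistics


def summarize_runs(per_dataset_runs):
    """Map each dataset name to (mean, sample standard deviation) over seeds.

    A single run has no spread to estimate, so its deviation is reported as 0.0.
    """
    summary = {}
    for name, values in per_dataset_runs.items():
        mean = statistics.mean(values)
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean, spread)
    return summary
```

Reporting the pair per dataset, rather than only an overall average, lets a reader judge whether a gain of one or two points exceeds seed-to-seed variation.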

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond to each major comment below and indicate the revisions we will make.

  1. Referee: [Abstract] Abstract and empirical investigation: the load-bearing claim that OOD inputs exhibit reliably larger feature-Hessian curvature than ID data (widening with shift strength) is presented as a general property enabling Fold; yet the manuscript provides no explicit cross-architecture validation (e.g., on transformers) to address the possibility that the gap is driven by inductive biases of the tested CNNs, which would make the partial-normalization step fit to those biases rather than exploit a fundamental flatness difference.

    Authors: We thank the referee for this observation. Our experiments and analysis are performed on the CNN architectures that dominate the OOD detection literature and the specific benchmarks we evaluate. The curvature gap is shown to be consistent across multiple CNN families and to increase with shift strength, which directly motivates the Fold design. We do not assert that the phenomenon holds for every possible architecture. In the revised manuscript we will add an explicit limitations paragraph clarifying the scope of the empirical claims and noting the lack of transformer results as an open question for future work. revision: partial

  2. Referee: [AutoFold] AutoFold description: the scheme synthesizes pseudo-OOD via ID logit masking and adapts normalization parameters on these internally generated samples; this construction risks circularity because the detector is calibrated on quantities derived from the ID model itself, and the paper must demonstrate that the resulting separability is not an artifact of this self-generation process (e.g., via ablation removing the masking step).

    Authors: We agree that an explicit check is warranted to confirm the masking step is not merely an artifact. The current manuscript already contains ablations on AutoFold components and a theoretical justification for the masking procedure. To address the referee's specific request, we will add a new ablation that directly compares AutoFold performance when the logit-masking step is removed (i.e., calibration performed on unmodified ID samples or random perturbations). This will be included in the revised version. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's core chain begins with an empirical observation of curvature discrepancy (presented as a systematic investigation, not derived from Fold), which motivates the use of feature Hessian and partial normalization. AutoFold's synthesis of pseudo-OOD via logit masking is a self-supervised calibration step whose parameters are adapted on generated samples but evaluated for detection performance on external real OOD benchmarks; this does not reduce the reported AUROC/FPR95 gains to a quantity fitted by construction. No equations, self-citations, or uniqueness claims reduce the central result to its inputs. The derivation is self-contained.
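
The calibrate-on-pseudo-OOD, evaluate-on-real-OOD split described above can be illustrated with a rough sketch. The source only says that AutoFold builds pseudo-OOD samples "via ID logit masking" and uses them to choose α. Everything below is therefore an assumption made for illustration: the masking rule (suppress the top logit), the selection criterion (highest AUROC between ID and pseudo-OOD scores), and all names, including the user-supplied `score_fn(sample, alpha)`.

```python
def mask_top_logit(logits, fill=-1e9):
    """Copy `logits` with the largest entry suppressed (assumed masking rule)."""
    top = max(range(len(logits)), key=lambda k: logits[k])
    return [fill if k == top else z for k, z in enumerate(logits)]


def auroc(id_scores, ood_scores):
    """Probability that a random OOD score exceeds a random ID score."""
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            wins += 1.0 if ood > ind else 0.5 if ood == ind else 0.0
    return wins / (len(id_scores) * len(ood_scores))


def select_alpha(score_fn, id_samples, pseudo_ood_samples, grid):
    """Return (alpha, auroc) for the grid value that best separates the sets.

    `score_fn(sample, alpha)` is a hypothetical detector score where a
    higher value means "more OOD". Ties keep the earliest grid value.
    """
    best_alpha, best_auc = None, -1.0
    for alpha in grid:
        id_scores = [score_fn(x, alpha) for x in id_samples]
        ood_scores = [score_fn(x, alpha) for x in pseudo_ood_samples]
        auc = auroc(id_scores, ood_scores)
        if auc > best_auc:
            best_alpha, best_auc = alpha, auc
    return best_alpha, best_auc
```

Because selection only sees ID and pseudo-OOD samples, real OOD benchmarks stay untouched until evaluation, which is the property the circularity check relies on.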

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical observation that OOD data produces higher feature-Hessian curvature and on the effectiveness of partial normalization for separability; no explicit free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption: OOD inputs exhibit larger Hessian curvature than ID data in feature space, with the gap increasing under stronger shifts
    This is the load-bearing observation that motivates Fold.
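
One way to write this assumption down, using the Hessian trace that the Figure 1 caption names as the sharpness metric, is an ordering of expected curvature across ID, near-OOD, and far-OOD inputs. The notation below (H_f for the feature Hessian, the three distributions, and the use of expectations) is assumed for illustration and is not taken from the paper.

```latex
% H_f(x): Hessian of the loss with respect to the features of input x (assumed notation)
\mathbb{E}_{x \sim \mathcal{D}_{\mathrm{ID}}}\!\left[\operatorname{Tr} H_f(x)\right]
\;<\;
\mathbb{E}_{x \sim \mathcal{D}_{\mathrm{near}}}\!\left[\operatorname{Tr} H_f(x)\right]
\;<\;
\mathbb{E}_{x \sim \mathcal{D}_{\mathrm{far}}}\!\left[\operatorname{Tr} H_f(x)\right]
```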

pith-pipeline@v0.9.1-grok · 5793 in / 1196 out tokens · 25277 ms · 2026-06-30T07:40:01.727386+00:00 · methodology


Reference graph

Works this paper leans on

73 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1] Ahmed, F., Courville, A.: Detecting semantic anomalies. In: AAAI (2020)
  2. [2] Avron, H., Toledo, S.: Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM (2011)
  3. [3] Bai, Z., Fahey, G., Golub, G.: Some large-scale matrix computation problems. Journal of Computational and Applied Mathematics (1996)
  4. [4] Bendale, A., Boult, T.E.: Towards open set deep networks. In: CVPR (2016)
  5. [5] Bitterwolf, J., Müller, M., Hein, M.: In or out? fixing imagenet out-of-distribution detection evaluation. In: ICML (2023)
  6. [6] Chen, C., Fu, Z., Liu, K., Chen, Z., Tao, M., Ye, J.: Optimal parameter and neuron pruning for out-of-distribution detection. In: NeurIPS (2023)
  7. [7] Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: CVPR (2014)
  8. [8] Cohen, J.: Statistical power analysis for the behavioral sciences. Routledge (2013)
  9. [9] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR (2009)
  10. [10] Deng, L.: The mnist database of handwritten digit images for machine learning research. IEEE signal processing magazine (2012)
  11. [12] DeVries, T., Taylor, G.W.: Learning confidence for out-of-distribution detection in neural networks. arXiv preprint arXiv:1802.04865 (2018)
  12. [13] Dhamija, A.R., Günther, M., Boult, T.: Reducing network agnostophobia. In: NeurIPS (2018)
  13. [14] Djurisic, A., Bozanic, N., Ashok, A., Liu, R.: Extremely simple activation shaping for out-of-distribution detection. In: ICLR (2023)
  14. [15] Du, X., Wang, Z., Cai, M., Li, Y.: VOS: Learning what you don’t know by virtual outlier synthesis. In: ICLR (2022)
  15. [16] Fang, K., et al.: Kernel PCA for out-of-distribution detection. In: NeurIPS (2024)
  16. [17] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: ICML (2017)
  17. [18] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  18. [19] Hendrycks, D., Basart, S., Mazeika, M., Mostajabi, M., Steinhardt, J., Song, D.: Scaling out-of-distribution detection for real-world settings. In: ICML (2022)
  19. [20] Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: ICLR (2017)
  20. [21] Hendrycks, D., Mazeika, M., Dietterich, T.: Deep anomaly detection with outlier exposure. In: ICLR (2019)
  21. [22] Hsu, Y.C., Shen, Y., Jin, H., Kira, Z.: Generalized odin: Detecting out-of-distribution image without learning from out-of-distribution data. In: CVPR (2020)
  22. [23] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
  23. [24] Huang, R., Geng, A., Li, Y.: On the importance of gradients for detecting distributional shifts in the wild. In: NeurIPS (2021)
  24. [25] Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines. Communications in Statistics-Simulation and Computation (1989)
  25. [26] Ji, Z., Telgarsky, M.: The implicit bias of gradient descent on nonseparable data. In: COLT (2019)
  26. [27] Kristiadi, A., Hein, M., Hennig, P.: Being bayesian, even just a bit, fixes overconfidence in relu networks. In: ICML (2020)
  27. [28] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto (2009)
  28. [29] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A., et al.: The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV (2020)
  29. [30] Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: NeurIPS (2017)
  30. [31] Le, Y., Yang, X.: Tiny imagenet visual recognition challenge. CS 231N (2015)
  31. [32] Lee, K., Lee, H., Lee, K., Shin, J.: Training confidence-calibrated classifiers for detecting out-of-distribution samples. In: ICLR (2018)
  32. [33] Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: NeurIPS (2018)
  33. [34] Lee, S., Park, J., Lee, J.: Implicit jacobian regularization weighted with impurity of probability output. In: ICML (2023)
  34. [35] Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. In: ICLR (2018)
  35. [36] Liu, W., Wang, X., Owens, J., Li, Y.: Energy-based out-of-distribution detection. In: NeurIPS (2020)
  36. [37] Liu, Y., Chris, X., Li, H., Ma, L., Wang, S.: Neuron activation coverage: Rethinking out-of-distribution detection and generalization. In: ICLR (2024)
  37. [38] Madras, D., Atwood, J., D’Amour, A.: Detecting extrapolation with local ensembles. In: ICLR (2019)
  38. [39] Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: CVPR (2017)
  39. [40] Mueller, M., Hein, M.: Mahalanobis++: Improving ood detection via feature normalization. In: ICML (2025)
  40. [41] Nacson, M.S., Srebro, N., Soudry, D.: Stochastic gradient descent on separable data: Exact convergence with a fixed learning rate. In: AISTATS (2019)
  41. [42] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y., et al.: Reading digits in natural images with unsupervised feature learning. In: NeurIPS Workshop on deep learning and unsupervised feature learning (2011)
  42. [43] Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: CVPR (2015)
  43. [44] Nguyen, A., Bertrand, A., Le Hégarat-Mascle, S., Aldea, E., FLORIN, F., EL-KORSO, M.N., LUSTRAT, R.: Fisher-rao sensitivity for out-of-distribution detection in deep neural networks. In: ICLR (2026)
  44. [45] Oh, J., Yun, C.: Provable benefit of mixup for finding optimal decision boundaries. In: ICML (2023)
  45. [46] Park, J., Chai, J.C.L., Yoon, J., Teoh, A.B.J.: Understanding the feature norm for out-of-distribution detection. In: ICCV (2023)
  46. [47] Radosavovic, I., Kosaraju, R.P., Girshick, R., He, K., Dollár, P.: Designing network design spaces. In: CVPR (2020)
  47. [48] Ravikumar, D., Soufleri, E., Roy, K.: Curvature clues: Decoding deep learning privacy with input loss curvature. In: NeurIPS (2024)
  48. [49] Ren, J., Fort, S., Liu, J., Roy, A.G., Padhy, S., Lakshminarayanan, B.: A simple fix to mahalanobis distance for improving near-ood detection. In: ICML Workshop on Uncertainty and Robustness in Deep Learning (2021)
  49. [50] Ritter, H., Botev, A., Barber, D.: A scalable laplace approximation for neural networks. In: ICLR (2018)
  50. [51] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E., Kloft, M.: Deep one-class classification. In: ICML (2018)
  51. [52] Seleznova, M., et al.: GradPCA: Leveraging NTK alignment for reliable out-of-distribution detection. In: ICLR (2026)
  52. [53] Sharma, A., Azizan, N., Pavone, M.: Sketching curvature for efficient out-of-distribution detection for deep neural networks. In: UAI (2021)
  53. [54] Soudry, D., Hoffer, E., Nacson, M.S., Gunasekar, S., Srebro, N.: The implicit bias of gradient descent on separable data. JMLR (2018)
  54. [55] Sun, Y., Guo, C., Li, Y.: React: Out-of-distribution detection with rectified activations. In: NeurIPS (2021)
  55. [56] Sun, Y., Li, S.: Dice: Leveraging sparsification for out-of-distribution detection. In: ECCV (2022)
  56. [57] Sun, Y., Ming, Y., Zhu, X., Li, Y.: Out-of-distribution detection with deep nearest neighbors. In: ICML (2022)
  57. [58] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: ICLR (2014)
  58. [59] Tack, J., Mo, S., Jeong, J., Shin, J.: Csi: Novelty detection via contrastive learning on distributionally shifted instances. In: NeurIPS (2020)
  59. [60] Tao, L., Du, X., Zhu, X., Li, Y.: Non-parametric outlier synthesis. In: ICLR (2023)
  60. [61] Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The inaturalist species classification and detection dataset. In: CVPR (2018)
  61. [62] Vaze, S., Han, K., Vedaldi, A., Zisserman, A.: Open-set recognition: A good closed-set classifier is all you need. In: ICLR (2022)
  62. [63] Vyas, A., Jammalamadaka, N., Zhu, X., Das, D., Kaul, B., Willke, T.L.: Out-of-distribution detection using an ensemble of self supervised leave-out classifiers. In: ECCV (2018)
  63. [64] Wang, H., Li, Z., Feng, L., Zhang, W.: ViM: Out-of-distribution with virtual-logit matching. In: CVPR (2022)
  64. [65] Wei, H., Xie, R., Cheng, H., Feng, L., An, B., Li, Y.: Mitigating neural network overconfidence with logit normalization. In: ICML (2022)
  65. [66] Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)
  66. [67] Yang, J., Wang, P., Zou, D., Zhou, Z., Ding, K., Peng, W., Wang, H., Chen, G., Li, B., Sun, Y., et al.: Openood: Benchmarking generalized out-of-distribution detection. In: NeurIPS (2022)
  67. [68] Yao, Z., Gholami, A., Keutzer, K., Mahoney, M.W.: Pyhessian: Neural networks through the lens of the hessian. In: IEEE Big Data (2020)
  68. [69] Yu, Q., Aizawa, K.: Unsupervised out-of-distribution detection by maximum classifier discrepancy. In: ICCV (2019)
  69. [70] Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC (2016)
  70. [71] Zhang, J., Yang, J., Wang, P., Wang, H., Lin, Y., Zhang, H., Sun, Y., Du, X., Li, Y., Liu, Z., Chen, Y., Li, H.: Openood v1.5: Enhanced benchmark for out-of-distribution detection. In: NeurIPS Workshop on Distribution Shifts (2023)
  71. [72] Zhang, J., Fu, Q., Chen, X., Du, L., Li, Z., Wang, G., xiaoguang Liu, Han, S., Zhang, D.: Out-of-distribution detection based on in-distribution data patterns memorization with modern hopfield energy. In: ICLR (2023)
  72. [73] Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million image database for scene recognition. IEEE TPAMI (2017)
  73. [74] Zöngür, B., et al.: Activation subspaces for out-of-distribution detection. In: ICCV (2025)

Supplementary Material. A Experimental Details. This section provides additional details of the experimental framework, expanding upon the core setup described in the main text. For ...