Exploiting Local Flatness for Efficient Out-of-Distribution Detection
Pith reviewed 2026-06-30 07:40 UTC · model grok-4.3
The pith
Out-of-distribution inputs exhibit larger Hessian curvature in feature space than in-distribution data, and the gap widens under stronger distributional shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OOD inputs exhibit larger Hessian curvature than ID data, with the gap widening under stronger distributional shifts. Fold exploits this discrepancy by computing the feature Hessian and applying partial feature normalization, thereby improving ID-OOD separability while sidestepping the expense of parameter-space curvature estimates. AutoFold supplies a self-supervised calibration step that creates pseudo-OOD examples via ID logit masking, removing the need for external data.
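The ID logit masking behind AutoFold can be pictured with a small sketch. The masking rule is not spelled out here, so the code assumes the simplest variant: overwrite the top-k logits of an ID sample so that the evidence for its predicted class disappears. The function name `mask_top_logits`, the parameter `k`, and the fill value are hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mask_top_logits(logits, k=1):
    """Return a copy of `logits` with the k largest entries replaced by the
    smallest logit. This masking rule is an assumption for illustration."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    floor = min(logits)
    masked = list(logits)
    for i in order[:k]:
        masked[i] = floor
    return masked

# Illustrative logits for one confidently classified ID sample.
id_logits = [8.0, 1.5, 0.5, -1.0]
pseudo_ood_logits = mask_top_logits(id_logits, k=1)

# Maximum softmax probability before and after masking.
id_conf = max(softmax(id_logits))
pseudo_conf = max(softmax(pseudo_ood_logits))
```

Under this assumed rule the masked sample has lower confidence than the original, which is the kind of contrast a calibration step could tune against without external OOD data.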
What carries the argument
The feature Hessian combined with partial feature normalization, which quantifies local flatness directly in feature space to separate ID from OOD inputs.
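As a rough illustration of what "partial" normalization could mean, the sketch below divides a feature vector by its norm raised to a power `alpha` between 0 and 1. This interpolation, the name `partial_normalize`, and the parameter `alpha` are assumptions made for the example; the exact form used by Fold is not given in this summary.

```python
import math

def partial_normalize(z, alpha, eps=1e-12):
    """Scale feature vector z by 1 / ||z||**alpha.

    alpha = 0 leaves z unchanged; alpha = 1 gives a unit-norm vector.
    Hypothetical form, used only to illustrate the idea.
    """
    norm = math.sqrt(sum(x * x for x in z))
    scale = (norm + eps) ** alpha
    return [x / scale for x in z]

z = [3.0, 4.0]                      # toy feature vector with norm 5
raw = partial_normalize(z, 0.0)     # unchanged
half = partial_normalize(z, 0.5)    # norm becomes sqrt(5)
unit = partial_normalize(z, 1.0)    # norm becomes 1
```

Under this assumed form the output norm is `||z|| ** (1 - alpha)`, so `alpha` controls how much of the feature-norm information survives.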
If this is right
- OOD detection becomes possible with only a forward pass and a single Hessian-vector product in feature space.
- Partial normalization of the feature Hessian improves separability without requiring full parameter-space computations.
- AutoFold enables automatic threshold calibration using only the model’s own predictions on ID data.
- The curvature signal strengthens as distributional shift increases, suggesting graded uncertainty estimates.
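The "single Hessian-vector product in feature space" can be sketched for one concrete case. The example assumes the scalar of interest is the log-sum-exp of the logits of a linear head `W` applied to a feature vector `z`; that choice, and all function names, are assumptions made for illustration rather than the paper's definition. For this function the Hessian with respect to `z` is `W^T (diag(p) - p p^T) W`, so the product `H v` needs only matrix-vector operations, and a Rademacher probe gives a one-sample Hutchinson estimate of the trace.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    """Compute W v for a C x D matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rmatvec(W, u):
    """Compute W^T u for a C x D matrix stored as a list of rows."""
    return [sum(W[c][j] * u[c] for c in range(len(W))) for j in range(len(W[0]))]

def grad_lse(W, z):
    """Gradient of logsumexp(W z) with respect to z, equal to W^T p."""
    return rmatvec(W, softmax(matvec(W, z)))

def hvp_lse(W, z, v):
    """Hessian-vector product H v with H = W^T (diag(p) - p p^T) W."""
    p = softmax(matvec(W, z))
    wv = matvec(W, v)
    pwv = sum(pi * ai for pi, ai in zip(p, wv))
    u = [pi * (ai - pwv) for pi, ai in zip(p, wv)]
    return rmatvec(W, u)

def curvature_probe(W, z, rng):
    """One-probe Hutchinson estimate v^T H v of trace(H), Rademacher v."""
    v = [rng.choice((-1.0, 1.0)) for _ in z]
    hv = hvp_lse(W, z, v)
    return sum(a * b for a, b in zip(v, hv))

# Toy linear head (3 classes, 3 features) and feature vector.
W = [[1.0, -0.5, 0.2], [0.3, 0.8, -1.0], [-0.7, 0.1, 0.6]]
z = [0.5, -1.0, 2.0]
estimate = curvature_probe(W, z, random.Random(0))
```

The Hessian is never materialized, which is what keeps the cost close to that of a forward pass in this toy setting.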
Where Pith is reading between the lines
- The same curvature signal could be tested on other tasks that rely on model uncertainty, such as selective classification or active learning.
- If the feature-Hessian gap generalizes, lightweight curvature checks might replace heavier ensemble or temperature-scaling baselines in resource-constrained settings.
- The approach invites direct comparison against gradient-norm or logit-based scores on the same architectures to isolate the contribution of curvature.
Load-bearing premise
The observed curvature gap between ID and OOD examples remains consistent enough after partial normalization to serve as a reliable detection signal across models and datasets.
What would settle it
A controlled experiment in which OOD samples produce equal or lower feature-Hessian curvature than ID samples on a standard benchmark would falsify the central observation.
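One way such a check could be scored is with the AUROC of the curvature values themselves, treating OOD as the positive class: a value at or below 0.5 would mean OOD curvature is not larger than ID curvature on that benchmark. The sketch below is a minimal pairwise implementation; the curvature numbers are made up for illustration and are not results from the paper.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative,
    counting ties as one half (Mann-Whitney U statistic)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up curvature values, for illustration only.
ood_curvature = [2.1, 1.7, 3.0, 0.9]
id_curvature = [0.4, 0.8, 1.0, 0.6]

score = auroc(ood_curvature, id_curvature)
claim_contradicted = score <= 0.5
```

The quadratic loop is fine for a sanity check; a rank-based implementation would be preferable on full benchmarks.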
Original abstract
Detecting out-of-distribution (OOD) data is crucial for reliable machine learning deployment. Among detection strategies, post-hoc methods are particularly attractive due to their efficiency, as they operate directly on pre-trained networks without requiring retraining. Within this paradigm, one promising direction exploits loss-landscape curvature to estimate model uncertainty; however, such methods incur substantial computational cost and rely on implicit assumptions about how landscape flatness differs between in-distribution (ID) and OOD data. In this work, we provide the first systematic investigation of this curvature discrepancy and show that OOD inputs exhibit larger Hessian curvature than ID data, with the gap widening under stronger distributional shifts. Motivated by these observations, we propose Fold, a lightweight flatness-modulated OOD detector that leverages the feature Hessian and partial feature normalization to improve ID-OOD separability while avoiding costly parameter-space curvature approximations. To optimally adapt this normalization across diverse datasets, we further introduce AutoFold, a self-supervised tuning scheme that synthesizes pseudo-OOD samples via ID logit masking for automatic calibration without requiring external data. Experiments on OOD benchmarks show that Fold outperforms prior methods, improving the average AUROC by 1.63% and reducing FPR95 by 2.30%, while maintaining computational efficiency comparable to a standard forward pass. Supported by theoretical analysis and extensive ablations, Fold provides a principled and practical solution for robust real-world deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that OOD inputs exhibit larger feature-Hessian curvature than ID data (with the gap widening under stronger shifts), provides the first systematic investigation of this discrepancy, and introduces Fold: a lightweight post-hoc detector that uses the feature Hessian plus partial feature normalization to improve separability while avoiding full parameter-space Hessian costs. It further proposes AutoFold, a self-supervised scheme that generates pseudo-OOD samples via ID logit masking to tune the normalization parameters without external data. Experiments report average gains of 1.63% AUROC and 2.30% FPR95 reduction over prior methods at forward-pass cost, supported by theoretical analysis and ablations.
Significance. If the curvature discrepancy is shown to be general rather than architecture- or dataset-specific, Fold would supply a practical, efficient addition to the post-hoc OOD toolkit that sidesteps expensive curvature approximations. The self-supervised AutoFold component is a notable strength for real-world applicability. The modest reported gains, however, suggest incremental rather than transformative impact, and the result's value hinges on verification that the observed flatness gap is not an artifact of the tested CNNs or ID/OOD splits.
major comments (2)
- [Abstract] Abstract and empirical investigation: the load-bearing claim that OOD inputs exhibit reliably larger feature-Hessian curvature than ID data (widening with shift strength) is presented as a general property enabling Fold; yet the manuscript provides no explicit cross-architecture validation (e.g., on transformers) to address the possibility that the gap is driven by inductive biases of the tested CNNs, which would make the partial-normalization step fit to those biases rather than exploit a fundamental flatness difference.
- [AutoFold] AutoFold description: the scheme synthesizes pseudo-OOD via ID logit masking and adapts normalization parameters on these internally generated samples; this construction risks circularity because the detector is calibrated on quantities derived from the ID model itself, and the paper must demonstrate that the resulting separability is not an artifact of this self-generation process (e.g., via ablation removing the masking step).
minor comments (2)
- The abstract states 'the first systematic investigation' without referencing prior curvature-based OOD works; a brief related-work paragraph would clarify the precise novelty.
- [Experiments] Reported gains (1.63% AUROC, 2.30% FPR95) are averages; the paper should include per-dataset tables with standard deviations across multiple random seeds to allow assessment of statistical reliability.
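The reporting requested in the last comment could look like the following sketch, which computes FPR95 for one run and then summarizes a metric across seeds as mean and sample standard deviation. It assumes the common convention that ID is the positive class and a higher score means more ID-like; the function names and all numbers are hypothetical placeholders, not values from the paper.

```python
import math
import statistics

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID at the threshold that keeps
    at least `tpr` of the ID samples (higher score = more ID-like)."""
    ranked = sorted(id_scores, reverse=True)
    # Small tolerance guards against floating-point error in tpr * n.
    keep = math.ceil(tpr * len(ranked) - 1e-9)
    threshold = ranked[keep - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)

def summarize(per_seed_values):
    """Mean and sample standard deviation across random seeds."""
    return statistics.mean(per_seed_values), statistics.stdev(per_seed_values)

# Placeholder scores for a single run.
id_scores = [float(i) for i in range(1, 21)]
ood_scores = [0.5, 1.5, 2.5, 10.0]
fpr95 = fpr_at_tpr(id_scores, ood_scores)

# Placeholder FPR95 values from three hypothetical seeds.
mean_fpr, std_fpr = summarize([0.30, 0.34, 0.32])
```

Reporting the pair `(mean_fpr, std_fpr)` per dataset would make it possible to judge whether a gap of about two points exceeds seed-to-seed variation.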
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We respond to each major comment below and indicate the revisions we will make.
- Referee: [Abstract] Abstract and empirical investigation: the load-bearing claim that OOD inputs exhibit reliably larger feature-Hessian curvature than ID data (widening with shift strength) is presented as a general property enabling Fold; yet the manuscript provides no explicit cross-architecture validation (e.g., on transformers) to address the possibility that the gap is driven by inductive biases of the tested CNNs, which would make the partial-normalization step fit to those biases rather than exploit a fundamental flatness difference.
  Authors: We thank the referee for this observation. Our experiments and analysis are performed on the CNN architectures that dominate the OOD detection literature and the specific benchmarks we evaluate. The curvature gap is shown to be consistent across multiple CNN families and to increase with shift strength, which directly motivates the Fold design. We do not assert that the phenomenon holds for every possible architecture. In the revised manuscript we will add an explicit limitations paragraph clarifying the scope of the empirical claims and noting the lack of transformer results as an open question for future work.
  revision: partial
- Referee: [AutoFold] AutoFold description: the scheme synthesizes pseudo-OOD via ID logit masking and adapts normalization parameters on these internally generated samples; this construction risks circularity because the detector is calibrated on quantities derived from the ID model itself, and the paper must demonstrate that the resulting separability is not an artifact of this self-generation process (e.g., via ablation removing the masking step).
  Authors: We agree that an explicit check is warranted to confirm the masking step is not merely an artifact. The current manuscript already contains ablations on AutoFold components and a theoretical justification for the masking procedure. To address the referee's specific request, we will add a new ablation that directly compares AutoFold performance when the logit-masking step is removed (i.e., calibration performed on unmodified ID samples or random perturbations). This will be included in the revised version.
  revision: yes
Circularity Check
No significant circularity in derivation chain
The paper's core chain begins with an empirical observation of curvature discrepancy (presented as a systematic investigation, not derived from Fold), which motivates the use of feature Hessian and partial normalization. AutoFold's synthesis of pseudo-OOD via logit masking is a self-supervised calibration step whose parameters are adapted on generated samples but evaluated for detection performance on external real OOD benchmarks; this does not reduce the reported AUROC/FPR95 gains to a quantity fitted by construction. No equations, self-citations, or uniqueness claims reduce the central result to its inputs. The derivation is self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: OOD inputs exhibit larger Hessian curvature than ID data in feature space, with the gap increasing under stronger shifts