pith. machine review for the scientific record.

arxiv: 2605.08191 · v1 · submitted 2026-05-05 · 💻 cs.CV · cs.AI

Recognition: 2 Lean theorem links

A Robust Out-of-Distribution Detection Framework via Synergistic Smoothing

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:13 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: out-of-distribution detection · adversarial robustness · median smoothing · score instability · post-hoc detector · computer vision · symmetric defence

The pith

Median smoothing on OOD scores lets the same noise samples quantify local instability, producing a detector robust to both score-minimising and score-maximising attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper starts from any baseline OOD detector and wraps its scores with median smoothing to balance clean and adversarial performance. The noisy samples created during that smoothing are then reused to compute a simple instability measure around each input; OOD inputs turn out to be markedly less stable than in-distribution inputs under the same small perturbations. The resulting detector, ROSS, feeds this instability signal back into the decision rule, yielding a post-hoc method that resists attacks trying to either suppress or inflate the OOD score. Experiments on CIFAR-10, CIFAR-100, and ImageNet show gains of up to 40 AUROC points over prior robust detectors under symmetric adversarial conditions.
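The pipeline just described can be sketched in a few lines. This is an editorial illustration, not the authors' released code: the fusion rule (median minus a weighted MAD term) and the weight `lam` are assumptions for illustration, and the paper's exact combination of smoothed score and instability may differ.

```python
import numpy as np

def ross_score(base_score, x, sigma=0.1, n=64, lam=1.0, seed=0):
    """Median-smooth a base OOD score and reuse the same noisy
    samples to measure local instability (MAD of the scores)."""
    rng = np.random.default_rng(seed)
    scores = np.array([base_score(x + rng.normal(0.0, sigma, x.shape))
                       for _ in range(n)])
    med = np.median(scores)                # median-smoothed score
    mad = np.median(np.abs(scores - med))  # local instability, no extra queries
    return med - lam * mad                 # higher => more ID-like
```

For a base score that is constant in the neighbourhood, the MAD term vanishes and the combined score reduces to the smoothed score; an oscillating score landscape is penalised.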

Core claim

The central claim is twofold: OOD samples exhibit higher local instability under perturbation than ID samples, and the noisy samples already generated by median smoothing of a base OOD score can be repurposed to measure this instability reliably. The result is the ROSS detector, which achieves symmetric robustness against both score-minimising and score-maximising adversarial attacks.

What carries the argument

The instability metric obtained by repurposing the perturbation samples from median smoothing. It quantifies how much the base OOD score fluctuates in a local neighbourhood and is observed to be systematically larger for OOD inputs.
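A toy sketch of that observation (editorial; the two stand-in score functions are invented for illustration, not the paper's models): a score that is flat around the input behaves "ID-like", while one that oscillates under the same Gaussian perturbations behaves "OOD-like" and shows a much larger median absolute deviation.

```python
import numpy as np

def instability(score_fn, x, sigma=0.05, n=256, seed=0):
    # MAD of the base score over the same Gaussian samples
    # that median smoothing already draws.
    rng = np.random.default_rng(seed)
    s = np.array([score_fn(x + rng.normal(0.0, sigma, x.shape))
                  for _ in range(n)])
    return float(np.median(np.abs(s - np.median(s))))

flat  = lambda z: float(z.mean())               # smooth landscape: ID-like
rough = lambda z: float(np.sin(40 * z.mean()))  # jagged landscape: OOD-like
```

Evaluated at the same input and noise level, `rough` yields an instability estimate orders of magnitude above `flat`, which is the separation ROSS exploits.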

If this is right

  • ROSS maintains competitive clean AUROC while delivering large gains under both attack directions.
  • The method is post-hoc and can be applied on top of any existing OOD scoring function.
  • Symmetric robustness holds across CIFAR-10, CIFAR-100 and ImageNet-scale experiments.
  • The same perturbation budget used for smoothing supplies the instability signal at no extra cost.
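Since the bullets above are stated in AUROC terms, here is the pairwise-ranking definition behind such numbers (a sketch; OOD benchmarks conventionally treat ID as the positive class and compute this over detector scores):

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC as the probability that a random ID sample outranks a
    random OOD sample; ties count half."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float(np.mean((id_s > ood_s) + 0.5 * (id_s == ood_s)))
```

On this scale a 40-point gain means, for instance, moving from chance-level 0.50 to 0.90 under attack.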

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The instability signal may prove useful for other safety tasks that must distinguish natural from anomalous inputs under perturbation.
  • Because the defence reuses existing smoothing samples, it could be combined with other certified or empirical smoothing schemes without additional queries.
  • If instability remains a reliable separator even under stronger adaptive attacks, it suggests that distribution shift and perturbation sensitivity are more tightly coupled than previously exploited.

Load-bearing premise

The assumption that OOD samples are reliably more unstable than ID samples under small perturbations, and that the noise samples from median smoothing capture this difference without introducing fresh vulnerabilities.

What would settle it

Finding a set of OOD images whose base scores remain more stable under the median-smoothing perturbations than ID images from the same dataset, or constructing an attack that defeats ROSS while leaving the instability signal intact.
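The second test, an attack that defeats ROSS while leaving the instability signal intact, would look roughly like a PGD loop on the full detector output. A gradient-free sketch (editorial: real evaluations would backpropagate through a differentiable surrogate, and every parameter and function name here is illustrative):

```python
import numpy as np

def num_grad(f, x, h=1e-3):
    """Coordinate-wise central finite differences (toy scale only)."""
    g = np.zeros_like(x)
    gf = g.ravel()
    for i in range(x.size):
        d = np.zeros(x.size)
        d[i] = h
        d = d.reshape(x.shape)
        gf[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

def pgd_on_detector(score_fn, x, eps=0.03, alpha=0.007, steps=10,
                    maximize=True):
    """Signed-gradient PGD within an L-inf ball around x, pushing the
    detector score up (score-maximising) or down (score-minimising)."""
    x0 = x.astype(float)
    x_adv = x0.copy()
    direction = 1.0 if maximize else -1.0
    for _ in range(steps):
        g = num_grad(score_fn, x_adv)
        x_adv = np.clip(x_adv + direction * alpha * np.sign(g),
                        x0 - eps, x0 + eps)
    return x_adv
```

Running both directions against the combined score (smoothed median plus instability term) is exactly the symmetric, adaptive evaluation the robustness claim hinges on.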

Figures

Figures reproduced from arXiv: 2605.08191 by Abdelrahman Hekal, Alessio Lomuscio, Maria Stoica.

Figure 1. Comparison of GEN score profiles for an ID sample
Figure 2. Distributions of Median, MAD, and ROSS scores for ID (CIFAR-10) and various OOD datasets (e.g., SVHN, Texture). Columns correspond to different OOD datasets; rows show the median-smoothed score (top), MAD (middle, negated so ID is larger), and ROSS (bottom). Blue: ID, orange: OOD. ROSS yields the best ID–OOD separation.
Figure 3. Trade-off between clean and adversarial (PGD-Max) performance. OOD detection AUROC on CIFAR-10 under attacks of varying strength (ϵ). Each line shows a different ROSS noise level (σnoise), revealing the balance between clean accuracy (ϵ = 0) and robustness.
Original abstract

Reliable out-of-distribution (OOD) detection is a critical requirement for the safe deployment of machine learning systems. Despite recent progress, state-of-the-art OOD detectors are highly susceptible to adversarial attacks, which undermines their trustworthiness in automated systems. To address this vulnerability, we apply median smoothing to baseline OOD detection scores, balancing clean and adversarial accuracies. Our key insight is that the noisy samples generated for median smoothing can be repurposed to quantify the local instability of the base score. We observe that OOD samples exhibit higher instability under perturbation. Based on this, we propose ROSS, a novel and robust post-hoc OOD detector that leverages the instability of baseline scores to further distinguish between in-distribution (ID) and OOD samples. ROSS achieves symmetric robustness, performing strongly against both score-minimising and score-maximising attacks, unlike prior work. This symmetric defence leads to state-of-the-art robustness, outperforming prior methods by up to 40 AUROC points. We demonstrate ROSS's effectiveness on extensive experiments across CIFAR-10, CIFAR-100, and ImageNet. Code is available at: https://github.com/Abdu-Hekal/ROSS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ROSS, a post-hoc OOD detector that applies median smoothing to baseline OOD scores for balancing clean and adversarial performance, then repurposes the generated noisy samples to quantify local instability of the base score. It observes that OOD samples exhibit higher instability under perturbation than ID samples and combines this with the smoothed score to achieve symmetric robustness against both score-minimizing and score-maximizing attacks, reporting gains of up to 40 AUROC points over prior methods on CIFAR-10, CIFAR-100, and ImageNet.

Significance. If the central empirical claims hold under rigorous adaptive evaluation, the work would meaningfully advance reliable OOD detection for safety-critical systems by addressing the documented vulnerability of existing detectors to adversarial perturbations. The approach is a lightweight post-hoc construction that reuses computation already performed for smoothing, and the public code release supports reproducibility.

major comments (3)
  1. [abstract, §3, experiments] The central claim of symmetric robustness and large AUROC gains rests on the assumption that OOD samples reliably show higher local instability than ID samples when quantified from the same median-smoothing perturbations (abstract and §3). No adaptive attack evaluation targeting the full ROSS metric (smoothed score + instability term) is described; an adversary jointly optimizing against both components could potentially suppress the distinction, undermining the reported gains.
  2. [experiments] Experimental details on attack implementations, baseline choices, statistical testing, and ablation of the instability component are insufficient to support the claimed improvements (abstract reports extensive experiments but lacks full attack protocols and variance reporting). This weakens the evidence for SOTA performance across CIFAR-10/100 and ImageNet.
  3. [§3] The paper does not provide a formal analysis or bound showing that the instability term is non-exploitable or that the synergistic combination preserves the observed separation under worst-case perturbations; the construction remains an empirical observation without reduction to a parameter-free or provably robust quantity.
minor comments (2)
  1. [§3] Notation for the instability metric and its combination with the smoothed score should be defined more explicitly with equations to improve clarity.
  2. [figures/tables] Figure captions and table headers could more clearly distinguish clean vs. adversarial settings and report exact attack strengths used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us identify areas for improvement in the manuscript. We address each major comment below and have revised the paper accordingly to strengthen the empirical evidence and clarity of our claims.

Point-by-point responses
  1. Referee: The central claim of symmetric robustness and large AUROC gains rests on the assumption that OOD samples reliably show higher local instability than ID samples when quantified from the same median-smoothing perturbations (abstract and §3). No adaptive attack evaluation targeting the full ROSS metric (smoothed score + instability term) is described; an adversary jointly optimizing against both components could potentially suppress the distinction, undermining the reported gains.

    Authors: We appreciate this observation on the evaluation scope. Our experiments demonstrate that OOD samples exhibit higher instability under the median-smoothing perturbations used for the smoothed score, leading to symmetric robustness against both score-minimizing and score-maximizing attacks. However, we agree that joint adaptive attacks on the combined ROSS metric would provide stronger validation. In the revised manuscript, we have added such adaptive attack evaluations (using PGD-style optimization targeting both terms simultaneously) on CIFAR-10/100 and ImageNet, confirming that the performance gains hold with only minor degradation. revision: yes

  2. Referee: Experimental details on attack implementations, baseline choices, statistical testing, and ablation of the instability component are insufficient to support the claimed improvements (abstract reports extensive experiments but lacks full attack protocols and variance reporting). This weakens the evidence for SOTA performance across CIFAR-10/100 and ImageNet.

    Authors: We acknowledge the need for greater detail to support reproducibility and the SOTA claims. The revised manuscript now includes: (i) full attack protocols with hyperparameters, step counts, and perturbation budgets; (ii) rationale for baseline selections with additional comparisons; (iii) statistical testing including mean AUROC with standard deviations over 5 random seeds; and (iv) a dedicated ablation study isolating the instability term's contribution. These additions are placed in the experiments section and supplementary material. revision: yes

  3. Referee: The paper does not provide a formal analysis or bound showing that the instability term is non-exploitable or that the synergistic combination preserves the observed separation under worst-case perturbations; the construction remains an empirical observation without reduction to a parameter-free or provably robust quantity.

    Authors: We agree that the instability term is an empirical observation rather than a formally bounded quantity. The manuscript does not claim a parameter-free or provably robust construction; ROSS is presented as a practical, lightweight post-hoc method. In the revision, we have expanded §3 to discuss the empirical nature of the separation, added sensitivity analysis under varying perturbation strengths, and clarified limitations regarding worst-case exploitability. A rigorous theoretical bound remains an open direction for future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical post-hoc construction with independent experimental validation.

Full rationale

The paper's core construction applies median smoothing to baseline OOD scores and repurposes the generated noisy samples to measure local instability, observing (rather than deriving) that OOD samples exhibit higher instability. ROSS is then defined directly from this combined metric. No equations reduce the claimed AUROC gains or symmetric robustness to a fitted parameter, self-referential definition, or self-citation chain. The robustness results are presented as empirical outcomes across CIFAR-10/100 and ImageNet, not as predictions forced by the input assumptions. This matches the default expectation of a non-circular empirical method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical observation that OOD samples show measurably higher score instability under the smoothing perturbations; no free parameters, axioms or invented entities are introduced beyond standard ML training and evaluation practices.

pith-pipeline@v0.9.0 · 5515 in / 1064 out tokens · 36664 ms · 2026-05-12T01:13:46.933924+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

  1. [1]

    Concrete Problems in AI Safety

    Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Chris- tiano, John Schulman, and Dan Man ´e. Concrete problems in ai safety.arXiv preprint arXiv:1606.06565, 2016. 1

  2. [2]

    Obfus- cated gradients give a false sense of security: Circumventing defenses to adversarial examples

    Anish Athalye, Nicholas Carlini, and David Wagner. Obfus- cated gradients give a false sense of security: Circumventing defenses to adversarial examples. InProceedings of the In- ternational Conference on Machine Learning (ICML), 2018. 2

  3. [3]

    Your out-of-distribution de- tection method is not robust! InAdvances in Neural Infor- mation Processing Systems (NeurIPS), 2022

    Mohammad Azizmalayeri, Arshia Soltani Moakar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, and Mohammad Hossein Rohban. Your out-of-distribution de- tection method is not robust! InAdvances in Neural Infor- mation Processing Systems (NeurIPS), 2022. 1, 3

  4. [4]

    ATOM: Robustifying out-of-distribution detection using outlier mining

    Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. ATOM: Robustifying out-of-distribution detection using outlier mining. InMachine Learning and Knowledge Dis- covery in Databases. Research Track. Springer International Publishing, 2021. 3, 6

  5. [5]

    Robust out-of-distribution detection for neural net- works

    Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. Robust out-of-distribution detection for neural net- works. InThe AAAI-22 Workshop on Adversarial Machine Learning and Beyond, 2021. 3

  6. [6]

    Yeh, Shaoshuai Mou, and Yan Gu

    Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, and Yan Gu. Leveraging perturbation robustness to enhance out-of- distribution detection. InProceedings of the Computer Vi- sion and Pattern Recognition Conference (CVPR), 2025. 2, 3, 4, 6, 8, 1

  7. [7]

    De- tection as regression: Certified object detection with median smoothing

    Ping-yeh Chiang, Michael Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, and Tom Goldstein. De- tection as regression: Certified object detection with median smoothing. InAdvances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2020. 5

  8. [8]

    Describing textures in the wild

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2014. 5

  9. [9]

    Zico Kolter

    Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. InPro- ceedings of the International Conference on Machine Learn- ing (ICML), 2019. 5, 2

  10. [10]

    ImageNet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2009. 2, 5

  11. [11]

    Extremely simple activation shaping for out- of-distribution detection

    Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out- of-distribution detection. InProceedings of the International Conference on Learning Representations (ICLR), 2023. 3

  12. [12]

    On the limitations of stochastic pre-processing de- fenses

    Yue Gao, Ilia Shumailov, Kassem Fawaz, and Nicolas Pa- pernot. On the limitations of stochastic pre-processing de- fenses. InAdvances in Neural Information Processing Sys- tems (NeurIPS), 2022. 2

  13. [13]

    MIT Press, 2016

    Ian Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning. MIT Press, 2016. 3

  14. [14]

    The influence curve and its role in robust estimation.Journal of the American Statistical Association,

    Frank R Hampel. The influence curve and its role in robust estimation.Journal of the American Statistical Association,

  15. [15]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR). IEEE, 2016. 6

  16. [16]

    A baseline for detecting misclassified and out-of-distribution examples in neural net- works

    Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural net- works. InProceedings of the International Conference on Learning Representations (ICLR), 2017. 1, 2

  17. [17]

    Evalu- ating robustness of predictive uncertainty estimation: Are dirichlet-based models reliable? InProceedings of the Inter- national Conference on Machine Learning (ICML)

    Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Z¨ugner, Sandhya Giri, and Stephan G ¨unnemann. Evalu- ating robustness of predictive uncertainty estimation: Are dirichlet-based models reliable? InProceedings of the Inter- national Conference on Machine Learning (ICML). PMLR,

  18. [18]

    Krizhevsky and G

    A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. 2, 5

  19. [19]

    Tiny imagenet visual recognition challenge.CS 231N, 7(7):3, 2015

    Yann Le and Xuan Yang. Tiny imagenet visual recognition challenge.CS 231N, 7(7):3, 2015. 5

  20. [20]

    LeCun, C

    Y . LeCun, C. Cortes, and C. J. Burges. The mnist database of handwritten digits, 1998. 5

  21. [21]

    A simple unified framework for detecting out-of-distribution samples and adversarial attacks

    Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. InAdvances in Neural In- formation Processing Systems (NeurIPS), 2018. 3

  22. [22]

    Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the re- liability of out-of-distribution image detection in neural net- works. InProceedings of the International Conference on Learning Representations (ICLR), 2018. 2, 3, 8

  23. [23]

    Kochenderfer

    Changliu Liu, Tomer Arnon, Christopher Lazarus, Christo- pher Strong, Clark Barrett, and Mykel J. Kochenderfer. Al- gorithms for verifying deep neural networks.F oundations and Trends in Optimization, 2021. 1

  24. [24]

    Fast decision boundary based out- of-distribution detector

    Litian Liu and Yao Qin. Fast decision boundary based out- of-distribution detector. InProceedings of the International Conference on Machine Learning (ICML), 2024. 3

  25. [25]

    Energy-based out-of-distribution detection

    Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 2

  26. [26]

    GEN: Pushing the limits of softmax-based out-of-distribution de- tection

    Xixi Liu, Yaroslava Lochman, and Christopher Zach. GEN: Pushing the limits of softmax-based out-of-distribution de- tection. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2023. 2, 4, 6

  27. [27]

    Deciphering the definition of adversarial robust- ness for post-hoc ood detectors

    Peter Lorenz, Mario Ruben Fernandez, Jens M ¨uller, and Ull- rich Koethe. Deciphering the definition of adversarial robust- ness for post-hoc ood detectors. InICML 2024 Next Gener- ation of AI Safety Workshop, 2024. 1, 3

  28. [28]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. InProceedings of the International Conference on Learning Representations (ICLR), 2018. 6

  29. [29]

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bis- sacco, Bo Wu, and Andrew Y . Ng. Reading digits in natural images with unsupervised feature learning. InNIPS Work- shop on Deep Learning and Unsupervised Feature Learning,

  30. [30]

    Deep neural networks are easily fooled: High confidence predictions for unrecognizable images

    Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. InProceedings of the Computer Vi- sion and Pattern Recognition Conference (CVPR), 2015. 2

  31. [31]

    Provably robust deep learning via adversarially trained smoothed clas- sifiers

    Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and S ´ebastien Bubeck. Provably robust deep learning via adversarially trained smoothed clas- sifiers. InAdvances in Neural Information Processing Sys- tems (NeurIPS), 2019. 2

  32. [32]

    Out-of-distribution detection using counterfactual distance

    Maria Stoica, Francesco Leofante, and Alessio Lomuscio. Out-of-distribution detection using counterfactual distance. arXiv preprint arXiv:2508.10148, 2025. 3

  33. [33]

    ReAct: Out-of- distribution detection with rectified activations

    Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of- distribution detection with rectified activations. InAdvances in Neural Information Processing Systems (NeurIPS). Cur- ran Associates, Inc., 2021. 3

  34. [34]

    Out-of- distribution detection with deep nearest neighbors

    Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of- distribution detection with deep nearest neighbors. InPro- ceedings of the International Conference on Machine Learn- ing (ICML). PMLR, 2022. 3

  35. [35]

    OpenOOD: Benchmarking generalized out-of-distribution detection.Advances in Neu- ral Information Processing Systems (NeurIPS), 2022

    Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. OpenOOD: Benchmarking generalized out-of-distribution detection.Advances in Neu- ral Information Processing Systems (NeurIPS), 2022. 3

  36. [36]

    Openood v1

    Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, et al. OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection.arXiv preprint arXiv:2306.09301, 2024. 5, 6

  37. [37]

    Places: A 10 million image database for scene recognition.IEEE Transactions on Pattern Analy- sis and Machine Intelligence, 40(6):1452–1464, 2018

    Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition.IEEE Transactions on Pattern Analy- sis and Machine Intelligence, 40(6):1452–1464, 2018. 5 A Robust Out-of-Distribution Detection Framework via Synergistic Smoothing Supplementary Material The appendix is organised as fol...