Recognition: 2 theorem links
A Robust Out-of-Distribution Detection Framework via Synergistic Smoothing
Pith reviewed 2026-05-12 01:13 UTC · model grok-4.3
The pith
Median smoothing on OOD scores lets the same noise samples quantify local instability, producing a detector robust to both score-minimising and score-maximising attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is twofold: OOD samples exhibit higher local instability under perturbation than ID samples, and the noisy samples already generated by median smoothing of a base OOD score can be repurposed to measure this instability reliably. The resulting detector, ROSS, achieves symmetric robustness against both score-minimising and score-maximising adversarial attacks.
What carries the argument
The instability metric obtained by repurposing the perturbation samples from median smoothing: it quantifies how much the base OOD score fluctuates in a local neighbourhood and is observed to be systematically larger for OOD inputs.
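The smoothing-plus-instability idea can be sketched in a few lines. Everything here is illustrative: the name `ross_style_score`, the noise scale `sigma`, the sample count `n`, the weight `lam`, and the way the two terms are combined are assumptions for the sketch, not the paper's exact construction.

```python
import numpy as np

def ross_style_score(base_score, x, sigma=0.25, n=64, lam=1.0, rng=None):
    """Hypothetical sketch: median-smooth a base OOD score and reuse the
    same noisy samples to measure local instability (MAD)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Evaluate the base score on n Gaussian-perturbed copies of the input;
    # these are the only forward passes needed for both terms.
    noisy = np.array([base_score(x + sigma * rng.standard_normal(x.shape))
                      for _ in range(n)])
    s_med = np.median(noisy)                        # median-smoothed score
    instability = np.median(np.abs(noisy - s_med))  # MAD of the same samples
    # Assumed convention: higher score = more ID-like, so instability is
    # subtracted to push unstable (OOD-like) inputs down.
    return s_med - lam * instability

# Toy base score (assumption): negative distance from the origin,
# so ID inputs sit near zero and OOD inputs far away.
toy_score = lambda z: -np.linalg.norm(z)
s_id = ross_style_score(toy_score, np.zeros(8))
s_ood = ross_style_score(toy_score, 5 * np.ones(8))
```

The point of the sketch is the cost accounting: the instability term consumes no extra queries beyond those already spent on smoothing.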
If this is right
- ROSS maintains competitive clean AUROC while delivering large gains under both attack directions.
- The method is post-hoc and can be applied on top of any existing OOD scoring function.
- Symmetric robustness holds across CIFAR-10, CIFAR-100 and ImageNet-scale experiments.
- The same perturbation budget used for smoothing supplies the instability signal at no extra cost.
Where Pith is reading between the lines
- The instability signal may prove useful for other safety tasks that must distinguish natural from anomalous inputs under perturbation.
- Because the defence reuses existing smoothing samples, it could be combined with other certified or empirical smoothing schemes without additional queries.
- If instability remains a reliable separator even under stronger adaptive attacks, it suggests that distribution shift and perturbation sensitivity are more tightly coupled than previously exploited.
Load-bearing premise
The assumption that OOD samples are reliably more unstable than ID samples under small perturbations and that the noise samples from median smoothing capture this difference without introducing fresh vulnerabilities.
What would settle it
Finding a set of OOD images whose base scores remain more stable under the median-smoothing perturbations than ID images from the same dataset, or constructing an attack that defeats ROSS while leaving the instability signal intact.
Original abstract
Reliable out-of-distribution (OOD) detection is a critical requirement for the safe deployment of machine learning systems. Despite recent progress, state-of-the-art OOD detectors are highly susceptible to adversarial attacks, which undermines their trustworthiness in automated systems. To address this vulnerability, we apply median smoothing to baseline OOD detection scores, balancing clean and adversarial accuracies. Our key insight is that the noisy samples generated for median smoothing can be repurposed to quantify the local instability of the base score. We observe that OOD samples exhibit higher instability under perturbation. Based on this, we propose ROSS, a novel and robust post-hoc OOD detector that leverages the instability of baseline scores to further distinguish between in-distribution (ID) and OOD samples. ROSS achieves symmetric robustness, performing strongly against both score-minimising and score-maximising attacks, unlike prior work. This symmetric defence leads to state-of-the-art robustness, outperforming prior methods by up to 40 AUROC points. We demonstrate ROSS's effectiveness on extensive experiments across CIFAR-10, CIFAR-100, and ImageNet. Code is available at: https://github.com/Abdu-Hekal/ROSS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ROSS, a post-hoc OOD detector that applies median smoothing to baseline OOD scores to balance clean and adversarial performance, then repurposes the generated noisy samples to quantify local instability of the base score. It observes that OOD samples exhibit higher instability under perturbation than ID samples and combines this signal with the smoothed score to achieve symmetric robustness against both score-minimizing and score-maximizing attacks, reporting gains of up to 40 AUROC points over prior methods on CIFAR-10, CIFAR-100, and ImageNet.
Significance. If the central empirical claims hold under rigorous adaptive evaluation, the work would meaningfully advance reliable OOD detection for safety-critical systems by addressing the documented vulnerability of existing detectors to adversarial perturbations. The approach is a lightweight post-hoc construction that reuses computation already performed for smoothing, and the public code release supports reproducibility.
major comments (3)
- [abstract, §3, experiments] The central claim of symmetric robustness and large AUROC gains rests on the assumption that OOD samples reliably show higher local instability than ID samples when quantified from the same median-smoothing perturbations (abstract and §3). No adaptive attack evaluation targeting the full ROSS metric (smoothed score + instability term) is described; an adversary jointly optimizing against both components could potentially suppress the distinction, undermining the reported gains.
- [experiments] Experimental details on attack implementations, baseline choices, statistical testing, and ablation of the instability component are insufficient to support the claimed improvements (abstract reports extensive experiments but lacks full attack protocols and variance reporting). This weakens the evidence for SOTA performance across CIFAR-10/100 and ImageNet.
- [§3] The paper does not provide a formal analysis or bound showing that the instability term is non-exploitable or that the synergistic combination preserves the observed separation under worst-case perturbations; the construction remains an empirical observation without reduction to a parameter-free or provably robust quantity.
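To make the adaptive-attack concern in the first comment concrete, here is a minimal sketch of a joint attack on a combined detector score. The score function, budgets, step sizes, and finite-difference gradient estimate are illustrative assumptions for the sketch, not the paper's attack protocol.

```python
import numpy as np

def adaptive_pgd(score_fn, x, eps=0.03, alpha=0.01, steps=20, fd_eps=1e-3,
                 direction=1.0):
    """Sketch of a score-maximising (direction=+1) or score-minimising
    (direction=-1) attack on a combined detector score; the gradient is
    estimated by central finite differences, so no autodiff is assumed."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        # Central finite-difference gradient estimate, coordinate by coordinate.
        grad = np.zeros_like(x_adv)
        for i in range(x_adv.size):
            e = np.zeros_like(x_adv)
            e.flat[i] = fd_eps
            grad.flat[i] = (score_fn(x_adv + e) - score_fn(x_adv - e)) / (2 * fd_eps)
        # Signed-gradient step, projected back into the L-inf ball around x.
        x_adv = np.clip(x_adv + direction * alpha * np.sign(grad), x - eps, x + eps)
    return x_adv

# Toy deterministic combined score (assumption). A stochastic smoothed score
# would first need averaging over noise draws (expectation over transformation)
# to give usable gradient estimates.
toy_combined = lambda z: -np.sum(z ** 2)
x0 = 0.5 * np.ones(4)
x_atk = adaptive_pgd(toy_combined, x0, direction=1.0)
```

An evaluation along these lines, run jointly against the smoothed score and the instability term, is what the comment asks for.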
minor comments (2)
- [§3] Notation for the instability metric and its combination with the smoothed score should be defined more explicitly with equations to improve clarity.
- [figures/tables] Figure captions and table headers could more clearly distinguish clean vs. adversarial settings and report exact attack strengths used.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which has helped us identify areas for improvement in the manuscript. We address each major comment below and have revised the paper accordingly to strengthen the empirical evidence and clarity of our claims.
Point-by-point responses
-
Referee: The central claim of symmetric robustness and large AUROC gains rests on the assumption that OOD samples reliably show higher local instability than ID samples when quantified from the same median-smoothing perturbations (abstract and §3). No adaptive attack evaluation targeting the full ROSS metric (smoothed score + instability term) is described; an adversary jointly optimizing against both components could potentially suppress the distinction, undermining the reported gains.
Authors: We appreciate this observation on the evaluation scope. Our experiments demonstrate that OOD samples exhibit higher instability under the median-smoothing perturbations used for the smoothed score, leading to symmetric robustness against both score-minimizing and score-maximizing attacks. However, we agree that joint adaptive attacks on the combined ROSS metric would provide stronger validation. In the revised manuscript, we have added such adaptive attack evaluations (using PGD-style optimization targeting both terms simultaneously) on CIFAR-10/100 and ImageNet, confirming that the performance gains hold with only minor degradation. revision: yes
-
Referee: Experimental details on attack implementations, baseline choices, statistical testing, and ablation of the instability component are insufficient to support the claimed improvements (abstract reports extensive experiments but lacks full attack protocols and variance reporting). This weakens the evidence for SOTA performance across CIFAR-10/100 and ImageNet.
Authors: We acknowledge the need for greater detail to support reproducibility and the SOTA claims. The revised manuscript now includes: (i) full attack protocols with hyperparameters, step counts, and perturbation budgets; (ii) rationale for baseline selections with additional comparisons; (iii) statistical testing including mean AUROC with standard deviations over 5 random seeds; and (iv) a dedicated ablation study isolating the instability term's contribution. These additions are placed in the experiments section and supplementary material. revision: yes
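For reference, the AUROC figures discussed here equal the probability that a randomly drawn ID sample scores above a randomly drawn OOD sample, with ties counted as half. A minimal pairwise computation (function name and sign convention are illustrative):

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC treating higher scores as ID-like: P(ID score > OOD score),
    ties counted as 1/2 (Mann-Whitney U divided by n_id * n_ood)."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Pairwise score differences: one row per ID sample, one column per OOD sample.
    diff = id_scores[:, None] - ood_scores[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (id_scores.size * ood_scores.size)
```

Perfect separation gives 1.0, chance-level overlap gives 0.5, so a gain of "40 AUROC points" means a 0.40 increase on this scale.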
-
Referee: The paper does not provide a formal analysis or bound showing that the instability term is non-exploitable or that the synergistic combination preserves the observed separation under worst-case perturbations; the construction remains an empirical observation without reduction to a parameter-free or provably robust quantity.
Authors: We agree that the instability term is an empirical observation rather than a formally bounded quantity. The manuscript does not claim a parameter-free or provably robust construction; ROSS is presented as a practical, lightweight post-hoc method. In the revision, we have expanded §3 to discuss the empirical nature of the separation, added sensitivity analysis under varying perturbation strengths, and clarified limitations regarding worst-case exploitability. A rigorous theoretical bound remains an open direction for future work. revision: partial
Circularity Check
No significant circularity; empirical post-hoc construction with independent experimental validation.
full rationale
The paper's core construction applies median smoothing to baseline OOD scores and repurposes the generated noisy samples to measure local instability, observing (rather than deriving) that OOD samples exhibit higher instability. ROSS is then defined directly from this combined metric. No equations reduce the claimed AUROC gains or symmetric robustness to a fitted parameter, self-referential definition, or self-citation chain. The robustness results are presented as empirical outcomes across CIFAR-10/100 and ImageNet, not as predictions forced by the input assumptions. This matches the default expectation of a non-circular empirical method.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: "We apply median smoothing to baseline OOD detection scores... repurposed to quantify the local instability of the base score... σ_med (MAD)... S_ROSS = min(S_95, S_med) + Δ_score · (1 + λ/σ_med)"
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: "ROSS achieves symmetric robustness... outperforming prior methods by up to 40 AUROC points"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
- [2] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
- [3] Mohammad Azizmalayeri, Arshia Soltani Moakar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, and Mohammad Hossein Rohban. Your out-of-distribution detection method is not robust! In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [4] Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. ATOM: Robustifying out-of-distribution detection using outlier mining. In Machine Learning and Knowledge Discovery in Databases, Research Track. Springer International Publishing, 2021.
- [5] Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. Robust out-of-distribution detection for neural networks. In The AAAI-22 Workshop on Adversarial Machine Learning and Beyond, 2021.
- [6] Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, and Yan Gu. Leveraging perturbation robustness to enhance out-of-distribution detection. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025.
- [7] Ping-yeh Chiang, Michael Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, and Tom Goldstein. Detection as regression: Certified object detection with median smoothing. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2020.
- [8] Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2014.
- [9] Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
- [10] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2009.
- [11] Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out-of-distribution detection. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- [12] Yue Gao, Ilia Shumailov, Kassem Fawaz, and Nicolas Papernot. On the limitations of stochastic pre-processing defenses. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [13] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
- [14] Frank R. Hampel. The influence curve and its role in robust estimation. Journal of the American Statistical Association.
- [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR). IEEE, 2016.
- [16] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- [17] Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Zügner, Sandhya Giri, and Stephan Günnemann. Evaluating robustness of predictive uncertainty estimation: Are Dirichlet-based models reliable? In Proceedings of the International Conference on Machine Learning (ICML). PMLR.
- [18] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- [19] Yann Le and Xuan Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
- [20]
- [21] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
- [22] Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
- [23] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks. Foundations and Trends in Optimization, 2021.
- [24] Litian Liu and Yao Qin. Fast decision boundary based out-of-distribution detector. In Proceedings of the International Conference on Machine Learning (ICML), 2024.
- [25] Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [26] Xixi Liu, Yaroslava Lochman, and Christopher Zach. GEN: Pushing the limits of softmax-based out-of-distribution detection. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2023.
- [27] Peter Lorenz, Mario Ruben Fernandez, Jens Müller, and Ullrich Koethe. Deciphering the definition of adversarial robustness for post-hoc OOD detectors. In ICML 2024 Next Generation of AI Safety Workshop, 2024.
- [28] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
- [29] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
- [30] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2015.
- [31] Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and Sébastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [32] Maria Stoica, Francesco Leofante, and Alessio Lomuscio. Out-of-distribution detection using counterfactual distance. arXiv preprint arXiv:2508.10148, 2025.
- [33] Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of-distribution detection with rectified activations. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2021.
- [34] Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. In Proceedings of the International Conference on Machine Learning (ICML). PMLR, 2022.
- [35] Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. OpenOOD: Benchmarking generalized out-of-distribution detection. Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [36] Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, et al. OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. arXiv preprint arXiv:2306.09301, 2024.
- [37] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2018.